Hi everyone.
I am following this guide to run RGCN node classification on ogbn-mag
in distributed mode, without any changes. I am using the latest DGL with PyTorch 2.3.1,
and I got the following error:
.....
start training...
[rank0]: Traceback (most recent call last):
[rank0]: File "~/workspace/dgl/examples/distributed/rgcn/node_classification.py", line 925, in <module>
[rank0]: main(args)
[rank0]: File "~/workspace/dgl/examples/distributed/rgcn/node_classification.py", line 790, in main
[rank0]: run(
[rank0]: File "~/workspace/dgl/examples/distributed/rgcn/node_classification.py", line 624, in run
[rank0]: assert len(logits) == 1
[rank0]: AssertionError
[rank1]: Traceback (most recent call last):
[rank1]: File "~/workspace/dgl/examples/distributed/rgcn/node_classification.py", line 925, in <module>
[rank1]: main(args)
[rank1]: File "~/workspace/dgl/examples/distributed/rgcn/node_classification.py", line 790, in main
[rank1]: run(
[rank1]: File "~/workspace/dgl/examples/distributed/rgcn/node_classification.py", line 624, in run
[rank1]: assert len(logits) == 1
[rank1]: AssertionError
.....
The error occurs here:
# forward
logits = model(blocks, feats)
assert len(logits) == 1  # <-- AssertionError is raised here
logits = logits["paper"]
Has anyone run into the same issue?
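
In case it helps with diagnosing this, here is a minimal sketch of a check that could replace the bare assert so the failure shows which node types the model returned. It assumes logits is a dict keyed by node type, which the logits["paper"] lookup suggests. The helper name select_target_logits is hypothetical and not part of the DGL example, and the values below are plain-list stand-ins for tensors.

```python
def select_target_logits(logits, target_ntype="paper"):
    """Return the logits for target_ntype, or fail with a descriptive error.

    Assumption: `logits` is a dict keyed by node type, as the
    `logits["paper"]` lookup in the example script suggests.
    """
    if len(logits) != 1 or target_ntype not in logits:
        # Report which node types the model actually produced output for.
        raise ValueError(
            f"expected output only for {target_ntype!r}, "
            f"got node types: {sorted(logits)}"
        )
    return logits[target_ntype]


# Stand-in values; in the real script these would be tensors.
print(select_target_logits({"paper": [0.1, 0.9]}))
try:
    select_target_logits({"paper": [0.1, 0.9], "author": [0.3, 0.7]})
except ValueError as err:
    print(err)
```

With something like this in place of the assert, the error message would list the extra (or missing) node types instead of a bare AssertionError.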