DistDGL on rgcn AssertionError

pubu · July 14, 2024, 12:18pm

Hi everyone.

I am following this guide to run the RGCN node classification example, unmodified, on ogbn-mag in distributed mode. I am using the latest DGL with PyTorch 2.3.1, and I get the following error:

.....
start training...
[rank0]: Traceback (most recent call last):
[rank0]:   File "~/workspace/dgl/examples/distributed/rgcn/node_classification.py", line 925, in <module>
[rank0]:     main(args)
[rank0]:   File "~/workspace/dgl/examples/distributed/rgcn/node_classification.py", line 790, in main
[rank0]:     run(
[rank0]:   File "~/workspace/dgl/examples/distributed/rgcn/node_classification.py", line 624, in run
[rank0]:     assert len(logits) == 1
[rank0]: AssertionError
[rank1]: Traceback (most recent call last):
[rank1]:   File "~/workspace/dgl/examples/distributed/rgcn/node_classification.py", line 925, in <module>
[rank1]:     main(args)
[rank1]:   File "~/workspace/dgl/examples/distributed/rgcn/node_classification.py", line 790, in main
[rank1]:     run(
[rank1]:   File "~/workspace/dgl/examples/distributed/rgcn/node_classification.py", line 624, in run
[rank1]:     assert len(logits) == 1
[rank1]: AssertionError
.....

The assertion fails here:

# forward
logits = model(blocks, feats)
assert len(logits) == 1  # <-- the AssertionError is raised here
logits = logits["paper"]
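A bare assert does not say what the model actually returned, so a first debugging step could be to replace it with a check that reports the keys. The following is only a minimal sketch: it assumes logits is a dict keyed by node type (as the logits["paper"] lookup suggests), the helper name pick_category_logits is made up for illustration, and plain lists stand in for tensors.

```python
def pick_category_logits(logits, category="paper"):
    """Return logits[category], or raise an error naming the node types present.

    Hypothetical helper, not part of DGL or the example script.
    """
    ntypes = sorted(logits.keys())
    if ntypes != [category]:
        raise ValueError(
            f"expected logits only for {category!r}, got node types {ntypes}"
        )
    return logits[category]


# Stand-in for the model output; in the real script the values would be tensors.
ok = pick_category_logits({"paper": [0.1, 0.9]})

try:
    pick_category_logits({"paper": [0.1], "author": [0.2]})
except ValueError as err:
    message = str(err)  # lists both 'author' and 'paper'
```

In the training loop this would stand in for the assert and the lookup, e.g. logits = pick_category_logits(model(blocks, feats)), so the error shows whether the dict is empty or holds extra node types.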

Has anyone run into the same issue?

BarclayII · July 17, 2024, 11:47am

For distributed GNNs, would you mind trying GraphStorm (awslabs/graphstorm on GitHub)? It is an enterprise graph machine learning framework for billion-scale graphs, aimed at ML scientists and data scientists.
