Unable to run RGCN examples in DGL 0.4.3

I just installed the latest DGL (dgl-cu101 v0.4.3.post2) on my computer, and tried to run the examples related to RGCN here. However, I got the following error when running entity classification:

Traceback (most recent call last):                                                                                                     
  File "entity_classify.py", line 19, in <module>
    from dgl.data.rdf import AIFBDataset, MUTAGDataset, BGSDataset, AMDataset
ImportError: cannot import name 'AIFBDataset' from 'dgl.data.rdf' (/home/vamship/.conda/envs/dgl_sample/lib/python3.7/site-packages/dgl/data/rdf.py)    

This is because the dataset classes are named differently in dgl/data/rdf.py than in entity_classify.py: for example, rdf.py defines AIFB, whereas the script imports AIFBDataset.
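If the example script has to run against both naming schemes, one option is a small import shim that tries each candidate name in turn. This is only a sketch: `resolve_attr` is a hypothetical helper. The thread names only `AIFB` as a 0.4.3 class, so the short names for the other datasets are an assumption.

```python
import importlib


def resolve_attr(module_name, candidates):
    """Return the first attribute from `candidates` that `module_name` defines.

    Useful when a class was renamed between releases, e.g. `AIFB` (0.4.3)
    vs `AIFBDataset` (what the newer example script imports).
    """
    module = importlib.import_module(module_name)
    for name in candidates:
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(
        "cannot import any of %r from %r" % (list(candidates), module_name)
    )


# Hypothetical usage in entity_classify.py (requires DGL to be installed):
# AIFB = resolve_attr("dgl.data.rdf", ["AIFBDataset", "AIFB"])
```

The newer name is listed first, so the same line keeps working once the installed DGL catches up with the example.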

Also, I got the following error when running the link prediction example:

Traceback (most recent call last): 
  File "link_predict.py", line 22, in <module>
    from dgl.data.knowledge_graph import load_data
ModuleNotFoundError: No module named 'dgl.data.knowledge_graph'

Again, there is no knowledge_graph.py in dgl/data/.
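To check which submodules the installed DGL actually ships before running an example, you can use `importlib.util.find_spec`. It locates a module without executing that module itself. Here is a minimal sketch; `has_module` is a hypothetical helper name:

```python
import importlib.util


def has_module(name):
    """Return True if module `name` can be found.

    find_spec imports the parent packages (e.g. `dgl` and `dgl.data`)
    but does not execute the final submodule itself.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # A parent package is missing altogether (e.g. DGL not installed).
        return False


for mod in ("dgl.data.rdf", "dgl.data.knowledge_graph"):
    print(mod, "->", "found" if has_module(mod) else "missing")
```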

I am guessing that the example scripts have not been updated for this DGL version. Any help in resolving this issue would be great. Thanks!

Hi,

The master branch is subject to change, and we plan to release a new version next week. For now, you can use the old examples on the 0.4.x branch (https://github.com/dmlc/dgl/tree/0.4.x/examples/pytorch/rgcn) with DGL 0.4.3.
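The advice above amounts to matching the examples branch to the installed release. Below is a small sketch of that rule. It assumes only what this thread states: 0.4.x releases pair with the 0.4.x branch, and master tracks the unreleased version. `examples_branch` is hypothetical; in practice the version string would come from `dgl.__version__`.

```python
import re


def examples_branch(version):
    """Map a DGL version string such as '0.4.3.post2' to an examples branch."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    major, minor = int(match.group(1)), int(match.group(2))
    if (major, minor) == (0, 4):
        # Examples matching the 0.4.3 API live on the 0.4.x branch.
        return "0.4.x"
    # Assumption: any other version uses master (not verified in this thread).
    return "master"


print(examples_branch("0.4.3.post2"))  # prints 0.4.x
```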

Thanks, @VoVAllen! I was able to get the entity classification script to work, but the prediction accuracy was much lower than the numbers reported in the README for two of the datasets:

AIFB: <90%
MUTAG: <70%

Another issue is that the link prediction example still does not work. I got the following error:

Traceback (most recent call last):
  File "link_predict.py", line 258, in <module>
    main(args)
  File "link_predict.py", line 117, in main
    num_nodes, num_rels, train_data)
  File "/home/vamship/dgl/examples/pytorch/rgcn/utils.py", line 153, in build_test_graph
    return build_graph_from_triplets(num_nodes, num_rels, (src, rel, dst))
  File "/home/vamship/dgl/examples/pytorch/rgcn/utils.py", line 139, in build_graph_from_triplets
    g.add_nodes(num_nodes)
  File "/home/vamship/.conda/envs/dgl_sample/lib/python3.7/site-packages/dgl/heterograph.py", line 313, in add_nodes
    raise DGLError('Mutation is not supported in heterograph.')
dgl._ffi.base.DGLError: Mutation is not supported in heterograph.
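The traceback shows `build_graph_from_triplets` growing the graph with `g.add_nodes`, which the installed version rejects for heterographs. A common way around mutation is to assemble the full edge list first and pass it to a graph constructor in a single call. The sketch below covers only the edge-list step, in plain Python. It assumes, as is usual for RGCN link prediction, that each triplet also gets an inverse edge with relation id `rel + num_rels`. `triplets_to_edges` is a hypothetical helper. The exact DGL constructor to call depends on the installed version, so it is left out.

```python
def triplets_to_edges(num_rels, triplets):
    """Turn (src, rel, dst) triplets into parallel src/dst/rel edge lists.

    Every triplet contributes a forward edge src -> dst with type `rel` and
    an inverse edge dst -> src with type `rel + num_rels`, so the result has
    2 * len(triplets) edges and up to 2 * num_rels relation types.
    """
    src, dst, rel = [], [], []
    for s, r, d in triplets:
        src.append(s)
        dst.append(d)
        rel.append(r)
    # Inverse edges are appended after all forward edges.
    inv_src, inv_dst = list(dst), list(src)
    inv_rel = [r + num_rels for r in rel]
    return src + inv_src, dst + inv_dst, rel + inv_rel


src, dst, rel = triplets_to_edges(num_rels=2, triplets=[(0, 0, 1), (1, 1, 2)])
print(src, dst, rel)  # prints [0, 1, 1, 2] [1, 2, 0, 1] [0, 1, 2, 3]
```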

I was able to get link_predict.py to work.
At first, I just modified link_predict.py to match the old version, and I got the same error as you.

So I reset my changes and checked out an older commit:
git reset --hard
git rebase master
git checkout 6c7c403

After that, run link_predict.py but omit the last command-line argument (raw/filtered), like this:

python3 link_predict.py -d FB15k-237 --gpu 0

You should see this

Thanks, @lerachel9900! The link prediction example is working now, but I got an out-of-memory error during evaluation:

Epoch 0500 | Loss 0.0971 | Best MRR 0.0000 | Forward 1.9687s | Backward 5.1627s
start eval
Traceback (most recent call last):
  File "link_predict.py", line 258, in <module>
    main(args)
  File "link_predict.py", line 188, in main
    embed = model(test_graph, test_node_id, test_rel, test_norm)
  File "/home/vamship/.conda/envs/dgl_sample/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "link_predict.py", line 67, in forward
    return self.rgcn.forward(g, h, r, norm)
  File "/home/vamship/dgl/examples/pytorch/rgcn/model.py", line 47, in forward
    h = layer(g, h, r, norm)
  File "/home/vamship/.conda/envs/dgl_sample/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/vamship/.conda/envs/dgl_sample/lib/python3.7/site-packages/dgl/nn/pytorch/conv/relgraphconv.py", line 185, in forward
    g.update_all(self.message_func, fn.sum(msg='msg', out='h'))
  File "/home/vamship/.conda/envs/dgl_sample/lib/python3.7/site-packages/dgl/graph.py", line 3238, in update_all
    Runtime.run(prog)
  File "/home/vamship/.conda/envs/dgl_sample/lib/python3.7/site-packages/dgl/runtime/runtime.py", line 11, in run
    exe.run()
  File "/home/vamship/.conda/envs/dgl_sample/lib/python3.7/site-packages/dgl/runtime/ir/executor.py", line 204, in run
    udf_ret = fn_data(src_data, edge_data, dst_data)
  File "/home/vamship/.conda/envs/dgl_sample/lib/python3.7/site-packages/dgl/runtime/scheduler.py", line 972, in _mfunc_wrapper
    return mfunc(ebatch)
  File "/home/vamship/.conda/envs/dgl_sample/lib/python3.7/site-packages/dgl/nn/pytorch/conv/relgraphconv.py", line 147, in bdd_message_func
    node = edges.src['h'].view(-1, 1, self.submat_in)
  File "/home/vamship/.conda/envs/dgl_sample/lib/python3.7/site-packages/dgl/utils.py", line 285, in __getitem__
    return self._fn(key)
  File "/home/vamship/.conda/envs/dgl_sample/lib/python3.7/site-packages/dgl/frame.py", line 655, in <lambda>
    return utils.LazyDict(lambda key: self._frame[key][rows], keys=self.keys())
  File "/home/vamship/.conda/envs/dgl_sample/lib/python3.7/site-packages/dgl/frame.py", line 97, in __getitem__
    return F.gather_row(self.data, user_idx)
  File "/home/vamship/.conda/envs/dgl_sample/lib/python3.7/site-packages/dgl/backend/pytorch/tensor.py", line 156, in gather_row
    return th.index_select(data, 0, row_index)
RuntimeError: [enforce fail at CPUAllocator.cpp:64] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 1088460000 bytes. Error code 12 (Cannot allocate memory)
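The failed allocation is about what gathering one hidden vector per edge would cost. As a back-of-the-envelope check, assume the following: the FB15k-237 training split has 272,115 triplets, the test graph adds an inverse edge for each one, the hidden size is the example's default of 500, and features are float32. Under those assumptions the product matches the 1,088,460,000 bytes in the error exactly. None of these values comes from the traceback itself.

```python
# Rough size of the tensor that edges.src['h'] materializes in the failing
# index_select: one hidden vector per edge of the test graph.
# All inputs below are assumptions, not values read from the traceback.
num_train_triplets = 272115  # assumed size of the FB15k-237 training split
num_edges = 2 * num_train_triplets  # assumed: each triplet plus its inverse edge
hidden_size = 500  # assumed --n-hidden default of link_predict.py
bytes_per_float = 4  # float32

needed = num_edges * hidden_size * bytes_per_float
print(needed)  # prints 1088460000
print("%.2f GB" % (needed / 1e9))  # prints 1.09 GB
```

If these assumptions hold, memory for this step grows linearly with both the number of edges and the hidden size.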

I just finished running it with a GPU. If you want, here is the output.

Thanks for sharing the output! Could you share your GPU configuration?
Also, did you need to change anything in the script to run the evaluation bit?

CPU: Intel Core i9-9920X CPU @ 3.50GHz
Sockets: 1
Cores per socket: 12
Threads per core: 2
Total max threads: 24
RAM: 128GB
GPUs:

  • 1 x NVIDIA RTX 2080 ti
  • 1 x NVIDIA QUADRO RTX 8000

No, I just ran the script as mentioned earlier (omitting --filtered/--raw).

Okay, thanks! That was helpful. It seems RAM is a big factor for evaluation.