Unable to run RGCN examples in DGL 0.4.3

vamships · August 11, 2020, 4:21pm

I just installed the latest DGL (dgl-cu101 v0.4.3.post2) on my computer, and tried to run the examples related to RGCN here. However, I got the following error when running entity classification:

Traceback (most recent call last):
  File "entity_classify.py", line 19, in <module>
    from dgl.data.rdf import AIFBDataset, MUTAGDataset, BGSDataset, AMDataset
ImportError: cannot import name 'AIFBDataset' from 'dgl.data.rdf' (/home/vamship/.conda/envs/dgl_sample/lib/python3.7/site-packages/dgl/data/rdf.py)

This happens because the dataset classes are defined in dgl/data/rdf.py under different names than the ones entity_classify.py imports, for example AIFB vs. AIFBDataset.
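A minimal sketch of a version-tolerant import, assuming the only difference is the class name (AIFB in the installed package, AIFBDataset in the script, as the traceback suggests). The helper name first_available is hypothetical, and the constructor arguments of the two classes may still differ, so this only addresses the ImportError.

```python
import importlib


def first_available(module_name, candidates):
    """Return the first attribute in `candidates` that `module_name` defines."""
    module = importlib.import_module(module_name)
    for name in candidates:
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(
        f"none of {candidates!r} found in module {module_name!r}"
    )


# Hypothetical usage with DGL installed (not executed here):
# AIFBDataset = first_available("dgl.data.rdf", ["AIFBDataset", "AIFB"])
```

Trying the newer name first means the same script keeps working after an upgrade, but using the example scripts that match the installed release is the more reliable route.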

Also, I got the following error when running the link prediction example:

Traceback (most recent call last):
  File "link_predict.py", line 22, in <module>
    from dgl.data.knowledge_graph import load_data
ModuleNotFoundError: No module named 'dgl.data.knowledge_graph'

Again, there is no knowledge_graph.py in dgl/data/.
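A quick way to confirm this kind of mismatch before running an example is to ask the import system whether the module can be located. This is only a sketch; the helper name module_available is hypothetical, and it relies on the standard library's importlib.util.find_spec.

```python
import importlib.util


def module_available(name):
    """Return True if the import system can locate module `name`.

    For a dotted name, find_spec imports the parent packages but not the
    submodule itself.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # A parent package (for example `dgl` itself) is not installed.
        return False


# Hypothetical usage: module_available("dgl.data.knowledge_graph")
```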

I am guessing that the example scripts do not match the installed DGL version. Any help in resolving this issue would be great. Thanks!

VoVAllen · August 12, 2020, 5:25am

Hi,

The master branch is subject to change, and we plan to release a new version next week. For now, you can use the old examples on the 0.4.x branch, https://github.com/dmlc/dgl/tree/0.4.x/examples/pytorch/rgcn, with DGL 0.4.3.

vamships · August 13, 2020, 3:14pm

Thanks, @VoVAllen! I was able to get the entity classification script to work, but the prediction accuracy was much lower than the numbers reported in the README for two datasets:

AIFB: <90%
MUTAG: <70%

Another issue is that link prediction still does not work. I got the following error:

Traceback (most recent call last):
  File "link_predict.py", line 258, in <module>
    main(args)
  File "link_predict.py", line 117, in main
    num_nodes, num_rels, train_data)
  File "/home/vamship/dgl/examples/pytorch/rgcn/utils.py", line 153, in build_test_graph
    return build_graph_from_triplets(num_nodes, num_rels, (src, rel, dst))
  File "/home/vamship/dgl/examples/pytorch/rgcn/utils.py", line 139, in build_graph_from_triplets
    g.add_nodes(num_nodes)
  File "/home/vamship/.conda/envs/dgl_sample/lib/python3.7/site-packages/dgl/heterograph.py", line 313, in add_nodes
    raise DGLError('Mutation is not supported in heterograph.')
dgl._ffi.base.DGLError: Mutation is not supported in heterograph.

lerachel9900 · August 13, 2020, 4:22pm

I was able to make link_predict.py work.
At first, I just modified link_predict.py to match the old version and got the same error as you.

So I reset the changes and checked out the old git version:
git reset --hard
git rebase master
git checkout 6c7c403

After that, run link_predict.py but omit the last command-line argument (raw/filtered), like this:

python3 link_predict.py -d FB15k-237 --gpu 0

You should see this

vamships · August 13, 2020, 6:55pm

Thanks, @lerachel9900! The link prediction example is working now, but I got an out-of-memory error during evaluation:

Epoch 0500 | Loss 0.0971 | Best MRR 0.0000 | Forward 1.9687s | Backward 5.1627s
start eval
Traceback (most recent call last):
  File "link_predict.py", line 258, in <module>
    main(args)
  File "link_predict.py", line 188, in main
    embed = model(test_graph, test_node_id, test_rel, test_norm)
  File "/home/vamship/.conda/envs/dgl_sample/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "link_predict.py", line 67, in forward
    return self.rgcn.forward(g, h, r, norm)
  File "/home/vamship/dgl/examples/pytorch/rgcn/model.py", line 47, in forward
    h = layer(g, h, r, norm)
  File "/home/vamship/.conda/envs/dgl_sample/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/vamship/.conda/envs/dgl_sample/lib/python3.7/site-packages/dgl/nn/pytorch/conv/relgraphconv.py", line 185, in forward
    g.update_all(self.message_func, fn.sum(msg='msg', out='h'))
  File "/home/vamship/.conda/envs/dgl_sample/lib/python3.7/site-packages/dgl/graph.py", line 3238, in update_all
    Runtime.run(prog)
  File "/home/vamship/.conda/envs/dgl_sample/lib/python3.7/site-packages/dgl/runtime/runtime.py", line 11, in run
    exe.run()
  File "/home/vamship/.conda/envs/dgl_sample/lib/python3.7/site-packages/dgl/runtime/ir/executor.py", line 204, in run
    udf_ret = fn_data(src_data, edge_data, dst_data)
  File "/home/vamship/.conda/envs/dgl_sample/lib/python3.7/site-packages/dgl/runtime/scheduler.py", line 972, in _mfunc_wrapper
    return mfunc(ebatch)
  File "/home/vamship/.conda/envs/dgl_sample/lib/python3.7/site-packages/dgl/nn/pytorch/conv/relgraphconv.py", line 147, in bdd_message_func
    node = edges.src['h'].view(-1, 1, self.submat_in)
  File "/home/vamship/.conda/envs/dgl_sample/lib/python3.7/site-packages/dgl/utils.py", line 285, in __getitem__
    return self._fn(key)
  File "/home/vamship/.conda/envs/dgl_sample/lib/python3.7/site-packages/dgl/frame.py", line 655, in <lambda>
    return utils.LazyDict(lambda key: self._frame[key][rows], keys=self.keys())
  File "/home/vamship/.conda/envs/dgl_sample/lib/python3.7/site-packages/dgl/frame.py", line 97, in __getitem__
    return F.gather_row(self.data, user_idx)
  File "/home/vamship/.conda/envs/dgl_sample/lib/python3.7/site-packages/dgl/backend/pytorch/tensor.py", line 156, in gather_row
    return th.index_select(data, 0, row_index)
RuntimeError: [enforce fail at CPUAllocator.cpp:64] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 1088460000 bytes. Error code 12 (Cannot allocate memory)

lerachel9900 · August 13, 2020, 8:02pm

I just finished running it with a GPU. If you want, here is the output.

vamships · August 13, 2020, 8:14pm

Thanks for sharing the output! Could you share your GPU configuration?
Also, did you need to change anything in the script to run the evaluation bit?

lerachel9900 · August 13, 2020, 9:11pm

CPU: Intel Core i9-9920X CPU @ 3.50GHz
Sockets: 1
Cores per socket: 12
Threads per core: 2
Total max threads: 24
RAM: 128GB
GPU:
1 x NVIDIA RTX 2080 Ti
1 x NVIDIA Quadro RTX 8000

No, I just ran the script as mentioned earlier (omitting --filtered/--raw).

vamships · August 13, 2020, 9:19pm

Okay, thanks! That was helpful. Seems like RAM is a big factor for evaluation.
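One possible reading of the 1088460000-byte request in the traceback above, as a back-of-the-envelope sketch: the failing call gathers the source-node features for every edge, so its size would be edges x hidden size x bytes per value. The numbers below are assumptions not stated in this thread: 272,115 training triples in FB15k-237, one reverse edge added per triple, a hidden size of 500, and float32 values.

```python
def gathered_bytes(num_edges, hidden_dim, bytes_per_value=4):
    """Size in bytes of a (num_edges, hidden_dim) tensor of fixed-width values."""
    return num_edges * hidden_dim * bytes_per_value


train_triples = 272_115         # assumed: FB15k-237 training triples
num_edges = 2 * train_triples   # assumed: a reverse edge is added per triple
hidden_dim = 500                # assumed: hidden size used by the example
requested = gathered_bytes(num_edges, hidden_dim)

print(f"{requested} bytes = {requested / 2**30:.2f} GiB")
```

Under these assumptions the product matches the size in the error message. That is roughly 1 GiB for a single intermediate tensor on the CPU, on top of whatever the process already holds, which would be consistent with RAM being the limiting factor.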