Hi,
I have been trying out some of the R-GCN examples and ran into an issue with link prediction. Specifically, I followed the example scripts here and was able to complete all of the entity classification examples successfully. However, when running the link prediction example on the FB15k-237 dataset, I got the following error during the evaluation phase:
RuntimeError: [enforce fail at CPUAllocator.cpp:64] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 1088460000 bytes. Error code 12 (Cannot allocate memory)
I am running on a GPU machine with CUDA 10.1. Any help resolving this issue would be greatly appreciated. A similar issue was reported on the forum a couple of months ago.
Below is the complete output log from the run:
Namespace(dataset='FB15k-237', dropout=0.2, edge_sampler='uniform', eval_batch_size=500, evaluate_every=500, gpu=0, grad_norm=1.0, graph_batch_size=30000, graph_split_size=0.5, lr=0.01, n_bases=100, n_epochs=6000, n_hidden=500, n_layers=2, negative_sample=10, regularization=0.01)
# entities: 14541
# relations: 237
# edges: 272115
Test graph:
start training...
Epoch 0100 | Loss 0.2027 | Best MRR 0.0000 | Forward 0.2048s | Backward 0.4030s
Epoch 0200 | Loss 0.1262 | Best MRR 0.0000 | Forward 0.2030s | Backward 0.4027s
Epoch 0300 | Loss 0.1040 | Best MRR 0.0000 | Forward 0.2026s | Backward 0.4020s
Epoch 0400 | Loss 0.0927 | Best MRR 0.0000 | Forward 0.2038s | Backward 0.4028s
Epoch 0500 | Loss 0.0859 | Best MRR 0.0000 | Forward 0.2037s | Backward 0.4041s
start eval
Traceback (most recent call last):
File "link_predict.py", line 348, in <module>
main(args)
File "link_predict.py", line 239, in main
embed = model(test_graph, test_node_id, test_rel, test_norm)
File "/home/vamship/.conda/envs/sample-graph-analysis/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "link_predict.py", line 92, in forward
return self.rgcn.forward(g, h, r, norm)
File "/home/vamship/sample-graph-analysis/sample_graph_analysis/rgcn_sample/model.py", line 57, in forward
h = layer(g, h, r, norm)
File "/home/vamship/.conda/envs/sample-graph-analysis/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/vamship/.conda/envs/sample-graph-analysis/lib/python3.7/site-packages/dgl/nn/pytorch/conv/relgraphconv.py", line 180, in forward
g.update_all(self.message_func, fn.sum(msg='msg', out='h'))
File "/home/vamship/.conda/envs/sample-graph-analysis/lib/python3.7/site-packages/dgl/graph.py", line 2747, in update_all
Runtime.run(prog)
File "/home/vamship/.conda/envs/sample-graph-analysis/lib/python3.7/site-packages/dgl/runtime/runtime.py", line 11, in run
exe.run()
File "/home/vamship/.conda/envs/sample-graph-analysis/lib/python3.7/site-packages/dgl/runtime/ir/executor.py", line 204, in run
udf_ret = fn_data(src_data, edge_data, dst_data)
File "/home/vamship/.conda/envs/sample-graph-analysis/lib/python3.7/site-packages/dgl/runtime/scheduler.py", line 949, in _mfunc_wrapper
return mfunc(ebatch)
File "/home/vamship/.conda/envs/sample-graph-analysis/lib/python3.7/site-packages/dgl/nn/pytorch/conv/relgraphconv.py", line 144, in bdd_message_func
node = edges.src['h'].view(-1, 1, self.submat_in)
File "/home/vamship/.conda/envs/sample-graph-analysis/lib/python3.7/site-packages/dgl/utils.py", line 285, in __getitem__
return self._fn(key)
File "/home/vamship/.conda/envs/sample-graph-analysis/lib/python3.7/site-packages/dgl/frame.py", line 655, in <lambda>
return utils.LazyDict(lambda key: self._frame[key][rows], keys=self.keys())
File "/home/vamship/.conda/envs/sample-graph-analysis/lib/python3.7/site-packages/dgl/frame.py", line 97, in __getitem__
return F.gather_row(self.data, user_idx)
File "/home/vamship/.conda/envs/sample-graph-analysis/lib/python3.7/site-packages/dgl/backend/pytorch/tensor.py", line 152, in gather_row
return th.index_select(data, 0, row_index)
RuntimeError: [enforce fail at CPUAllocator.cpp:64] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 1088460000 bytes. Error code 12 (Cannot allocate memory)
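As a rough sanity check, the size in the error message appears to match exactly what gathering one 500-dimensional float32 feature row per edge of the test graph would need (the `edges.src['h']` / `index_select` step in the traceback). This relies on my own assumptions, not anything I confirmed in the example code: that the test graph contains each of the 272,115 triples plus a reverse edge, and that the hidden features are float32. The log's "Test graph:" line doesn't print the edge count, so the doubling is a guess.

```python
# Estimate of the tensor index_select tries to allocate in bdd_message_func:
# one row of source-node features for every edge in the test graph.
# ASSUMPTIONS: the test graph holds every triple plus its reverse edge,
# and node features are float32 with n_hidden = 500.
NUM_TRIPLES = 272115           # "# edges" from the log
NUM_EDGES = 2 * NUM_TRIPLES    # assumed: forward + reverse edges
N_HIDDEN = 500                 # n_hidden from the Namespace line
BYTES_PER_FLOAT32 = 4

requested = NUM_EDGES * N_HIDDEN * BYTES_PER_FLOAT32
print(requested)                        # 1088460000
print(f"{requested / 2**30:.2f} GiB")   # 1.01 GiB
```

If that reading is right, the failing allocation is the per-edge copy of source features during full-graph message passing. Since the traceback shows the CPU allocator, the evaluation step would need over 1 GB of free host RAM for this one tensor alone.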