Hi, I have a dgl.batched_graph.BatchedDGLGraph with 130,000 nodes and 16 million edges. Each node has a 100-dimensional feature, and I only use built-in message-passing functions. Here is the problem I hit during backpropagation:
```
  File "/root/try/bishe/parser/cmds/cmd.py", line 92, in train
    loss.backward()
  File "/root/Anacondas/anaconda3/lib/python3.7/site-packages/torch/tensor.py", line 195, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/root/Anacondas/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py", line 99, in backward
    allow_unreachable=True)  # allow_unreachable flag
  File "/root/Anacondas/anaconda3/lib/python3.7/site-packages/torch/autograd/function.py", line 77, in apply
    return self._forward_cls.backward(self, *args)
  File "/root/Anacondas/anaconda3/lib/python3.7/site-packages/dgl/backend/pytorch/tensor.py", line 355, in backward
    grad_rhs = grad_out.new_empty((rhs_data_nd.shape[0],) + feat_shape)
RuntimeError: CUDA out of memory. Tried to allocate 6.27 GiB (GPU 0; 7.93 GiB total capacity; 2.40 GiB already allocated; 3.62 GiB free; 2.60 GiB reserved in total by PyTorch)
```
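For what it's worth, the size of the failed allocation looks consistent with a gradient buffer materialized per edge: the traceback shows `grad_rhs` being allocated with one row per entry of `rhs_data_nd`, and if that data is per-edge here (an assumption on my part), a rough back-of-the-envelope check with the numbers from this post gives the same order of magnitude:

```python
# Rough size estimate of the gradient buffer allocated in backward,
# assuming one row per edge and one column per feature dimension.
# Numbers are taken from the post; the exact edge count is approximate.
num_edges = 16_000_000       # "16 million edges"
feat_dim = 100               # 100-dimensional node features
bytes_per_elem = 4           # float32

buffer_bytes = num_edges * feat_dim * bytes_per_elem
buffer_gib = buffer_bytes / 2**30
print(f"{buffer_gib:.2f} GiB")  # prints "5.96 GiB", same order as the 6.27 GiB allocation
```

So a single backward buffer at this graph size already approaches the 7.93 GiB capacity of the card, on top of what is allocated in the forward pass.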
I wonder how to solve this problem. Any suggestions or tips would be appreciated.