When I train on a graph with about ten billion edges, I get the error below.
Environment:
CentOS 7
Python 3.6.8
PyTorch 1.9.0
DGL 0.7.0

Error output (partial traceback):
    loss.backward()
  File "/usr/local/lib64/python3.6/site-packages/torch/_tensor.py", line 255, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/usr/local/lib64/python3.6/site-packages/torch/autograd/__init__.py", line 149, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: [/sources/pytorch/third_party/gloo/gloo/transport/tcp/unbound_buffer.cc:136] Timed out waiting 1800000ms for send operation to complete