RuntimeError: CUDA out of memory

Hi everyone:

I'm following this tutorial and training an RGCN on a GPU: 5.3 Link Prediction — DGL 0.6.1 documentation

My graph is a batched one formed from 300 subgraphs, with the following total nodes and edges:

Graph(num_nodes={'ent': 31167},
      num_edges={('ent', 'rel_1', 'ent'): 29290, ('ent', 'rel_2', 'ent'): 142290, ('ent', 'rel_3', 'ent'): 20280})

When training the model on the full graph, I get this error:

model.to(device)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 852, in to
    return self._apply(convert)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 530, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 530, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 530, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 552, in _apply
    param_applied = fn(param)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 850, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA out of memory. Tried to allocate 3.62 GiB (GPU 0; 11.91 GiB total capacity; 7.60 GiB already allocated; 3.48 GiB free; 7.62 GiB reserved in total by PyTorch)

I think the problem is that the hidden dimension is around 30000 in this case, since it is the number of nodes.

When training the model on a batched graph of only 100 subgraphs instead of the full 300, I get a different error:

Epoch 1/20: 
Traceback (most recent call last):
  File "rgcn.py", line 232, in <module>
    loss.backward()
  File "/opt/conda/lib/python3.7/site-packages/torch/_tensor.py", line 255, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/opt/conda/lib/python3.7/site-packages/torch/autograd/__init__.py", line 149, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`

The limit appears to be 80 of these subgraphs; at or below that, I get no error.

Since the tutorial doesn't cover batching during training, how could I use it here? Could batching fix this? Is there any other way to fix it?

Since my graph is a batched graph, would it help with the memory issues to use these batches to train the model on every subgraph iteratively instead of on the big graph? If the hidden dimension must be the number of nodes, this could be a problem, since every subgraph has a different number of nodes.

The model parameters look like this:

model = Model(embeddings_dimensions, num_nodes, num_nodes, g.etypes)

where embeddings_dimensions is 700 and num_nodes is around 30000 for the full graph. Could this be the problem?
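As a back-of-the-envelope check of this suspicion (assuming the model allocates at least one dense num_nodes × num_nodes float32 weight matrix, which is my guess, not something stated in the tutorial):

```python
# Memory for a single dense num_nodes x num_nodes float32 matrix
# (4 bytes per entry), with num_nodes taken from my graph above.
num_nodes = 31167
gib = num_nodes * num_nodes * 4 / 2**30
print(round(gib, 2))  # 3.62
```

That is exactly the 3.62 GiB allocation the traceback reports failing, which makes me think a num_nodes-sized layer dimension really is the culprit.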

I installed the following CUDA build of DGL: pip install dgl-cu101 -f https://data.dgl.ai/wheels/repo.html

Could installing it from source help with this?

Thank you all.

This seems to be a follow-up of RGCN parameters: in_features, hidden_features, out_features - #8 by BarclayII. Let’s continue our discussion there?

