DGL with multiple GPU devices on the PyTorch backend

Hi, I'm new to DGL.

I need to use PyTorch CNN layers and DGL graph layers in the same model, and I'm having trouble parallelizing it. I tried both DataParallel and DistributedDataParallel.

class MyModel(nn.Module): 
    def __init__(...):
        ....
        self.gconv=MPNNGNN(Nf_NODE, Nf_EDGE,node_out_feats=N_NODE_OUT)
        ....
    def forward(self,x,bg):
        ###bg is graph
        bg_n=bg.ndata['x']
        bg_e=torch.nn.functional.one_hot(bg.edata['x'].long()).float()
        frag_out=self.gconv(bg,bg_n,bg_e)
        ......
        return out

I want each GPU to run the forward pass above independently, and every GPU must see the same graph. In other words, I don't want the graph to be split during parallelization.

So I wrote my code as below:

model = MyModel().cuda(device_ids[0])
model=torch.nn.parallel.DistributedDataParallel(model,find_unused_parameters=True,device_ids=device_ids)
.....
for epoch in range(start_epoch, end_epoch):
    for data,a_dx in training_generator:
        data=data.to(device=device,non_blocking=True) #" Not for graph, nevermind"
        a_dx=a_dx.to(device=device,non_blocking=True)#"Not for graph, never mind"
        #"bg is graph"
        bg=Glib.return_graph() #"call already prepared graph"
        bg=bg.to(device=device,non_blocking=True) 
        energy = model(coord,bg)

However, I got an error message:

    energy = model(coord,bg)
  File "/home/dngusdnr1/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/dngusdnr1/anaconda3/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 799, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/dngusdnr1/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "ddp_test.py", line 522, in forward
    frag_out=self.gconv(bg,bg_n,bg_e)
  File "/home/dngusdnr1/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/dngusdnr1/hmap/develop/se3/ddp/prac_graph.py", line 74, in forward
    node_feats = self.project_node_feats(node_feats) # (V, node_out_feats)
  File "/home/dngusdnr1/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/dngusdnr1/anaconda3/lib/python3.8/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/dngusdnr1/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/dngusdnr1/anaconda3/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 96, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/dngusdnr1/anaconda3/lib/python3.8/site-packages/torch/nn/functional.py", line 1847, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:3 and cuda:0! (when checking arugment for argument mat1 in method wrapper_addmm)

I get the same error message when I use nn.DataParallel: "RuntimeError: Expected all tensors to be on the same device …"

What should I do…?

It is possible that bg, coord, or your model is on a different device from the others. Could you check?
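
One way to check, as a minimal sketch: print the device of everything that goes into the forward pass right before the failing call. The helpers `collect_devices` and `assert_same_device` below are hypothetical names written for this post, not part of PyTorch or DGL. They rely only on duck typing: a module is anything with `.parameters()`, and a tensor or DGL graph is anything with a `.device` attribute.

```python
def collect_devices(**named_objects):
    """Map each name to the sorted list of devices its data lives on.

    Objects exposing .parameters() (e.g. torch.nn.Module) are scanned
    parameter by parameter, plus buffers if available. Anything else is
    expected to expose a .device attribute (torch.Tensor and DGLGraph do).
    """
    report = {}
    for name, obj in named_objects.items():
        if callable(getattr(obj, "parameters", None)):
            devices = {str(p.device) for p in obj.parameters()}
            if callable(getattr(obj, "buffers", None)):
                devices |= {str(b.device) for b in obj.buffers()}
        else:
            devices = {str(obj.device)}
        report[name] = sorted(devices)
    return report


def assert_same_device(**named_objects):
    """Raise a RuntimeError listing every device if more than one is in use."""
    report = collect_devices(**named_objects)
    all_devices = {d for devs in report.values() for d in devs}
    if len(all_devices) > 1:
        raise RuntimeError(f"device mismatch: {report}")
    return report
```

Calling `assert_same_device(model=model, coord=coord, bg=bg)` just before `energy = model(coord, bg)` should show which of the three is on the unexpected device. Features stored on the graph can be checked the same way, for example by passing `bg.ndata['x']`.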

The general pipeline for using DistributedDataParallel should follow this tutorial for graph classification and this tutorial for node classification. DGL currently does not support DataParallel.
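
For orientation, here is a rough sketch of the one-process-per-GPU pattern that DistributedDataParallel is designed around. It is not taken from the tutorials and makes several assumptions: `torch.nn.Linear` stands in for `MyModel`, a random tensor stands in for `coord`, the rendezvous address and port are placeholders, and the helper names (`device_for_rank`, `ddp_device_ids`, `worker`, `launch`) are made up for illustration. The point is that each process owns a single device, passes a single-element `device_ids`, and moves its own inputs (including the graph, via `bg.to(device)`) to that device.

```python
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel


def device_for_rank(rank, use_cuda):
    """Each process drives exactly one device: GPU number `rank`, or the CPU."""
    return torch.device(f"cuda:{rank}") if use_cuda else torch.device("cpu")


def ddp_device_ids(device):
    """Single-element device_ids for a GPU module, None for a CPU module."""
    return [device.index] if device.type == "cuda" else None


def worker(rank, world_size, use_cuda):
    # Placeholder rendezvous settings for a single machine (assumption).
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    backend = "nccl" if use_cuda else "gloo"
    dist.init_process_group(backend, rank=rank, world_size=world_size)

    device = device_for_rank(rank, use_cuda)
    if use_cuda:
        torch.cuda.set_device(device)

    # Stand-in for MyModel; the real model is moved to this process's device.
    model = torch.nn.Linear(4, 1).to(device)
    model = DistributedDataParallel(model, device_ids=ddp_device_ids(device))

    # Stand-in for coord. In the real loop the graph is moved the same way
    # with bg.to(device), so every process holds its own full copy.
    x = torch.randn(8, 4, device=device)
    model(x).sum().backward()  # gradients are averaged across processes here

    dist.destroy_process_group()


def launch():
    """Start one worker per GPU, or two CPU workers if no GPU is available."""
    use_cuda = torch.cuda.is_available()
    world_size = torch.cuda.device_count() if use_cuda else 2
    mp.spawn(worker, args=(world_size, use_cuda), nprocs=world_size)
```

To try it, call `launch()` from under an `if __name__ == "__main__":` guard in a script. Since every process loads the whole graph itself, the graph is never split, which matches the requirement in the question.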

Hi, I was wondering whether DGL supports the following scenario (I'm a beginner, so the question may be a bit vague).
Input: a graph that is stored across machines.
I found that there is a way to do distributed graph partitioning using METIS.
Once I have these partitions across machines, can I enable multi-machine, multi-GPU training?
(I have seen DGL tutorials that cover single-machine multi-GPU training, and another on distributed node classification, which uses multiple machines but no GPUs. So I was wondering whether I could extend this to multi-machine, multi-GPU graph training as well.)

You can look at the example here: https://github.com/dmlc/dgl/tree/master/examples/pytorch/graphsage/experimental.
