Run distributed training on GPU

I’m currently working on distributed training on the OGB-Products dataset with 4 GPUs, following the tutorial here: Distributed Node Classification — DGL 0.7.2 documentation.

However, I just found that the graph and the model are on the CPU. When I tried to move them to the GPU using .to("cuda"), I got an error saying that DistGraph doesn’t support GPU. Is there a good way to do distributed graph training on GPU?

Just fixed the bug. Basically, you need to follow the code here: dgl/examples/pytorch/graphsage/experimental at 4889c5782290f1990c924fbea14ba904a3248231 · dmlc/dgl · GitHub.
You also need to change the code here (dgl/ at 4889c5782290f1990c924fbea14ba904a3248231 · dmlc/dgl · GitHub) so that the input features are moved to the device:

batch_inputs = blocks[0].srcdata['features'].to(device)
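
For anyone who wants to see the idea behind that one-line change in isolation, here is a minimal sketch. It uses plain PyTorch rather than DGL, so all_features, all_labels, input_nodes, and the linear model are hypothetical stand-ins (they are not from the linked example). It only illustrates that the full feature tensor stays on the CPU and the mini-batch slice is what gets copied to the device.

```python
import torch

# Fall back to CPU so the sketch also runs on a machine without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-ins for the full node data, which stays in host (CPU) memory.
num_nodes, feat_dim, num_classes = 1000, 16, 5
all_features = torch.randn(num_nodes, feat_dim)
all_labels = torch.randint(0, num_classes, (num_nodes,))

# Hypothetical stand-in for the node IDs a sampler would return for one mini-batch.
input_nodes = torch.randperm(num_nodes)[:64]

# The model lives on the training device.
model = torch.nn.Linear(feat_dim, num_classes).to(device)

# Only the mini-batch slice is copied to the device, not the whole tensor.
batch_inputs = all_features[input_nodes].to(device)
batch_labels = all_labels[input_nodes].to(device)

logits = model(batch_inputs)
loss = torch.nn.functional.cross_entropy(logits, batch_labels)
loss.backward()
```

In the real training loop the slice comes from blocks[0].srcdata['features'] as in the line above, and the sampled blocks themselves would also have to be on the same device as the model.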

@ruisizhang123 this is not a bug.

DistDGL v1 was designed for the case where the whole graph cannot fit into GPU memory. It loads only the sampled subgraph and the corresponding node/edge features onto the GPU; the full node embeddings and graph structure are stored on the CPU and updated asynchronously.
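
Since only the sampled mini-batches go to the GPU, each trainer process needs to decide which GPU it sends them to. A rough sketch of one common convention follows, assuming each trainer knows its integer rank and the number of GPUs on its machine; trainer_device is a hypothetical helper, not a DGL function.

```python
def trainer_device(rank: int, num_gpus: int) -> str:
    """Map a trainer rank to a device string, round-robin over the local GPUs."""
    if num_gpus <= 0:
        # No GPUs requested: train on the CPU.
        return "cpu"
    return f"cuda:{rank % num_gpus}"


# With 4 GPUs per machine, ranks 0..7 cycle through cuda:0 .. cuda:3 twice.
devices = [trainer_device(rank, 4) for rank in range(8)]
```

The resulting string can be passed to torch.device(...) and used as the target of the .to(device) calls in the training loop.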

If you are working in a single-machine, multi-GPU setting, you are supposed to follow this tutorial, where you don’t need to use DistGraph.

Thanks for your reply! I’m working on distributed training. The example code in the tutorial didn’t move the subgraph to CUDA, and I had a hard time fixing the problem. I think the distributed training tutorial code here, Distributed Node Classification — DGL 0.7.2 documentation, might be a little confusing.

Please follow the example at dgl/ at master · dmlc/dgl · GitHub