Single machine with many GPUs

I am a novice in the field of distributed graph training. I am not sure if I can use DGL to do this. I have a computer with 4 GPUs. I want to perform graph partitioning on the graph dataset using DGL, for example, with Metis. After obtaining the partitions, I want each GPU to load one partition, with each GPU retaining a complete model and training simultaneously. During this process, there might be a need to fetch neighbor node features across GPUs. After each epoch, gradients should be synchronized to achieve the goal of single-machine multi-GPU training.

I’d be grateful if someone could provide source code for the above scenario.

Have you tried this example? It is not quite what you described, but it can train on multiple GPUs without replicating either the features or the graph.

In fact, this example divides the graph dataset; it does not apply a graph partitioning algorithm. I have to use one, because my current research area is optimising the efficiency of distributed graph training with graph partitioning algorithms.

I checked and found that the distributed sampler here divides the dataset based on the number of GPUs, but that is not graph partitioning.

I’m not sure whether I’m misunderstanding, because the official documentation doesn’t cover this scenario.

The distributed sampler partitions the training set across the GPUs, but there is no graph partitioning or feature partitioning performed. All GPUs access the same graph and features in memory when you use the “pinned-cuda” mode. For “cuda-cuda” mode, the graph and features are replicated across GPUs.
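To make the distinction concrete, here is a minimal pure-Python sketch of what "partitioning the training set" means. The helper `shard_training_ids` is hypothetical (it is not a DGL function), and it ignores the shuffling and padding that a real distributed sampler would apply. It only shows that each GPU receives a different subset of seed node IDs, while the graph itself is left untouched.

```python
# Pure-Python illustration; no DGL or PyTorch needed.
# shard_training_ids is a hypothetical helper, not part of any library.

def shard_training_ids(train_ids, rank, world_size):
    """Return the training node IDs handled by one GPU (rank)."""
    # Strided split: rank r takes every world_size-th ID starting at r.
    return train_ids[rank::world_size]


train_ids = list(range(10))  # IDs of the training (seed) nodes
world_size = 4               # one process per GPU

shards = [shard_training_ids(train_ids, r, world_size) for r in range(world_size)]
for rank, shard in enumerate(shards):
    print(f"GPU {rank} trains on seed nodes {shard}")
```

Every rank still samples neighbors from the same full graph, so the graph structure plays no role in how the seeds are split. A graph partitioner such as METIS would instead assign nodes to parts based on connectivity.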

So could you please tell me whether the scenario in my question can be done with DGL? I don’t quite understand, because the example in the official docs is multi-machine distributed training, with one partition loaded per machine. I want to do this on a single machine with multiple GPUs, with each GPU loading one graph partition. Thank you.

DistDGL has METIS partitioning capabilities. You might want to look into the distributed examples. I am sure you can run them even on a single machine.
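As a rough starting point, the partitioning step might look like the sketch below. It assumes a DGL build that ships `dgl.distributed`, and it uses `partition_graph` with `part_method="metis"` and `load_partition` from that module. The wrapper names (`partition_config_path`, `partition_with_metis`, `load_one_partition`) and the default arguments (`"mygraph"`, `"parts"`, 4 parts) are made up for illustration. This is not the official single-machine recipe, only a sketch of the two calls involved.

```python
import os


def partition_config_path(out_path, graph_name):
    # partition_graph writes its metadata to <out_path>/<graph_name>.json
    return os.path.join(out_path, graph_name + ".json")


def partition_with_metis(g, graph_name="mygraph", num_parts=4, out_path="parts"):
    """Partition a DGLGraph with METIS and return the partition config path."""
    # Imported lazily so the path helper above works without DGL installed.
    from dgl.distributed import partition_graph

    partition_graph(g, graph_name, num_parts, out_path, part_method="metis")
    return partition_config_path(out_path, graph_name)


def load_one_partition(part_config, part_id):
    """Load the local graph and node features of a single partition."""
    from dgl.distributed import load_partition

    parts = load_partition(part_config, part_id)
    # The first two entries are the partition's graph and its node features.
    local_graph, node_feats = parts[0], parts[1]
    return local_graph, node_feats
```

In a 4-GPU setup, each process would call `load_one_partition(config, rank)` with its own rank. Moving the data to the GPU and fetching features for neighbors that live in another partition are not handled here; those are the parts the distributed examples cover.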

But I think the GraphBolt example above will be much more efficient than the distributed examples.
