PyTorch examples for “Run multi-processing training” in the “Large-Scale Training of Graph Neural Networks” tutorial

It seems that the tutorial “Large-Scale Training of Graph Neural Networks” runs only on MXNet. Is there a PyTorch version of “run_store_server.py”?

Moreover, I could not find the script `…/incubator-mxnet/tools/launch.py` in the repo, which is used by the “Run multi-processing training” section of https://github.com/dmlc/dgl/tree/master/examples/mxnet/sampling. There is also no PyTorch version of “Run multi-processing training”.

There are several examples in the DGL repo with multi-GPU support:

Hello @classicsong,

Do these examples work in multi-node settings?
Since they use DistributedDataParallel, only minor changes should be needed for process coordination, but I’m not sure about the data itself.
Do I need to do any further data processing?
For example, our testbed has 8 nodes, each with one GPU.
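One possible answer to the data question, assuming every node keeps a full copy of the graph (only feasible if it fits in each machine's memory) and only the training seed nodes are divided among the 8 processes, is to give each rank a disjoint, strided slice of the seed node IDs, similar in spirit to what PyTorch's `DistributedSampler` does (minus its padding). The helper `shard_seed_nodes` and the node IDs below are hypothetical, not a DGL API; in a real script `rank` and `world_size` would come from `torch.distributed.get_rank()` and `torch.distributed.get_world_size()` after the process group is initialized.

```python
from typing import List, Sequence


def shard_seed_nodes(seed_nodes: Sequence[int], rank: int, world_size: int) -> List[int]:
    """Hypothetical helper: return the training seed nodes assigned to `rank`.

    Assumes every process holds the full graph and only the seed nodes
    (the mini-batch roots) are split across processes.
    """
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} out of range for world_size {world_size}")
    # Strided split: rank r takes elements r, r + world_size, r + 2 * world_size, ...
    # The shards are disjoint and together cover every seed node exactly once.
    return list(seed_nodes[rank::world_size])


if __name__ == "__main__":
    world_size = 8  # 8 nodes, one GPU (one training process) each
    train_nids = list(range(20))  # hypothetical training node IDs
    for r in range(world_size):
        print(r, shard_seed_nodes(train_nids, r, world_size))
```

Shard sizes can differ by one (here ranks 0 to 3 get three seeds and ranks 4 to 7 get two), so the processes may run a different number of mini-batches per epoch; with DistributedDataParallel that mismatch has to be handled, for example by truncating every rank to the shortest shard.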

Distributed training in DGL is under development. We will release tools for distributed training in the 0.5 release.

@classicsong, what is the expected timeline for the 0.5 release?

The 0.5 release will be in early August.
