Full batch training on multi-GPU for node classification

Hi!

I’m a beginner in graph neural networks. After following the instructions in the tutorials, I became curious about full-graph training on distributed platforms. All of the tutorials use a graph sampler to reduce memory consumption. However, if I have plenty of device memory, is there any way to put the whole graph on the GPUs to reduce the communication overhead? (For instance, partition the graph and load each partition onto a GPU.)

It comes down to the difference between full-batch training and mini-batch training. All of the current sample code uses mini-batch training. I’m wondering whether full-batch training on distributed platforms is supported in DGL. If so, are there any sample code or API references available?

I have tried to write some toy code, but it doesn’t seem to be full-batch training; it just partitions the graph and trains each partition separately, as in the sketch below.
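In outline, the toy code does something like this (a simplified sketch, not my exact script; the `feat`/`label` keys, the hidden size, and the partition call are just assumptions about how the data is set up):

```python
import torch
import torch.nn.functional as F
import dgl
from dgl.nn import GraphConv

class GCN(torch.nn.Module):
    def __init__(self, in_dim, hid_dim, num_classes):
        super().__init__()
        self.conv1 = GraphConv(in_dim, hid_dim)
        self.conv2 = GraphConv(hid_dim, num_classes)

    def forward(self, g, x):
        return self.conv2(g, F.relu(self.conv1(g, x)))

def train_partition(rank, g, num_parts, num_epochs=50):
    # Assign every node to a partition with METIS, then take the induced
    # subgraph for this rank. node_subgraph drops the edges that cross
    # partitions, which is exactly why this is not real full-batch training:
    # each GPU ends up training on its own small graph with no communication.
    num_classes = int(g.ndata["label"].max()) + 1
    assignment = dgl.metis_partition_assignment(g, num_parts)
    local_nodes = (assignment == rank).nonzero(as_tuple=True)[0]
    sub = dgl.node_subgraph(g, local_nodes)
    sub = dgl.add_self_loop(sub)                 # avoid 0-in-degree errors in GraphConv
    sub = sub.to(f"cuda:{rank}")

    feats = sub.ndata["feat"]
    labels = sub.ndata["label"]
    model = GCN(feats.shape[1], 128, num_classes).to(f"cuda:{rank}")
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)

    for _ in range(num_epochs):
        logits = model(sub, feats)               # forward over the local partition only
        loss = F.cross_entropy(logits, labels)
        opt.zero_grad()
        loss.backward()                          # no gradient sync across GPUs either
        opt.step()
```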

Thank you very much for taking the time to read my question.

As far as I know, there is no distributed full-graph training solution in DGL.

If anybody else has any ideas, please let us know. Thanks!


Thanks for your reply!

I’ll try to find another way.

Actually, there is, although it is a little different from the official guide. Take a look at the paper BNS-GCN from MLSys 2022; the code for the paper has already been open-sourced on GitHub. I think that is what you want.
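Very roughly, the idea (as I understand it; this is just my own sketch with made-up helper names, not the repo’s actual code) is partition-parallel full-graph training: each GPU keeps one partition resident in device memory, every local node takes part in every forward pass, only a sampled subset of boundary-node features is exchanged between partitions each epoch, and gradients are averaged so the model replicas stay in sync:

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F

def sample_boundary(boundary_ids, keep_prob=0.1):
    # BNS-style trick: only a random fraction of this partition's boundary
    # nodes is communicated in a given epoch.
    mask = torch.rand(boundary_ids.numel(), device=boundary_ids.device) < keep_prob
    return boundary_ids[mask]

def exchange_boundary_feats(feats, sampled_ids, world_size):
    # Share the sampled boundary features with every other partition.
    # all_gather_object tolerates a different number of sampled nodes per
    # rank; a real implementation would use padded all_gather / all_to_all.
    gathered = [None] * world_size
    dist.all_gather_object(gathered, feats[sampled_ids].cpu())
    return [t.to(feats.device) for t in gathered]

def train_epoch(model, opt, local_g, boundary_ids, world_size):
    feats = local_g.ndata["feat"]
    labels = local_g.ndata["label"]

    sampled = sample_boundary(boundary_ids)
    remote_feats = exchange_boundary_feats(feats, sampled, world_size)
    # A real implementation stitches `remote_feats` into each layer's input so
    # that cross-partition edges are used; this sketch only shows the shape of
    # the loop and runs the model on the local partition.
    logits = model(local_g, feats)
    loss = F.cross_entropy(logits, labels)

    opt.zero_grad()
    loss.backward()
    # Average gradients across partitions so every replica takes the same step.
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size
    opt.step()
    return loss.item()
```

The repo of course does all of this properly; the sketch is only meant to show where the boundary sampling and the gradient synchronization fit in.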

This is exactly what I was looking for!

The authors’ code style is very clean and informative. The code ran perfectly, and I learned a lot from it.

Thank you so much for your reply!
