Hi!
I’m a beginner with Graph Neural Networks. After following the instructions in the tutorial, I became curious about full-graph training on distributed platforms. All the tutorials use a graph sampler to reduce memory consumption. However, if I have abundant device memory, is there any way to put the whole graph on GPUs to reduce communication overhead? (For instance, partition the graph and load each partition onto a GPU.)
This seems to come down to the difference between full-batch training and mini-batch training. All the sample code I can find uses mini-batch training. I’m wondering whether DGL supports full-batch training on distributed platforms. If so, is there any sample code or API reference available?
I have tried to write some toy code, but it doesn’t seem to be real full-batch training; it just partitions the graph and trains each partition separately.
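To make clear what I mean by “full-batch”: every node participates in each forward pass, with no sampling. Here is a minimal single-machine sketch in plain NumPy (illustrative only, not DGL code; the function name and the toy graph are just made up for this example):

```python
# Minimal sketch of one full-batch GCN layer: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W).
# All nodes are propagated at once, which is what I mean by full-batch,
# in contrast to sampling a subgraph per step.
import numpy as np

def full_batch_gcn_layer(adj, feats, weight):
    """Apply one GCN layer to the entire graph at once."""
    n = adj.shape[0]
    a_hat = adj + np.eye(n)                    # add self-loops
    deg = a_hat.sum(axis=1)
    d_inv_sqrt = np.diag(deg ** -0.5)
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt   # symmetric normalization
    return np.maximum(a_norm @ feats @ weight, 0.0)  # ReLU

# Toy 4-node path graph, 3-dim input features, 2-dim output.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
feats = rng.standard_normal((4, 3))
weight = rng.standard_normal((3, 2))
out = full_batch_gcn_layer(adj, feats, weight)
print(out.shape)  # (4, 2): one embedding per node, all updated together
```

My question is essentially whether this kind of whole-graph forward/backward pass can be done with the graph split across multiple GPUs, with gradients synchronized across partitions, rather than training each partition independently.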
Thank you very much for taking the time to read my question.