GPU memory usage differs between GraphDataLoader in DGL and DataLoader in PyTorch

dh97 · January 18, 2024, 6:45am

Hi DGL team, I am using DGL for graph classification. The training data is very large, so I decided to use the GraphDataLoader class with DDP by setting use_ddp=True in GraphDataLoader. With this setting, training time decreases but GPU memory usage increases. The example code, which uses the DataLoader class, fits a batch size of 4 per GPU, while the new code fits only a batch size of 1 per GPU. What causes this?
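For context, here is a minimal plain-Python sketch of how a DDP-style sampler splits a dataset across ranks. It is modeled on the unshuffled index partitioning of torch.utils.data.distributed.DistributedSampler and is not DGL's code; the helper names `shard_indices` and `batches` are hypothetical. It suggests that sharding changes which samples each GPU sees, not how many graphs go into one batch, so the per-GPU batch size is still whatever you pass as batch_size.

```python
import math


def shard_indices(dataset_size, world_size, rank):
    """Indices one rank would iterate over (no shuffling, non-empty dataset)."""
    indices = list(range(dataset_size))
    per_rank = math.ceil(dataset_size / world_size)
    total = per_rank * world_size
    # Pad by wrapping around so every rank gets the same number of samples.
    indices = (indices * math.ceil(total / dataset_size))[:total]
    # Each rank takes every world_size-th index, starting at its own rank.
    return indices[rank:total:world_size]


def batches(indices, batch_size):
    """Split one rank's indices into consecutive mini-batches."""
    return [indices[i:i + batch_size] for i in range(0, len(indices), batch_size)]


if __name__ == "__main__":
    for rank in range(4):
        shard = shard_indices(dataset_size=10, world_size=4, rank=rank)
        print(rank, shard, batches(shard, batch_size=2))
```

With 10 samples on 4 ranks, each rank gets 3 indices (two of them are wrap-around duplicates), and each rank still builds batches of at most batch_size samples.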

dh97 · January 19, 2024, 8:09am

Has anyone else run into a similar problem?

peizhou001 · January 25, 2024, 2:57am

To speed things up, DGL combines the graphs in each batch into a single batched graph, which is a deep copy of the originals. My guess is that the extra memory comes from there.
Also, do you see a memory leak (usage that keeps growing), or just higher memory usage?
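To illustrate what "batching into one graph as a copy" means, here is a rough plain-Python sketch. It is not DGL's implementation: the dict-based graph layout and the `batch_graphs` helper are assumptions made up for this example. The batched graph is a disjoint union, so node IDs of later graphs are shifted by an offset, and the features are copied into new storage, which means the batch costs roughly as much memory again as the individual graphs it was built from.

```python
def batch_graphs(graphs):
    """Merge small graphs into one disjoint-union graph, copying all data.

    Each graph is a dict with 'num_nodes', 'edges' (list of (src, dst) pairs)
    and 'feat' (one feature list per node). This layout is hypothetical.
    """
    edges, feats, offset = [], [], 0
    for g in graphs:
        # Shift node IDs so graphs do not collide inside the batch.
        edges.extend((src + offset, dst + offset) for src, dst in g["edges"])
        # Copy every feature row: the batch does not share storage with g.
        feats.extend(list(row) for row in g["feat"])
        offset += g["num_nodes"]
    return {"num_nodes": offset, "edges": edges, "feat": feats}


g1 = {"num_nodes": 2, "edges": [(0, 1)], "feat": [[1.0], [2.0]]}
g2 = {"num_nodes": 3, "edges": [(0, 2), (1, 2)], "feat": [[3.0], [4.0], [5.0]]}
batched = batch_graphs([g1, g2])

# Changing the batched copy leaves the original graph untouched.
batched["feat"][0][0] = 99.0
```

In this sketch the batch holds 5 nodes and 3 edges, and g1 keeps its original features after the batched copy is modified.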

dh97 · January 25, 2024, 9:33am

I think it depends on the PyTorch and DGL versions. I was using pytorch==1.10.1 with dgl==0.8.1; now that I use pytorch==1.7.1 with dgl==0.6.1, each GPU can fit a batch size of 4. I do not know the reason.
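When comparing environments like this, it can help to record the installed versions next to each memory measurement. Below is a small sketch using only the standard library (Python 3.8+); the helper name `installed_versions` is hypothetical, and the distribution names are an assumption, since CUDA builds of DGL may be installed under a different name than "dgl".

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # "torch" and "dgl" are assumed distribution names; adjust for your install.
    print(installed_versions(["torch", "dgl"]))
```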
