Hi, DGL team. I'm using DGL for graph classification. The training data is too large, so I decided to use the GraphDataLoader class in a DDP setting, i.e., passing use_ddp=True to GraphDataLoader. With this setting the training time goes down, but GPU memory usage goes up: compared to the example code that uses the plain DataLoader class, the original code can fit a batch size of 4 per GPU, while the new code can only fit a batch size of 1 per GPU. I want to know why. What causes it?
Has anyone run into a similar issue?
To increase speed, DGL batches the input graphs into a single batched graph, which is a deep copy. So my guess is that the memory increase comes from there.
And are you seeing a memory leak, or just higher memory usage?
I think it depends on the PyTorch and DGL versions: with pytorch==1.10.1 and dgl==0.8.1 I see the problem, but with pytorch==1.7.1 and dgl==0.6.1 each GPU can fit a batch size of 4. I don't know the reason.
This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.