Distributed data loader

jwyao · August 5, 2021, 10:29pm

Based on the latest implementation of NodeDataLoader and EdgeDataLoader, it seems that both classes already support DistGraph sampling, including node sampling, edge sampling, and negative sampling. Why do we have a separate class DistDataLoader? Is there anything special about DistDataLoader?

VoVAllen · August 6, 2021, 6:15am

DistDataLoader is DGL’s replacement for PyTorch’s DataLoader. Currently, DGL requires all clients, including the sampler subprocesses, to join the cluster together at startup, so the logic differs from PyTorch’s DataLoader. That is why we implemented our own DistDataLoader.
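
To illustrate the constraint of joining at startup, here is a minimal sketch of a loader that reuses a worker pool created once when the program starts, rather than starting workers each time it is iterated (as PyTorch's DataLoader does by default). This is not DGL's code: `UpFrontPoolLoader` is a hypothetical name, and threads stand in for the sampler subprocesses to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor


class UpFrontPoolLoader:
    """Hypothetical loader that reuses a worker pool created at startup."""

    def __init__(self, dataset, batch_size, pool, collate_fn=list):
        self.dataset = list(dataset)
        self.batch_size = batch_size
        self.pool = pool              # created once, before any loader exists
        self.collate_fn = collate_fn  # assumption: builds a batch from a list of items

    def __iter__(self):
        # Split the dataset into consecutive batches.
        batches = [
            self.dataset[i:i + self.batch_size]
            for i in range(0, len(self.dataset), self.batch_size)
        ]
        # Hand the batches to the already-running pool; no workers start here.
        # Executor.map yields results in submission order.
        return iter(self.pool.map(self.collate_fn, batches))


# Workers are created once at program start (threads stand in for samplers).
pool = ThreadPoolExecutor(max_workers=2)
loader = UpFrontPoolLoader(range(10), batch_size=4, pool=pool)

epoch1 = list(loader)
epoch2 = list(loader)  # the same pool serves the second epoch
pool.shutdown()
```

Because the pool outlives every iterator, the set of workers stays fixed for the whole job, which is the property a cluster with fixed membership needs.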

jwyao · August 6, 2021, 4:26pm

Thanks for the clarification. After reading the master branch, it seems that both NodeDataLoader and EdgeDataLoader already use DistDataLoader in their implementations.
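
The delegation pattern described here can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual DGL source: `DistGraphStub`, `LocalGraphStub`, `DistBackendStub`, `LocalBackendStub`, and `NodeLoaderSketch` are all hypothetical stand-ins for the real classes.

```python
class LocalGraphStub:
    """Hypothetical stand-in for an ordinary in-memory graph."""


class DistGraphStub:
    """Hypothetical stand-in for a distributed graph."""


class _BatchingBackend:
    """Shared batching logic; yields consecutive slices of the node IDs."""

    def __init__(self, nids, batch_size):
        self.nids = list(nids)
        self.batch_size = batch_size

    def __iter__(self):
        for i in range(0, len(self.nids), self.batch_size):
            yield self.nids[i:i + self.batch_size]


class LocalBackendStub(_BatchingBackend):
    """Stands in for a regular single-machine data loader."""


class DistBackendStub(_BatchingBackend):
    """Stands in for a distributed data loader."""


class NodeLoaderSketch:
    """Picks a backend from the graph type, then delegates iteration to it."""

    def __init__(self, g, nids, batch_size):
        if isinstance(g, DistGraphStub):
            self.backend = DistBackendStub(nids, batch_size)
        else:
            self.backend = LocalBackendStub(nids, batch_size)

    def __iter__(self):
        return iter(self.backend)


dist_loader = NodeLoaderSketch(DistGraphStub(), range(5), batch_size=2)
local_loader = NodeLoaderSketch(LocalGraphStub(), range(5), batch_size=2)
dist_batches = list(dist_loader)
```

With this shape, user code constructs the same loader class in both settings, and only the backend chosen at construction time differs.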
