Distributed data loader

Based on the latest implementation of NodeDataLoader and EdgeDataLoader, it seems that both classes already support DistGraph sampling, including node sampling, edge sampling, and negative sampling. Why do we have a separate class DistDataLoader? Is there anything special about DistDataLoader?

DistDataLoader is a replacement for PyTorch's DataLoader. However, DGL currently requires all the clients, including the sampler subprocesses, to join the cluster together at the beginning, so the startup logic differs from PyTorch's DataLoader, which spawns workers lazily. That is why we implemented our own DistDataLoader.

Thanks for the clarification. After reading the master branch, it seems that both NodeDataLoader and EdgeDataLoader already use DistDataLoader in their implementations.
