A question about distributed sampler

Thanks for your excellent work.
I have a question about the distributed sampler. It seems that DGL stores the graph on one machine but runs samplers on different machines to sample from it. I don't understand why this improves performance; wouldn't it require a lot of network communication?

In our experience, the bottleneck in large-scale graph neural network training is not the computation but the graph sampling, and the distributed sampler aims to accelerate that part.
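For intuition, here is a minimal producer/consumer sketch of the pattern (plain Python multiprocessing, not DGL's actual sampler API; the local queue stands in for the network channel between the sampler machines and the trainer). Several sampler processes keep a buffer of pre-sampled mini-batches full, so the trainer never stalls waiting for sampling.

```python
import multiprocessing as mp

NUM_SAMPLERS = 4
BATCHES_PER_SAMPLER = 100

def sampler_worker(queue, num_batches):
    # Stands in for a sampler process on another machine: it draws mini-batch
    # subgraphs from the stored graph and ships only the small sampled
    # structure to the trainer (here via a local queue instead of the network).
    for _ in range(num_batches):
        subgraph = {"indptr": [0, 2, 3], "indices": [1, 2, 0]}  # placeholder sample
        queue.put(subgraph)
    queue.put(None)  # tell the trainer this sampler is done

def trainer(queue, num_samplers):
    # Consumes pre-sampled subgraphs; the forward/backward pass never waits on
    # sampling as long as the queue stays non-empty.
    finished = 0
    while finished < num_samplers:
        batch = queue.get()
        if batch is None:
            finished += 1
            continue
        # a training step on `batch` would go here

if __name__ == "__main__":
    q = mp.Queue(maxsize=32)  # bounded buffer between samplers and the trainer
    samplers = [mp.Process(target=sampler_worker, args=(q, BATCHES_PER_SAMPLER))
                for _ in range(NUM_SAMPLERS)]
    for p in samplers:
        p.start()
    trainer(q, NUM_SAMPLERS)
    for p in samplers:
        p.join()
```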

@zhengda1936 could you please provide more details?

Yes, I agree with you.
But my question is: since DGL stores the graph on one machine, the distributed samplers have to sample on that machine and then send the sampled results to the training machines. Won't these transfers produce a lot of network communication?

Sorry for the late reply. The sampled results are usually just graph structures (stored in CSR format), and their storage size is quite small. We found that transferring the graph structure is usually not the bottleneck.
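To put rough numbers on this (a back-of-the-envelope sketch; the batch size and fan-out below are hypothetical, not DGL defaults), the structure of one sampled mini-batch is only the CSR offset and index arrays, so even a fairly large sample stays well under a megabyte.

```python
import numpy as np

# Hypothetical mini-batch: ~1,000 seed nodes, 2-hop sampling with fan-out 10,
# giving on the order of 10,000 nodes and 100,000 edges in the sampled subgraph.
num_nodes = 10_000
num_edges = 100_000

# CSR stores the structure as one offset per node (+1) plus one column index
# per edge; only these two arrays make up the transferred structure
# (feature fetching is handled separately and not counted here).
indptr = np.zeros(num_nodes + 1, dtype=np.int64)
indices = np.zeros(num_edges, dtype=np.int64)

size_mb = (indptr.nbytes + indices.nbytes) / 2**20
print(f"CSR structure of one mini-batch: {size_mb:.2f} MB")  # ~0.84 MB
```

At under a megabyte per mini-batch, transferring the sampled structure over the network is cheap compared with the cost of performing the sampling itself.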