Question about DistDGL

In DistDGL [1], I am curious whether the target training nodes are only shuffled locally within each partition (setting aside computation load balancing)?

If so, would training accuracy be affected, given that the training sequence is not fully randomized? A related paper, PaGraph [2], reports that local shuffling can slow convergence. To make sure I am using "local shuffling" in the same sense, here is roughly what I have in mind (see the sketch below).
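
A minimal sketch of the two strategies I am comparing, written with plain PyTorch tensors rather than DistDGL's actual API (the partition sizes and node ID ranges are made up for illustration):

```python
import torch

num_workers = 4
# Assume each worker owns the training nodes of its own partition.
train_nids_per_part = [torch.arange(i * 1000, (i + 1) * 1000) for i in range(num_workers)]

def local_shuffle(part_nids):
    """Each worker permutes only its own partition-local training nodes."""
    return part_nids[torch.randperm(len(part_nids))]

def global_shuffle(parts):
    """All training nodes are pooled, permuted, and re-split across workers."""
    all_nids = torch.cat(parts)
    all_nids = all_nids[torch.randperm(len(all_nids))]
    return list(torch.chunk(all_nids, len(parts)))

local_epoch = [local_shuffle(p) for p in train_nids_per_part]    # locality preserved
global_epoch = global_shuffle(train_nids_per_part)               # fully randomized
```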

[1] DistDGL: Distributed Graph Neural Network Training for Billion-Scale Graphs. Da Zheng, Chao Ma, Minjie Wang, Jinjing Zhou, Qidong Su, Xiang Song, Quan Gan, Zheng Zhang, George Karypis.
[2] PaGraph: Scaling GNN Training on Large Graphs via Computation-aware Caching.

Yes, in DistDGL, training nodes are assigned to workers according to the partition assignment to maximize data locality. That said, the training nodes on the same worker are more likely to be clustered together (and, if the graph is homophilous, more likely to share the same label), which may affect convergence as noted in PaGraph. Interestingly, this is not a big issue in practice because the final gradient is averaged across all the workers, which can be viewed as mixing samples from the different workers' distributions.
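
To illustrate the last point, here is a small sketch (an assumed setup with a toy model, not DistDGL internals): with equal per-worker batch sizes and mean-reduced losses, averaging the per-worker gradients gives the same gradient as one "mixed" batch that pools samples from every partition's distribution.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(16, 8)
loss_fn = torch.nn.CrossEntropyLoss()

# Pretend each worker's mini-batch comes from a different partition.
worker_batches = [(torch.randn(32, 16), torch.randint(0, 8, (32,))) for _ in range(4)]

# Per-worker gradients, then average (what an all-reduce would compute).
grads = []
for x, y in worker_batches:
    model.zero_grad()
    loss_fn(model(x), y).backward()
    grads.append([p.grad.clone() for p in model.parameters()])
avg_grads = [torch.stack(gs).mean(dim=0) for gs in zip(*grads)]

# Gradient of the concatenated (mixed) batch for comparison.
model.zero_grad()
x_all = torch.cat([x for x, _ in worker_batches])
y_all = torch.cat([y for _, y in worker_batches])
loss_fn(model(x_all), y_all).backward()

for g_avg, p in zip(avg_grads, model.parameters()):
    print(torch.allclose(g_avg, p.grad, atol=1e-5))  # True: averaging mixes the samples
```

So each individual worker sees a locally biased stream of samples, but the synchronized update is computed over the union of all workers' batches.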


Oh, I see. Thanks for your prompt reply, have a nice day. 🙂