Hello, I have a question about distributed GNN training with DistDGL, and I hope you can tell me whether my understanding below is correct.
- My understanding of distributed GNN training: to my knowledge, after partitioning the graph, each worker should hold its own partition together with all of its k-hop neighbors, where k is the number of GNN layers, as well as the node/edge embeddings of its own partition. During training, workers then use `DistTensor` to request embeddings from other hosts according to their own partitions (see the first sketch after this list).
- Given that, I wonder why the partition example does not set `num_hops` according to the GNN's `num_layers`, but instead uses the default value of 1, while `num_layers` in the training scripts defaults to 2 (see the second sketch below). Is this still correct for training?
- Also, if I change `num_layers` in the training scripts, is the training process in this example still correct?
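
To make the first point concrete, here is roughly the feature fetch I have in mind. This is only a sketch of how I understand the standard `dgl.distributed` API; `ip_config.txt`, `my_graph`, `'feat'`, and the node IDs are placeholders, not values taken from the example scripts:

```python
import torch as th
import dgl

# Sketch of the remote-fetch step (placeholder config file and graph name).
dgl.distributed.initialize('ip_config.txt')
g = dgl.distributed.DistGraph('my_graph')

# g.ndata['feat'] is a DistTensor; indexing it pulls rows that may live
# on other hosts. The node IDs here are just placeholders for the nodes
# produced by a sampled block.
input_nodes = th.arange(10)
feats = g.ndata['feat'][input_nodes]
```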
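
And this is the kind of partitioning call my second point refers to, again only a sketch with placeholder values (random graph, graph name, number of parts, output path); the part I am asking about is the `num_hops=1` default:

```python
import dgl

# Placeholder graph; in the real example this would be the full dataset.
g = dgl.rand_graph(1000, 5000)

dgl.distributed.partition_graph(
    g,
    graph_name='my_graph',   # placeholder name
    num_parts=4,             # placeholder number of partitions
    out_path='partitions',   # placeholder output directory
    num_hops=1,              # default: keep only the 1-hop halo per partition
    part_method='metis',
)
```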