Communication mechanism in DistDGL

I have some doubts about the communication mechanism in DistDGL. For example, say we have a graph G={(3->1),(2->1)} and we use the DGL partition method to split it into two parts: V1={3,2} and V2={1}, stored on machine 1 and machine 2, respectively. During training, vertex 3 on machine 1 needs the data of vertex 1 on machine 2, and vertex 2 on machine 1 also needs the data of vertex 1 on machine 2. In this scenario, will DistDGL communicate the data for vertex 1 once or twice?
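For concreteness, here is a rough sketch of the setup I have in mind (a sketch only: the graph name, output path, and the exact partition assignment METIS would actually produce are illustrative assumptions on my part):

```python
import dgl
import torch

# Toy graph G = {(3->1), (2->1)}: vertices 3 and 2 both point to vertex 1.
src = torch.tensor([3, 2])
dst = torch.tensor([1, 1])
g = dgl.graph((src, dst))
g.ndata['feat'] = torch.randn(g.num_nodes(), 4)  # dummy node features

# Split into two parts with DGL's METIS-based partitioner; in my scenario the
# intended result is V1 = {3, 2} on machine 1 and V2 = {1} on machine 2.
dgl.distributed.partition_graph(g, graph_name='toy', num_parts=2,
                                out_path='toy_partitions')
```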

Do you mean fetching the edges between nodes 2/3 and node 1, or the node feature data?

In graph partitioning (usually METIS), the graph is partitioned according to in_edges. If G=(1->3, 1->2, …) and node 1 is assigned to the 2nd part while nodes 2 and 3 are on the 1st part, the edges 1->3 and 1->2 will also lie in the 1st part alongside nodes 2 and 3, so there is no need to fetch edges from the remote machine. If we do need to access the node features of node 1, they will be fetched from the remote machine via DistTensor.
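A rough sketch of what the trainer-side access looks like, assuming a standard DistDGL setup (the ip_config file, graph name, and partition config path below are placeholders, not taken from this thread):

```python
import dgl
import torch

dgl.distributed.initialize('ip_config.txt')   # connect to the DistDGL servers
g = dgl.distributed.DistGraph('toy', part_config='toy_partitions/toy.json')

# Edges are stored in the partition of their in-edge endpoint, so the local
# machine already holds the edges into its own nodes. Reading the feature of a
# node that lives on another machine goes through ndata, which is backed by a
# DistTensor and fetches the data over the network.
feat_1 = g.ndata['feat'][torch.LongTensor([1])]
```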

Yes, I am referring to node features. Does DistDGL communicate the data twice, or does it communicate it once and then duplicate it into two copies?

Once for each call of g.ndata['feat'][torch.LongTensor([1])].
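So, to my understanding, the feature of node 1 is fetched once per indexing call, not once per consumer. If both vertex 3 and vertex 2 need it in the same step, one way to make sure it only crosses the network once is to deduplicate the requested IDs and issue a single batched call (continuing from the DistGraph g above; this is an illustrative pattern, not DistDGL-specific API beyond the ndata indexing already shown):

```python
import torch

# Vertex 3 and vertex 2 both need node 1's feature in this step.
required = torch.LongTensor([1, 1])
unique_ids = torch.unique(required)        # tensor([1])
feats = g.ndata['feat'][unique_ids]        # one call -> node 1 fetched once
```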
