I have some doubts about the communication mechanism in DistDGL. For example, let’s say we have a graph G={(3->1),(2->1)}, and we use the DGL partition method to split the graph into two parts. The first part is V1={3,2}, and the second part is V2={1} (with each part stored on machine 1 and machine 2, respectively). During training, vertex 3 in machine 1 needs vertex 1 in machine 2, and vertex 2 in machine 1 also needs vertex 1 in machine 2. In this scenario, will DistDGL communicate the data for vertex 1 once or twice?
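For concreteness, here is a rough sketch of how this example graph could be built and partitioned (a minimal sketch using dgl.distributed.partition_graph; the actual partition assignment METIS produces may differ from the V1/V2 split described above):

```python
import dgl
import torch

# Example graph G = {(3->1), (2->1)}; DGL numbers nodes from 0,
# so the graph ends up with 4 nodes (0..3), of which node 0 is isolated.
src = torch.tensor([3, 2])
dst = torch.tensor([1, 1])
g = dgl.graph((src, dst))
g.ndata['feat'] = torch.randn(g.num_nodes(), 8)   # toy node features

# Split into 2 parts with the DGL partition method (METIS by default).
dgl.distributed.partition_graph(g, graph_name='toy', num_parts=2,
                                out_path='toy_parts')
```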
Do you mean fetching the edges between nodes 2/3 and node 1, or the node feature data?
In graph partitioning (usually METIS), the graph is partitioned according to in_edges. So if G = (1->3, 1->2, …) and node 1 is assigned to the 2nd part while nodes 2 and 3 are on the 1st part, the edges 1->3 and 1->2 will also lie in the 1st part alongside nodes 2/3. So there is no need to fetch edges from a remote machine. If we do need to access the node features of node 1, they will be fetched from the remote machine via DistTensor.
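Roughly, a trainer process on machine 1 would access the remote feature like this (a minimal sketch, assuming the partitioned graph 'toy' from the question above and a standard DistDGL setup with an ip_config.txt; the file names are illustrative):

```python
import dgl
import torch

# Runs inside a trainer process after the DistDGL servers are up.
dgl.distributed.initialize('ip_config.txt')   # connect to the servers/kvstore
g = dgl.distributed.DistGraph('toy', part_config='toy_parts/toy.json')

# g.ndata['feat'] is a DistTensor; indexing it with node IDs pulls the
# corresponding rows from whichever machine owns them (here, the
# partition that holds node 1).
feat_1 = g.ndata['feat'][torch.LongTensor([1])]
```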
Yes, I am referring to node features. Does DistDGL communicate twice, fetching vertex 1's data separately for each request? Or does it communicate once and then duplicate the data into two copies locally?
Once for each call of g.ndata['feat'][torch.LongTensor([1])].
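In other words, something like the following sketch (assuming the DistGraph handle g from above; the variable names are illustrative):

```python
import torch

# Two separate calls: vertex 1's feature row is fetched from the remote
# machine once per call, i.e. twice in total.
f_for_3 = g.ndata['feat'][torch.LongTensor([1])]   # remote pull #1
f_for_2 = g.ndata['feat'][torch.LongTensor([1])]   # remote pull #2

# A single call that covers every ID the batch needs triggers one pull;
# the fetched row can then be reused locally for both vertex 3 and vertex 2.
needed_ids = torch.LongTensor([1])
f_shared = g.ndata['feat'][needed_ids]             # one remote pull
```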