Partition graph with deprecated reshuffle = False

Hi there, I noticed that the docstring of the function partition_graph in python/dgl/distributed/partition.py says:
“If reshuffle=False, node IDs and edge IDs of a partition do not fall into contiguous
ID ranges. In this case, DGL stores node/edge mappings (from
node/edge IDs to partition IDs) in separate files (node_map.npy and edge_map.npy).
The node/edge mappings are stored in numpy files.

.. warning::
    this format is deprecated and will not be supported by the next release.”
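
For reference, a minimal sketch of how those mapping files would be consumed (the file names come from the docstring above; the array shapes are my assumption):

    import numpy as np

    # Sketch only: node_map.npy / edge_map.npy are the files named in the
    # docstring; assuming flat arrays of length N (the number of global
    # nodes/edges) holding the partition ID of each global ID.
    node_map = np.load('node_map.npy')  # shape (num_global_nodes,)
    edge_map = np.load('edge_map.npy')  # shape (num_global_edges,)

    part_id = node_map[42]  # partition that owns global node ID 42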

In our graph, keeping the original node/edge IDs, as opposed to maintaining a mapping between local contiguous IDs and global IDs, would require far fewer code changes, so can I use the deprecated flag (reshuffle=False) for partitioning? BTW, what was the rationale for introducing local contiguous IDs at the cost of maintaining an extra map? Thanks

Also, are the statements below still true? For example, in DistGraph: “For heterogeneous graphs, users need to convert them into DGL graphs with one node type and one edge type”?

Note
    ----
    ``DistGraph`` currently only supports graphs with only one node type and one edge type.
    For heterogeneous graphs, users need to convert them into DGL graphs with one node type and
    one edge type and store the actual node types and edge types as node data and edge data.
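
For context, here is a rough sketch of the conversion that note describes, using dgl.to_homogeneous on a toy graph (the node/edge type names are made up):

    import dgl
    import torch

    # Toy heterogeneous graph: 'user' follows 'user', 'user' clicks 'item'.
    hg = dgl.heterograph({
        ('user', 'follows', 'user'): (torch.tensor([0, 1]), torch.tensor([1, 2])),
        ('user', 'clicks', 'item'): (torch.tensor([0, 2]), torch.tensor([0, 1])),
    })

    # Convert to a graph with one node type and one edge type. The original
    # types survive as integer node/edge data under dgl.NTYPE and dgl.ETYPE.
    g = dgl.to_homogeneous(hg)
    print(g.ndata[dgl.NTYPE])  # per-node type IDs
    print(g.edata[dgl.ETYPE])  # per-edge type IDs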

One more question from me was posted here, thanks!

I don’t think reshuffle=False can still be used in the latest version. The whole reason we shuffle node/edge IDs is to avoid maintaining vectors of N elements (where N is the number of nodes/edges in the global graph) in each trainer process, mapping node/edge IDs to partition IDs as well as to local node/edge IDs. That would be very memory-consuming for a large graph.
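
To make the trade-off concrete, here is a minimal sketch (not DGL's actual implementation) of what reshuffling buys: with contiguous per-partition ID ranges, each trainer only needs the range boundaries instead of an N-element map:

    import numpy as np

    num_nodes = 1_000_000_000  # N: nodes in the global graph
    num_parts = 8

    # Without reshuffling: every trainer holds an N-element array mapping
    # each global node ID to its partition ID (~8 GB as int64 for N = 1e9),
    # plus another N-element array for global-to-local IDs.

    # With reshuffling: partition p owns the contiguous range
    # [boundaries[p], boundaries[p + 1]), so a trainer stores only
    # num_parts + 1 integers and resolves lookups by binary search.
    boundaries = np.linspace(0, num_nodes, num_parts + 1, dtype=np.int64)

    def partition_id(global_nid):
        # O(log num_parts) time, O(num_parts) memory instead of O(N).
        return int(np.searchsorted(boundaries, global_nid, side='right')) - 1

    def local_id(global_nid):
        # Local ID is the offset within the owning partition's range.
        return global_nid - boundaries[partition_id(global_nid)]

    print(partition_id(300_000_000), local_id(300_000_000))  # -> 2 50000000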

Can you show me where the statement is? In the latest version, DistGraph supports heterogeneous graphs. We need to update it.

Here you go: dgl/dist_graph.py at master · dmlc/dgl · GitHub
There are also asserts like the following; can you explain? Thanks
assert len(self.ntypes) == 1, "ndata only works for a graph with one node type."
dgl/dist_graph.py at master · dmlc/dgl · GitHub

Can the above asserts be removed if they are no longer valid? @zhengda1936 Thanks

Is this the page that explains how DistGraph now supports heterogeneous graphs, or does it need to be updated too?

https://docs.dgl.ai/en/0.6.x/guide/distributed-hetero.html#guide-distributed-hetero

It seems very similar to the heterogeneous-to-homogeneous conversion workaround explained in the note above. Either way, it looks like the end result is a homogeneous graph. Is that true? Or is there a way to do distributed learning without converting to a homogeneous graph?

Thanks for pointing me to the out-of-date messages.
The asserts are still valid because ndata and edata can only be used on homogeneous graphs. DGLGraph has the same assert.
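
A small illustration of what that assert guards against (toy graph; the exact error type may vary across versions):

    import dgl
    import torch

    # Two node types, so plain hg.ndata would be ambiguous about which
    # type's data is meant; that ambiguity is what the assert rejects.
    hg = dgl.heterograph({
        ('user', 'follows', 'user'): (torch.tensor([0]), torch.tensor([1])),
        ('user', 'clicks', 'item'): (torch.tensor([0]), torch.tensor([0])),
    })

    # hg.ndata['h'] = ...  # would trip the one-node-type check here

    # Per-type access names the node type explicitly and is always valid:
    hg.nodes['user'].data['h'] = torch.zeros(hg.num_nodes('user'), 4)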

For distributed training, we still use a homogeneous graph format to store a heterogeneous graph. However, the Python class DistGraph now supports the APIs designed for heterogeneous graphs. For example, you can now access the node data of a particular type as follows:
g.nodes['node_type'].data['h']. This is useful when the heterogeneous graph has different node data and edge data for different types. The previous version doesn't support this.
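
For example, something like the following should work against an already partitioned graph (the graph name, config path, node type, and feature name are placeholders for your own setup):

    import dgl
    import torch

    # Assumes a running DGL distributed setup; 'ip_config.txt', 'my_graph',
    # 'user', and 'h' below are all placeholders.
    dgl.distributed.initialize('ip_config.txt')
    g = dgl.distributed.DistGraph('my_graph',
                                  part_config='my_graph/my_graph.json')

    # Heterogeneous-style access on DistGraph: node data of one type.
    user_feat = g.nodes['user'].data['h']  # a distributed tensor
    print(user_feat[torch.arange(10)])     # pull a slice into local memory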
