DGL Graph Data Structure

shadowforce · August 25, 2022, 5:10pm

Hi

I am very new to DGL. I am studying the underlying C++ source code, but it's hard for me to understand. I have some questions about quite technical details, such as the in-memory and on-disk structure of the DGL graph format, cross-partition node access, etc. A similar question has been asked here but it doesn’t answer my question.

Can any of the authors help me understand:

1. What is the concrete structure of a DGL graph? Does it store an adjacency matrix?
2. How are nodes/edges/features represented in memory, and what is the data structure? Tensor?
3. How do distributed tensors work? Does a distributed tensor span all the worker machines? If so, how is the data divided among them and how is it accessed?
4. In distributed mode, how are the halo nodes stored in a partition?
5. How are the nodes corresponding to halo nodes accessed? For example, in partition 1 on machine 1, there is a halo node whose actual node is in partition 2 on machine 2. How is it accessed from machine 1?
6. Is the DGL graph serialized in binary format to disk?
7. Is the key/value store used as a parameter server?
8. What data is stored in the key/value store and in what format?

Sorry for the long list of questions. It's making my head spin.

Rhett-Ying · August 29, 2022, 7:56am

Yes, the graph stores its adjacency in sparse form. The COO/CSC/CSR formats are created on demand, only when they are needed.
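To make the three formats concrete, here is a minimal pure-Python sketch that converts a COO edge list into CSR and CSC. It is not DGL's C++ implementation; the function names `coo_to_csr` and `coo_to_csc` and the tiny example graph are made up for illustration.

```python
def coo_to_csr(num_nodes, src, dst):
    """Convert a COO edge list into CSR: (indptr, indices, edge_ids).

    The out-neighbors of node u are indices[indptr[u]:indptr[u + 1]].
    """
    # Count the out-degree of every node, shifted by one slot.
    indptr = [0] * (num_nodes + 1)
    for u in src:
        indptr[u + 1] += 1
    # Prefix sum turns the counts into row offsets.
    for u in range(num_nodes):
        indptr[u + 1] += indptr[u]

    indices = [0] * len(src)
    edge_ids = [0] * len(src)
    cursor = indptr[:-1]  # slicing copies: next free slot for each row
    for eid, (u, v) in enumerate(zip(src, dst)):
        pos = cursor[u]
        indices[pos] = v
        edge_ids[pos] = eid  # remember which COO edge this entry came from
        cursor[u] += 1
    return indptr, indices, edge_ids


def coo_to_csc(num_nodes, src, dst):
    """CSC is the CSR of the reversed graph: rows are destination nodes."""
    return coo_to_csr(num_nodes, dst, src)


# Example graph with edges 0->1, 0->2, 1->2, 3->0 (illustrative only).
src = [0, 0, 1, 3]
dst = [1, 2, 2, 0]
csr = coo_to_csr(4, src, dst)
csc = coo_to_csc(4, src, dst)
```

CSR makes out-neighbor lookups a contiguous slice, while CSC does the same for in-neighbors, which is why it can pay off to build a format only when an operation actually needs it.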

Zero-copy is achieved via NDArray, which points to the underlying tensor memory.
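The zero-copy idea can be illustrated without DGL: a view object refers to an existing buffer instead of duplicating it. The sketch below uses Python's built-in `array` and `memoryview` as stand-ins for a framework tensor and an NDArray. It is only an analogy and says nothing about how NDArray is implemented internally.

```python
from array import array

# Stand-in for a framework tensor that owns the feature memory.
features = array("f", [1.0, 2.0, 3.0, 4.0])

# Stand-in for an NDArray: a view over the same buffer, no copy is made.
view = memoryview(features)

# A write through the owner is visible through the view...
features[0] = 10.0
seen_through_view = view[0]

# ...and a write through the view is visible through the owner.
view[3] = 40.0
seen_through_owner = features[3]

# An explicit copy, by contrast, is detached from the original buffer.
copied = array("f", features)
features[1] = 20.0
copy_unchanged = copied[1]
```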

DistTensor relies on the client/server framework, the KV store, and a partition policy. Its data is distributed across machines. Clients access local data via shared memory and pull the rest from remote servers. See dist_tensor.py for more details.
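As a rough mental model of the partition policy and the local/remote split, here is a toy sketch in plain Python. Everything in it is an assumption made for illustration: the class names `RangePartitionPolicy` and `ToyDistTensor`, the contiguous-range ownership scheme, and the per-row request counter. It is not the code in dist_tensor.py.

```python
from bisect import bisect_right


class RangePartitionPolicy:
    """Hypothetical policy: partition p owns global IDs in
    [boundaries[p], boundaries[p + 1])."""

    def __init__(self, boundaries):
        # e.g. [0, 4, 7, 10] means 3 partitions owning 4, 3 and 3 rows
        self.boundaries = boundaries

    def to_partid(self, gid):
        # Index of the last boundary that is <= gid.
        return bisect_right(self.boundaries, gid) - 1

    def to_local(self, gid):
        # Offset of gid inside the shard of its owning partition.
        return gid - self.boundaries[self.to_partid(gid)]


class ToyDistTensor:
    """Rows are sharded by the policy; reads are local or 'remote'."""

    def __init__(self, policy, shards, my_part):
        self.policy = policy
        self.shards = shards          # shards[p] = rows held by machine p
        self.my_part = my_part
        self.remote_requests = 0

    def __getitem__(self, gids):
        rows = []
        for gid in gids:
            part = self.policy.to_partid(gid)
            if part != self.my_part:
                # A real system would batch these into a network pull;
                # here we only count them.
                self.remote_requests += 1
            rows.append(self.shards[part][self.policy.to_local(gid)])
        return rows


policy = RangePartitionPolicy([0, 4, 7, 10])
shards = [
    [[float(g)] for g in range(0, 4)],
    [[float(g)] for g in range(4, 7)],
    [[float(g)] for g in range(7, 10)],
]
tensor = ToyDistTensor(policy, shards, my_part=0)
rows = tensor[[1, 5, 9]]
```

From the caller's point of view the tensor is indexed by global ID; whether a row comes from the local shard or another machine is hidden behind the policy lookup.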

Halo nodes are stored in the partition's DGLGraph, and flags such as inner_node are used to distinguish inner nodes from halo nodes.
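The following sketch shows one way a partition could hold both kinds of nodes in a single local graph, with a flag array playing the role of inner_node. The helper `build_partition`, the rule of keeping edges by destination owner, and the example graph are all hypothetical; DGL's own partitioning code is more involved.

```python
def build_partition(edges, node_part, part_id):
    """Return (local_nodes, inner_node, local_edges) for one partition.

    Hypothetical helper: keeps every edge whose destination is owned by
    part_id and adds the remote sources of those edges as halo nodes.
    """
    inner = sorted(n for n, p in node_part.items() if p == part_id)
    kept = [(u, v) for u, v in edges if node_part[v] == part_id]
    halo = sorted({u for u, _ in kept if node_part[u] != part_id})

    local_nodes = inner + halo                      # global IDs, inner first
    inner_node = [1] * len(inner) + [0] * len(halo)  # 1 = inner, 0 = halo
    local_id = {g: i for i, g in enumerate(local_nodes)}
    local_edges = [(local_id[u], local_id[v]) for u, v in kept]
    return local_nodes, inner_node, local_edges


# Four nodes split across two partitions (illustrative assignment).
node_part = {0: 0, 1: 0, 2: 1, 3: 1}
edges = [(0, 1), (2, 1), (3, 2), (1, 3)]

part0 = build_partition(edges, node_part, part_id=0)
part1 = build_partition(edges, node_part, part_id=1)
```

In this toy setup, node 2 appears in partition 0 only as a halo node: its structure is visible locally, but its owner is partition 1.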

The client sends a request to the remote server and fetches the response.
Please refer to dgl/graph_services.py at 97b2ab53e27c6ba864312040d1fc9c78a49dac7d · dmlc/dgl · GitHub for more details.

Yes. Load/save functions are defined for serialization. For example: dgl/unit_graph.cc at 97b2ab53e27c6ba864312040d1fc9c78a49dac7d · dmlc/dgl · GitHub

The KV store follows a client/server design.

It stores graph data and distributed tensors, in key-value format: the keys are strings and the values are tensor data.
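A string-keyed store of tensor data can be pictured with the toy server below. The class `ToyKVServer`, its method names, and the key `"node_feat"` are invented for this sketch and are not taken from DGL's KV store; plain nested lists stand in for tensors.

```python
class ToyKVServer:
    """Hypothetical in-memory store: string name -> list of rows."""

    def __init__(self):
        self._data = {}

    def init_data(self, name, num_rows, width):
        # Register a zero-initialised table under a string key.
        self._data[name] = [[0.0] * width for _ in range(num_rows)]

    def pull(self, name, ids):
        # Copy the requested rows, as a network reply would.
        table = self._data[name]
        return [list(table[i]) for i in ids]

    def push(self, name, ids, rows):
        # Overwrite the given rows with new values.
        table = self._data[name]
        for i, row in zip(ids, rows):
            table[i] = list(row)


server = ToyKVServer()
server.init_data("node_feat", num_rows=4, width=2)
server.push("node_feat", [1, 3], [[1.0, 1.5], [3.0, 3.5]])
pulled = server.pull("node_feat", [3, 0])
```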

shadowforce · September 3, 2022, 1:48pm

Thank you @Rhett-Ying for the detailed explanation. Much appreciated.
