Could any of the authors briefly introduce the low-level graph representations in DGL? For example: the storage mechanism for edges (adjacency list or adjacency matrix), the data structures used, how nodes/edges are fetched efficiently from the low-level storage, and whether multithreading or multiprocessing techniques are used.
Low-level graph representations (C++ stuff)
Hi,

graph.h defines the interface of the data structure behind the standard DGLGraph, and immutable_graph.h defines the interface of the data structure behind the read-only DGLGraph (for read-only graphs, we can maintain CSR/CSC representations and other structures that are extremely useful for some applications, especially on large graphs).
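To illustrate why CSR/CSC fit read-only graphs, here is a minimal sketch that assumes a mutable graph kept as per-node adjacency lists and "freezes" it into flat CSR (out-edges) and CSC (in-edges) arrays. It is not a description of graph.h or immutable_graph.h; the class and method names (MutableGraph, add_edge, to_csr, to_csc) are hypothetical.

```python
# Hypothetical sketch, not DGL code: per-node adjacency lists that can be
# frozen into read-only CSR (out-edges) and CSC (in-edges) arrays.

class MutableGraph:
    def __init__(self, num_nodes):
        # Successor/predecessor lists plus matching edge ids for every node.
        self.succ = [[] for _ in range(num_nodes)]
        self.eid = [[] for _ in range(num_nodes)]
        self.pred = [[] for _ in range(num_nodes)]
        self.in_eid = [[] for _ in range(num_nodes)]
        self.num_edges = 0

    def add_edge(self, u, v):
        # Appending to a Python list is amortized O(1), so the graph stays cheap to grow.
        e = self.num_edges  # edge ids follow insertion order
        self.succ[u].append(v)
        self.eid[u].append(e)
        self.pred[v].append(u)
        self.in_eid[v].append(e)
        self.num_edges += 1
        return e

    @staticmethod
    def _flatten(lists, ids):
        # Concatenate the per-node lists; indptr[v] is where node v's entries start.
        indptr, indices, edge_ids = [0], [], []
        for nbrs, es in zip(lists, ids):
            indices.extend(nbrs)
            edge_ids.extend(es)
            indptr.append(len(indices))
        return indptr, indices, edge_ids

    def to_csr(self):
        # Row v holds the destinations of v's out-edges.
        return self._flatten(self.succ, self.eid)

    def to_csc(self):
        # Column v holds the sources of v's in-edges (the CSR of the reversed graph).
        return self._flatten(self.pred, self.in_eid)
```

For edges 0->1, 0->2, 2->1 on 3 nodes, to_csr() gives ([0, 2, 2, 3], [1, 2, 1], [0, 1, 2]) and to_csc() gives ([0, 0, 2, 3], [0, 2, 0], [0, 2, 1]). The frozen arrays are contiguous and cheap to scan, but inserting one edge would shift every later entry, which is why this layout suits graphs that no longer change.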

graph.cc and immutable_graph.cc implement the functions declared in the headers mentioned above.

nodeflow.cc defines a paradigm that is useful for training on subgraphs; there will be a blog post explaining it soon.

sampler.cc implements frequently used graph samplers, which are useful for training GCNs on large graphs.
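As an illustration of what a graph sampler does with these arrays, here is a sketch of uniform neighbor sampling without replacement (a GraphSAGE-style technique) over CSR/CSC arrays. It is an assumption about the kind of sampler meant, not the algorithm in sampler.cc; the function name sample_neighbors is hypothetical. Passing CSC arrays samples in-neighbors, which is what message passing into the seed nodes needs.

```python
import random


def sample_neighbors(indptr, indices, edge_ids, seeds, fanout, rng=None):
    """Hypothetical sampler: pick up to `fanout` neighbors of each seed uniformly
    without replacement. Returns a list of (seed, neighbor, edge_id) triples."""
    rng = rng or random.Random()
    sampled = []
    for v in seeds:
        start, end = indptr[v], indptr[v + 1]
        positions = range(start, end)
        if end - start > fanout:
            # random.sample accepts a range and returns a list of distinct positions.
            positions = rng.sample(positions, fanout)
        for p in positions:
            sampled.append((v, indices[p], edge_ids[p]))
    return sampled
```

Working on positions into indices rather than on neighbor ids keeps the edge id of each sampled edge available, so the sampled subgraph can still look up edge features.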
Multithreading and multiprocessing are not used in most cases. However, they are required for distributed training, because sampling the graph appears to be a bottleneck; in that scenario we create multiple threads/processes to sample subgraphs.
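A minimal sketch of the multi-worker pattern, assuming each worker samples one batch of seed nodes independently: every task gets its own RNG seeded by batch index, so results are reproducible regardless of thread scheduling. The names sample_batch and parallel_sample are hypothetical. Note that Python threads are limited by the GIL, so this only shows the structure; for CPU-bound sampling the real speedup comes from native code (such as sampler.cc) or separate processes.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def sample_batch(indptr, indices, seeds, fanout, rng_seed):
    # One RNG per task, so workers never share random state.
    rng = random.Random(rng_seed)
    edges = []
    for v in seeds:
        nbrs = indices[indptr[v]:indptr[v + 1]]
        chosen = nbrs if len(nbrs) <= fanout else rng.sample(nbrs, fanout)
        edges.extend((v, u) for u in chosen)
    return edges


def parallel_sample(indptr, indices, seed_batches, fanout, num_workers=4):
    # Submit one sampling task per batch of seed nodes.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(sample_batch, indptr, indices, batch, fanout, i)
                   for i, batch in enumerate(seed_batches)]
        # Collect in submission order, whichever worker finished first.
        return [f.result() for f in futures]
```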
aksnzhy is working on this; if you are interested in our progress or would like to contribute, feel free to contact him.
I also want to learn the C++ source code, but I find it hard to understand what the variables mean. Could you add more comments? For example:
indptr, indices and edge_ids define a sparse matrix (this format is also known as the CSR representation). Our variable names are the same as in scipy; please see Wikipedia: Sparse matrix, csr_matrix in scipy, and this blog post for more details.
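To make the three arrays concrete, here is a sketch with hypothetical example data for a 4-node graph. It uses the same layout as scipy's csr_matrix, whose constructor accepts (data, indices, indptr); here edge_ids is assumed to sit where scipy's data array would, holding each edge's id instead of a numeric value. The helper names are hypothetical, not DGL functions.

```python
# Hypothetical example: 4 nodes, edges e0: 0->1, e1: 0->2, e2: 1->2, e3: 3->0.
indptr = [0, 2, 3, 3, 4]   # row v occupies positions indptr[v] .. indptr[v+1]-1
indices = [1, 2, 2, 0]     # destination node of each stored edge
edge_ids = [0, 1, 2, 3]    # id of each stored edge (in the slot scipy uses for data)


def successors(v):
    return indices[indptr[v]:indptr[v + 1]]


def out_edge_ids(v):
    return edge_ids[indptr[v]:indptr[v + 1]]


def out_degree(v):
    # The degree is a subtraction; no scan is needed.
    return indptr[v + 1] - indptr[v]


def edge_id(u, v):
    # Linear scan of row u; if each row's indices were sorted, bisect would make this O(log deg).
    for p in range(indptr[u], indptr[u + 1]):
        if indices[p] == v:
            return edge_ids[p]
    return None


def to_dense():
    # Expand to the equivalent 0/1 adjacency matrix, for checking only.
    n = len(indptr) - 1
    dense = [[0] * n for _ in range(n)]
    for u in range(n):
        for v in successors(u):
            dense[u][v] = 1
    return dense
```

Node 2 has no out-edges, so indptr[2] == indptr[3] and its slice is empty.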
Thanks! I think it would be better to include these explanations in the source code as comments.