How to read the source code?

I want to learn how DGL is implemented. Can anyone point me to a guide? Could you also explain the motivation behind DGL and how to optimize it? Thanks.

We will publish a detailed code walkthrough as a blog post, probably this month. You can subscribe to our blog for updates. In the meantime, here is a brief summary of what lives where in the source tree:

|-- conda   # conda-related install scripts
|-- docker  # docker script
|-- docs  # documentation sources (built with Sphinx)
|-- examples
 |-- mxnet  # mxnet examples
 |-- pytorch  # pytorch examples
|-- include  # C++ lib headers
 |-- dgl
  |-- runtime  # headers for CFFI solution
   |-- ...
  |-- graph.h  # Graph data structure using adjlist
  |-- graph_interface.h  # The base interface class
  |-- immutable_graph.h  # graph data structure using CSR
  |-- graph_op.h  # graph traversal, transformation, etc.
  |-- scheduler.h  # C routines used by scheduler
|-- python
  |-- dgl
   |-- _ffi   # Python-side CFFI code
   |-- backend  # MXNet/PyTorch-specific backend code
   |-- contrib  # code at the contribution stage
   |-- data  # ready-to-use dataset package such as CoraDataset
   |-- function  # builtin message/reduce functions
   |-- nn  # pre-defined GNN layers (e.g. GCNLayer)
   |-- runtime  # IR and execution logic for message passing
   |--  # for batching multiple graphs
   |--  # data structure for storing node/edge features
   |--  # DGLGraph (~= GraphIndex + Frame)
   |--  # graph structure class (no feature storage)
   |--  # feature initializers
   |--   # internal ndarray wrapper used by DGL
   |--  # propagation APIs (e.g. topo_nodes)
   |--  # subgraph data structure
   |--  # graph transformation APIs (e.g. line_graph)
   |--  # graph traversal APIs.
   |--  # UDF related data structure (e.g. NodeBatch, EdgeBatch)
   |--  # Graph views
|-- src   # C++ source code
  |-- graph
  |-- runtime
  |-- scheduler
|-- tests   # unit tests
|-- third_party  # all external dependencies
|-- tutorials  # Python scripts for the tutorials on our doc site
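
To make two ideas from the tree more concrete (adjacency list vs. CSR storage, and "DGLGraph ~= GraphIndex + Frame"), here is a minimal pure-Python sketch. It does not use DGL at all: `adjlist_to_csr`, `ToyGraph`, `successors`, and `sum_neighbors` are hypothetical names made up for illustration, and the real classes are implemented differently. It only shows how graph structure and feature storage can be kept separate, and how a sum over incoming messages walks the CSR arrays.

```python
from collections import defaultdict


def adjlist_to_csr(num_nodes, edges):
    """Convert (src, dst) edge pairs into CSR arrays (indptr, indices)."""
    adj = defaultdict(list)  # adjacency list: src -> [dst, ...]
    for src, dst in edges:
        adj[src].append(dst)
    indptr, indices = [0], []
    for node in range(num_nodes):
        indices.extend(sorted(adj[node]))  # out-neighbors of `node`
        indptr.append(len(indices))        # end offset of this node's row
    return indptr, indices


class ToyGraph:
    """Illustrative 'structure + frame' pair; NOT DGL's actual class."""

    def __init__(self, num_nodes, edges):
        self.num_nodes = num_nodes
        # Structure only (plays the role of a graph index).
        self.indptr, self.indices = adjlist_to_csr(num_nodes, edges)
        # Feature storage (plays the role of a frame): name -> per-node values.
        self.ndata = {}

    def successors(self, node):
        """Out-neighbors of `node`, read as a slice of the CSR arrays."""
        return self.indices[self.indptr[node]:self.indptr[node + 1]]

    def sum_neighbors(self, field):
        """Each node sends its feature along out-edges; receivers sum them."""
        out = [0] * self.num_nodes
        for src in range(self.num_nodes):
            for dst in self.successors(src):
                out[dst] += self.ndata[field][src]
        return out


g = ToyGraph(3, [(0, 1), (0, 2), (1, 2)])
g.ndata["h"] = [1, 10, 100]
print(g.indptr, g.indices)   # [0, 2, 3, 3] [1, 2, 2]
print(g.sum_neighbors("h"))  # [0, 1, 11]
```

The point of the split is that structure queries (such as `successors`) never touch the features, so the same structure can be reused with different feature fields.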

@minjie Could you elaborate on how the runtime and scheduling work? I have been looking through the code, and it's very clear and well commented, but I think the overall methodology is lost on me. Could you explain what IR stands for and what its purpose is? And could you give a quick list of the high-level steps involved in taking a send or recv call and ultimately transforming it into kernels that run on the GPU?
