Random walk sampling on a giant graph

Awesome library, thank you for the good work!

I have a question about working with a giant graph. My heterogeneous graph has about ten billion nodes, and I want to use a metapath-based method to embed them. However, before calling metapath_random_walk, I have to use hetero_from_relations to construct the relations, which consumes a large amount of memory. Could you give me some suggestions for solving this problem? Thank you very much!

Best Regards!

Currently we might not have very good built-in support for your scenario. Here is a possible solution:

  1. Sample subgraphs from each relation graph.
  2. Perform hetero_from_relations on the sampled relation subgraphs.
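The two steps above can be sketched in plain Python, without any DGL calls. This is only an illustration under assumptions: the function name `sample_relation_subgraphs`, the dictionary layout of `relations`, and the choice of uniform edge sampling are all hypothetical and not part of the library. The point it shows is that node IDs must be relabeled consistently per node type across relations, so that the sampled relation subgraphs can be combined afterwards.

```python
import random


def sample_relation_subgraphs(relations, num_edges, seed=0):
    """Sample up to `num_edges` edges from each relation and relabel node IDs.

    relations: {(src_type, edge_type, dst_type): [(src_id, dst_id), ...]}
    Returns (sampled, id_maps), where `sampled` has the same keys with
    relabeled edges and `id_maps` maps node type -> {original id: compact id}.
    NOTE: hypothetical helper for illustration, not a DGL API.
    """
    rng = random.Random(seed)
    id_maps = {}  # shared across relations so a node type keeps one ID space
    sampled = {}
    for (src_type, edge_type, dst_type), edges in relations.items():
        k = min(num_edges, len(edges))
        picked = rng.sample(edges, k)  # uniform sampling without replacement
        src_map = id_maps.setdefault(src_type, {})
        dst_map = id_maps.setdefault(dst_type, {})
        relabeled = []
        for u, v in picked:
            # Assign the next free compact ID the first time a node is seen.
            new_u = src_map.setdefault(u, len(src_map))
            new_v = dst_map.setdefault(v, len(dst_map))
            relabeled.append((new_u, new_v))
        sampled[(src_type, edge_type, dst_type)] = relabeled
    return sampled, id_maps


# Toy data; in practice each edge list would be streamed from disk.
relations = {
    ("user", "follows", "user"): [(10, 20), (20, 30), (30, 10), (40, 10)],
    ("user", "plays", "game"): [(10, 7), (20, 7), (40, 9)],
}
sampled, id_maps = sample_relation_subgraphs(relations, num_edges=2, seed=0)
```

Each relabeled edge list in `sampled` would then be turned into a relation graph and the collection passed to hetero_from_relations; that step is left out here because the exact constructor depends on the DGL version. `id_maps` lets you map the embeddings of the compact IDs back to the original node IDs.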

Is there an example of sampling subgraphs? Or does the library support multi-worker subgraph sampling? Thank you very much!

We have built-in support for sampling subgraphs and constructing a data structure called NodeFlow. Both the sampling and the construction are quite efficient. Unfortunately, NodeFlow has a different API from DGLGraph, which makes things a bit more complicated. You may find this tutorial helpful.

Hi, I think ten billion nodes calls for a distributed solution in DGL, and we are working on it. Please watch for our release news!

Out of curiosity, what is your hardware configuration (memory, # CPUs, disk space, etc.)? Do you expect a single-large-machine solution, or a distributed solution, or a single-machine on-disk solution, etc.?