Random walk impl with COO format (not CSR)?

The current DGL random walk implementations require a CSR graph format. For my use case, the CSR format requires a much larger memory footprint than its COO format:
For 68B edges (5 edge types), DGL(COO) is 1099GB cpu memory, and DGL(CSR) is 2470GB cpu memory.

Do you think it’d be possible to implement a random walk implementation that uses the COO format and is also just as performant as the CSR-based implementation? The high CPU memory overhead of CSR is causing complications with serving.


Is the number 2470G only from CSR or from both CSR and COO? In general COO storage is O(2 * E) while CSR is O(2 * E + N). It shouldn’t differ that much. You could try directly creating a graph from CSR and see again.

In general I think it’s hard to implement a COO-based random walk because you need to find the successors of a vertex, unless the COO array is sorted (which is equivalent to a CSR).

The 2470G is only from CSR (I had used graph = graph.formats(['csr']) to force CSR only). This CPU memory discrepancy is indeed strange, I need to investigate what’s going on there, as for our graphs num_edges >> num_nodes.

Thanks for the info!

One thing I realized recently is that our COO to CSR conversion is not optimal; while the output consumption is O(2 * E + N), the peak consumption is actually more than that, probably 4 * E, because we maintain some O(E) arrays for sorting.

You could directly construct a CSR graph by

g = dgl.graph(('csr', (indptr, indices, data)))