Speed up DGLGraph.subgraphs (or dgl.unbatch)?


#1

Is it possible to speed up either of these operations? I am trying to split about 100k nodes into 10k subgraphs, and this operation takes up 95% of my code's runtime. I have tried running DGLGraph.subgraphs in parallel with multiprocessing (which would give sufficient speedup), but I get the following error:

```
multiprocessing.pool.MaybeEncodingError:
Error sending result: '[[<dgl.subgraph.DGLSubGraph object at 0x7f07fc620278>]]'.
Reason: 'NotImplementedError('SubgraphIndex pickling is not supported yet.',)'
```
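`multiprocessing.Pool` sends each worker's return value back to the parent process by pickling it, so an object whose class refuses pickling fails at that step with `MaybeEncodingError`, even though the worker itself ran fine. The sketch below reproduces the failure with a hypothetical stand-in class (`FakeSubgraphIndex` is illustrative, not DGL's real class) and shows that plain lists of node IDs pickle without trouble, which suggests having workers return node IDs rather than subgraph objects.

```python
import pickle


class FakeSubgraphIndex:
    """Hypothetical stand-in that refuses pickling, mimicking the error above."""

    def __init__(self, node_ids):
        self.node_ids = node_ids

    def __reduce__(self):
        # pickle calls this to serialize the object; raising here makes it unpicklable.
        raise NotImplementedError("SubgraphIndex pickling is not supported yet.")


def can_send(obj):
    """Return True if obj survives the pickling step a Pool applies to results."""
    try:
        pickle.dumps(obj)
    except Exception:
        return False
    return True


# Shaped like what the worker returned in the failing run: an unpicklable object.
bad_result = [[FakeSubgraphIndex([0, 1, 2])]]

# A picklable alternative: return only the node IDs of each group.
good_result = [[0, 1, 2], [3, 4]]

print(can_send(bad_result))
print(can_send(good_result))
```

This only helps if the parallel part of the work can be expressed as plain data. If constructing the subgraph objects is itself the bottleneck, returning IDs and building the subgraphs in the parent will not speed that step up.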

#2

Are these 10K subgraphs connected? Or is it more like an unbatch case, where each subgraph is one component?


#3

No connections, just like the unbatch case. I actually have no edges at all; DGL just makes it much easier to compute over unequally-sized groups of nodes.
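Since there are no edges, each "subgraph" is just a contiguous run of rows in the node-feature array, so per-group computations can be written as segment operations on the flat array instead of materializing 10k graph objects. A minimal NumPy sketch, assuming the nodes are already ordered by group and that `sizes` (a hypothetical name) holds the node count of each group; it uses `np.add.reduceat` for per-group sums and means. In DGL itself, readout functions such as `dgl.sum_nodes` and `dgl.mean_nodes` on a batched graph follow the same idea without unbatching.

```python
import numpy as np


def segment_sum(feats, sizes):
    """Sum rows of `feats` (shape (num_nodes, dim)) within each contiguous group.

    Assumes every size is >= 1: np.add.reduceat does not produce zeros
    for empty segments.
    """
    sizes = np.asarray(sizes)
    # Start offset of each group: 0, s0, s0 + s1, ...
    starts = np.concatenate(([0], np.cumsum(sizes)[:-1]))
    return np.add.reduceat(feats, starts, axis=0)


def segment_mean(feats, sizes):
    """Per-group mean: each group's sum divided by its size."""
    sizes = np.asarray(sizes)
    return segment_sum(feats, sizes) / sizes[:, None]


# 6 nodes with 2 features each, split into groups of 3, 1 and 2 nodes.
feats = np.arange(12, dtype=np.float64).reshape(6, 2)
sizes = [3, 1, 2]
print(segment_sum(feats, sizes))
print(segment_mean(feats, sizes))
```

Each reduction is a single vectorized call over all 100k rows, so its cost does not grow with the number of Python-level graph objects.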

I wonder if I could make a subclass of SubgraphIndex that supports pickling?
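Python's pickle lets a subclass opt in by defining `__reduce__`, which tells pickle how to rebuild the object from picklable pieces. The sketch below shows the pattern on hypothetical stand-in classes (`BaseIndex` and `PicklableIndex` are illustrative, not DGL classes). Applying it to the real `SubgraphIndex` would require knowing which constructor arguments fully describe it, which this sketch assumes rather than verifies.

```python
import pickle


class BaseIndex:
    """Hypothetical stand-in for a class that refuses pickling."""

    def __init__(self, parent_num_nodes, node_ids):
        self.parent_num_nodes = parent_num_nodes
        self.node_ids = list(node_ids)

    def __reduce__(self):
        raise NotImplementedError("pickling is not supported yet.")


class PicklableIndex(BaseIndex):
    """Subclass that pickles itself as (class, constructor arguments)."""

    def __reduce__(self):
        # pickle stores the callable and its args; unpickling calls
        # PicklableIndex(parent_num_nodes, node_ids) to rebuild the object.
        return (type(self), (self.parent_num_nodes, self.node_ids))


idx = PicklableIndex(100, [4, 8, 15])
restored = pickle.loads(pickle.dumps(idx))
print(type(restored).__name__, restored.parent_num_nodes, restored.node_ids)
```

One caveat with this design: unpickling re-runs the constructor in the receiving process, so if construction is the expensive part, the parent pays that cost again.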


#4

I realized that since I'm not currently using edges, I could just create new dgl.DGLGraph objects with the number of nodes I need for each subgraph. DGLGraphs can be serialized, but there seems to be a linear-time overhead when doing this, which makes multiprocessing prohibitive (running without multiprocessing gives performance similar to DGLGraph.subgraphs).
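Because the groups have no edges, a per-group graph carries no information beyond its node count, so the split can be done on the feature array directly. The NumPy sketch below is an alternative to building `DGLGraph` objects, not a DGL API: it uses `np.split` at cumulative offsets, and since the pieces are views into the original array, no feature data is copied. The `sizes` argument is a hypothetical name for the per-group node counts.

```python
import numpy as np


def split_groups(feats, sizes):
    """Split rows of `feats` into consecutive groups of the given sizes.

    Returns a list of arrays; np.split returns views, so the
    underlying feature data is not copied.
    """
    if sum(sizes) != len(feats):
        raise ValueError("group sizes must add up to the number of nodes")
    # Split points are the cumulative sizes, excluding the final total.
    offsets = np.cumsum(sizes)[:-1]
    return np.split(feats, offsets, axis=0)


feats = np.arange(12).reshape(6, 2)  # 6 nodes, 2 features each
groups = split_groups(feats, [3, 1, 2])
print([g.shape for g in groups])
```

If work is still farmed out to processes, only the `sizes` list (one integer per group) and the feature array would need to cross the process boundary, rather than 10k pickled graph objects.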

I'm still looking for a solution, ideally one that uses DGLGraph.subgraphs, if possible.