Speed up DGLGraph.subgraphs (or dgl.unbatch)?

Is it possible to speed up either of these operations? I am trying to split about 100k nodes into 10k subgraphs and this operation takes up 95% of my code’s runtime. I have tried to run DGLGraph.subgraphs in parallel with multiprocessing (this would give sufficient speedup), but am getting the following error:

multiprocessing.pool.MaybeEncodingError: 
Error sending result: '[[<dgl.subgraph.DGLSubGraph object at 0x7f07fc620278>]]'. 
Reason: 'NotImplementedError('SubgraphIndex pickling is not supported yet.',)'

Are these 10K subgraphs connected to each other? Or is it more like the unbatch case, where each subgraph is its own component?

No connections, just like the unbatch case. I actually have no edges at all; DGL just makes it much easier to compute over unequally-sized groups of nodes.

I wonder if I could make a child class of SubgraphIndex which supports pickling?
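For what it's worth, here is a rough sketch of what "a child class that supports pickling" usually looks like in plain Python. Both classes are hypothetical stand-ins, not DGL's actual SubgraphIndex, and `node_ids` is an assumed attribute. Whether the real index object can be rebuilt from such a state is not something this thread confirms.

```python
import pickle


class IndexWithoutPickle:
    """Hypothetical stand-in for a class that refuses to be pickled."""

    def __init__(self, node_ids):
        self.node_ids = list(node_ids)

    def __getstate__(self):
        # Mirrors the kind of error reported above.
        raise NotImplementedError("pickling is not supported yet.")


class PicklableIndex(IndexWithoutPickle):
    """Hypothetical child class that serializes only plain Python data."""

    def __getstate__(self):
        # Export the state as a plain dict that pickle can handle.
        return {"node_ids": self.node_ids}

    def __setstate__(self, state):
        # Rebuild the object from the exported dict.
        self.node_ids = list(state["node_ids"])


def roundtrip(obj):
    """Serialize and deserialize obj, as multiprocessing does for results."""
    return pickle.loads(pickle.dumps(obj))
```

Calling `roundtrip(PicklableIndex([3, 4, 5]))` from a normal script should return a copy with the same `node_ids`, while the parent class raises NotImplementedError as soon as pickle asks for its state. Even if this worked for the real class, serialization itself costs time per object, so it may not give the hoped-for speedup.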

I realized that, since I’m not currently using edges, I could just create new dgl.DGLGraph objects with the number of nodes I need for each subgraph. DGLGraphs can be serialized, but there seems to be a linear-time overhead when doing so, which makes multiprocessing prohibitive (running without multiprocessing gives performance similar to DGLGraph.subgraphs).

Still looking for a solution, ideally with DGLGraph.subgraphs, if possible.

Just to provide a visualization:

Note the number of calls to node_subgraph vs node_subgraphs. I ran 644 iterations, so about 8298/644 ≈ 13 calls to subgraphs per minibatch. Any ideas for speedup / parallelization greatly appreciated.

It looks like most of the time spent in unbatch is related to the frame (where the node/edge features are stored). Do you have node/edge features? Splitting a 100k tensor into 10k tensors (of size 10) might be the reason why it is slow.
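If the split itself is the bottleneck, one way around it is to never materialize the 10k pieces: keep the features in one flat sequence and reduce per group in a single pass. Below is a minimal pure-Python sketch of that idea. The names `segment_sums`, `features`, and `group_sizes` are made up for illustration, and it assumes groups are consecutive runs of nodes. With real tensors the same bookkeeping would feed one scatter-add style operation instead of a Python loop.

```python
def segment_sums(features, group_sizes):
    """Sum `features` within consecutive groups without slicing per group.

    `features` is one flat list of per-node values; `group_sizes[i]` is the
    number of consecutive nodes that belong to group i.
    """
    if sum(group_sizes) != len(features):
        raise ValueError("group sizes must add up to the number of nodes")

    # One group id per node, e.g. sizes [1, 2] -> ids [0, 1, 1].
    group_ids = [g for g, size in enumerate(group_sizes) for _ in range(size)]

    # Single pass over all nodes: add each value to its group's accumulator.
    sums = [0.0] * len(group_sizes)
    for value, g in zip(features, group_ids):
        sums[g] += value
    return sums
```

For example, `segment_sums([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [1, 2, 3])` gives one sum per group with no intermediate per-group lists.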

For subgraph, how do you create the parent graph?

Ah, this is interesting. Yes I have node features, but perhaps there is a way I don’t need to. Will look into this.

The parent graph contains only nodes, no edges. For this module I use DGL to compute over a variable number of variable-sized groups of nodes. To create the parent graph, in the dataloader I create a DGLGraph with the total number of nodes, then call g.subgraphs to break the nodes into groups (node order is not changed, just how they’re grouped). Next, I batch those subgraphs together with dgl.BatchedDGLGraph, and finally I add node tensors to the batched graph.
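To make the grouping step concrete: since node order is unchanged, the groups are just consecutive ID ranges derived from the group sizes. Here is a small sketch of that bookkeeping. `group_node_ids` and `group_sizes` are illustrative names, not DGL functions; the returned lists are only the kind of per-group node ID lists one would then pass on to the subgraph call.

```python
from itertools import accumulate


def group_node_ids(group_sizes):
    """Turn group sizes into lists of consecutive node IDs.

    Node order is preserved; only the grouping is defined here.
    """
    # Running totals give the start/stop boundary of every group.
    offsets = [0, *accumulate(group_sizes)]
    return [list(range(start, stop)) for start, stop in zip(offsets, offsets[1:])]
```

With sizes `[2, 3, 1]`, the first group holds nodes 0 and 1, the second holds nodes 2 through 4, and the last holds node 5.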