Is there a way to avoid copying the graph "batch_size" times?

I am currently using DGL with great success for my research. My problem setup is as follows:
I have one graph and a batch of node features. However, to use DGL I call dgl.batch(graphs), which is wasteful because it is the same graph copied batch_size times.

I have a large graph with 100k nodes and 500k+ edges, and because of this inefficiency in my implementation I am unable to use DGL on it.

Is there a better way?

EDIT: Also, any advice on how to estimate the GPU memory requirement of just the graph part?

Why do you want to batch multiple big graphs? If the graph is big enough, batching may not improve speed and may cause out-of-memory errors.

Most of the memory is consumed by feature tensors and intermediate results; the DGL graph structure itself usually consumes little.
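To put rough numbers on this (and on the EDIT question about GPU memory), here is a back-of-the-envelope sketch. It assumes the graph structure is counted as one COO edge list of two int64 index arrays and that features are float32; DGL may additionally build CSR/CSC formats of the same order of magnitude, and intermediate activations are not counted. The batch size of 32 and feature size of 64 are hypothetical.

```python
# Back-of-the-envelope memory estimate. ASSUMPTIONS (not DGL internals):
# - graph structure = one COO edge list: two int64 arrays (src, dst) -> 16 bytes per edge
# - node features are float32 -> 4 bytes per value
BYTES_INT64 = 8
BYTES_FLOAT32 = 4


def graph_structure_bytes(num_edges: int, copies: int = 1) -> int:
    """Bytes for `copies` edge lists; dgl.batch of the same graph means copies = batch size."""
    return copies * num_edges * 2 * BYTES_INT64


def feature_bytes(batch_size: int, num_nodes: int, feat_dim: int) -> int:
    """Bytes for one float32 feature tensor holding batch_size * num_nodes * feat_dim values."""
    return batch_size * num_nodes * feat_dim * BYTES_FLOAT32


if __name__ == "__main__":
    num_nodes, num_edges = 100_000, 500_000   # sizes from the question
    batch_size, feat_dim = 32, 64             # hypothetical values

    one_graph = graph_structure_bytes(num_edges)
    batched = graph_structure_bytes(num_edges, copies=batch_size)
    feats = feature_bytes(batch_size, num_nodes, feat_dim)

    print(f"single graph structure: {one_graph / 1e6:.0f} MB")
    print(f"dgl.batch of {batch_size} copies: {batched / 1e6:.0f} MB")
    print(f"one feature tensor: {feats / 1e6:.0f} MB")
```

Under these assumptions this prints about 8 MB for one copy of the structure, 256 MB for 32 batched copies, and 819 MB for a single input feature tensor, so the features dominate and batching copies of the graph only adds overhead.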

From your problem statement it sounds like you have a dataset with the same graph but different node features.

An easy way would be to create a single dgl.graph() object with that structure and, at each iteration, simply assign a batch of node features to it with g.ndata.
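A minimal PyTorch sketch of why one graph is enough: it compares a dgl.batch-style layout (batch_size copies of the graph with shifted node IDs, so batch_size × E edges) against a single shared edge list whose features carry an extra batch dimension. torch.Tensor.index_add_ stands in for DGL's sum aggregation; the assumption is that this matches what g.update_all(fn.copy_u('h', 'm'), fn.sum('m', 'h')) computes. The toy graph and sizes are hypothetical.

```python
import torch

# Hypothetical toy graph: edge i goes from src[i] to dst[i] (E = 5 edges).
num_nodes = 4
src = torch.tensor([0, 1, 2, 3, 0])
dst = torch.tensor([1, 2, 3, 0, 2])


def neighbor_sum(h, src, dst, n):
    """out[v] = sum of h[u] over edges u -> v; h has shape (n, *trailing_dims)."""
    out = torch.zeros((n,) + tuple(h.shape[1:]), dtype=h.dtype)
    out.index_add_(0, dst, h[src])  # trailing dims (e.g. batch, features) ride along
    return out


def copied_graph(x):
    """dgl.batch-style: b copies of the graph, node IDs shifted by k * num_nodes."""
    b = x.shape[0]
    offsets = (torch.arange(b) * num_nodes).repeat_interleave(len(src))
    big_src = src.repeat(b) + offsets       # b * E edges stored
    big_dst = dst.repeat(b) + offsets
    h = x.reshape(b * num_nodes, -1)        # sample k occupies rows k*N .. k*N + N - 1
    return neighbor_sum(h, big_src, big_dst, b * num_nodes).reshape(b, num_nodes, -1)


def shared_graph(x):
    """One graph (E edges) reused for every sample; the batch lives in the features."""
    h = x.permute(1, 0, 2)                  # (num_nodes, batch, features)
    return neighbor_sum(h, src, dst, num_nodes).permute(1, 0, 2)


if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(8, num_nodes, 3)        # hypothetical minibatch: (batch, nodes, features)
    print(torch.allclose(copied_graph(x), shared_graph(x)))
    print("edges stored:", 8 * len(src), "vs", len(src))
```

Both layouts give the same result, but the shared version stores the edge list once no matter how large the batch is.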

Please feel free to follow up.


Hi @BarclayII, you are correct - I just have one big graph.

Can you talk more about assigning a batch of node features? Right now I cannot set features with g.ndata: my minibatch has size b and the graph has n nodes, so each minibatch contains b·n rows of node features instead of n.

Let’s say you have a batch of node features stored in a tensor data of shape (batch_size, number_of_nodes, feature_size). Since g.ndata requires every feature tensor to have as many rows as there are nodes, you can simply move the node dimension to the front:

g.ndata['features'] = data.permute(1, 0, 2)
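A small PyTorch sketch of what the permute does to the layout, assuming data is a float tensor of shape (batch_size, number_of_nodes, feature_size). The sizes and the nn.Linear layer are hypothetical, and the DGL graph itself is left out so the snippet runs with PyTorch alone.

```python
import torch

# Hypothetical sizes for illustration.
batch_size, num_nodes, feature_size = 4, 5, 3
data = torch.randn(batch_size, num_nodes, feature_size)

# Node dimension first, as g.ndata expects: row v holds node v's features for every sample.
node_major = data.permute(1, 0, 2)           # (num_nodes, batch_size, feature_size)

# permute() returns a strided view (no copy); .contiguous() materializes it
# in case a later op requires contiguous memory.
node_major = node_major.contiguous()

# Node-wise layers such as nn.Linear act only on the last dimension,
# so the batch dimension is treated like any other leading dimension.
lin = torch.nn.Linear(feature_size, 2)
out = lin(node_major)                        # (num_nodes, batch_size, 2)

# Permute back to the (batch, nodes, features) layout, e.g. before computing a loss.
out_batch_major = out.permute(1, 0, 2)       # (batch_size, num_nodes, 2)
```

Layers that assume 2-D input may need a reshape before they can consume the (nodes, batch, features) layout.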