Own dataset and heterograph

I have successfully converted my dataset into individual DGL graphs.
If I want to use heterograph (https://docs.dgl.ai/api/python/batch_heterograph.html), the documentation says to combine small graphs into one large graph for message passing and so on.

That raises two questions:

  1. How large can these “small graphs” be? For example, a few have 5 nodes, while others have 10,000 nodes.
  2. Am I on the right path? I basically want to test different GNN architectures on my DGL graphs and measure their performance.

Thanks so much in advance.

  1. In theory the “small graphs” can be arbitrarily large, so we should probably call them “smaller graphs” instead.
  2. What do you mean by “on the right path?”
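On point 1, batching is a disjoint union: every graph keeps its own nodes and edges, and node IDs are shifted so they do not collide, which is why the size of each input graph does not matter. A minimal sketch of that idea in plain Python, with no DGL involved (the function `batch_edge_lists` and the `(num_nodes, edges)` representation are hypothetical, chosen only for illustration):

```python
def batch_edge_lists(graphs):
    """Merge (num_nodes, edges) pairs into one disjoint-union graph."""
    offset = 0
    merged_edges = []
    sizes = []
    for num_nodes, edges in graphs:
        # Shift node IDs so each graph occupies its own ID range.
        merged_edges.extend((u + offset, v + offset) for u, v in edges)
        sizes.append(num_nodes)
        offset += num_nodes
    # offset now equals the total number of nodes in the merged graph.
    return offset, merged_edges, sizes


# Two graphs of different sizes, given as (num_nodes, edge list).
tiny = (3, [(0, 1), (1, 2)])
bigger = (5, [(0, 4), (2, 3)])

total_nodes, edges, sizes = batch_edge_lists([tiny, bigger])
print(total_nodes, edges, sizes)
```

No edge connects the two input graphs, so message passing on the merged graph never mixes information between them.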

Hello Mufei,

You can ignore that question.
I was not sure if having 500 individual graph objects made sense.
I guess the only overhead is that I’ll have to call load_graph 499 more times.

Thanks very much.

Is it possible to add individual graphs to an existing binary via save_graph? Basically, we could bootstrap an existing binary with new data if need be.

You will need to save all graphs at once, or just save them into different files.

You can save a list of graphs at once: save_graphs accepts a list of graphs as input.

Hi Mufei,

Is it possible to add my graphs to a dataloader or load_data of dgl.data?
Is there an example or a tutorial to show this?
So instead of calling cora or citeseer, I could load my 500 graphs.

It would be so helpful if I could get my graphs into a dataloader and then call a GCN/GraphSage on them.

By the way, thanks so much for your responses and timely suggestions.

No worries.

You can simply create a PyTorch dataset as follows:

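from torch.utils.data import Dataset

class GraphDataset(Dataset):
    def __init__(self):
        # Placeholder: fill in with your own list of DGL graphs.
        self.graphs = [...]

    def __getitem__(self, idx):
        # Return the graph at position idx.
        return self.graphs[idx]

    def __len__(self):
        # Number of graphs in the dataset.
        return len(self.graphs)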