Own dataset and heterograph

balakkvj · July 8, 2020, 7:11am

So, I could successfully convert my dataset into individual DGL graphs.
If I want to use a batched heterograph (https://docs.dgl.ai/api/python/batch_heterograph.html), the documentation says to combine small graphs into one large graph for message passing, etc.

That raises two questions:

How large can these “small graphs” be? For example, a few have 5 nodes, others have 10,000 nodes.
Am I on the right path? I basically want to test different GNN architectures on my DGL graphs and measure their performance.

Thanks so much in advance.

mufeili · July 8, 2020, 1:59pm

In theory the “small graphs” can be arbitrarily large, so we should probably call them “smaller graphs” instead.
What do you mean by “on the right path?”
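
To make the batching idea concrete, here is a minimal pure-Python sketch of how several graphs can be merged into one larger graph of disconnected components by shifting node IDs. It is only an illustration of the concept, not DGL's actual implementation; the `batch_graphs` helper and the `(num_nodes, edges)` representation are assumptions made for this example.

```python
def batch_graphs(graphs):
    """Merge graphs into one disjoint union.

    graphs: list of (num_nodes, edges) pairs, where edges is a list of
    (src, dst) tuples using node IDs local to that graph.
    Returns (total_nodes, merged_edges, offsets).
    """
    merged_edges = []
    offsets = []
    offset = 0
    for num_nodes, edges in graphs:
        # Remember where this graph's nodes start in the merged graph.
        offsets.append(offset)
        # Shift local node IDs so they do not collide with earlier graphs.
        merged_edges.extend((u + offset, v + offset) for u, v in edges)
        offset += num_nodes
    return offset, merged_edges, offsets


g1 = (3, [(0, 1), (1, 2)])  # a 3-node path
g2 = (2, [(0, 1)])          # a single edge
total_nodes, edges, offsets = batch_graphs([g1, g2])
print(total_nodes, edges, offsets)
# 5 [(0, 1), (1, 2), (3, 4)] [0, 3]
```

Since no edges cross between components, message passing on the merged graph never mixes the original graphs, which is why their sizes can differ freely.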

balakkvj · July 8, 2020, 2:37pm

Hello Mufei,

You can ignore that question.
I was not sure if having 500 individual graph objects made sense.
I guess the only overhead is that I’ll have to call load_graphs 499 more times.

Thanks very much.

balakkvj · July 8, 2020, 2:41pm

Is it possible to add individual graphs to an existing binary via save_graphs? Basically, we could bootstrap an existing binary with new data if need be.

mufeili · July 9, 2020, 3:58am

You will need to save all graphs at once, or just save them into different files.

mufeili · July 9, 2020, 3:58am

You can save a list of graphs at once; that is, save_graphs can take a list of graphs as input.

balakkvj · July 9, 2020, 4:29am

Hi Mufei,

Is it possible to add my graphs to a dataloader or load_data of dgl.data?
Is there an example or a tutorial to show this?
So instead of calling cora or citeseer, I could load my 500 graphs.

It would be so helpful if I could get my graphs into a dataloader and then call a GCN/GraphSage on them.

By the way, thanks so much for your responses and timely suggestions.

mufeili · July 9, 2020, 4:22pm

No worries.

You can simply create a PyTorch dataset as follows:

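from torch.utils.data import Dataset

class GraphDataset(Dataset):
    def __init__(self):
        # Replace [...] with your own list of DGL graphs
        self.graphs = [...]

    def __getitem__(self, idx):
        # Return the graph stored at position idx
        return self.graphs[idx]

    def __len__(self):
        # Number of graphs in the dataset
        return len(self.graphs)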
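
As a rough sketch of what a data loader then does with such a dataset, the pure-Python example below shuffles the indices, slices them into mini-batches, and returns each batch as a list of graphs. The `GraphDataset` constructor argument, the `iterate_minibatches` helper, and the string placeholders standing in for graphs are all assumptions for illustration, not part of DGL or PyTorch.

```python
import random


class GraphDataset:
    """Minimal map-style dataset: supports indexing and len()."""

    def __init__(self, graphs):
        self.graphs = list(graphs)

    def __getitem__(self, idx):
        return self.graphs[idx]

    def __len__(self):
        return len(self.graphs)


def iterate_minibatches(dataset, batch_size, shuffle=True, seed=0):
    """Yield lists of dataset items, batch_size at a time."""
    indices = list(range(len(dataset)))
    if shuffle:
        # A seeded generator keeps the order reproducible.
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]


# Placeholder strings stand in for 500 graph objects.
dataset = GraphDataset([f"graph_{i}" for i in range(500)])
batches = list(iterate_minibatches(dataset, batch_size=32))
print(len(batches), len(batches[0]), len(batches[-1]))
# 16 32 20
```

In practice, PyTorch's `torch.utils.data.DataLoader` would typically replace this loop, with a `collate_fn` that merges each list of graphs into one batched graph (for example via `dgl.batch`) before it is passed to the model.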