Link prediction on multiple graphs (minimal working example)

ozeki · August 24, 2021, 10:55pm

I am trying to build a basic link prediction model on a dataset that contains multiple graphs (e.g. molecules). Since EdgeDataLoader iterates over a single graph, what is the recommended way to proceed? I have the feeling that doing a collate on the whole dataset before passing it to EdgeDataLoader is a bad idea memory-wise, and I don’t think EdgeDataLoader collates internally when given multiple graphs.

A minimal working example would be much appreciated.

Thanks!

VoVAllen · August 25, 2021, 4:17am

You could create a separate EdgeDataLoader for each molecule and then iterate over those EdgeDataLoaders as well.

mufeili · August 25, 2021, 6:04am

How about using a data loader to get a batched graph in each iteration first, and then creating an EdgeDataLoader for that batched graph?

ozeki · August 25, 2021, 4:38pm

Thanks for the help, here is a MWE based on the answers:

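```python
import dgl
import torch

dataset = dgl.data.QM7bDataset()

# Each iteration yields a batched graph (3 molecules merged into one graph) and their labels
g_loader = dgl.dataloading.GraphDataLoader(dataset, batch_size=3)

# The sampler does not depend on the batch, so create it once outside the loop
sampler = dgl.dataloading.MultiLayerNeighborSampler([15, 10, 5])

for batch, _ in g_loader:
    # Seed the edge loader with the IDs of all edges in the batched graph
    eids = torch.arange(batch.num_edges())
    e_loader = dgl.dataloading.EdgeDataLoader(batch, eids, sampler, batch_size=64, shuffle=True)
    for input_nodes, pair_graph, blocks in e_loader:
        pass  # compute the link prediction loss on pair_graph and blocks here
```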

mufeili · August 30, 2021, 7:26am

Also, if your dataset is not very large, it might be possible to batch all the graphs into a single graph with dgl.batch and use one EdgeDataLoader on it.
