Link prediction on multiple graphs (minimal working example)

I am trying to build a basic link prediction model on a dataset that contains multiple graphs (e.g. molecules). Since EdgeDataLoader iterates over a single graph, what is the recommended way to proceed? I have the feeling that doing a collate on the whole dataset before passing it to EdgeDataLoader is a bad idea memory-wise, and I don’t think EdgeDataLoader collates internally when given multiple graphs.

A minimal working example would be much appreciated.

Thanks!

You could create an EdgeDataLoader for each molecule and iterate over those EdgeDataLoaders as well.

How about using a data loader to get a batched graph in each iteration first, and then creating an EdgeDataLoader for that batched graph?

Thanks for the help, here is a MWE based on the answers:

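import dgl
import torch

dataset = dgl.data.QM7bDataset()

# Outer loader: yields (batched_graph, labels), 3 molecules at a time.
g_loader = dgl.dataloading.pytorch.GraphDataLoader(dataset, batch_size=3)

# The sampler does not depend on the graph, so create it once.
sampler = dgl.dataloading.MultiLayerNeighborSampler([15, 10, 5])

for batch, _ in g_loader:
    # Inner loader: iterates over all edges of the current batched graph.
    eids = torch.arange(batch.number_of_edges())
    e_loader = dgl.dataloading.pytorch.EdgeDataLoader(batch, eids, sampler)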

Also, it might be possible to batch all graphs if your dataset is not very large.
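
To see why one edge loader can serve a whole batch, here is a rough plain-Python sketch of what batching does conceptually: the graphs are merged into one disjoint graph and the node IDs of each graph are shifted by an offset. The helper batch_edge_lists and the (num_nodes, edges) representation are made up for illustration and are not part of DGL; in DGL itself the equivalent step would be dgl.batch(list_of_graphs).

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def batch_edge_lists(graphs: List[Tuple[int, List[Edge]]]):
    """Merge (num_nodes, edges) graphs into one disjoint graph.

    Hypothetical helper for illustration only, not a DGL function.
    Returns the total node count, the merged edge list, and the
    number of edges contributed by each input graph.
    """
    offset = 0
    merged: List[Edge] = []
    edges_per_graph: List[int] = []
    for num_nodes, edges in graphs:
        # Shift node IDs so that graphs do not share any node.
        merged.extend((u + offset, v + offset) for u, v in edges)
        edges_per_graph.append(len(edges))
        offset += num_nodes
    return offset, merged, edges_per_graph


# Two toy "molecules": a 3-node path and a single 2-node bond.
graphs = [(3, [(0, 1), (1, 2)]), (2, [(0, 1)])]
total_nodes, edges, counts = batch_edge_lists(graphs)
```

Because the merged graph has no edges between the original graphs, neighbor sampling around an edge stays inside the molecule it came from. The cost is memory: the whole merged graph has to fit at once, which is why this only suits small datasets.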
