Link prediction on multiple graphs (minimal working example)

I am trying to build a basic link prediction model on a dataset that contains multiple graphs (e.g. molecules). Since EdgeDataLoader iterates over a single graph, what is the recommended way to proceed? I suspect that collating the whole dataset into one graph before passing it to EdgeDataLoader is a bad idea memory-wise, and I don’t think EdgeDataLoader collates internally when given multiple graphs.

A minimal working example would be much appreciated.


You could create an EdgeDataLoader for each molecule and iterate over those EdgeDataLoaders in turn.

How about using a data loader to get batched graphs first, and then creating an EdgeDataLoader for the batched graph in each iteration?
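To make the nested iteration concrete, here is a rough sketch in plain Python (no DGL calls). It only shows the control flow: an outer loop over batches of graphs and an inner loop over mini-batches of edge IDs within each batch. The `Graph` tuple, `chunks`, and `iterate_edge_batches` are hypothetical names made up for illustration; in DGL the outer loop would be a graph data loader and the inner one an EdgeDataLoader.

```python
from typing import Iterator, List, Sequence, Tuple

Edge = Tuple[int, int]
# Hypothetical stand-in for a graph object: (number of nodes, edge list).
Graph = Tuple[int, List[Edge]]


def chunks(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def iterate_edge_batches(
    graphs: Sequence[Graph], graphs_per_batch: int, edges_per_batch: int
) -> Iterator[Tuple[int, List[int]]]:
    """Yield (graph batch index, edge IDs) pairs.

    Outer loop: groups of graphs (the role of the graph-level loader).
    Inner loop: mini-batches of edge IDs inside the current group
    (the role of the edge-level loader).
    """
    for batch_idx, graph_batch in enumerate(chunks(graphs, graphs_per_batch)):
        # Edge IDs are local to the current group of graphs.
        total_edges = sum(len(edges) for _, edges in graph_batch)
        edge_ids = list(range(total_edges))
        for ids in chunks(edge_ids, edges_per_batch):
            yield batch_idx, list(ids)
```

Only one group of graphs is held at a time, which is why this pattern avoids the memory cost of collating the whole dataset up front.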

Thanks for the help, here is a MWE based on the answers:

import dgl

dataset =

g_loader = dgl.dataloading.pytorch.GraphDataLoader(dataset,

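# Each item is assumed to be a (batched graph, label) pair; the label is ignored.
for batch, _ in g_loader:
    # Sample up to 15, 10 and 5 neighbors for the three message-passing layers.
    sampler = dgl.dataloading.MultiLayerNeighborSampler([15, 10, 5])
    # Edge loader over every edge of the current batched graph.
    e_loader = dgl.dataloading.pytorch.EdgeDataLoader(batch, range(batch.number_of_edges()), sampler)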

Also, if your dataset is not very large, it might be possible to batch all graphs into a single graph (e.g. with dgl.batch) and use one EdgeDataLoader over it.
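For intuition about what batching graphs means here, the following is a minimal plain-Python sketch of a disjoint union: node IDs of each graph are shifted by the number of nodes that came before it, so the merged graph has one contiguous edge list. The `(num_nodes, edges)` representation and the `batch_graphs` helper are hypothetical and are not DGL's API; they only mirror the idea.

```python
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]
# Hypothetical stand-in for a graph object: (number of nodes, edge list).
Graph = Tuple[int, List[Edge]]


def batch_graphs(graphs: Sequence[Graph]) -> Graph:
    """Merge graphs into one disjoint union by offsetting node IDs."""
    offset = 0
    merged_edges: List[Edge] = []
    for num_nodes, edges in graphs:
        # Shift this graph's node IDs past all previously added nodes.
        merged_edges.extend((u + offset, v + offset) for u, v in edges)
        offset += num_nodes
    return offset, merged_edges
```

Because the merged graph holds every node and edge of the dataset at once, memory grows with the total dataset size, which is why this only suits smaller datasets.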

