Running link prediction on disconnected nodes using `EdgeDataLoader`

I am training a link prediction model by following the EdgeDataLoader tutorial in the docs, and training works well. As in the tutorial, the model is trained with a negative sampler.

However, I now want to apply the model to find new connections between specific nodes. This means passing the model pairs of nodes (u, v) ∉ E, i.e. pairs that are not connected in the original graph. As far as I can tell, the EdgeDataLoader can't handle this, since it only iterates over existing edges (or edges in a negative graph).

Does it make sense to instead use a loader that iterates over pairs of nodes, or is there something I am missing?

I suppose I could build a negative graph containing (u,v) for score computation, and select edges incident to u and v to send to the EdgeDataLoader to do the message passing. However, this seems sloppy.

Thanks for the help!

There are two ways to address this.

First, EdgeDataLoader has a g_sampling option. If you provide it, EdgeDataLoader samples neighbors only from g_sampling instead of g. So you can build a test graph that contains the test edges (in your case, the candidate pairs) for EdgeDataLoader to iterate over, and pass the training graph as g_sampling for neighbor sampling.
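To make the split between the two graphs concrete, here is a rough sketch in plain Python (deliberately not DGL code) of what that setup amounts to: batches are drawn from the candidate pairs, while neighbors are drawn only from the training graph. The helper names (`build_adjacency`, `sample_neighbors`, `iterate_candidate_pairs`) and the toy graph are made up for illustration, and the sketch samples a single hop, whereas a real block sampler would go one hop per GNN layer.

```python
import random

def build_adjacency(num_nodes, edges):
    """Undirected adjacency lists for the training graph (hypothetical helper)."""
    adj = {n: [] for n in range(num_nodes)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj

def sample_neighbors(adj, seeds, fanout, rng):
    """For each seed node, pick up to `fanout` neighbors from the training graph."""
    return {s: rng.sample(adj[s], min(fanout, len(adj[s]))) for s in seeds}

def iterate_candidate_pairs(candidates, train_adj, batch_size, fanout, seed=0):
    """Yield (batch of pairs, sampled neighborhoods of their endpoints).

    The pairs play the role of the graph being iterated over; `train_adj`
    plays the role of the graph used only for neighbor sampling.
    """
    rng = random.Random(seed)
    for start in range(0, len(candidates), batch_size):
        batch = candidates[start:start + batch_size]
        seeds = sorted({n for pair in batch for n in pair})
        yield batch, sample_neighbors(train_adj, seeds, fanout, rng)

# Toy data: a path graph for training, plus pairs that are NOT training edges.
train_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
train_adj = build_adjacency(5, train_edges)
candidates = [(0, 2), (1, 4), (0, 4)]

batches = list(iterate_candidate_pairs(candidates, train_adj, batch_size=2, fanout=2))
for pairs, neighborhoods in batches:
    print(pairs, neighborhoods)
```

The point of the design is that the candidate pairs never contribute to message passing: they only decide which nodes need representations, and those representations are built from training edges alone.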

The second way is to train the model with the normal EdgeDataLoader (i.e. without g_sampling), then compute the representations of all nodes before evaluation (either with NodeDataLoader or with exact offline inference, as in the GraphSAGE example). Those node representations are then used to compute the link prediction scores for all the test edges. Since the score only depends on the two endpoint representations, this works for any pair (u, v), whether or not it is an edge in the graph.
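As a minimal sketch of the scoring step in this second approach: once every node has a representation, scoring an arbitrary pair is just a function of two vectors. The example below assumes a dot-product predictor and uses made-up embedding values in plain Python lists; with a trained model the embeddings would be the tensor produced by inference, and the predictor would be whatever you trained (for example an MLP). `score_pairs` and `rank_pairs` are hypothetical helper names.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def score_pairs(embeddings, pairs):
    """Score each (u, v) pair from precomputed node embeddings.

    Assumes a dot-product predictor; swap in your own predictor if the
    model was trained with a different one.
    """
    return [dot(embeddings[u], embeddings[v]) for u, v in pairs]

def rank_pairs(embeddings, pairs):
    """Return pairs sorted from most to least likely link."""
    scores = score_pairs(embeddings, pairs)
    order = sorted(range(len(pairs)), key=lambda i: scores[i], reverse=True)
    return [(pairs[i], scores[i]) for i in order]

# Made-up embeddings standing in for the output of offline inference.
embeddings = {
    0: [1.0, 0.0],
    1: [0.0, 1.0],
    2: [1.0, 1.0],
    3: [0.5, -1.0],
}
candidates = [(0, 1), (0, 2), (1, 3), (2, 3)]  # none need to be edges in the graph

scores = score_pairs(embeddings, candidates)
ranked = rank_pairs(embeddings, candidates)
for pair, score in ranked:
    print(pair, score)
```

Because the embeddings are computed once for all nodes, this is usually the simpler option when you want to score many candidate pairs at inference time.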
