Should we make sure bidirectional edges end up in the same set in link prediction?

For models where the input graph is made undirected by adding edges in both directions, should both edges of a pair be kept in the same set (train, val, or test)?
Or is it normal for one edge to land in one set and its reverse in another during splitting?
I mean, is it considered cheating if an edge between two nodes is used in training and the reverse edge is then used in testing? That would inevitably increase accuracy if the model learns to embed those two nodes close together in the embedding space, right?

Typically the train/val/test sets contain edges of only one direction, while the graph used for message passing contains both.
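
One way to picture this is to split on undirected pairs rather than on directed edges, so a pair and its reverse can never land in different sets. The sketch below is plain Python with hypothetical helper names (`split_undirected_edges`, `message_passing_edges`), not DGL API. It also assumes the message-passing graph is built from the training pairs only, which is a common choice but not something stated above.

```python
import random

def split_undirected_edges(edges, val_frac=0.1, test_frac=0.1, seed=0):
    """Split directed (u, v) edges so that (u, v) and (v, u) share a split."""
    # Collapse both directions into one canonical pair (smaller ID first).
    undirected = sorted({(min(u, v), max(u, v)) for u, v in edges})
    random.Random(seed).shuffle(undirected)
    n_val = int(len(undirected) * val_frac)
    n_test = int(len(undirected) * test_frac)
    val = undirected[:n_val]
    test = undirected[n_val:n_val + n_test]
    train = undirected[n_val + n_test:]
    return train, val, test

def message_passing_edges(train_pairs):
    """Both directions of the training pairs, for the message-passing graph."""
    return train_pairs + [(v, u) for u, v in train_pairs]

# Toy bidirected graph: every edge appears in both directions.
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2),
         (3, 4), (4, 3), (0, 4), (4, 0)]
train, val, test = split_undirected_edges(edges, val_frac=0.2, test_frac=0.2)
mp_edges = message_passing_edges(train)
```

With this layout, a held-out pair never appears in the message-passing edges in either direction.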

Thank you.
Is there a way to properly split all the eids of a bidirected graph into two mutually exclusive sets based on direction?

And is there also a way to get the eid of the reverse edge of a given eid?

You could try which will add proper split and other information to adapt a dataset for link prediction. However, it won’t exclude reverse edges for you. To get the eid of a reversed edge, use DGLGraph.edge_ids. It supports getting the edge ID of a single node pair or of a pair of node ID tensors.
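
To make the reverse-edge lookup concrete, here is a minimal plain-Python sketch of the same idea: query the edge ID for the swapped (dst, src) pair. With DGL this would roughly correspond to calling `edge_ids` with the destination and source node IDs swapped. The helper name `reverse_edge_ids` is hypothetical, and the sketch assumes every edge has a reverse and that there are no parallel edges or self-loops.

```python
def reverse_edge_ids(src, dst):
    """For each edge ID, return the ID of the edge going the other way."""
    # Map each (u, v) pair to its edge ID; assumes no parallel edges.
    eid_of = {(u, v): eid for eid, (u, v) in enumerate(zip(src, dst))}
    # Looking up (v, u) raises KeyError if the reverse edge is missing.
    return [eid_of[(v, u)] for u, v in zip(src, dst)]

# Toy bidirected graph given as parallel source/destination lists.
src = [0, 1, 1, 2, 2, 0]
dst = [1, 0, 2, 1, 0, 2]
rev = reverse_edge_ids(src, dst)

# Direction-based split: keep the edges with src < dst as one set,
# and their reverses as the other, mutually exclusive set.
forward = [eid for eid, (u, v) in enumerate(zip(src, dst)) if u < v]
backward = [rev[eid] for eid in forward]
```

The `u < v` rule is only one arbitrary way to pick a direction; any rule that selects exactly one edge per pair would do.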

(I found that when providing a pair of node ID tensors in which some node IDs do not form a valid edge, the error message is confusing. I will create an issue and improve it later.)

What I mean is getting this split to use for reverse_eids in as_edge_prediction_sampler. I guess one should iterate over all edges in NumPy and map the pairs iteratively?

EDIT: I got the idea from the GraphSAGE example.
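
The thread does not spell the idea out, so the following is an assumption about what it refers to: if the graph is built by listing E edges in one direction followed by the same E edges reversed, the reverse of edge i is simply i + E (and vice versa), and no per-edge lookup is needed. A minimal plain-Python sketch, with the hypothetical helper name `stacked_reverse_eids`:

```python
def stacked_reverse_eids(num_undirected):
    """Reverse-edge mapping when edges are stored as [forward..., reversed...]."""
    e = num_undirected
    # Edge i (forward half) maps to i + e; edge i + e maps back to i.
    return list(range(e, 2 * e)) + list(range(0, e))

# Build the edge lists in the assumed layout: forward edges, then reverses.
pairs = [(0, 1), (1, 2), (2, 3)]
src = [u for u, v in pairs] + [v for u, v in pairs]
dst = [v for u, v in pairs] + [u for u, v in pairs]
reverse_eids = stacked_reverse_eids(len(pairs))
```

In a DGL setup, a mapping like this would be converted to a tensor before being passed as reverse_eids. The mapping is only valid if the edge ordering really follows the stacked layout.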
