How to split data in edge multi-label classification

felipemello · December 30, 2020, 12:27am

Hi, I am in the process of splitting my graph into train, val, test for link prediction. My model is a GCN…

Let’s suppose I have 100 edges in a heterograph, and I hold out 10 for validation and 10 for test, which leaves 80 edges in my training graph.
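To make the 80/10/10 setup concrete, here is a minimal sketch of such a split. It assumes the edges are held as a plain list of (src, dst) pairs; `split_edges` and the toy edge list are hypothetical names for illustration, not part of any library.

```python
import random

def split_edges(edges, n_val, n_test, seed=0):
    """Shuffle the edges and cut them into train/val/test lists."""
    shuffled = list(edges)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]
    return train, val, test

# Toy data (assumption): 100 distinct directed edges over 20 nodes.
edges = [(u, (u + k) % 20) for u in range(20) for k in range(1, 6)]

train_edges, val_edges, test_edges = split_edges(edges, n_val=10, n_test=10)
print(len(train_edges), len(val_edges), len(test_edges))  # 80 10 10
```

For a heterograph, the same split would presumably be applied per edge type.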

How do I train with these 80?

Do I remove, for example, 30 of them, and train on a graph with 50 edges, learning to predict the other 30 (+ negative examples)?
Or do I train with the 80 edges and predict the same 80 edges (+ negative examples), at the risk of teaching the model to memorize rather than generalize?
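The two options differ only in which edges the GCN passes messages over and which edges it is asked to predict. A rough sketch of that distinction, assuming a plain list of 80 training edges; `make_training_sets` and `n_holdout` are hypothetical names used only for illustration:

```python
import random

def make_training_sets(train_edges, n_holdout=0, seed=0):
    """Return (message_passing_edges, supervision_edges).

    n_holdout > 0  -> option 1: hide n_holdout edges from the graph
                      and predict only those hidden edges.
    n_holdout == 0 -> option 2: every training edge is used both for
                      message passing and as a prediction target.
    """
    shuffled = list(train_edges)
    random.Random(seed).shuffle(shuffled)
    if n_holdout == 0:
        return shuffled, list(shuffled)
    return shuffled[n_holdout:], shuffled[:n_holdout]

# Toy data (assumption): 80 training edges forming a path.
train_edges = [(i, i + 1) for i in range(80)]

mp1, sup1 = make_training_sets(train_edges, n_holdout=30)  # option 1
mp2, sup2 = make_training_sets(train_edges)                # option 2
print(len(mp1), len(sup1))  # 50 30
print(len(mp2), len(sup2))  # 80 80
```

In both cases negative examples would still be added to the supervision edges before computing the loss.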

Thanks in advance!

mufeili · January 4, 2021, 4:39am

Are you working on link prediction or edge multilabel classification?

For link prediction, it’s typically the second case you described: train on the 80 edges and predict those same 80 edges (+ negative examples). For edge classification, you can assume that all edges are available during training, and you just want to classify them, as in node classification.
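For the link prediction case, the "+ negative examples" part could look roughly like the sketch below. It assumes uniform sampling of a corrupted destination node for each positive edge; `sample_negative_edges` is a hypothetical helper written in plain Python, not a DGL function.

```python
import random

def sample_negative_edges(pos_edges, num_nodes, k=1, seed=0):
    """For each positive (u, v), draw k pairs (u, v') that are not known edges.

    Assumes every source node has at least k valid non-neighbors;
    otherwise the rejection loop below would not terminate.
    """
    rng = random.Random(seed)
    known = set(pos_edges)
    negatives = []
    for u, _ in pos_edges:
        drawn = 0
        while drawn < k:
            v_neg = rng.randrange(num_nodes)
            # Reject self-loops and pairs that are actual training edges.
            if v_neg != u and (u, v_neg) not in known:
                negatives.append((u, v_neg))
                drawn += 1
    return negatives

# Toy data (assumption): the 80 training edges form a path over 81 nodes.
pos_edges = [(i, i + 1) for i in range(80)]
neg_edges = sample_negative_edges(pos_edges, num_nodes=81, k=1)

# Training targets: 1 for existing edges, 0 for sampled non-edges.
pairs = pos_edges + neg_edges
labels = [1] * len(pos_edges) + [0] * len(neg_edges)
print(len(pairs), len(labels))  # 160 160
```

Note that this only checks negatives against the training edges, so a sampled pair could still coincide with a held-out validation or test edge.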

Have you checked the user guide on edge classification and link prediction?