How to evaluate link prediction?

How many evaluation methods can we use for link prediction?
Is the evaluation method tied to the loss function? For example, when using binary cross-entropy loss the evaluation metric is AUC, but when using margin loss should the evaluation metric be something else?
In my experiments, when I change the loss function from cross-entropy to margin loss, the AUC score becomes greater than 0.99, much higher than with cross-entropy loss.

margin loss code:
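(roughly something like the sketch below; pos_score and neg_score are assumed names for the positive- and negative-edge score tensors, not the original code)

    import torch
    import torch.nn.functional as F

    def margin_loss(pos_score, neg_score, margin=1.0):
        # Pairwise max-margin (hinge) loss: push every positive-edge score
        # above every negative-edge score by at least `margin`.
        # pos_score: shape (P,), neg_score: shape (N,); broadcast to P x N pairs.
        return F.relu(margin - pos_score.unsqueeze(1) + neg_score.unsqueeze(0)).mean()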


auc score code:
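(again a sketch, assuming the same pos_score/neg_score tensors)

    import torch
    from sklearn.metrics import roc_auc_score

    def compute_auc(pos_score, neg_score):
        # Label positive edges 1 and negative edges 0, then measure how well
        # the raw scores rank positives above negatives (no threshold involved).
        scores = torch.cat([pos_score, neg_score]).detach().cpu().numpy()
        labels = torch.cat([torch.ones_like(pos_score),
                            torch.zeros_like(neg_score)]).cpu().numpy()
        return roc_auc_score(labels, scores)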

Generally the evaluation metric should depend on your task and dataset, not on the loss function. Common choices include AUC for unbalanced data, AUPRC for extremely unbalanced data, and MRR/Hits/MAP for recommender systems and information retrieval.
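For illustration only, a sketch of how these metrics could be computed from positive and negative scores (ranking each positive edge against one shared negative pool for MRR/Hits@k is an assumption; in practice each positive edge often gets its own candidate set):

    import numpy as np
    from sklearn.metrics import roc_auc_score, average_precision_score

    def evaluate(pos_score, neg_score, ks=(1, 10)):
        # pos_score: scores of true edges; neg_score: scores of sampled non-edges.
        pos_score = np.asarray(pos_score, dtype=float)
        neg_score = np.asarray(neg_score, dtype=float)
        scores = np.concatenate([pos_score, neg_score])
        labels = np.concatenate([np.ones_like(pos_score), np.zeros_like(neg_score)])
        metrics = {
            'AUC': roc_auc_score(labels, scores),
            'AUPRC': average_precision_score(labels, scores),
        }
        # MRR / Hits@k: rank every positive edge against the shared negative pool.
        ranks = 1 + (neg_score[None, :] >= pos_score[:, None]).sum(axis=1)
        metrics['MRR'] = float(np.mean(1.0 / ranks))
        for k in ks:
            metrics['Hits@%d' % k] = float(np.mean(ranks <= k))
        return metrics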

Thank you very much for your reply.
I calculate the accuracy as you answered in another question:
torch.mean((torch.cat(pos_scores)>0.5).float())
When using margin loss the positive scores tend to be greater than 0.5, so the accuracy is always close to 1. Is this a problem?

(1) For link prediction you need to be very careful when using accuracy as an evaluation metric, because it heavily depends on how you choose the negative test examples. For instance, if you have zero negative test examples, then even an accuracy of 1 says nothing about your model, because the model can simply predict a positive score no matter what.
(2) 0.5 might not be the best threshold value, and changing that value might give you different accuracy/precision/recall/F1 numbers. AUC, AUPRC and ranking-based metrics do not have this problem.
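To make (1) and (2) concrete, a small sketch with made-up toy scores and thresholds: accuracy needs negative examples and moves with the threshold, while AUC does not use one.

    import torch
    from sklearn.metrics import roc_auc_score

    def accuracy(pos_score, neg_score, threshold=0.5):
        # Accuracy is only meaningful when negative examples are included,
        # and its value depends on the chosen threshold.
        preds = (torch.cat([pos_score, neg_score]) > threshold).float()
        labels = torch.cat([torch.ones_like(pos_score), torch.zeros_like(neg_score)])
        return (preds == labels).float().mean().item()

    pos = torch.randn(1000) + 1.0   # toy positive scores
    neg = torch.randn(5000)         # toy negative scores
    for t in (0.0, 0.5, 1.0):
        print(t, accuracy(pos, neg, threshold=t))   # changes with the threshold
    labels = torch.cat([torch.ones(1000), torch.zeros(5000)])
    print(roc_auc_score(labels.numpy(),
                        torch.cat([pos, neg]).numpy()))  # threshold-free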

Thanks a lot!
I built a large heterogeneous graph for link prediction. It contains two node types, author and research interest, and four relation types: (author, coauthor, author), (author, coauthor_rev, author) (a reverse type, since DGL graphs are directed), (author, hasinterest, interest), and (interest, researchedby, author).
I train link prediction only on the edge type (author, coauthor, author). I split the (author, coauthor, author) edge IDs of the training graph into a training set and a validation set. Should I remove the validation edges and their reverse edges from the training graph?
While doing validation, should I use the original graph (which contains all training and validation edges) for g_sampling, or the graph after removing the validation edges and their reverse edges?
Thanks in advance!

I constructed another test graph, which contains the coauthorships of the year after the training graph; for example, the training graph is built from coauthorships in 2019 and the test graph contains coauthorships in 2020. The training graph and the test graph contain exactly the same set of authors.

The training graph is then split into a train graph and a validation graph: first I choose 10% of the coauthor edges and use edge_subgraph to extract the subgraph for validation, then I remove those edge IDs, so the validation edges are invisible during training.

My problem is that both the test edges and the validation edges are invisible during training, yet the AUC is 0.9646 on the validation edges but only 0.6468 on the test edges.

Hi @BarclayII, can you help me solve this problem? Thanks!

Yes, you should.

For link prediction you should still remove the validation edges from the graph used for g_sampling, because we usually assume that we don't know whether those edges exist.
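As a sketch only (it assumes coauthor_rev was built from the same edge list in the same order as coauthor, so the i-th edges of the two relations mirror each other, and the toy graph stands in for yours):

    import dgl
    import torch

    # Toy heterograph with the four relation types from the question.
    src = torch.tensor([0, 1, 2, 3]); dst = torch.tensor([1, 2, 3, 0])
    full_g = dgl.heterograph({
        ('author', 'coauthor', 'author'): (src, dst),
        ('author', 'coauthor_rev', 'author'): (dst, src),
        ('author', 'hasinterest', 'interest'): (torch.tensor([0, 1]), torch.tensor([0, 1])),
        ('interest', 'researchedby', 'author'): (torch.tensor([0, 1]), torch.tensor([0, 1])),
    })

    # Hold out 25% of the 'coauthor' edges for validation.
    eids = torch.randperm(full_g.num_edges('coauthor'))
    val_eids = eids[:full_g.num_edges('coauthor') // 4]

    # Because 'coauthor_rev' was built from the same (src, dst) arrays, edge i
    # of 'coauthor_rev' is the reverse of edge i of 'coauthor', so the same
    # IDs can be removed from both relations.
    train_g = dgl.remove_edges(full_g, val_eids, etype='coauthor')
    train_g = dgl.remove_edges(train_g, val_eids, etype='coauthor_rev')
    # Use train_g (not full_g) for g_sampling during training and validation;
    # score the held-out val_eids separately as positive validation examples.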

This may also come from the distribution shift from year 2019 to year 2020. If the data-generating distributions of the two years are quite different (which is possible - OGB products is a typical example), then you will see quite different numbers between training/validation and test. If you would like to check whether it is a problem in your training/testing pipeline, you could do a uniform split instead of the temporal split you have now and see if the two numbers still differ a lot.
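A uniform split could look roughly like this (a sketch on a toy graph; the split ratios are illustrative):

    import dgl
    import torch

    # Toy combined graph with 'coauthor' edges pooled from 2019 and 2020.
    src = torch.arange(100); dst = (torch.arange(100) + 1) % 100
    g = dgl.heterograph({('author', 'coauthor', 'author'): (src, dst)})

    # Uniform random split over edge IDs instead of a temporal (by-year) split.
    num_edges = g.num_edges('coauthor')
    perm = torch.randperm(num_edges)
    test_eids  = perm[: int(0.1 * num_edges)]
    val_eids   = perm[int(0.1 * num_edges): int(0.2 * num_edges)]
    train_eids = perm[int(0.2 * num_edges):]
    # If validation and test AUC now match, the earlier gap points to a
    # 2019 -> 2020 distribution shift rather than a pipeline bug.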
