Negative edge labels for link prediction

Hello everyone! I’ve just started studying GNNs and DGL.
I’m developing a recommender system with link prediction in a bipartite graph.

I’ve worked through and replicated this tutorial: KDD20-Hands-on-Tutorial/3_link_predict.ipynb (dglai/KDD20-Hands-on-Tutorial on GitHub)

But I have some doubts about the section where the labels are created:

The author created the train and test labels with:

  • “0”, for positive edges
  • “1”, for negative edges.

Shouldn’t it be the other way around? Shouldn’t positive edges be labeled “1” to indicate that the edge exists?

In the “Blitz Introduction to DGL” (Link Prediction using Graph Neural Networks, DGL 0.9.0 documentation), I think the author did the opposite (1 for positives and 0 for negatives).

I’d like to understand which way is correct, please.

My other doubt is that the Blitz tutorial implements the model in some different ways.

For example, to compute the score of each edge by dot product, the Blitz tutorial creates a DotPredictor class, while in the KDD20-Hands-on-Tutorial the author simply multiplies the pair of vectors.

I think the Blitz tutorial is the more recommended one because it’s more recent (the KDD20 tutorial was presented in 2020). Am I correct?

  1. Labeling difference: I think both are OK as long as train_label and test_label are labeled the same way.
  2. Score generation: it depends on your choice. You’re free to define your own function to compute the score/loss, such as DotPredictor or MLPPredictor. And the computation in the KDD tutorial, (logits[train_u] * logits[train_v]).sum(dim=1), is the same as u dot v, I think.
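
To illustrate both points, here is a small sketch in plain Python (no DGL or PyTorch, so the tensor expression is written out as list comprehensions). The toy `logits`, `train_u`, `train_v` values and the helper names `edge_scores` and `bce_with_logits` are made up for illustration and are not taken from either tutorial. Assuming a binary cross-entropy loss, it suggests that multiply-then-sum is the per-edge dot product, and that swapping the 0/1 labels gives the same loss once the scores are negated, so the model would just learn the opposite sign.

```python
import math

# Hypothetical toy node embeddings (one row per node), standing in for `logits`.
logits = [
    [0.5, -1.0, 2.0],
    [1.5, 0.0, -0.5],
    [-2.0, 1.0, 1.0],
    [0.0, 3.0, 0.5],
]
# Hypothetical edge endpoints: edge i connects train_u[i] and train_v[i].
train_u = [0, 1, 2]
train_v = [1, 2, 3]


def edge_scores(emb, src, dst):
    """Per-edge dot product: multiply endpoint vectors element-wise, then sum.

    Plain-Python counterpart of (logits[train_u] * logits[train_v]).sum(dim=1).
    """
    return [sum(a * b for a, b in zip(emb[u], emb[v])) for u, v in zip(src, dst)]


def bce_with_logits(scores, labels):
    """Mean binary cross-entropy computed on raw (pre-sigmoid) scores."""
    total = 0.0
    for s, y in zip(scores, labels):
        p = 1.0 / (1.0 + math.exp(-s))  # sigmoid
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(scores)


scores = edge_scores(logits, train_u, train_v)

# Convention A: 1 = positive edge, 0 = negative edge (toy labels).
labels = [1, 1, 0]
# Convention B: the same edges with the labels swapped.
flipped = [1 - y for y in labels]
# Under convention B, the scores that fit equally well are the negated ones.
negated = [-s for s in scores]

loss_a = bce_with_logits(scores, labels)
loss_b = bce_with_logits(negated, flipped)
print(scores, loss_a, loss_b)  # the two losses agree up to rounding
```

So the choice of convention mainly decides how to read the output: with 1 for positives a high score means "edge likely exists", while with 0 for positives a low score does. That matters when ranking candidate links for recommendation.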

Hello Rhett, thank you for your reply!

1. About labeling difference
OK, understood. However, I ran some tests (one of them with a small dataset) labeling positives with ones and negatives with zeros, and this labeling setup seemed to give more appropriate results. But I could be wrong.

I thought this labeling would be more correct because “one” is a more significant value than “zero”, and that could affect the neural network training.

2. About score generation
OK, thank you for your explanation. For now I’ll use the method shown in the KDD tutorial because, at the moment, it’s clearer to me how I’ll use the logits matrix to predict links in the production environment after training the model.

Thank you for your time, Rhett!
I’m currently learning graph neural networks and DGL from scratch in order to develop this recommender system, so I’ll probably have more questions in the coming days.