How to Utilize a Trained Link Prediction Model

I want to create a recommendation system using link prediction with e-commerce data, where users and items are represented as nodes, and interactions such as click, cart, like, and buy are represented as edges. The reference I used is available here

My goal is to train a link prediction model that predicts interactions (e.g., like, buy) between users and items that have not previously occurred.

Following the tutorial mentioned above, I divided my data into a positive graph and a negative graph, applied them to the model to obtain positive scores and negative scores, and then trained the model using a hinge loss. While training the model this way was straightforward, I'm unsure how to use the trained model effectively.
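Concretely, one training step in my setup looks roughly like this (a minimal sketch following the tutorial; graph, model, predictor, positive_graph, negative_graph, and node_features are placeholders for my own components, and I assume one negative edge is sampled per positive edge so the score tensors line up):

import torch.nn.functional as F

def hinge_loss(pos_score, neg_score, margin=1.0):
    # push scores of observed edges above scores of sampled non-edges by `margin`
    return F.relu(margin - pos_score + neg_score).mean()

h = model(graph, node_features)            # node embeddings
pos_score = predictor(positive_graph, h)   # scores on edges that occurred
neg_score = predictor(negative_graph, h)   # scores on sampled non-edges
loss = hinge_loss(pos_score, neg_score)
loss.backward()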

I have the following questions:

  1. Should I select the top-k recommended items based on the positive score?
    I don't think using positive scores aligns with my goal: the positive graph consists of interactions that have already occurred between users and items, so ranking by positive scores alone doesn't seem appropriate. On the other hand, relying on negative scores doesn't seem suitable either.

  2. Is it possible to use the positive and negative scores to predict interactions that haven't occurred before, which is my goal? If so, I would appreciate guidance on how to approach this effectively.

Thanks for the help


Hi @pimpumpam,

  1. GNN link prediction models do not normally support selecting the top-k neighbours for a node directly.
  2. Yes, this is exactly the use case for GNN link prediction:
    1. Use the positive and negative scores in your loss function to optimise the embeddings.
    2. Use the final optimised embeddings to find the nearest neighbours. For instance, you could use ANN libraries like Annoy or Faiss to index the items and their embeddings, then use the index to fetch the approximate nearest neighbours.

Could you please elaborate more on this? I am doing something close to this, but is there an example of an Annoy or Faiss script? By embedding, do you mean embeddings for nodes or edges?
Thank you 🙂

The node embeddings/vectors/features, however you want to term them ('h' is what most of the tutorials call them).
Once you have the trained embeddings, you could build an Annoy index like this:

from annoy import AnnoyIndex

# 64 = embedding dimension; 'dot' = inner-product similarity
index_s = AnnoyIndex(64, 'dot')

# best_emb_s: trained node embeddings, assumed shape [num_items, 1, 64]
for idx, idx_embedding in enumerate(best_emb_s.squeeze(1)):
    index_s.add_item(idx, idx_embedding.tolist())  # convert tensor to a plain list of floats

index_s.build(10)  # 10 trees; more trees -> better accuracy, larger index

which can then be used to find the approximate nearest neighbours of a given item:

# n_neighbors + 1 because the query item itself is returned as its own nearest neighbour
nns, distances = index_s.get_nns_by_item(item_id, n_neighbors + 1, search_k=-1, include_distances=True)
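The same lookup with Faiss would look something like this (a sketch, not tested against your code: it assumes best_emb_s converts to a float32 NumPy array of shape [num_items, 64], and it uses exact inner-product search to match the 'dot' metric above):

import faiss
import numpy as np

# assumed: best_emb_s is a torch tensor of shape [num_items, 1, 64]
emb = best_emb_s.squeeze(1).detach().cpu().numpy().astype(np.float32)
index = faiss.IndexFlatIP(emb.shape[1])  # exact inner-product (dot) search
index.add(emb)

# nearest neighbours of one item; the item itself comes back first
distances, nns = index.search(emb[item_id:item_id + 1], n_neighbors + 1)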

Thank you very much for the helpful reply. My last questions:

  1. I am using node embeddings, not edge embeddings (the edge embeddings are just the concatenation of two node embeddings). The problem is that I end up with two different sets of optimized embeddings: one from the negative-graph optimization and the other from the positive graph. Which one should I use?
  2. Could these embeddings have a different shape, i.e. [mybatch_size, embedding_size], or should I add a layer to make them [mybatch_size, 1]?
