How to do inference on new unseen nodes?

Hello all, after reviewing the DGL documentation I have a quick question about inference on unseen nodes.

I have a scenario where I plan to train a model on a graph e.g. {a -> [b, c], b -> [c], c -> []}, but at inference time I will need to predict on a new node d (that has incoming edges from b and c, say). The node d simply doesn’t exist at training time (it is in the future), but when node d occurs i have access to the incoming edge structure for d, and the predecessor nodes of d are all available while the graph trains. I.e., the predecessor nodes of d are guaranteed to exist as part of the graph at training time, though node d itself does not exist at that time.

Phrased differently: when node d’s node-level features are known to me in the future, i want to generate a prediction by (explicitly or implicitly) adding node d to the graph along with the edges from b and c to this new node d, and then computing a y_hat for d. At that point the graph or implied graph is:
{a -> [b, c], b -> [c, d], c -> [d], d -> []}
(with new edges from [b and c] to d, and a new node d.)

What I definitely can not afford to do is retrain the model from scratch every time a new node (for which
I need a prediction) arrives.

Is there a cookbook or example anywhere for how this may be accomplished? Or recommendations on the most idiomatic way to approach this? A small code snippet sketching the outline would also be valuable.

Thanks all!

Hi, I think this depends a lot on the particular scenario and graphs. I would suggest trying something simple and fast first, for example training a GraphSAGE on the old graph and then see how it generalizes with d (and related edges) added.

Hello, and thank you for your note. I definitely follow the idea, which is consistent with my expectations about how this would be accomplished. However, I’m honestly not familiar enough with the api to have a great sense of what this approach would look like at the nuts-and-bolts level. Are you aware of any existing code snippet (in the docs or elsewhere) that shows the following sequence:

(1) training a graph, (2) adding a node/edges, (3) making a node level prediction for said new node?

Even a rough (approximate) sketch would be quite valuable!

I found the docs extremely helpful overall; I think that a small section on this topic would be a great addition, as it is a common usecase to train a model with the intent of evaluating future unseen data points.

Thanks again!

Did you check our example on GraphSAGE? It already does (1). Adding nodes/edges should be clear from the documentation. Making predictions on new nodes is same as in training, except that you now have some more labels in the batch corresponding to new nodes.

Thanks for the pointer!

1 Like

Hi @mufeili, I’m still confused on how you perform link prediction inference with new unseen nodes. Since there is no clear tutorial or explanation in the DGL docs, could you please detail a bit with some code snippets?

The code snippet below corresponds to the scenario @mstewart141 described. See if it is clear to you.

import dgl
import torch
import torch.nn as nn
from dgl.nn import GraphConv

# Construct the training graph.
g = dgl.graph(([0, 0, 0, 1], [0, 1, 2, 2])) # 0 for a, 1 for b, 2 for c
# We add the self-loop for the first node so that it preserves its original information after message passing.

# Generate random node features for demonstration purpose.
feat_size = 2
node_feats = torch.randn(g.num_nodes(), feat_size)

# GNN encoder to compute updated node representations
hidden_size = 3
gnn_enc = GraphConv(feat_size, hidden_size)

# Decoder to score pairs of node representations
class LinkPredictor(nn.Module):
    def __init__(self, hidden_size):
        super().__init__()

        self.linear = nn.Linear(hidden_size, 1)

    def forward(self, x_i, x_j):
        x = x_i * x_j
        return self.linear(x)

link_decoder = LinkPredictor(hidden_size)

# ...
# Now assume you have trained gnn_enc and link_predictor.
# Given a new node 3 (d), and new edges (1, 3), (2, 3), first update the graph.
g = dgl.graph(([0, 0, 0, 1, 1, 2], [0, 1, 2, 2, 3, 3]))

# Generate random feature for node 3.
node_feats = torch.cat([node_feats, torch.randn(1, feat_size)], dim=0)

# Compute updated node representations with the new graph.
with torch.no_grad():
    h = gnn_enc(g, node_feats)
    # Score node pairs (0, 3), (3, 3)
    src = torch.LongTensor([0, 3])
    dst = torch.LongTensor([3, 3])
    scores = link_decoder(h[src], h[dst])

Hi, thanks a lot @mufeili for your reply. I still have a couple of things to ask:

  • in your answer you considered the scenario where we have a new node and two new edges, but what about if we have a new disconnected node (with no known edges to draw yet)? This is typically the case when we are in the context of products co-purchase prediction and we have a new product (cold-start).

  • I’m following the link prediction example of the blitz tutorial. I would like to integrate there an equivalent of your code snippet hint. If I understand well, your LinkPredictor is something equivalent to the DotPredictor or MLPPredictor in the link prediction example with the main difference that is not acting on a graph. Here, in the case of a single disconnected new node, should I add it to the train or test graph together with all possible edges to get scores on them? Or should I modify the predictor in a similar way you did to get a link decoder acting on node embeddings rather than on a graph?

in your answer you considered the scenario where we have a new node and two new edges, but what about if we have a new disconnected node (with no known edges to draw yet)? This is typically the case when we are in the context of products co-purchase prediction and we have a new product (cold-start).

If the nodes in your dataset are associated with features carrying rich information, then there is a good chance that your model will work to some extent. It might help to augment the original graph with similarity-based edges during training so that you can connect new nodes with the old graph using similarity-based edges during the test.

  • I’m following the link prediction example of the blitz tutorial. I would like to integrate there an equivalent of your code snippet hint. If I understand well, your LinkPredictor is something equivalent to the DotPredictor or MLPPredictor in the link prediction example with the main difference that is not acting on a graph. Here, in the case of a single disconnected new node, should I add it to the train or test graph together with all possible edges to get scores on them? Or should I modify the predictor in a similar way you did to get a link decoder acting on node embeddings rather than on a graph?

I generally think it’s a better idea to compute the node embeddings first. You can do whatever you want after having them.

1 Like

Hi again :slight_smile: , a technical question

To train this predictor I must set an apply_edges method with a graph local_scope like in the case for MLPPredictor or DotPredictor or am I missing something?

Is there a thorough example that I can follow to catch details? Thanks

Otherwise I came up with a solution of this kind to integrate in the training loop

pos_score = link_decoder(h[train_pos_g.edges()[0]], h[train_pos_g.edges()[1]]).squeeze()
neg_score = link_decoder(h[train_neg_g.edges()[0]], h[train_neg_g.edges()[1]]).squeeze()
loss = compute_loss(pos_score, neg_score)

Does it seem correct to you?

The solution you came up with looks correct to me.

Since I’m in a recommender use-case, I was looking also for computing HitRate@k and MRR@k in this context. I came up with a rather awkward solution:

# Test Features
with torch.no_grad():
    h_test = model(test_pos_g, g.ndata["feat"])

# Compute link prediction for each test node with all test nodes (long!)

with torch.no_grad():
    pred_tensor = torch.swapaxes(pred(h_test[0], h_test), 0, 1)
    for i in range(1, len(h_test)):
        res = torch.cat(
                  (pred_tensor, torch.swapaxes(pred(h_test[i], h_test), 0, 1))
              )
        pred_tensor = res

from torchmetrics import RetrievalHitRate, RetrievalMRR
hr5 = RetrievalHitRate(k=5)
hr5(pred_tensor, target, indexes=indexes)

where target is the indicator matrix of the ground truth positive edges in the test graph and indexes the query indicator matrix.

I wonder if there is a more optimal solution for the calculation of these metrics in this case (the computation of link prediction score for all test nodes with all test nodes takes quite a long time!), where pred is the LinkPredictor that you proposed @mufeili. Do you think that I should restrict the computation of link prediction only to a subset of the test nodes or should I modify the predictor?

Will it still be valid for the metric computation if you restrict to a subset of the test nodes? Also, will it help to do some mini-batch computation?

In my opinion no, but I read some other threads like this one and I was wondering if that made sense to restrict to N+M positive and negative nodes respectively.

I don’t know, I though mini-batch was mainly used for training on large graphs. Could that improve the speed computation of those metrics?

If I understand correctly, the implementation below iterates over all nodes, one at a time.

with torch.no_grad():
    pred_tensor = torch.swapaxes(pred(h_test[0], h_test), 0, 1)
    for i in range(1, len(h_test)):
        res = torch.cat(
                  (pred_tensor, torch.swapaxes(pred(h_test[i], h_test), 0, 1))
              )
        pred_tensor = res

This can be quite inefficient. Is it possible to derive a batched implementation for it?

Yes I know this can be extremely inefficient but I didn’t figure out yet a solution with a batched implementation, do you have any clue?

See if this gives you some hint.

import torch

a = torch.tensor([[0., 1.], [2., 3.]])
# The line below gives the result of element-wise multiplication of all pairs of rows
a * a.unsqueeze(1)