How to do inference on new unseen nodes?

Hello all, after reviewing the DGL documentation I have a quick question about inference on unseen nodes.

I have a scenario where I plan to train a model on a graph, e.g. {a -> [b, c], b -> [c], c -> []}, but at inference time I will need to predict on a new node d (that has incoming edges from b and c, say). The node d simply doesn’t exist at training time (it is in the future), but when node d occurs I have access to its incoming edge structure, and all of its predecessor nodes are available while the model trains. That is, the predecessors of d are guaranteed to be part of the graph at training time, though node d itself does not exist yet.

Phrased differently: when node d’s node-level features become known to me in the future, I want to generate a prediction by (explicitly or implicitly) adding node d to the graph along with the edges from b and c to this new node d, and then computing a y_hat for d. At that point the graph or implied graph is:
{a -> [b, c], b -> [c, d], c -> [d], d -> []}
(with new edges from [b and c] to d, and a new node d.)
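
To make the intended update concrete, here is a minimal plain-Python sketch of that step, using the same successor-list notation as above. The helper names `add_node_with_in_edges` and `predecessors_of` are made up for illustration and are not part of DGL.

```python
# Training-time graph in successor-list form: node -> list of nodes it points to.
graph = {"a": ["b", "c"], "b": ["c"], "c": []}


def add_node_with_in_edges(graph, new_node, predecessors):
    """Return a copy of `graph` with `new_node` added and one edge
    from each predecessor to it. The input graph is left unchanged."""
    missing = [p for p in predecessors if p not in graph]
    if missing:
        raise KeyError(f"predecessors not in graph: {missing}")
    updated = {node: list(succ) for node, succ in graph.items()}
    updated[new_node] = []  # the new node has no outgoing edges yet
    for p in predecessors:
        updated[p].append(new_node)
    return updated


def predecessors_of(graph, node):
    """List the nodes that have an edge pointing to `node`."""
    return [u for u, succ in graph.items() if node in succ]


# Node d arrives in the future with incoming edges from b and c.
inference_graph = add_node_with_in_edges(graph, "d", ["b", "c"])
print(inference_graph)
print(predecessors_of(inference_graph, "d"))
```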

What I definitely cannot afford to do is retrain the model from scratch every time a new node (for which I need a prediction) arrives.

Is there a cookbook or example anywhere for how this may be accomplished? Or recommendations on the most idiomatic way to approach this? A small code snippet sketching the outline would also be valuable.

Thanks all!

Hi, I think this depends a lot on the particular scenario and graphs. I would suggest trying something simple and fast first: for example, train a GraphSAGE model on the old graph and then see how well it generalizes once d (and its related edges) is added.
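
To illustrate why a GraphSAGE-style model can be applied to a node it never saw, here is a small plain-Python sketch of a single mean-aggregation layer. It is not DGL code: the feature values and the two weight matrices are made-up stand-ins for "weights learned on the old graph", and the function names are hypothetical. The point is only that the weights are shared across nodes, so the same layer can be evaluated on d without retraining.

```python
def mean_vector(vectors, dim):
    """Element-wise mean of a list of vectors; zeros if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sage_mean_layer(features, in_neighbors, w_self, w_neigh):
    """h_v = relu(W_self x_v + W_neigh mean(x_u for u in in-neighbors of v))."""
    dim = len(next(iter(features.values())))
    out = {}
    for v, x in features.items():
        agg = mean_vector([features[u] for u in in_neighbors.get(v, [])], dim)
        out[v] = [
            max(0.0,
                sum(ws * xi for ws, xi in zip(row_s, x))
                + sum(wn * ai for wn, ai in zip(row_n, agg)))
            for row_s, row_n in zip(w_self, w_neigh)
        ]
    return out


# Assumed values standing in for parameters learned on the old graph.
w_self = [[1.0, 0.0], [0.0, 1.0]]
w_neigh = [[0.5, 0.5], [1.0, -1.0]]

# Old graph: a -> b, a -> c, b -> c, stored as in-neighbor lists.
old_feats = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
old_in = {"a": [], "b": ["a"], "c": ["a", "b"]}
h_old = sage_mean_layer(old_feats, old_in, w_self, w_neigh)

# New node d arrives with features and incoming edges from b and c.
new_feats = dict(old_feats, d=[2.0, 0.0])
new_in = dict(old_in, d=["b", "c"])
h_new = sage_mean_layer(new_feats, new_in, w_self, w_neigh)

print(h_new["d"])
```

Because the new edges only point into d, the embeddings of a, b and c are the same before and after d is added in this sketch; only d gets a new row.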

Hello, and thank you for your note. I definitely follow the idea, which is consistent with my expectations about how this would be accomplished. However, I’m honestly not familiar enough with the API to have a good sense of what this approach would look like at the nuts-and-bolts level. Are you aware of any existing code snippet (in the docs or elsewhere) that shows the following sequence:

(1) training a model on a graph, (2) adding a node and its edges, (3) making a node-level prediction for said new node?

Even a rough (approximate) sketch would be quite valuable!

I found the docs extremely helpful overall; I think that a small section on this topic would be a great addition, as it is a common use case to train a model with the intent of evaluating future unseen data points.

Thanks again!

Did you check our GraphSAGE example? It already does (1). Adding nodes and edges is covered in the documentation. Making predictions on new nodes works the same way as the forward pass in training, except that the output now contains some additional predicted labels corresponding to the new nodes.

Thanks for the pointer!