GraphSAGE can make predictions on a new graph. Can PinSAGE likewise make predictions on new nodes?
PinSAGE is also an inductive model like GraphSAGE. However, my implementation made it transductive by assigning each node a learnable embedding vector.
That being said, we currently do not have examples for GraphSAGE, PinSAGE, or any other model that specifically target the inductive learning scenario. To implement inductive learning, you need to train on a subgraph that contains only the training nodes, and also remove the node-specific learnable embedding vectors.
Can you offer some tips on doing it in an inductive way? My other concern is whether inductive learning is possible at all if I don’t have node features.
To do it in an inductive way, say that you have train_nodes, val_nodes, and test_nodes for the training, validation, and test sets. You can create a subgraph for training with something like:
train_g = g.subgraph(torch.cat([train_nodes, val_nodes]))
In the training loop, instead of running GraphSAGE on g, you run it on train_g. For evaluation, you keep using the full graph g.
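To make it concrete what the training subgraph hides from the model, here is a minimal pure-Python sketch of a node-induced subgraph. The helper induced_subgraph and the toy edge list are hypothetical and only for illustration. As far as I know, g.subgraph in DGL behaves in the same spirit: it keeps only edges whose endpoints are both in the given node set, relabels the nodes, and stores the original IDs in train_g.ndata[dgl.NID].

```python
def induced_subgraph(edges, keep_nodes):
    """Return (sub_edges, new_to_old) for the subgraph induced by keep_nodes.

    An edge survives only if both endpoints are kept. Kept nodes are
    relabeled 0..k-1 in the order given; new_to_old[i] is the original ID.
    """
    old_to_new = {old: new for new, old in enumerate(keep_nodes)}
    sub_edges = [
        (old_to_new[u], old_to_new[v])
        for u, v in edges
        if u in old_to_new and v in old_to_new
    ]
    return sub_edges, list(keep_nodes)


# Hypothetical toy graph: 6 nodes, directed edges (src, dst).
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
train_nodes, val_nodes, test_nodes = [1, 2, 4], [5], [0, 3]

# Training graph: test nodes and every edge touching them are removed.
train_edges, train_ids = induced_subgraph(edges, train_nodes + val_nodes)
print(train_edges)  # edges among nodes 1, 2, 4, 5, in relabeled IDs
print(train_ids)    # original ID of each relabeled node
```

Because the nodes are relabeled, any per-node tensors you index during training need to be mapped back through the original IDs.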
As for your second question: if you don’t have node features at all, doing inductive learning may not be trivial. Some works, such as IGMC, work around this by handcrafting node features solely from the graph topology relative to the training node itself. In fact, the cold-start problem in recommender systems is closely related (although not exactly the same), and I would recommend borrowing ideas from papers on cold-start as well.
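As a rough illustration of features that come from topology alone, the sketch below computes simple degree-based features. This is not IGMC's actual scheme, which labels nodes inside a subgraph extracted around each training example; it is a much simpler stand-in, and structural_features is a hypothetical helper.

```python
import math


def structural_features(num_nodes, edges):
    """Per-node features built from topology only.

    Each node gets [in-degree, out-degree, log(1 + total degree)].
    """
    in_deg = [0] * num_nodes
    out_deg = [0] * num_nodes
    for u, v in edges:
        out_deg[u] += 1
        in_deg[v] += 1
    return [
        [float(in_deg[n]), float(out_deg[n]), math.log1p(in_deg[n] + out_deg[n])]
        for n in range(num_nodes)
    ]


# Hypothetical toy graph: node 3 has no edges at all.
edges = [(0, 1), (0, 2), (1, 2)]
feats = structural_features(4, edges)
for node_id, row in enumerate(feats):
    print(node_id, row)
```

Since these features depend only on the edges, they can be recomputed for unseen nodes at inference time. A new node with no edges gets an all-zero row, which is exactly the cold-start situation mentioned above.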
Thank you very much. If I succeed, I will update a link on this post for the complete code.
Many thanks for your effort, and we welcome your contribution! Please feel free to follow up whenever you run into any obstacle!
Another problem I am concerned about is that the newly added nodes will affect the original nodes. How can I guarantee that the prediction results for the original nodes (in the training set) remain unchanged after adding new nodes?
I think it should be OK for the prediction results to change under certain settings - when new information comes in with new nodes, it is natural to update the existing nodes’ information as well. However, if keeping the predictions for the training set nodes intact is a hard requirement, then you can probably choose to predict only the nodes in the test set, without re-evaluating the training set.
Hope this answers your question.
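To see why the predictions move, here is a toy sketch of one unweighted mean-aggregation step with scalar features. The helper mean_aggregate and the numbers are hypothetical, and a real GraphSAGE layer also applies learned weights. The last line shows one possible way to keep an original node fixed, which is my assumption rather than something built into the model: compute (or reuse) its output on the original edges only.

```python
def mean_aggregate(feats, edges, node):
    """Average the features of `node`'s in-neighbors (one mean step, no weights)."""
    neigh = [feats[u] for u, v in edges if v == node]
    if not neigh:
        return 0.0
    return sum(neigh) / len(neigh)


# Original graph: nodes 0 and 1 both point to node 2.
feats = {0: 1.0, 1: 3.0}
edges = [(0, 2), (1, 2)]
before = mean_aggregate(feats, edges, 2)

# A new node 3 arrives and also points to node 2.
feats[3] = 8.0
edges_after = edges + [(3, 2)]
after = mean_aggregate(feats, edges_after, 2)

# Aggregating over the original edges only leaves node 2 unchanged.
frozen = mean_aggregate(feats, edges, 2)
print(before, after, frozen)
```

The new neighbor shifts the aggregated value for node 2, while restricting the computation to the original edges reproduces the earlier result.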