Inference on new nodes using Link Prediction

Hi, I am training a GraphSAGE model using this tutorial: WWW20-Hands-on-Tutorial/3_link_predict.ipynb at master · dglai/WWW20-Hands-on-Tutorial · GitHub. However, I don't understand how inference on new nodes would work, i.e. nodes that are not included in the train or test set. Should the new nodes already be included in the graph `g` before making inference? Can someone help?
The model predicts on the test set in the following way:

    logits = net(g, inputs)  # embeddings for every node in g
    # Edge score: sigmoid of the dot product of the two endpoint embeddings.
    pred = torch.sigmoid((logits[train_u] * logits[train_v]).sum(dim=1))

How can I use this on my new nodes?
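What I imagine is something along these lines (a sketch only: it assumes the new nodes have already been added to `g`, e.g. via `g.add_nodes`, and have corresponding rows in `inputs`; `new_u` and `new_v` are hypothetical 1-D tensors of new node IDs):

    with torch.no_grad():
        logits = net(g, inputs)   # embeddings for every node in g, new ones included
        score = torch.sigmoid((logits[new_u] * logits[new_v]).sum(dim=1))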
Also,

    node_embed = nn.Embedding(g.number_of_nodes(), 5)  # Every node has an embedding of size 5.
    inputs = node_embed.weight                         # Use the embedding weight as the node features.
    nn.init.xavier_uniform_(inputs)

Does the `inputs` variable include information from the node features, in this case ‘club’ and ‘club_onehot’?

Do these new nodes have known edges?

No, not as such. I have a homogeneous graph where you have to predict the successor of each node, so essentially you predict the edge.

Then the benefit of GNNs will be rather limited, as you cannot leverage any graph structure. One possibility is to manually construct similarity-based edges, both during training and inference.
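A minimal sketch of this idea, assuming the nodes come with feature vectors (`dgl.knn_graph` connects each node to its k nearest neighbours in feature space; the feature tensor below is a placeholder):

    import dgl
    import torch

    feats = torch.randn(100, 16)        # placeholder node features
    g_knn = dgl.knn_graph(feats, 5)     # edge from each node to its 5 nearest neighbours
    g_knn.ndata['feat'] = feats         # keep the features on the graph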

Oh, okay. I will check it out, thanks. Although my idea was more like using the features to calculate the node representation, so that this could be used for new nodes as well.

  1. Can you clarify for me whether the `inputs` variable in the tutorial gets some information from the features implicitly? I can't find any explicit use of the features anywhere in it.

         node_embed = nn.Embedding(g.number_of_nodes(), 5)  # Every node has an embedding of size 5.
         inputs = node_embed.weight                         # Use the embedding weight as the node features.
         nn.init.xavier_uniform_(inputs)

  2. If I use features like this, how can I include features that are tensors (or arrays)? Can the CSV have a feat column like "0.5477868606453535, [1,2,3], 0.936706701616337", where the second feature is [1,2,3]? For example, I want to use a one-hot encoded feature; how can I do that? (See the sketch after this list.)
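A minimal sketch of how a categorical CSV column could become one-hot node features (the file and column names here are hypothetical):

    import pandas as pd
    import torch

    df = pd.read_csv('nodes.csv')                    # hypothetical file
    onehot = pd.get_dummies(df['club'])              # one 0/1 column per category
    feats = torch.tensor(onehot.values, dtype=torch.float32)
    # feats: (num_nodes, num_categories); can later be attached as g.ndata['feat']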
Can you clarify for me whether the `inputs` variable in the tutorial gets some information from the features implicitly? I can't find any explicit use of the features anywhere in it.

From the code, it seems that this graph does not have input node features. Instead, the model learns node embeddings from scratch.
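If you did want the model to consume the stored features instead, a minimal sketch (assuming the graph carries the `club_onehot` field from the tutorial; the model's input size must then match the feature width rather than the embedding size):

    inputs = g.ndata['club_onehot'].float()   # stored one-hot features as model inputs
    # the first model layer must now expect inputs.shape[1] input features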

If I use features like this, how can I include features that are tensors (or arrays)? Can the CSV have a feat column like "0.5477868606453535, [1,2,3], 0.936706701616337", where the second feature is [1,2,3]? For example, I want to use a one-hot encoded feature; how can I do that?

What do you mean by including features? You don’t necessarily need to use this data format. You can have some PyTorch tensors on disk and load them into memory.
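A minimal sketch of that approach (the file name and sizes are placeholders; `g` is the existing training graph):

    import torch

    feats = torch.randn(34, 8)              # (num_nodes, num_features), placeholder values
    torch.save(feats, 'node_feats.pt')      # persist once

    feats = torch.load('node_feats.pt')     # reload whenever needed
    g.ndata['feat'] = feats                 # attach as node features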

From the code, it seems that this graph does not have input node features. Instead, the model learns node embeddings from scratch.

Yes, that's what I thought. I was confused because they had used one-hot encoding for a column.

What do you mean by including features?

I have some columns in a CSV that I would like to use as features for the nodes.

You don’t necessarily need to use this data format. You can have some PyTorch tensors on disk and load them into memory.

Oh, okay. That would mean I will have to change the model as well. Can you share some code for that?

In the other tutorial that uses node features, they use a single ndata['feat'] with shape (num_nodes, num_features), and I am trying to do something similar myself.

This is equivalent to learning an embedding table, as applying the first layer to one-hot inputs amounts to slicing rows of its learnable weight matrix.
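A minimal sketch of that equivalence (sizes are arbitrary):

    import torch
    import torch.nn as nn

    num_nodes, dim = 4, 3
    lin = nn.Linear(num_nodes, dim, bias=False)
    onehot = torch.eye(num_nodes)                        # row i one-hot encodes node i
    # A linear layer applied to one-hot rows simply selects rows of its weight
    # matrix, which is exactly what an embedding lookup does.
    assert torch.allclose(lin(onehot), lin.weight.t())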

Oh, okay. That would mean I will have to change the model as well. Can you share some code for that?

In the other tutorial that uses node features, they use a single ndata['feat'] with shape (num_nodes, num_features), and I am trying to do something similar myself.

Do you mean you want to both use node features and learn node embeddings from scratch? You can simply use an MLP to project the input node features and concatenate the results with the node embeddings.
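A minimal sketch of that idea (all sizes are hypothetical; the GNN would then take proj_dim + embed_dim input features):

    import torch
    import torch.nn as nn

    num_nodes, raw_dim, proj_dim, embed_dim = 34, 8, 16, 5
    mlp = nn.Sequential(nn.Linear(raw_dim, proj_dim), nn.ReLU())
    node_embed = nn.Embedding(num_nodes, embed_dim)

    raw_feats = torch.randn(num_nodes, raw_dim)   # placeholder raw node features
    inputs = torch.cat([mlp(raw_feats), node_embed.weight], dim=1)
    # inputs: (num_nodes, proj_dim + embed_dim), used as the GNN's node features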
