Hi everyone,
I have successfully trained an R-GCN for link prediction following this tutorial: 5.3 Link Prediction — DGL 0.6.1 documentation. I trained my model on 300 graphs. The graphs and node embeddings are generated from natural-language texts in a corpus, and the training edges are human-annotated. The graphs look like this:
Graph(num_nodes={'ent': 84},
      num_edges={('ent', 'link1', 'ent'): 67, ('ent', 'link2', 'ent'): 289, ('ent', 'link3', 'ent'): 62})
As in the guide, here is my score predictor:
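import torch.nn as nn
import dgl.function as fn

class HeteroDotProductPredictor(nn.Module):
    def forward(self, graph, h, etype):
        # h: dict of node representations, e.g. {'ent': tensor of shape (N, D)}
        with graph.local_scope():
            graph.ndata['h'] = h['ent']
            # Score each edge of `etype` as the dot product of its endpoint embeddings
            graph.apply_edges(fn.u_dot_v('h', 'h', 'score'), etype=etype)
            return graph.edges[etype].data['score']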
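For reference, `fn.u_dot_v('h', 'h', 'score')` scores each edge (u, v) of the selected etype as the dot product of the two endpoint embeddings. The formula does not involve the relation at all; `etype` only selects which edges get scored. A minimal torch-only sketch of the same computation for explicit node pairs, assuming `h` is the `'ent'` embedding tensor of shape (num_nodes, dim); the helper name `dot_scores` is hypothetical:

```python
import torch

def dot_scores(h: torch.Tensor, src: torch.Tensor, dst: torch.Tensor) -> torch.Tensor:
    """Dot-product score for each (src[i], dst[i]) pair.

    Same quantity as fn.u_dot_v('h', 'h', 'score'), except that u_dot_v
    keeps a trailing dimension of size 1 (shape (E, 1) instead of (E,)).
    """
    # Gather both endpoint embeddings and take the row-wise inner product
    return (h[src] * h[dst]).sum(dim=-1)

h = torch.tensor([[1.0, 0.0], [0.5, 2.0], [-1.0, 1.0]])
src = torch.tensor([0, 1])
dst = torch.tensor([1, 2])
print(dot_scores(h, src, dst))  # tensor([0.5000, 1.5000])
```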
Model code:
class Model(nn.Module):
    def __init__(self, in_features, hidden_features, out_features, rel_names):
        super().__init__()
        # RGCN encoder as defined in the DGL guide (not shown here)
        self.sage = RGCN(in_features, hidden_features, out_features, rel_names)
        self.pred = HeteroDotProductPredictor()

    def forward(self, g, neg_g, x, etype):
        # Encode nodes once, then score positive and negative edges with the same embeddings
        h = self.sage(g, x)
        return self.pred(g, h, etype), self.pred(neg_g, h, etype)
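The `neg_g` passed to `forward` is built from corrupted copies of the positive edges: the DGL guide keeps each source node and pairs it with `k` uniformly random destination nodes. A torch-only sketch of just that pair sampling (the helper name `sample_negative_pairs` is hypothetical, and wrapping the pairs into a DGL graph is omitted). It makes visible why this training recipe depends on annotated positive edges:

```python
import torch

def sample_negative_pairs(src: torch.Tensor, num_nodes: int, k: int, generator=None):
    """Corrupt each positive edge into k negative pairs.

    Each positive source node is repeated k times and paired with destination
    nodes drawn uniformly from [0, num_nodes). Some of these may coincide
    with real edges; uniform sampling does not filter them out.
    """
    neg_src = src.repeat_interleave(k)
    neg_dst = torch.randint(0, num_nodes, (len(src) * k,), generator=generator)
    return neg_src, neg_dst

pos_src = torch.tensor([0, 3, 5])
neg_src, neg_dst = sample_negative_pairs(pos_src, num_nodes=84, k=2)
print(neg_src)  # tensor([0, 0, 3, 3, 5, 5])
```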
My problem now is that I need to predict links in new graphs that were not seen during training. These new graphs are also generated from text and carry node embeddings extracted from it. Obviously, they are not annotated with the links I need to predict (link1 and link3). A new, unseen graph looks like this:
Graph(num_nodes={'ent': 122},
      num_edges={('ent', 'link1', 'ent'): 0, ('ent', 'link2', 'ent'): 486, ('ent', 'link3', 'ent'): 0})
Now, if I need to predict links in this new graph, which has no positive examples, what should I do? I can't use the same method as during training, since these graphs don't contain positive examples.
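At inference time, candidate edges do not have to come from annotations: every ordered pair of distinct nodes can be a candidate. A torch-only sketch, assuming directed edges and no self-loops (the helper `all_candidate_pairs` is hypothetical). For the 122-node graph above this gives 122 * 121 = 14,762 candidate pairs:

```python
import torch

def all_candidate_pairs(num_nodes: int):
    """All ordered (src, dst) pairs with src != dst."""
    src = torch.arange(num_nodes).repeat_interleave(num_nodes)  # 0,0,...,1,1,...
    dst = torch.arange(num_nodes).repeat(num_nodes)             # 0,1,...,0,1,...
    keep = src != dst  # drop self-loops
    return src[keep], dst[keep]

src, dst = all_candidate_pairs(122)
print(len(src))  # 14762
```

These pairs could then be scored with the unchanged predictor by wrapping them in a graph, for example `dgl.heterograph({('ent', 'link1', 'ent'): (src, dst)}, num_nodes_dict={'ent': 122})`, similar to how the guide builds its negative graph.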
I guess I need to add something to my code/model, but I'm not sure what. My guess is that I need an "inference function" that scores every pair of nodes in the graph and then keeps those with a score above some threshold as the ones that should be connected by an edge. My problem is that I don't know how to use the trained model for this purpose. Should I change my score predictor? I assume not. Should I add an "inference function" to my model (maybe inside my Model class) that applies the score predictor to every pair of nodes so I can keep those above a threshold? How could I do this?
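One possible shape for such an inference step, sketched under assumptions: `h` is the trained encoder's output for the new graph (for example `model.eval()` followed by `with torch.no_grad(): h = model.sage(g, x)['ent']`), the score is the same dot product the predictor computes, and a sigmoid maps scores to (0, 1) so that a threshold such as 0.5 is meaningful. That last part only holds if training used a loss that treats sigmoid(score) as a probability (e.g. binary cross-entropy); with a margin loss, the threshold would have to be tuned on held-out annotated graphs. The function name `predict_links` is hypothetical:

```python
import torch

def predict_links(h: torch.Tensor, threshold: float = 0.5):
    """Return (src, dst) for every ordered pair whose score passes the threshold.

    h: (num_nodes, dim) node embeddings from the trained encoder.
    The score of (u, v) is h[u] . h[v], the same quantity u_dot_v computes.
    """
    scores = h @ h.T               # (N, N): dot product for all pairs at once
    probs = torch.sigmoid(scores)  # assumes a BCE-style training objective
    probs.fill_diagonal_(0.0)      # exclude self-loops (u, u)
    src, dst = (probs > threshold).nonzero(as_tuple=True)
    return src, dst

h = torch.tensor([[2.0, 0.0], [1.5, 0.5], [-2.0, 1.0]])
src, dst = predict_links(h, threshold=0.9)
print(src, dst)  # tensor([0, 1]) tensor([1, 0])
```

Because a dot product is symmetric, (u, v) and (v, u) always get the same score, so if link1 or link3 are directed, this scoring cannot tell the two directions apart.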
The output I'm looking for is the classic pair of source and destination node tensors for a given edge type, something like: tensor([1,2,3,…]), tensor([2,3,1,…])
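If choosing a threshold is awkward, a different selection rule (not something from the guide) is to keep the k highest-scoring pairs, which returns the same two-tensor format. A sketch assuming `h` is the encoder output for the new graph; `top_k_links` is a hypothetical name:

```python
import torch

def top_k_links(h: torch.Tensor, k: int):
    """Return (src, dst) of the k highest-scoring ordered pairs, excluding self-loops."""
    n = h.shape[0]
    if k > n * (n - 1):
        raise ValueError("k exceeds the number of candidate pairs")
    scores = h @ h.T
    scores.fill_diagonal_(float("-inf"))  # never select (u, u)
    top = torch.topk(scores.flatten(), k).indices
    src = torch.div(top, n, rounding_mode="floor")  # row index of each flat position
    dst = top % n                                   # column index of each flat position
    return src, dst

h = torch.tensor([[2.0, 0.0], [1.5, 0.5], [-2.0, 1.0]])
src, dst = top_k_links(h, k=2)
print(src, dst)
```

Either way, the resulting tensors can be written back into the graph, for example with `g.add_edges(src, dst, etype='link1')`.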
I really need to figure this out since it is the last step of my project, so any help would be really appreciated.
Thank you all so much.