I was finally able to put everything together and everything works perfectly; however, the predictions are not very accurate compared to the annotated ones used during training.
For link3, which is symmetric, I got the following after 5 epochs:
Train Loss: 7.385879099369049
Eval AUC: 0.8863544534424268
Then, as I told you, I take a new annotated graph with, for example, this shape:
Graph(num_nodes={'ent': 70},
num_edges={('ent', 'link1', 'ent'): 74, ('ent', 'link2', 'ent'): 310, ('ent', 'link3', 'ent'): 40})
I save the original links in a list and then clean them from the graph, so I get the following:
link3 = [[0, 61], [38, 63], [25, 40], [41, 42], [42, 43], [44, 64], [44, 45], [50, 66],...]
and
Graph(num_nodes={'ent': 70},
num_edges={('ent', 'link1', 'ent'): 0, ('ent', 'link2', 'ent'): 310, ('ent', 'link3', 'ent'): 0})
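For reference, the save-and-clean step looks roughly like this (a sketch assuming DGL's edges/remove_edges API; link1 gets the same treatment):

import torch
import dgl

etype = ('ent', 'link3', 'ent')
# save the annotated pairs before deleting them
src, dst = grafo.edges(etype=etype)
link3 = torch.stack([src, dst], dim=1).tolist()
# remove every link3 edge so only link2 remains as model input
grafo = dgl.remove_edges(grafo, torch.arange(grafo.num_edges(etype)), etype=etype)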
Then I do what you told me: I successfully update the node embeddings with:
feats = grafo.ndata['Feats']
updated_feats = model.sage(grafo, {'ent': feats})  # GNN forward pass over the cleaned graph
and generate the edge-specific embeddings, which are indeed different for every type of link. I store them in a list of (index, embedding) tuples:
nodos_embeddings = []
embeddings = model.pred.etype_project[('ent', 'link3', 'ent')](updated_feats)
for i, emb in enumerate(embeddings):
    nodos_embeddings.append((i, emb))
This is where it gets weird. I calculate the similarity among all pairs of embeddings:
scores = []
for i, emb_i in nodos_embeddings:
    for j, emb_j in nodos_embeddings:
        if i != j:
            score = torch.dot(emb_i, emb_j)
            scores.append([i, j, score])
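(The same pairwise scores can also be computed in one shot with a matrix product; a sketch, using the embeddings tensor from above:)

score_matrix = embeddings @ embeddings.t()  # (N, N); entry [i, j] = dot(embeddings[i], embeddings[j])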
and take those over a threshold:
threshold = 4
final = []
for tupla in scores:
    if tupla[2] > threshold:
        final.append(tupla)
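(Equivalently, on the score matrix, keeping only pairs with i < j since the dot product is symmetric; a sketch:)

threshold = 4.0
mask = torch.triu(score_matrix, diagonal=1) > threshold  # skip the diagonal and mirrored duplicates
final_pairs = mask.nonzero(as_tuple=False).tolist()      # list of [i, j] with i < j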
To check the quality of the predictions I compare them with the list of original annotated links I saved at the beginning (I sort each pair so the first element is always smaller than the second, to avoid counting the same pair twice).
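Concretely, that comparison is something like this (a sketch; link3 is the list I saved at the start and final the thresholded pairs):

tagged = {tuple(sorted(p)) for p in link3}
predicted = {tuple(sorted(p[:2])) for p in final}
inter = tagged & predicted
precision = len(inter) / len(predicted) if predicted else 0.0
recall = len(inter) / len(tagged) if tagged else 0.0
f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0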
I have tried different thresholds, but when comparing these final high-similarity pairs with the originally annotated pairs, precision and recall are always low.
In theory my model's predictions were good, since the AUC for this link type was around 90%, meaning positive edges scored higher than negative ones during evaluation; but now it looks like the nodes of annotated edges end up with dissimilar embeddings.
For example, if I use 4.0 as the threshold:
Total tagged: 46
Total predicted: 10
Intersection: 2
Precision: 0.2
Recall: 0.04
F-score: 0.07
If I lower the threshold, recall obviously gets better, but precision doesn't improve, so I guess the real pairs are completely dispersed over the similarity range.
I must be doing something wrong, since those pairs' embeddings were definitely similar during training.