Hi everyone!
I'm trying to train GraphSAGE in an unsupervised setting. I took the code from this example and changed the loss function. The example uses binary_cross_entropy_with_logits, but I personally feel more comfortable with cross_entropy, so I used this loss:
import torch as th
import torch.nn as nn
import torch.nn.functional as F


class CrossEntropyLoss2(nn.Module):
    def __init__(self, in_features, out_classes):
        super().__init__()
        # scores an edge from the concatenated embeddings of its endpoints
        self.W = nn.Linear(in_features * 2, out_classes)

    def apply_edges(self, edges):
        h_u = edges.src['h']
        h_v = edges.dst['h']
        score = self.W(th.cat([h_u, h_v], 1))
        return {'score': score}

    def forward(self, block_outputs, pos_graph, neg_graph):
        with pos_graph.local_scope():
            pos_graph.ndata['h'] = block_outputs
            pos_graph.apply_edges(self.apply_edges)
            pos_score = pos_graph.edata['score']
        with neg_graph.local_scope():
            neg_graph.ndata['h'] = block_outputs
            neg_graph.apply_edges(self.apply_edges)
            neg_score = neg_graph.edata['score']
        # positive edges are class 1, sampled negative edges are class 0
        score = th.cat([pos_score, neg_score], dim=0)
        label = th.cat([th.ones(len(pos_score)), th.zeros(len(neg_score))]).long()
        loss = F.cross_entropy(score, label)
        return loss
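For comparison, the loss in the example scores an edge with a dot product between the endpoint embeddings instead of a learned linear layer. Reconstructed roughly from memory (check the example for the exact code), it looks like this:

import dgl.function as fn


# Sketch of the example's loss, reconstructed from memory
class CrossEntropyLoss(nn.Module):
    def forward(self, block_outputs, pos_graph, neg_graph):
        with pos_graph.local_scope():
            pos_graph.ndata['h'] = block_outputs
            # score of edge (u, v) is the dot product <h_u, h_v>
            pos_graph.apply_edges(fn.u_dot_v('h', 'h', 'score'))
            pos_score = pos_graph.edata['score']
        with neg_graph.local_scope():
            neg_graph.ndata['h'] = block_outputs
            neg_graph.apply_edges(fn.u_dot_v('h', 'h', 'score'))
            neg_score = neg_graph.edata['score']
        score = th.cat([pos_score, neg_score])
        label = th.cat([th.ones_like(pos_score), th.zeros_like(neg_score)])
        return F.binary_cross_entropy_with_logits(score, label)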
I trained the model on the Cora dataset. To see which nodes the model predicts as connected to a node I select, I wrote a function that takes one source node and computes the score between it and every other node:
import numpy as np


def get_top2(g, embeddings, W, source, top_k):
    if source is None:
        source = np.random.choice(g.num_nodes())
    print(source)
    # pair the source node with every node in the graph
    u = th.full((g.num_nodes(),), source, dtype=th.long)
    v = th.arange(g.num_nodes())
    h_u = embeddings[u]
    h_v = embeddings[v]
    # probability of class 1 ("edge exists") for each candidate pair
    edge_probas = W(th.cat([h_u, h_v], dim=1)).softmax(dim=1)
    class1_probas = edge_probas[:, 1]
    top_p, top_n = class1_probas.topk(top_k, largest=True)
    return top_p, top_n
The result: for every source node I choose, the model predicts the same top nodes. The probabilities change from source to source, but the ranking stays the same.
emb = model.inference(g, nfeat, device, 10000, 1)
get_top2(g, emb, loss_fcn.W, None, 5)
>>> (tensor([0.4923, 0.4912, 0.4902, 0.4896, 0.4894], grad_fn=<TopkBackward>),  # these values can change
>>>  tensor([ 863,  604,  446,  200, 1535]))  # the values and the order here are always the same
I got confused and started thinking maybe this is just how the model works, but then I found that if I use the loss from the example (the one with binary_cross_entropy_with_logits), the model gives different predictions for different source nodes.
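With that loss there is no W to pass in, so for that check I scored edges with a dot product over the same embeddings. A minimal sketch (get_top2_dot is just my probe helper, not part of the example):

def get_top2_dot(g, embeddings, source, top_k):
    # score every node against the source by <h_source, h_v>
    h_u = embeddings[source]
    probas = (embeddings @ h_u).sigmoid()
    return probas.topk(top_k, largest=True)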
Any ideas why these two losses behave so differently?
Thanks everyone.
PS: I removed some lines from the original code, but I believe the problem is not in those. Here is the full example