SAGE model predicts the same nodes to be connected when using CrossEntropyLoss

Hi everyone!

I’m trying to train SAGE in an unsupervised setting. I took the code from this example and changed the loss function. The example uses binary_cross_entropy_with_logits, but I personally feel more comfortable with cross_entropy, so I used this loss:

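import torch as th
import torch.nn as nn
import torch.nn.functional as F

class CrossEntropyLoss2(nn.Module):
    def __init__(self, in_features, out_classes):
        super().__init__()  # required before registering submodules
        self.W = nn.Linear(in_features * 2, out_classes)
    def apply_edges(self, edges):
        # Score each edge from the concatenated source/destination embeddings
        h_u = edges.src['h']
        h_v = edges.dst['h']
        score = self.W(th.cat([h_u, h_v], 1))
        return {'score': score}
    def forward(self, block_outputs, pos_graph, neg_graph):
        with pos_graph.local_scope():
            pos_graph.ndata['h'] = block_outputs
            pos_graph.apply_edges(self.apply_edges)
            pos_score = pos_graph.edata['score']
        with neg_graph.local_scope():
            neg_graph.ndata['h'] = block_outputs
            neg_graph.apply_edges(self.apply_edges)
            neg_score = neg_graph.edata['score']

        score = th.cat([pos_score, neg_score], dim=0)
        # Positive edges are class 1, negative edges are class 0
        label = th.cat([th.ones(len(pos_score)), th.zeros(len(neg_score))]).long().to(score.device)
        loss = F.cross_entropy(score, label)
        return loss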

I trained the model on the Cora dataset. To see which nodes the model predicts to be connected to a node I select, I wrote a function that takes one source node and computes the score between it and every node in the graph:

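import numpy as np
import torch as th

def get_top2(g, embeddings, W, source, top_k):
    # Pick a random source node if none is given
    if source is None:
        source = np.random.choice(g.num_nodes())
    # Pair the source node with every node in the graph
    u = th.full((g.num_nodes(), ), source)
    v = th.arange(g.num_nodes())
    h_u = embeddings[u]
    h_v = embeddings[v]
    edge_probas = W(th.cat([h_u, h_v], dim=1)).softmax(dim=1)
    # Column 1 is the probability of the "edge exists" class
    class1_probas = edge_probas[:, 1]
    top_probas, top_nodes = class1_probas.topk(top_k, largest=True)
    return top_probas, top_nodes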

The result is that, whichever node I choose, the model predicts the same nodes with the highest values. The values change from node to node, but the order stays the same.

emb = model.inference(g, nfeat, device, 10000, 1)
get_top2(g, emb, loss_fcn.W, None, 5)
>>> (tensor([0.4923, 0.4912, 0.4902, 0.4896, 0.4894], grad_fn=<TopkBackward>), # these values can change
>>> tensor([ 863,  604,  446,  200, 1535])) # the indices and their order are always the same

I got confused and started to think that maybe this is how the model works, but then I found that with the loss from the example (the one with binary_cross_entropy_with_logits) the model gives different predictions for different nodes.

Any ideas why these two losses behave so differently?

Thanks everyone.

PS: I removed some lines from the original code, but I believe the problem is not in them. Here is the full example
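
A possible explanation for the identical ranking, offered as a sketch rather than a confirmed diagnosis: if W is a single nn.Linear applied to the concatenation [h_u, h_v], the logits split into one term that depends only on the source and one that depends only on the candidate node. For a fixed source the first term is a constant offset, so the ordering of candidates cannot depend on the source; only the probability values shift. The random embeddings, the sizes, and the helper top_k_for below are illustrative assumptions, not part of the original code.

```python
import torch as th
import torch.nn as nn

th.manual_seed(0)
num_nodes, dim = 50, 8
# Random stand-in for the trained SAGE embeddings (assumption for illustration)
embeddings = th.randn(num_nodes, dim, dtype=th.float64)
W = nn.Linear(dim * 2, 2).double()

def top_k_for(source, k=5):
    """Rank all nodes as link candidates for one source node (hypothetical helper)."""
    h_u = embeddings[source].expand(num_nodes, dim)
    logits = W(th.cat([h_u, embeddings], dim=1))
    return logits.softmax(dim=1)[:, 1].topk(k)

with th.no_grad():
    probs_a, idx_a = top_k_for(3)
    probs_b, idx_b = top_k_for(17)
    # Source-independent part of the class-1 minus class-0 logit difference:
    # only the weights acting on the h_v half of the concatenation matter.
    w_diff = W.weight[1] - W.weight[0]
    node_score = embeddings @ w_diff[dim:]

print(idx_a.tolist(), idx_b.tolist())
print(th.equal(idx_a, node_score.topk(5).indices))
```

A dot-product scorer, as used with binary_cross_entropy_with_logits in the example, multiplies h_u and h_v together, so its ranking does depend on the source node.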


  1. Did you check the loss values when calling loss = F.cross_entropy(score, label)? Are there any NaN values?
  2. Why do you prefer cross_entropy over binary_cross_entropy_with_logits?
  3. cross_entropy applies LogSoftmax internally, and you then called W().softmax(). Why do you call the latter?
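
To make point 3 concrete, here is a minimal sketch with made-up logits and labels (both are assumptions for illustration). It checks that F.cross_entropy matches log_softmax followed by nll_loss, which is why raw logits go into the loss during training, while an explicit softmax is only needed afterwards to read the outputs as probabilities.

```python
import torch as th
import torch.nn.functional as F

th.manual_seed(0)
logits = th.randn(6, 2)                  # raw scores for 6 edges, 2 classes
labels = th.tensor([1, 1, 1, 0, 0, 0])   # class indices, dtype long

# cross_entropy takes raw logits and applies log_softmax itself
loss_ce = F.cross_entropy(logits, labels)
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

# At inference time, softmax turns the same logits into probabilities
probs = logits.softmax(dim=1)
print(loss_ce.item(), loss_manual.item())
```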

Hi, in this case, cross_entropy cannot be used. score is N*1, while cross_entropy requires the input to be N*C (C is 2 in this case).
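
A small sketch of the shape difference, using random scores as a stand-in for the example's N*1 edge scores (an assumption for illustration). binary_cross_entropy_with_logits accepts one logit per edge with float labels, whereas cross_entropy needs an N*C tensor with integer class labels. Padding the single logit with a zero column for class 0 gives a two-class input whose cross_entropy equals the binary loss.

```python
import torch as th
import torch.nn.functional as F

th.manual_seed(0)
n_pos, n_neg = 4, 4
score = th.randn(n_pos + n_neg, 1)                    # N*1: one logit per edge
label = th.cat([th.ones(n_pos), th.zeros(n_neg)])     # float labels, shape (N,)

# Binary loss: logits and float targets must have the same shape
bce = F.binary_cross_entropy_with_logits(score, label.unsqueeze(1))

# Two-class view: fix the class-0 logit at zero, use the score as the class-1 logit
two_class = th.cat([th.zeros_like(score), score], dim=1)   # N*2
ce = F.cross_entropy(two_class, label.long())
print(bce.item(), ce.item())
```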