Unsupervised GraphSAGE loss not being minimised

I am trying to implement the loss function from the GraphSAGE paper (equation 1). Here is the code I am using:

import dgl
import torch

def loss_fn(graph, embeddings, length, node_ids, size_neg_graph):
    # get positive scores first
    pos_ids = dgl.sampling.random_walk(graph, node_ids, length=length)[0][:, length - 1] # gets the node ids of nodes that occur on RW 
    pos_score = torch.sum(embeddings * embeddings[pos_ids], dim=1) # performs inner product
    pos_score = torch.nn.functional.logsigmoid(pos_score) # log sigmoid

    # get negative scores
    neg_graph = construct_negative_graph(graph, size_neg_graph) # constructs a negative graph by corrupting destination nodes by sampling randomly from all nodes in graph
    neg_score = dp(neg_graph, embeddings) # dot product function to compute dot prod along all edges of the embeddings
    neg_score = torch.nn.functional.logsigmoid(-neg_score) 

    # return loss
    loss = -torch.sum(pos_score) - torch.sum(neg_score)
    return loss / len(node_ids)
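For reference, my reading of equation 1 from the paper is:

    J(z_u) = -log(σ(z_u · z_v)) - Q · E_{v_n ~ P_n(v)}[ log σ(-z_u · z_{v_n}) ]

where v is a node that co-occurs with u on a fixed-length random walk, P_n is the negative-sampling distribution, and Q is the number of negative samples.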

When I run this, the loss never minimises. It always gets stuck at a relatively high number. Is there something wrong with my code?

The DP function and Negative graph sampling function are defined as follows:

import dgl.function as fn
import torch.nn as nn

class DotProductPredictor(nn.Module):
    def forward(self, graph, h):
        # h contains the node representations computed from the GNN
        with graph.local_scope():
            graph.ndata['h'] = h
            graph.apply_edges(fn.u_dot_v('h', 'h', 'score'))
            return graph.edata['score']

def construct_negative_graph(graph, k):
    src, dst = graph.edges()

    neg_src = src.repeat_interleave(k)
    neg_dst = torch.randint(0, graph.number_of_nodes(), (len(src) * k,))
    return dgl.graph((neg_src, neg_dst), num_nodes=graph.number_of_nodes())
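For anyone debugging this, the negative branch can be sanity-checked without building a DGL graph at all: corrupt the destinations with `torch.randint` and dot the embeddings directly. A minimal sketch mirroring `construct_negative_graph` (the sizes and tensors here are made up for illustration):

```python
import torch

num_nodes, dim, k = 6, 4, 3
emb = torch.randn(num_nodes, dim)          # stand-in node embeddings
src = torch.tensor([0, 1, 2])              # original edge sources

# Repeat each source k times and sample random destinations,
# exactly as construct_negative_graph does.
neg_src = src.repeat_interleave(k)
neg_dst = torch.randint(0, num_nodes, (len(src) * k,))

# Dot product along each corrupted edge, then the negative half of the loss.
neg_score = (emb[neg_src] * emb[neg_dst]).sum(dim=1)
neg_loss = -torch.nn.functional.logsigmoid(-neg_score).sum()
print(neg_score.shape)  # torch.Size([9])
```

If this standalone version trains fine, the problem is more likely in the graph construction or the encoder than in the loss itself.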

Hi, the code looks correct to me. Loss issues like this are usually very case-specific, so I can only offer some high-level suggestions:

  • Try tuning the hyperparameters (e.g., learning rate, batch size, walk length in your case). If you set the walk length to one, it becomes a standard loss function defined on edges.
  • Pay attention to the parameter initializer. Sometimes a different initializer works much better.
  • Try printing out the gradient of each layer to check whether there is a gradient explosion problem.
  • Try debugging with simpler models first. For example, if the encoder is only an MLP rather than a GNN, does the problem still exist?
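The gradient-inspection suggestion above can be sketched like this (the two-layer model here is a hypothetical stand-in; substitute your own GNN encoder and loss):

```python
import torch
import torch.nn as nn

# Stand-in encoder for illustration; replace with your GNN.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

x = torch.randn(32, 8)
loss = model(x).pow(2).mean()   # any scalar loss works for this check
loss.backward()

# Print per-parameter gradient norms to spot vanishing/exploding gradients.
for name, p in model.named_parameters():
    print(f"{name}: grad norm = {p.grad.norm():.4f}")
```

Norms that are all near zero suggest vanishing gradients; norms that blow up over iterations suggest the learning rate or initialization needs attention.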

You could search for similar questions on the forum to see whether they give you the answer.



Thanks! I’m glad I was able to get the message passing right.

I figured my loss might not actually be that bad. Since the loss function is essentially Q times the expectation of the negative score, my loss always hangs around a number lower than Q, so it's actually not too bad. For instance, I was training with 10 negative samples and my loss hangs around 6-7, which isn't bad at all: it means the expected value of each negative-score term is less than 1 (because the positive scores end up contributing 0 to the loss).
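A quick sanity check on that number: when the dot products are near zero (as with near-random embeddings), each term costs -log σ(0) = log 2 ≈ 0.693, so Q = 10 negative samples alone account for roughly 6.93:

```python
import math
import torch
import torch.nn.functional as F

# Each loss term at score 0 costs -log(sigmoid(0)) = log(2) nats.
per_sample = -F.logsigmoid(torch.tensor(0.0)).item()
print(per_sample)       # ≈ 0.6931
print(10 * per_sample)  # ≈ 6.93 for Q = 10 negative samples
```

So a loss hovering around 6-7 with Q = 10 is consistent with the negative terms sitting near their "uninformed" baseline, which matches the observation above.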

Hi @davidireland3 and @minjie: I am trying to compute embeddings for my graph in an unsupervised fashion using the loss function here. I have a concern about DotProductPredictor(): unlike the positive score, why do we do the dot product of 'h' with itself? As in the equation, it should be between z_u and z_v. Is this specific to the edge prediction problem, or can the loss function above also be used for extracting node-level embeddings? I have a homogeneous graph.

If you are referring to u_dot_v('h', 'h', 'score'), it is actually doing what you are suggesting. The API name u_dot_v means it takes the first feature from source nodes ('u') and the second feature from destination nodes ('v'). Since both arguments are 'h', it computes exactly the dot product h_u · h_v.
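In plain tensor terms (a sketch without DGL, with made-up shapes), u_dot_v('h', 'h', 'score') on a graph with edge lists src and dst is equivalent to:

```python
import torch

h = torch.randn(5, 3)                 # 5 nodes, 3-dim embeddings
src = torch.tensor([0, 1, 2])         # edge sources (u)
dst = torch.tensor([3, 4, 0])         # edge destinations (v)

# For each edge (u, v), the dot product of h[u] and h[v]:
scores = (h[src] * h[dst]).sum(dim=1)
print(scores.shape)  # torch.Size([3]), one score per edge
```

The same 'h' tensor is indexed twice, once by source and once by destination, so it is h_u · h_v rather than a product of a node with itself.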