I am trying to implement the loss function from the GraphSAGE paper (equation 1).
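For reference, equation (1) of the paper is

J_{\mathcal{G}}(\mathbf{z}_u) = -\log\big(\sigma(\mathbf{z}_u^\top \mathbf{z}_v)\big) - Q \cdot \mathbb{E}_{v_n \sim P_n(v)}\big[\log\big(\sigma(-\mathbf{z}_u^\top \mathbf{z}_{v_n})\big)\big]

where v is a node that co-occurs with u on a fixed-length random walk, \sigma is the sigmoid function, P_n is the negative-sampling distribution, and Q is the number of negative samples.

Here is the code I am using: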
# imports used across all snippets below
import torch
import torch.nn as nn
import dgl
import dgl.function as fn

def loss_fn(graph, embeddings, length, node_ids, size_neg_graph):
    # positive scores: sample a fixed-length random walk from each seed node and
    # take a node that occurs on the walk as the positive example
    pos_ids = dgl.sampling.random_walk(graph, node_ids, length=length)[0][:, length - 1]
    pos_score = torch.sum(embeddings * embeddings[pos_ids], dim=1)  # inner product of each node with its positive sample
    pos_score = torch.nn.functional.logsigmoid(pos_score)
    # negative scores: corrupt destination nodes by sampling uniformly from all nodes in the graph
    neg_graph = construct_negative_graph(graph, size_neg_graph)
    neg_score = dp(neg_graph, embeddings)  # dp is a DotProductPredictor instance (defined below); dot product along every negative edge
    neg_score = torch.nn.functional.logsigmoid(-neg_score)
    # combine both terms and average over the seed nodes
    loss = -torch.sum(pos_score) - torch.sum(neg_score)
    return loss / len(node_ids)
When I run this, the loss never decreases; it stays stuck at a relatively high value. Is there something wrong with my code?
The dot-product predictor and the negative-graph sampling function are defined as follows:
class DotProductPredictor(nn.Module):
    def forward(self, graph, h):
        # h contains the node representations computed from the GNN
        with graph.local_scope():
            graph.ndata['h'] = h
            # dot product of source and destination embeddings along every edge
            graph.apply_edges(fn.u_dot_v('h', 'h', 'score'))
            return graph.edata['score']

def construct_negative_graph(graph, k):
    src, dst = graph.edges()
    # repeat every source node k times and pair it with k uniformly sampled destination nodes
    neg_src = src.repeat_interleave(k)
    neg_dst = torch.randint(0, graph.number_of_nodes(), (len(src) * k,))
    return dgl.graph((neg_src, neg_dst), num_nodes=graph.number_of_nodes())
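For context, this is roughly how I train with the loss. The encoder below is only a simplified stand-in for my actual model (a plain two-layer SAGEConv network), and the graph, features, and hyperparameters shown here are placeholders for my real data and settings:

from dgl.nn import SAGEConv

class SAGE(nn.Module):
    # minimal two-layer GraphSAGE encoder, only a stand-in for my real model
    def __init__(self, in_feats, hid_feats, out_feats):
        super().__init__()
        self.conv1 = SAGEConv(in_feats, hid_feats, 'mean')
        self.conv2 = SAGEConv(hid_feats, out_feats, 'mean')

    def forward(self, graph, x):
        h = torch.relu(self.conv1(graph, x))
        return self.conv2(graph, h)

graph = dgl.rand_graph(1000, 5000)   # stand-in for my actual graph
features = torch.randn(1000, 32)     # stand-in for my node features

dp = DotProductPredictor()
model = SAGE(in_feats=features.shape[1], hid_feats=64, out_feats=16)
opt = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(200):
    embeddings = model(graph, features)                # (num_nodes, 16) node embeddings
    node_ids = torch.arange(graph.number_of_nodes())   # every node is a walk seed
    loss = loss_fn(graph, embeddings, length=2, node_ids=node_ids, size_neg_graph=5)
    opt.zero_grad()
    loss.backward()
    opt.step()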