Most efficient way to compute cosine similarity between two nodes

Hi,
I am building a recommender system with DGL, framed as a link prediction problem.

To train the model, I use negative sampling. The model needs to predict that a positive pair of nodes has a higher cosine similarity than a negative pair of nodes.

To compute this cosine similarity, I implemented a custom function:

import torch.nn as nn


def udf_u_cos_v(edges):
    # Edge UDF: cosine similarity between the source and destination embeddings of each edge
    cos = nn.CosineSimilarity(dim=1, eps=1e-6)
    return {'cos': cos(edges.src['h'], edges.dst['h'])}


class CosinePrediction(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, graph, h):
        """
        graph : graph with edges connecting pairs of nodes
        h : hidden state of every node
        """
        with graph.local_scope():
            graph.ndata['h'] = h
            graph.apply_edges(udf_u_cos_v)
            ratings = graph.edata['cos']
        return ratings

However, this seems to consume lots of memory.

In a DGL discussion post, it was explained that “[when] using a custom message function and a builtin-reduce function, […] DGL will use degree bucketing to parallelize the computation, which is not most efficient in many cases.”

Would you have any recommendation as to how to most efficiently implement cosine similarity between two nodes in a graph?

Thanks in advance!

You may try the practice here.

@mufeili Thank you very much for the reference!

For the record: I implemented this approach, and it worked very well. The maximum edge batch size I could use without out-of-memory errors went from 256 to 2048.


Will it be useful to have an NN module for cosine similarity computation? If so, would you like to make a contribution for that?

Yes, it would. I would be happy to make a contribution for that.
How does this usually work at DGL? Let me know how I can make the contribution.

You may proceed as follows:

  1. Fork the DGL repo
  2. Create a new branch in your fork for coding
  3. Create a file dist.py under dgl/python/dgl/nn/pytorch/ and define a class for CosineSimilarity in it. You can follow the examples in dgl/python/dgl/nn/pytorch/glob.py
  4. Add an import for dist.py in dgl/python/dgl/nn/pytorch/__init__.py
  5. Add a test for it in dgl/tests/pytorch/test_nn.py
  6. You can then open a PR from the branch to the master branch of DGL and invite me for review


DGL now provides built-in support for cosine similarity computation with EdgePredictor. You can access it by installing the nightly build or installing from source. It will be included in the next release.