Most efficient way to compute cosine similarity between two nodes

jedbl · December 9, 2020, 6:53pm

Hi,
I am building a Recommender System using DGL, using a link prediction methodology.

To train the model, I use negative sampling. The model needs to predict that a positive pair of nodes has a higher cosine similarity than a negative pair of nodes.

To compute this cosine similarity, I implemented a custom function:

import torch.nn as nn


def udf_u_cos_v(edges):
    # Per-edge UDF: compares the source and destination embeddings of each edge.
    cos = nn.CosineSimilarity(dim=1, eps=1e-6)
    return {'cos': cos(edges.src['h'], edges.dst['h'])}


class CosinePrediction(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, graph, h):
        """
        graph : graph with edges connecting pairs of nodes
        h : hidden state of every node
        """
        with graph.local_scope():
            graph.ndata['h'] = h
            graph.apply_edges(udf_u_cos_v)
            ratings = graph.edata['cos']
        return ratings

However, this seems to consume lots of memory.

In a DGL discussion post, it was explained that “[when] using a custom message function and a builtin-reduce function, […] DGL will use degree bucketing to parallelize the computation, which is not most efficient in many cases.”

Would you have any recommendation as to how to most efficiently implement cosine similarity between two nodes in a graph?

Thanks in advance!

mufeili · December 9, 2020, 7:37pm

You may try the practice here.

jedbl · December 11, 2020, 7:43pm

@mufeili Thank you very much for the reference!

For the record: I did implement this code, and it worked very well. The largest edge batch size I could use without out-of-memory errors went from 256 to 2048.

mufeili · December 14, 2020, 8:51am

Will it be useful to have an NN module for cosine similarity computation? If so, would you like to make a contribution for that?

jedbl · December 23, 2020, 3:32pm

Yes, it would. I would be happy to make a contribution for that.
How does this usually work at DGL? Let me know how I can make the contribution.

mufeili · December 29, 2020, 5:20pm

You may proceed as follows:

1. Fork the DGL repo.
2. Create a new branch in your fork for coding.
3. Create a file dist.py under dgl/python/dgl/nn/pytorch/ and define a class for CosineSimilarity in it. You can follow the examples in dgl/python/dgl/nn/pytorch/glob.py.
4. Add an import for dist.py in dgl/python/dgl/nn/pytorch/__init__.py.
5. Add a test for it in dgl/tests/pytorch/test_nn.py.
6. Open a PR from your branch to the master branch of DGL and invite me for review.

system · January 28, 2021, 5:20pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.

mufeili · January 13, 2022, 6:35am

DGL now provides built-in support for cosine similarity computation with EdgePredictor. You can access it by installing the nightly build or by installing from source. It will be included in the next release.