Fastest way to implement pairwise attention score

I’m calculating the pairwise attention score between neighbors. I’m currently using for loops to calculate it, but I don’t think this is quite effective dgl code. I wonder how I can use native functions like update_all to calculate this? The calculation is simple, given two nodes of an edge you just compute score=h_src^TAh_dst. Looking at the docs, I think I can use apply_edges to first populate the edges with scores, but what’s next? Any help is appreciated, thanks!

You can follow the implementation of GAT and use edge_softmax.

But in GAT implementation the scores from different neighbors will eventually be reduced and thus can’t distinguish between neighbors. I need to gather something like an NxN matrix that specifies the score from any pair of connected nodes, node 1 to node 2, etc.

Will using apply_edge and then looping over all edges be significantly slower than something else?

What do you mean by “the scores from different neighbors will eventually be reduced and thus can’t distinguish between neighbors”? The attention scores are computed with a_{ij}=\frac{e^{z_{ij}}}{\sum_{k\in\mathcal{N}(j)}e^{z_{kj}}} and two edges (i_1, j) and (i_2, j) will indeed have different attention scores.

If you just want to retrieve and export the attention scores to a sparse matrix, you can follow the example here.

1 Like

apply_edges compute output edge feature given input source/destination node feature and edge features. To implement Transformer-like dot product attention score, you can use

import dgl.function as fn

g.apply_edges(fn.u_dot_v(...))

and edge_softmax function computes the attention weight normalized by destination/source nodes.

Could you please elaborate more on “distinguish between neighbors” so that I could understand your scenario?

1 Like