I’m calculating the pairwise attention score between neighbors. I’m currently using for loops to calculate it, but I don’t think this is quite effective dgl code. I wonder how I can use native functions like update_all to calculate this? The calculation is simple, given two nodes of an edge you just compute score=h_src^T*A*h_dst. Looking at the docs, I think I can use apply_edges to first populate the edges with scores, but what’s next? Any help is appreciated, thanks!

But in GAT implementation the scores from different neighbors will eventually be reduced and thus can’t distinguish between neighbors. I need to gather something like an NxN matrix that specifies the score from any pair of connected nodes, node 1 to node 2, etc.

Will using apply_edge and then looping over all edges be significantly slower than something else?

What do you mean by “the scores from different neighbors will eventually be reduced and thus can’t distinguish between neighbors”? The attention scores are computed with a_{ij}=\frac{e^{z_{ij}}}{\sum_{k\in\mathcal{N}(j)}e^{z_{kj}}} and two edges (i_1, j) and (i_2, j) will indeed have different attention scores.

If you just want to retrieve and export the attention scores to a sparse matrix, you can follow the example here.

`apply_edges`

compute output edge feature given input source/destination node feature and edge features. To implement Transformer-like dot product attention score, you can use

```
import dgl.function as fn
g.apply_edges(fn.u_dot_v(...))
```

and edge_softmax function computes the attention weight normalized by destination/source nodes.

Could you please elaborate more on “distinguish between neighbors” so that I could understand your scenario?