Hello, I am new to DGL and am trying to implement a multi-head attention module for graphs.
I got stuck on computing the attention scores for edges (the formula is in the picture).
For the softmax I just use edge_softmax from dgl.nn.functional, so this is what I came up with:
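# imports used by the snippet (it lives inside the module's forward method)
import numpy as np
from dgl import ops
from dgl.nn.functional import edge_softmax

# project node features and split into heads: (num_nodes, num_heads, d_head)
x_Q = self.Q(x).view(x.shape[0], self.num_heads, self.d_head)
x_K = self.K(x).view(x.shape[0], self.num_heads, self.d_head)
x_V = self.V(x).view(x.shape[0], self.num_heads, self.d_head)
# per-edge dot product of source-node Q with destination-node K: (num_edges, num_heads, 1)
score = ops.u_dot_v(graph, x_Q, x_K)
score = score / np.sqrt(self.d_head)
# softmax over the incoming edges of each destination node
probs = edge_softmax(graph, score)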
Is this right?
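To make clear what I expect the snippet above to compute, here is a minimal plain-Python sketch (no DGL) of the same steps on a toy graph. The function name edge_attention_reference and the toy data are made up for illustration. It assumes that u_dot_v pairs the source node's first argument with the destination node's second argument, and that edge_softmax normalizes over the incoming edges of each destination node (its default, as far as I understand).

```python
import math
from collections import defaultdict


def edge_attention_reference(edges, q, k):
    """Hypothetical reference: edges is a list of (src, dst) pairs,
    q[node][head] and k[node][head] are lists of d_head floats."""
    num_heads = len(q[0])
    d_head = len(q[0][0])

    # Scaled dot product per edge and head: Q of the source node with
    # K of the destination node, mirroring u_dot_v(graph, x_Q, x_K).
    scores = [
        [
            sum(a * b for a, b in zip(q[u][h], k[v][h])) / math.sqrt(d_head)
            for h in range(num_heads)
        ]
        for u, v in edges
    ]

    # Group edge ids by destination node (assumed edge_softmax behaviour).
    incoming = defaultdict(list)
    for eid, (_, v) in enumerate(edges):
        incoming[v].append(eid)

    # Numerically stable softmax over each destination's incoming edges.
    probs = [[0.0] * num_heads for _ in edges]
    for eids in incoming.values():
        for h in range(num_heads):
            m = max(scores[e][h] for e in eids)
            exps = {e: math.exp(scores[e][h] - m) for e in eids}
            total = sum(exps.values())
            for e in eids:
                probs[e][h] = exps[e] / total
    return probs


# Toy graph: 3 nodes, 1 head, d_head = 2; edges 0->2, 1->2, 2->0.
edges = [(0, 2), (1, 2), (2, 0)]
q = [[[1.0, 0.0]], [[0.0, 1.0]], [[1.0, 1.0]]]
k = [[[0.5, 0.5]], [[1.0, 0.0]], [[2.0, 0.0]]]
probs = edge_attention_reference(edges, q, k)
```

Here the two edges into node 2 share one softmax, and the single edge into node 0 gets probability 1. What I am unsure about is whether the query should come from the source node, as written, or from the destination node; in the second case I suppose the arguments of u_dot_v would need to be swapped.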