I wrote a variation of the GAT layer in which edges carry full feature vectors, not just attention scores. We recently used it to process protein structures in our paper: Rossmann-toolbox: a deep learning-based protocol for the prediction and design of cofactor specificity in Rossmann fold proteins (PubMed). There we show an improvement over regular GAT and GCN layers when additional features (residue-residue interactions on edges) are supplied, especially as graph density increases.
In my opinion, it may be helpful not only in this particular case but in general, so I may contribute it to the DGL repository if it looks useful for other users. Following the DGL docs, I am posting about it here first.
A detailed description of the layer (with an image, ofc :)) is available here: https://academic.oup.com/bib/advance-article/doi/10.1093/bib/bbab371/6375059#supplementary-data
The source code is based on the dgl.nn.pytorch.conv module (DGL 0.8 documentation).
```python
egat = MultiHeadEGATLayer(in_node_feats=num_node_feats,
                          in_edge_feats=num_edge_feats,  # input edge feature size
                          out_node_feats=10,             # output node feature size
                          out_edge_feats=10,             # output edge feature size
                          num_heads=3,
                          activation=th.nn.functional.leaky_relu)  # add activation if needed

new_node_feats, new_edge_feats = egat(graph, node_feats, edge_feats)
# new_node_feats.shape = (*, num_heads, out_node_feats)
# new_edge_feats.shape = (*, num_heads, out_edge_feats)
```
Parameter naming and the source code are based on the DGL GATConv implementation.
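To make the idea concrete without pulling in DGL, here is a minimal single-head NumPy sketch of GAT-style attention where the edge feature is injected into the attention logit. This is only an illustration of the general mechanism, not the actual MultiHeadEGATLayer code: the concatenation scheme [z_src || z_edge || z_dst], the projection matrices `W`/`We`, and the attention vector `a` are my assumed names and simplifications; the exact formulation is in the linked supplement.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy directed graph: 3 nodes, 3 edges (src -> dst), each edge with its own feature vector.
edges = [(0, 1), (2, 1), (1, 0)]
h = rng.normal(size=(3, 4))               # node features: 3 nodes, dim 4
f = rng.normal(size=(len(edges), 2))      # edge features: 3 edges, dim 2

out_dim = 5
W = rng.normal(size=(4, out_dim))         # node projection (assumed)
We = rng.normal(size=(2, out_dim))        # edge projection (assumed)
a = rng.normal(size=(3 * out_dim,))       # attention vector over [z_src || z_edge || z_dst]

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

z = h @ W                                  # projected node features
ze = f @ We                                # projected edge features -> new edge representations

# Un-normalised attention logit per edge: edge feature participates directly,
# not just the endpoint node features as in plain GAT.
logits = np.array([
    leaky_relu(a @ np.concatenate([z[s], ze[k], z[d]]))
    for k, (s, d) in enumerate(edges)
])

# Softmax over the incoming edges of each destination node, then aggregate.
new_h = np.zeros_like(z)
for dst in range(h.shape[0]):
    idx = [k for k, (_, d) in enumerate(edges) if d == dst]
    if not idx:
        continue                           # nodes with no in-edges keep a zero update
    w = np.exp(logits[idx] - logits[idx].max())
    w /= w.sum()
    for alpha, k in zip(w, idx):
        new_h[dst] += alpha * z[edges[k][0]]

new_f = ze                                 # updated edge features returned alongside node features
```

In the multi-head version each head has its own `W`, `We`, and `a`, and the per-head outputs are stacked along a `num_heads` axis, which is where the `(*, num_heads, out_node_feats)` shapes in the usage example above come from.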
Looking forward to your comments.