[Blog] Understand Graph Attention Network

From Graph Convolutional Network (GCN), we learned that combining local graph structure and node-level features yields good performance on node classification tasks. However, the way GCN aggregates is structure-dependent, which may hurt its generalizability.


This is a companion discussion topic for the original entry at https://www.dgl.ai/blog/2019/02/17/gat.html

Complete code and notebook of this tutorial is available here: https://docs.dgl.ai/tutorials/models/1_gnn/9_gat.html.

Just wanted to know whether the GAT above can handle multiple node features.

Sounds like you want to deal with heterogeneous graphs. We are currently working on user-friendly support for them. Nevertheless, you should be able to work with different node features with some hacking. Feel free to provide more details if you want more concrete ideas.


Yes, exactly, but how to instrument GAT to handle multiple features? Please refer to the thread I started, Multiple Node & Edge features, where I got some answers, but not exactly what I needed.

We are designing a graph representation learning model to spot patterns for detecting fraudulent transactions, and also for defect detection/identification in the finance domain. Here the patterns are detected/identified through connected nodes/edges of various critical parameters/indicators. Each node has many features with corresponding values, and even the node/edge labels are factors in the representation learning.

Let’s say we have two kinds of node features n1, n2, and two kinds of edge features e1, e2. What detailed node/edge feature update would you like to perform (e.g., in the form of math equations)? I’ll try to prototype it with dgl.


Please find attached an example with a basic table to identify error patterns with two parameters, “Err Ind” and “Err Weightage”, which are derived from various other features: “Actual”, “Recorded”, “Diff”, “Diff in Value”. A UDF would suit this kind of scenario, and the expression may be specific to nodes/edges. Please suggest.

The table you present suggests that you have multiple node features. Then the simplest thing I would try is to concatenate all node features first and then perform a GNN node feature update as usual. I might also:

  1. Add a residual connection to preserve the original node features.
  2. Replace the linear layer with an MLP in the GNN node feature update.
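The recipe above (concatenate the feature sets, update with an MLP, keep a residual connection) can be sketched in plain PyTorch. This is a hypothetical illustration, not DGL code; the module name `NodeUpdate` and the dimensions are made up:

```python
import torch
import torch.nn as nn

class NodeUpdate(nn.Module):
    """Hypothetical sketch: fuse two node-feature sets with an MLP,
    keeping a residual (skip) connection to the raw features."""

    def __init__(self, in1, in2, hidden):
        super().__init__()
        # an MLP in place of a single linear layer
        self.mlp = nn.Sequential(
            nn.Linear(in1 + in2, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
        )

    def forward(self, f1, f2):
        h = torch.cat([f1, f2], dim=-1)   # concatenate the two feature sets
        out = self.mlp(h)
        # residual: keep the raw features alongside the updated ones
        return torch.cat([out, h], dim=-1)

f1 = torch.randn(5, 4)   # 5 nodes, 4-dim feature "n1"
f2 = torch.randn(5, 3)   # 5 nodes, 3-dim feature "n2"
model = NodeUpdate(4, 3, 8)
print(model(f1, f2).shape)  # torch.Size([5, 15])
```

In a real model the `mlp` call would sit inside the per-node update of a GNN layer; the concatenation-based residual grows the feature dimension, so an additive residual (with a projection) is a common alternative.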

Thanks for your quick update. A few queries about your suggestion:
#1 Is deep learning on node/edge labels factored into your recommendation?
#2 How do we handle a node feature whose values are strings, and sometimes strings and numeric values together?
#3 Is there any provision for learning on directed graph representations in DGL? I believe we may have a workaround by storing direction as one of the node/edge features.

  1. Just to get started, you can treat edge labels as additional messages, which can be sent with dgl.function.copy_edge. You can concatenate the edge labels with the features of the source nodes and perform a similar operation to my previous proposal.
  2. I would maintain a mapping between the strings and one-hot encodings/embeddings so that we can get rid of the strings.
  3. You can use DGL for directed graphs. Either learn on the raw directed graph, or add edges for the other direction and perform different operations on the raw edges and the reverse edges, for which our reverse transform might be helpful.
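Point 1 above can be mimicked in plain PyTorch to show what “send the edge label with the source feature” means, without depending on a particular DGL version. This is a hand-rolled sketch of the copy-edge-plus-reduce pattern; the graph, feature sizes, and sum reduction are all illustrative assumptions:

```python
import torch

# Hypothetical toy graph: 4 nodes, 4 directed edges.
num_nodes = 4
src = torch.tensor([0, 1, 2, 2])   # edge sources
dst = torch.tensor([1, 2, 3, 0])   # edge destinations
node_feat = torch.randn(num_nodes, 5)
edge_label = torch.randn(4, 2)     # one 2-dim label per edge

# message on each edge = [source-node feature ‖ edge label]
msg = torch.cat([node_feat[src], edge_label], dim=-1)

# reduce: sum the incoming messages at each destination node
agg = torch.zeros(num_nodes, msg.size(-1))
agg.index_add_(0, dst, msg)
print(agg.shape)  # torch.Size([4, 7])
```

The aggregated tensor `agg` would then feed the node-update MLP from the earlier proposal; in DGL the message/reduce pair is expressed with its built-in message and reduce functions instead of `index_add_`.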

Referring to your recommendation:
*** Add residual connection to preserve original node feature.**
What does a residual connection mean? Is it just adding some more features to a node?
*** Replace a linear layer by a MLP in GNN node feature update.**
I believe you are referring to feature extraction concepts here. We know the message passing techniques via reduce and apply-node functions. Do you mean something else? Can you please help me with some code snippets or a link on feature updates / MLP layers in DGL?

  1. For a residual connection, you can concatenate the updated node features with the raw node features so that no information is lost.
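The concatenation-style residual described above fits in two lines of PyTorch. Here a plain `nn.Linear` stands in for whatever GNN feature update is used; the sizes are illustrative:

```python
import torch
import torch.nn as nn

raw = torch.randn(6, 8)        # raw features of 6 nodes
layer = nn.Linear(8, 16)       # stand-in for a GNN node feature update
updated = torch.relu(layer(raw))

# residual by concatenation: the raw features survive unchanged
h = torch.cat([updated, raw], dim=-1)
print(h.shape)  # torch.Size([6, 24])
```

Note this differs from the additive residual of ResNet (`updated + raw`, which requires matching dimensions); concatenation preserves the original features exactly, at the cost of a growing feature size.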

Does the GAT take into account edge features/attributes? If not, how to include them?

How to create the kind of graph visualization shown in the link?

No, GAT does not take edge features into account.
It’s possible to extend GAT so that it can handle edge information, for example with relative-representation attention:

e_{uv} = \frac{(x_u W^Q)(x_v W^K + r_{uv}^K)^\top}{\sqrt{d}}, \qquad z_u = \sum_{v} \alpha_{uv} \left( x_v W^V + r_{uv}^V \right)

where r_{uv}^K and r_{uv}^V are two sets of edge features determined by the edge type. (Reference: https://www.aclweb.org/anthology/N18-2074)

Of course you could try other methods too.
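One simple alternative is to let the GAT-style attention score see an edge feature directly, by scoring [z_u ‖ z_v ‖ e_uv] instead of [z_u ‖ z_v]. The sketch below is a hypothetical single-head layer in plain PyTorch, not DGL's built-in GATConv; the class name, dimensions, and toy graph are all made up:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeAttention(nn.Module):
    """Hypothetical GAT-style layer whose attention logit also
    depends on a per-edge feature vector."""

    def __init__(self, node_dim, edge_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(node_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim + edge_dim, 1, bias=False)

    def forward(self, h, src, dst, e):
        z = self.W(h)
        # unnormalized score per edge from [z_u ‖ z_v ‖ e_uv]
        score = F.leaky_relu(self.attn(torch.cat([z[src], z[dst], e], dim=-1)))
        # softmax over the incoming edges of each destination node
        score = score.exp()
        denom = torch.zeros(h.size(0), 1).index_add_(0, dst, score)
        alpha = score / denom[dst]
        # weighted sum of transformed source features
        out = torch.zeros(h.size(0), z.size(-1)).index_add_(0, dst, alpha * z[src])
        return out, alpha

h = torch.randn(4, 5)              # 4 nodes, 5-dim features
src = torch.tensor([0, 1, 2])      # 3 edges
dst = torch.tensor([1, 1, 3])
e = torch.randn(3, 2)              # 2-dim edge features
model = EdgeAttention(5, 2, 8)
out, alpha = model(h, src, dst, e)
print(out.shape, alpha.shape)      # torch.Size([4, 8]) torch.Size([3, 1])
```

Nodes with no incoming edges keep a zero output here; a production layer would add self-loops, multiple heads, and numerically stable softmax, as the real GAT does.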


See How to plot the attention weights...?
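For the visualization question, a minimal sketch with NetworkX and Matplotlib draws a graph with edge widths proportional to attention weights, similar in spirit to the figures in the blog post. The graph and the attention values here are invented for illustration; in practice you would pull the weights out of your trained GAT layer:

```python
import networkx as nx
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt

# Toy graph and made-up attention weights, one per directed edge.
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
attn = [0.9, 0.4, 0.7, 0.2]

g = nx.DiGraph()
for (u, v), a in zip(edges, attn):
    g.add_edge(u, v, weight=a)

pos = nx.spring_layout(g, seed=0)
# edge width encodes the attention weight
widths = [5 * g[u][v]["weight"] for u, v in g.edges()]
nx.draw(g, pos, with_labels=True, width=widths, node_color="lightblue")
plt.savefig("attention.png")
```

Color maps (passing the weights as `edge_color` with a `edge_cmap`) are another common way to encode the attention values.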