Understanding Attention in Graph Networks for Chemical Structures

Hi everyone,

I’ve done a lot of deep learning work for chemical synthesis using SMILES representations and transformer models. These have worked quite well, but I want to work with graph-based networks as well to compare performance. I haven’t quite wrapped my head around GCNN/GATs yet, and I wanted to ask some clarifying questions.

Source vectors for attention

As I understand it, attention within a node is applied over all feature vectors sent to that node from direct neighbors. If this is correct, is this technique restricted to direct neighbors, or can a node gather and attend to all features in a graph?

Going off that, does the GAT structure allow a node to attend to a different disconnected graph? The specific example I’m looking at is the case of two chemical reactants, like these:

In graph form, these two molecules are separate graphs. Is it possible for nodes on one molecule graph to attend to or otherwise consider nodes on the other molecule graph? (For reference, the ability for one chemical token to attend to all tokens in the reaction is a major reason why transformers work extremely well on this task).

One thought I had when dealing with this was to run a transformer type attention layer over the entire feature vector after a message passing step, but this gets into my next question.

Handling batching when multiple examples contain disconnected graphs

Going off the documentation, batching graphs is handled by merging graphs like so:

Batching implies that each input example is a single discrete graph. Now consider two chemical examples:

Here we have two examples. Two reactant sets. Each reactant set contains two separate molecule graphs. If I packaged both examples into a graph batch, it would look the same as four separate examples. How do I represent this kind of data in batched form that preserves the notion that sometimes separate graphs belong to the same example?


Source vectors for attention

In practice, this is only applied to direct neighbors. To make a node gather and attend to all features, you can convert molecular graphs into complete ones, where each atom is connected to all atoms.

Similarly, for cross-graph attention you can also merge the two graphs into one graph and add connections between them.

Handling batching when multiple examples contain disconnected graphs

My rough impression is that you want to distinguish the reactant membership in the batched graph. Ideally, the membership can be part of graph data and handled by dgl.batch automatically. Unfortunately this has not been supported yet. As a workaround, since we know that the batching operation preserves the order of graphs in the list to dgl.batch, we can maintain a list for their membership separately.

1 Like

With respect to creating a complete graph, there is an advantage to maintaining bond specific information. For example in this paper:


They make use of convolutions over bonds and convolutions over the entire molecule space. Is this something that could be done with a heterogenous graph? For example there could be one edge type for bonds that connect locally, and one edge type for global attention.

That’s quite fair. I believe it’s possible to perform this with a heterogeneous graph. Let me know if you encounter issues in implementation.

1 Like