GATConv: mismatch between nodes and features


A newbie question here: I’m trying to define a simple GAT network for batched graph classification using the GATConv module from DGL:

class GATClassifier(nn.Module):
    def __init__(self, in_dim, out_dim, n_heads, n_classes):
        super(GATClassifier, self).__init__()

        # Add GATConv layers
        self.gat_1 = GATConv(in_feats=in_dim, out_feats=out_dim, num_heads=n_heads)
        self.gat_2 = GATConv(in_feats=out_dim, out_feats=out_dim, num_heads=n_heads)

        # Add linear classifier
        self.classify = nn.Linear(out_dim, n_classes)

    def forward(self, g, features):

        x = features

        x = self.gat_1(g, x)
        x = self.gat_2(g, x)

        g.ndata['h'] = x

        hg = dgl.mean_nodes(g, 'h')

        return self.classify(hg)

This returns the following error message:

dgl._ffi.base.DGLError: Expect number of features to match number of nodes (len(u)). Got 406 and 203 instead.

Each node in my batched graph has a 9-dimensional feature vector and the batch contains 203 nodes, so the input feature matrix is 203 x 9.

What am I doing wrong?

This is probably due to how the attention heads are handled. For example, if you used two attention heads, the first GATConv may have given you something like two tensors, each of shape (203, 9), which were then accidentally concatenated along the wrong dimension, yielding a tensor of shape (406, 9)?

Sounds plausible, as I defined two attention heads for the model.

How should I account for this in the model definition?

See the examples here. The output of a GATConv is of shape (N, H, M), where N is the number of nodes, H the number of heads, and M the output size of each head. You need to either flatten the output or average it over the heads. If you choose to flatten, the input size of the next layer needs to be multiplied by the number of heads in the current layer.
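To make the two options concrete, here is a minimal sketch that only uses PyTorch. The random tensor is a stand-in for a GATConv output, and the sizes (203 nodes, 2 heads, 16 features per head, 8 output features) are illustrative assumptions rather than values from the model above, apart from the node and head counts mentioned in this thread.

```python
import torch
import torch.nn as nn

# Illustrative sizes: nodes, heads, per-head output size
N, H, M = 203, 2, 16

# Stand-in for the output of a multi-head GATConv layer: shape (N, H, M)
out = torch.randn(N, H, M)

# Option 1: flatten the heads into the feature dimension -> (N, H * M)
flat = out.flatten(1)

# Option 2: average over the head dimension -> (N, M)
avg = out.mean(dim=1)

# The next layer's input size has to match the option chosen
next_after_flat = nn.Linear(H * M, 8)  # flattened: H * M input features
next_after_avg = nn.Linear(M, 8)       # averaged: M input features

print(flat.shape, avg.shape)
print(next_after_flat(flat).shape, next_after_avg(avg).shape)
```

Either way the first dimension stays equal to the number of nodes, which is what the next graph layer expects.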


As always, thanks for your help @mufeili! These examples make a lot of sense – I was looking at the code for the GAT tutorial, which lacks the averaging / flattening during the forward pass.


Glad that helps. In fact the tutorial does the trick with the code block below:

class MultiHeadGATLayer(nn.Module):
    def __init__(self, g, in_dim, out_dim, num_heads, merge='cat'):
        super(MultiHeadGATLayer, self).__init__()
        self.heads = nn.ModuleList()
        for i in range(num_heads):
            self.heads.append(GATLayer(g, in_dim, out_dim))
        self.merge = merge

    def forward(self, h):
        head_outs = [attn_head(h) for attn_head in self.heads]
        if self.merge == 'cat':
            # concat on the output feature dimension (dim=1)
            return torch.cat(head_outs, dim=1)
        else:
            # merge using average
            return torch.mean(torch.stack(head_outs))

Unfortunately, the nn modules were not ready by the time of the tutorial.
