Many thanks Mufeili 
I am trying to implement it and I am having some issues!
I have a batch of 20 graphs, each graph has 8 initial node features, and the hidden dimension for the graph is 4.
If I take my regular classifier and substitute my previous pooling method with global attention pooling (e.g. below), I get the error:
TypeError: expected Tensor as element 0 in argument 0, but got Global Attention Pooling
import torch
import torch.nn as nn
import torch.nn.functional as F
from dgl.nn.pytorch import GraphConv, GlobalAttentionPooling

class Classifier2(nn.Module):
    def __init__(self, in_dim, hidden_dim, added_dim, n_classes):
        super(Classifier2, self).__init__()
        self.conv1 = GraphConv(in_dim, hidden_dim)
        self.conv2 = GraphConv(hidden_dim, hidden_dim)
        self.dense1 = nn.Linear(hidden_dim + added_dim, hidden_dim + added_dim)
        self.classify = nn.Linear(hidden_dim + added_dim, n_classes)

    def forward(self, g, extra_feats):
        # Use the stored node features as input.
        h = g.ndata['h_n'].float()
        # Perform graph convolution and activation function.
        h = F.relu(self.conv1(g, h))
        h = F.relu(self.conv2(g, h))
        g.ndata['h'] = h
        # Pool the node representations into a graph representation.
        # This is the line that raises the TypeError above.
        hg = GlobalAttentionPooling(g)
        d1 = self.dense1(torch.cat((hg, extra_feats), dim=1))
        return self.classify(d1)
I’ve tried to see what I get if I just return hg as above, and the size is 20x1 (batch size x 1); I wonder how to interpret that.
I’ve also tried to check what I get from hg = GlobalAttentionPooling.forward(g), and it has dimensions 13x1, which I don’t understand.
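After reading the docs, I wonder if I’m supposed to construct the pooling layer in __init__ with a gate network (a small nn.Linear mapping hidden_dim to 1) and then call it on (g, h) in forward. Is something like this sketch what it expects? (The names self.gate_nn and self.pool are just my own guesses at how to wire it up.)

class Classifier2(nn.Module):
    def __init__(self, in_dim, hidden_dim, added_dim, n_classes):
        super(Classifier2, self).__init__()
        self.conv1 = GraphConv(in_dim, hidden_dim)
        self.conv2 = GraphConv(hidden_dim, hidden_dim)
        # gate network that scores each node with a single scalar
        self.gate_nn = nn.Linear(hidden_dim, 1)
        self.pool = GlobalAttentionPooling(self.gate_nn)
        self.dense1 = nn.Linear(hidden_dim + added_dim, hidden_dim + added_dim)
        self.classify = nn.Linear(hidden_dim + added_dim, n_classes)

    def forward(self, g, extra_feats):
        h = g.ndata['h_n'].float()
        h = F.relu(self.conv1(g, h))
        h = F.relu(self.conv2(g, h))
        # pooled graph representation, expecting shape (batch_size, hidden_dim) = (20, 4)
        hg = self.pool(g, h)
        d1 = self.dense1(torch.cat((hg, extra_feats), dim=1))
        return self.classify(d1)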
Could you help me a bit here, please? As you mentioned before, I’d like to get the attention weights to use as a measure of significance for the nodes.
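For those weights, I was imagining recomputing the gate scores with the same gate network and normalizing them per graph, roughly like the sketch below (node_attention_weights is a hypothetical helper of mine, and I’m only guessing that dgl.softmax_nodes matches what the pooling layer does internally). Would that give the weights the layer actually uses, or is there a built-in way to get them out?

import dgl

def node_attention_weights(model, g):
    # recompute node representations the same way as in forward
    h = g.ndata['h_n'].float()
    h = F.relu(model.conv1(g, h))
    h = F.relu(model.conv2(g, h))
    with g.local_scope():
        # raw gate scores, one scalar per node: (total_num_nodes, 1)
        g.ndata['gate'] = model.gate_nn(h)
        # softmax over the nodes of each graph in the batch
        alpha = dgl.softmax_nodes(g, 'gate')
    return alpha  # (total_num_nodes, 1), sums to 1 within each graph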
I’m also curious whether something similar can be done with Set2Set pooling?