Hi all,

I have a set of 400 graphs, all of the same size (n=25). There are no node features.

The graphs all have similar degree distributions (drawn from the same distribution), but half of them have high clustering-coefficient (a node’s neighbors are likely to be neighbors) and the other half have low clustering coefficient. Another way of putting this: I have two sets of graphs with the same degree sequences but one has strong community structure and the other does not.

I want to train a GNN to classify the graphs, but this has been surprisingly difficult.

I started with the standard approach of giving each node its degree as a node feature, but having tried a variety of combinations using GraphConv layers, mean/max pooling, intermediate readout etc etc, but nothing has worked.

When I say nothing works, I mean that the loss does not decrease **at all**.

What does work, trivially, is giving each node a clustering feature (like number of triangles it is part of) instead of degree, but that is not very interesting.

I have read in the literature that the graph convolution architecture cannot distinguish some simple structures, and is at most as powerful as the WL-algorithm. Is this the explanation why it is performing so poorly? And do you have any suggestions on what architecture would be able to pick up on graph clustering?

Thank you for your input.