Help! Cannot get GCN model's loss to decrease

I was training a GCN model on a self-collected dataset.
Specifically, I constructed many small graphs and batched them together during training, with batch_size=100.
For each node, I extracted a 2048-dim ResNet feature; the goal is to do binary classification for each graph.
Loss: BCELoss

GCN model:
Connection: bidirectional, fully connected edges for each graph
Model: 3 layers (2048->128->1), all GCN layers

MLP model:
For verification, I trained an MLP (2048->128->128->1, ReLU in between) on this dataset, and it quickly overfits:

whereas the GCN model cannot:

Any suggestions? Thanks in advance.

Fully connected edges will result in all nodes having the same feature after one GraphConv (if you sum/mean over all the neighbors). You may want to try different topologies.
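A minimal NumPy sketch of this effect (assuming self-loops are included and omitting the learnable weight matrix for clarity): mean aggregation over a complete graph maps every node to the global mean, so all node features collapse to the same vector.

```python
import numpy as np

# Hypothetical toy example: 4 nodes with random 3-dim features.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))

# Adjacency of a fully connected graph *with* self-loops: all ones.
A = np.ones((4, 4))

# Mean aggregation over all neighbors (one GraphConv-style step,
# without the learnable projection).
H = (A @ X) / A.sum(axis=1, keepdims=True)

# Every row of H is the same vector: the global mean of X.
print(np.allclose(H, X.mean(axis=0)))  # True
```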

Oh, that’s true! Thanks for your reply! In my problem, there’s one node that’s crucial for determining the label of the whole graph, while the remaining nodes can be treated as a whole. That’s why I chose to fully connect all the nodes with each other: I figured the remaining nodes may also need to share some information internally. Besides changing the graph operator and aggregation operator, as you proposed, are there other topologies you’d suggest trying?

@jialidua In your case maybe you don’t need graph neural networks at all. If the features of the crucial nodes are indicative enough, then simply performing a weighted sum of node features with gating will work pretty well. If you really want to use graph neural networks, you may start with a similarity graph, where a pair of nodes is connected only when the norm of their feature difference is smaller than a pre-specified threshold.
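For reference, a minimal sketch of that similarity-graph construction in NumPy; `similarity_graph`, the feature values, and the threshold are all hypothetical, not part of any library API:

```python
import numpy as np

def similarity_graph(X, threshold):
    """Connect i and j when ||x_i - x_j|| < threshold (no self-loops)."""
    diff = X[:, None, :] - X[None, :, :]   # (n, n, d) pairwise differences
    dist = np.linalg.norm(diff, axis=-1)   # (n, n) Euclidean distances
    return (dist < threshold) & ~np.eye(len(X), dtype=bool)

# Hypothetical features forming two tight clusters.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
adj = similarity_graph(X, threshold=1.0)
print(adj.sum())  # 4: each within-cluster pair is connected in both directions
```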

Thank you @mufeili for the suggestion! I may have been a bit misleading by saying “one node is crucial for determining the graph label”. Actually, this node may be replaced by another node, which changes the graph label depending on its relation with the remaining nodes.

No weighting or ordering information about the nodes is given. In the MLP experiment, I simply summed up the features of the nodes in each graph across dimensions, and it did not work so well on the validation set.

Assuming we have a good graph construction approach, typically GNN based graph level prediction works as follows:

  1. 2-3 layers of graph convolution/message passing
  2. A weighted sum of the updated node representations, where the weights can be computed by first projecting each node representation to a scalar and then applying a sigmoid

Various tricks can be tried, like normalization of node representations, skip connections, etc. As pointed out by “How Powerful are Graph Neural Networks?”, you may want to use sum for neighbor feature aggregation.

To avoid overfitting, you can use dropout and early stopping on the validation set.
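Step 2 above (the gated weighted sum readout) can be sketched in NumPy as follows; the node representations and the learnable projection are shown as fixed random arrays, purely for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_sum_readout(H, w_gate):
    """Project each node representation to a scalar, squash it with a
    sigmoid, and use the result to weight a sum over all nodes."""
    gates = sigmoid(H @ w_gate)     # (n, 1) per-node weights in (0, 1)
    return (gates * H).sum(axis=0)  # (d,) graph-level representation

# Hypothetical updated node representations after message passing.
rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))         # 5 nodes, 8-dim features
w_gate = rng.normal(size=(8, 1))    # learnable projection (fixed here)

g = gated_sum_readout(H, w_gate)
print(g.shape)  # (8,)
```

The graph-level vector `g` would then be fed to a final classifier (e.g. a linear layer plus sigmoid for BCELoss).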


You can try a GAT/Transformer-like model to see what weights are learned from your features.


Also, a connected graph may be enough. A fully connected graph doesn’t help when using GCN.


Awesome! Thanks for the suggestions!

Thanks for pointing it out!

I tried using GAT during training, and while debugging I was confused by some phenomena:

The code snippet is here: basically, I create a bunch of graphs on the fly and batch them together so that I can get updated GAT features in a single pass.

The debugged variable “batch” contains two graphs and 18 connections.

There are features in node “92” before propagation:

However, after propagation, the feature disappeared (using the same gat.py from the GitHub repo, except that graph g is moved to an input parameter of the forward function: https://gist.github.com/davidsonic/8cd693ecf75408b656a4e48e87c8eac2):

What’s more confusing is that I tried propagation again (no backward pass was performed yet).


This time, the feature appeared for this node!

What if I compute GAT propagation on only one graph? I get yet another result.

I understand why one more propagation gives a different result. But why are the results of forwarding batch and graphs[0] different? If this is the case, doesn’t it mean that batch training and individual training will get different results?

Putting that aside, I still can’t get the loss to decrease.

What’s the topology you used for these graphs?

They are complete graphs with bidirectional edges. Self-loops are also added. E.g.:
coo_row: 1 1 2 2 3 3 1 2 3
coo_col: 2 3 1 3 1 2 1 2 3

The total number of nodes is fixed for all graphs (100), but only part of the nodes have connections among them, as above.
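One way to generate the same edge set as the COO example above (a hypothetical helper, not library code): a complete bidirectional graph over a node subset, with self-loops, is just the Cartesian product of the subset with itself.

```python
import numpy as np

def complete_coo(nodes):
    """COO rows/cols for a complete bidirectional graph over `nodes`,
    including self-loops (as in the example above)."""
    rows, cols = [], []
    for i in nodes:
        for j in nodes:
            rows.append(i)
            cols.append(j)
    return np.array(rows), np.array(cols)

rows, cols = complete_coo([1, 2, 3])
print(len(rows))  # 9 edges: 3 bidirectional pairs + 3 self-loops
```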

I’m not sure why it’s zero. One possible scenario is that the corresponding node doesn’t have incoming edges; in that case its value would be updated to zero.
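A quick NumPy illustration of that scenario, with a hypothetical 3-node graph and plain sum aggregation: a node with no in-neighbors sums over an empty set and comes out all zeros.

```python
import numpy as np

# Adjacency with A[i, j] = 1 meaning an edge from j to i
# (j is an in-neighbor of i). Node 2 has no incoming edges.
A = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 0, 0]])
X = np.ones((3, 4))  # 3 nodes, 4-dim features, all ones

H = A @ X  # sum aggregation over in-neighbors

print(H[2])  # node 2 receives nothing, so its updated feature is all zeros
```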

I think there might be some other reason, since each node is fully connected. I will debug the source code in more detail later. BTW, I have figured out a way to get the loss to decrease using a variant of GCN with the current topology. Thank you @VoVAllen for the discussion!