Accuracy drops when I add a GraphConv layer in graph classification?

I have built a graph classification model similar to the one in this tutorial:
https://docs.dgl.ai/en/0.4.x/tutorials/basics/4_batch.html

on my own graph dataset. The problem is that when I remove all the GraphConv layers from the example above and keep only an nn.Linear layer, I get good accuracy even after 100 epochs. But as soon as I add even a single GraphConv layer, even one whose input and output have the same length (so you would think it could at least learn the identity function…), I can't overfit the training samples anymore: the loss on TRAINING samples ends up a lot higher than with the linear layer alone, even after 5000 epochs (I ran 5000 epochs to make sure it wasn't just that it hadn't found good parameters yet).

So why is that? Also, where is the message-passing part of GCN in their graph classification example? Does GraphConv take care of that? If not, why didn't they include it in their example?
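For reference, the Classifier in that tutorial looks roughly like this (reproduced from memory, so minor details may differ); my "linear only" variant just drops the two GraphConv layers and applies nn.Linear directly to the mean-pooled node features:

```python
import dgl
import torch.nn as nn
import torch.nn.functional as F
from dgl.nn.pytorch import GraphConv

class Classifier(nn.Module):
    def __init__(self, in_dim, hidden_dim, n_classes):
        super(Classifier, self).__init__()
        self.conv1 = GraphConv(in_dim, hidden_dim)
        self.conv2 = GraphConv(hidden_dim, hidden_dim)
        self.classify = nn.Linear(hidden_dim, n_classes)

    def forward(self, g):
        h = g.in_degrees().view(-1, 1).float()  # tutorial uses in-degrees; I use word2vec vectors
        h = F.relu(self.conv1(g, h))
        h = F.relu(self.conv2(g, h))
        g.ndata['h'] = h
        hg = dgl.mean_nodes(g, 'h')  # graph-level readout: mean over node features
        return self.classify(hg)     # one row of logits per graph
```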

  1. Could you elaborate on your graph dataset? What kind of graphs are you dealing with? What kind of topology do they have? What do you use for the initial node features?
  2. Have you tried tuning the hyperparameters?
  3. GraphConv implements message passing internally; see the sketch after this list.
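To make point 3 concrete, here is a rough, unofficial sketch of what GraphConv computes (essentially h' = D^{-1/2} A D^{-1/2} h W; the real implementation in the DGL source handles more options and edge cases):

```python
import torch.nn as nn
import dgl.function as fn

class SimpleGraphConv(nn.Module):
    """Rough sketch of GraphConv: a symmetrically normalized
    neighborhood sum (the message passing), then a linear projection."""
    def __init__(self, in_feats, out_feats):
        super(SimpleGraphConv, self).__init__()
        self.linear = nn.Linear(in_feats, out_feats)

    def forward(self, g, feat):
        g = g.local_var()                    # don't clobber the caller's node data
        degs = g.in_degrees().float().clamp(min=1)
        norm = degs.pow(-0.5).unsqueeze(-1)  # D^{-1/2}
        g.ndata['h'] = feat * norm
        # message passing: every node sums its in-neighbors' features;
        # note an isolated node receives nothing, so its output is zero
        # unless it has a self-loop
        g.update_all(fn.copy_src('h', 'm'), fn.sum('m', 'h'))
        h = g.ndata['h'] * norm              # normalize on the destination side too
        return self.linear(h)
```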
  1. There is no fixed topology; imagine many graphs ranging from 5 to 4000 nodes, some of which may contain isolated nodes. It's a social network dataset (custom), and I created the graphs with dgl.DGLGraph directly (I didn't convert from networkx). The initial node features are word2vec vectors, and they are actually fine: even when I just average them per graph and feed them to a simple decision tree (roughly the baseline sketched after this list), the accuracy is good. I was hoping a GCN would make it better.

  2. Yes, I tried changing the hidden layer size, the number of layers, the learning rate, and the batch size.
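The baseline I mean in item 1 is roughly this (`graphs` and `y` are placeholders for my list of DGLGraphs, with the word2vec vectors stored in `ndata['feat']`, and my binary labels):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# mean-pool the word2vec node features of each graph into one vector
X = np.stack([g.ndata['feat'].numpy().mean(axis=0) for g in graphs])
clf = DecisionTreeClassifier().fit(X, y)
print(clf.score(X, y))  # training accuracy of the baseline
```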

The only difference between my model and the example on the DGL website is that mine does binary classification. Do you suggest I use CrossEntropyLoss or something else? Also, should I add a softmax to the output of self.classify(hg)?

  1. If your graphs have isolated nodes, you may want to add self-loops (i.e. an edge from a node to itself) so that those nodes preserve their own features during message passing.
  2. For binary classification, you can use BCEWithLogitsLoss. It operates on raw logits, so you don't need a softmax (or sigmoid) after self.classify(hg). Both points are sketched below.
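A minimal sketch of both points, using the 0.4-style DGLGraph API from the tutorial (the logits and labels tensors at the end are stand-ins for your model's output, with self.classify having a single output unit, and your targets):

```python
import dgl
import torch
import torch.nn as nn

# Self-loops: do this for every graph when you build the dataset.
g = dgl.DGLGraph()
g.add_nodes(4)                       # node 3 is isolated
g.add_edges([0, 1], [1, 2])
g.add_edges(g.nodes(), g.nodes())    # self-loop on every node, so isolated
                                     # nodes still keep their own features

# Binary classification: one raw logit per graph, no softmax/sigmoid.
criterion = nn.BCEWithLogitsLoss()
logits = torch.randn(3)              # stand-in for self.classify(hg).squeeze(-1)
labels = torch.tensor([1., 0., 1.])  # float targets in {0., 1.}
loss = criterion(logits, labels)
```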