What happens if there are isolated nodes in the input of graph classification with GCN?

Let's say we want to classify graphs using a GCN (as in the example provided on the DGL website).

  1. What happens if the graph has multiple isolated nodes? Will this cause any problems in training, and should I remove them? Does it really matter? An isolated node wouldn't participate in message passing, but it would still affect the output at the end, where we average over the nodes, so it isn't wasted either. So there shouldn't be any problem, right?

  2. Let's say we give it a graph in which all nodes are isolated but do have features. What happens if we pass it to the GCN? Will it basically just feed each node's features to a simple one-layer neural net (GraphConv) and then average the outputs at the end?

And if isolated nodes do need to be removed, how can I do that if I generated the graphs using DGLGraph (so they are not networkx graphs)?

Isolated nodes do not cause trouble. When you apply message passing to the graph with DGL, the output features of isolated nodes are filled with zeros, so they do not affect training.

But wouldn't they affect the final prediction when we take the mean over all the node features?

Hmm, yes, you are right if you use only the output tensor of message passing for classification (or some other task).

However, if your graph has node features (say h_self), you can add the feature tensor to the message passing output (say h_neigh) to get the final result: h = h_self + h_neigh (as we do in GraphSAGE). For isolated nodes h_neigh is all zeros, so h equals h_self, which makes sense because only the node's own features matter for isolated nodes in a node classification task.
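To make the h = h_self + h_neigh idea concrete, here is a minimal plain-Python sketch. It is not DGL code: `sum_neighbors` and `combine` are hypothetical helper names, it uses a plain sum over in-neighbors, and it leaves out the learned weights a real layer would apply.

```python
def sum_neighbors(num_nodes, edges, feats):
    """Sum the features of each node's in-neighbors; edges are (src, dst) pairs."""
    dim = len(feats[0])
    h_neigh = [[0.0] * dim for _ in range(num_nodes)]
    for src, dst in edges:
        for k in range(dim):
            h_neigh[dst][k] += feats[src][k]
    return h_neigh


def combine(h_self, h_neigh):
    """Element-wise h = h_self + h_neigh for every node."""
    return [[a + b for a, b in zip(s, n)] for s, n in zip(h_self, h_neigh)]


# Toy graph: nodes 0 and 1 are connected in both directions, node 2 is isolated.
feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
edges = [(0, 1), (1, 0)]

h_neigh = sum_neighbors(3, edges, feats)
h = combine(feats, h_neigh)

print(h_neigh[2])  # [0.0, 0.0]  (no incoming messages)
print(h[2])        # [5.0, 6.0]  (equal to the node's own features)
```

The isolated node receives no messages, so its neighbor term stays at zero and its final representation is just its own feature vector.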

The common practice is to add self-loops to the graph, which avoids isolated nodes.

In the latest DGL v0.5 release, the behavior of the GCN module has changed: it now raises an error if there are isolated nodes, since they might lead to worse final results.

I have a similar problem training a SAGE model. For isolated nodes that do have other textual features, what's the right way to keep them in training without having to filter them out beforehand?

You can add self-loops to the graph so that every node has at least one edge. Alternatively, some NN modules (e.g. NN Modules (PyTorch) — DGL 0.7.2 documentation) have an allow_zero_in_degree option, which you can set to True in your case.
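As a rough illustration of what adding self-loops does to node degrees, here is a plain-Python sketch on an edge list. The helpers `add_self_loops` and `in_degrees` are hypothetical names for this example; in DGL itself the corresponding call is `dgl.add_self_loop`, which is not exercised here.

```python
def add_self_loops(num_nodes, edges):
    """Return a new edge list with exactly one (i, i) edge per node."""
    # Drop existing self-loops first so that no node ends up with two of them.
    without_loops = [(s, d) for s, d in edges if s != d]
    return without_loops + [(i, i) for i in range(num_nodes)]


def in_degrees(num_nodes, edges):
    """Count incoming edges per node; edges are (src, dst) pairs."""
    deg = [0] * num_nodes
    for _, dst in edges:
        deg[dst] += 1
    return deg


edges = [(0, 1), (1, 0)]  # node 2 is isolated
looped = add_self_loops(3, edges)

print(in_degrees(3, edges))   # [1, 1, 0]
print(in_degrees(3, looped))  # [2, 2, 1]
```

After the self-loops are added, no node has zero in-degree, so a module that rejects zero-in-degree nodes would have nothing to complain about.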

Following this discussion, I was wondering how a GCN behaves in graph-level classification if some of the nodes are isolated but have a self-loop (i.e. are connected only to themselves). Are those isolated nodes with self-loops taken into account, just without any message passing happening on them? Thanks in advance!

Yes. The isolated nodes also contribute to the final graph readout as those node embeddings are also included in aggregation for the final predicted class.

These isolated nodes will be taken into account in the final prediction where we combine the representations of all nodes for computing graph-level representations. They just do not exchange information with other nodes in the message passing phase.
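A small plain-Python sketch of this behavior, under simplifying assumptions: one GCN-style propagation step with symmetric degree normalization, no learned weights, and a mean readout. The names `gcn_propagate` and `mean_readout` are hypothetical, and this is not DGL's implementation.

```python
import math


def gcn_propagate(num_nodes, edges, feats):
    """h_dst = sum over edges (src, dst) of feats[src] / sqrt(out_deg[src] * in_deg[dst])."""
    in_deg = [0] * num_nodes
    out_deg = [0] * num_nodes
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    dim = len(feats[0])
    h = [[0.0] * dim for _ in range(num_nodes)]
    for src, dst in edges:
        norm = 1.0 / math.sqrt(out_deg[src] * in_deg[dst])
        for k in range(dim):
            h[dst][k] += norm * feats[src][k]
    return h


def mean_readout(h):
    """Average node representations into a single graph-level vector."""
    n = len(h)
    return [sum(row[k] for row in h) / n for k in range(len(h[0]))]


# Nodes 0 and 1 are connected; node 2 has only a self-loop.
feats = [[2.0], [4.0], [6.0]]
edges = [(0, 1), (1, 0), (0, 0), (1, 1), (2, 2)]

h = gcn_propagate(3, edges, feats)
print(h[2])                 # [6.0]  (node 2 only sees its own feature)
print(mean_readout(h))      # [4.0]  (readout over all three nodes)
print(mean_readout(h[:2]))  # [3.0]  (readout if node 2 were dropped)
```

Node 2 exchanges nothing with the other nodes, yet including it shifts the graph-level representation from 3.0 to 4.0, which is the contribution to the readout described above.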

This makes a lot of sense, thank you! @neo @mufeili