The effect of noisy features on learning purely structural information at the node level

I’ve been teaching myself GNNs by experimenting with some basic ideas. One of my fundamental questions was whether a GNN could predict node degree as a node-level regression target. At first glance this seems theoretically straightforward: with a constant feature of one on every node, sum aggregation over neighbors should make it easy to learn degrees.
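To make the "ones + sum aggregation = degree" intuition concrete, here is a minimal pure-Python sketch. The toy graph and the `sum_aggregate` helper are hypothetical illustrations, not taken from DGL or PyG; they just show that one round of neighbor summation over all-ones features counts each node's neighbors.

```python
# Toy undirected graph as an adjacency list (hypothetical example graph).
adjacency = {
    0: [1, 2, 3],
    1: [0, 2],
    2: [0, 1],
    3: [0],
}

def sum_aggregate(adj, features):
    """One round of sum aggregation: each node sums its neighbors' features."""
    return {node: sum(features[nbr] for nbr in nbrs) for node, nbrs in adj.items()}

# Dummy all-ones features: summing them over neighbors counts the neighbors.
ones = {node: 1.0 for node in adjacency}
aggregated = sum_aggregate(adjacency, ones)
degrees = {node: len(nbrs) for node, nbrs in adjacency.items()}
print(aggregated)  # {0: 3.0, 1: 2.0, 2: 2.0, 3: 1.0}
```

With informative constant features, the structure is exposed directly by the aggregation, so even a single layer with weight 1 recovers the degree.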

However, initial node features can be either relevant to the target problem or entirely irrelevant, so I considered both scenarios. To explore the irrelevant case, I used random noise as node features. Surprisingly, a single GNN layer couldn’t learn effectively from the noisy inputs, and I started to wonder why.
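One possible explanation for the single-layer failure can be checked with a small sketch. It assumes a simplified GraphConv-style layer with scalar features, where the bias is added after aggregation: `h_i = w_root * x_i + w_nbr * sum_{j in N(i)} x_j + b`. The `one_layer` function and the weights are hypothetical, chosen only for illustration. Such a layer is affine in the features, so `h(X) + h(-X) = 2b` for every node. If the noise is symmetric (e.g., zero-mean Gaussian), `X` and `-X` are equally likely inputs, and a layer that predicted degree for both would force every node's degree to equal `b`.

```python
import random

adjacency = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}

def one_layer(adj, x, w_root, w_nbr, b):
    """Simplified single GraphConv-style layer with scalar features (assumed form):
    h_i = w_root * x_i + w_nbr * sum_{j in N(i)} x_j + b, bias added after aggregation."""
    return {i: w_root * x[i] + w_nbr * sum(x[j] for j in nbrs) + b
            for i, nbrs in adj.items()}

random.seed(0)
x = {i: random.gauss(0.0, 1.0) for i in adjacency}  # zero-mean noise features
x_neg = {i: -v for i, v in x.items()}               # equally likely under symmetric noise

# Arbitrary weights: the identity below holds for every choice.
w_root, w_nbr, b = 0.7, -1.3, 0.4
h = one_layer(adjacency, x, w_root, w_nbr, b)
h_neg = one_layer(adjacency, x_neg, w_root, w_nbr, b)

# h(X) + h(-X) equals 2b at every node (up to float rounding), independent of degree.
for i in adjacency:
    print(i, round(h[i] + h_neg[i], 10))
```

This does not rule out memorizing one fixed noise draw, but it shows a single affine layer cannot act as a degree predictor that works across noise draws.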

Through experimentation, I found that reaching zero loss required two GNN layers, each with a single linear layer and, crucially, a bias term. For example, GraphConv without bias didn’t converge, and adding self-loops didn’t help as I had expected. With bias enabled, however, the model could adapt to the noisy features and converge. This suggests that the bias term may help offset the noise, allowing the model to focus on the structural information.
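A hand-constructed example suggests why two layers with a bias are enough. This is a sketch under the assumption of a simplified scalar GraphConv-style layer (the `graph_conv` helper is hypothetical, not a library API), and it shows only that a zero-loss solution exists, not what a trained model actually learns. Layer 1 sets its weights to zero and its bias to 1, so it outputs the constant 1 regardless of the noise; layer 2 then sums those ones over neighbors, which is exactly the degree.

```python
import random

adjacency = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}

def graph_conv(adj, x, w_root, w_nbr, b, relu=False):
    """Simplified GraphConv-style layer with scalar features (an assumption, not a
    library API): h_i = w_root * x_i + w_nbr * sum_{j in N(i)} x_j + b."""
    out = {}
    for i, nbrs in adj.items():
        z = w_root * x[i] + w_nbr * sum(x[j] for j in nbrs) + b
        out[i] = max(z, 0.0) if relu else z
    return out

def two_layer_degree(adj, x):
    # Layer 1: zero weights + bias 1 -> every node outputs the constant 1.0,
    # whatever the (noisy) input features are.
    h1 = graph_conv(adj, x, w_root=0.0, w_nbr=0.0, b=1.0, relu=True)
    # Layer 2: pure neighbor sum over the constant ones -> node degree.
    return graph_conv(adj, h1, w_root=0.0, w_nbr=1.0, b=0.0)

random.seed(42)
noise = {i: random.gauss(0.0, 1.0) for i in adjacency}
print(two_layer_degree(adjacency, noise))  # {0: 3.0, 1: 2.0, 2: 2.0, 3: 1.0}
```

Setting the layer-1 bias to 0 in this construction makes every output 0, which matches the intuition that the bias is what lets the first layer replace the noise with an informative constant.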

This finding made me question why message-passing layers in libraries like DGL and PyG sometimes leave out the bias. Adding a bias term helped here by giving the model extra flexibility to filter out the noise and focus on the graph structure. Some might argue that biases are redundant when self-loops are present, but my results showed otherwise.
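One way to see why self-loops cannot stand in for a bias: a network built only from bias-free linear maps, sum aggregation (with or without self-loops), and ReLU is positively homogeneous, so scaling the input noise by 2 exactly doubles every output while the target degrees stay the same. The sketch below assumes a hypothetical bias-free layer with a single scalar weight (`layer_no_bias`); the scaling argument itself does not depend on that simplification. It does not prove training fails on one fixed noise draw, but it shows such a network cannot represent a degree predictor that is independent of the noise scale.

```python
import random

adjacency = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
# Add self-loops: each node also aggregates its own feature.
with_self_loops = {i: nbrs + [i] for i, nbrs in adjacency.items()}

def layer_no_bias(adj, x, w, relu):
    """Hypothetical bias-free layer with scalar features: h_i = w * sum_{j in N(i)} x_j."""
    out = {}
    for i, nbrs in adj.items():
        z = w * sum(x[j] for j in nbrs)
        out[i] = max(z, 0.0) if relu else z
    return out

def two_layer_no_bias(adj, x):
    h1 = layer_no_bias(adj, x, w=0.9, relu=True)  # arbitrary weights
    return layer_no_bias(adj, h1, w=1.1, relu=False)

random.seed(1)
x = {i: random.gauss(0.0, 1.0) for i in adjacency}
x_scaled = {i: 2.0 * v for i, v in x.items()}  # same noise, twice the scale

out = two_layer_no_bias(with_self_loops, x)
out_scaled = two_layer_no_bias(with_self_loops, x_scaled)
# Without a bias the network is positively homogeneous: doubling the noise
# doubles every output, while the target degrees stay the same.
for i in adjacency:
    print(i, out[i], out_scaled[i])
```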

What are your thoughts on this? Do you have any alternative explanations? My initial question, whether GNNs can learn purely from structure rather than from initial features, has led to some interesting insights. In particular, the bias term seems valuable when the model needs to disregard noisy features and learn primarily from structure.