Graph / Node / Edge Normalization Techniques

I am trying to get a better understanding of normalization in graph neural networks. Traditionally, when building multilayer perceptrons, it is recommended that you normalize the data to prevent an exploding gradient problem. For a tabular dataset, this means normalizing each column independently of every other column in the dataset.

I am getting a little confused when reading about some of the normalization functions in DGL. For example, dgl.transforms.RowFeatNormalizer seems to make the features of each node sum to 1. This doesn’t make sense to me. What if each value in the array represents a completely different quantity?

  1. If we use a sports example, each node represents a player. I have a 5-value feature array for a player [1, 675, 3, 7, 0], and the values represent [seasons played, minutes played, goals, assists, fouls]. From my understanding, the RowFeatNormalizer would transform this data so the values would add up to 1, correct? Wouldn’t you want to normalize each value within its respective column, not relative to the other values in the array?

  2. For heterogeneous graphs that have multiple node types with feature vectors of different sizes (that also represent different subjects), it is recommended to apply a linear transformation to the features so they have the same size for message passing to work. Should normalization happen before the linear transformation, or after it? For example, suppose a viewer node has 3 values [25, 2, 10] that represent [age, income level, weeks watched] and an item node has 5 values that are one-hot encoded to represent the genre [0,0,1,0,0]. Should the viewer features be projected from 3 to 5 values and then normalized? Or should they be normalized first and then projected to a 5-value feature array?

I’m a beginner with graphs, so I’m trying to learn the main differences between traditional machine learning data pre-processing and how it’s utilized in graphs. Thank you!

Your concerns make perfect sense.

For your first question, indeed pre-packaged normalizers such as RowFeatNormalizer usually make their own assumptions about the data (for instance, RowFeatNormalizer assumes that the semantics of the different feature dimensions are not too different). You could take a look at scikit-learn’s normalizers for more information.
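
To make the difference concrete, here is a minimal NumPy sketch comparing row normalization (dividing each node's features by their sum, as described in the question) with per-column standardization. The first row is the player from the question; the other two players and all variable names are made up for illustration, and this is not DGL's own implementation.

```python
import numpy as np

# Hypothetical player features: [seasons, minutes, goals, assists, fouls].
# Row 0 is the example from the question; rows 1-2 are invented.
X = np.array([
    [1.0, 675.0, 3.0, 7.0, 0.0],
    [4.0, 2100.0, 12.0, 5.0, 9.0],
    [2.0, 300.0, 0.0, 1.0, 2.0],
])

# Row normalization: every node's feature vector sums to 1.
row_norm = X / X.sum(axis=1, keepdims=True)

# Column standardization: every feature has zero mean and
# (approximately) unit variance across nodes.
eps = 1e-8  # guards against division by zero for constant columns
col_std = (X - X.mean(axis=0)) / (X.std(axis=0) + eps)

print(row_norm[0])
print(col_std[0])
```

With row normalization, the first player's minutes played ends up at about 0.98 of the total, so the other four features are squeezed close to zero. Column standardization keeps each feature on its own scale, which is usually what you want when the columns measure unrelated things.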

For your second question, in my experience it’s better to normalize the features first, as a preprocessing step, so that the values have roughly the same scale, which makes training easier.
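
As a rough sketch of that order of operations (normalize, then project), the snippet below standardizes hypothetical viewer features per column and then maps them from 3 to 5 dimensions. The random matrix `W` is only a stand-in for a learnable linear layer, and the data, names, and sizes are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical viewer features: [age, income level, weeks watched].
viewer_x = np.array([
    [25.0, 2.0, 10.0],
    [40.0, 5.0, 3.0],
    [31.0, 3.0, 52.0],
])

# Hypothetical item features: one-hot genre over 5 genres.
item_x = np.eye(5)[[2, 0, 4]]

def standardize(x, eps=1e-8):
    """Zero-mean, unit-variance scaling per column."""
    return (x - x.mean(axis=0)) / (x.std(axis=0) + eps)

# Step 1: normalize the raw viewer features as a preprocessing step.
viewer_scaled = standardize(viewer_x)

# Step 2: project to the shared hidden size. In a real model W and b
# would be trained parameters (e.g. a linear layer per node type).
W = rng.normal(scale=0.1, size=(3, 5))
b = np.zeros(5)
viewer_h = viewer_scaled @ W + b

print(viewer_h.shape, item_x.shape)
```

The one-hot item features are already on a 0/1 scale, so they are left as they are here. When there is a train/test split, the mean and standard deviation would normally be computed on the training nodes only and reused for the rest.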
