Graph / Node / Edge Normalization Techniques

I am trying to get a better understanding of normalization in graph neural networks. Traditionally, when building multi-layer perceptrons, it is recommended that you normalize the data to prevent exploding gradients. With a tabular dataset, this means normalizing each column independently of every other column.

I am getting a little confused when reading some of the normalization functions in DGL. For example, dgl.transforms.RowFeatNormalizer seems to make the features of each node sum to 1. This doesn’t make sense to me. What if the values of the array represent completely different subjects?

  1. If we use a sports example, each node represents a player. Say a player has a 5-value feature array [1, 675, 3, 7, 0], where the values represent [seasons played, minutes played, goals, assists, fouls]. From my understanding, RowFeatNormalizer would transform this data so the values add up to 1, correct? Wouldn’t you want to normalize each value against its respective column, not against the other values in the array?

  2. For heterogeneous graphs with multiple node types whose feature vectors have different sizes (and represent different subjects), it is recommended to apply a linear transformation to the features so they have the same size for message passing to work. Should normalization happen before or after that linear transformation? For example, a viewer node has 3 values [25, 2, 10] representing [age, income level, weeks watched], and an item node has 5 values one-hot encoded to represent the genre [0, 0, 1, 0, 0]. Should the linear projection happen on the viewer feature from 3 to 5 values, followed by normalization? Or should the features be normalized first and then projected to a 5-value array?

I’m a beginner with graphs, so I’m trying to learn the main differences between traditional machine learning data pre-processing and how it’s utilized with graphs. Thank you!

Your concerns make perfect sense.

For your first question: indeed, pre-packaged normalizers such as RowFeatNormalizer usually make their own assumptions about the data (for instance, RowFeatNormalizer assumes that the semantics of the different feature dimensions are not too different). You could take a look at scikit-learn’s normalizers for more information.
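
To illustrate the difference, here is a minimal sketch using scikit-learn. The first row reuses the player features from the question; the second row is made up so the column statistics are non-trivial. Normalizer with an L1 norm reproduces RowFeatNormalizer’s "each row sums to 1" behaviour for non-negative features, while StandardScaler is the usual per-column scaling:

    import numpy as np
    from sklearn.preprocessing import Normalizer, StandardScaler

    # Rows are players; columns are [seasons, minutes, goals, assists, fouls].
    X = np.array([[1, 675, 3, 7, 0],
                  [3, 2100, 12, 4, 5]], dtype=float)

    # Row-wise: each player's features are divided by their sum,
    # so every row adds up to 1.
    print(Normalizer(norm='l1').fit_transform(X))

    # Column-wise: each feature is standardized independently of the
    # others, which is the usual tabular-ML preprocessing.
    print(StandardScaler().fit_transform(X))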

For your second question: in my experience it’s better to normalize the features first, as a preprocessing step, so that the values have roughly the same scale, which facilitates training.
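
A minimal sketch of that order of operations, using the viewer/item example from the question (the standardization and the projection layer here are illustrative, not a specific DGL API):

    import torch
    import torch.nn as nn

    # Raw viewer features: [age, income level, weeks watched].
    viewer_feats = torch.tensor([[25., 2., 10.],
                                 [40., 5., 3.]])

    # 1) Normalize each column first (zero mean, unit variance).
    mean = viewer_feats.mean(dim=0, keepdim=True)
    std = viewer_feats.std(dim=0, keepdim=True)
    viewer_norm = (viewer_feats - mean) / (std + 1e-8)

    # 2) Then project the normalized 3-value features into the same
    #    5-dimensional space as the one-hot item features.
    proj = nn.Linear(3, 5)
    viewer_hidden = proj(viewer_norm)  # shape: (num_viewers, 5)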

What about when we first need to load the graphs to obtain the feature data? For example, I am using node centralities as features. @BarclayII

Could you elaborate more on your use case? Like, you want to normalize the node features after loading the graphs?

I have two different sources of features. One of them comes from a CSV, which I can normalize prior to loading the DGL graphs; I decided to do this after seeing the present discussion. However, some issues have now come up.

The other source of features is node centrality, which I calculate after loading the DGL graphs: I convert the graph from DGL to NetworkX, calculate the centralities, and finally add them to the features. For example, for eigenvector centrality:

    if 'eigen' in self.ablationFeatures:
        print("calculating eigenvector!")
        # Convert the DGL graph to an undirected NetworkX graph.
        aux_graph = self.graph.to_networkx()
        nx_graph = nx.Graph(aux_graph)
        # Compute eigenvector centrality for every node.
        eigen_scores = nx.eigenvector_centrality_numpy(nx_graph, max_iter=5000)
        eigen_scores_list = list(eigen_scores.values())
        # Normalize the scores before storing them as features.
        eigen_tensor = torch.tensor(self.normalize_scores(eigen_scores_list, globalNormMode))
        # Append the centrality column to the 2D feature tensor.
        self.graph.ndata[feat2d] = dynamicConcatenate(self.graph.ndata, eigen_tensor, feat2d)
        if 'Eigen' not in self.namesOfFeatures:
            self.namesOfFeatures.append('Eigen')

def dynamicConcatenate(featTensor, tensor2, feat2d):
    # Append tensor2 as a new column of the feature tensor stored under
    # key feat2d, creating the entry if it does not exist yet.
    if feat2d in featTensor:
        if featTensor[feat2d].dim() == 1:
            ret = torch.cat((featTensor[feat2d].unsqueeze(1), tensor2.unsqueeze(1)), dim=1)
        else:
            ret = torch.cat((featTensor[feat2d], tensor2.unsqueeze(1)), dim=1)
    else:
        ret = tensor2
    return ret

I am unsure about my code for a couple of reasons:

  1. When normalizing values from the CSV (prior to loading the graphs into DGL), there is a challenge: some nodes need to be removed. However, I cannot simply remove their rows from the CSV because, for each graph, I also have a CSV for the edges. So I opted to set the label of the nodes I wish to remove to -1, and when I load the DGL graph I call remove_nodes() on the nodes whose label is -1 (see the first sketch after this list). I assume it is safer to use DGL’s remove_nodes() than to try removing the nodes myself, considering the pair of CSVs. Is this approach reasonable?

  2. I am using a single 2D tensor in ndata to store all the features, so I have to keep a parallel list with the names of the features. Considering I have different sources of features, this leaves too much room for errors and inconsistencies. Should I instead store multiple 1D tensors for the input features, each under its own name in the DGL graph’s ndata (see the second sketch after this list)?
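
Regarding item 1, here is a minimal sketch of the label-based removal, assuming a 'label' field that holds -1 for nodes to drop (the toy graph is illustrative; in practice it would come from the node/edge CSV pair):

    import dgl
    import torch

    g = dgl.graph((torch.tensor([0, 1, 2]), torch.tensor([1, 2, 3])))
    g.ndata['label'] = torch.tensor([0, -1, 1, -1])

    # Drop every node whose label was set to -1; DGL also removes the
    # incident edges and relabels the remaining nodes consecutively.
    to_remove = torch.nonzero(g.ndata['label'] == -1, as_tuple=True)[0]
    g = dgl.remove_nodes(g, to_remove)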
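
Regarding item 2, ndata behaves like a dictionary, so each feature source can live under its own key as a 1D tensor, which removes the need for a parallel name list. A minimal sketch, continuing from the graph above (the feature names are illustrative):

    # Store each feature source under its own key...
    g.ndata['eigen'] = torch.rand(g.num_nodes())
    g.ndata['degree'] = g.in_degrees().float()

    # ...and stack the desired columns into a 2D input matrix at model time.
    feat_names = ['eigen', 'degree']
    x = torch.stack([g.ndata[name] for name in feat_names], dim=1)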