I’m trying to build an inductive node-classification model on my own data (separate graphs), and I added a confusion matrix to the evaluate() function.

When summing the confusion matrices of all batches, I noticed that the total count in the matrix equals the sum of the squared node counts of the graphs, rather than just the number of graphs.

This is because the evaluate function scores the predictions for all nodes in the graph once per node, rather than only for the ‘subject node’. Each node is therefore effectively evaluated as many times as there are nodes in its graph.

Why is the validation set up in this way? For inductive node classification, I’d expect that only the label of the ‘subject node’ is relevant, since all other labels in the graph were passed as input.
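For reference, this is roughly how I would restrict the confusion matrix to the subject nodes. The names here (`subject_idx` in particular) are just placeholders for whatever my own loader would provide, not an existing API:

```python
import torch

def subject_confusion(preds, labels, subject_idx, num_classes):
    """Confusion matrix counting only the subject node of each graph.

    preds/labels: per-node predictions and ground-truth labels for a batch,
    subject_idx: index of the subject node in each graph (placeholder name).
    """
    p = preds[subject_idx]
    t = labels[subject_idx]
    # Encode (true, pred) pairs as a single index, then count them.
    flat = t * num_classes + p
    return torch.bincount(flat, minlength=num_classes ** 2).reshape(
        num_classes, num_classes
    )
```

Summing these per-batch matrices would then give a total equal to the number of graphs, which is what I expected.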

As a second question, I would like to train the network using only the classification of the other nodes as features. I would attempt to do this by just passing the labels of the other nodes as features. Is this a correct approach? How do I ensure the label for the ‘subject node’ is not included?
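To illustrate what I have in mind: one-hot encode the labels as node features and zero out the subject node’s row, so its own label never reaches the model. Again, all names here are assumptions about my setup, just a sketch:

```python
import torch
import torch.nn.functional as F

def label_features(labels, subject_idx, num_classes):
    """One-hot label features with the subject node's label hidden.

    labels: per-node label tensor for one graph (or batch),
    subject_idx: index/indices of the subject node(s) (placeholder name).
    """
    feats = F.one_hot(labels, num_classes).float()
    feats[subject_idx] = 0.0  # hide the label we are trying to predict
    return feats
```

Is zeroing the row like this the right way to do it, or is there a built-in mechanism for masking the target label?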