Minibatching in message-passing GNN training

Let me preface this by saying I am relatively new to GNNs.

If one's dataset is a single large graph, how does one do mini-batching for message-passing GNN training? Do you randomly sample nodes, construct a local subgraph around each, and repeat, so that a mini-batch consists of several of these subgraphs? If some nodes at the edge of a subgraph were connected to nodes outside it, wouldn't you be eliminating those connections at the boundary, and thus making message passing inaccurate?

The idea you describe is mini-batch training based on graph partitioning, and you are right that there can be some information loss due to the edges between partitions being dropped. An example of this approach is ClusterGCN.
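
For concreteness, here is a minimal sketch of partition-based mini-batching. The library choice (PyTorch Geometric's `ClusterData`/`ClusterLoader`, plus Cora as a stand-in dataset) is my assumption, not something specified above; the point is that each batch is a subgraph stitched together from a few partitions, and edges crossing partition boundaries are removed.

```python
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.loader import ClusterData, ClusterLoader
from torch_geometric.nn import GCNConv

# A single large graph (Cora here, just as a stand-in).
data = Planetoid(root="/tmp/Cora", name="Cora")[0]

# Partition the graph into clusters (METIS under the hood).
# Edges between partitions are dropped, which is exactly the
# information loss discussed above.
cluster_data = ClusterData(data, num_parts=32)

# Each mini-batch merges a few partitions into one subgraph.
loader = ClusterLoader(cluster_data, batch_size=4, shuffle=True)

class GCN(torch.nn.Module):
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hid_dim)
        self.conv2 = GCNConv(hid_dim, out_dim)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)

model = GCN(data.num_features, 64, int(data.y.max()) + 1)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for batch in loader:  # each batch is a small Data object
    optimizer.zero_grad()
    out = model(batch.x, batch.edge_index)
    loss = F.cross_entropy(out[batch.train_mask], batch.y[batch.train_mask])
    loss.backward()
    optimizer.step()
```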

An alternative idea is mini-batch training based on neighbor sampling: to update the representation of a node in a single GNN layer, the node gathers messages only from a random sample of its neighbors. An example of this approach is GraphSAGE.
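
And a corresponding sketch of neighbor-sampling mini-batches, again assuming PyTorch Geometric (`NeighborLoader`): `num_neighbors=[10, 10]` samples at most 10 neighbors per node per layer for a 2-layer GNN, and the loss is computed only on the seed nodes at the front of each batch.

```python
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.loader import NeighborLoader
from torch_geometric.nn import SAGEConv

data = Planetoid(root="/tmp/Cora", name="Cora")[0]

# For each mini-batch of seed nodes, sample at most 10 neighbors
# per node for each of the 2 GNN layers.
loader = NeighborLoader(
    data,
    num_neighbors=[10, 10],
    batch_size=128,
    input_nodes=data.train_mask,
    shuffle=True,
)

class SAGE(torch.nn.Module):
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.conv1 = SAGEConv(in_dim, hid_dim)
        self.conv2 = SAGEConv(hid_dim, out_dim)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)

model = SAGE(data.num_features, 64, int(data.y.max()) + 1)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for batch in loader:
    optimizer.zero_grad()
    out = model(batch.x, batch.edge_index)
    # Only the first `batch_size` nodes are seed nodes; the rest
    # are sampled neighbors included solely to provide messages.
    loss = F.cross_entropy(out[:batch.batch_size], batch.y[:batch.batch_size])
    loss.backward()
    optimizer.step()
```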