Recently, I’ve been training a medium-sized heterogeneous graph neural network. The model first updates the embeddings of all ~140,000 nodes in the graph with two rgcn_layer layers, which make use of the relation types in the heterogeneous graph. It then takes the node indices of each node pair from the training-set edges (dst_nodes), uses those indices to look up the corresponding node embeddings, and feeds them to a custom linear layer that predicts the relation type between the two nodes. Finally, the cross-entropy loss of the relation-type prediction is computed and used to update the model.
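A minimal PyTorch sketch of the pipeline described above, assuming the graph is given as a plain edge list (src, dst, etype) and learnable input embeddings. SimpleRGCNLayer and EdgeTypeClassifier are hypothetical stand-ins for the real rgcn_layer and custom linear layer, and the concatenate-then-linear classifier, hidden size, and toy graph are illustrative assumptions, not the original code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleRGCNLayer(nn.Module):
    """Hypothetical stand-in for rgcn_layer: one weight matrix per relation
    type plus a self-loop, mean-aggregated over each node's incoming edges."""

    def __init__(self, in_dim, out_dim, num_rels):
        super().__init__()
        self.rel_weight = nn.Parameter(torch.randn(num_rels, in_dim, out_dim) * 0.01)
        self.self_loop = nn.Linear(in_dim, out_dim)

    def forward(self, h, src, dst, etype):
        num_nodes, out_dim = h.size(0), self.rel_weight.size(2)
        out = h.new_zeros(num_nodes, out_dim)
        # Loop over relations so no per-edge copy of a weight matrix is built.
        for r in range(self.rel_weight.size(0)):
            mask = etype == r
            msg = h[src[mask]] @ self.rel_weight[r]  # messages along relation r
            out = out.index_add(0, dst[mask], msg)   # sum into destination nodes
        # Mean over incoming edges; nodes without in-edges keep a zero sum.
        deg = torch.bincount(dst, minlength=num_nodes).clamp(min=1).to(h.dtype)
        return out / deg.unsqueeze(1) + self.self_loop(h)


class EdgeTypeClassifier(nn.Module):
    """Hypothetical model: full-graph RGCN, then classify gathered node pairs."""

    def __init__(self, num_nodes, hidden_dim, num_rels, num_classes):
        super().__init__()
        self.embed = nn.Embedding(num_nodes, hidden_dim)
        self.layer1 = SimpleRGCNLayer(hidden_dim, hidden_dim, num_rels)
        self.layer2 = SimpleRGCNLayer(hidden_dim, hidden_dim, num_rels)
        self.predict = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, src, dst, etype, pair_src, pair_dst):
        # Every node and every edge is processed on every step.
        h = self.embed.weight
        h = F.relu(self.layer1(h, src, dst, etype))
        h = self.layer2(h, src, dst, etype)
        # Only here does the batch of node pairs come into play.
        pair_feat = torch.cat([h[pair_src], h[pair_dst]], dim=1)
        return self.predict(pair_feat)


torch.manual_seed(0)
num_nodes, num_rels, num_classes = 50, 22, 23  # 22 edge types + 1 negative class
src = torch.randint(0, num_nodes, (300,))
dst = torch.randint(0, num_nodes, (300,))
etype = torch.randint(0, num_rels, (300,))

model = EdgeTypeClassifier(num_nodes, 16, num_rels, num_classes)
opt = torch.optim.Adam(model.parameters(), lr=0.01)

# One training step on a batch of 64 labelled pairs.
pair_src, pair_dst, labels = src[:64], dst[:64], etype[:64]
logits = model(src, dst, etype, pair_src, pair_dst)
loss = F.cross_entropy(logits, labels)
opt.zero_grad()
loss.backward()
opt.step()
```

In this structure the two RGCN layers always run over the whole graph; batch_size only changes the length of pair_src/pair_dst, i.e. the gather and the final linear layer.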
In addition, I’ve generated negative sample pairs and treat “non-existent edge” as one more edge type to predict. So the classifier effectively distinguishes 23 classes, the graph’s 22 edge types plus one negative-sample type, and a single cross-entropy loss is computed over all of them together.
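One way to build such a training set is sketched below, assuming uniform random corruption with rejection of existing edges (the post does not say how its negatives are drawn). The helper names sample_negative_pairs and build_training_pairs are hypothetical. The negative class simply takes the next free label index, so 22 edge types give label 22 and a 23-way output.

```python
import random


def sample_negative_pairs(src, dst, num_nodes, num_neg, neg_label, seed=0):
    """Hypothetical helper: draw random (u, v) pairs that are not existing
    edges and give them all the extra 'no edge' label."""
    rng = random.Random(seed)
    existing = set(zip(src, dst))
    neg_src, neg_dst = [], []
    # Rejection sampling; fine for sparse graphs, slow for very dense ones.
    while len(neg_src) < num_neg:
        u = rng.randrange(num_nodes)
        v = rng.randrange(num_nodes)
        if (u, v) not in existing:
            neg_src.append(u)
            neg_dst.append(v)
    return neg_src, neg_dst, [neg_label] * num_neg


def build_training_pairs(src, dst, etype, num_nodes, num_neg, num_edge_types=22):
    """Concatenate positive edges (labels 0..21) with negatives (label 22)."""
    neg_src, neg_dst, neg_lab = sample_negative_pairs(
        src, dst, num_nodes, num_neg, neg_label=num_edge_types)
    return list(src) + neg_src, list(dst) + neg_dst, list(etype) + neg_lab


src = [0, 1, 2, 3]
dst = [1, 2, 3, 0]
etype = [0, 5, 21, 7]
all_src, all_dst, labels = build_training_pairs(src, dst, etype, num_nodes=10, num_neg=4)
```

Because positives and negatives end up in the same label vector, one cross-entropy call over 23 classes covers both, matching the combined loss described above.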
However, I noticed something surprising, though perhaps predictable: regardless of the batch_size setting, GPU memory usage barely changes and stays at around 20 GB. My own guess is that the node embeddings of the entire graph, both before and after the update, are already held in GPU memory, so memory usage hardly depends on how many node pairs I then index from them.
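A rough back-of-envelope comparison supports that guess. The sketch below assumes float32 tensors, that each layer keeps one output embedding per node and one message per edge alive for the backward pass, and it ignores weights, optimizer state, and library-specific temporaries. Only the 140,000-node count comes from the post; the 2,000,000 edges and hidden size of 128 are made-up placeholders, so the absolute numbers do not explain the 20 GB, only the scaling.

```python
def full_graph_bytes(num_nodes, num_edges, hidden_dim, num_layers, bytes_per_float=4):
    # Per layer: node outputs plus per-edge messages, kept for backprop.
    per_layer = (num_nodes + num_edges) * hidden_dim * bytes_per_float
    return num_layers * per_layer


def batch_bytes(batch_size, hidden_dim, num_classes, bytes_per_float=4):
    # Gathered src/dst embeddings, their concatenation, and the logits.
    gathered = 2 * batch_size * hidden_dim
    concat = 2 * batch_size * hidden_dim
    logits = batch_size * num_classes
    return (gathered + concat + logits) * bytes_per_float


# 140,000 nodes is from the post; edge count and hidden size are assumptions.
graph = full_graph_bytes(140_000, 2_000_000, 128, num_layers=2)
for bs in (256, 1024, 4096):
    print(bs, batch_bytes(bs, 128, 23) / graph)
```

Under these assumptions the full-graph term is about 2.19 GB, while even batch_size=4096 adds only 0.4% on top of it, so changing the batch size barely moves the total.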
Since I don’t have much prior experience training graph models, I’d like to know whether this approach is reasonable. What problems could it cause? So far, training seems to be running smoothly…