DGL gpu utilization rate is too low

Hello! I found that my GPU utilization was too low, so I increased my batch size from 128 to 1024, but the training time stayed roughly the same. I find it strange that such a large increase in batch size does not speed up my program. Is this a characteristic of DGL, or is something else going on?

How large is your batched graph at batch sizes 128 and 1024? How many layers does the model have, and how wide is each layer? Usually this happens because the GNN workload is very light. For example, many GNN models have only two or three layers, and each layer performs one sparse operation for message passing plus one dense operation for the linear projection, both of which are very fast on a GPU. If the model is not heavy enough, GPU utilization can be quite low.
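To illustrate the "one sparse op + one dense op per layer" point, here is a minimal NumPy sketch of a single GCN-style layer. The sizes are toy values chosen for illustration, and the dense identity matrix stands in for the (normally sparse) adjacency matrix:

```python
import numpy as np

# Toy sizes for illustration only (scaled down from a real batched graph).
n_nodes, hidden = 1000, 150

A = np.eye(n_nodes, dtype=np.float32)                    # stand-in adjacency matrix
H = np.random.rand(n_nodes, hidden).astype(np.float32)   # node features
W = np.random.rand(hidden, hidden).astype(np.float32)    # layer weights

# One GCN-style layer = one sparse op + one dense op:
messages = A @ H       # message passing (a sparse matmul in a real framework)
H_next = messages @ W  # linear projection (dense matmul)
print(H_next.shape)
```

Both matmuls at these sizes finish in well under a millisecond on a GPU, which is why a two- or three-layer model cannot keep the device busy.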

I’m using the 4-layer GCN model from the paper https://arxiv.org/abs/2003.00982 for graph classification.

Take the MNIST superpixel graphs as an example. Each graph has ~50 nodes, so batching 128 and 1024 of them gives roughly 6K and 50K nodes in the batched graph. The hidden size is around 150. Hence, the linear projection at each layer is a multiplication of a 50Kx150 matrix by a 150x150 matrix, which is a very small workload, especially for a powerful GPU. You can try increasing the hidden size by 10x or 20x, and you should see the utilization go up.
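To make the "small workload" point concrete, here is a rough back-of-the-envelope estimate in plain Python. The node count and hidden size are the approximate figures quoted above, not exact dataset values:

```python
# Approximate figures from the discussion above (not exact dataset values).
nodes_per_graph = 50
hidden = 150

for batch_size in (128, 1024):
    n_nodes = nodes_per_graph * batch_size
    # Dense linear projection per layer: (n_nodes x hidden) @ (hidden x hidden)
    # costs ~2 * n_nodes * hidden^2 floating point operations.
    flops = 2 * n_nodes * hidden * hidden
    print(f"batch {batch_size}: ~{n_nodes} nodes, ~{flops / 1e9:.2f} GFLOPs per layer")
```

Even at batch size 1024 this is only a couple of GFLOPs per layer; a GPU capable of ~10 TFLOP/s finishes that in well under a millisecond, so fixed per-batch overheads (kernel launches, data loading) dominate and the measured training time barely changes with batch size.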