Size of NodeBatch and EdgeBatch


I’m working on a Tree-LSTM implementation like the one in the tutorial, and it works well. But when I look at GPU utilization and GPU memory allocation, it’s only about 20%, and it doesn’t increase when I make a bigger batch of graphs; as a result, training time doesn’t change either.
As I understand it, this is because of the size of NodeBatch in the UDFs, so my questions are:
Why can’t I change the size of NodeBatch, and can you give me advice on how to increase GPU utilization? Maybe some hacks?

By default, DGL groups nodes with the same in-degree and batches them together, and the batch size is the maximum number of nodes that can be reduced together (yes, we do not partition them further, so there is no need for the user to specify a batch_size; it is already maximal).
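To make the bucketing behavior concrete, here is a minimal sketch (plain Python, not DGL internals) of degree bucketing: destination nodes are grouped by in-degree, and each bucket becomes one NodeBatch, so its size is fixed by the graph topology rather than by any user-tunable parameter. The edge list is illustrative.

```python
from collections import defaultdict

# toy edge list: (src, dst) pairs
edges = [(0, 3), (1, 3), (2, 4), (3, 4), (4, 5)]

# compute in-degrees of destination nodes
in_deg = defaultdict(int)
for _, dst in edges:
    in_deg[dst] += 1

# bucket nodes by in-degree: every bucket is processed as one dense
# NodeBatch, so its size is simply the number of nodes sharing that degree
buckets = defaultdict(list)
for node, deg in in_deg.items():
    buckets[deg].append(node)

for deg, nodes in sorted(buckets.items()):
    print(f"in-degree {deg}: NodeBatch of {len(nodes)} node(s): {nodes}")
```

This is why making the batch of graphs bigger only helps to the extent that it puts more nodes into the same degree bucket at the same traversal step.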

The reason GPU utilization is low is that DGL spends most of the time on the CPU side (since you called prop_nodes): the topological sort is executed on the CPU, and we do not yet use techniques such as multi-threading to accelerate it. We have a plan to improve this:
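For intuition on where that CPU time goes, here is a hedged sketch of the kind of level-by-level topological traversal (Kahn's algorithm) that node propagation has to compute on the CPU before any GPU kernel can launch; the function name and toy graph are illustrative, not DGL's actual implementation.

```python
from collections import defaultdict

def topo_frontiers(num_nodes, edges):
    """Return lists of nodes that can be processed together, level by level."""
    in_deg = [0] * num_nodes
    out = defaultdict(list)
    for src, dst in edges:
        in_deg[dst] += 1
        out[src].append(dst)
    # nodes with no incoming edges (the leaves of a tree) form the first level
    frontier = [v for v in range(num_nodes) if in_deg[v] == 0]
    levels = []
    while frontier:
        levels.append(frontier)
        nxt = []
        for v in frontier:
            for w in out[v]:
                in_deg[w] -= 1
                if in_deg[w] == 0:
                    nxt.append(w)
        frontier = nxt
    return levels

# a small tree: leaves first, then internal node, then root
edges = [(0, 3), (1, 3), (2, 4), (3, 4)]
print(topo_frontiers(5, edges))  # [[0, 1, 2], [3], [4]]
```

Each level here corresponds to one round of message passing; the GPU sits idle while this sequential, per-level bookkeeping runs on the CPU, which is why utilization stays low regardless of batch size.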