The speed of NodeDataLoader

I found that when the batch size is large, setting shuffle=False speeds up dgl.dataloading.NodeDataLoader a lot.
Is there any way to speed up NodeDataLoader other than setting shuffle=False?

I benchmarked with OGB products on an AWS g3.16x instance and did not see a large speed difference.

Could you share your benchmarking code and hardware settings?
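
For comparison, a minimal timing harness along these lines would make the two runs easier to line up. This is only a sketch: `time_epochs` and `fake_loader` are hypothetical names, and the stand-in loader just yields index ranges so the script runs without DGL. To measure the real thing, the factory would presumably build the loader from your own graph, node IDs, and sampler, once with shuffle=True and once with shuffle=False.

```python
import time
from statistics import mean


def time_epochs(make_loader, epochs=3, warmup=1):
    """Return (mean seconds per full pass, batches in the last pass).

    make_loader is a zero-argument callable that builds a fresh iterable
    each time, so every pass starts from the same state.
    """
    timings = []
    n_batches = 0
    for epoch in range(warmup + epochs):
        loader = make_loader()
        n_batches = 0
        start = time.perf_counter()
        for _ in loader:  # only iterate; no model work is timed
            n_batches += 1
        elapsed = time.perf_counter() - start
        if epoch >= warmup:  # discard warm-up passes
            timings.append(elapsed)
    return mean(timings), n_batches


def fake_loader(num_nodes=10_000, batch_size=1_000):
    """Stand-in for a real loader: yields contiguous index ranges."""
    return (
        range(i, min(i + batch_size, num_nodes))
        for i in range(0, num_nodes, batch_size)
    )


if __name__ == "__main__":
    secs, batches = time_epochs(fake_loader)
    print(f"{batches} batches, {secs:.6f} s per pass (mean)")
    # Assumed usage with DGL (not run here; g, train_nids, sampler are yours):
    # time_epochs(lambda: dgl.dataloading.NodeDataLoader(
    #     g, train_nids, sampler, batch_size=1024, shuffle=True))
```

Reporting the batch size, number of workers, and sampler fanout alongside these timings would help narrow down where the difference comes from.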