Question about adaptive_sampling example

I ran the adaptive_sampling example from this link.
I found that memory usage increases every epoch, and the program is soon killed due to an out-of-memory problem.

To be more specific, my DGL version is dgl-cuda10.0 0.4 and my operating system is Linux.
All I do is run the program with python --batch_size 20 --node_per_layer 40 and then use free -h to check the memory usage; memory usage grows very fast every epoch.

Hi, could you please put these lines of code under a with torch.no_grad(): context and see if that helps? Thanks.
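A minimal sketch of that suggestion, with a placeholder model and input tensor standing in for the actual adaptive_sampling code: under torch.no_grad(), autograd does not build a computation graph, so intermediate activations are freed immediately instead of accumulating across iterations (a common cause of steadily growing memory when inference code is run in training mode).

```python
import torch

model = torch.nn.Linear(4, 2)  # placeholder for the example's model
x = torch.randn(20, 4)         # placeholder for a batch of node features

with torch.no_grad():
    out = model(x)  # no autograd graph is recorded for this forward pass

# out carries no gradient history, so nothing is retained for backward
print(out.requires_grad)  # False
```

Any forward pass that is only used for evaluation or sampling (never backpropagated through) can be wrapped this way without changing its results.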