Question about adaptive_sampling example

I ran the adaptive_sampling example from this link: https://github.com/dmlc/dgl/tree/master/examples/pytorch/adaptive_sampling
I found that memory usage increases every epoch, and the program is soon killed due to an out-of-memory problem.

To be more specific, my DGL version is dgl-cuda10.0 0.4 and my operating system is Linux.
All I do is run `python adaptive_sampling.py --batch_size 20 --node_per_layer 40` and then use `free -h` to check the memory usage; the memory usage grows very quickly every epoch.
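For reference, instead of eyeballing `free -h`, the same growth can be logged from inside the script by printing the process's resident memory at the end of each epoch. A minimal sketch using `psutil` (a third-party package, not part of the example):

```python
import os
import psutil  # assumption: installed via `pip install psutil`

proc = psutil.Process(os.getpid())

# Call this once per epoch; a steadily rising RSS confirms the leak.
print(f"RSS: {proc.memory_info().rss / 1024**2:.1f} MiB")
```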

Hi, could you please put these lines of code https://github.com/dmlc/dgl/blob/master/examples/pytorch/adaptive_sampling/adaptive_sampling.py#L422-L430
under a `with torch.no_grad():` context and see if that helps? Thanks.
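The reasoning: if those lines run the model for evaluation, every forward pass records an autograd graph, and if any tensor derived from it is kept across epochs (e.g. an accumulated loss), the graphs stay alive and host memory grows. Here is a minimal, self-contained sketch of the idea (a stand-in model, not the example's actual code at L422-L430):

```python
import torch

model = torch.nn.Linear(16, 2)  # hypothetical stand-in for the example's GNN
x = torch.randn(4, 16)          # hypothetical stand-in for the input features

# Inside no_grad, no autograd graph is recorded, so nothing derived
# from this forward pass can keep graph history alive across epochs.
with torch.no_grad():
    logits = model(x)

assert not logits.requires_grad  # safe to store or accumulate
```

If accumulating a loss outside such a context, calling `.item()` (or `.detach()`) on it has a similar effect, since it keeps only the value rather than the graph.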