Why delete the gradient-averaging block?

I don’t understand why the gradient-averaging block should be deleted. Does anyone know?
If this block is deleted, each process will only train on its own split of the data, and there will be no communication between processes.

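For context, the block being discussed looks roughly like the following sketch, adapted from the tutorial linked below (the exact names and the model/process-group setup are assumed here):

```python
import torch.distributed as dist

def average_gradients(model):
    """Manually all-reduce and average gradients across all processes."""
    world_size = float(dist.get_world_size())
    for param in model.parameters():
        # Sum each gradient over all processes, then divide by the world size
        dist.all_reduce(param.grad.data, op=dist.ReduceOp.SUM)
        param.grad.data /= world_size
```
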
In recent PyTorch versions you don’t have to average the gradients yourself: the DistributedDataParallel module performs the gradient all-reduce across processes for you during the backward pass.

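As a minimal sketch (assuming the process group has already been initialized and each process loads its own data shard), wrapping the model in DistributedDataParallel is enough; the averaging happens inside `backward()`:

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes the process group was already set up, e.g.
# dist.init_process_group("gloo", rank=rank, world_size=world_size)

model = nn.Linear(10, 1)
ddp_model = DDP(model)  # DDP registers hooks that all-reduce gradients in backward()

optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

inputs = torch.randn(8, 10)   # each process trains on its own shard of the data
targets = torch.randn(8, 1)

optimizer.zero_grad()
loss = nn.functional.mse_loss(ddp_model(inputs), targets)
loss.backward()               # gradients are averaged across processes here
optimizer.step()              # no manual average_gradients() call is needed
```
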
See also https://pytorch.org/tutorials/intermediate/ddp_tutorial.html