Why delete the gradient-averaging block?

I don’t understand why the gradient-averaging block should be deleted. Does anyone know?
If this block is deleted, each process will only train on its own split of the data, and there will be no communication between processes.

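For context, the block being discussed looks roughly like the following sketch, adapted from the tutorial linked below (the exact names and the model/process-group setup are assumed here):

```python
import torch.distributed as dist

def average_gradients(model):
    """Manually all-reduce and average gradients across all processes."""
    world_size = float(dist.get_world_size())
    for param in model.parameters():
        # Sum each gradient over all processes, then divide by the world size
        dist.all_reduce(param.grad.data, op=dist.ReduceOp.SUM)
        param.grad.data /= world_size
```
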
In recent PyTorch versions you don’t have to average the gradients yourself: the DistributedDataParallel module performs the gradient all-reduce across processes for you during the backward pass.

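As a minimal sketch (assuming the process group has already been initialized and each process loads its own data shard), wrapping the model in DistributedDataParallel is enough; the averaging happens inside `backward()`:

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes the process group was already set up, e.g.
# dist.init_process_group("gloo", rank=rank, world_size=world_size)

model = nn.Linear(10, 1)
ddp_model = DDP(model)  # DDP registers hooks that all-reduce gradients in backward()

optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

inputs = torch.randn(8, 10)   # each process trains on its own shard of the data
targets = torch.randn(8, 1)

optimizer.zero_grad()
loss = nn.functional.mse_loss(ddp_model(inputs), targets)
loss.backward()               # gradients are averaged across processes here
optimizer.step()              # no manual average_gradients() call is needed
```
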
See also https://pytorch.org/tutorials/intermediate/ddp_tutorial.html