Hi, I have a question about training RGCN models on a single machine with multiple GPUs (see the example `entity_classify_mp.py`). The script uses `DistributedDataParallel` to synchronize gradients across processes, so why does it still need `th.distributed.barrier()`?
As I understand it, `th.distributed.barrier()` waits until every process reaches the same point. Since `DistributedDataParallel` only synchronizes gradients once every process has finished its backward pass, that step should already guarantee that all processes have completed their work. If so, why is `th.distributed.barrier()` still needed?
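
To make sure I am asking about the right pattern, here is a minimal, self-contained sketch of what I mean. This is not the actual `entity_classify_mp.py` code: the linear model, the checkpoint path, the `gloo` backend, and the rank-0-only save are all placeholders I made up to illustrate where a barrier typically sits relative to DDP's gradient sync.

```python
import os
import torch as th
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def run(rank, world_size):
    # Hypothetical single-machine setup; the real script configures this differently.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = DDP(th.nn.Linear(16, 4))  # stand-in for the RGCN model
    opt = th.optim.SGD(model.parameters(), lr=0.1)

    for epoch in range(2):
        x = th.randn(8, 16)
        y = th.randint(0, 4, (8,))
        loss = th.nn.functional.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()  # DDP all-reduces gradients here, so backward() is
                         # implicitly a rendezvous across ranks -- but only
                         # for the gradient buckets, not for code after it.
        opt.step()

        # Hypothetical rank-0-only work: without the barrier below, the
        # other ranks would start the next epoch while rank 0 is still busy.
        if rank == 0:
            th.save(model.module.state_dict(), "ckpt.pt")
        dist.barrier()  # explicit rendezvous covering the rank-0-only work

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 2
    mp.spawn(run, args=(world_size,), nprocs=world_size)
```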