Hi, if I am starting my training with the following command:
By the way, in the pytorch_distributed_example I see that you call average_gradients, but the PyTorch docs (https://pytorch.org/tutorials/beginner/dist_overview.html) say: "DDP takes care of gradient communication to keep model replicas synchronized and overlaps it with the gradient computations to speed up training."
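For reference, here is a minimal sketch of the manual averaging pattern I mean, assuming the process group has already been initialized with dist.init_process_group; the helper name average_gradients matches the one in the example, but the body below is only illustrative, not the example's actual code. As far as I understand, this manual helper is only needed when the model is not wrapped in DDP, since DDP's backward hooks already perform the same averaging.

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def average_gradients(model: torch.nn.Module) -> None:
    """Manually average gradients across all workers after backward().

    This mirrors the hand-rolled data-parallel pattern from the classic
    PyTorch distributed tutorial, used when the model is NOT wrapped in DDP.
    """
    world_size = float(dist.get_world_size())
    for param in model.parameters():
        if param.grad is None:
            continue
        # Sum this parameter's gradient across every process, then average.
        dist.all_reduce(param.grad.data, op=dist.ReduceOp.SUM)
        param.grad.data /= world_size


# With DDP, the same synchronization happens automatically inside backward()
# and is overlapped with gradient computation, so no manual call is needed:
#
#     dist.init_process_group(backend="nccl")
#     ddp_model = DDP(model, device_ids=[local_rank])
#     loss = ddp_model(inputs).sum()
#     loss.backward()   # gradients are already averaged across replicas here
#     optimizer.step()
```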