Unanswered
Hi! I'm Running Launch_Multi_Mode With Pytorch-Lightning
@GrievingKoala83 It looks like something inside NCCL is now failing, which prevents rank 0 from starting. Are you running this inside a Docker container? What is the output of `nvidia-smi` inside that container?
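One way to gather the information asked for above is a small script run inside the same container as the training job. This is only a sketch, not part of ClearML or PyTorch Lightning: the helper name `collect_gpu_diagnostics` is made up for illustration, and the `/.dockerenv` check is a common heuristic for detecting Docker, not a guarantee. `NCCL_DEBUG=INFO` is the standard NCCL environment variable for verbose logging; it needs to be set in the training process before NCCL initializes.

```python
import os
import shutil
import subprocess


def collect_gpu_diagnostics():
    """Return a dict describing GPU visibility in the current environment.

    Hypothetical helper for illustration; uses only the standard library.
    """
    info = {
        "nvidia_smi_path": shutil.which("nvidia-smi"),
        "nvidia_smi_output": None,
        "cuda_visible_devices": os.environ.get("CUDA_VISIBLE_DEVICES"),
        # Heuristic only: Docker usually creates this file in the container.
        "in_docker": os.path.exists("/.dockerenv"),
    }
    if info["nvidia_smi_path"]:
        try:
            result = subprocess.run(
                [info["nvidia_smi_path"]],
                capture_output=True,
                text=True,
                timeout=30,
            )
            info["nvidia_smi_output"] = result.stdout or result.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            info["nvidia_smi_output"] = f"failed to run nvidia-smi: {exc}"
    return info


if __name__ == "__main__":
    # Ask NCCL for verbose logs; only affects processes started with this
    # environment, so export it in the training job as well.
    os.environ.setdefault("NCCL_DEBUG", "INFO")
    for key, value in collect_gpu_diagnostics().items():
        print(f"{key}: {value}")
```

If `nvidia_smi_path` comes back as `None`, the container most likely was not started with GPU access, which would be worth ruling out before looking at NCCL itself.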
6 months ago