Unanswered
[ClearML with PyTorch-Based Distributed Training]
Hi everyone! Is the combination of ClearML with ...
AgitatedDove14, maybe to come at this from a broader angle:
Is ClearML combined with DataParallel or DistributedDataParallel officially supported, i.e. should it work without many adjustments? If so, should the job be started via `python ...` or via `torchrun ...`? What about remote runs: how do they support the parallel execution? And going deeper, what about machines started via the ClearML Autoscaler? Can they run multiple agents and/or launch remote distributed jobs?
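The `python ...` vs. `torchrun ...` distinction the question raises comes down to who spawns the worker processes: `torchrun` launches N copies of the script and exports the rendezvous environment variables (`RANK`, `WORLD_SIZE`, `LOCAL_RANK`) to each one, while a plain `python` launch gets none of these and must spawn workers itself (e.g. with `torch.multiprocessing.spawn`). A minimal sketch of reading that contract; the helper name `ddp_env` is illustrative, not a ClearML or PyTorch API:

```python
import os

def ddp_env():
    """Return the per-process rendezvous info that torchrun exports.

    Under `torchrun --nproc-per-node=N script.py` each worker process
    sees its own RANK / WORLD_SIZE / LOCAL_RANK. Under a plain
    `python script.py` launch these variables are absent, so the
    defaults below describe a single-process run and the script would
    have to fork workers itself (e.g. torch.multiprocessing.spawn).
    """
    return {
        "rank": int(os.environ.get("RANK", 0)),
        "world_size": int(os.environ.get("WORLD_SIZE", 1)),
        "local_rank": int(os.environ.get("LOCAL_RANK", 0)),
    }
```

Whichever launcher answers the question, a script written against these variables works under both styles, which is why the choice of entry point matters for how a remote agent has to re-execute the task.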
144 Views · 0 Answers · one year ago