ClearML with PyTorch-based distributed training

Hi everyone!
Q: Is ClearML combined with DataParallel or DistributedDataParallel officially supported? Should that work without many adjustments?
A: Yes, it is supported, and it should work.
Q: If so, would it be started via `python ...` or via `torchrun ...`?
A: Yes, it should; hence the request for a code snippet to reproduce the issue you are experiencing.
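As a hypothetical illustration (not taken from this thread): when a script is launched with `torchrun`, each worker process receives the standard environment variables `RANK`, `WORLD_SIZE`, and `LOCAL_RANK`. A sketch of a script that reads them, with fallbacks so it also runs as a plain `python ...` invocation:

```python
import os

def dist_config():
    # torchrun sets RANK, WORLD_SIZE, and LOCAL_RANK for every worker;
    # the defaults below let the same script run as `python train.py`
    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    return rank, world_size, local_rank

if __name__ == "__main__":
    rank, world_size, local_rank = dist_config()
    print(f"worker {rank}/{world_size} using local device {local_rank}")
```

In a real DDP job, `dist_config()` would feed into `torch.distributed.init_process_group` and device placement; the function name here is illustrative.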
Q: What about remote runs, how will they support the parallel execution?
A: Supported. You should see in the "script entry" something like "-m torch.distributed.launch --nproc_per_node 2 ...".
Q: To go even deeper, what about the machines started via the ClearML Autoscaler?
A: Should work out of the box; this is considered a single Job/Task, so there is no need to spin up multiple agents for it.
one year ago