Unanswered
Hi All!
Is There Any Simple Way To Use
Are you saying you "manually" parse args?
More or less! Maybe there's a simpler solution that I haven't found yet.
I'm using torch.distributed.run to run my training on multiple GPUs.
Since I can't use the torchrun command (from my tests, clearml won't use it on the clearml-agent), I went with the following workaround:
import sys
import torch.distributed.run

distributed_args = torch.distributed.run.parse_args(sys.argv)
distributed_args.nproc_per_node = args.gpus
torch.distributed.run.run(distributed_args)
Which would be the equivalent of calling torchrun train.py arg1 arg2 ...
Except that, since clearml patches the parse_args call inside the torch.distributed.run.parse_args function, it generates the same arguments I passed to script.py and gives an error like "error: the following arguments are required: torchrun_arg_1, torchrun_arg_2 ..."
one year ago