Hi All! Is There Any Simple Way To Use

Unanswered

Are you saying you "manually" parse the args?

More or less! Maybe there's a simpler solution that I haven't found yet.

I'm using torch.distributed.run to run my training on multiple GPUs.
Since I can't use the torchrun command (from my tests, clearml won't use it on the clearml-agent), I went with the following workaround:

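import sys
import torch.distributed.run

# sys.argv[0] (this script) is picked up as the training script; args is the script's own parsed arguments
distributed_args = torch.distributed.run.parse_args(sys.argv)
distributed_args.nproc_per_node = args.gpus
torch.distributed.run.run(distributed_args)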

Which would be the equivalent of calling torchrun train.py arg1 arg2 ...
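To illustrate why passing the whole of sys.argv works here, below is a minimal sketch using only argparse. The parser is a simplified stand-in for the torchrun one (build_launcher_parser and fake_argv are hypothetical names, not part of torch), and it assumes a positional training_script followed by the remaining training_script_args:

```python
import argparse


def build_launcher_parser():
    # Hypothetical, simplified stand-in for the torchrun argument parser.
    parser = argparse.ArgumentParser(prog="launcher")
    parser.add_argument("--nproc_per_node", default="1")
    parser.add_argument("training_script")
    parser.add_argument("training_script_args", nargs=argparse.REMAINDER)
    return parser


# What sys.argv looks like when running: python train.py arg1 arg2
fake_argv = ["train.py", "arg1", "arg2"]

launcher_args = build_launcher_parser().parse_args(fake_argv)
# Overridden after parsing, as in the workaround above.
launcher_args.nproc_per_node = 2

print(launcher_args.training_script, launcher_args.training_script_args)
```

The first element of the list (the script itself) fills the training_script slot, and everything after it is passed through untouched.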

Except that, since clearml patches the parse_args call inside the torch.distributed.run.parse_args function, it generates the same arguments I passed to script.py and gives an error like "error: the following arguments are required: torchrun_arg_1 , torchrun_arg_2 ..."
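A rough way to reproduce this kind of failure without clearml or torch is sketched below. The patch is a hypothetical stand-in written for illustration, not clearml's actual implementation; SCRIPT_ARGS, patched_parse_args and the one-argument launcher parser are all assumptions:

```python
import argparse
from unittest import mock

# Assumed arguments that the training script itself was started with.
SCRIPT_ARGS = ["--epochs", "3"]

original_parse_args = argparse.ArgumentParser.parse_args


def patched_parse_args(self, args=None, namespace=None):
    # Hypothetical patch: ignore the explicit list and reuse the script's args.
    return original_parse_args(self, SCRIPT_ARGS, namespace)


# Simplified stand-in for the launcher's parser.
launcher = argparse.ArgumentParser(prog="launcher")
launcher.add_argument("training_script")

exit_code = None
with mock.patch.object(argparse.ArgumentParser, "parse_args", patched_parse_args):
    try:
        # The explicit list is silently replaced by SCRIPT_ARGS.
        launcher.parse_args(["train.py"])
    except SystemExit as exc:
        # argparse reports the usage error on stderr and exits.
        exit_code = exc.code

print("launcher exited with code", exit_code)
```

The launcher parser never sees the list it was given, so it rejects the substituted arguments even though the explicit call was valid.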

  				
Posted one year ago

					PlainSeaurchin97
				