Hi, I am trying to train https://github.com/open-mmlab/mmdetection models with ClearML, executing remotely. mmdetection recommends torch.distributed.launch for multi-GPU/multi-process training; I'm using torch.distributed.run instead. My training script, which connects to ClearML, is here: https://github.com/levan92/mmdet_clearml/blob/main/tools/dist_train_clearml.py .
torch.distributed.launch / run takes the training_script as a positional argument, followed by training_script_args, which is declared with nargs=argparse.REMAINDER. See https://github.com/pytorch/pytorch/blob/a31aea8eaa99a5ff72b5d002c206cd68d5467a5e/torch/distributed/run.py#L534-L544 for the definition.
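For context, here is a minimal sketch of how an argparse.REMAINDER argument collects everything after the script path into a list. The parser below is a simplified stand-in, not the actual torchrun parser, and the script name and argument values are made up for illustration.

```python
import argparse

# Simplified stand-in for the launcher's parser (assumption: only the
# arguments relevant to this question are reproduced here).
parser = argparse.ArgumentParser()
parser.add_argument("--nproc_per_node", type=int, default=1)
parser.add_argument("training_script", type=str)
# REMAINDER gathers every remaining token, including ones that look like flags.
parser.add_argument("training_script_args", nargs=argparse.REMAINDER)

# Hypothetical command line: launcher option, script, then the script's own args.
args = parser.parse_args(
    ["--nproc_per_node", "2", "train.py", "config.py", "--launcher", "pytorch"]
)

print(args.training_script)       # the script path
print(args.training_script_args)  # a list of the remaining tokens
```

The key point is that training_script_args is expected to be a list of strings by the time the launcher builds the worker command.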
The problem is passing the training script args through to the training script, and it only happens with ClearML remote execution. I suspect ClearML is converting the list of args into a string, so that, for example, the list ['--launcher','pytorch'] becomes the string "['--launcher','pytorch']". Then, when the torchrun script extends the command with the args ( https://github.com/pytorch/pytorch/blob/a31aea8eaa99a5ff72b5d002c206cd68d5467a5e/torch/distributed/run.py#L683 ), the string is treated as a sequence of characters, and the training script ends up with the wrong args.
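The suspected failure mode can be reproduced without ClearML or torchrun. The sketch below uses a hypothetical build_cmd helper that mimics the extend call, and a hypothetical restore_args helper showing one possible workaround (parsing the string back into a list with ast.literal_eval). Neither is part of ClearML or PyTorch, and whether ClearML really stringifies the list is still only my assumption.

```python
import ast

def build_cmd(script, script_args):
    # Mimics the launcher: start with the interpreter and script,
    # then extend the command with the script's own arguments.
    cmd = ["python", "-u", script]
    cmd.extend(script_args)
    return cmd

# Expected case: args arrive as a list, so each arg becomes one element.
good = build_cmd("train.py", ["--launcher", "pytorch"])

# Suspected remote case: args arrive as the string repr of the list,
# so extend() iterates over it and appends one character at a time.
bad = build_cmd("train.py", "['--launcher','pytorch']")

def restore_args(value):
    # Hypothetical workaround: if the value is a string, parse it back
    # into a Python list before handing it to the launcher.
    if isinstance(value, str):
        value = ast.literal_eval(value)
    return list(value)

fixed = build_cmd("train.py", restore_args("['--launcher','pytorch']"))

print(good)
print(bad[:6])
print(fixed)
```

If the stringification is confirmed, a conversion like restore_args would have to run before the launcher consumes the args, for example right after the ClearML task is initialized in the training wrapper.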