Hi, I Am Currently Trying To Train With

Hi, I am currently trying to train with https://github.com/open-mmlab/mmdetection using ClearML, executing remotely. The way mmdetection recommends running multi-GPU/multi-process training is torch.distributed.launch; I'm using torch.distributed.run instead. My training script, which connects to ClearML, is here: https://github.com/levan92/mmdet_clearml/blob/main/tools/dist_train_clearml.py

torch.distributed.launch / run takes in the training_script, and collects everything after it into training_script_args using argparse.REMAINDER: https://github.com/pytorch/pytorch/blob/a31aea8eaa99a5ff72b5d002c206cd68d5467a5e/torch/distributed/run.py#L534-L544
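To make the argparse.REMAINDER behaviour concrete, here is a minimal sketch of a launcher-style parser. It is not the actual torchrun parser: the `launcher_sketch` name, the `build_parser` helper, and the example file names `train.py` / `config.py` are made up for illustration. It only shows that everything after the script name ends up in a list of strings.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical, stripped-down stand-in for a launcher's argument parser.
    parser = argparse.ArgumentParser(prog="launcher_sketch")
    parser.add_argument("--nproc_per_node", type=int, default=1)
    # The script to launch is a plain positional argument.
    parser.add_argument("training_script", type=str)
    # Everything after the script name is collected untouched into a list.
    parser.add_argument("training_script_args", nargs=argparse.REMAINDER)
    return parser


args = build_parser().parse_args(
    ["--nproc_per_node", "2", "train.py", "config.py", "--launcher", "pytorch"]
)
print(args.training_script)
print(args.training_script_args)
```

The point is that `training_script_args` is expected to be a list, which matters for what happens to it later.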

I am having an issue passing the training script args through to the training script, but only when using ClearML remote execution. I suspect this is because ClearML is converting the list of args into a string, so that, for example, the list ['--launcher','pytorch'] becomes the string "['--launcher','pytorch']". When the torchrun script then extends its command with the args ( https://github.com/pytorch/pytorch/blob/a31aea8eaa99a5ff72b5d002c206cd68d5467a5e/torch/distributed/run.py#L683 ), the string is treated as a list of characters, so the training script receives the wrong args.
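The suspected failure can be reproduced in plain Python, without ClearML or torch. The sketch below shows what `list.extend` does with a stringified list, plus one possible workaround. The `coerce_script_args` helper is hypothetical (it is not part of ClearML or PyTorch), and it assumes the string is a valid Python list literal, as in the example above.

```python
import ast

cmd = ["train.py"]
good_args = ["--launcher", "pytorch"]       # what the launcher expects
bad_args = "['--launcher','pytorch']"       # what it appears to receive remotely

# Extending with a list appends each argument.
ok = list(cmd)
ok.extend(good_args)

# Extending with a string appends each *character* as its own argument.
broken = list(cmd)
broken.extend(bad_args)


def coerce_script_args(value):
    """Hypothetical helper: turn a stringified list back into a list of str."""
    if isinstance(value, str):
        parsed = ast.literal_eval(value)  # safe parsing of Python literals only
        if not isinstance(parsed, (list, tuple)):
            raise ValueError(f"expected a list literal, got {type(parsed).__name__}")
        return [str(item) for item in parsed]
    return list(value)


fixed = list(cmd)
fixed.extend(coerce_script_args(bad_args))

print(ok)
print(broken[:5])
print(fixed)
```

If this is indeed the cause, applying something like this conversion to the args before they reach the launcher would be one way to work around it. Whether that is the right place to fix it depends on how ClearML stores and restores the arguments.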

Posted 2 years ago

Answers 3

NonchalantDeer14, Hi! Cool find! How does it show in the UI when you execute remotely?

Do you have a small snippet to play with?

Posted 2 years ago

You can take a look at the log; that's what I see in the UI.

Posted 2 years ago