Examples: query, "exact match", wildcard*, wild?ard, wild*rd
Fuzzy search: cake~ (finds cakes, bake)
Term boost: "red velvet"^4, chocolate^2
Field grouping: tags:(+work -"fun-stuff")
Escaping: Escape characters +-&|!(){}[]^"~*?:\ with \, e.g. \+
Range search: properties.timestamp:[1587729413488 TO *] (inclusive), properties.title:{A TO Z}(excluding A and Z)
Combinations: chocolate AND vanilla, chocolate OR vanilla, (chocolate OR vanilla) NOT "vanilla pudding"
Field search: properties.title:"The Title" AND text
Hello. Anyone Know How To Add "-M Torch.Distributed.Launch" To The Command For Distributed Training. Like In This Document

hello. anyone know how to add "-m torch.distributed.launch" to the command for distributed training. Like in this document

I tried to use like this:

clearml-task --project AAAA --name rtmdet-ins_l_8xb32-300e_coco.py --script tools/train.py --args config=configs/rtmdet/rtmdet-ins_l_8xb32-300e_coco.py launcher=pytorch m=torch.distributed.launch nproc_per_node=4 nnodes=1 node_rank=0 master_addr= master_port=29500 --docker mmdet-3.0 --docker_args="--network=host --gpus all" --queue default

But it complained that [RANK, WORLD_SIZE] variables were not set. It also meaned that the "-m torch.distributed.launch" was not parsed to python command line.

If I manually added [RANK, WORLD_SIZE] values then the training will timeout and failed. the only values that worked is [0, 1], but in this case only the first available GPU was used for training.

Posted one year ago
Votes Newest

Answers 2

hello anyone know about this problem?

Posted one year ago

following because I have a similar question

Posted one year ago
2 Answers
one year ago
one year ago