Hi, I am trying to set up multi-node training with PyTorch DistributedDataParallel. DDP requires a launch script with a set of parameters to be run on each node. One of these parameters is the master node address. I am currently using the following scheme:


This task is picked up by the first agent; it runs the DDP launch script for itself, then creates clones of itself with task.create_function_task() and passes its address as argument to the function

Hi UnevenHorse85
Interesting use case. Just for my understanding: the idea is to use ClearML for the node allocation/scheduling and PyTorch DDP for the actual communication, is that correct?

passes its address as argument to the function

This seems like a great solution.
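For reference, here is a rough sketch of how that scheme could look. It is not your actual script, just an illustration under a few assumptions: the launcher is `torchrun`, the training script is called `train.py`, and `run_node`, `build_launch_cmd`, `spawn_workers` and the `enqueue` callable are hypothetical names I made up (with ClearML, `enqueue` would typically wrap something like `Task.enqueue(child, queue_name=...)`). The only ClearML call used is `task.create_function_task()`, which forwards extra keyword arguments to the function.

```python
import subprocess


def build_launch_cmd(master_addr, node_rank, nnodes,
                     nproc_per_node=1, master_port=29500, script="train.py"):
    """Return the torchrun argument list for one node of the job."""
    return [
        "torchrun",
        f"--nnodes={nnodes}",
        f"--nproc_per_node={nproc_per_node}",
        f"--node_rank={node_rank}",
        f"--master_addr={master_addr}",
        f"--master_port={master_port}",
        script,  # assumed name of the training script
    ]


def run_node(master_addr, node_rank, nnodes):
    """Entry point executed by each cloned task on its own agent."""
    subprocess.check_call(build_launch_cmd(master_addr, node_rank, nnodes))


def spawn_workers(task, master_addr, nnodes, enqueue):
    """Create one function task per non-master node and enqueue it.

    `task` is the master's ClearML Task; `enqueue` is a caller-supplied
    callable that pushes a task into the execution queue.
    """
    children = []
    for rank in range(1, nnodes):  # rank 0 is the master itself
        child = task.create_function_task(
            func=run_node,
            func_name=f"ddp_node_{rank}",
            task_name=f"ddp node {rank}",
            master_addr=master_addr,
            node_rank=rank,
            nnodes=nnodes,
        )
        enqueue(child)
        children.append(child)
    return children
```

The master would call `spawn_workers(...)` first and then start its own rank-0 launch, so the other nodes already know where to connect once they are allocated.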

the queue is polluted with lots of cloned tasks that have to be aborted manually, and the whole job only requires only ...

I wouldn't say the queue pollution is the issue (or the multiple copies of the cloned Tasks), I think the main issue here is that the allocated nodes have to wait until all nodes are allocated, no?
Regarding Task pollution, when the master node is done, it can delete all child/cloned Tasks so it is easier on the eyes. This way if something goes wrong in one of the nodes, you have full visibility, but when everything works, you end up with a clean single copy.
wdyt?
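A minimal sketch of that cleanup step, assuming the master kept a list of the child Task objects it created. `cleanup_children` is a hypothetical helper name; it relies only on each child exposing `get_status()` and `delete()`, as ClearML Task objects do, and on "completed" being the status of a successful run.

```python
def cleanup_children(children):
    """Delete the cloned tasks only if every one of them completed.

    Returns True when the children were deleted, False when they were
    kept so a failed node can still be inspected.
    """
    statuses = [child.get_status() for child in children]
    if all(status == "completed" for status in statuses):
        for child in children:
            child.delete()
        return True
    return False
```

Called at the very end of the master's run, this leaves a single clean Task when everything works and keeps all the copies around when something breaks.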

  
  
Posted 2 years ago