Answered
Hi, How Can I Make A Stage In A Clearml Pipeline Non-Blocking? The Scenario Is That Stages Downstream Needed Runtime Info From The First Stage, However The First Stage Needs To Continue Running To Act As A Monitor For The Other Downstream Stages.

Hi, how can I make a stage in a ClearML pipeline non-blocking?
The scenario is that the downstream stages need runtime info from the first stage, but the first stage has to keep running so it can act as a monitor for those downstream stages.

  
  
Posted one year ago

Answers 5


Hi @SubstantialElk6
I would split the first stage into two: the first one passes the data to the others, and the second one does the "monitoring". Wdyt?
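To make the suggested split concrete, here is a minimal sketch in plain Python (no ClearML calls). The stage names `publish_info`, `monitor`, `worker_1` and `worker_2` are hypothetical, and the `runnable` helper only models the usual "a stage starts once all of its parents have finished" rule; it is not how ClearML schedules stages internally.

```python
# Hypothetical stage names; each stage maps to the stages it must wait for.
PARENTS = {
    "publish_info": [],            # short stage: emits the runtime info, then exits
    "monitor": ["publish_info"],   # long-running stage: watches the workers
    "worker_1": ["publish_info"],
    "worker_2": ["publish_info"],
}


def runnable(parents, finished):
    """Return the unfinished stages whose parents have all finished."""
    return sorted(
        stage
        for stage, deps in parents.items()
        if stage not in finished and all(dep in finished for dep in deps)
    )


# Before anything has run, only the short first stage can start.
print(runnable(PARENTS, finished=set()))
# Once it finishes, the monitor and the workers can all start side by side.
print(runnable(PARENTS, finished={"publish_info"}))
```

Because `publish_info` exits as soon as the info is available, nothing downstream is blocked by the long-running `monitor` stage.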

  
  
Posted one year ago

The first stage is a rank0 pytorch script. The downstream stages are rankN scripts, and they are waiting for the IP address of the first stage. But the first stage doesn’t return; it simply waits for the rankN scripts to connect to it. In that case the rankN scripts never start, so it's probably necessary to have just a single stage.

If I were to start a single rank0 task and the subsequent rankN tasks individually, it would be rather messy on the ClearML Dashboard. It's best to have either a single ClearML application or a ClearML pipeline do this.
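The rendezvous described above can be sketched with plain Python threads standing in for the rank0 and rankN tasks. This is only an illustration of the pattern (publish the address without exiting, then keep running), not ClearML or pytorch code; the address, `WORLD_SIZE`, and all names below are placeholders.

```python
import threading

WORLD_SIZE = 3  # illustrative: one rank0 plus two rankN workers

address_ready = threading.Event()   # set once rank0 has published its address
all_connected = threading.Event()   # set once every rankN worker has connected
shared = {}                         # stands in for wherever the address is published
connected = []
connected_lock = threading.Lock()


def rank0():
    shared["master_addr"] = "10.0.0.1:29500"  # placeholder address
    address_ready.set()                       # publish the info without returning
    all_connected.wait(timeout=5)             # keep running, acting as the monitor


def rank_n(rank):
    address_ready.wait(timeout=5)             # wait for rank0's address, not its exit
    addr = shared["master_addr"]
    with connected_lock:
        connected.append((rank, addr))
        if len(connected) == WORLD_SIZE - 1:
            all_connected.set()


threads = [threading.Thread(target=rank0)]
threads += [threading.Thread(target=rank_n, args=(r,)) for r in range(1, WORLD_SIZE)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(connected))
```

The key point is that the workers wait on "address published", not on "rank0 finished", which is exactly what a blocking pipeline stage cannot express.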

  
  
Posted one year ago

Yes it is! But ClearML doesn't support multi-node training out of the box in a way that streamlines the process, so we are trying to figure out a way to do it.

  
  
Posted one year ago

The downstream stages are rankN scripts, they are waiting for the IP address of the first stage.

Is this more like multi-node training than a pipeline?

  
  
Posted one year ago

If we run all the rank0 and rankN tasks individually, it defeats the purpose of using ClearML.

  
  
Posted one year ago