Answered
I wanted to ask, how to run pipeline steps conditionally? E.g. if a step returns a specific value, exit the pipeline or run another step instead of the sequential step

I wanted to ask, how do I run pipeline steps conditionally? E.g. if a step returns a specific value, exit the pipeline or run another step instead of the next sequential step.

  
  
Posted 2 years ago

Answers 27


Is there anything stopping you from using regular conditions like 'if'?

  
  
Posted 2 years ago

Great to hear!

  
  
Posted 2 years ago

AgitatedDove14

  
  
Posted 2 years ago

CostlyOstrich36 This didn't work. The value is -1, but the pipe didn't stop.

  
  
Posted 2 years ago

Tagging AgitatedDove14 SuccessfulKoala55 For anyone available right now to help out.

  
  
Posted 2 years ago

Another question, in the parents sequence in pipe.add_step, we have to pass in the name of the step right?

  
  
Posted 2 years ago

Wait, so the pipeline step only runs if the pre execute callback returns True? It'll stop if it doesn't run?

  
  
Posted 2 years ago

VexedCat68, what if you simply add pipe.stop()? Does it not stop the pipeline? Can you maybe add a print to verify that during the run the value is indeed -1? Also, from your code it looks like you're comparing 'merged_dataset_id' to -1

  
  
Posted 2 years ago

I did what you said: I got the pipeline DAG and then used the step's "executed" field as the Task ID. Thank you, it worked fine.

  
  
Posted 2 years ago

VexedCat68 both are valid. In case the step was cached (i.e. already executed) the node.job will be None, so it is probably safer to get the Task based on the "executed" field which stores the Task ID used.
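
To illustrate the "executed" lookup, here is a minimal sketch. The helper name `executed_task_id` and the `fake_dag` stand-in are made up for the example; in a real pipeline the dictionary would come from `pipe.get_pipeline_dag()`, and I am assuming its nodes expose `executed` and `job` as described above.

```python
from types import SimpleNamespace


def executed_task_id(dag, step_name):
    """Return the Task ID stored in the step's `executed` field.

    `dag` maps the unique step name to its node. The value may be falsy
    if nothing has been recorded for the step yet.
    """
    if step_name not in dag:
        raise KeyError(f"no step named {step_name!r}; known steps: {sorted(dag)}")
    return dag[step_name].executed


# Hypothetical stand-in for pipe.get_pipeline_dag(), so the lookup can be
# tried without a running pipeline.
fake_dag = {
    # cached step: job is None, but `executed` still holds the Task ID
    "merge_n_datasets": SimpleNamespace(executed="task-id-123", job=None),
    # step that has not run yet
    "train_model": SimpleNamespace(executed=None, job=None),
}

print(executed_task_id(fake_dag, "merge_n_datasets"))
```

Because the lookup goes by step name, there is no need to loop over the running or processed nodes to find one step.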

  
  
Posted 2 years ago

If there aren't N datasets, the function step doesn't squash the datasets and instead just returns -1.

Thus if I get -1, I want the pipeline execution to end or the following task to be skipped.

I have checked in the args, the value is indeed -1. Unless there is some other way for conditional pipeline steps execution.

  
  
Posted 2 years ago

CostlyOstrich36

  
  
Posted 2 years ago

before pipe.add_step(train_model)?

  
  
Posted 2 years ago

since I've either added add_functional_step or add_step

  
  
Posted 2 years ago

After the step which gets the merged dataset, I should use pipe.stop if it returned -1?

  
  
Posted 2 years ago

If I understood this correctly: in the case where we have defined steps in order as parent and child, if the parent's pre execute callback returns False, will all subsequent child nodes/steps not execute, or will they just ignore it and still execute?

  
  
Posted 2 years ago

In another answer, I was shown that I can access it like this. How can I go about accessing the value of merged_dataset_id, which was returned by merge_n_datasets and stored as an artifact?

  
  
Posted 2 years ago

AgitatedDove14

  
  
Posted 2 years ago

Wait, so the pipeline step only runs if the pre execute callback returns True? It'll stop if it doesn't run?

Only if you have a Callback function, and that callback function returns False, then it will skip it (otherwise it will process it)

Another question, in the parents sequence in pipe.add_step, we have to pass in the name of the step right?

Correct, the step name is a unique identifier for the pipeline

how would I access the artifact of a previous step within the pre execute callback? Can you share an example?

Basically you can do:
def pre_execute_callback_example(a_pipeline, a_node, current_param_override):
    # type (PipelineController, PipelineController.Node, dict) -> bool
    nodes = a_pipeline.get_pipeline_dag()
    parent_task_id = nodes[a_node.parents[0]].executed
    Task.get_task(parent_task_id).artifacts[...]

Is the only possible way to get a specific node

See the following (it is a dictionary where the key is the step/node unique name)
pipeline.get_pipeline_dag()
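
For the -1 case discussed in this thread, a rough sketch of a complete callback could look like the following. Treat it as an illustration under assumptions, not a tested recipe: the artifact name `merged_dataset_id` is taken from the question, the names `NO_DATASET`, `should_run_step` and `skip_if_no_dataset` are made up, and the value is compared as a string in case it was stored as text. `clearml` is imported inside the callback only so the small decision helper can be tried without ClearML installed.

```python
NO_DATASET = -1  # value the merge step in this thread returns when nothing was merged


def should_run_step(merged_dataset_id):
    """Return True when the parent step produced a usable dataset ID."""
    # Compare as strings so both -1 and "-1" count as "no dataset".
    return str(merged_dataset_id) != str(NO_DATASET)


def skip_if_no_dataset(a_pipeline, a_node, current_param_override):
    # type: (PipelineController, PipelineController.Node, dict) -> bool
    from clearml import Task  # imported here so the helper above works standalone

    # The DAG is a dictionary keyed by the unique step name.
    nodes = a_pipeline.get_pipeline_dag()
    # `executed` stores the Task ID even when the parent step was cached.
    parent_task_id = nodes[a_node.parents[0]].executed
    parent_task = Task.get_task(task_id=parent_task_id)
    # Assumed artifact name; adjust it to whatever the parent step returns.
    merged_dataset_id = parent_task.artifacts["merged_dataset_id"].get()
    # Returning False skips this node (and its subtree).
    return should_run_step(merged_dataset_id)
```

You would then pass `pre_execute_callback=skip_if_no_dataset` when adding the step that should be skipped.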

  
  
Posted 2 years ago

Looks good

  
  
Posted 2 years ago

Is this correct?

  
  
Posted 2 years ago

Okay, so I read the docs and the above questions are cleared up now, thank you. I just have one other question: how would I access the artifact of a previous step within the pre execute callback? Can you share an example?

  
  
Posted 2 years ago

Not sure myself. I have a pipeline step now that'll return either a ClearML dataset ID or -1. I want to stop the pipeline execution if I get -1 in the output of that step, but I'm not sure how to achieve that.

  
  
Posted 2 years ago

Hmmmm this looks like what you're looking for:
https://clear.ml/docs/latest/docs/references/sdk/automation_controller_pipelinecontroller#stop-1
Tell me if this helps 🙂

  
  
Posted 2 years ago

Hi VexedCat68
(sorry I just saw the message)

I wanted to ask, how to run pipeline steps conditionally? E.g if step returns a specific value, exit the pipeline or run another step instead of the sequential step

To do so, you can do:
`
def pre_execute_callback_example(a_pipeline, a_node, current_param_override):
    # if we want to skip this node (and subtree of this node) we return False
    ...
    # we decided to skip so we return False
    return False

pipe.add_step(name='stage_process', parents=['stage_data', ],
              base_task_project='examples', base_task_name='step 2',
              pre_execute_callback=pre_execute_callback_example,
              ...
)
`
Reference:
https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_tasks.py

  
  
Posted 2 years ago

Regarding the actual artifact access, this is the usual Task.artifacts access; see the example here:
https://github.com/allegroai/clearml/blob/master/examples/reporting/artifacts_retrieval.py

  
  
Posted 2 years ago

Is the only possible way to get a specific node to use one of get_running_nodes or get_processed_nodes, and then check every node in the list to see if the name matches the one we're looking for?

  
  
Posted 2 years ago