I Wanted To Ask, How To Run Pipeline Steps Conditionally? E.G If Step Returns A Specific Value, Exit The Pipeline Or Run Another Step Instead Of The Sequential Step

Answered

I wanted to ask: how can I run pipeline steps conditionally? E.g., if a step returns a specific value, exit the pipeline or run another step instead of the next sequential step.

  				
Posted 3 years ago by VexedCat68


Answers 27

Hi VexedCat68
(sorry I just saw the message)

I wanted to ask, how to run pipeline steps conditionally? E.g if step returns a specific value, exit the pipeline or run another step instead of the sequential step

To do so, you can use a pre-execute callback:
```python
def pre_execute_callback_example(a_pipeline, a_node, current_param_override):
    # If we want to skip this node (and the subtree of this node), we return False
    ...
    # We decided to skip, so we return False
    return False

pipe.add_step(name='stage_process', parents=['stage_data', ],
              base_task_project='examples', base_task_name='step 2',
              pre_execute_callback=pre_execute_callback_example,
              # ...
              )
```
Reference:
https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_tasks.py
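
To make the skip semantics concrete, here is a toy, pure-Python model of a pipeline runner. It is not ClearML's implementation: `run_pipeline` and the `stage_train` step are hypothetical, and only the idea (a callback returning False skips the node and everything downstream of it) comes from the answer above.

```python
from typing import Dict, List


def run_pipeline(steps: Dict[str, dict]) -> List[str]:
    """Toy runner: visit steps in insertion order, return the names that ran."""
    executed: List[str] = []
    skipped = set()
    for name, step in steps.items():
        parents = step.get("parents", [])
        callback = step.get("pre_execute_callback")
        # A step is skipped if any of its parents was skipped (subtree skip) ...
        if any(parent in skipped for parent in parents):
            skipped.add(name)
            continue
        # ... or if its own callback explicitly returns False.
        if callback is not None and callback(name) is False:
            skipped.add(name)
            continue
        executed.append(name)
    return executed


steps = {
    "stage_data": {"parents": []},
    "stage_process": {
        "parents": ["stage_data"],
        "pre_execute_callback": lambda name: False,  # always skip, for the demo
    },
    "stage_train": {"parents": ["stage_process"]},  # hypothetical child step
}
ran = run_pipeline(steps)
print(ran)  # only stage_data runs; stage_process and its child are skipped
```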

  				
Posted 3 years ago by AgitatedDove14

VexedCat68, what if you simply add pipe.stop()? Does it not stop the pipeline? Could you add a print to verify that the value is indeed -1 during the run? Also, from your code it looks like you're comparing 'merged_dataset_id' to -1.

  				
Posted 3 years ago by CostlyOstrich36

VexedCat68, both are valid. If the step was cached (i.e. already executed), node.job will be None, so it is probably safer to get the Task from the "executed" field, which stores the Task ID used.
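
A minimal sketch of that advice, using stand-in objects rather than real pipeline nodes. `resolve_task_id` is a hypothetical helper, and the `task_id()` call on the job is an assumption about the job object's interface; only the `executed` and `job` attributes come from the discussion.

```python
from types import SimpleNamespace
from typing import Optional


def resolve_task_id(node) -> Optional[str]:
    """Prefer node.executed; fall back to the live job only if one exists."""
    executed = getattr(node, "executed", None)
    if executed:
        return executed
    job = getattr(node, "job", None)
    # Assumption: the job object exposes a task_id() method.
    return job.task_id() if job is not None else None


# Stand-in for a cached step: no job, but `executed` holds the Task ID.
cached_node = SimpleNamespace(executed="abc123", job=None)
print(resolve_task_id(cached_node))
```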

  				
Posted 3 years ago by AgitatedDove14

Not sure myself. I have a pipeline step now that returns either a ClearML dataset ID or -1. I want to stop the pipeline execution if I get -1 in the output of that step, but I'm not sure how to achieve that.

  				
Posted 3 years ago by VexedCat68

In another answer, I was shown that I can access it like this. How can I access the value of merged_dataset_id, which was returned by merge_n_datasets and stored as an artifact?

  				
Posted 3 years ago by VexedCat68

I did what you said: I got the pipeline DAG and then used the step's "executed" field as the Task ID. Thank you, it worked fine.

  				
Posted 3 years ago by VexedCat68

Tagging AgitatedDove14 and SuccessfulKoala55, or anyone available right now to help out.

  				
Posted 3 years ago by VexedCat68

Okay, I read the docs and the above questions are cleared up now, thank you. I have just one other question: how would I access the artifact of a previous step within the pre-execute callback? Can you share an example?

  				
Posted 3 years ago by VexedCat68

After the step which gets the merged dataset, I should use pipe.stop if it returned -1?

  				
Posted 3 years ago by VexedCat68

since I've added the steps with either add_function_step or add_step

  				
Posted 3 years ago by VexedCat68

Great to hear!

  				
Posted 3 years ago by AgitatedDove14

Regarding the actual artifact access, this is the usual Task.artifacts access; see the example here:
https://github.com/allegroai/clearml/blob/master/examples/reporting/artifacts_retrieval.py

  				
Posted 3 years ago by AgitatedDove14

Hmmmm this looks like what you're looking for:
https://clear.ml/docs/latest/docs/references/sdk/automation_controller_pipelinecontroller#stop-1
Tell me if this helps 🙂

  				
Posted 3 years ago by CostlyOstrich36

Is this correct?

  				
Posted 3 years ago by VexedCat68

CostlyOstrich36, this didn't work. The value is -1, but the pipeline didn't stop.

  				
Posted 3 years ago by VexedCat68

If there aren't N datasets, the function step doesn't squash the datasets and instead just returns -1.

Thus, if I get -1, I want the pipeline execution to end or the subsequent task to be skipped.

I have checked the args, and the value is indeed -1. Unless there is some other way to execute pipeline steps conditionally?
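
As a hedged sketch of the check itself: the thread does not establish why the comparison to -1 failed, but one thing worth ruling out is that the value reaches the next step as the string "-1" rather than the integer -1. The helper and callback below are hypothetical names, and the sketch assumes the sentinel shows up among the step's parameter overrides.

```python
def is_missing_dataset(value) -> bool:
    """Return True when the value is the -1 sentinel, as an int or as text."""
    try:
        return int(str(value).strip()) == -1
    except ValueError:
        # Anything that is not an integer (e.g. a dataset ID) is not the sentinel.
        return False


def skip_if_missing(a_pipeline, a_node, current_param_override) -> bool:
    """Pre-execute callback: return False (skip) if any override is -1."""
    return not any(is_missing_dataset(v) for v in current_param_override.values())


# The parameter name below is made up for the demo.
print(skip_if_missing(None, None, {"General/dataset_id": "-1"}))
```

Passed as `pre_execute_callback` to the step that consumes the dataset, a callback like this would skip that step when the sentinel is present.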

  				
Posted 3 years ago by VexedCat68

If I understood this correctly: when steps are defined in order as parent and child, and the parent's pre-execute callback returns False, will all subsequent child nodes/steps be skipped, or will they ignore it and still execute?

  				
Posted 3 years ago by VexedCat68

Wait, so the pipeline step only runs if the pre execute callback returns True? It'll stop if it doesn't run?

Only if you have a Callback function, and that callback function returns False, then it will skip it (otherwise it will process it)

Another question, in the parents sequence in pipe.add_step, we have to pass in the name of the step right?

Correct, the step name is a unique identifier for the pipeline

how would I access the artifact of a previous step within the pre execute callback? Can you share an example?

Basically you can do:
```python
from clearml import Task

def pre_execute_callback_example(a_pipeline, a_node, current_param_override):
    # type (PipelineController, PipelineController.Node, dict) -> bool
    nodes = a_pipeline.get_pipeline_dag()
    parent_task_id = nodes[a_node.parents[0]].executed
    Task.get_task(parent_task_id).artifacts[...]
```

Is the only possible way to get a specific node...?

See the following (it is a dictionary where the key is the step/node unique name)
pipeline.get_pipeline_dag()
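
Putting the pieces above together, a callback could look up the parent's artifact and skip the step when it holds the sentinel. This is a sketch under assumptions, not a tested recipe: `make_skip_callback` and `fetch_from_clearml` are hypothetical helpers, the artifact name `merged_dataset_id` and step name `merge_n_datasets` are taken from the question, and the dry run uses stand-in objects so it works without a ClearML server.

```python
from types import SimpleNamespace
from typing import Any, Callable


def fetch_from_clearml(task_id: str, artifact_name: str) -> Any:
    """Real lookup; needs the clearml package and a reachable server."""
    from clearml import Task  # imported lazily so the sketch loads without it
    return Task.get_task(task_id=task_id).artifacts[artifact_name].get()


def make_skip_callback(artifact_name: str, sentinel: Any = -1,
                       fetch: Callable[[str, str], Any] = fetch_from_clearml):
    """Build a pre-execute callback that skips the step on the sentinel."""
    def callback(a_pipeline, a_node, current_param_override) -> bool:
        nodes = a_pipeline.get_pipeline_dag()
        parent_task_id = nodes[a_node.parents[0]].executed
        value = fetch(parent_task_id, artifact_name)
        # Returning False skips this step; True lets it run.
        return value != sentinel
    return callback


# Dry run with stand-ins instead of a real pipeline and server.
fake_dag = {"merge_n_datasets": SimpleNamespace(executed="task-1")}
fake_pipeline = SimpleNamespace(get_pipeline_dag=lambda: fake_dag)
fake_node = SimpleNamespace(parents=["merge_n_datasets"])
cb = make_skip_callback("merged_dataset_id", fetch=lambda task_id, name: -1)
print(cb(fake_pipeline, fake_node, {}))
```

In real use the default `fetch` would be kept, and the result of `make_skip_callback("merged_dataset_id")` would be passed as `pre_execute_callback` when adding the dependent step.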

  				
Posted 3 years ago by AgitatedDove14

Is the only possible way to get a specific node to use get_running_nodes or get_processed_nodes and then check every node in the list to see whether its name matches the one we're looking for?

  				
Posted 3 years ago by VexedCat68

Looks good

  				
Posted 3 years ago by CostlyOstrich36

before pipe.add_step(train_model)?

  				
Posted 3 years ago by VexedCat68

AgitatedDove14

  				
Posted 3 years ago by VexedCat68

AgitatedDove14

  				
Posted 3 years ago by VexedCat68

Wait, so the pipeline step only runs if the pre execute callback returns True? It'll stop if it doesn't run?

  				
Posted 3 years ago by VexedCat68

CostlyOstrich36

  				
Posted 3 years ago by VexedCat68

Is there anything stopping you from using regular conditions like 'if'?

  				
Posted 3 years ago by CostlyOstrich36

Another question, in the parents sequence in pipe.add_step, we have to pass in the name of the step right?

  				
Posted 3 years ago by VexedCat68
