What Is the Recommended Best Practice for Writing ClearML Pipelines: Controller or Decorators?

Answered

Hey, just wanting to know: what is the recommended best practice for writing ClearML pipelines, the controller or the decorators?

  				
Posted 2 years ago by FierceHamster54


Answers 9

Hi FierceHamster54,
I would take a look at the decorator example here:
https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_decorator.py
Think of every function as a stand-alone task running on a different machine. The controller itself is the logic that creates the jobs and passes data between them, and the ClearML agent / autoscaler does the actual orchestration.

  				
Posted 2 years ago by AgitatedDove14

Sure, but the same pattern can be achieved by explicitly using the PipelineController class and defining steps with .add_step() pointing to ClearML Task objects, right?

The decorators simply abstract away the controller, but both methods (decorators or controller/tasks) let you decouple your pipeline into steps, each with an independent compute target, right?

So basically, choosing one method or the other is only a question of best practice or style?

  				
Posted 2 years ago by FierceHamster54

Ooooo okay, I see: with the @PipelineDecorator.pipeline decorator you can have a function that orchestrates your components and manipulates their return data

  				
Posted 2 years ago by FierceHamster54

As opposed to the controller/Task approach, where add_step() only allows executing them sequentially

  				
Posted 2 years ago by FierceHamster54

Btw AgitatedDove14, is there a way to define parallel tasks and use the pipeline as an acyclic compute graph instead of simply sequential tasks?

  				
Posted 2 years ago by FierceHamster54

Yes. The main difference is that add_step() basically creates a DAG and executes the tasks in parallel, based on the DAG dependencies, while the decorator also takes care of serializing the data in and out of the function.
With the decorator, imagine the pipeline logic running as Python code, where the logic waits for a function to finish only when the result of that function is actually used. This means that if you need a parallel loop, you can create a thread pool.
Does that make sense?
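To make the thread-pool idea concrete, here is a rough sketch in plain Python. It is only an analogy and uses no ClearML calls: `train_fold` and `pipeline_logic` are hypothetical names, and in a real pipeline `train_fold` would presumably be a function decorated as a pipeline component, with the loop living inside the pipeline logic function.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def train_fold(fold: int) -> float:
    """Hypothetical stand-in for a pipeline component; returns a fake score."""
    time.sleep(0.1)  # pretend this is a long-running remote step
    return 0.9 - 0.01 * fold


def pipeline_logic(n_folds: int = 4) -> float:
    """Launch one step per fold from a thread pool, then combine the results."""
    with ThreadPoolExecutor(max_workers=n_folds) as pool:
        # Each call is issued from its own thread, so the steps overlap
        # instead of running one after another.
        scores = list(pool.map(train_fold, range(n_folds)))
    # The logic only blocks here, at the point where the results are needed.
    return max(scores)


best = pipeline_logic()
print(f"best score: {best:.2f}")
```

The point of the pool is that each thread blocks on its own step, so independent steps are not serialized by the order of the calls in the pipeline code.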

  				
Posted 2 years ago by AgitatedDove14

Nice, that's a great feature! I'm also trying to have a component that executes Giskard QA test suites on the model and data. Is there a planned feature where I can suspend execution of the pipeline and show in the UI that this pipeline step requires a human confirmation to continue or stop, while displaying arbitrary text/plot information?

  				
Posted 2 years ago by FierceHamster54

So it seems the decorator is simply the superior option? In which case would we use the add_step() option?

  				
Posted 2 years ago by GrittyCormorant73

So it seems the decorator is simply the superior option?

Kind of, yes 😊

In which case would we use the add_step() option?

When you have existing Tasks and the piping is very straightforward (i.e. input / output in the code basically references other Tasks/artifacts, and there is no real need to do any magic for serializing/deserializing data between steps).
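For that existing-Tasks case, a minimal sketch could look like the following. The project name, task names, step names, parameter name, and queue name are all made up for illustration, and it assumes those Tasks already exist on your ClearML server. The import sits inside the function only so the snippet can be loaded without a configured ClearML installation.

```python
def build_pipeline():
    """Sketch of a small DAG built from existing Tasks (all names hypothetical)."""
    # Imported here so that merely loading this snippet needs no ClearML setup.
    from clearml import PipelineController

    pipe = PipelineController(
        name="demo-pipeline",      # hypothetical pipeline name
        project="demo-project",    # hypothetical project
        version="0.0.1",
    )
    pipe.set_default_execution_queue("default")  # assumed queue name

    # Root step: clones an existing Task and enqueues it.
    pipe.add_step(
        name="prepare",
        base_task_project="demo-project",
        base_task_name="prepare data",
    )
    # Two steps that depend only on "prepare", so they can run in parallel.
    for model in ("model_a", "model_b"):
        pipe.add_step(
            name=f"train_{model}",
            parents=["prepare"],
            base_task_project="demo-project",
            base_task_name=f"train {model}",
            # Pass the upstream Task id to the cloned Task's parameters.
            parameter_override={"General/dataset_task_id": "${prepare.id}"},
        )
    # Final step waits for both training steps.
    pipe.add_step(
        name="evaluate",
        parents=["train_model_a", "train_model_b"],
        base_task_project="demo-project",
        base_task_name="evaluate models",
    )
    return pipe


if __name__ == "__main__":
    # Launches the controller; the steps are picked up by agents on the queue.
    build_pipeline().start()
```

Here the `parents` argument is what defines the graph: steps with the same parents and no dependency on each other are free to run at the same time.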

  				
Posted 2 years ago by AgitatedDove14
