Ooooo okay I see, with the @PipelineDecorator.pipeline decorator you can have a function to orchestrate your components and manipulate their return data
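Roughly what that looks like (a minimal sketch; the project/component names and placeholder logic are illustrative assumptions, not from the thread):
```python
# Minimal sketch of a decorator-based pipeline; names/values are illustrative.
from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=["data"], cache=True)
def load_data(source_url: str):
    # Runs as its own Task; the return value is serialized for the next step.
    return {"url": source_url, "rows": 100}

@PipelineDecorator.component(return_values=["accuracy"])
def train(data: dict):
    # Another stand-alone step; receives the deserialized output of load_data.
    return 0.9  # placeholder metric

@PipelineDecorator.pipeline(name="demo pipeline", project="demo", version="0.1")
def run_pipeline(source_url: str = "s3://bucket/data.csv"):
    # Plain Python logic orchestrating the components and their return data.
    data = load_data(source_url)
    accuracy = train(data)
    print(f"accuracy={accuracy}")

if __name__ == "__main__":
    PipelineDecorator.run_locally()  # debug everything in the local process
    run_pipeline()
```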
So it seems the decorator is simply the superior option?
Kind of yes 😊
In which case would we use the add_task() option?
When you have existing Tasks and the piping is very straightforward (i.e. the input/output in the code is basically referencing other Tasks/artifacts, and there is no real need to do any magic for serializing/deserializing data between steps)
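For that case, a rough sketch of chaining pre-existing Tasks with PipelineController.add_step() (the project/task names and the parameter key below are hypothetical):
```python
# Sketch: chaining pre-existing Tasks; project/task names are hypothetical.
from clearml import PipelineController

pipe = PipelineController(name="tasks pipeline", project="demo", version="0.1")

pipe.add_step(
    name="stage_data",
    base_task_project="demo",
    base_task_name="dataset creation task",
)
pipe.add_step(
    name="stage_train",
    parents=["stage_data"],
    base_task_project="demo",
    base_task_name="training task",
    # The "input" is just a reference to the previous step's Task ID/artifacts.
    parameter_override={"General/dataset_task_id": "${stage_data.id}"},
)

pipe.start()  # the controller itself runs as a Task (by default on the services queue)
```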
Sure, but the same pattern can be achieved by explicitly using the PipelineController
class and defining steps with .add_step()
pointing to ClearML Task
objects, right?
The decorators simply abstract away the controller, but both methods (decorators or controller/tasks) allow you to decouple your pipeline into steps, each with an independent compute target, right?
So basically choosing one method or the other is only a question of best practice or style?
Nice, that's a great feature! I'm also trying to have a component execute Giskard QA test suites on models and data. Is there a planned feature where I can suspend execution of the pipeline and show in the UI that this pipeline step requires a human confirmation to continue or stop, while displaying arbitrary text/plot information?
As opposed to the Controller/Task component, where add_step()
only allows executing them sequentially
- Yes, the main diff between add_task and the decorator is basically creating the DAG and "executing" the tasks in parallel, based on the DAG dependencies
- The decorator will also take care of serializing the data in/out of the function. Imagine the pipeline logic running as Python code, where the logic will wait for a function to finish only when the result of that function is actually used. This means that if you need a parallel loop you can create a thread pool (see the sketch below).
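To illustrate that lazy/parallel behavior (a sketch only; the component names and placeholder logic are made-up assumptions):
```python
# Sketch of the lazy execution behavior: steps run in parallel per the DAG,
# and the pipeline logic only blocks when a result is actually consumed.
from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=["score"])
def evaluate(dataset_url: str):
    return len(dataset_url) * 0.01  # placeholder work, runs on its own machine

@PipelineDecorator.component(return_values=["best"])
def pick_best(score_a: float, score_b: float):
    return max(score_a, score_b)

@PipelineDecorator.pipeline(name="parallel demo", project="demo", version="0.1")
def run_parallel(url_a: str = "s3://bucket/a.csv", url_b: str = "s3://bucket/b.csv"):
    # Both calls return immediately, so the two evaluate steps run in parallel.
    score_a = evaluate(url_a)
    score_b = evaluate(url_b)
    # Execution only waits here, where the results are actually used.
    best = pick_best(score_a, score_b)
    print(f"best score: {best}")
```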
Makes sense
Btw AgitatedDove14 is there a way to define parallel tasks and use the pipeline as an acyclic compute graph instead of simply sequential tasks?
Hi FierceHamster54
I would take a look at the decorator example here
https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_decorator.py
Think of every function as a stand-alone task running on a different machine. The controller itself is the logic that creates the jobs and passes data, and the clearml agent / autoscaler does the actual orchestration