So it seems the decorator is simply the superior option?
Kind of, yes 😊
In which case would we use the add_step() option?
When you have existing Tasks and the piping is very straightforward (i.e. the inputs/outputs in the code basically reference other Tasks/artifacts, and there is no real need for any magic to serialize/deserialize data between steps).
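To make the add_step() model concrete, here is a plain-Python toy. It is not the ClearML API: `MiniDagController` and its methods are hypothetical, and `graphlib` needs Python 3.9+. Steps are declared up front with their parents and form a DAG. Every step whose parents have finished is launched in parallel, and each parent's output is handed to its children.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Sequence


class MiniDagController:
    """Toy DAG-based controller (hypothetical, NOT ClearML's PipelineController)."""

    def __init__(self) -> None:
        self._steps: Dict[str, Callable[..., object]] = {}
        self._parents: Dict[str, List[str]] = {}

    def add_step(self, name: str, fn: Callable[..., object], parents: Sequence[str] = ()) -> None:
        # Register a step together with the steps whose outputs it consumes.
        self._steps[name] = fn
        self._parents[name] = list(parents)

    def run(self, max_workers: int = 4) -> Dict[str, object]:
        sorter = TopologicalSorter(self._parents)
        sorter.prepare()  # raises graphlib.CycleError if the steps do not form a DAG
        results: Dict[str, object] = {}
        running = {}
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            while sorter.is_active():
                # Launch every step whose parents have all finished, in parallel.
                for name in sorter.get_ready():
                    args = [results[p] for p in self._parents[name]]
                    running[pool.submit(self._steps[name], *args)] = name
                # Block until at least one running step finishes, then unlock its children.
                done, _ = wait(running, return_when=FIRST_COMPLETED)
                for fut in done:
                    name = running.pop(fut)
                    results[name] = fut.result()
                    sorter.done(name)
        return results


pipe = MiniDagController()
pipe.add_step("load", lambda: [1, 2, 3])
pipe.add_step("double", lambda xs: [x * 2 for x in xs], parents=["load"])  # runs alongside "total"
pipe.add_step("total", lambda xs: sum(xs), parents=["load"])
pipe.add_step("report", lambda d, t: f"{d} / {t}", parents=["double", "total"])
results = pipe.run()
print(results["report"])  # → [2, 4, 6] / 6
```

Because "double" and "total" depend only on "load", they run concurrently. "report" starts once both are done.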
- Yes, the main difference between add_step and the decorator is that add_step basically creates a DAG and "executes" the tasks in parallel, based on the DAG dependencies.
- The decorator will also take care of serializing the data in and out of the function. Imagine the pipeline logic running as plain Python code that waits for a function to finish only when that function's result is actually used. This means that if you need a parallel loop, you can create a thread pool.
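Here is a rough plain-Python analogy of that decorator behaviour. It assumes a thread pool stands in for remote machines and `pickle` stands in for ClearML's data serialization. `component` and `LazyResult` are hypothetical names, not ClearML's. Calling a step launches it immediately, and the pipeline logic only blocks when `.get()` is called on the result.

```python
import pickle
import time
from concurrent.futures import Future, ThreadPoolExecutor
from functools import wraps

# A shared pool stands in for "different machines" in this sketch.
_POOL = ThreadPoolExecutor(max_workers=8)


class LazyResult:
    """Placeholder returned by a step call; blocks only when the value is read."""

    def __init__(self, future: Future) -> None:
        self._future = future

    def get(self):
        # This is the only place the pipeline logic waits for the step to finish.
        return pickle.loads(self._future.result())


def component(fn):
    """Hypothetical decorator: launch the step immediately, serialize data in and out."""

    @wraps(fn)
    def launch(*args) -> LazyResult:
        payload = pickle.dumps(args)  # serialize inputs as if shipping them elsewhere

        def remote() -> bytes:
            return pickle.dumps(fn(*pickle.loads(payload)))  # serialize the output back

        return LazyResult(_POOL.submit(remote))

    return launch


@component
def square(x: int) -> int:
    time.sleep(0.2)  # pretend this is slow, remote work
    return x * x


def pipeline(values):
    pending = [square(v) for v in values]  # each call returns at once, so steps overlap
    return sum(r.get() for r in pending)   # waiting happens here, when results are used


start = time.perf_counter()
total = pipeline([1, 2, 3, 4])
elapsed = time.perf_counter() - start
print(total)  # → 30
```

The four 0.2 s steps overlap on the pool, so the whole loop takes roughly the time of one step rather than four.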
Sure, but the same pattern can be achieved by explicitly using the PipelineController class and defining steps with .add_step() pointing to ClearML's Task objects, right?
The decorators simply abstract away the controller, but both methods (decorators or controller/tasks) let you decouple your pipelines into steps, each with an independent compute target, right?
So basically, is choosing one method or the other only a question of best practice or style?
Nice, that's a great feature! I'm also trying to have a component execute Giskard QA test suites on the model and data. Is there a planned feature where I can suspend execution of the pipeline and show in the UI that a pipeline step requires human confirmation to continue or stop, while displaying arbitrary text/plot information?
I would take a look at the decorator example here
Think of every function as a stand-alone task running on a different machine. The controller itself is the logic that creates the jobs and passes data, and the ClearML agent / autoscaler does the actual orchestration.
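Here is a toy model of that split. It assumes in-process queues and threads in place of ClearML execution queues and agents. `QUEUES`, `agent`, and `controller` are hypothetical names. The controller only creates jobs and passes data between them. Each agent pulls jobs from its own queue, which is also how each step can land on a different compute target.

```python
import queue
import threading
from typing import Dict

# Hypothetical queue names standing in for compute targets (e.g. a CPU pool and a GPU pool).
QUEUES: Dict[str, queue.Queue] = {"cpu": queue.Queue(), "gpu": queue.Queue()}
RESULTS: queue.Queue = queue.Queue()


def agent(queue_name: str) -> None:
    """Toy agent: pulls jobs from one queue and executes them."""
    jobs = QUEUES[queue_name]
    while True:
        job = jobs.get()
        if job is None:  # sentinel: shut this agent down
            break
        step_name, fn, arg = job
        RESULTS.put((step_name, queue_name, fn(arg)))


def controller(data: int) -> Dict[str, tuple]:
    """Toy controller: creates jobs and passes data between them, but runs no step code."""
    QUEUES["cpu"].put(("preprocess", lambda x: x + 1, data))
    _, where_pre, prepared = RESULTS.get()  # wait for the preprocess output
    QUEUES["gpu"].put(("train", lambda x: x * 10, prepared))
    _, where_train, trained = RESULTS.get()
    return {"preprocess": (where_pre, prepared), "train": (where_train, trained)}


agents = [threading.Thread(target=agent, args=(name,)) for name in QUEUES]
for t in agents:
    t.start()
summary = controller(1)
print(summary)  # → {'preprocess': ('cpu', 2), 'train': ('gpu', 20)}
for q in QUEUES.values():
    q.put(None)  # stop the agents
for t in agents:
    t.join()
```

The controller never executes the step functions itself. Pointing a job at a different queue name is enough to move that step to another worker.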