It depends on how complex your configuration is, but if config elements are all that will change between versions (i.e. not the code itself) then you could consider using parameter overrides.
A ClearML Task can have a number of "hyperparameters" attached to it. Once that task is cloned and in draft mode, you can edit these parameters. If the task is then enqueued, the new values are injected into the code at runtime.
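As a rough sketch of that clone, edit, enqueue flow done from code instead of the UI: this assumes the parameters were connected as a plain dictionary (which ClearML usually files under a "General" section), and the task ID, queue name and parameter names are placeholders, not anything from your setup.

```python
def prefixed_overrides(overrides, section="General"):
    """Turn {'lr': 0.01} into {'General/lr': 0.01} ('section/name' keys)."""
    return {f"{section}/{name}": value for name, value in overrides.items()}


def clone_with_overrides(template_task_id, overrides, queue_name="default"):
    """Clone an existing task, edit the draft's parameters, then enqueue it."""
    # Imported lazily so the pure helper above works without clearml installed.
    from clearml import Task

    template = Task.get_task(task_id=template_task_id)
    draft = Task.clone(source_task=template, name=f"{template.name} (override)")
    # update_parameters changes only the listed keys and keeps the rest as cloned.
    draft.update_parameters(prefixed_overrides(overrides))
    Task.enqueue(draft, queue_name=queue_name)
    return draft.id


if __name__ == "__main__":
    # Hypothetical parameter names, shown only to illustrate the key format.
    print(prefixed_overrides({"lr": 0.01, "epochs": 20}))
```

If your parameters come from argparse instead, the section is normally "Args", so pass section="Args" to the helper.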
A pipeline is no different: it can have pipeline parameters that are accessible in the code. When cloning the pipeline using the TaskScheduler, you can easily override these parameters 🙂
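For illustration, a minimal sketch of a pipeline that exposes one parameter and forwards it to a step. The pipeline name, project, step task and the dataset_url parameter are all made up here; the step's "General/dataset_url" key assumes the step task connected its parameters as a plain dictionary.

```python
def pipeline_ref(parameter_name):
    """Return the '${pipeline.<name>}' placeholder that resolves to a pipeline parameter."""
    return "${pipeline." + parameter_name + "}"


def build_pipeline(default_url="https://example.com/data.csv"):
    """Define a pipeline with one overridable parameter (does not start it)."""
    # Imported lazily so pipeline_ref can be used without clearml installed.
    from clearml import PipelineController

    pipe = PipelineController(name="scheduled-pipeline", project="examples", version="1.0.0")
    # This is the value a clone (manual or scheduled) can override.
    pipe.add_parameter(name="dataset_url", default=default_url, description="Input data location")
    pipe.add_step(
        name="stage_data",
        base_task_project="examples",   # hypothetical project holding the step task
        base_task_name="data step",     # hypothetical existing task used as the step
        parameter_override={"General/dataset_url": pipeline_ref("dataset_url")},
    )
    return pipe
```

Anything that changes between runs can be routed through parameters like this, so a cloned pipeline picks up new values without touching the code.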
References: https://clear.ml/docs/latest/docs/pipelines/pipelines_sdk_tasks#pipeline-parameters
https://youtu.be/MX3BrXnaULs?t=162
That's what happens in the background when you click "new run". A pipeline is simply a task in the background. You can find the task by querying, and you can clone it too! It is placed in a "hidden" folder called .pipelines, which is a subfolder of your main project. Check out the settings, where you can enable "show hidden folders".
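Something like the following could be used to look the pipeline task up from code. It is only a sketch: the exact hidden project path (main project, then .pipelines, then the pipeline name) is an assumption on my side, so check it in the UI with hidden folders enabled, and the query may need an extra filter if hidden items are not returned by default.

```python
def hidden_pipeline_project(project, pipeline_name):
    """Build the assumed hidden project path: '<project>/.pipelines/<pipeline name>'."""
    return f"{project}/.pipelines/{pipeline_name}"


def find_pipeline_tasks(project, pipeline_name, task_filter=None):
    """Query for pipeline controller tasks; returns a list of Task objects."""
    # Imported lazily so the path helper works without clearml installed.
    from clearml import Task

    return Task.get_tasks(
        project_name=hidden_pipeline_project(project, pipeline_name),
        task_name=pipeline_name,
        task_filter=task_filter,  # optional extra backend filters, if needed
    )


if __name__ == "__main__":
    # Hypothetical names, shown only to illustrate the path layout.
    print(hidden_pipeline_project("examples", "scheduled-pipeline"))
```

The id of any task returned this way is what you would then clone or hand to the scheduler.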
No. I would like to use TaskScheduler for pipelines. For now it seems to me that I first need to run the whole pipeline to get its ID.
I would like to define the pipeline but not run it before it is run by the scheduler
You mean a sort of pause, where the pipeline waits for additional input before continuing?
RoundMosquito25 it is true that the TaskScheduler requires a task_id, but that does not mean you have to run the pipeline every time 🙂
When setting up, you do need to run the pipeline once to get it into the system. But from that point on, you should be able to just use the TaskScheduler with the pipeline ID. The scheduler should automatically clone the pipeline and enqueue it. It will basically use the one existing pipeline as a "template" for subsequent runs.
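A minimal sketch of that setup, assuming you already have the ID of the pipeline that ran once. The queue name, the time of day and the override keys are placeholders; the daily_at helper is just my own convenience wrapper around the scheduler's hour/minute/day arguments, where day=1 is meant as "every 1 day".

```python
def daily_at(hour, minute=0):
    """Keyword arguments for a recurring job that fires every day at hour:minute."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    return {"hour": hour, "minute": minute, "day": 1, "recurring": True}


def schedule_pipeline(pipeline_task_id, overrides=None, queue="services"):
    """Clone and enqueue the template pipeline every day, with optional overrides."""
    # Imported lazily so daily_at can be used without clearml installed.
    from clearml.automation import TaskScheduler

    scheduler = TaskScheduler()
    scheduler.add_task(
        schedule_task_id=pipeline_task_id,  # the pipeline that was run once
        queue=queue,                        # assumed queue for the pipeline controller
        task_parameters=overrides,          # e.g. {"Args/dataset_url": "..."} (hypothetical)
        **daily_at(hour=7, minute=30),
    )
    scheduler.start()  # blocks and keeps scheduling from this process
```

When the configuration changes, only the overrides passed to the scheduler need updating; the template pipeline itself does not have to be run again.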
The problem is that we have a complex pipeline configuration. The configuration changes quite frequently, and we would not like to run the pipeline every time it changes, but we would like to have it scheduled at defined intervals.
Do you have an idea of some workaround / alternative solution for that problem?