Thank you very much, Martin. Step by step I am getting a better understanding of the platform (and the more I do, the more I like it!). If you don't mind, I will write up a summary of a use case for reusing Tasks, taken from a recent project I built with Luigi.
Hi ShinyWhale52
Every execution of the pipeline will (by definition) create a new job based on the pipeline steps.
This is the reason you see all the steps twice: the default assumption is that you wish to re-run the step, as it is part of the processing workflow (e.g. training a model).
the model has been overwritten. I guess this is due to this instruction:
This is because you are storing it locally at the same path; it simply reflects the fact that you overwrote your model.
To create a new unique copy of the model on the clearml-server (or any other object storage), pass output_uri to the Task.init call in the specific step (or configure default_output_uri in the clearml.conf of the agent):
Task.init(..., output_uri='s3://my_bucket/storage')
or, to store on the clearml-server:
Task.init(..., output_uri=True) # could also be
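For reference, a minimal sketch of what that could look like inside a pipeline step. The project and task names are made up for illustration, and the import sits inside the function only so the snippet can be loaded without clearml installed:

```python
from typing import Union


def init_training_step(output_uri: Union[bool, str] = True):
    """Start this step's Task so that saved models are uploaded per Task.

    output_uri=True stores models on the clearml-server; a string such as
    's3://my_bucket/storage' stores them on that object storage instead.
    """
    # Lazy import: clearml is only needed when the step actually runs.
    from clearml import Task

    return Task.init(
        project_name="examples",       # hypothetical project name
        task_name="train model step",  # hypothetical task name
        output_uri=output_uri,
    )
```

With this in place, each execution of the step should upload its own copy of the model, so overwriting the local file no longer loses the previous one.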
If I wanted to reuse the previous tasks' outputs (in case neither code nor parameters nor data have changed), as I said in my conversation with Martin last week, how could I change the pipeline_controller.py script?
So the question is: what exactly is the logic for reusing Tasks?
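One possible reading of "neither code nor parameters nor data has changed", as a rough sketch: fingerprint the three inputs of a step and reuse a previous Task only when the fingerprint matches. This is plain Python and not ClearML's own logic; step_fingerprint, StepCache, and the task IDs are hypothetical names used only for illustration.

```python
import hashlib
import json
from typing import Any, Dict, Optional


def step_fingerprint(code: str, params: Dict[str, Any], data_hash: str) -> str:
    """Hash the code, parameters and data identity of a step into one key."""
    payload = json.dumps(
        {"code": code, "params": params, "data": data_hash},
        sort_keys=True,  # stable ordering so equal inputs give equal hashes
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class StepCache:
    """In-memory map from step fingerprint to the Task that produced it."""

    def __init__(self) -> None:
        self._done: Dict[str, str] = {}

    def lookup(self, fingerprint: str) -> Optional[str]:
        # Returns the earlier task ID, or None if the step must be re-run.
        return self._done.get(fingerprint)

    def record(self, fingerprint: str, task_id: str) -> None:
        self._done[fingerprint] = task_id


cache = StepCache()
first = step_fingerprint("def train(): ...", {"lr": 0.1}, "abc123")
cache.record(first, "task-001")  # hypothetical task ID

same = step_fingerprint("def train(): ...", {"lr": 0.1}, "abc123")
changed = step_fingerprint("def train(): ...", {"lr": 0.2}, "abc123")

print(cache.lookup(same))     # task-001 -> reuse the previous outputs
print(cache.lookup(changed))  # None -> parameters changed, re-run the step
```

In a real pipeline the cache would have to live somewhere persistent (for example, by searching previously executed Tasks), which is exactly the part that needs a decision here.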
If this is like a parameter for the "Dataset", then adding a parameter to the Pipeline itself makes a lot of sense (the pipeline is also a Task, so we can add arguments that we can later control from the UI).
wdyt?