MelancholyElk85 What's the clearml version you are using?
Just making sure... base_task_id has to point to a Task that is in "draft" mode, for the pipeline to use it
doesn't target_project force the same project on all pipeline steps?
In principle, I can modify almost everything with task_overrides, omitting the export part, and it's fine. But it seems that by exporting I can change more things, for example project_name.
MelancholyElk85 what are you trying to change? Maybe there is a better way?
BTW: if you do step_base_task.export_task() you can use the parts that you need from the dict and pass them to the task_overrides argument in add_step (you might need to flatten the nested arguments with '.', and thinking about it, maybe we should do that automatically?!)
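Something like this, as a rough sketch (the project/task names are placeholders, and 'pipe' is an existing PipelineController):
from clearml import Task

step_base_task = Task.get_task(project_name='examples', task_name='base step')  # placeholder names
export_data = step_base_task.export_task()  # the full (nested) task definition as a dict

def flatten(d, prefix=''):
    # turn {'script': {'branch': 'main'}} into {'script.branch': 'main'}
    flat = {}
    for k, v in d.items():
        key = prefix + '.' + k if prefix else k
        if isinstance(v, dict):
            flat.update(flatten(v, key))
        else:
            flat[key] = v
    return flat

# keep only the parts you actually want to override
task_overrides = flatten({'script': {'branch': 'main', 'version_num': ''}})
pipe.add_step(name='train', base_task_id=step_base_task.id, task_overrides=task_overrides)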
CostlyOstrich36 On the screenshot, the upper task has the lower task as its parent
Before the code I shared, there were some lines like this
step_base_task = Task.get_task(project_name=cfg[name]['base_project'],
                               task_name=cfg[name]['base_name'])
export_data = step_base_task.export_task()
# ... modify export_data in-place ...
task = Task.import_task(export_data)
pipe.add_step(base_task_id=task.id, clone_base_task=False, ...)
AgitatedDove14 yeah, I'll try calling task.reset() before add_step
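i.e. roughly (a sketch, reusing the snippet above):
task = Task.import_task(export_data)
task.reset()  # clears the imported status so the task is back in "draft" mode
pipe.add_step(name=name, base_task_id=task.id, clone_base_task=False)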
No, IMO it's better to leave the task_overrides arguments with "." - the same structure as in the dictionary we get from export_data - this is more intuitive
The base task is in draft status, so when I call import_task it imports the draft status as well, am I right?
maybe being able to change 100% of things with task_overrides would be the most convenient way
I think the issue is that you have to pass the project ID, not the project name (the project's unique ID is the property that is actually stored on the Task)
MelancholyElk85 can you check the following works:
pipe.add_step(..., task_overrides={'project': Task.get_project_id(project_name='examples')})
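Spelled out as a full call, it would look roughly like this (the step name and base task are placeholders):
project_id = Task.get_project_id(project_name='examples')  # resolve the name to the unique ID stored on the Task
pipe.add_step(
    name='train',                    # placeholder step name
    base_task_id=step_base_task.id,
    task_overrides={'project': project_id},
)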
AgitatedDove14 clearml 1.1.1
Yeah, of course it is in draft mode (Task.import_task creates a task in draft mode; it is the lower task on the screenshot)
I think it's still an issue, not critical though, because we have another way to do it and it works
I ended up using task_overrides for every change, and this way I only need 2 tasks (a base task and a step task, thus I use clone_base_task=True and it works as expected - yay!)
Very cool!
BTW: you can also provide a function to create the entire Task, see the base_task_factory argument in add_step
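i.e. something like this (a minimal sketch; the repo and script names are made up):
def make_step_task(node):
    # builds the step's Task from scratch instead of cloning a base task
    return Task.create(
        project_name='examples',                  # made-up project
        task_name=node.name,
        repo='https://github.com/org/repo.git',   # made-up repo
        branch='main',
        script='train.py',
    )

pipe.add_step(name='train', base_task_factory=make_step_task)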
I think it's still an issue, not critical though, because we have another way to do it and it works
I could not reproduce it, I think the issue was that when you did the "update_task()" you also updated the status?!
Could you verify? (just to clarify, import_task will import the "completed" status as well, hence the pipeline will clone it)
The lower task is created during import_task, the upper one during the actual execution of the step, several minutes later
so I had a base_task for every step. Then I wanted to modify a shitton of things, and I found no better way than to export, modify, and import into another task. This second task then serves as the step without cloning (that was my intention)
MelancholyElk85 I just ran a single-step pipeline and it seemed to use the "base_task_id" without cloning it...
Any insight on how to reproduce?
We digressed a bit from the original thread topic though 😆 About clone_base_task=False.
I ended up using task_overrides for every change, and this way I only need 2 tasks (a base task and a step task, thus I use clone_base_task=True and it works as expected - yay!)
So, the problem I described in the beginning can be reproduced only this way:
- have a base task
- export_task - modify - import_task - get a second task
- pass the second task to add_step with clone_base_task=False. Then the second task is cloned and we get a third task
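In code, the repro is roughly (a condensed sketch; the names are placeholders):
base = Task.get_task(project_name='examples', task_name='base step')  # placeholder names
export_data = base.export_task()
export_data['script']['branch'] = 'main'  # any in-place modification
second = Task.import_task(export_data)
pipe.add_step(name='step', base_task_id=second.id, clone_base_task=False)
# expected: the pipeline runs `second` as-is; observed: it is cloned into a third task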
so yeah, in short, target_project cannot do that
Is "project_name" diff for diff steps ? i.e. PipelineController(..., target_project='my_new_project')
is not enough?
but maybe add a lot more examples of using it to the documentation
MelancholyElk85, Hi!
Do you have anything in the console regarding this? Also, if you move to the 'INFO' tab in the experiment, there is a 'parent' view - can you please validate that it links the parent to it?
Suppose I have the following scenario (real-world project, real ML pipeline scenario):
- I have separate projects for different steps (ETL, train, test, tensorrt conversion...). Every step has its own git repository, docker image, branch, etc.
- For quite a long time, the steps were not functioning as parts of an automated pipeline - for example, collaborative experimentation (training and validation steps). We were just focusing on reproducibility/versioning, etc.
- After some time, we decided to chain everything up into a single DAG for CI/CD and automate everything. For each step there is still a base task, which I want to clone and modify every time the pipeline is launched
- Each individual step still resides in its own project, and I want all the pipeline-initiated tasks to still reside in their respective projects
task = Task.import_task(export_data)
pipe.add_step(
    name=name,
    base_task_id=task.id,
    parents=parents,
    task_overrides={'script.branch': 'main', 'script.version_num': ''},
    execution_queue=pipe_cfg['step_queue'],
    cache_executed_step=True,
    clone_base_task=False,
)
AgitatedDove14 this last advice works, thank you!
maybe being able to change 100% of things with task_overrides would be the most convenient way