task = Task.import_task(export_data)
pipe.add_step(
    name=name,
    base_task_id=task.id,
    parents=parents,
    task_overrides={'script.branch': 'main', 'script.version_num': ''},
    execution_queue=pipe_cfg['step_queue'],
    cache_executed_step=True,
    clone_base_task=False
)
@<1523707653782507520:profile|MelancholyElk85> what are you trying to change ? maybe there is a better way?
BTW: if you do step_base_task.export_task() you can take the parts that you need from the dict and pass them to the task_overrides argument in add_step
(you might need to flatten the nested arguments with '.', and thinking about it, maybe we should do that automatically?!)
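A minimal sketch of the flattening step mentioned above, assuming the picked part of the exported dict is plain nested dictionaries. The helper name flatten_overrides and the sample dict are hypothetical; only the 'script.branch' and 'script.version_num' keys come from this thread.

```python
from typing import Any, Dict


def flatten_overrides(nested: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted keys, e.g. {'script': {'branch': 'main'}}
    becomes {'script.branch': 'main'}. Empty dicts are kept as leaf values."""
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        dotted = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict) and value:
            # Recurse into non-empty sub-dictionaries
            flat.update(flatten_overrides(value, dotted))
        else:
            flat[dotted] = value
    return flat


# Hypothetical subset picked from step_base_task.export_task()
picked = {"script": {"branch": "main", "version_num": ""}}
overrides = flatten_overrides(picked)
```

The resulting dict could then be passed as task_overrides=overrides in add_step. Whether every exported field is accepted as an override is not confirmed here.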
@<1523701205467926528:profile|AgitatedDove14> this last advice works, thank you!
Suppose I have the following scenario (real-world project, real ML pipeline scenario)
- I have separate projects for different steps (ETL, train, test, tensorrt conversion...). Every step has its own git repository, docker image, branch etc
- For quite a long time the steps did not function as parts of an automated pipeline. For example, collaborative experimentation (training and validation steps). We were just focusing on reproducibility/versioning etc
- After some time, we decided to chain everything into a single DAG to set up CI/CD and automate everything. For each step there is still a base task which I want to clone and modify every time the pipeline is launched
- Each individual step still resides in its own project, and I want all the pipeline-initiated tasks to still reside in their respective projects
I think the issue is that you have to pass the project ID, not the project name (the project's unique ID is the property that is actually stored on the Task)
@<1523707653782507520:profile|MelancholyElk85> can you check the following works:
pipe.add_step(..., task_overrides={'project': Task.get_project_id(project_name='examples')})
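A small sketch of building that override per step, so each step can land in its own project. The helper name project_override and the injected resolver are assumptions made for illustration; in real use the resolver would wrap Task.get_project_id from the message above.

```python
from typing import Callable, Dict, Optional


def project_override(
    project_name: str,
    resolve_id: Callable[[str], Optional[str]],
) -> Dict[str, str]:
    """Return a task_overrides fragment that moves a step into project_name.

    resolve_id maps a project name to its ID; it is injected so the lookup
    can be swapped out (hypothetical design, not part of the clearml API).
    """
    project_id = resolve_id(project_name)
    if not project_id:
        # Fail early instead of passing an empty project to the pipeline
        raise ValueError(f"Project not found: {project_name!r}")
    return {"project": project_id}


# Intended use with clearml (assumes a configured server):
#   from clearml import Task
#   overrides = project_override(
#       "examples", lambda name: Task.get_project_id(project_name=name))
```

The returned dict can be merged with other overrides, for example {**project_override(...), 'script.branch': 'main'}.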
doesn't target_project force the same project on all pipeline steps?
maybe being able to change 100% of things with task_overrides would be the most convenient way
Before the code I shared, there were some lines like this
step_base_task = Task.get_task(project_name=cfg[name]['base_project'],
                               task_name=cfg[name]['base_name'])
export_data = step_base_task.export_task()
# ... modify export_data in-place ...
task = Task.import_task(export_data)
pipe.add_step(base_task_id=task.id, clone_base_task=False, ...)
so I had a base_task for every step. Then I wanted to modify a shitton of things, and I found no better way than to export, modify, and import to another task. Then this second task serves as step without cloning (that was my intention)
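For the "modify export_data in-place" part, a minimal sketch of setting a nested value by dotted path. The helper set_by_path and the sample dict are hypothetical stand-ins; the real structure returned by export_task() may differ.

```python
from typing import Any, Dict


def set_by_path(data: Dict[str, Any], dotted_key: str, value: Any) -> None:
    """Set data['a']['b'] = value given 'a.b', creating missing dicts on the way."""
    *parents, leaf = dotted_key.split(".")
    node = data
    for part in parents:
        # Walk down, creating intermediate dictionaries if they are absent
        node = node.setdefault(part, {})
    node[leaf] = value


# Hypothetical stand-in for the dict returned by export_task()
export_data = {"name": "train", "script": {"branch": "dev", "version_num": "abc123"}}
set_by_path(export_data, "script.branch", "main")
set_by_path(export_data, "script.version_num", "")
```

The same dotted keys are the ones used with task_overrides, so a list of changes can be applied either way.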
@<1523707653782507520:profile|MelancholyElk85> I just ran a single-step pipeline and it seemed to use the "base_task_id" without cloning it...
Any insight on how to reproduce?
The lower task is created during import_task, the upper one during actual execution of the step, several minutes after
@<1523701070390366208:profile|CostlyOstrich36> On the screenshot, the upper task has the lower task as parent
I ended up using task_overrides for every change, and this way I only need 2 tasks (a base task and a step task, thus I use clone_base_task=True and it works as expected - yay!)
Very cool!
BTW: you can also provide a function to create the entire Task, see the base_task_factory argument in add_step
I think it's still an issue, not critical though, because we have another way to do it and it works
I could not reproduce it, I think the issue was that when you did the "update_task()" you also updated the status?!
Could you verify? (just to clarify, the import_task, will import the "completed status" as well, hence the pipeline will clone it)
We digressed a bit from the original thread topic though 😆 About clone_base_task=False.
So, the problem I described in the beginning can be reproduced only this way:
- have a base task
- export_task, modify the exported data, import_task: this gives a second task
- pass the second task to add_step with clone_base_task=False. Then the second task is cloned and we get a third task
@<1523701205467926528:profile|AgitatedDove14> yeah, I'll try calling task.reset() before add_step
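A sketch of that check, assuming the task object exposes get_status() and reset() as clearml Task does, and that draft corresponds to the status string 'created'. The helper name ensure_draft is hypothetical, and whether force is needed for a completed task is an assumption that may depend on the clearml version.

```python
def ensure_draft(task, draft_status: str = "created", force: bool = False) -> bool:
    """Reset task unless it is already a draft; return True if a reset happened.

    Assumes task.get_status() returns a status string and task.reset()
    moves the task back to draft (as clearml's Task is expected to do).
    """
    if task.get_status() == draft_status:
        return False
    task.reset(force=force)
    return True


# Intended use before adding the step (requires a configured clearml server):
#   task = Task.import_task(export_data)
#   ensure_draft(task)
#   pipe.add_step(base_task_id=task.id, clone_base_task=False, ...)
```

Note that resetting clears the task's outputs, so this only makes sense for the imported step task, not for a finished experiment you want to keep.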
No, IMO it's better to leave task_overrides arguments with "." - the same structure as in the dictionary we get from export_task - this is more intuitive
In principle, I can modify almost everything with task_overrides, omitting the export part, and it's fine. But it seems that by exporting I can change more things, for example project_name
Is "project_name" different for different steps? i.e. PipelineController(..., target_project='my_new_project') is not enough?
base task is in draft status, so when I call import_task it imports draft status as well, am I right?
but maybe add a lot more examples of using it to documentation
@<1523707653782507520:profile|MelancholyElk85> , Hi!
Do you have anything in the console regarding this? Also, if you go to the 'INFO' tab in the experiment, there is a 'parent view'. Can you please validate that it links the parent to it?
@<1523707653782507520:profile|MelancholyElk85>
What's the clearml version you are using?
Just making sure... base_task_id has to point to a Task that is in "draft" mode, for the pipeline to use it
@<1523701205467926528:profile|AgitatedDove14> clearml 1.1.1
Yeah, of course it is in draft mode (Task.import_task creates a task in draft mode, it is the lower task on the screenshot)
so yeah, in short, target_project cannot do that