# import the modified task definition as a new task, then use it as the step
task = Task.import_task(export_data)
pipe.add_step(
    name=name,
    base_task_id=task.id,
    parents=parents,
    task_overrides={'script.branch': 'main', 'script.version_num': ''},
    execution_queue=pipe_cfg['step_queue'],
    cache_executed_step=True,
    clone_base_task=False,
)
@<1523707653782507520:profile|MelancholyElk85> what are you trying to change? Maybe there is a better way?
BTW: if you do step_base_task.export_task() you can use the parts that you need from the dict and pass them to the task_overrides argument in add_step (you might need to flatten the nested arguments with '.', and thinking about it, maybe we should do that automatically?!)
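A rough sketch of that (untested; the project/task names, the queue, and the overridden fields below are placeholders):
from clearml import Task
from clearml.automation import PipelineController

pipe = PipelineController(name='my-pipeline', project='pipelines', version='1.0.0')

# get the base task and export its full definition as a dict
step_base_task = Task.get_task(project_name='train', task_name='train base')
export_data = step_base_task.export_task()

# take only the parts you need and pass them flattened with '.' notation
pipe.add_step(
    name='train',
    base_task_id=step_base_task.id,
    execution_queue='default',
    task_overrides={
        'script.branch': 'main',          # e.g. force a branch
        'script.version_num': '',         # clear the pinned commit
        'script.repository': export_data['script']['repository'],
    },
)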
@<1523701205467926528:profile|AgitatedDove14> this last advice works, thank you!
maybe being able to change 100% of things with task_overrides would be the most convenient way
Before the code I shared, there were some lines like this
step_base_task = Task.get_task(project_name=cfg[name]['base_project'],
                               task_name=cfg[name]['base_name'])
export_data = step_base_task.export_task()
# ... modify export_data in-place ...
task = Task.import_task(export_data)
pipe.add_step(base_task_id=task.id, clone_base_task=False, ...)
so I had a base task for every step. Then I wanted to modify a shitton of things, and I found no better way than to export, modify, and import into another task. This second task then serves as the step without cloning (that was my intention)
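For illustration, the "modify export_data in-place" part could be something along these lines (the fields and names below are placeholders, not necessarily what was actually changed):
export_data = step_base_task.export_task()

# illustrative in-place edits of the exported dict
export_data['name'] = 'train (pipeline run)'     # rename the imported copy
export_data['script']['branch'] = 'main'         # switch branch
export_data['script']['version_num'] = ''        # drop the pinned commit

task = Task.import_task(export_data)             # becomes a new, separate task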
The lower task is created during import_task, the upper one during the actual execution of the step, several minutes later
In principle, I can modify almost everything with task_overrides, omitting the export part, and it's fine. But it seems that by exporting I can change more things, for example project_name
I think it's still an issue, not critical though, because we have another way to do it and it works
so yeah, in short, target_project cannot do that
I ended up using task_overrides for every change, and this way I only need 2 tasks (a base task and a step task, thus I use clone_base_task=True and it works as expected - yay!)
Very cool!
BTW: you can also provide a function to create the entire Task, see the base_task_factory argument in add_step
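For example, something along these lines (untested sketch; the Task.create arguments, repo URL, and names are placeholders, and 'pipe' is the existing PipelineController):
from clearml import Task

def make_train_task(node):
    # 'node' is the pipeline step node; return a freshly created Task for it
    return Task.create(
        project_name='train',                       # keep the step in its own project
        task_name='train (pipeline)',
        repo='https://github.com/org/train.git',    # placeholder repository
        branch='main',
        script='train.py',
    )

pipe.add_step(
    name='train',
    parents=['etl'],
    base_task_factory=make_train_task,
    execution_queue='default',
)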
I could not reproduce it, I think the issue was that when you did the "update_task()" you also updated the status?!
Could you verify? (just to clarify, import_task will import the "completed" status as well, hence the pipeline will clone it)
but maybe add a lot more examples of using it to the documentation
@<1523707653782507520:profile|MelancholyElk85> I just ran a single-step pipeline and it seemed to use the "base_task_id" without cloning it...
Any insight on how to reproduce?
@<1523707653782507520:profile|MelancholyElk85>, Hi!
Do you have anything in the console regarding this? Also, if you move to the 'INFO' tab in the experiment, there is a 'parent view'; can you please validate that it links the parent to it?
Is "project_name" diff for diff steps ? i.e. PipelineController(..., target_project='my_new_project')
is not enough?
Suppose I have the following scenario (real-world project, real ML pipeline scenario)
- I have separate projects for different steps (ETL, train, test, tensorrt conversion...). Every step has its own git repository, docker image, branch, etc.
- For quite a long time all the steps were not functioning as parts of an automated pipeline, e.g. collaborative experimentation (training and validation steps). We were just focusing on reproducibility/versioning, etc.
- After some time, we decided to chain everything up into a single DAG to make a CI/CD and automate everything. For each step there is still a base task which I want to clone and modify every time the pipeline is launched
- Each individual step still resides in its own project, and I want all the pipeline-initiated tasks to still reside in their respective projects
@<1523701205467926528:profile|AgitatedDove14> clearml 1.1.1
Yeah, of course it is in draft mode (Task.import_task creates a task in draft mode; it is the lower task on the screenshot)
The base task is in draft status, so when I call import_task it imports the draft status as well, am I right?
@<1523701070390366208:profile|CostlyOstrich36> On the screenshot, the upper task has the lower task as parent
doesn't target_project force the same project on all pipeline steps?
I think the issue is that you have to pass the project ID, not the project name (the project's unique ID is the property that is actually stored on the Task)
@<1523707653782507520:profile|MelancholyElk85> can you check the following works:
pipe.add_step(..., task_overrides={'project': Task.get_project_id(project_name='examples')})
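For the multi-project scenario above, that would look roughly like this per step (sketch; project and step names are placeholders, and 'pipe' is the existing PipelineController):
from clearml import Task

# override the stored project *ID* (not the name) so the step's task stays in its own project
pipe.add_step(
    name='train',
    parents=['etl'],
    base_task_project='Train',      # project where the base task lives
    base_task_name='train base',
    execution_queue='default',
    task_overrides={'project': Task.get_project_id(project_name='Train')},
)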
@<1523701205467926528:profile|AgitatedDove14> yeah, I'll try calling task.reset() before add_step
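i.e. something like this (sketch, assuming the imported task is the one passed as base_task_id):
# import the modified definition, then force it back to draft so the
# pipeline can use it directly instead of cloning it
task = Task.import_task(export_data)
task.reset()   # my understanding: clears outputs and puts the task back in draft mode

pipe.add_step(
    name='train',
    base_task_id=task.id,
    clone_base_task=False,
    execution_queue='default',
)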
No, IMO it's better to leave the task_overrides arguments with "." - the same structure as in the dictionary we get from export_task() - this is more intuitive
@<1523707653782507520:profile|MelancholyElk85> What's the clearml version you are using?
Just making sure... base_task_id has to point to a Task that is in "draft" mode, for the pipeline to use it
We digressed a bit from the original thread topic though 😆 About clone_base_task=False.
So, the problem I described in the beginning can be reproduced only this way (rough sketch below):
- have a base task
- export_task() - modify - import_task() - now we have a second task
- pass the second task to add_step with clone_base_task=False. Then the second task is cloned and we get a third task
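A minimal sketch of that reproduction (untested; names and the queue are placeholders):
from clearml import Task
from clearml.automation import PipelineController

pipe = PipelineController(name='repro', project='pipelines', version='1.0.0')

# 1. a base task
base = Task.get_task(project_name='train', task_name='train base')

# 2. export - modify - import -> a second task
export_data = base.export_task()
export_data['name'] = 'train (imported)'
second = Task.import_task(export_data)

# 3. pass the second task to add_step with clone_base_task=False
pipe.add_step(
    name='train',
    base_task_id=second.id,
    clone_base_task=False,
    execution_queue='default',
)
pipe.start()
# observed: the second task gets cloned anyway, so a third task shows up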