Answered
I call add_step with clone_base_task=False. However, it still clones the task.
image

  
  
Posted 3 years ago

Answers 27


task = Task.import_task(export_data)
pipe.add_step(
    name=name,
    base_task_id=task.id,
    parents=parents,
    task_overrides={'script.branch': 'main', 'script.version_num': '', },
    execution_queue=pipe_cfg['step_queue'],
    cache_executed_step=True,
    clone_base_task=False
)
  
  
Posted 3 years ago

Could you try:

task.reset()
  
  
Posted 3 years ago

"100% of things with task_overrides would be the most convenient way"

I think the issue is that you have to pass the project ID, not the project name (the project's unique ID is the property that is actually stored on the Task).
@<1523707653782507520:profile|MelancholyElk85> can you check that the following works:

pipe.add_step(..., task_overrides={'project': Task.get_project_id(project_name='examples')})
  
  
Posted 3 years ago

Is "project_name" different for different steps? I.e. PipelineController(..., target_project='my_new_project') is not enough?

  
  
Posted 3 years ago

maybe being able to change 100% of things with task_overrides would be the most convenient way

  
  
Posted 3 years ago

doesn't target_project force the same project on all pipeline steps?

  
  
Posted 3 years ago

We digressed a bit from the original thread topic though 😆 About clone_base_task=False .

I ended up using task_overrides for every change, and this way I only need 2 tasks (a base task and a step task, thus I use clone_base_task=True and it works as expected - yay!)

So, the problem I described in the beginning can be reproduced only this way:

  • have a base task
  • export_task, modify, import_task: this gives a second task
  • pass the second task to add_step with clone_base_task=False. The second task is then cloned and we get a third task
  
  
Posted 3 years ago

@<1523707653782507520:profile|MelancholyElk85> , Hi!

Do you have anything in the console regarding this? Also, if you move to the 'INFO' tab in the experiment, there is a 'parent view'. Can you please validate that it links the parent to it?

  
  
Posted 3 years ago

The lower task is created during import_task , the upper one - during actual execution of the step, several minutes after

  
  
Posted 3 years ago

Suppose I have the following scenario (real-world project, real ML pipeline scenario)

  • I have separate projects for different steps (ETL, train, test, tensorrt conversion...). Every step has its own git repository, docker image, branch, etc.
  • For quite a long time all the steps were not functioning as parts of an automated pipeline. For example, collaborative experimentation (training and validation steps). We were just focusing on reproducibility/versioning etc
  • After some time, we decided to chain up everything to a single DAG to make a CI/CD and automate everything. For each step there is still a base task which I want to clone and modify every time the pipeline is launched
  • Each individual step still resides in its own project, and I want all the pipeline-initiated tasks to still reside in their respective projects
  
  
Posted 3 years ago

@<1523701205467926528:profile|AgitatedDove14> this last advice works, thank you!

  
  
Posted 3 years ago

@<1523707653782507520:profile|MelancholyElk85> I just ran a single-step pipeline and it seemed to use the "base_task_id" without cloning it...
Any insight on how to reproduce?

  
  
Posted 3 years ago

Before the code I shared, there were some lines like this

step_base_task = Task.get_task(project_name=cfg[name]['base_project'],
                               task_name=cfg[name]['base_name'])
export_data = step_base_task.export_task()

# ... modify export_data in-place ...

task = Task.import_task(export_data)
pipe.add_step(base_task_id=task.id, clone_base_task=False, ...)
  
  
Posted 3 years ago

@<1523701070390366208:profile|CostlyOstrich36> In the screenshot, the upper task has the lower task as its parent

  
  
Posted 3 years ago

so yeah, in short, target_project cannot do that

  
  
Posted 3 years ago

"I ended up using task_overrides for every change, and this way I only need 2 tasks (a base task and a step task, thus I use clone_base_task=True and it works as expected - yay!)"

Very cool!
BTW: you can also provide a function to create the entire Task, see base_task_factory argument in add_step

"I think it's still an issue, not critical though, because we have another way to do it and it works"

I could not reproduce it. I think the issue was that when you did the "update_task()" you also updated the status?!
Could you verify? (Just to clarify: import_task will import the "completed" status as well, hence the pipeline will clone it.)
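
A minimal sketch of that check, under stated assumptions: ensure_draft is a hypothetical helper (not part of clearml), it assumes the draft state is reported as the string "created" by get_status(), and FakeTask only stands in for a real Task so the snippet runs without a server. The idea is to reset an imported task that is not in draft before handing it to add_step with clone_base_task=False.

```python
def ensure_draft(task, draft_status="created"):
    """Reset the task if it is not in draft state.

    Returns True if a reset was performed, False if the task was already draft.
    Assumption: draft tasks report the status string "created".
    """
    if str(task.get_status()) != draft_status:
        task.reset()
        return True
    return False


class FakeTask:
    """Hypothetical stand-in for a clearml Task (only get_status/reset)."""

    def __init__(self, status):
        self._status = status

    def get_status(self):
        return self._status

    def reset(self):
        # A real reset clears the run state; here we only flip the status.
        self._status = "created"


imported = FakeTask("completed")  # e.g. a task imported with a completed status
was_reset = ensure_draft(imported)
print(was_reset, imported.get_status())
```

With a real task, the same helper would be called on the object returned by Task.import_task before pipe.add_step(...).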

  
  
Posted 3 years ago

@<1523707653782507520:profile|MelancholyElk85> what are you trying to change? Maybe there is a better way?
BTW: if you do step_base_task.export_task() you can use the parts that you need in the dict and pass them to:
task_overrides argument in add_step (you might need to flatten the nested arguments with '.' , and thinking about it, maybe we should do that automatically?!)
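
A rough sketch of the flattening step mentioned above, in case it helps. flatten_overrides is a hypothetical helper (not a clearml function), and export_fragment is a made-up slice of an exported task dict; the only thing taken from this thread is the dotted key style such as 'script.branch'.

```python
def flatten_overrides(nested, parent_key="", sep="."):
    """Flatten a nested dict into a single-level dict with dotted keys."""
    flat = {}
    for key, value in nested.items():
        dotted = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict) and value:
            # Recurse into non-empty sub-dicts and merge their dotted keys.
            flat.update(flatten_overrides(value, dotted, sep))
        else:
            # Leaves (and empty dicts) are kept as plain values.
            flat[dotted] = value
    return flat


# Hypothetical fragment: only the parts of the export you want to override.
export_fragment = {"script": {"branch": "main", "version_num": ""}}
overrides = flatten_overrides(export_fragment)
print(overrides)
```

The resulting dict could then be passed as task_overrides=overrides in add_step. Whether every nested section of the export should be flattened all the way down is a judgment call; this sketch flattens everything.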

  
  
Posted 3 years ago

@<1523701205467926528:profile|AgitatedDove14> yeah, I'll try calling task.reset() before add_step
No, IMO it's better to leave task_overrides arguments with "." - the same structure as in the dictionary we get from export_data - this is more intuitive

  
  
Posted 3 years ago

In principle, I can modify almost everything with task_overrides, omitting the export part, and it's fine. But it seems that by exporting I can change more things, for example project_name

  
  
Posted 3 years ago

@<1523707653782507520:profile|MelancholyElk85>
What's the clearml version you are using ?
Just making sure... base_task_id has to point to a Task that is in "draft" mode, for the pipeline to use it

  
  
Posted 3 years ago

but maybe add a lot more examples of using it to documentation

  
  
Posted 3 years ago

Correct

  
  
Posted 3 years ago

I think it's still an issue, not critical though, because we have another way to do it and it works

  
  
Posted 3 years ago

@<1523701205467926528:profile|AgitatedDove14> clearml 1.1.1
Yeah, of course it is in draft mode (Task.import_task creates a task in draft mode; it is the lower task in the screenshot)

  
  
Posted 3 years ago

Correct

  
  
Posted 3 years ago

The base task is in draft status, so when I call import_task it imports the draft status as well, am I right?

  
  
Posted 3 years ago

so I had a base_task for every step. Then I wanted to modify a shitton of things, and I found no better way than to export, modify, and import to another task. Then this second task serves as step without cloning (that was my intention)

  
  
Posted 3 years ago