Answered

I call add_step with clone_base_task=False. However, it still clones the task, wtf
[screenshot: two tasks, the lower one created by import_task and the upper one created when the step actually ran]

  
  
Posted 3 years ago

Answers 27


Correct

  
  
Posted 3 years ago

MelancholyElk85
What's the clearml version you are using?
Just making sure... base_task_id has to point to a Task that is in "draft" mode for the pipeline to use it
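
A quick way to verify that, using Task.get_status() on the fetched task (base_task_id here is a placeholder for whatever id you pass to add_step):

from clearml import Task

t = Task.get_task(task_id=base_task_id)
print(t.get_status())  # should print 'created' (i.e. draft), not 'completed'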

  
  
Posted 3 years ago

doesn't target_project force the same project on all pipeline steps?

  
  
Posted 3 years ago

In principle, I can modify almost everything with task_overrides, omitting the export part, and it's fine. But it seems that by exporting I can change more things, for example project_name

  
  
Posted 3 years ago

MelancholyElk85 what are you trying to change? Maybe there is a better way?
BTW: if you do step_base_task.export_task() you can use the parts that you need in the dict and pass them to the task_overrides argument in add_step (you might need to flatten the nested arguments with '.', and thinking about it, maybe we should do that automatically?!)
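
For example, a minimal sketch of that flow (flatten() is a hypothetical helper, not part of the clearml API; the project/task names are placeholders):

from clearml import Task

def flatten(d, prefix=''):
    # turn {'script': {'branch': 'main'}} into {'script.branch': 'main'}
    out = {}
    for k, v in d.items():
        key = prefix + k
        if isinstance(v, dict):
            out.update(flatten(v, key + '.'))
        else:
            out[key] = v
    return out

step_base_task = Task.get_task(project_name='examples', task_name='base step')
export_data = step_base_task.export_task()
flat = flatten(export_data)
# keep only the entries you actually want to override, e.g. the git branch
overrides = {'script.branch': flat.get('script.branch', 'main')}
# then: pipe.add_step(..., base_task_id=step_base_task.id, task_overrides=overrides)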

  
  
Posted 3 years ago

CostlyOstrich36 In the screenshot, the upper task has the lower task as its parent

  
  
Posted 3 years ago

Before the code I shared, there were some lines like this:

step_base_task = Task.get_task(project_name=cfg[name]['base_project'],
                               task_name=cfg[name]['base_name'])
export_data = step_base_task.export_task()
# ... modify export_data in-place ...
task = Task.import_task(export_data)
pipe.add_step(base_task_id=task.id, clone_base_task=False, ...)
  
  
Posted 3 years ago

AgitatedDove14 yeah, I'll try calling task.reset() before add_step
No, IMO it's better to leave the task_overrides arguments with "." - the same structure as in the dictionary we get from export_data - this is more intuitive

  
  
Posted 3 years ago

The base task is in draft status, so when I call import_task it imports the draft status as well, am I right?

  
  
Posted 3 years ago

100% of things with task_overrides would be the most convenient way

I think the issue is that you have to pass the project ID, not the project name (the project's unique ID is the property that is actually stored on the Task).
MelancholyElk85 can you check the following works:

pipe.add_step(..., task_overrides={'project': Task.get_project_id(project_name='examples')})
  
  
Posted 3 years ago

AgitatedDove14 clearml 1.1.1
Yeah, of course it is in draft mode (Task.import_task creates a task in draft mode; it is the lower task on the screenshot)

  
  
Posted 3 years ago

Correct

  
  
Posted 3 years ago

Could you try:

task.reset()
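
For context, a minimal sketch of where that call would go, reusing export_data, pipe, and name from the snippets elsewhere in this thread:

task = Task.import_task(export_data)
task.reset()  # force the imported task back to draft status
pipe.add_step(name=name, base_task_id=task.id, clone_base_task=False)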
  
  
Posted 3 years ago

I think it's still an issue, not critical though, because we have another way to do it and it works

  
  
Posted 3 years ago

I ended up using task_overrides for every change, and this way I only need 2 tasks (a base task and a step task, thus I use clone_base_task=True and it works as expected - yay!)

Very cool!
BTW: you can also provide a function to create the entire Task, see the base_task_factory argument in add_step

I think it's still an issue, not critical though, because we have another way to do it and it works

I could not reproduce it. I think the issue was that when you did the update_task() you also updated the status?!
Could you verify? (Just to clarify, import_task will import the "completed" status as well, hence the pipeline will clone it.)
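
For reference, a sketch of the base_task_factory route mentioned above. As far as I can tell the factory receives the pipeline node and must return a Task; the body below, which reuses the import flow from this thread, is an assumption on my part:

def make_step_task(node):
    # build the step's Task fresh on every pipeline launch
    return Task.import_task(export_data)

pipe.add_step(name=name, base_task_factory=make_step_task)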

  
  
Posted 3 years ago

The lower task is created during import_task; the upper one during the actual execution of the step, several minutes later

  
  
Posted 3 years ago

So I had a base_task for every step. Then I wanted to modify a shitton of things, and I found no better way than to export, modify, and import into another task. This second task then serves as the step without cloning (that was my intention)

  
  
Posted 3 years ago

MelancholyElk85 I just ran a single-step pipeline and it seemed to use the base_task_id without cloning it...
Any insight on how to reproduce?

  
  
Posted 3 years ago

We digressed a bit from the original thread topic though 😆 About clone_base_task=False.

I ended up using task_overrides for every change, and this way I only need 2 tasks (a base task and a step task, thus I use clone_base_task=True and it works as expected - yay!)

So, the problem I described in the beginning can be reproduced only this way:

  • have a base task
  • export_task - modify - import_task - get a second task
  • pass the second task to add_step with clone_base_task=False. Then the second task is cloned and we get a third task
  
  
Posted 3 years ago

so yeah, in short, target_project cannot do that

  
  
Posted 3 years ago

Is "project_name" diff for diff steps ? i.e. PipelineController(..., target_project='my_new_project') is not enough?

  
  
Posted 3 years ago

but maybe add a lot more examples of using it to the documentation

  
  
Posted 3 years ago

MelancholyElk85, Hi!

Do you have anything in the console regarding this? Also, if you move to the 'INFO' tab in the experiment, there is a 'parent' view; can you please validate that it links to the parent?

  
  
Posted 3 years ago

Suppose I have the following scenario (real-world project, real ML pipeline scenario):

  • I have separate projects for different steps (ETL, train, test, tensorrt conversion...). Every step has its own git repository, docker image, branch etc.
  • For quite a long time the steps were not functioning as parts of an automated pipeline; for example, collaborative experimentation (training and validation steps). We were just focusing on reproducibility/versioning etc.
  • After some time, we decided to chain everything up into a single DAG to make a CI/CD and automate everything. For each step there is still a base task which I want to clone and modify every time the pipeline is launched
  • Each individual step still resides in its own project, and I want all the pipeline-initiated tasks to still reside in their respective projects (see the sketch below)
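
A hypothetical sketch of that layout (step and project names are placeholders), combined with the per-step project override suggested in this thread:

from clearml import Task

for name in ('etl', 'train', 'test', 'tensorrt'):
    base = Task.get_task(project_name=f'{name}-project', task_name='base')
    pipe.add_step(
        name=name,
        base_task_id=base.id,
        # keep each step's clone in its own project, not the pipeline's
        task_overrides={'project': Task.get_project_id(project_name=f'{name}-project')},
    )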
  
  
Posted 3 years ago

task = Task.import_task(export_data)  # create the step task from the modified export
pipe.add_step(
    name=name,
    base_task_id=task.id,
    parents=parents,
    # flattened ('.'-separated) overrides: track 'main' at its latest commit
    task_overrides={'script.branch': 'main', 'script.version_num': ''},
    execution_queue=pipe_cfg['step_queue'],
    cache_executed_step=True,
    clone_base_task=False,
)
  
  
Posted 3 years ago

AgitatedDove14 this last advice works, thank you!

  
  
Posted 3 years ago

maybe being able to change 100% of things with task_overrides would be the most convenient way

  
  
Posted 3 years ago