ClearML FAQ

Answered

I .

I call add_step with clone_base_task=False. However, it still clones the task, wtf

  				
Posted by MelancholyElk85, 4 years ago


Answers 27

In principle, I can modify almost everything with task_overrides, skipping the export part, and it's fine. But it seems that by exporting I can change more things, for example project_name

  				
Posted by MelancholyElk85, 4 years ago

Before the code I shared, there were some lines like this

from clearml import Task

step_base_task = Task.get_task(project_name=cfg[name]['base_project'],
                               task_name=cfg[name]['base_name'])
export_data = step_base_task.export_task()

# ... modify export_data in-place ...

task = Task.import_task(export_data)
pipe.add_step(base_task_id=task.id, clone_base_task=False, ...)

  				
Posted by MelancholyElk85, 4 years ago
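The "modify export_data in-place" step above is elided in the thread. As a minimal sketch, assuming export_data is a plain nested dict with a "script" section holding "branch" and "version_num" (the same keys the thread later uses as 'script.branch' and 'script.version_num'), a small hypothetical helper can apply dotted paths to that nested structure. The sample dict is a tiny made-up stand-in, not real export_task() output.

```python
from typing import Any, Dict


def set_dotted(data: Dict[str, Any], path: str, value: Any) -> None:
    """Hypothetical helper: set data['script']['branch'] for path 'script.branch', in place."""
    keys = path.split(".")
    node = data
    for key in keys[:-1]:
        # Create missing (or non-dict) intermediate sections instead of raising KeyError
        child = node.get(key)
        if not isinstance(child, dict):
            child = {}
            node[key] = child
        node = child
    node[keys[-1]] = value


# Made-up stand-in for the (much larger) dict returned by export_task()
export_data = {"name": "train", "script": {"branch": "dev", "version_num": "abc123"}}
for path, value in {"script.branch": "main", "script.version_num": ""}.items():
    set_dotted(export_data, path, value)
print(export_data["script"])  # {'branch': 'main', 'version_num': ''}
```

Using dotted paths here keeps the in-place edits in the same shape as task_overrides keys, which makes it easier to later drop the export/import round-trip altogether.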

@<1523707653782507520:profile|MelancholyElk85> I just ran a single-step pipeline and it seemed to use the "base_task_id" without cloning it...
Any insight on how to reproduce?

  				
Posted by AgitatedDove14, 4 years ago

Correct

  				
Posted by AgitatedDove14, 4 years ago

@<1523701070390366208:profile|CostlyOstrich36> In the screenshot, the upper task has the lower task as its parent

  				
Posted by MelancholyElk85, 4 years ago

Could you try:

task.reset()

  				
Posted by AgitatedDove14, 4 years ago

> I ended up using task_overrides for every change, and this way I only need 2 tasks (a base task and a step task, thus I use clone_base_task=True and it works as expected - yay!)

Very cool!
BTW: you can also provide a function to create the entire Task, see base_task_factory argument in add_step

> I think it's still an issue, not critical though, because we have another way to do it and it works

I could not reproduce it. I think the issue was that when you did the "update_task()" you also updated the status?!
Could you verify? (Just to clarify: import_task will import the "completed" status as well, hence the pipeline will clone it.)

  				
Posted by AgitatedDove14, 4 years ago
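The explanation above (an imported "completed" status makes the pipeline clone the step task even with clone_base_task=False) can be captured as a toy decision function. This is a sketch of the rule as described in this thread, not ClearML's actual implementation; the function name is hypothetical, and "created" is assumed to be the API-level name of the status the UI shows as "Draft".

```python
# Toy model of the rule described in this thread; NOT ClearML's source code.
DRAFT_STATUS = "created"  # assumption: API-level name of the UI's "Draft" status


def pipeline_will_clone(clone_base_task: bool, base_task_status: str) -> bool:
    """Return True if the step would run on a clone rather than on the base task."""
    if clone_base_task:
        # Cloning was requested explicitly
        return True
    # Without cloning, the base task is only reused while it is still a draft;
    # anything else (e.g. an imported "completed" status) triggers a clone
    return base_task_status != DRAFT_STATUS


for status in ("created", "completed"):
    print(status, pipeline_will_clone(False, status))
# created False
# completed True
```

In practice this suggests checking the imported task's status (e.g. via task.get_status()) before calling add_step, or calling task.reset() as suggested in the thread, so that it is back in draft.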

We digressed a bit from the original thread topic though 😆 About clone_base_task=False.

I ended up using task_overrides for every change, and this way I only need 2 tasks (a base task and a step task, thus I use clone_base_task=True and it works as expected - yay!)

So, the problem I described in the beginning can be reproduced only this way:

1. Have a base task.
2. export_task, modify the exported dict, import_task: now there is a second task.
3. Pass the second task to add_step with clone_base_task=False. The second task is then cloned, and we get a third task.

  				
Posted by MelancholyElk85, 4 years ago

@<1523707653782507520:profile|MelancholyElk85> what are you trying to change? Maybe there is a better way.
BTW: if you do step_base_task.export_task(), you can take the parts you need from the dict and pass them to the
task_overrides argument of add_step (you might need to flatten the nested arguments with '.'; thinking about it, maybe we should do that automatically?!)

  				
Posted by AgitatedDove14, 4 years ago
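Flattening the nested arguments with '.', as suggested above, can be sketched as a small recursive helper. This assumes task_overrides accepts dotted keys mirroring the export_task() structure, as the thread's own 'script.branch' example does; flatten_overrides is a hypothetical name, not a ClearML function.

```python
from typing import Any, Dict


def flatten_overrides(data: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dotted keys: {'script': {'branch': 'main'}} -> {'script.branch': 'main'}."""
    flat: Dict[str, Any] = {}
    for key, value in data.items():
        dotted = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict) and value:
            # Recurse into non-empty sections; empty dicts are kept as leaf values
            flat.update(flatten_overrides(value, dotted))
        else:
            flat[dotted] = value
    return flat


# Only the parts of the exported dict you actually want to change
subset = {"script": {"branch": "main", "version_num": ""}}
print(flatten_overrides(subset))
# {'script.branch': 'main', 'script.version_num': ''}
```

The result would then be passed as task_overrides=... to add_step. Flattening only a hand-picked subset, rather than the whole export, avoids overriding fields such as status by accident.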

Is project_name different for different steps? I.e., is PipelineController(..., target_project='my_new_project') not enough?

  				
Posted by AgitatedDove14, 4 years ago

Correct

  				
Posted by AgitatedDove14, 4 years ago

task = Task.import_task(export_data)
pipe.add_step(
    name=name,
    base_task_id=task.id,
    parents=parents,
    task_overrides={'script.branch': 'main', 'script.version_num': '', },
    execution_queue=pipe_cfg['step_queue'],
    cache_executed_step=True,
    clone_base_task=False
)

  				
Posted by MelancholyElk85, 4 years ago

> 100% of things with task_overrides would be the most convenient way

I think the issue is that you have to pass the project ID, not the project name (the project's unique ID is the property actually stored on the Task).
@<1523707653782507520:profile|MelancholyElk85> can you check that the following works:

pipe.add_step(..., task_overrides={'project': Task.get_project_id(project_name='examples')})

  				
Posted by AgitatedDove14, 4 years ago

The base task is in draft status, so when I call import_task it imports the draft status as well, am I right?

  				
Posted by MelancholyElk85, 4 years ago

so yeah, in short, target_project cannot do that

  				
Posted by MelancholyElk85, 4 years ago

maybe being able to change 100% of things with task_overrides would be the most convenient way

  				
Posted by MelancholyElk85, 4 years ago

So I had a base_task for every step. Then I wanted to modify a shitton of things, and I found no better way than to export, modify, and import into another task. This second task was then meant to serve as the step without cloning (that was my intention)

  				
Posted by MelancholyElk85, 4 years ago

but maybe add a lot more examples of using it to documentation

  				
Posted by MelancholyElk85, 4 years ago

@<1523701205467926528:profile|AgitatedDove14> clearml 1.1.1
Yeah, of course it is in draft mode (Task.import_task creates a task in draft mode; it is the lower task in the screenshot)

  				
Posted by MelancholyElk85, 4 years ago

Suppose I have the following scenario (a real-world project, a real ML pipeline):

I have separate projects for different steps (ETL, train, test, TensorRT conversion...). Every step has its own git repository, docker image, branch, etc.
For quite a long time, the steps were not functioning as parts of an automated pipeline, e.g. during collaborative experimentation (training and validation steps). We were just focusing on reproducibility, versioning, etc.
After some time, we decided to chain everything into a single DAG to set up CI/CD and automate everything. For each step there is still a base task, which I want to clone and modify every time the pipeline is launched.
Each individual step still resides in its own project, and I want all the pipeline-initiated tasks to stay in their respective projects

  				
Posted by MelancholyElk85, 4 years ago
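For the scenario above (every step kept in its own project and branch, with clone_base_task=True), the per-step task_overrides could be built from a config, resolving each project name to its ID as suggested in the thread. This is a minimal sketch: build_step_overrides, the config layout, and the fake ID table are hypothetical, and the resolver is injected so the example runs without a ClearML server.

```python
from typing import Any, Callable, Dict


def build_step_overrides(
    step_cfg: Dict[str, str],
    resolve_project_id: Callable[[str], str],
) -> Dict[str, Any]:
    """Build task_overrides that keep a step in its own project and branch."""
    return {
        # The Task stores the project's unique ID, not its name, so resolve it first
        "project": resolve_project_id(step_cfg["project"]),
        "script.branch": step_cfg["branch"],
        # Cleared, as in the add_step call earlier in this thread
        "script.version_num": "",
    }


# Hypothetical per-step config; each step lives in its own project
steps = {
    "etl": {"project": "ETL", "branch": "main"},
    "train": {"project": "train", "branch": "experiment"},
}

# Offline stand-in for Task.get_project_id so this runs without a server;
# with ClearML you would pass e.g.
#   lambda name: Task.get_project_id(project_name=name)
fake_ids = {"ETL": "id-etl", "train": "id-train"}
overrides = {name: build_step_overrides(cfg, fake_ids.__getitem__) for name, cfg in steps.items()}
print(overrides["train"])
# {'project': 'id-train', 'script.branch': 'experiment', 'script.version_num': ''}
```

Each overrides[name] would then go to add_step(..., task_overrides=overrides[name], clone_base_task=True), so the base tasks stay untouched and no export/import copy is needed.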

@<1523701205467926528:profile|AgitatedDove14> this last advice works, thank you!

  				
Posted by MelancholyElk85, 4 years ago

I think it's still an issue, not critical though, because we have another way to do it and it works

  				
Posted by MelancholyElk85, 4 years ago

The lower task is created during import_task, the upper one during the actual execution of the step, several minutes later

  				
Posted by MelancholyElk85, 4 years ago

@<1523701205467926528:profile|AgitatedDove14> yeah, I'll try calling task.reset() before add_step.
No, IMO it's better to keep the task_overrides arguments with "." (the same structure as in the dictionary we get from export_data); this is more intuitive

  				
Posted by MelancholyElk85, 4 years ago

@<1523707653782507520:profile|MelancholyElk85>
What clearml version are you using?
Just making sure... base_task_id has to point to a Task that is in "draft" mode for the pipeline to use it

  				
Posted by AgitatedDove14, 4 years ago

@<1523707653782507520:profile|MelancholyElk85>, Hi!

Do you see anything in the console regarding this? Also, if you go to the 'INFO' tab of the experiment, there is a 'parent view'. Can you please verify that it links to the parent?

  				
Posted by CostlyOstrich36, 4 years ago

doesn't target_project force the same project on all pipeline steps?

  				
Posted 
	4 years ago

					More
				  		
  Report
		
					MelancholyElk85
				
					0
					 × 1


1K Views


4 years ago

one year ago