Is there a specific reason you would want them executed on the same machine? Cache?
Hi RoughTiger69, you can specify a queue per step with the execution_queue
parameter of add_function_step:
https://clear.ml/docs/latest/docs/references/sdk/automation_controller_pipelinecontroller
Same goes for the docker image - the docker
parameter of add_function_step
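For example (rough sketch - the step functions, queue names and docker images are placeholders, not something from your setup):
```python
from clearml import PipelineController

def step_a():
    pass

def step_b():
    pass

pipe = PipelineController(name="example-pipeline", project="examples", version="1.0")

# each step can target its own execution queue and docker image
pipe.add_function_step(
    name="step_a",
    function=step_a,
    execution_queue="gpu-queue",  # agents listening on this queue run the step
    docker="nvidia/cuda:11.8.0-runtime-ubuntu22.04",
)
pipe.add_function_step(
    name="step_b",
    function=step_b,
    parents=["step_a"],
    execution_queue="cpu-queue",
    docker="python:3.10-slim",
)

pipe.start(queue="services")  # the controller itself runs on the services queue
```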
I don’t think so.
In most cases I would have multiple agents pulling from the same queue. I can’t have a queue per pipeline execution.
So if I submit A and B to the same queue, it still doesn’t guarantee that they will be pulled by the same agent…
C will be submitted to a different queue and I don’t care as much
Is there a way to define “task affinity” in this way?
Hi RoughTiger69,
when you say task affinity, you mean you want C to be executed next to A/B? Affinity as a concept doesn't really exist; it can be abstracted to a queue, where you have agents pulling from multiple queues. Then C can be pushed to one of the queues (in theory you might be able to programmatically control the queue of C), wdyt?
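Something like this might work, continuing the sketch above - route C's queue at the last moment with a pre_execute_callback (pick_queue_near_a_and_b is a hypothetical helper you would implement yourself, not a ClearML API):
```python
def route_c(pipeline, node, param_override):
    # decide the queue right before the step is sent for execution,
    # e.g. based on which worker ran A/B
    node.queue = pick_queue_near_a_and_b()  # hypothetical helper, your own logic
    return True  # returning False would skip the step entirely

pipe.add_function_step(
    name="step_c",
    function=step_c,  # placeholder step function, as above
    parents=["step_a", "step_b"],
    pre_execute_callback=route_c,
)
```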
CostlyOstrich36 yes, for the cache.
AgitatedDove14 I am not sure a queue will be sufficient. It would require a queue per execution of the pipeline.
Really what I need is for A and B to be separate tasks, but guarantee they will be assigned to the same machine so that the clearml dataset cache on that machine will be warm.
Is there a way to group A and B into a sub-pipeline, have the pipeline be queued and executed remotely, but the tasks A and B inside it be treated like local tasks? or something like that?
> Really what I need is for A and B to be separate tasks, but guarantee they will be assigned to the same machine so that the clearml dataset cache on that machine will be warm.
I think what you are looking for is multi-machine cache (which is fully supported). Basically mount an NFS/SMB folder from a NAS on each of those machines, configure the cache folder to point to it, and now you do not need to worry about affinity, no?
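e.g. in clearml.conf on every agent machine, point the cache at the shared mount (the path here is just an example):
```
# clearml.conf on each agent machine - /mnt/shared is an example NFS mount
sdk {
    storage {
        cache {
            # all agents share one dataset/artifact cache on the NAS
            default_base_dir: "/mnt/shared/clearml-cache"
        }
    }
}
```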
> Is there a way to group A and B into a sub-pipeline, have the pipeline be queued and executed remotely, but the tasks A and B inside it be treated like local tasks? or something like that?
actually yes, you could have a pipeline AB that always "executes locally" (meaning it does not schedule itself or its components), where A and B are the components. From the original pipeline's perspective the component is a Task AB (which is this new pipeline). The only caveat is that pipeline AB and tasks A, B need to be in the same git repo
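Something along these lines (rough sketch, all names are placeholders; step_a / step_b are the same kind of standalone step functions as above):
```python
from clearml import PipelineController

# "pipeline AB": the controller and its steps all run in-process, so A and B
# end up on whichever single machine runs this task (warm dataset cache).
pipe_ab = PipelineController(name="pipeline AB", project="examples", version="1.0")
pipe_ab.add_function_step(name="step_a", function=step_a)
pipe_ab.add_function_step(name="step_b", function=step_b, parents=["step_a"])

# execute the controller logic here and run the steps as local subprocesses
# instead of enqueuing them
pipe_ab.start_locally(run_pipeline_steps_locally=True)
```
and in the outer pipeline, AB is just one more task to clone and enqueue:
```python
outer = PipelineController(name="outer-pipeline", project="examples", version="1.0")
outer.add_step(
    name="ab",
    base_task_project="examples",
    base_task_name="pipeline AB",
    execution_queue="default",
)
```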
Oh, you want the same machine to execute the two tasks/steps?