I Have A Question About Queue Priorities With clearml-agent

Answered

I have a question about queue priorities with clearml-agent. I have two queues, A and B. Some of my agents support queues A and B, with higher priority for B, and some only support A. Now what happens is that if I submit tasks to queue A, agents that support both A and B will also execute these tasks. The behaviour I would expect is that all the agents that only support A are used first, and only after that are the agents that support A and B used. Is this a known limitation or am I doing something wrong?

  				
Posted 3 years ago by ReassuredTiger98


Answers 15

Sure thing 🙂
BTW ReassuredTiger98, this is definitely an interesting use case, and I think you can actually write some code to solve it if you like.
Basically, let's follow up on your setup:
- Machine X: agent listening to queues A, B_machine_a (notice we have two agents here)
- Machine Y: agent listening to queue B_machine_b

Now we (the users) will push our jobs into queues A and B.
Then we have a service that does the following:
1. See if we have a job in queue B.
2. Check if machine Y is working; if not, pull the job from B and push it into B_machine_b.
3. Else: check if machine X is working; if not, pull the job from B and push it into B_machine_a.

Now, the easy solution is that you are that service, and you manually select the queue based on what you see on the "Workers" page in the UI.
Notice that from the UI you can always move Tasks from one queue to another.
WDYT?
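The service steps above can be sketched as a small routing function. This is a minimal in-memory simulation under stated assumptions, not ClearML API code: queues are plain `deque`s, a machine's `busy` flag stands in for what the "Workers" page shows, and `Machine` and `dispatch_queue_b` are hypothetical names. A real service would read worker status from the ClearML server and move Tasks there instead.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Machine:
    """Hypothetical stand-in for one agent machine."""
    name: str
    private_queue: str  # e.g. "B_machine_b" for machine Y
    busy: bool = False  # a real service would take this from the worker status


def dispatch_queue_b(queue_b, preferred, fallback, private_queues):
    """Move tasks from the shared queue B into per-machine queues.

    The preferred machine (Y, B-only) gets a task first; the fallback
    machine (X) only gets one when Y is busy. Tasks stay in queue B when
    both are busy. Returns the (task, machine name) pairs that were moved.
    """
    moved = []
    while queue_b:
        for machine in (preferred, fallback):
            if not machine.busy:
                task = queue_b.popleft()
                private_queues[machine.private_queue].append(task)
                # Assumption: the agent picks the task up right away.
                machine.busy = True
                moved.append((task, machine.name))
                break
        else:
            # Neither machine is free: leave the rest in queue B.
            break
    return moved


machine_y = Machine("Y", "B_machine_b")
machine_x = Machine("X", "B_machine_a")
queue_b = deque(["task1", "task2", "task3"])
private = {"B_machine_a": deque(), "B_machine_b": deque()}
print(dispatch_queue_b(queue_b, machine_y, machine_x, private))  # [('task1', 'Y'), ('task2', 'X')]
print(list(queue_b))  # ['task3']
```

The preference for machine Y is simply the order of the `(preferred, fallback)` tuple, which mirrors the "check Y, else check X" order described in the post.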

  				
Posted 3 years ago by AgitatedDove14

Hi ReassuredTiger98
An agent's queue priority translates to the order in which the agent pulls jobs from its queues.
Now let's assume we have two agents, with priorities A, B for one and B, A for the other. If we only push a Task to queue A, and both agents are idle (implying queue B is empty), there is no guarantee which one will pull the job.
Does that make sense?
What is the use case you are trying to solve/optimize for?
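A simplified model of this explanation: when idle, each agent walks its queues in priority order and takes the first waiting task. The `Agent` class and `poll` function are hypothetical illustrations, not how clearml-agent is implemented internally. With queue B empty, both agents end up looking at queue A, so which one gets the Task depends only on which happens to poll first.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical agent: queue names ordered highest priority first."""
    name: str
    queues: list


def poll(agent, queues):
    """Return the next task for an idle agent, or None.

    The agent checks its queues in priority order and takes the head
    of the first non-empty one.
    """
    for queue_name in agent.queues:
        pending = queues[queue_name]
        if pending:
            return pending.popleft()
    return None


queues = {"A": deque(["task_a"]), "B": deque()}
agent_1 = Agent("agent_1", ["A", "B"])
agent_2 = Agent("agent_2", ["B", "A"])

# Queue B is empty, so agent_2's "B before A" order makes no difference:
# whichever agent polls first gets task_a. Here agent_2 happens to be first.
print(poll(agent_2, queues))  # task_a
print(poll(agent_1, queues))  # None
```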

  				
Posted 3 years ago by AgitatedDove14

I will read up on the services documentation then. Thank you very much for the help 🙂

  				
Posted 3 years ago by ReassuredTiger98

No. Here is a better example. I have two types of workstations: type X can execute tasks of type A and B, while type Y can only execute tasks of type B. This could be the case if, for example, type X workstations have more VRAM, newer drivers, etc.
I have two queues, queue A and queue B. I submit tasks of type A to queue A and tasks of type B to queue B.

Here is what can happen:
1. Enqueue the first task, of type B. A type X workstation runs this task.
2. Enqueue the second task, of type A. The type Y workstation cannot execute it (and is not listening to queue A), so the task waits for the first task to finish.
3. That type X workstation then runs the second task.

Here is what should happen:
1. Enqueue the first task, of type B. The type Y workstation runs this task.
2. Enqueue the second task, of type A. A type X workstation runs the second task.
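The two sequences above can be replayed in a small simulation. The capability table mirrors the example (type X runs A and B, type Y runs only B); `simulate`, `first_idle` and `most_specialized` are hypothetical names, and "pick the idle workstation that can run the fewest task types" is just one possible heuristic that yields the desired outcome, not a ClearML feature.

```python
# Which task types each workstation can run (from the example above).
CAPABILITIES = {"X": {"A", "B"}, "Y": {"B"}}


def first_idle(candidates):
    """Models the unlucky case: X happens to grab the task first."""
    return candidates[0]


def most_specialized(candidates):
    """Hypothetical heuristic: prefer the idle machine with the fewest capabilities."""
    return min(candidates, key=lambda name: len(CAPABILITIES[name]))


def simulate(task_types, choose):
    """Assign tasks in arrival order; no task finishes during the run.

    Returns a list of (task type, machine name or "waiting").
    """
    idle = list(CAPABILITIES)  # ["X", "Y"]
    result = []
    for task_type in task_types:
        candidates = [m for m in idle if task_type in CAPABILITIES[m]]
        if not candidates:
            result.append((task_type, "waiting"))
            continue
        machine = choose(candidates)
        idle.remove(machine)
        result.append((task_type, machine))
    return result


print(simulate(["B", "A"], first_idle))        # [('B', 'X'), ('A', 'waiting')]
print(simulate(["B", "A"], most_specialized))  # [('B', 'Y'), ('A', 'X')]
```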

  				
Posted 3 years ago by ReassuredTiger98

I see. Thank you very much. For my current problem, assigning tasks according to queue priority would more or less solve it. For experimentation I sometimes enqueue a task and later enqueue another one of a different kind, and what happens is that, even though this could be trivially solved, I have to wait for the first one to finish. I guess this is only a problem for people with small "clusters" where SLURM does not make sense, but no scheduling at all is also suboptimal.
However, I see your point about it being out of scope! Thank you for explaining. 🙂

  				
Posted 3 years ago by ReassuredTiger98

Makes sense, but it is not optimal if one of the agents is only able to handle tasks of a single queue (e.g. if the second agent can only work on tasks of type B).

  				
Posted 3 years ago by ReassuredTiger98

To summarize: the scheduler should first assign a task to the agent that gives the task's queue the highest priority.
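The rule in this summary can be written as a selection function: among idle agents listening to a task's queue, pick the one that ranks that queue highest (lowest index in its queue list). The agent names and queue lists below are hypothetical and mirror the setup from the original question (one agent listens to B then A, another only to A); breaking ties by agent name is an arbitrary choice to keep the sketch deterministic.

```python
def pick_agent(queue_name, idle_agents):
    """Choose the idle agent that ranks `queue_name` highest.

    idle_agents maps agent name -> list of queue names, highest priority first.
    Returns None when no idle agent listens to the queue.
    """
    listening = [
        (queues.index(queue_name), name)
        for name, queues in idle_agents.items()
        if queue_name in queues
    ]
    if not listening:
        return None
    # Lowest rank wins; the agent name breaks ties deterministically.
    return min(listening)[1]


# Hypothetical agents from the original question.
idle_agents = {"ab_agent": ["B", "A"], "a_only_agent": ["A"]}
print(pick_agent("A", idle_agents))  # a_only_agent (rank 0 beats rank 1)
print(pick_agent("B", idle_agents))  # ab_agent
```

Note that this only considers agents that are idle at the moment of selection; it says nothing about agents that are about to become free.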

  				
Posted 3 years ago by ReassuredTiger98

With pleasure 🙂

  				
Posted 3 years ago by AgitatedDove14

> a task of queue B if the next task is of type A it will have to wait,

It seems you imply there are two types of Tasks and they need to be executed one after the other?

  				
Posted 3 years ago by AgitatedDove14

> but it is not optimal if one of the agents is only able to handle tasks of a single queue (e.g. if the second agent can only work on tasks of type B).

How so?

  				
Posted 3 years ago by AgitatedDove14

Ah, now I see. This sounds like a good solution.

  				
Posted 3 years ago by ReassuredTiger98

> To summarize: the scheduler should first assign a task to the agent that gives the task's queue the highest priority.

The issue here is that you assume both agents are idle, and that you need a global priority based on resource preference. I understand your scenario now, but it only holds if the enqueuing order is "optimal". For example, if machine Y is running a Task B that is about to complete (e.g. in a minute), machine X will still pick up the new Task B, and again we end up in the scenario where Task A is waiting and machine Y is idle.
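The counterexample can be made concrete with a little arithmetic. The numbers are hypothetical, not measurements: machine Y finishes its current Task B at minute 1, a new Task B arrives at minute 0 and takes 60 minutes, and a Task A arrives at minute 2. The hypothetical `compare` function contrasts a greedy rule (start the new Task B on whichever machine is idle now, i.e. X) with holding Task B until Y is free.

```python
def compare(y_free_at, b_duration, a_arrival, b_arrival=0):
    """Return waiting times (in minutes) of Tasks A and B under two policies.

    Assumptions: machine X runs A and B and is idle at b_arrival; machine Y
    runs only B and is busy until y_free_at. Task A can only ever run on X.
    """
    # Greedy: Y is busy, so the new Task B starts on X immediately.
    x_free_at = b_arrival + b_duration
    greedy = {"B_wait": 0, "A_wait": max(0, x_free_at - a_arrival)}
    # Hold: Task B waits for Y, which keeps X free for Task A.
    hold = {"B_wait": max(0, y_free_at - b_arrival), "A_wait": 0}
    return greedy, hold


greedy, hold = compare(y_free_at=1, b_duration=60, a_arrival=2)
print(greedy)  # {'B_wait': 0, 'A_wait': 58}
print(hold)    # {'B_wait': 1, 'A_wait': 0}
```

The "hold" policy only wins because it knows in advance when Y becomes free and that a Task A is coming; a scheduler reacting to the current state of idle agents does not have that information.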
The solution you are looking for is global dynamic resource scheduling, including moving jobs between resources. This is a very complicated task 🙂 and actually out of scope for ClearML. That said, you can check SLURM, which is the best HPC scheduling solution I'm aware of, and even there it will be hard to create a policy for such a scenario. The good news is that ClearML integrates with SLURM, so you could have SLURM do the scheduling and ClearML serve as the "external interface". I have to warn you in advance, though: managing a SLURM cluster is challenging.

  				
Posted 3 years ago by AgitatedDove14

So the service is something that I can write, and that intercepts the addition of a task to a queue?

  				
Posted 3 years ago by ReassuredTiger98

Then, if the first agent is assigned a task from queue B and the next task is of type A, that task will have to wait, even though in theory there would be capacity for it if the first task had been executed on the second agent initially.

  				
Posted 
	3 years ago

					More
				  		
  Report
		
					ReassuredTiger98
				
					0
					 × 1

Yes, albeit not actually "intercepting": users will still be able to put Tasks directly into queues B_machine_a/B_machine_b, but any time a user pushes Tasks into queue B, this service will pull them and push them into the individual machine's queue.
What do you think?
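Because such a service does not hook into enqueueing, it would typically poll: every so often it inspects queue B and moves whatever it can, while users remain free to push Tasks directly into B_machine_a/B_machine_b. Below is a minimal loop skeleton; `run_dispatcher` and the `dispatch_once` callable are hypothetical, and in a real service `dispatch_once` would query the ClearML server and perform the "check Y, else check X" routing described earlier in the thread.

```python
import time
from typing import Callable, Optional


def run_dispatcher(
    dispatch_once: Callable[[], int],
    poll_seconds: float = 30.0,
    max_rounds: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call dispatch_once() every poll_seconds and return the total moved.

    dispatch_once (hypothetical) should move tasks out of queue B and return
    how many it moved. max_rounds=None runs forever, as a service would;
    sleep is injectable so the loop can be exercised without waiting.
    """
    total_moved = 0
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        total_moved += dispatch_once()
        rounds += 1
        sleep(poll_seconds)
    return total_moved


# Usage with a fake dispatcher: tasks moved on three successive polls.
pending = [2, 0, 1]
moved = run_dispatcher(lambda: pending.pop(0), poll_seconds=0, max_rounds=3, sleep=lambda s: None)
print(moved)  # 3
```

Polling keeps the service decoupled from the server: if it stops, Tasks simply accumulate in queue B until it restarts.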

  				
Posted 3 years ago by AgitatedDove14
