Thank you! I think it does. It's just now dawning on me that, because a pipeline is composed of multiple tasks, different tasks in the pipeline could run on different machines. Or more specifically, they could run on different queues, and as you said in your other response, we could have one queue for smaller CPU-based instances and another queue for larger GPU-based instances.
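To make sure I'm picturing it right, something like this is what I have in mind for routing steps to queues (the project, task, and queue names below are just placeholders, not anything we've set up):

```python
from clearml import PipelineController

# Hypothetical sketch: route different pipeline steps to different queues.
pipe = PipelineController(name="etl_pipeline", project="examples", version="0.1")

# Heavy training step goes to the GPU queue
pipe.add_step(
    name="train_model",
    base_task_project="examples",
    base_task_name="train_model",
    execution_queue="gpu_queue",
)

# I/O-bound step (e.g. writing results to the warehouse) goes to the CPU queue
pipe.add_step(
    name="write_results",
    base_task_project="examples",
    base_task_name="write_results",
    parents=["train_model"],
    execution_queue="cpu_queue",
)

# The pipeline controller itself can run on the services queue
pipe.start(queue="services")
```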
I like the idea of having a queue dedicated to CPU-based instances with multiple agents running on it simultaneously, maybe four. Those agents could handle the more I/O-intensive tasks, such as writing results to our data warehouse. I think that would be a good use case for having a single resource handle multiple tasks concurrently.
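Roughly what I'm picturing on that instance (the queue name and worker ids are placeholders, and I'm assuming each daemon needs its own worker id when they share a machine):

```bash
# Hypothetical sketch: four concurrent agents on one CPU instance,
# all pulling from the same queue ("cpu_queue" is a made-up name).
for i in 1 2 3 4; do
    CLEARML_WORKER_ID="cpu-worker-$i" clearml-agent daemon --queue cpu_queue --detached
done
```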
Thanks for discussing this so thoroughly with me!
I will be starting with the AWS autoscaler script in the ClearML examples on GitHub. Do you happen to know whether, using that script, there is a straightforward way to provide a user-data.sh script? I imagine that's how we would do things like fetching secrets from AWS Secrets Manager and starting the concurrent agents.
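In case it helps, here's roughly the user-data.sh I have in mind (the secret id, region, JSON field names, and queue name are all placeholders, and I haven't tested any of this against the autoscaler yet):

```bash
#!/bin/bash
# Hypothetical sketch only; the secret id and JSON field names are made up.
# Assumes the AMI already has the AWS CLI, jq, and clearml-agent installed.

# Fetch the ClearML credentials from Secrets Manager
SECRET_JSON=$(aws secretsmanager get-secret-value \
    --secret-id clearml/agent-credentials \
    --region us-east-1 \
    --query SecretString \
    --output text)
export CLEARML_API_ACCESS_KEY=$(echo "$SECRET_JSON" | jq -r '.access_key')
export CLEARML_API_SECRET_KEY=$(echo "$SECRET_JSON" | jq -r '.secret_key')

# Start the concurrent agents (same loop as in the earlier sketch)
for i in 1 2 3 4; do
    CLEARML_WORKER_ID="cpu-worker-$i" clearml-agent daemon --queue cpu_queue --detached
done
```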