Does ClearML Support Running The Experiments On Any "Serverless" Environments (I.E. VertexAI, SageMaker, Etc.), Such That GPU Resources Are Allocated On Demand? Alternatively, Is There A Story For Auto-Scaling GPU Machines Based On Experiments Waiting In

Answered

Does ClearML support running experiments on any "serverless" environments (e.g. VertexAI, SageMaker, etc.), such that GPU resources are allocated on demand?
Alternatively, is there a story for auto-scaling GPU machines based on experiments waiting in the queue and some policy?

  				
Posted 3 years ago by IcyJellyfish61

Answers 3

By "serverless" I mean running a training task on a cloud service that provisions GPU machines for that task on demand.
That way we don't have to keep a pool of GPU machines standing by, and we don't have to deal with autoscaling ourselves. When the cloud provider receives such a training task, it provisions the machines and runs the training.
This is a common use case in VertexAI, for example.

Regarding autoscaling: yes, I mean autoscaling EC2 instances, for example, based on pending experiments in the ClearML experiments queue.
Even better would be the ability to autoscale (create and stop) EKS instances.

  				
Posted 3 years ago by IcyJellyfish61

Hi IcyJellyfish61, while spinning EKS up and down is not supported (albeit very cool 😄), we have an autoscaler in the applications section that does exactly what you need: it spins EC2 instances up and down according to demand 🙂
If you're using http://app.clear.ml as your server, you can find it at https://app.clear.ml/applications.
Unfortunately, it is not available for the open-source server, only on paid tiers.
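
To make "spin up and down according to demand" concrete, here is a minimal sketch of the kind of decision such a policy makes. It is not the ClearML autoscaler's actual logic: `ScalingPolicy`, `instances_to_start`, `instances_to_stop`, and all the default values are hypothetical names and numbers chosen for illustration, and the real application is configured through its own UI.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ScalingPolicy:
    """Hypothetical policy knobs; the values are illustrative assumptions."""
    max_instances: int = 4          # budget cap on concurrently running machines
    tasks_per_instance: int = 1     # experiments one GPU machine runs at a time
    max_idle_minutes: float = 15.0  # idle time after which a machine is stopped


def instances_to_start(pending_tasks: int, running: int, idle: int,
                       policy: ScalingPolicy) -> int:
    """Number of new machines to start for the tasks waiting in the queue.

    `running` counts all live machines, including the `idle` ones.
    """
    # Idle machines pick up pending tasks first, so only the rest needs new capacity.
    uncovered = max(0, pending_tasks - idle * policy.tasks_per_instance)
    # Ceiling division: a partially filled machine still has to be started.
    needed = -(-uncovered // policy.tasks_per_instance)
    headroom = max(0, policy.max_instances - running)
    return min(needed, headroom)


def instances_to_stop(idle_minutes: Sequence[float], pending_tasks: int,
                      policy: ScalingPolicy) -> int:
    """Number of idle machines to stop; none while work is still queued."""
    if pending_tasks > 0:
        return 0
    return sum(1 for minutes in idle_minutes if minutes >= policy.max_idle_minutes)


if __name__ == "__main__":
    policy = ScalingPolicy()
    print(instances_to_start(pending_tasks=5, running=1, idle=0, policy=policy))
    print(instances_to_stop([20.0, 5.0, 15.0], pending_tasks=0, policy=policy))
```

A loop that polls the queue, calls these two functions, and then asks the cloud provider to start or stop machines would complete the picture. That part depends on the provider's SDK and is left out here.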

  				
Posted 3 years ago by AnxiousSeal95

Does ClearML support running the experiments on any "serverless" environments

Can you please elaborate on what you mean by "serverless"?

such that GPU resources are allocated on demand?

You can define various queues for resources according to whatever structure you want. Does that make sense?
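
As one possible reading of "queues for resources", the sketch below maps a task's GPU requirement to a queue name. The queue names and the `pick_queue` helper are hypothetical; any naming scheme that matches how your machines are grouped would work the same way.

```python
from typing import Dict

# Hypothetical queue names, keyed by how many GPUs the machines serving them have.
QUEUES_BY_GPU_COUNT: Dict[int, str] = {
    0: "cpu",
    1: "single_gpu",
    4: "multi_gpu",
}


def pick_queue(gpus_needed: int,
               queues: Dict[int, str] = QUEUES_BY_GPU_COUNT) -> str:
    """Return the queue with the fewest GPUs that still satisfies the request."""
    if gpus_needed < 0:
        raise ValueError("gpus_needed must be non-negative")
    for gpu_count in sorted(queues):
        if gpu_count >= gpus_needed:
            return queues[gpu_count]
    raise ValueError(f"no queue offers {gpus_needed} GPUs")


if __name__ == "__main__":
    print(pick_queue(2))
```

The returned name could then be passed as the `queue_name` argument when sending a task for remote execution, for example via `Task.execute_remotely` in the `clearml` package, assuming a queue with that name exists on your server and an agent is listening on it.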

Alternatively, is there a story for auto-scaling GPU machines based on experiments waiting in the queue and some policy?

Do you mean an autoscaler for AWS for example?

  				
Posted 3 years ago by CostlyOstrich36
