Does ClearML support running experiments on "serverless" environments (e.g. Vertex AI, SageMaker, etc.), such that GPU resources are allocated on demand?
Alternatively, is there a story for auto-scaling GPU machines based on experiments waiting in the queue?
Re: "serverless", I mean running a training task on cloud services such that machines with GPUs for those tasks are provisioned on demand.
That means we don't have to keep a pool of GPU machines standing by, and don't have to deal with autoscaling ourselves: the cloud provider, upon receiving such a training task, provisions the machines and runs the training.
This is a common use case for example in VertexAI.
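For context, the ClearML side of this pattern is straightforward: a script can register itself as a task and re-enqueue itself for remote execution via `Task.execute_remotely`. A minimal sketch (the project, task, and queue names are hypothetical; it assumes `clearml` is installed and configured):

```python
def enqueue_for_remote_gpu(queue_name="gpu_on_demand"):
    """Sketch: register this script as a ClearML task and hand it off
    to a remote worker instead of running it locally.
    Assumes `clearml` is installed and configured; the queue name
    is a placeholder."""
    from clearml import Task  # imported lazily so the sketch is self-contained

    task = Task.init(project_name="serverless-demo", task_name="train")
    # Stop local execution and enqueue the task; whatever worker pulls
    # from `queue_name` (e.g. a freshly provisioned GPU machine) runs it.
    task.execute_remotely(queue_name=queue_name, exit_process=True)
```

The open part of the question is who pulls from that queue: whether a managed service like Vertex AI or SageMaker can play the worker role, so no standing GPU pool is needed.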
Regarding autoscaling: yes, for example autoscaling EC2 instances based on pending experiments in the ClearML experiments queue.
Even better: if you can autoscale (create and stop) EKS nodes.
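To make the autoscaling ask concrete, the core logic reduces to a sizing rule plus a cloud call. A minimal sketch, assuming a simple one-instance-per-pending-task policy with a cap (this is an illustration, not ClearML's actual autoscaler algorithm, and the instance parameters are placeholders):

```python
def instances_to_launch(pending_tasks: int, running_instances: int,
                        max_instances: int) -> int:
    """Decide how many new GPU instances to start for pending queue tasks.
    Policy (an assumption for this sketch): one instance per pending task,
    capped at max_instances total."""
    shortfall = max(0, pending_tasks - running_instances)
    headroom = max(0, max_instances - running_instances)
    return min(shortfall, headroom)


def launch_gpu_instances(count: int, ami: str,
                         instance_type: str = "g4dn.xlarge") -> None:
    """Start `count` EC2 GPU instances. Requires boto3 and AWS credentials,
    so the import is kept local; AMI and instance type are placeholders."""
    if count <= 0:
        return
    import boto3

    ec2 = boto3.client("ec2")
    ec2.run_instances(ImageId=ami, InstanceType=instance_type,
                      MinCount=count, MaxCount=count)
```

A controller would poll the ClearML queue, feed the pending count into `instances_to_launch`, and terminate idle instances on the way down.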