Hello all! I am new to ClearML and recently installed clearml-server on my k8s cluster via the Helm charts. I am now trying to run the AWS Auto-Scaler just via the UI. However, there doesn't seem to be a "services" queue, and when I create one (just by name in the UI) and enqueue the task, it just stays pending because there are no agents. I thought the documentation said that the clearml-server install includes a minimal clearml-agent for the services queue. Is that not correct?

  				
Posted 2 years ago by ZippyAlligator65
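The "pending" state follows from how queues work: enqueuing a task only records it in the queue, and it runs only when an agent that listens on that queue pulls it. The toy model below sketches that behaviour. `ToyQueue`, `ToyAgent` and `run_once` are hypothetical illustrations, not the ClearML API.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyQueue:
    """Hypothetical in-memory stand-in for a ClearML queue (not the real API)."""
    name: str
    pending: deque = field(default_factory=deque)

    def enqueue(self, task_id: str) -> None:
        # Enqueuing only records the task; nothing runs it yet.
        self.pending.append(task_id)


@dataclass
class ToyAgent:
    """Hypothetical agent that only pulls from the queues it was started with."""
    queues: List[str]

    def poll(self, queue: ToyQueue) -> Optional[str]:
        if queue.name in self.queues and queue.pending:
            return queue.pending.popleft()
        return None


def run_once(queue: ToyQueue, agents: List[ToyAgent]) -> List[str]:
    """Let every agent poll the queue once; return the task IDs picked up."""
    started = []
    for agent in agents:
        task_id = agent.poll(queue)
        if task_id is not None:
            started.append(task_id)
    return started


services = ToyQueue("services")
services.enqueue("aws-autoscaler")

# No agent listens on "services", so the task stays pending.
print(run_once(services, []), list(services.pending))

# An agent bound to a different queue does not help either.
print(run_once(services, [ToyAgent(queues=["default"])]), list(services.pending))

# Once an agent serves "services", the task is dequeued.
print(run_once(services, [ToyAgent(queues=["services"])]), list(services.pending))
```

Creating the queue by name in the UI corresponds to the first case: the queue exists, but nothing consumes it.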


Answers 4

Hi, in k8s, autoscaling must be managed by the cloud pro user autoscaler. When the clearml-agent bound to the related queue spawns a new task pod with the configured resources, k8s will adapt. On AWS you can start here: https://docs.aws.amazon.com/eks/latest/userguide/autoscaling.html

  				
Posted 2 years ago by JuicyFox94

Could you elaborate? What is the “cloud pro user autoscaler”? Are you referring to the managed version of ClearML vs. self-hosted? As I understand it, the clearml-agent documentation mentions two “flavors” of k8s integration: a daemon clearml-agent that spins up sibling containers vs. direct mapping to k8s jobs. Does the clearml-agent Helm chart let you choose, or is it only set up for the k8s glue method? I’ve seen some people have trouble with the k8s glue method, so I was going to try running a single daemon agent inside k8s for the services queue, and then use autoscaler.py to spin up external EC2 instances to run the actual jobs. Is that a valid approach?

  				
Posted 2 years ago by ZippyAlligator65
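For the hybrid approach in the question above, the in-cluster piece would be one clearml-agent daemon serving the services queue in services mode, with the AWS autoscaler task then enqueued to that queue. A minimal sketch that assembles such a command line follows. The flags (`daemon`, `--queue`, `--services-mode`, `--create-queue`, `--docker`, `--cpu-only`) are clearml-agent CLI options as documented for services mode; the helper `services_agent_argv` and its defaults are hypothetical, and whether this runs cleanly inside a k8s pod (it needs access to a Docker daemon) is an assumption to verify.

```python
import shlex
from typing import List, Optional


def services_agent_argv(queue: str = "services",
                        docker_image: Optional[str] = None,
                        create_queue: bool = True) -> List[str]:
    """Assemble a clearml-agent command for one daemon in services mode.

    Hypothetical helper; the flags follow the clearml-agent CLI as
    documented for services mode.
    """
    argv = ["clearml-agent", "daemon", "--queue", queue,
            "--services-mode", "--cpu-only"]
    if create_queue:
        # Ask the agent to create the queue on the server if it is missing.
        argv.append("--create-queue")
    # Services mode is used together with docker mode; the image is optional.
    argv.append("--docker")
    if docker_image:
        argv.append(docker_image)
    return argv


print(shlex.join(services_agent_argv()))
```

Keeping the services agent CPU-only fits its role here: it only runs controller-style tasks such as the autoscaler, while the actual jobs go to EC2 instances.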

BTW, in k8s we abandoned the use of the services queue since it's not needed anymore. You can deploy an agent that consumes a queue and enqueue tasks to it.

  				
Posted 2 years ago by JuicyFox94

It’s about strategy. If you have ClearML server installed on k8s, I guess you want to run tasks on the same k8s cluster. In this case, using the latest clearml-agent chart is the way to go; it uses the glue agent under the hood. Basically, the agent spins up a new pod whenever a new task is enqueued in the related queue. At that point it’s k8s's job to have enough resources to schedule the pod, and this can be achieved in two ways:
1. you already have enough resources available, or
2. you have a k8s autoscaler that can add nodes until there are enough resources for the pod to be scheduled.

  				
Posted 2 years ago by JuicyFox94
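The two cases in the last answer can be sketched as a toy scheduling decision for a task pod spawned by the agent. A real cluster autoscaler (for example the one described in the EKS link above) adds nodes when pods cannot be scheduled for lack of resources; this model ignores node groups, GPUs, taints and scale-down. `Node`, `fits`, `schedule` and the instance size are hypothetical illustrations, not Kubernetes or ClearML code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    free_cpu: float     # unreserved CPU cores on the node
    free_mem_gb: float  # unreserved memory on the node, in GB


def fits(node: Node, cpu: float, mem_gb: float) -> bool:
    return node.free_cpu >= cpu and node.free_mem_gb >= mem_gb


def schedule(cpu: float, mem_gb: float, nodes: List[Node],
             new_node: Optional[Node] = None, max_nodes: int = 0) -> str:
    """Toy model of what happens to a task pod spawned by the agent."""
    # Case 1: an existing node already has enough free resources.
    for node in nodes:
        if fits(node, cpu, mem_gb):
            node.free_cpu -= cpu
            node.free_mem_gb -= mem_gb
            return "scheduled"
    # Case 2: an autoscaler may add a node of a known size, up to max_nodes.
    if new_node is not None and len(nodes) < max_nodes and fits(new_node, cpu, mem_gb):
        nodes.append(Node(new_node.free_cpu - cpu, new_node.free_mem_gb - mem_gb))
        return "scaled-up"
    # Otherwise the pod waits in Pending until resources appear.
    return "pending"


cluster = [Node(free_cpu=4, free_mem_gb=16)]
template = Node(free_cpu=8, free_mem_gb=32)  # hypothetical instance size
print(schedule(2, 8, cluster))                         # fits on the existing node
print(schedule(4, 8, cluster))                         # no room and no autoscaler
print(schedule(4, 8, cluster, template, max_nodes=3))  # autoscaler adds a node
```

The point of the split is that the agent only creates the pod; whether it starts depends entirely on the cluster having, or being able to obtain, the requested resources.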
