Hi, in k8s, autoscaling must be managed by cloud pro user autoscaler. When the clearml-agent bound to the related queue spawns a new task pod with the configured resources, k8s will adapt. On AWS you can start here: https://docs.aws.amazon.com/eks/latest/userguide/autoscaling.html
It’s about strategy. If you have ClearML server installed on k8s, I guess you want to run tasks on the same k8s cluster. In that case the latest clearml-agent chart is the way to go; it uses the glue agent under the hood. Basically, the agent spins up a new pod whenever a new task is enqueued in the related queue. At that point it’s k8s’s duty to have enough resources to spawn the pod, and this can be achieved in two ways:
- you already have enough resources in the cluster
- you have a k8s autoscaler that can spawn nodes until there are enough resources for the pod to be spawned
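To make the two cases concrete, here is a rough Python sketch of the decision. It is only a toy model, not how the k8s scheduler or any cloud autoscaler is actually implemented: `Node`, `PodRequest`, `schedule` and the `new_node_template` argument are hypothetical names made up for illustration, and only CPU and memory are considered.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    free_cpu: float      # free CPU cores on the node
    free_mem_gib: float  # free memory on the node, in GiB


@dataclass
class PodRequest:
    cpu: float      # CPU cores requested by the task pod
    mem_gib: float  # memory requested by the task pod, in GiB


def fits(node: Node, pod: PodRequest) -> bool:
    """True if the node has enough free CPU and memory for the pod."""
    return node.free_cpu >= pod.cpu and node.free_mem_gib >= pod.mem_gib


def schedule(nodes: List[Node], pod: PodRequest,
             new_node_template: Optional[Node] = None) -> str:
    """Place the pod and return the node name, or "pending" if it cannot run.

    Case 1: an existing node already has enough free resources.
    Case 2: no node fits, so a (hypothetical) autoscaler adds a node
            shaped like new_node_template, if one is configured.
    """
    node = next((n for n in nodes if fits(n, pod)), None)
    if node is None:
        if new_node_template is None or not fits(new_node_template, pod):
            return "pending"  # no autoscaler, or even a new node is too small
        node = Node(f"autoscaled-{len(nodes)}",
                    new_node_template.free_cpu,
                    new_node_template.free_mem_gib)
        nodes.append(node)
    # Reserve the requested resources on the chosen node.
    node.free_cpu -= pod.cpu
    node.free_mem_gib -= pod.mem_gib
    return node.name
```

With a single node that has 2 cores and 4 GiB free, a pod asking for 1 core and 2 GiB lands on that node, while a pod asking for 4 cores and 8 GiB stays pending unless a node template is passed in, which plays the role of the autoscaler adding a node.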
Btw, in k8s we abandoned the usage of services since it’s not needed anymore. You can set up an agent consuming a queue and enqueue tasks to it.
Could you elaborate? What is “cloud pro user autoscaler”? Are you referring to the managed version of ClearML vs self-hosted? The ClearML-agent mentions two “flavors” of k8s integration, as I understand it: a daemon ClearML-agent that spins up sibling containers vs a direct mapping to k8s jobs. Does the ClearML-agent helm chart allow you to choose, or is it only set up for the k8s glue method? I’ve seen some people have trouble with the k8s glue method, so I was going to try having a single daemon agent inside k8s for the services queue, and then use autoscaler.py to spin up external EC2 instances to run the actual jobs. Is that a valid approach?