It’s about strategy. If you have ClearML server installed on k8s, I guess you want to run tasks on the same k8s cluster. In this case, the way to go is the latest clearml-agent chart, which uses the glue agent under the hood. Basically, the agent will spin up a new pod when a new task is enqueued in the related queue. At that point it’s the k8s cluster’s duty to have enough resources to spawn the pod, and this can be achieved in two ways:
- you already have enough resources there, or
- you have a k8s autoscaler that can spawn nodes to reach enough resources so the pod can be spawned.
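To make the two options concrete, here is a toy model of what happens to the pod the glue agent creates. It is only an illustration, not ClearML or Kubernetes code: `Node`, `Cluster`, `schedule` and `node_template` are hypothetical names, and real k8s scheduling and autoscaling involve far more than free CPU and memory. The pod either fits on an existing node (option 1) or, if an autoscaler is present, triggers a new node (option 2); otherwise it stays Pending.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    cpu: float     # free CPU cores on this node
    mem_gb: float  # free memory in GiB on this node


@dataclass
class Cluster:
    nodes: List[Node] = field(default_factory=list)
    autoscale: bool = False
    # Hypothetical shape of the nodes the autoscaler is allowed to add.
    node_template: Optional[Node] = None

    def schedule(self, cpu: float, mem_gb: float) -> Optional[str]:
        """Place a task pod; return the node name, or None if it stays Pending."""
        # Option 1: enough resources already exist on some node.
        for node in self.nodes:
            if node.cpu >= cpu and node.mem_gb >= mem_gb:
                node.cpu -= cpu
                node.mem_gb -= mem_gb
                return node.name
        # Option 2: an autoscaler adds a node big enough for the pod.
        t = self.node_template
        if self.autoscale and t is not None and t.cpu >= cpu and t.mem_gb >= mem_gb:
            new = Node(f"{t.name}-{len(self.nodes)}", t.cpu - cpu, t.mem_gb - mem_gb)
            self.nodes.append(new)
            return new.name
        # No room and no autoscaler: the pod remains Pending.
        return None


if __name__ == "__main__":
    fixed = Cluster(nodes=[Node("node-0", cpu=4, mem_gb=16)])
    print(fixed.schedule(2, 8))  # fits on node-0
    print(fixed.schedule(4, 8))  # no room left, no autoscaler -> None

    scaling = Cluster(nodes=[Node("node-0", 4, 16)], autoscale=True,
                      node_template=Node("pool", 8, 32))
    print(scaling.schedule(6, 8))  # too big for node-0, so a "pool-1" node is added
```

The point of the sketch is that the agent does not care which option applies: it only creates the pod, and the cluster, with or without an autoscaler, decides whether it can run.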
Hi, in k8s, autoscaling must be managed by the cloud pro user autoscaler. When the clearml-agent bound to the related queue spawns a new task pod with the configured resources, k8s will adapt. On AWS you can start here: https://docs.aws.amazon.com/eks/latest/userguide/autoscaling.html
Could you elaborate? What is “cloud pro user autoscaler”? Are you referring to the managed version of ClearML vs. self-hosted? As I understand it, the ClearML-agent mentions two “flavors” of k8s integration: a daemon ClearML-agent that spins up sibling containers vs. direct mapping to k8s jobs. Does the ClearML-agent helm chart let you choose, or is it only set up for the k8s glue method? I’ve seen some people have trouble with the k8s glue method, so I was going to try having a single daemon agent inside k8s for the services queue, and then use autoscaler.py to spin up external EC2 instances to run the actual jobs. Is that a valid approach?
Btw, in k8s we abandoned the use of services since it’s no longer needed. You can put an agent consuming a queue and enqueue tasks to it.
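A minimal sketch of that pattern, assuming an in-memory stand-in for the server’s queues: one agent is bound to one queue, and every task enqueued there is turned into its own pod, so no separate services queue is involved. The `queues` dict, `enqueue` and `agent_poll` are hypothetical helpers for illustration only. With the real SDK you would enqueue an existing task with something like `Task.enqueue(task, queue_name=...)`, and the agent’s pod template would come from the helm chart values.

```python
from collections import deque
from typing import Dict, Optional

# Hypothetical in-memory stand-in for the queues kept by the ClearML server.
queues: Dict[str, deque] = {"default": deque()}


def enqueue(task_id: str, queue_name: str = "default") -> None:
    """Put a task id at the tail of the named queue (FIFO)."""
    queues.setdefault(queue_name, deque()).append(task_id)


def agent_poll(queue_name: str) -> Optional[dict]:
    """One polling step of a hypothetical agent bound to a single queue.

    Pops the oldest task and returns a minimal pod description for it,
    or None when the queue is empty.
    """
    q = queues.get(queue_name)
    if not q:
        return None
    task_id = q.popleft()
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": f"clearml-task-{task_id}"},
        "queue": queue_name,  # kept only for illustration, not a real Pod field
    }
```

For example, after `enqueue("a1b2")` and `enqueue("c3d4")`, two calls to `agent_poll("default")` return pods for `a1b2` and then `c3d4`, and a third call returns `None`.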