
Is an implementation of this kind interesting for you, or do you suggest a fork? I mean, I don't want to impact your time reviewing.
The “You can’t write on readonly replica” error is about MongoDB. I guess you are using a multi-replica setup. In this case the mongodb dependency chart has a lot of parameters to tweak the system, and maybe an arbiter is also good for you. But this is a huge topic regarding MongoDB-specific k8s setups.
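If it helps, a rough sketch of what that tuning could look like in the override file, assuming the clearml chart pulls in the Bitnami mongodb subchart and exposes these keys (verify them in the chart’s values.yaml):
```
# Sketch only: key names assume the Bitnami mongodb subchart; check the clearml chart's values.yaml.
cat <<'EOF' > clearml-override.yaml
mongodb:
  architecture: replicaset   # run MongoDB as a replica set instead of standalone
  replicaCount: 3            # number of data-bearing replicas
  arbiter:
    enabled: true            # add an arbiter to help with primary elections
EOF
# Then apply it with: helm upgrade --install clearml <chart> -f clearml-override.yaml
```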
I suggest trying to exec into the agent pod and running some kubectl command, like a simple kubectl get pod
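For example (namespace and pod name are placeholders):
```
kubectl -n clearml get pods                                   # find the agent pod
kubectl -n clearml exec -it <agent-pod> -- kubectl get pod    # run kubectl from inside the agent pod
```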
Basically you can install the latest clearml chart
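Roughly like this; the Helm repo URL is my assumption based on the GitHub repo, so double-check it in the clearml-helm-charts README:
```
# Repo URL is an assumption; verify it in the clearml-helm-charts README.
helm repo add allegroai https://allegroai.github.io/clearml-helm-charts
helm repo update
helm upgrade --install clearml allegroai/clearml -n clearml --create-namespace
```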
I need to investigate, ScrawnyLion96 can you pls open an issue on https://github.com/allegroai/clearml-helm-charts ?
can you also show the output of kubectl get po for the namespace where you installed clearml?
Interesting use case, maybe we can create multiple k8s agents for different queues
I’m going to investigate (and fix it if possible) in the coming days
Hi ApprehensiveSeahorse83, today we released a clearml-agent chart that just installs the glue agent. My suggestion is to disable k8s glue and any other agent in the clearml chart, and install more than one clearml-agent chart in different namespaces. In this way you will be able to have a k8s glue agent for every queue (cpu and gpu).
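Something along these lines; the queue value key is an assumption on my side, so check the exact name in the clearml-agent chart’s values.yaml:
```
# Sketch only: agentk8sglue.queue is an assumed key name, verify it in the chart's values.yaml.
helm upgrade --install clearml-agent-cpu allegroai/clearml-agent \
  -n clearml-agent-cpu --create-namespace \
  --set agentk8sglue.queue=cpu

helm upgrade --install clearml-agent-gpu allegroai/clearml-agent \
  -n clearml-agent-gpu --create-namespace \
  --set agentk8sglue.queue=gpu
```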
Hi everyone, I just fixed releases so new charts containing this fix are published. ty!
on OSS it’s usually the only way to have as many agent deployments as the queues you define
I still need some time because I have two very busy days ahead
Hey ApprehensiveSeahorse83 , I didn’t forget about you, it’s just a busy time for me; will answer during the day after a couple more tests on my testing env.
regardless of this I probably need to add some more detailed explanations on credentials configs
Did you add volumeMounts to the right section? It should be under basePodTemplate in the override file.
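Something like this in the override file; only basePodTemplate comes from my message above, the parent key and the volume names are just examples, so check them against your chart’s values.yaml:
```
# Sketch only: parent key and volume names are illustrative; basePodTemplate is the important part.
cat <<'EOF' > agent-override.yaml
agentk8sglue:
  basePodTemplate:
    volumes:
      - name: my-data
        persistentVolumeClaim:
          claimName: my-data-pvc
    volumeMounts:
      - name: my-data
        mountPath: /data
EOF
```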
Hi, in k8s autoscaling must be managed by the cloud provider autoscaler. When the clearml-agent bound to the related queue spawns a new task pod with the configured resources, k8s will adapt. On AWS you can start here: https://docs.aws.amazon.com/eks/latest/userguide/autoscaling.html
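The “configured resources” part is what the autoscaler reacts to; a rough sketch of requesting resources for the spawned task pods (same basePodTemplate idea as above, key names to verify in your chart’s values.yaml):
```
# Sketch only: parent key is an assumption; resource values are just examples.
cat <<'EOF' > agent-resources-override.yaml
agentk8sglue:
  basePodTemplate:
    resources:
      requests:
        cpu: "4"
        memory: 16Gi
      limits:
        nvidia.com/gpu: 1
EOF
# Pass it with an extra -f on helm upgrade; the cluster autoscaler adds nodes when requests don't fit.
```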
In this case I suggest giving a try to k8s-glue, which is there by default in the latest chart version
Ty, I have other stuff that I'd like to send but it's better to get these eventually merged first so I can proceed with shiny new PRs in the near future 😄
Ok, I’d like to test it more with you; the credentials exposed in chart values are system ones and it’s better not to change them; let’s forget about them for now. If you create a new accesskey/secretkey pair in the UI, you should use those in your agents and they should not get overwritten in any way; can you confirm it works without touching the credentials section?
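For reference, a sketch of wiring the UI-created pair into the agent install; the value keys are my assumption, verify them in the clearml-agent chart’s values.yaml:
```
# Sketch only: key names are assumptions; use the accesskey/secretkey pair created in the ClearML UI.
helm upgrade --install clearml-agent allegroai/clearml-agent \
  -n clearml-agent --create-namespace \
  --set clearml.agentk8sglueKey=<ui-access-key> \
  --set clearml.agentk8sglueSecret=<ui-secret-key>
```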
in Enterprise we support multiqueueing but it’s a different story
if they are in kubernetes you can simply use k8s glue
Ok, let’s try to deep dive into it, what is the Helm chart version used for this deployment?
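You can grab it with (namespace is a placeholder):
```
helm list -n clearml   # the CHART column shows e.g. clearml-<version>
```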
In my case I have a similar need; I wrote a never-ending Task similar to this one used for cleanup: https://github.com/allegroai/clearml/blob/master/examples/services/cleanup/cleanup_service.py
I can understand you maybe got confused
how your cluster reacts is about scaling infra as much as needed (Karpenter or any other cloud autoscaler should work)