Interesting use case, maybe we can create multiple k8s agents for different queues
I’m going to investigate this specific use case and will get back to you
Hey ApprehensiveSeahorse83, I didn’t forget about you, it’s just a busy time for me; I will answer during the day after a couple more tests on my testing env.
Hi ApprehensiveSeahorse83, today we released the clearml-agent chart, which just installs the glue agent. My suggestion is to disable k8s glue and any other agent in the clearml chart and install more than one clearml-agent chart in different namespaces. This way you will be able to have a k8s glue for every queue (cpu and gpu).
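A rough sketch of the one-release-per-queue idea, written as a small Python helper that only builds the `helm install` command lines. The repo alias `allegroai/clearml-agent`, the release and namespace naming, and the value keys `agentk8sglue.queue`, `clearml.agentk8sglueKey` and `clearml.agentk8sglueSecret` are assumptions for illustration; check them against the chart's own values.yaml before running anything.

```python
import shlex

# Assumed chart reference; adjust to however the repo was added locally.
CHART = "allegroai/clearml-agent"


def helm_install_command(queue, access_key, secret_key):
    """Build a `helm install` command for one glue agent serving one queue.

    Release and namespace names are hypothetical; the --set keys are
    assumptions that must be verified against the chart's values.yaml.
    """
    release = f"clearml-agent-{queue}"
    namespace = f"clearml-agent-{queue}"
    return [
        "helm", "install", release, CHART,
        "--namespace", namespace, "--create-namespace",
        "--set", f"agentk8sglue.queue={queue}",
        "--set", f"clearml.agentk8sglueKey={access_key}",
        "--set", f"clearml.agentk8sglueSecret={secret_key}",
    ]


if __name__ == "__main__":
    # One installation per queue, each in its own namespace.
    for queue in ("cpu", "gpu"):
        cmd = helm_install_command(queue, "<ACCESS_KEY>", "<SECRET_KEY>")
        print(shlex.join(cmd))
```

The helper prints the commands instead of running them, so they can be reviewed first.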
but the system account key and secret can’t be the same for every installation, no? I need to generate a specific one for my installation, no?
if they are in kubernetes you can simply use k8s glue
In this case I suggest trying k8s-glue, which is included by default in the latest chart version
For now we used a workaround: we forked the helm charts repo and changed the agents' deployment.yaml so that, instead of taking the key and secret from the clearml-conf secret, it takes them from another secret we created. That way the server does not “know” about this new key and secret and does not reset them
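A minimal sketch of what such a workaround could look like, assuming a separate Opaque Secret plus `secretKeyRef` entries in the agent container's `env`. The secret name, the key names `access_key`/`secret_key`, and the idea that the deployment reads `CLEARML_API_ACCESS_KEY` and `CLEARML_API_SECRET_KEY` are assumptions; the forked chart may be wired differently.

```python
import base64
import json


def b64(value: str) -> str:
    """Kubernetes Secret `data` values must be base64-encoded strings."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def agent_credentials_secret(name, namespace, access_key, secret_key):
    """Manifest for a Secret that lives outside the chart-managed clearml-conf."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {
            "access_key": b64(access_key),
            "secret_key": b64(secret_key),
        },
    }


def env_from_secret(secret_name):
    """Container env entries that read the credentials from that Secret."""
    def ref(key):
        return {"secretKeyRef": {"name": secret_name, "key": key}}

    return [
        {"name": "CLEARML_API_ACCESS_KEY", "valueFrom": ref("access_key")},
        {"name": "CLEARML_API_SECRET_KEY", "valueFrom": ref("secret_key")},
    ]


if __name__ == "__main__":
    # Hypothetical names; JSON output can be fed to `kubectl apply -f -`.
    secret = agent_credentials_secret(
        "clearml-agent-credentials", "clearml", "<ACCESS_KEY>", "<SECRET_KEY>"
    )
    print(json.dumps(secret, indent=2))
    print(json.dumps(env_from_secret("clearml-agent-credentials"), indent=2))
```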
for now we use a fixed number of cpu agents, but it would be better if it were dynamic with the glue agent
we are already using glue to manage our gpu pods. The agents we use for the pipelines are simple cpu agents.
but I will try to find something good for you
can we use multiple k8s-glue - one for cpu and one for gpu pods?
regardless of this I probably need to add some more detailed explanations on credentials configs
not urgent now that we have the workaround
OK, I’d like to test it more with you. The credentials exposed in the chart values are system ones and it’s better not to change them; let’s forget about them for now. If you create a new accesskey/secretkey pair in the UI, you should use those in your agents and they should not get overwritten in any way. Can you confirm it works without touching the credentials section?
Not sure I understand. Are you saying I should not create user credentials and add them in values.yaml at secret.credentials.apiserver and secret.credentials.tests?
since the gpu is expensive we want the glue to manage the pods
and then use them in agents if they are external
still need time because I have two very busy days