Do you mean the Python version that is installed on the clearml agent itself? Or do you mean the Python version available in tasks that will be run from the agent?
@<1669152726245707776:profile|ManiacalParrot65> could you please send your values file override for the Agent helm chart?
@<1734020208089108480:profile|WickedHare16> - please try configuring the cookieDomain:
clearml:
  cookieDomain: ""
You should set it to your base domain, for example pixis.internal, without any api or files prefix in front of it.
For this to work you might also have to add "secure": true in the connection string object:
externalServices:
  elasticsearchConnectionString: "[...,"secure":true,...]"
@<1752864322440138752:profile|GiddyDragonfly90> - MongoDB is used as a dependency Helm Chart from the Bitnami repo. We are using version 12.1.31 of the chart.
In the clearml override values, under the mongodb section, you can specify any value that is usable in the original chart 🙂
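As a rough sketch of what such an override could look like: the keys below (persistence.size and resources) are values exposed by the Bitnami MongoDB chart, but the numbers are purely illustrative assumptions, so check them against the chart version in use.

```yaml
# clearml override values (illustrative sizes, not recommendations)
mongodb:
  persistence:
    # Assumed value: size of the PVC requested by the MongoDB sub-chart
    size: 50Gi
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
```

Anything placed under the mongodb key is passed through to the dependency chart as-is.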
Hi @<1752864322440138752:profile|GiddyDragonfly90> - Can you try with the last value you proposed, but use : to separate user and password in the string, like this:
externalServices:
  elasticsearchConnectionString: '[{"scheme":"http","host":"elastic:toto@elasticsearch-es-http","port":9200}]'
I see, in the example you provided you used a comma (,) to separate username and password. I suggest trying a colon (:) instead.
Hey @<1734020208089108480:profile|WickedHare16> - Not 100% sure this is the issue, but I noticed a wrong configuration in your values.
You configured both these:
elasticsearch:
  enabled: true
externalServices:
  # -- Existing ElasticSearch connectionstring if elasticsearch.enabled is false (example in values.yaml)
  elasticsearchConnectionString: "[{\"host\":\"es_hostname1\",\"port\":9200},{\"host\":\"es_hostname2\",\"port\":9200},{\"host\":\"es_hostname3\",\"port\":9200}]"
Pl...
Hey @<1726047624538099712:profile|WorriedSwan6> , the basePodTemplate section configures the default base template for all pods spawned by the Agent.
If you don't want every Task (or Pod) to use the same requests/limits, one thing you could try is to set up multiple queues in the Agent.
Each queue can then have an override of the Pod template.
So, you can try removing the nvidia.com/gpu: "4" from the root basePodTemplate and add a section like this in ...
@<1736194540286513152:profile|DeliciousSeaturtle82> when you copy the folder on the new pod, it crashes almost instantly?
I think Mongo does not like having its db folder replaced like this in the running Pod.
You can try turning off Mongo for a moment (scale the deployment down to 0 replicas), then create a one-time Pod (non-mongo, you can use an ubuntu image for example) mounting the same volume that Mongo was mounting, and use this Pod to copy the db folder into the right place. When it's done, delete this Pod and scale the Mongo deployment back to 1.
Hey @<1649221394904387584:profile|RattySparrow90> - You can try configuring CLEARML__logging__root__level as an extraEnvs entry for the apiserver and fileserver 🙂
The value can be DEBUG, INFO, WARNING, ERROR, or CRITICAL.
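A minimal sketch of that override, assuming the fileserver section accepts extraEnvs in the same name/value list format as the apiserver; DEBUG is just an example level.

```yaml
# clearml override values
apiserver:
  extraEnvs:
    - name: CLEARML__logging__root__level
      value: "DEBUG"
fileserver:
  # Assumption: same extraEnvs structure as the apiserver section
  extraEnvs:
    - name: CLEARML__logging__root__level
      value: "DEBUG"
```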
Hey @<1734020208089108480:profile|WickedHare16> , could you please share your override values file for the clearml helm chart?
Hey @<1743079861380976640:profile|HighKitten20> - Try to configure this section in the values override file for the Agent helm chart:
# -- Private image registry configuration
imageCredentials:
  # -- Use private authentication mode
  enabled: false
  # -- If this is set, chart will not generate a secret but will use what is defined here
  existingSecret: ""
  # -- Registry name
  registry: docker.io
  # -- Registry username
  username: someone
  # -- Registry password
  password: pwd...
Hey @<1523701304709353472:profile|OddShrimp85> - You can tweak the following section in the clearml-agent override values:
# -- Global parameters section
global:
  # -- Images registry
  imageRegistry: "docker.io"
# -- Private image registry configuration
imageCredentials:
  # -- Use private authentication mode
  enabled: true # <-- Set this to true
  # -- Registry name
  registry: docker.io
  # -- Registry username
  username: someone
  # -- Registry password
  password: pwd
  # -- ...
@<1736194540286513152:profile|DeliciousSeaturtle82> the data folder for mongo4 and mongo5 might be slightly different. What is the target path where you're moving data in mongo5? And how is that mounted?
And when you say "broken", could you elaborate on that? Does the target Mongo Pod crash when trying to move the data? Or do you succeed in copying the data but can't see the result in the UI?
@<1752864322440138752:profile|GiddyDragonfly90> - I think you can also add verify_certs: false in the same elasticsearchConnectionString object, have you tried?
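Something along these lines, as an untested sketch: the https scheme and the host name are assumptions made for illustration, and verify_certs is written as a JSON key inside the host object.

```yaml
externalServices:
  # Assumption: verify_certs is accepted as a key of the host object.
  # Disabling certificate verification is only advisable for testing.
  elasticsearchConnectionString: '[{"scheme":"https","host":"elasticsearch-es-http","port":9200,"verify_certs":false}]'
```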
Hey @<1726047624538099712:profile|WorriedSwan6> - I am sorry, I forgot that the multi-queue feature with templateOverrides is only for the enterprise version.
What you can do, though, is to deploy two different agents in k8s using the helm chart. Simply try installing two different releases, then modify only one of them to have basePodTemplate use the nvidia.com/gpu: "4" setting.
Let me know if this solves your issue 🙂
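As a sketch of the override for the GPU release only, assuming basePodTemplate takes a standard Kubernetes resources block and that the agent's queue is set with agentk8sglue.queue; the queue name "gpu" is hypothetical.

```yaml
# Override values for the GPU agent release only; the second release
# keeps a basePodTemplate without the GPU limit.
agentk8sglue:
  # Hypothetical queue name, so each release serves its own queue
  queue: gpu
  basePodTemplate:
    resources:
      limits:
        nvidia.com/gpu: "4"
```

Tasks enqueued to the other release's queue would then get Pods without the GPU limit.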
🙂 let me know if that works for you
If that doesn't work, try removing the auth from the connection string and instead define two extraEnvs for the apiserver:
apiserver:
  extraEnvs:
    - name: CLEARML_ELASTIC_SERVICE_USERNAME
      value: "elastic"
    - name: CLEARML_ELASTIC_SERVICE_PASSWORD
      value: "toto"
For Task Pods running your experiments through the agent, you can change the base image to one that has the Python version you need. You can use this section of the values:
agentk8sglue:
  # -- default container image for ClearML Task pod
  defaultContainerImage: ubuntu:18.04 # <-- Change me!!
Sure! I'll talk to the guys to update the documentation 🙂
Hey @<1734020156465614848:profile|ClearKitten90> - You can try with the following in your ClearML Agent override helm values. Make sure to replace mygitusername and git-password with your own values:
agentk8sglue:
  basePodTemplate:
    env:
      # to setup access to private repo, setup secret with git credentials
      - name: CLEARML_AGENT_GIT_USER
        value: mygitusername
      - name: CLEARML_AGENT_GIT_PASS
        valueFrom:
          secretKeyRef:
            name: git-password
            ...
Hey @<1736194540286513152:profile|DeliciousSeaturtle82> , yes please try changing the health check to /debug.conf or /debug.ping 🙂
@<1710827340621156352:profile|HungryFrog27> have you installed the Nvidia gpu-operator to advertise GPUs to Kubernetes?