Hi @<1752864322440138752:profile|GiddyDragonfly90> - Can you try with the last value you proposed, but use : to separate the user and password in the string, like this:
externalServices:
  elasticsearchConnectionString: '[{"scheme":"http","host":"elastic:toto@elasticsearch-es-http","port":9200}]'
Sure! I'll talk to the guys to update the documentation 🙂
For this to work you might also have to add "secure": true in the connection string object:
externalServices:
  elasticsearchConnectionString: '[...,"secure":true,...]'
I see, in the example you provided you used a comma , to separate username and password; I suggest trying a colon : instead.
@<1752864322440138752:profile|GiddyDragonfly90> - MongoDB is used as a dependency Helm chart from the Bitnami repo. We are using version 12.1.31 of the chart.
In the clearml override values, under the mongodb section, you can specify any value that is usable in the original chart 🙂
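For example, something along these lines (just a sketch; the keys below are only illustrations, double-check them against the Bitnami chart's values.yaml):
mongodb:
  enabled: true
  # any value from the Bitnami MongoDB chart can go here, e.g. storage size or resources
  persistence:
    size: 50Gi
  resources:
    limits:
      memory: 2Gi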
🙂 let me know if that works for you
@<1669152726245707776:profile|ManiacalParrot65> could you please send your values file override for the Agent helm chart?
Hey @<1734020156465614848:profile|ClearKitten90> - You can try the following in your ClearML Agent override helm values. Make sure to replace mygitusername and git-password with your own values:
agentk8sglue:
  basePodTemplate:
    env:
      # to setup access to private repo, setup secret with git credentials
      - name: CLEARML_AGENT_GIT_USER
        value: mygitusername
      - name: CLEARML_AGENT_GIT_PASS
        valueFrom:
          secretKeyRef:
            name: git-password
            ...
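In case it helps, the referenced secret could be created with a manifest along these lines (the key name here is hypothetical, it just has to match the key you reference in secretKeyRef):
apiVersion: v1
kind: Secret
metadata:
  name: git-password
type: Opaque
stringData:
  # hypothetical key; reference the same key in the secretKeyRef above
  git_password: "<your-git-token>"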
For the Task Pods running your experiments through the agent, you can change the base image to something you like that has the Python version you need. You can use this section of the values:
agentk8sglue:
  # -- default container image for ClearML Task pod
  defaultContainerImage: ubuntu:18.04 # <-- Change me!!
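For example, if you need a specific Python version in the Task Pods, you could point this at an official Python image instead (just a suggestion, any image carrying the Python you need will do):
agentk8sglue:
  defaultContainerImage: python:3.9-bullseye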
Do you mean the Python version that is installed on the clearml agent itself? Or do you mean the Python version available in tasks that will be run from the agent?
Hey @<1649221394904387584:profile|RattySparrow90> - You can try configuring CLEARML__logging__root__level as an extraEnvs entry for the apiserver and fileserver 🙂 The value can be DEBUG, INFO, WARNING, ERROR, or CRITICAL.
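Something along these lines in the override values (a sketch, untested):
apiserver:
  extraEnvs:
    - name: CLEARML__logging__root__level
      value: "DEBUG"
fileserver:
  extraEnvs:
    - name: CLEARML__logging__root__level
      value: "DEBUG"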
If that doesn't work, try removing the auth from the connection string and instead define two extraEnvs for the apiserver:
apiserver:
  extraEnvs:
    - name: CLEARML_ELASTIC_SERVICE_USERNAME
      value: "elastic"
    - name: CLEARML_ELASTIC_SERVICE_PASSWORD
      value: "toto"
Hey @<1743079861380976640:profile|HighKitten20> - Try to configure this section in the values override file for the Agent helm chart:
# -- Private image registry configuration
imageCredentials:
  # -- Use private authentication mode
  enabled: false
  # -- If this is set, chart will not generate a secret but will use what is defined here
  existingSecret: ""
  # -- Registry name
  registry: docker.io
  # -- Registry username
  username: someone
  # -- Registry password
  password: pwd...
Hey @<1734020208089108480:profile|WickedHare16> - Not 100% sure this is the issue, but I noticed a wrong configuration in your values.
You configured both these:
elasticsearch:
  enabled: true
externalServices:
  # -- Existing ElasticSearch connectionstring if elasticsearch.enabled is false (example in values.yaml)
  elasticsearchConnectionString: "[{\"host\":\"es_hostname1\",\"port\":9200},{\"host\":\"es_hostname2\",\"port\":9200},{\"host\":\"es_hostname3\",\"port\":9200}]"
Please use only one of them: either keep the bundled Elasticsearch enabled and drop the external connection string, or set elasticsearch.enabled to false and point to your external cluster.
@<1734020208089108480:profile|WickedHare16> - please try configuring the cookieDomain:
clearml:
  cookieDomain: ""
You should set it to your base domain, for example pixis.internal, without any api or files prefix in front of it.
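So in that example the override would end up looking like:
clearml:
  cookieDomain: "pixis.internal"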
Hey @<1726047624538099712:profile|WorriedSwan6> - I am sorry, I forgot that the multi-queue feature with templateOverrides is only for the enterprise version.
What you can do, though, is to deploy two different agents in k8s using the helm chart. Simply try installing two different releases, then modify only one of them so that its basePodTemplate uses nvidia.com/gpu: "4".
Let me know if this solves your issue 🙂
@<1752864322440138752:profile|GiddyDragonfly90> - I think you can also add verify_certs: false in the same elasticsearchConnectionString object, have you tried?
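Building on the earlier example, the full string would look something like this (untested, and assuming you are on https):
externalServices:
  elasticsearchConnectionString: '[{"scheme":"https","host":"elastic:toto@elasticsearch-es-http","port":9200,"verify_certs":false}]'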
Hey @<1523701304709353472:profile|OddShrimp85> - You can tweak the following section in the clearml-agent override values:
# -- Global parameters section
global:
  # -- Images registry
  imageRegistry: "docker.io"
# -- Private image registry configuration
imageCredentials:
  # -- Use private authentication mode
  enabled: true # <-- Set this to true
  # -- Registry name
  registry: docker.io
  # -- Registry username
  username: someone
  # -- Registry password
  password: pwd
  # -- ...
Hey @<1734020208089108480:profile|WickedHare16> , could you please share your override values file for the clearml helm chart?
Hey @<1726047624538099712:profile|WorriedSwan6> , the basePodTemplate section configures the default base template for all Pods spawned by the Agent.
If you don't want every Task (or Pod) to use the same requests/limits, one thing you could try is to set up multiple queues in the Agent.
Each queue can then have an override of the Pod template.
So, you can try removing the nvidia.com/gpu: "4" from the root basePodTemplate and add a section like this in ...
I think Mongo does not like having its db folder replaced like this while the Pod is running.
You can try turning off Mongo for a moment (scale the deployment down to 0 replicas), then create a one-time Pod (non-Mongo, you can use an ubuntu image for example) mounting the same volume Mongo was mounting, and use this Pod to copy the db folder into the right place. When it's done, delete this Pod and scale the Mongo deployment back up to 1.
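A rough sketch of such a one-time Pod (the PVC name is an assumption, replace it with the claim your Mongo Pod actually mounts):
apiVersion: v1
kind: Pod
metadata:
  name: mongo-data-copy
spec:
  restartPolicy: Never
  containers:
    - name: shell
      image: ubuntu:22.04
      # keep the Pod alive so you can kubectl exec into it and copy the files
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: mongo-data
          mountPath: /data
  volumes:
    - name: mongo-data
      persistentVolumeClaim:
        claimName: clearml-mongodb  # assumption: use your actual Mongo PVC name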
@<1736194540286513152:profile|DeliciousSeaturtle82> when you copy the folder onto the new Pod, does it crash almost instantly?
@<1736194540286513152:profile|DeliciousSeaturtle82> the data folder for mongo4 and mongo5 might be slightly different. What is the target path where you're moving data in mongo5? And how is that mounted?
And when you say "broken", could you elaborate on that? Does the target Mongo Pod crash when trying to move the data? Or do you succeed in copying the data but can't see the result in the UI?
Hey @<1736194540286513152:profile|DeliciousSeaturtle82> , yes, please try changing the health check to /debug.conf or /debug.ping 🙂