Hello, I am a first-timer with ClearML, trying to deploy a ClearML server locally (successfully) and then an agent in my Kubernetes cluster. I followed the Helm chart from "helm repo add clearml None" and in the Helm chart values for the agent I changed the parameters below:

agentk8sglueKey: <ACCESS KEY>
agentk8sglueSecret: <SECRET KEY>

# -- Reference to Api server url
apiServerUrlReference: " None "

# -- Reference to File server url
fileServerUrlReference: " None "

# -- Reference to Web server url
webServerUrlReference: " None "

All other values were left at their defaults.

The pod starts running, then repeatedly restarts and ends up in CrashLoopBackOff.

ubuntu@vm4v9lm3:~$ kubectl get pods
NAME                                             READY   STATUS             RESTARTS        AGE
clearml-agent-7c6d58c497-xk8hn                   0/1     CrashLoopBackOff   9 (3m11s ago)   25m
clearml-apiserver-57d4f9776d-pgn6q               1/1     Running            0               7h58m
clearml-apiserver-asyncdelete-59484594b9-zdm4p   1/1     Running            0               7h58m
clearml-elastic-master-0                         1/1     Running            0               7h58m
clearml-fileserver-769d646d7-tzpg6               1/1     Running            0               7h58m
clearml-mongodb-5f995fbb5-mgwbt                  1/1     Running            0               7h58m
clearml-redis-master-0                           1/1     Running            0               7h58m
clearml-webserver-7df664dcbf-856f9               1/1     Running            0               7h58m
jupyter-notebook-84c6f6fcf9-4lrrv                1/1     Running            0               38m

The logs are below. Any idea what is wrong?
Are there any other values to update in the Helm chart for the agent?

/root/entrypoint.sh: line 29: /root/clearml.conf: Read-only file system
+ echo 'api.api_server: None '
/root/entrypoint.sh: line 30: /root/clearml.conf: Read-only file system
+ echo 'api.web_server: None '
/root/entrypoint.sh: line 31: /root/clearml.conf: Read-only file system
+ echo 'api.files_server: None '
/root/entrypoint.sh: line 32: /root/clearml.conf: Read-only file system
+ ./provider_entrypoint.sh
+ source /root/.bashrc
++ '[' -z '' ']'
++ return
+ export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/root/bin:/root/bin
+ PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/root/bin:/root/bin
+ [[ -z '' ]]
+ python3 k8s_glue_example.py --queue default --namespace default --template-yaml /root/template/template.yaml
/usr/local/lib/python3.6/dist-packages/clearml_agent/_vendor/jwt/utils.py:7: CryptographyDeprecationWarning: Python 3.6 is no longer supported by the Python core team. Therefore, support for it is deprecated in cryptography and will be removed in a future release.
  from cryptography.hazmat.primitives.asymmetric.ec import EllipticCurve
Traceback (most recent call last):
  File "k8s_glue_example.py", line 8, in <module>
    from clearml_agent.glue.k8s import K8sIntegration
  File "/usr/local/lib/python3.6/dist-packages/clearml_agent/glue/k8s.py", line 19, in <module>
    from clearml_agent.commands.events import Events
  File "/usr/local/lib/python3.6/dist-packages/clearml_agent/commands/__init__.py", line 3, in <module>
    from .worker import Worker
  File "/usr/local/lib/python3.6/dist-packages/clearml_agent/commands/worker.py", line 47, in <module>
    from clearml_agent.commands.base import resolve_names, ServiceCommandSection
  File "/usr/local/lib/python3.6/dist-packages/clearml_agent/commands/base.py", line 20, in <module>
    from clearml_agent.interface.base import ObjectID
  File "/usr/local/lib/python3.6/dist-packages/clearml_agent/interface/__init__.py", line 7, in <module>
    from .base import Parser, base_arguments, add_service, OnlyPluralChoicesHelpFormatter
  File "/usr/local/lib/python3.6/dist-packages/clearml_agent/interface/base.py", line 12, in <module>
    from clearml_agent.session import Session
  File "/usr/local/lib/python3.6/dist-packages/clearml_agent/session.py", line 23, in <module>
    from clearml_agent.helper.docker_args import DockerArgsSanitizer, sanitize_urls
  File "/usr/local/lib/python3.6/dist-packages/clearml_agent/helper/docker_args.py", line 279, in <module>
    class CustomTemplate(Template):
  File "/usr/lib/python3.6/string.py", line 74, in __init__
    cls.pattern = _re.compile(pattern, cls.flags | _re.VERBOSE)
  File "/usr/lib/python3.6/re.py", line 233, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python3.6/re.py", line 301, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.6/sre_compile.py", line 562, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.6/sre_parse.py", line 855, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
  File "/usr/lib/python3.6/sre_parse.py", line 416, in _parse_sub
    not nested and not items))
  File "/usr/lib/python3.6/sre_parse.py", line 765, in _parse
    p = _parse_sub(source, state, sub_verbose, nested + 1)
  File "/usr/lib/python3.6/sre_parse.py", line 416, in _parse_sub
    not nested and not items))
  File "/usr/lib/python3.6/sre_parse.py", line 765, in _parse
    p = _parse_sub(source, state, sub_verbose, nested + 1)
  File "/usr/lib/python3.6/sre_parse.py", line 416, in _parse_sub
    not nested and not items))
  File "/usr/lib/python3.6/sre_parse.py", line 734, in _parse
    flags = _parse_flags(source, state, char)
  File "/usr/lib/python3.6/sre_parse.py", line 803, in _parse_flags
    raise source.error("bad inline flags: cannot turn on global flag", 1)
sre_constants.error: bad inline flags: cannot turn on global flag at position 92 (line 4, column 20)
  
  
Posted 3 months ago


OK, will try it.

  
  
Posted 3 months ago

I will get back to you in 15 min, if that's OK.

  
  
Posted 3 months ago

For now:

- name: clearml-access-key
  value: CLEARML8AGENT9KEY1234567890ABCD
- name: clearml-secret-key
  value: CLEARML-AGENT-SECRET-1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ123456
- name: admin-password
  value: clearml123!
  
  
Posted 3 months ago

yes

  
  
Posted 3 months ago

Since with Argo I can pass them as params.

  
  
Posted 3 months ago

It's a bit hard for me to provide support here with the additional layer of Argo.
I assume the server is working fine and you can open the ClearML UI and log in, right? If yes, would it be possible to extract the agent part only, out of Argo, and proceed with installing it through standard Helm?

  
  
Posted 3 months ago

Hi @<1857232027015712768:profile|PompousCrow47> , are you using pods with a read-only-filesystem limitation?

  
  
Posted 3 months ago

Sorry, we had a short delay on the deployment, but with these values:

clearml:
  agentk8sglueKey: "8888TMDLWYY7ZQJJ0I7R2X2RSP8XFT"
  agentk8sglueSecret: "oNODbBkDGhcDscTENQyr-GM0cE8IO7xmpaPdqyfsfaWearo1S8EQ8eBOxu-opW8dVUU"
  clearmlConfig: |-
    api {
        web_server:
        api_server:
        files_server:
        credentials {
            "access_key" = "8888TMDLWYY7ZQJJ0I7R2X2RSP8XFT"
            "secret_key" = "oNODbBkDGhcDscTENQyr-GM0cE8IO7xmpaPdqyfsfaWearo1S8EQ8eBOxu-opW8dVUU"
        }
    }

agentk8sglue:
  # Try different image versions to avoid Python 3.6 regex issue
  image:
    repository: allegroai/clearml-agent-k8s-base
    tag: "latest"  # Use latest instead of specific version
    pullPolicy: Always

  # Essential server references
  apiServerUrlReference: "
"
  fileServerUrlReference: "
"
  webServerUrlReference: "
"

  # Disable certificate checking
  clearmlcheckCertificate: false

  # Queue configuration
  queue: default
  createQueueIfNotExists: true

  # Minimal resources
  resources:
    limits:
      cpu: 500m
      memory: 1Gi
    requests:
      cpu: 100m
      memory: 256Mi

sessions:
  svcType: NodePort
  externalIP: 192.168.70.211
  startingPort: 30100
  maxServices: 5
EOF

The following commands:

helm repo add clearml

helm repo update
helm install clearml-agent clearml/clearml-agent \
  --namespace clearml-prod \
  --values clearml-agent-values.yaml \
  --wait \
  --timeout 300s
"clearml" already exists with the same configuration, skipping
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "argo" chart repository
...Successfully got an update from the "clearml" chart repository
...Successfully got an update from the "harbor" chart repository
...Successfully got an update from the "nvidia" chart repository
Update Complete. ⎈Happy Helming!⎈
NAME: clearml-agent
LAST DEPLOYED: Mon Jul 21 15:11:38 2025
NAMESPACE: clearml-prod
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Glue Agent deployed.
  
  
Posted 3 months ago

Then I'll share what I found.

  
  
Posted 3 months ago

I have separated the most crucial part. It's a container that runs the standard Helm commands.

Example:

....
cat > /tmp/server-values.yaml <<EOF
global:
  defaultStorageClass: $STORAGE_CLASS

apiserver:

...
helm install clearml clearml/clearml \
  --namespace "$NS" \
  --values /tmp/server-values.yaml \
  --wait \
  --timeout "$TMO"

...

helm install clearml-agent clearml/clearml-agent \
  --namespace "$NS" \
  --values /tmp/simple-agent-values.yaml \
  --wait \
  --timeout 300s

These are the values:

clearml:
  agentk8sglueKey: $ACCESS_KEY
  agentk8sglueSecret: $SECRET_KEY
  clearmlConfig: |-
    api {
        web_server: http://$NODE_IP:30080
        api_server: http://$NODE_IP:30008
        files_server: http://$NODE_IP:30081
        credentials {
            "access_key" = "$ACCESS_KEY"
            "secret_key" = "$SECRET_KEY"
        }
    }

agentk8sglue:
  # Try newer image version to fix Python 3.6 regex issue
  image:
    repository: allegroai/clearml-agent-k8s-base
    tag: "1.25-1"
    pullPolicy: Always

  apiServerUrlReference: "http://$NODE_IP:30008"
  fileServerUrlReference: "http://$NODE_IP:30081"
  webServerUrlReference: "http://$NODE_IP:30080"
  clearmlcheckCertificate: false
  queue: default
  createQueueIfNotExists: true

  # Keep resources minimal for testing
  resources:
    limits:
      cpu: 500m
      memory: 1Gi
    requests:
      cpu: 100m
      memory: 256Mi

sessions:
  svcType: NodePort
  externalIP: $NODE_IP
  startingPort: 30100
  maxServices: 5
  
  
Posted 3 months ago

Yeah, I know. That's what I did for the GitHub implementation, but for this I need them to be generated on the fly, or via a CLI that I can have Argo call, if that's possible.

  
  
Posted 3 months ago

I will try to create them in the UI and only run the agent task on Argo, to see if it helps.

  
  
Posted 3 months ago

It got an error and then backed off.

  
  
Posted 3 months ago

As far as I can test, the server is doing OK; I had some issues with resources not loading, but solved those. The bigger issue for now is the agent, and it could probably propagate to the serving. Later on I plan on also adding GPU resources to both, so I'm not entirely sure about that part.

clearml-apiserver-866ccf75f7-zr5wx              1/1   Running   0   37m
clearml-apiserver-asyncdelete-8dfb574b8-8gbcv   1/1   Running   0   37m
clearml-elastic-master-0                        1/1   Running   0   37m
clearml-fileserver-86b8ddf6f6-4xnqd             1/1   Running   0   37m
clearml-mongodb-5f995fbb5-xmdpb                 1/1   Running   0   37m
clearml-redis-master-0                          1/1   Running   0   37m
clearml-webserver-c487cfcb-vv5z5                1/1   Running   0   37m

  
  
Posted 3 months ago

I assume the key and secret values here are redacted values and not the actual ones, right?

  
  
Posted 3 months ago

I will try to:
1. Update the agent with these values
2. Run Argo with those changes

  
  
Posted 3 months ago

@<1729671499981262848:profile|CooperativeKitten94> @<1857232027015712768:profile|PompousCrow47>

I figured it out. For future reference, this is an error regarding the Kubernetes support in the agent: None

As for getting the credentials to launch the agent, the only way I can do it is manually via the UI; I could not find a way to get them via code.

  
  
Posted 3 months ago

With the values on Helm:

helm get values clearml-agent -n clearml-prod
USER-SUPPLIED VALUES:
agentk8sglue:
  apiServerUrlReference:
  clearmlcheckCertificate: false
  createQueueIfNotExists: true
  fileServerUrlReference:
  image:
    pullPolicy: Always
    repository: allegroai/clearml-agent-k8s-base
    tag: latest
  queue: default
  resources:
    limits:
      cpu: 500m
      memory: 1Gi
    requests:
      cpu: 100m
      memory: 256Mi
  webServerUrlReference:
clearml:
  agentk8sglueKey: 8888TMDLWYY7ZQJJ0I7R2X2RSP8XFT
  agentk8sglueSecret: oNODbBkDGhcDscTENQyr-GM0cE8IO7xmpaPdqyfsfaWearo1S8EQ8eBOxu-opW8dVUU
  clearmlConfig: |-
    api {
        web_server:
        api_server:
        files_server:
        credentials {
            "access_key" = "8888TMDLWYY7ZQJJ0I7R2X2RSP8XFT"
            "secret_key" = "oNODbBkDGhcDscTENQyr-GM0cE8IO7xmpaPdqyfsfaWearo1S8EQ8eBOxu-opW8dVUU"
        }
    }
sessions:
  externalIP: 192.168.70.211
  maxServices: 5
  startingPort: 30100
  svcType: NodePort
jcarvalho@kharrinhao:~$
  
  
Posted 3 months ago

Please replace those credentials on the agent and try upgrading the Helm release. (A quick way to sanity-check the new credentials first is sketched below.)
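Before upgrading, it can save a redeploy cycle to check that the new key pair is actually accepted by the API server. A minimal sketch of such a check, assuming the server exposes the standard auth.login endpoint over HTTP basic auth; the URL uses the node IP and API port from the values shared above, and the key placeholders are hypothetical:

import requests

# Hypothetical placeholders; substitute the freshly created credentials.
api_server = "http://192.168.70.211:30008"
access_key = "<NEW_KEY>"
secret_key = "<NEW_SECRET>"

# auth.login exchanges a valid key pair for a session token;
# HTTP 200 means the credentials are accepted, 401 means they are not.
resp = requests.post(f"{api_server}/auth.login", auth=(access_key, secret_key))
print(resp.status_code, resp.text[:200])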

  
  
Posted 3 months ago

Yes, I am using those; they are hardcoded ones, because at a later stage I will generate them via a secure method.

  
  
Posted 3 months ago

Hi! I'm using just a plain Kubernetes cluster (kubeadm) running on a Proxmox VM, and I'm using Argo to deploy the Helm charts in order to standardize it. Let me know if you need any more details!

  
  
Posted 3 months ago

Hi @<1811208768843681792:profile|BraveGrasshopper38> , following up on your last message, are you running in an OpenShift k8s cluster?

  
  
Posted 3 months ago

The value field is a default Argo falls back to if I don't provide one.

  
  
Posted 3 months ago

I had no issues deploying via the GitHub setup, but Helm is quite a bit more confusing.

  
  
Posted 3 months ago

Also, in order to simplify the installation, can you use a simpler version of your values for now? Something like this should work:

agentk8sglue:
  apiServerUrlReference:
  clearmlcheckCertificate: false
  createQueueIfNotExists: true
  fileServerUrlReference:
  queue: default
  resources:
    limits:
      cpu: 500m
      memory: 1Gi
    requests:
      cpu: 100m
      memory: 256Mi
  webServerUrlReference:

clearml:
  agentk8sglueKey: <NEW_KEY>
  agentk8sglueSecret: <NEW_SECRET>

sessions:
  externalIP: 192.168.70.211
  maxServices: 5
  startingPort: 30100
  svcType: NodePort
  
  
Posted 3 months ago

Python regex error in the k8s glue agent:

sre_constants.error: bad inline flags: cannot turn on global flag at position 92

  • The issue is in the clearml-agent k8s glue codebase (Python 3.6 compatibility); a minimal repro is sketched below
  • Not configuration-related; it persists across different HOCON formats
  • Affects image tags 1.24-21, 1.24-23, and latest
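For context: Python only began accepting the 'a', 'L' and 'u' flags inside a scoped group such as (?a:...) in version 3.7 (bpo-31690); the 3.6 parser rejects them with exactly this "cannot turn on global flag" error. A minimal sketch of the failure, assuming the offending construct is a scoped ASCII flag in the agent's template regex (the pattern below is illustrative, not the agent's actual one):

import re
import sys

# Illustrative pattern: modern CPython's string.Template idpattern has this
# shape; compiling it on Python 3.6 raises
#   sre_constants.error: bad inline flags: cannot turn on global flag
pattern = r"(?a:[_a-z][_a-z0-9]*)"

try:
    re.compile(pattern)
    print("compiled fine on Python", sys.version.split()[0])
except re.error as err:
    print("failed on Python", sys.version.split()[0], "->", err)

This is consistent with why changing HOCON formats does not help, and why an image whose interpreter is Python 3.7+ (or an agent build that still supports 3.6) would sidestep the crash.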
  
  
Posted 3 months ago

Because when I check, it references something from 3 years ago, and I am following this: None

  
  
Posted 3 months ago

Oh, okay, not sure this will be the only issue, but you'll need these credentials to be valid, since they are used by the ClearML Agent to connect to the ClearML Server 🙂
The easiest way to generate credentials is to open the ClearML UI in the browser, log in with an admin user, then navigate to Settings (under the user icon in the top right corner). From there, go to "Workspace", click "Create new credentials", and use the values provided. A scripted alternative is sketched below.
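On the earlier point about generating credentials on the fly (e.g. from Argo) rather than through the UI: the server also exposes this via its REST API. A minimal sketch, assuming the auth.login and auth.create_credentials endpoints and one existing valid key pair (created once in the UI) to bootstrap the call; the response shape may vary between server versions, so verify against your deployment:

import requests

api_server = "http://192.168.70.211:30008"
# One existing, valid pair (e.g. created manually in the UI) bootstraps the call.
admin_access_key = "<EXISTING_ACCESS_KEY>"
admin_secret_key = "<EXISTING_SECRET_KEY>"

# 1) Exchange the existing key pair for a session token.
login = requests.post(f"{api_server}/auth.login",
                      auth=(admin_access_key, admin_secret_key))
login.raise_for_status()
token = login.json()["data"]["token"]

# 2) Ask the server to mint a fresh credential pair.
created = requests.post(f"{api_server}/auth.create_credentials",
                        headers={"Authorization": f"Bearer {token}"})
created.raise_for_status()
creds = created.json()["data"]["credentials"]

# These values can then be fed into agentk8sglueKey / agentk8sglueSecret.
print(creds["access_key"], creds["secret_key"])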

  
  
Posted 3 months ago

Hey @<1729671499981262848:profile|CooperativeKitten94>! Are there any tips you can give me on this?

It seems like the most recent version supported for Kubernetes is clearml-agent==1.9.2?

Thanks again!

  
  
Posted 3 months ago

Oh no worries, I understand 😄
Sure, if you could share the whole values and configs you're using to run both the server and the agent, that would be useful.
Also, what about the other Pods from the ClearML server: are there any other crashes or similar errors referring to a read-only filesystem? Are the server and agent installed on the same K8s node?

  
  
Posted 3 months ago