Answered

Hello, I Am First Timer In Clearml And Try To Deploy Locally A Clear Ml Server (Successfully) And Then Agent In My Kubernetes Cluster. I Follow The Helm Chart From "Helm Repo Add Clearml

Hello, I am new to ClearML. I deployed a ClearML server locally (successfully) and am now trying to deploy the agent in my Kubernetes cluster. I followed the Helm chart from "helm repo add clearml None " and changed the following parameters in the agent's Helm chart values:

agentk8sglueKey: <API KEY>
agentk8sglueSecret: <ACCESS KEY>

# -- Reference to Api server url
apiServerUrlReference: " None "

# -- Reference to File server url
fileServerUrlReference: " None "

# -- Reference to Web server url
webServerUrlReference: " None "

Everything else keeps its default value.

The pod starts running, then restarts and ends up in CrashLoopBackOff:

ubuntu@vm4v9lm3:~$ kubectl get pods
NAME                                             READY   STATUS             RESTARTS        AGE
clearml-agent-7c6d58c497-xk8hn                   0/1     CrashLoopBackOff   9 (3m11s ago)   25m
clearml-apiserver-57d4f9776d-pgn6q               1/1     Running            0               7h58m
clearml-apiserver-asyncdelete-59484594b9-zdm4p   1/1     Running            0               7h58m
clearml-elastic-master-0                         1/1     Running            0               7h58m
clearml-fileserver-769d646d7-tzpg6               1/1     Running            0               7h58m
clearml-mongodb-5f995fbb5-mgwbt                  1/1     Running            0               7h58m
clearml-redis-master-0                           1/1     Running            0               7h58m
clearml-webserver-7df664dcbf-856f9               1/1     Running            0               7h58m
jupyter-notebook-84c6f6fcf9-4lrrv                1/1     Running            0               38m

The logs are below. Any idea what is wrong?
Is there any other value I need to update in the agent's Helm chart?

/root/entrypoint.sh: line 29: /root/clearml.conf: Read-only file system

echo 'api.api_server: None '
/root/entrypoint.sh: line 30: /root/clearml.conf: Read-only file system
echo 'api.web_server: None '
/root/entrypoint.sh: line 31: /root/clearml.conf: Read-only file system
echo 'api.files_server: None '
/root/entrypoint.sh: line 32: /root/clearml.conf: Read-only file system
./provider_entrypoint.sh
source /root/.bashrc
++ '[' -z '' ']'
++ return
export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/root/bin:/root/bin
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/root/bin:/root/bin
[[ -z '' ]]
python3 k8s_glue_example.py --queue default --namespace default --template-yaml /root/template/template.yaml
/usr/local/lib/python3.6/dist-packages/clearml_agent/_vendor/jwt/utils.py:7: CryptographyDeprecationWarning: Python 3.6 is no longer supported by the Python core team. Therefore, support for it is deprecated in cryptography and will be removed in a future release.
  from cryptography.hazmat.primitives.asymmetric.ec import EllipticCurve
Traceback (most recent call last):
  File "k8s_glue_example.py", line 8, in <module>
    from clearml_agent.glue.k8s import K8sIntegration
  File "/usr/local/lib/python3.6/dist-packages/clearml_agent/glue/k8s.py", line 19, in <module>
    from clearml_agent.commands.events import Events
  File "/usr/local/lib/python3.6/dist-packages/clearml_agent/commands/__init__.py", line 3, in <module>
    from .worker import Worker
  File "/usr/local/lib/python3.6/dist-packages/clearml_agent/commands/worker.py", line 47, in <module>
    from clearml_agent.commands.base import resolve_names, ServiceCommandSection
  File "/usr/local/lib/python3.6/dist-packages/clearml_agent/commands/base.py", line 20, in <module>
    from clearml_agent.interface.base import ObjectID
  File "/usr/local/lib/python3.6/dist-packages/clearml_agent/interface/__init__.py", line 7, in <module>
    from .base import Parser, base_arguments, add_service, OnlyPluralChoicesHelpFormatter
  File "/usr/local/lib/python3.6/dist-packages/clearml_agent/interface/base.py", line 12, in <module>
    from clearml_agent.session import Session
  File "/usr/local/lib/python3.6/dist-packages/clearml_agent/session.py", line 23, in <module>
    from clearml_agent.helper.docker_args import DockerArgsSanitizer, sanitize_urls
  File "/usr/local/lib/python3.6/dist-packages/clearml_agent/helper/docker_args.py", line 279, in <module>
    class CustomTemplate(Template):
  File "/usr/lib/python3.6/string.py", line 74, in __init__
    cls.pattern = _re.compile(pattern, cls.flags | _re.VERBOSE)
  File "/usr/lib/python3.6/re.py", line 233, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python3.6/re.py", line 301, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.6/sre_compile.py", line 562, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.6/sre_parse.py", line 855, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
  File "/usr/lib/python3.6/sre_parse.py", line 416, in _parse_sub
    not nested and not items))
  File "/usr/lib/python3.6/sre_parse.py", line 765, in _parse
    p = _parse_sub(source, state, sub_verbose, nested + 1)
  File "/usr/lib/python3.6/sre_parse.py", line 416, in _parse_sub
    not nested and not items))
  File "/usr/lib/python3.6/sre_parse.py", line 765, in _parse
    p = _parse_sub(source, state, sub_verbose, nested + 1)
  File "/usr/lib/python3.6/sre_parse.py", line 416, in _parse_sub
    not nested and not items))
  File "/usr/lib/python3.6/sre_parse.py", line 734, in _parse
    flags = _parse_flags(source, state, char)
  File "/usr/lib/python3.6/sre_parse.py", line 803, in _parse_flags
    raise source.error("bad inline flags: cannot turn on global flag", 1)
sre_constants.error: bad inline flags: cannot turn on global flag at position 92 (line 4, column 20)
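
A note on the final error: Python 3.6's `re` module rejects scoped inline groups that turn on the `a`, `L`, or `u` flags (for example `(?a:...)`), while Python 3.7 and later accept them. The sketch below assumes, without having checked the clearml-agent source, that the `CustomTemplate` pattern in `docker_args.py` contains such a group; `supports_scoped_ascii_flag` is a hypothetical helper name. Running it with the interpreter inside the agent image would show whether that interpreter can compile this kind of pattern.

```python
import re
import sys


def supports_scoped_ascii_flag() -> bool:
    """Return True if this interpreter compiles a scoped (?a:...) group."""
    try:
        # Same shape as the identifier pattern used by newer versions of
        # string.Template; Python 3.6 is expected to raise re.error here.
        re.compile(r"(?a:[_a-z][_a-z0-9]*)", re.IGNORECASE)
    except re.error:
        return False
    return True


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    print("scoped (?a:...) supported:", supports_scoped_ascii_flag())
```

If this reports `False` inside the image, the crash would come from the image's Python version rather than from the Helm values, which would be consistent with the pod failing regardless of the URLs and credentials supplied.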

  				
Posted 3 months ago by PompousCrow47


Answers 46

Ok will try it

  				
Posted 3 months ago by BraveGrasshopper38

I will get back to you in 15 minutes, if that's OK.

  				
Posted 3 months ago by BraveGrasshopper38

kubectl describe pod -n clearml-prod -l app.kubernetes.io/name=clearml-agent
kubectl logs -n clearml-prod -l app.kubernetes.io/name=clearml-agent --previous 2>/dev/null || true
Name:             clearml-agent-848875fbdc-x8x6s
Namespace:        clearml-prod
Priority:         0
Service Account:  clearml-agent-sa
Node:             kharrinhao/192.168.70.211
Start Time:       Mon, 21 Jul 2025 15:23:02 +0000
Labels:           app.kubernetes.io/instance=clearml-agent
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=clearml-agent
                  app.kubernetes.io/version=1.24
                  helm.sh/chart=clearml-agent-5.3.3
                  pod-template-hash=848875fbdc
Annotations:      checksum/config: 5c1b50a353fea7ffd1fa5e62f968edc92e2610e0f0fd7783900a44f899ebe9ca
                  cni.projectcalico.org/containerID: 6964e25aa0cf54fa1dc91e36648d97e6deeae3366a924579be1e72742a25365a
                  cni.projectcalico.org/podIP: 192.168.31.162/32
                  cni.projectcalico.org/podIPs: 192.168.31.162/32
Status:           Running
IP:               192.168.31.162
IPs:
  IP:           192.168.31.162
Controlled By:  ReplicaSet/clearml-agent-848875fbdc
Init Containers:
  init-k8s-glue:
    Container ID:

5
    Image:         docker.io/allegroai/clearml-agent-k8s-base:1.24-21
    Image ID:      docker.io/allegroai/clearml-agent-k8s-base@sha256:772827a01bb5a4fff5941980634c8afa55d1d6bbf3ad805ccd4edafef6090f28
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
      -c
      set -x; while [ $(curl --insecure -sw '%{http_code}' "

" -o /dev/null) -ne 200 ] ; do
        echo "waiting for apiserver" ;
        sleep 5 ;
      done; while [[ $(curl --insecure -sw '%{http_code}' "

" -o /dev/null) =~ 403|405 ]] ; do
        echo "waiting for fileserver" ;
        sleep 5 ;
      done; while [ $(curl --insecure -sw '%{http_code}' "

" -o /dev/null) -ne 200 ] ; do
        echo "waiting for webserver" ;
        sleep 5 ;
      done

    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 21 Jul 2025 15:23:03 +0000
      Finished:     Mon, 21 Jul 2025 15:23:03 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-7f2zt (ro)
Containers:
  k8s-glue:
    Container ID:

6
    Image:         docker.io/allegroai/clearml-agent-k8s-base:1.24-21
    Image ID:      docker.io/allegroai/clearml-agent-k8s-base@sha256:772827a01bb5a4fff5941980634c8afa55d1d6bbf3ad805ccd4edafef6090f28
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
      -c
      export PATH=$PATH:$HOME/bin; source /root/.bashrc && /root/entrypoint.sh

    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 21 Jul 2025 15:23:58 +0000
      Finished:     Mon, 21 Jul 2025 15:24:02 +0000
    Ready:          False
    Restart Count:  3
    Environment:
      CLEARML_API_HOST:


      CLEARML_WEB_HOST:


      CLEARML_FILES_HOST:


      CLEARML_API_HOST_VERIFY_CERT:  false
      K8S_GLUE_EXTRA_ARGS:           --namespace clearml-prod --template-yaml /root/template/template.yaml  --create-queue
      CLEARML_CONFIG_FILE:           /root/clearml.conf
      K8S_DEFAULT_NAMESPACE:         clearml-prod
      CLEARML_API_ACCESS_KEY:        <set to the key 'agentk8sglue_key' in secret 'clearml-agent-ac'>     Optional: false
      CLEARML_API_SECRET_KEY:        <set to the key 'agentk8sglue_secret' in secret 'clearml-agent-ac'>  Optional: false
      CLEARML_WORKER_ID:             clearml-agent
      CLEARML_AGENT_UPDATE_REPO:
      FORCE_CLEARML_AGENT_REPO:
      CLEARML_DOCKER_IMAGE:          ubuntu:18.04
      K8S_GLUE_QUEUE:                default
    Mounts:
      /root/clearml.conf from k8sagent-clearml-conf-volume (ro,path="clearml.conf")
      /root/template from clearml-agent-pt (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-7f2zt (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       False
  ContainersReady             False
  PodScheduled                True
Volumes:
  clearml-agent-pt:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      clearml-agent-pt
    Optional:  false
  k8sagent-clearml-conf-volume:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  clearml-agent-ac
    Optional:    false
  kube-api-access-7f2zt:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  96s                default-scheduler  Successfully assigned clearml-prod/clearml-agent-848875fbdc-x8x6s to kharrinhao
  Normal   Pulled     95s                kubelet            Container image "docker.io/allegroai/clearml-agent-k8s-base:1.24-21" already present on machine
  Normal   Created    95s                kubelet            Created container: init-k8s-glue
  Normal   Started    95s                kubelet            Started container init-k8s-glue
  Normal   Pulled     40s (x4 over 94s)  kubelet            Container image "docker.io/allegroai/clearml-agent-k8s-base:1.24-21" already present on machine
  Normal   Created    40s (x4 over 94s)  kubelet            Created container: k8s-glue
  Normal   Started    40s (x4 over 93s)  kubelet            Started container k8s-glue
  Warning  BackOff    10s (x6 over 84s)  kubelet            Back-off restarting failed container k8s-glue in pod clearml-agent-848875fbdc-x8x6s_clearml-prod(42a51ff8-6423-485a-89e3-6109b3c0583a)
    not nested and not items))
  File "/usr/lib/python3.6/sre_parse.py", line 765, in _parse
    p = _parse_sub(source, state, sub_verbose, nested + 1)
  File "/usr/lib/python3.6/sre_parse.py", line 416, in _parse_sub
    not nested and not items))
  File "/usr/lib/python3.6/sre_parse.py", line 734, in _parse
    flags = _parse_flags(source, state, char)
  File "/usr/lib/python3.6/sre_parse.py", line 803, in _parse_flags
    raise source.error("bad inline flags: cannot turn on global flag", 1)
sre_constants.error: bad inline flags: cannot turn on global flag at position 92 (line 4, column 20)
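
One detail in the describe output above may be relevant to the `Read-only file system` lines from `entrypoint.sh`: `/root/clearml.conf` is listed as a read-only mount backed by the `clearml-agent-ac` Secret. That is an inference from the output, not a confirmed diagnosis. A minimal Python sketch for pulling the read-only mounts out of `kubectl describe pod` text follows; `read_only_mounts` and `SAMPLE` are hypothetical names, and the sample is copied from the Mounts section above.

```python
import re

# Matches lines such as: /root/template from clearml-agent-pt (rw)
MOUNT_RE = re.compile(r"^\s*(\S+) from (\S+) \(([^)]*)\)\s*$")


def read_only_mounts(describe_text: str) -> list:
    """Return (mount_path, volume_name) pairs whose first option is 'ro'."""
    found = []
    for line in describe_text.splitlines():
        match = MOUNT_RE.match(line)
        if not match:
            continue
        path, volume, options = match.groups()
        if options.split(",")[0] == "ro":
            found.append((path, volume))
    return found


SAMPLE = """\
    Mounts:
      /root/clearml.conf from k8sagent-clearml-conf-volume (ro,path="clearml.conf")
      /root/template from clearml-agent-pt (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-7f2zt (ro)
"""

if __name__ == "__main__":
    for path, volume in read_only_mounts(SAMPLE):
        print(path, "<-", volume)
```

With the sample above, the config file and the service account token are reported as read-only, while `/root/template` is not.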

  				
Posted 3 months ago by BraveGrasshopper38

I understand. I'd just like to confirm that this is the root issue and that there's no other bug; if so, you can then think about how to automate it via the API.

  				
Posted 3 months ago by CooperativeKitten94

Can you try with these values? The changes are: not using clearmlConfig, not overriding the image (use the default), and not defining resources.

agentk8sglue:
  apiServerUrlReference:


  clearmlcheckCertificate: false
  createQueueIfNotExists: true
  fileServerUrlReference:


  queue: default
  webServerUrlReference:


clearml:
  agentk8sglueKey: 8888TMDLWYY7ZQJJ0I7R2X2RSP8XFT
  agentk8sglueSecret: oNODbBkDGhcDscTENQyr-GM0cE8IO7xmpaPdqyfsfaWearo1S8EQ8eBOxu-opW8dVUU
sessions:
  externalIP: 192.168.70.211
  maxServices: 5
  startingPort: 30100
  svcType: NodePort

  				
Posted 3 months ago by CooperativeKitten94

For now:

- name: clearml-access-key
  value: CLEARML8AGENT9KEY1234567890ABCD
- name: clearml-secret-key
  value: CLEARML-AGENT-SECRET-1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ123456
- name: admin-password
  value: clearml123!

  				
Posted 3 months ago by BraveGrasshopper38

I also see these logs:

/root/entrypoint.sh: line 28: /root/clearml.conf: Read-only file system

This indicates that the container's filesystem is mounted read-only, preventing the agent from writing its configuration file.

Possible sources are a security context in the values, for example:

podSecurityContext:
  readOnlyRootFilesystem: true  # This causes the issue

or a cluster-level policy:

- PodSecurityPolicies
- Security Context Constraints (OpenShift)
- Admission controllers enforcing read-only filesystems

  				
Posted 3 months ago by BraveGrasshopper38

Resulting in the same issue.

  				
Posted 3 months ago by BraveGrasshopper38

yes

  				
Posted 3 months ago by BraveGrasshopper38

So CLEARML8AGENT9KEY1234567890ABCD is the actual real value you are using?

  				
Posted 3 months ago by CooperativeKitten94

So if you now run helm get values clearml-agent -n <NAMESPACE>, where <NAMESPACE> is the value of your $NS variable, can you confirm that this is the full and only output? Of course, the $VARIABLES will have their real values.

agentk8sglue:
  # Try newer image version to fix Python 3.6 regex issue
  image:
    repository: allegroai/clearml-agent-k8s-base
    tag: "1.25-1"
    pullPolicy: Always
  apiServerUrlReference: "http://$NODE_IP:30008"
  fileServerUrlReference: "http://$NODE_IP:30081"
  webServerUrlReference: "http://$NODE_IP:30080"
  clearmlcheckCertificate: false
  queue: default
  createQueueIfNotExists: true
  # Keep resources minimal for testing
  resources:
    limits:
      cpu: 500m
      memory: 1Gi
    requests:
      cpu: 100m
      memory: 256Mi
sessions:
  svcType: NodePort
  externalIP: $NODE_IP
  startingPort: 30100
  maxServices: 5

  				
Posted 3 months ago by CooperativeKitten94

Since with Argo I can pass them as params.

  				
Posted 3 months ago by BraveGrasshopper38

In your last message, you refer to a pod security context and admission controllers enforcing policies such as a read-only filesystem. Is that the case in your cluster?
Or was this the output of a GPT-like chat? If so, please do not use LLMs to generate values for the Helm installation, as they usually do not provide a useful or real config.

  				
Posted 3 months ago by CooperativeKitten94

I had those set in the config file, but I can provide what I am using for the server and agent config if it helps. I got lost in the configs, so I tried everything 🤣

  				
Posted 3 months ago by BraveGrasshopper38

parameters:
      - name: namespace
        value: clearml-prod
      - name: node-ip
        value: "192.168.70.211"
      - name: force-cleanup
        value: "false"
      - name: install-server
        value: "true"
      - name: install-agent
        value: "true"
      - name: install-serving
        value: "true"
      - name: diagnose-only
        value: "false"
      - name: storage-class
        value: openebs-hostpath
      - name: helm-timeout
        value: 900s
      - name: clearml-access-key
        value: CLEARML8AGENT9KEY1234567890ABCD
      - name: clearml-secret-key
        value: CLEARML-AGENT-SECRET-1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ123456
      - name: admin-password
        value: clearml123!

  				
Posted 3 months ago by BraveGrasshopper38

jcarvalho@kharrinhao:~$ kubectl get pods -n clearml-prod -l app.kubernetes.io/name=clearml-agent
NAME                             READY   STATUS   RESTARTS      AGE
clearml-agent-547584497c-xf98z   0/1     Error    4 (60s ago)   2m8s
jcarvalho@kharrinhao:~$ kubectl logs -n clearml-prod -l app.kubernetes.io/name=clearml-agent
Defaulted container "k8s-glue" out of: k8s-glue, init-k8s-glue (init)
    not nested and not items))
  File "/usr/lib/python3.6/sre_parse.py", line 765, in _parse
    p = _parse_sub(source, state, sub_verbose, nested + 1)
  File "/usr/lib/python3.6/sre_parse.py", line 416, in _parse_sub
    not nested and not items))
  File "/usr/lib/python3.6/sre_parse.py", line 734, in _parse
    flags = _parse_flags(source, state, char)
  File "/usr/lib/python3.6/sre_parse.py", line 803, in _parse_flags
    raise source.error("bad inline flags: cannot turn on global flag", 1)
sre_constants.error: bad inline flags: cannot turn on global flag at position 92 (line 4, column 20)
jcarvalho@kharrinhao:~$

  				
Posted 3 months ago by BraveGrasshopper38

It's a bit hard for me to provide support here with the additional layer of Argo.
I assume the server is working fine and you can open the ClearML UI and log in, right? If so, would it be possible to extract only the agent part from Argo and install it through standard Helm?

  				
Posted 3 months ago by CooperativeKitten94

cat values-prod.yaml
agent:
  api:
    web_server: "

"
    api_server: "

"
    files_server: "

"
    credentials:
      access_key: "8888TMDLWYY7ZQJJ0I7R2X2RSP8XFT"
      secret_key: "oNODbBkDGhcDscTENQyr-GM0cE8IO7xmpaPdqyfsfaWearo1S8EQ8eBOxu-opW8dVUU"

  				
Posted 3 months ago by BraveGrasshopper38

Hi @PompousCrow47, are you using pods with a read-only filesystem limitation?

  				
Posted 3 months ago by SuccessfulKoala55

Sorry, we had a short delay on the deployment, but with these values:

clearml:
  agentk8sglueKey: "8888TMDLWYY7ZQJJ0I7R2X2RSP8XFT"
  agentk8sglueSecret: "oNODbBkDGhcDscTENQyr-GM0cE8IO7xmpaPdqyfsfaWearo1S8EQ8eBOxu-opW8dVUU"
  clearmlConfig: |-
    api {
        web_server:


        api_server:


        files_server:


        credentials {
            "access_key" = "8888TMDLWYY7ZQJJ0I7R2X2RSP8XFT"
            "secret_key" = "oNODbBkDGhcDscTENQyr-GM0cE8IO7xmpaPdqyfsfaWearo1S8EQ8eBOxu-opW8dVUU"
        }
    }

agentk8sglue:
  # Try different image versions to avoid Python 3.6 regex issue
  image:
    repository: allegroai/clearml-agent-k8s-base
    tag: "latest"  # Use latest instead of specific version
    pullPolicy: Always

  # Essential server references
  apiServerUrlReference: "

"
  fileServerUrlReference: "

"
  webServerUrlReference: "

"

  # Disable certificate checking
  clearmlcheckCertificate: false

  # Queue configuration
  queue: default
  createQueueIfNotExists: true

  # Minimal resources
  resources:
    limits:
      cpu: 500m
      memory: 1Gi
    requests:
      cpu: 100m
      memory: 256Mi

sessions:
  svcType: NodePort
  externalIP: 192.168.70.211
  startingPort: 30100
  maxServices: 5
EOF

I ran the following commands:


helm repo add clearml


helm repo update

helm install clearml-agent clearml/clearml-agent \
  --namespace clearml-prod \
  --values clearml-agent-values.yaml \
  --wait \
  --timeout 300s
"clearml" already exists with the same configuration, skipping
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "argo" chart repository
...Successfully got an update from the "clearml" chart repository
...Successfully got an update from the "harbor" chart repository
...Successfully got an update from the "nvidia" chart repository
Update Complete. ⎈Happy Helming!⎈
NAME: clearml-agent
LAST DEPLOYED: Mon Jul 21 15:11:38 2025
NAMESPACE: clearml-prod
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Glue Agent deployed.

  				
Posted 3 months ago by BraveGrasshopper38

Just to check, is this the intended image: docker.io/allegroai/clearml-agent-k8s-base:1.24-2

  				
Posted 3 months ago by BraveGrasshopper38

Hi, I'm trying to add the agent to a running server and am facing the same issue.

Defaulted container "k8s-glue" out of: k8s-glue, init-k8s-glue (init)
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.6/sre_compile.py", line 562, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.6/sre_parse.py", line 855, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
  File "/usr/lib/python3.6/sre_parse.py", line 416, in _parse_sub
    not nested and not items))
  File "/usr/lib/python3.6/sre_parse.py", line 765, in _parse
    p = _parse_sub(source, state, sub_verbose, nested + 1)
  File "/usr/lib/python3.6/sre_parse.py", line 416, in _parse_sub
    not nested and not items))
  File "/usr/lib/python3.6/sre_parse.py", line 765, in _parse
    p = _parse_sub(source, state, sub_verbose, nested + 1)
  File "/usr/lib/python3.6/sre_parse.py", line 416, in _parse_sub
    not nested and not items))
  File "/usr/lib/python3.6/sre_parse.py", line 734, in _parse
    flags = _parse_flags(source, state, char)
  File "/usr/lib/python3.6/sre_parse.py", line 803, in _parse_flags
    raise source.error("bad inline flags: cannot turn on global flag", 1)
sre_constants.error: bad inline flags: cannot turn on global flag at position 92 (line 4, column 20)

  				
Posted 3 months ago by BraveGrasshopper38

then share what i found

  				
Posted 3 months ago by BraveGrasshopper38

I have separated out the most crucial part. It's a container that runs the standard helm commands.

example:
....
cat > /tmp/server-values.yaml <<EOF
global:
  defaultStorageClass: $STORAGE_CLASS

apiserver:

...
helm install clearml clearml/clearml \
  --namespace "$NS" \
  --values /tmp/server-values.yaml \
  --wait \
  --timeout "$TMO"

...

helm install clearml-agent clearml/clearml-agent \
  --namespace "$NS" \
  --values /tmp/simple-agent-values.yaml \
  --wait \
  --timeout 300s

These are the values:

clearml:
  agentk8sglueKey: $ACCESS_KEY
  agentk8sglueSecret: $SECRET_KEY
  clearmlConfig: |-
    api {
        web_server: http://$NODE_IP:30080
        api_server: http://$NODE_IP:30008
        files_server: http://$NODE_IP:30081
        credentials {
            "access_key" = "$ACCESS_KEY"
            "secret_key" = "$SECRET_KEY"
        }
    }

agentk8sglue:
  # Try newer image version to fix Python 3.6 regex issue
  image:
    repository: allegroai/clearml-agent-k8s-base
    tag: "1.25-1"
    pullPolicy: Always

  apiServerUrlReference: "http://$NODE_IP:30008"
  fileServerUrlReference: "http://$NODE_IP:30081"
  webServerUrlReference: "http://$NODE_IP:30080"
  clearmlcheckCertificate: false
  queue: default
  createQueueIfNotExists: true

  # Keep resources minimal for testing
  resources:
    limits:
      cpu: 500m
      memory: 1Gi
    requests:
      cpu: 100m
      memory: 256Mi

sessions:
  svcType: NodePort
  externalIP: $NODE_IP
  startingPort: 30100
  maxServices: 5
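
Since these values are filled in from `$VARIABLES` passed by Argo, note that a shell heredoc silently expands an unset variable to an empty string, which would produce empty URLs or credentials without any error. As a rough sketch of a stricter alternative, the Python snippet below renders a shortened version of the values with `string.Template` and refuses missing or empty parameters. `VALUES_TEMPLATE` and `render_values` are hypothetical names, and the template is abbreviated from the values above, not a complete chart configuration.

```python
from string import Template

# Hypothetical, shortened template reusing the variable names from the post.
VALUES_TEMPLATE = Template("""\
clearml:
  agentk8sglueKey: "$ACCESS_KEY"
  agentk8sglueSecret: "$SECRET_KEY"
agentk8sglue:
  apiServerUrlReference: "http://$NODE_IP:30008"
  fileServerUrlReference: "http://$NODE_IP:30081"
  webServerUrlReference: "http://$NODE_IP:30080"
  queue: default
  createQueueIfNotExists: true
""")


def render_values(params: dict) -> str:
    """Substitute parameters, refusing missing or empty values."""
    empty = [name for name, value in params.items() if not str(value).strip()]
    if empty:
        raise ValueError("empty parameters: " + ", ".join(sorted(empty)))
    # Template.substitute raises KeyError when a $VARIABLE has no value.
    return VALUES_TEMPLATE.substitute(params)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(render_values({
        "ACCESS_KEY": "<ACCESS_KEY>",
        "SECRET_KEY": "<SECRET_KEY>",
        "NODE_IP": "192.168.70.211",
    }))
```

The rendered text could then be written to the values file that is passed to `helm install --values`, so that a missing Argo parameter fails the workflow step instead of producing a broken install.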

  				
Posted 3 months ago by BraveGrasshopper38

Yeah, I know. That's what I did for the GitHub implementation, but for this I need them to be generated on the fly, or via a CLI that Argo can call to create them, if that's possible.

  				
Posted 3 months ago by BraveGrasshopper38

I will try to create them in the UI and run only the agent task in Argo, to see if it helps.

  				
Posted 3 months ago by BraveGrasshopper38

It got an error then backed off

  				
Posted 3 months ago by BraveGrasshopper38

As far as I can tell, the server is working OK. I had some issues with resources not loading, but I solved those. The bigger issue for now is the agent, and it could probably propagate to serving. Later on I also plan to add GPU resources to both, so I'm not entirely sure about that part.

clearml-apiserver-866ccf75f7-zr5wx 1/1 Running 0 37m
clearml-apiserver-asyncdelete-8dfb574b8-8gbcv 1/1 Running 0 37m
clearml-elastic-master-0 1/1 Running 0 37m
clearml-fileserver-86b8ddf6f6-4xnqd 1/1 Running 0 37m
clearml-mongodb-5f995fbb5-xmdpb 1/1 Running 0 37m
clearml-redis-master-0 1/1 Running 0 37m
clearml-webserver-c487cfcb-vv5z5 1/1 Running 0 37m

  				
Posted 3 months ago by BraveGrasshopper38

I assume the key and secret values here are redacted values and not the actual ones, right?

  				
Posted 3 months ago by CooperativeKitten94

I will try:
1. Update the agent with these values
2. Run Argo with those changes

  				
Posted 3 months ago by BraveGrasshopper38
