So CLEARML8AGENT9KEY1234567890ABCD is the actual value you are using?
I will try to:
1- update the agent with these values
2- run Argo with those changes
cat values-prod.yaml
agent:
  api:
    web_server: "
    "
    api_server: "
    "
    files_server: "
    "
    credentials:
      access_key: "8888TMDLWYY7ZQJJ0I7R2X2RSP8XFT"
      secret_key: "oNODbBkDGhcDscTENQyr-GM0cE8IO7xmpaPdqyfsfaWearo1S8EQ8eBOxu-opW8dVUU"
Also, in order to simplify the installation, can you use a simpler version of your values for now, something like this should work:
agentk8sglue:
  apiServerUrlReference:
  clearmlcheckCertificate: false
  createQueueIfNotExists: true
  fileServerUrlReference:
  queue: default
  resources:
    limits:
      cpu: 500m
      memory: 1Gi
    requests:
      cpu: 100m
      memory: 256Mi
  webServerUrlReference:
clearml:
  agentk8sglueKey: <NEW_KEY>
  agentk8sglueSecret: <NEW_SECRET>
sessions:
  externalIP: 192.168.70.211
  maxServices: 5
  startingPort: 30100
  svcType: NodePort
For now:
- name: clearml-access-key
  value: CLEARML8AGENT9KEY1234567890ABCD
- name: clearml-secret-key
  value: CLEARML-AGENT-SECRET-1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ123456
- name: admin-password
  value: clearml123!
Just to check, is this the intended image? docker.io/allegroai/clearml-agent-k8s-base:1.24-2
Hi @<1857232027015712768:profile|PompousCrow47> , are you using pods with a read-only-filesystem limitation?
Please replace those credentials on the Agent and try upgrading the helm release
Sorry, we had a short delay on the deployment, but with these values:
clearml:
  agentk8sglueKey: "8888TMDLWYY7ZQJJ0I7R2X2RSP8XFT"
  agentk8sglueSecret: "oNODbBkDGhcDscTENQyr-GM0cE8IO7xmpaPdqyfsfaWearo1S8EQ8eBOxu-opW8dVUU"
  clearmlConfig: |-
    api {
      web_server:
      api_server:
      files_server:
      credentials {
        "access_key" = "8888TMDLWYY7ZQJJ0I7R2X2RSP8XFT"
        "secret_key" = "oNODbBkDGhcDscTENQyr-GM0cE8IO7xmpaPdqyfsfaWearo1S8EQ8eBOxu-opW8dVUU"
      }
    }
agentk8sglue:
  # Try different image versions to avoid Python 3.6 regex issue
  image:
    repository: allegroai/clearml-agent-k8s-base
    tag: "latest" # Use latest instead of specific version
    pullPolicy: Always
  # Essential server references
  apiServerUrlReference: "
  "
  fileServerUrlReference: "
  "
  webServerUrlReference: "
  "
  # Disable certificate checking
  clearmlcheckCertificate: false
  # Queue configuration
  queue: default
  createQueueIfNotExists: true
  # Minimal resources
  resources:
    limits:
      cpu: 500m
      memory: 1Gi
    requests:
      cpu: 100m
      memory: 256Mi
sessions:
  svcType: NodePort
  externalIP: 192.168.70.211
  startingPort: 30100
  maxServices: 5
EOF
I ran the following commands:
helm repo add clearml
helm repo update
helm install clearml-agent clearml/clearml-agent \
--namespace clearml-prod \
--values clearml-agent-values.yaml \
--wait \
--timeout 300s
"clearml" already exists with the same configuration, skipping
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "argo" chart repository
...Successfully got an update from the "clearml" chart repository
...Successfully got an update from the "harbor" chart repository
...Successfully got an update from the "nvidia" chart repository
Update Complete. ⎈Happy Helming!⎈
NAME: clearml-agent
LAST DEPLOYED: Mon Jul 21 15:11:38 2025
NAMESPACE: clearml-prod
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Glue Agent deployed.
If I run helm get values clearml-agent -n clearml-prod, the output is the following:
USER-SUPPLIED VALUES:
agentk8sglue:
  apiServerUrlReference: None
  clearmlcheckCertificate: false
  createQueueIfNotExists: true
  fileServerUrlReference: None
  image:
    pullPolicy: Always
    repository: allegroai/clearml-agent-k8s-base
    tag: 1.25-1
  queue: default
  resources:
    limits:
      cpu: 500m
      memory: 1Gi
    requests:
      cpu: 100m
      memory: 256Mi
  webServerUrlReference: None
clearml:
  agentk8sglueKey: CLEARML8AGENT9KEY1234567890ABCD
  agentk8sglueSecret: CLEARML-AGENT-SECRET-1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ123456
  clearmlConfig: |-
    api {
      web_server: None
      api_server: None
      files_server: None
      credentials {
        "access_key" = "CLEARML8AGENT9KEY1234567890ABCD"
        "secret_key" = "CLEARML-AGENT-SECRET-1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ123456"
      }
    }
sessions:
  externalIP: 192.168.70.211
  maxServices: 5
  startingPort: 30100
  svcType: NodePort
I will get back to you in 15 min, if that's OK.
Since with Argo I can pass them as params.
I also see these logs:
/root/entrypoint.sh: line 28: /root/clearml.conf: Read-only file system
This indicates that the container's filesystem is mounted as read-only, preventing the agent from writing its configuration file.
This can come from:
- podSecurityContext:
    readOnlyRootFilesystem: true # This causes the issue
- PodSecurityPolicies
- Security Context Constraints (OpenShift)
- Admission controllers enforcing read-only filesystems
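For reference, a minimal sketch assuming a plain Kubernetes Pod (the pod and container names are hypothetical, and the image tag is just the one from the values above): in the Kubernetes API, readOnlyRootFilesystem is a field of the container-level securityContext; the pod-level securityContext (PodSecurityContext) has no such field. To see what actually applies on the running agent pod, including anything an admission controller or OpenShift SCC may have mutated, kubectl get pod <POD_NAME> -n clearml-prod -o yaml should show the container's effective securityContext.

```yaml
# Illustrative only: "example-agent" and "agent" are hypothetical names.
apiVersion: v1
kind: Pod
metadata:
  name: example-agent
spec:
  # Pod-level securityContext holds settings such as runAsUser or fsGroup;
  # it does NOT accept readOnlyRootFilesystem.
  containers:
    - name: agent
      image: allegroai/clearml-agent-k8s-base:1.25-1
      securityContext:
        # Container-level flag: the root filesystem is mounted read-only,
        # so a write such as /root/clearml.conf fails with "Read-only file system".
        readOnlyRootFilesystem: true
```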
Python regex error in the k8s glue agent:
sre_constants.error: bad inline flags: cannot turn on global flag at position 92
- The issue is in the clearml-agent k8s glue codebase (Python 3.6 compatibility)
- Not configuration-related: it persists across different HOCON formats
- Affects image tags: 1.24-21, 1.24-23, latest
I will try to create them in the UI and only run the Agent task on Argo, to see if it helps.
I assume the key and secret values here are redacted values and not the actual ones, right?
Oh, okay. Not sure this will be the only issue, but you'll need these credentials to be valid, since they are used by the ClearML Agent to connect to the ClearML Server 🙂
The easiest way to generate credentials is to open the ClearML UI in the browser and log in with an Admin user, then click the user icon in the top right corner and open Settings. From there, go to "Workspace", click "Create new credentials", and use the values provided.
Can you try with these values? The changes are: not using clearmlConfig, not overriding the image (use the default instead), and not defining resources.
agentk8sglue:
  apiServerUrlReference:
  clearmlcheckCertificate: false
  createQueueIfNotExists: true
  fileServerUrlReference:
  queue: default
  webServerUrlReference:
clearml:
  agentk8sglueKey: 8888TMDLWYY7ZQJJ0I7R2X2RSP8XFT
  agentk8sglueSecret: oNODbBkDGhcDscTENQyr-GM0cE8IO7xmpaPdqyfsfaWearo1S8EQ8eBOxu-opW8dVUU
sessions:
  externalIP: 192.168.70.211
  maxServices: 5
  startingPort: 30100
  svcType: NodePort
I had those set in the config file, but I can share what I am using for the server and agent config if it helps. I got lost in the configs, so I tried everything 🤣
So if you now run helm get values clearml-agent -n <NAMESPACE>, where <NAMESPACE> is the value you have in the $NS variable, can you confirm this is the full and only output? Of course, the $VARIABLES will have their real values.
agentk8sglue:
  # Try newer image version to fix Python 3.6 regex issue
  image:
    repository: allegroai/clearml-agent-k8s-base
    tag: "1.25-1"
    pullPolicy: Always
  apiServerUrlReference: "http://$NODE_IP:30008"
  fileServerUrlReference: "http://$NODE_IP:30081"
  webServerUrlReference: "http://$NODE_IP:30080"
  clearmlcheckCertificate: false
  queue: default
  createQueueIfNotExists: true
  # Keep resources minimal for testing
  resources:
    limits:
      cpu: 500m
      memory: 1Gi
    requests:
      cpu: 100m
      memory: 256Mi
sessions:
  svcType: NodePort
  externalIP: $NODE_IP
  startingPort: 30100
  maxServices: 5
Yes, I am using those. They are hardcoded because at a later stage I will generate them via a secure method.
@<1729671499981262848:profile|CooperativeKitten94> @<1857232027015712768:profile|PompousCrow47>
I figured it out. For future reference, this is an error regarding Kubernetes support in the agent: None
As for getting the credentials to launch the agent, the only way I can do it is manually via the UI; I could not find a way to get them via code.
Hi @<1811208768843681792:profile|BraveGrasshopper38> , following up on your last message, are you running in an OpenShift k8s cluster?
As far as I can test, the server is working fine; I had some issues with resources not loading, but I solved those. The bigger issue for now is the agent, and it could probably propagate to the serving. Later on I also plan on adding GPU resources to both, so I'm not entirely sure about that part.
clearml-apiserver-866ccf75f7-zr5wx 1/1 Running 0 37m
clearml-apiserver-asyncdelete-8dfb574b8-8gbcv 1/1 Running 0 37m
clearml-elastic-master-0 1/1 Running 0 37m
clearml-fileserver-86b8ddf6f6-4xnqd 1/1 Running 0 37m
clearml-mongodb-5f995fbb5-xmdpb 1/1 Running 0 37m
clearml-redis-master-0 1/1 Running 0 37m
clearml-webserver-c487cfcb-vv5z5 1/1 Running 0 37m
parameters:
  - name: namespace
    value: clearml-prod
  - name: node-ip
    value: "192.168.70.211"
  - name: force-cleanup
    value: "false"
  - name: install-server
    value: "true"
  - name: install-agent
    value: "true"
  - name: install-serving
    value: "true"
  - name: diagnose-only
    value: "false"
  - name: storage-class
    value: openebs-hostpath
  - name: helm-timeout
    value: 900s
  - name: clearml-access-key
    value: CLEARML8AGENT9KEY1234567890ABCD
  - name: clearml-secret-key
    value: CLEARML-AGENT-SECRET-1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ123456
  - name: admin-password
    value: clearml123!
In your last message, you are referring to pod security context and admission controllers enforcing some policies such as a read-only filesystem. Is that the case in your cluster?
Or was this some output of a GPT-like chat? If yes, please do not use LLMs to generate values for the helm installation as they're usually not providing a useful or real config