Reputation
Badges 1
21 × Eureka!jcarvalho@kharrinhao:~$ kubectl get pods -n clearml-prod -l app.kubernetes.io/name=clearml-agent
NAME READY STATUS RESTARTS AGE
clearml-agent-547584497c-xf98z 0/1 Error 4 (60s ago) 2m8s
jcarvalho@kharrinhao:~$ kubectl logs -n clearml-prod -l app.kubernetes.io/name=clearml-agent
Defaulted container "k8s-glue" out of: k8s-glue, init-k8s-glue (init)
not nested and not items))
File "/usr/lib/python3.6/sre_parse.py", line 765, in _parse
...
If i run helm get values clearml-agent -n clearml-prod
the output is the following:
USER-SUPPLIED VALUES:
agentk8sglue:
apiServerUrlReference: None
clearmlcheckCertificate: false
createQueueIfNotExists: true
fileServerUrlReference: None
image:
pullPolicy: Always
repository: allegroai/clearml-agent-k8s-base
tag: 1.25-1
queue: default
resources:
limits:
cpu: 500m
memory: 1Gi
requests...
I have separed the most crutial part. Its a container that runs the standard helm commands
example:
....
cat > /tmp/server-values.yaml <<EOF
global:
defaultStorageClass: $STORAGE_CLASS
apiserver:
...
helm install clearml clearml/clearml
--namespace "$NS"
--values /tmp/server-values.yaml
--wait
--timeout "$TMO"
...
helm install clearml-agent clearml/clearml-agent
-...
Sorry we had a short delay on the deployment but
with these values:
clearml:
agentk8sglueKey: "8888TMDLWYY7ZQJJ0I7R2X2RSP8XFT"
agentk8sglueSecret: "oNODbBkDGhcDscTENQyr-GM0cE8IO7xmpaPdqyfsfaWearo1S8EQ8eBOxu-opW8dVUU"
clearmlConfig: |-
api {
web_server:
api_server:
files_server:
credentials {
"access_key" = "8888TMDLWYY7ZQJJ0I7R2X2RSP8XFT"
"secret_key" = "oNODbBkDGhcDscTENQyr-...
Hi! Im using just a plain Kubernetes cluster (kubeadm) running on Proxmox VM, and im using Argo to deploy the helm, in order to standarize it Let me know if you need any more details!
The value field is a default argo falls back into if i dont provide any
I had those setted on the config file, but i can provide you what i am using for server and agent config if it helps. I got lost on the configs so i tried everything 🤣
I will try to create them on the UI and only run the Agent task on argo or so to see if it helps
@<1729671499981262848:profile|CooperativeKitten94> @<1857232027015712768:profile|PompousCrow47>
I figured it out for future reference this is a error regarding the Kubernetes Support on the agent : None
As for getting the credentials to lauch the agent the only way i can do it is via UI manually i could not get a way to get them via code
Yeah i know.. thats what i did for the github implementation, but for this i need them to be generated on the fly or via CLI that i can use argo to create if thats possible
As far as i can test, the server is going ok, i had some isses with resources not loading but solved those. The bigger issue for now is agent and prob could propagate to the serving. Later on i plan on adding also gpu resouces to both so im not entirely sure on that part
clearml-apiserver-866ccf75f7-zr5wx 1/1 Running 0 37m
clearml-apiserver-asyncdelete-8dfb574b8-8gbcv 1/1 Running 0 37m
clearml-elastic-master-0 ...
Hi, im trying to add the agent to a running server and facing the same issue.
Defaulted container "k8s-glue" out of: k8s-glue, init-k8s-glue (init)
p = sre_compile.compile(pattern, flags)
File "/usr/lib/python3.6/sre_compile.py", line 562, in compile
p = sre_parse.parse(p, flags)
File "/usr/lib/python3.6/sre_parse.py", line 855, in parse
p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
File "/usr/lib/python3.6/sre_parse.py", line 416, in _parse_sub
not neste...
Since with argo i can pass them as params
kubectl describe pod -n clearml-prod -l app.kubernetes.io/name=clearml-agent
kubectl logs -n clearml-prod -l app.kubernetes.io/name=clearml-agent --previous 2>/dev/null || true
Name: clearml-agent-848875fbdc-x8x6s
Namespace: clearml-prod
Priority: 0
Service Account: clearml-agent-sa
Node: kharrinhao/192.168.70.211
Start Time: Mon, 21 Jul 2025 15:23:02 +0000
Labels: app.kubernetes.io/instance=clearml-agent
app.kube...
cat values-prod.yaml
agent:
api:
web_server: "
"
api_server: "
"
files_server: "
"
credentials:
access_key: "8888TMDLWYY7ZQJJ0I7R2X2RSP8XFT"
secret_key: "oNODbBkDGhcDscTENQyr-GM0cE8IO7xmpaPdqyfsfaWearo1S8EQ8eBOxu-opW8dVUU"
I had no issues deploying via the Github but helm is quite more confusing
It got an error then backed off
Python regex error in k8s glue agent :
sre_constants.error: bad inline flags: cannot turn on global flag at position 92
- Issue is in clearml-agent k8s glue codebase (Python 3.6 compatibility)
- Not configuration-related - persists across different HOCON formats
- Affects image tags:
1.24-21,1.24-23,latest
Yes i am using those, they are hardcoded ones cause i will on a later stage generate them via a secure method
I will get back at you in 15mn if thats ok
parameters:
- name: namespace
value: clearml-prod
- name: node-ip
value: "192.168.70.211"
- name: force-cleanup
value: "false"
- name: install-server
value: "true"
- name: install-agent
value: "true"
- name: install-serving
value: "true"
- name: diagnose-only
value: "false"
- name: storage-class
value: openebs-hostpath
- name: helm-timeout
value: 900s
- nam...
for now:
- name: clearml-access-key
value: CLEARML8AGENT9KEY1234567890ABCD
- name: clearml-secret-key
value: CLEARML-AGENT-SECRET-1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ123456
- name: admin-password
value: clearml123!
Following up on this i was unable to fix the issue. But i ended up finding another complication. When uploading a onnx model using the upload command it keeps getting tagged as a TensorFlow model, even with the correct file structure, and that leads to the previous issue since the serving module will search for different format than the onnx.
As far as i could see this comes from the helper inside the triton engine, but as of right now i could not fix it.
Is there anything i might be doing ...
#Step 4: Using the configured admin credentials for initial authentication
curl -s -X POST \
-H "Content-Type: application/json" \
-u "admin:mypassword123" \
-d "$USER_PAYLOAD" \
"$CREATE_USER_URL"
What i intended to do was via calls do the same so i can automate it
i also tried to match it with the secure.conf
Defaulted container "clearml-apiserver" out of: clearml-apiserver, init-apiserver (init)
{
"http": {
"session_secret": {
"apiserver": "V8gcW3EneNDcNfO7G_TSUsWe7uLozyacc9_I33o7bxUo8rCN31VLRg"
}
},
"auth": {
"fixed_users": {
"enabled": true,
"pass_hashed": false,
"users": [
{"username": "admin", "password": "clearml123!", "name": "Administrator"...