
Sorry, we had a short delay on the deployment, but with these values:
clearml:
  agentk8sglueKey: "8888TMDLWYY7ZQJJ0I7R2X2RSP8XFT"
  agentk8sglueSecret: "oNODbBkDGhcDscTENQyr-GM0cE8IO7xmpaPdqyfsfaWearo1S8EQ8eBOxu-opW8dVUU"
  clearmlConfig: |-
    api {
      web_server:
      api_server:
      files_server:
      credentials {
        "access_key" = "8888TMDLWYY7ZQJJ0I7R2X2RSP8XFT"
        "secret_key" = "oNODbBkDGhcDscTENQyr-...
With the values on helm:
helm get values clearml-agent -n clearml-prod
USER-SUPPLIED VALUES:
agentk8sglue:
  apiServerUrlReference:
  clearmlcheckCertificate: false
  createQueueIfNotExists: true
  fileServerUrlReference:
  image:
    pullPolicy: Always
    repository: allegroai/clearml-agent-k8s-base
    tag: latest
  queue: default
  resources:
    limits:
      cpu: 500m
      memory: 1Gi
    requests:
      cpu: 100m
      memory: 256Mi
  webServerUrlRefe...
cat values-prod.yaml
agent:
  api:
    web_server: ""
    api_server: ""
    files_server: ""
    credentials:
      access_key: "8888TMDLWYY7ZQJJ0I7R2X2RSP8XFT"
      secret_key: "oNODbBkDGhcDscTENQyr-GM0cE8IO7xmpaPdqyfsfaWearo1S8EQ8eBOxu-opW8dVUU"
@<1729671499981262848:profile|CooperativeKitten94> @<1857232027015712768:profile|PompousCrow47>
I figured it out. For future reference, this is an error regarding the Kubernetes support in the agent: None
As for getting the credentials to launch the agent, the only way I can do it is manually via the UI; I could not find a way to get them via code.
I will try:
1- update the agent with these values (sketch below)
2- run Argo with those changes
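Something like this is what I mean by step 1, assuming the agent values above are saved as agent-values.yaml (the file name is just for illustration):
helm upgrade --install clearml-agent clearml/clearml-agent \
  -n clearml-prod \
  -f agent-values.yaml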
for now:
- name: clearml-access-key
  value: CLEARML8AGENT9KEY1234567890ABCD
- name: clearml-secret-key
  value: CLEARML-AGENT-SECRET-1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ123456
- name: admin-password
  value: clearml123!
Since with Argo I can pass them as params.
jcarvalho@kharrinhao:~$ kubectl get pods -n clearml-prod -l app.kubernetes.io/name=clearml-agent
NAME READY STATUS RESTARTS AGE
clearml-agent-547584497c-xf98z 0/1 Error 4 (60s ago) 2m8s
jcarvalho@kharrinhao:~$ kubectl logs -n clearml-prod -l app.kubernetes.io/name=clearml-agent
Defaulted container "k8s-glue" out of: k8s-glue, init-k8s-glue (init)
    not nested and not items))
  File "/usr/lib/python3.6/sre_parse.py", line 765, in _parse
...
I also see these logs:
/root/entrypoint.sh: line 28: /root/clearml.conf: Read-only file system
This indicates that the container's filesystem is mounted as read-only, preventing the agent from writing its configuration file.
From:
podSecurityContext:
  readOnlyRootFilesystem: true  # This causes the issue
or from:
- PodSecurityPolicies
- Security Context Constraints (OpenShift)
- Admission controllers enforcing read-only filesystems
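To confirm where the read-only filesystem comes from, something like this should show the effective securityContext on the running agent container (just a quick probe, using the same label as the kubectl commands above):
kubectl get pod -n clearml-prod -l app.kubernetes.io/name=clearml-agent \
  -o jsonpath='{.items[0].spec.containers[0].securityContext}'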
Because when I check, it references something from 3 years ago, and I am following this: None
Hi, I'm trying to add the agent to a running server and facing the same issue.
Defaulted container "k8s-glue" out of: k8s-glue, init-k8s-glue (init)
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.6/sre_compile.py", line 562, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.6/sre_parse.py", line 855, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
  File "/usr/lib/python3.6/sre_parse.py", line 416, in _parse_sub
    not neste...
As far as I can test, the server is going OK; I had some issues with resources not loading, but I solved those. The bigger issue for now is the agent, and it could probably propagate to the serving. Later on I also plan on adding GPU resources to both, so I'm not entirely sure about that part.
clearml-apiserver-866ccf75f7-zr5wx 1/1 Running 0 37m
clearml-apiserver-asyncdelete-8dfb574b8-8gbcv 1/1 Running 0 37m
clearml-elastic-master-0 ...
The value field is a default Argo falls back to if I don't provide any.
Hey! @<1729671499981262848:profile|CooperativeKitten94> Are there any tips you can give me on this?
It seems like the most recent version supported for Kubernetes is clearml-agent==1.9.2?
Thanks again!
Yeah, I know... that's what I did for the GitHub implementation, but for this I need them to be generated on the fly, or via a CLI that I can call from Argo, if that's possible.
I had no issues deploying via GitHub, but Helm is quite a bit more confusing.
It got an error and then backed off.
I have separated the most crucial part. It's a container that runs the standard helm commands.
example:
....
cat > /tmp/server-values.yaml <<EOF
global:
  defaultStorageClass: $STORAGE_CLASS
apiserver:
...
helm install clearml clearml/clearml \
  --namespace "$NS" \
  --values /tmp/server-values.yaml \
  --wait \
  --timeout "$TMO"
...
helm install clearml-agent clearml/clearml-agent \
  -...
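The agent install flags are cut off above, but as a sketch the Argo-provided credentials could be passed straight through with --set, reusing the same value paths as in the earlier values snippet (the CLEARML_ACCESS_KEY / CLEARML_SECRET_KEY env vars are assumptions for how the params land in the script):
helm install clearml-agent clearml/clearml-agent \
  --namespace "$NS" \
  --set clearml.agentk8sglueKey="$CLEARML_ACCESS_KEY" \
  --set clearml.agentk8sglueSecret="$CLEARML_SECRET_KEY" \
  --wait --timeout "$TMO"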
Yes, I am using those; they are hardcoded ones because at a later stage I will generate them via a secure method.
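For that later, more secure stage, one minimal option could be keeping the pair in a Kubernetes Secret instead of hardcoding it (the secret name and key names here are illustrative; wiring it into the chart values is a separate step):
kubectl create secret generic clearml-agent-creds -n clearml-prod \
  --from-literal=access_key="$CLEARML_ACCESS_KEY" \
  --from-literal=secret_key="$CLEARML_SECRET_KEY"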
Python regex error in the k8s glue agent:
sre_constants.error: bad inline flags: cannot turn on global flag at position 92
- Issue is in the clearml-agent k8s glue codebase (Python 3.6 compatibility; quick check below)
- Not configuration-related - persists across different HOCON formats
- Affects image tags: 1.24-21, 1.24-23, latest
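To double-check the Python 3.6 angle, running the glue image directly should show which interpreter it ships (image and tag taken from the helm values above; just a quick probe):
kubectl run clearml-pyver --rm -it --restart=Never \
  --image=allegroai/clearml-agent-k8s-base:latest \
  --command -- python3 --version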
I had those set in the config file, but I can provide what I am using for the server and agent configs if it helps. I got lost in the configs, so I tried everything 🤣
From what I could find, since the serving endpoint is not treated as an independent environment, the packages are being installed into a Python 3.8.10 environment, while the endpoint is trying to load them from another Python version that does not contain the packages. But I cannot change either version, and I don't understand why...
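If it helps to confirm the mismatch, something like this should print the interpreter the serving container actually runs (the deployment name clearml-serving-inference and the namespace are assumptions on my side; the pods may be named differently):
kubectl exec -n clearml-prod deploy/clearml-serving-inference -- python3 --version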
Hi, thanks for the reply!
I have set it and it's downloading (I checked the container logs), but when I try to POST I get that error.
# Step 3: Create credentials using JWT token via web server
API_URL="http://${NODE_IP}:30080/api/v2.31/auth.create_credentials"
PAYLOAD='{"label": "k8s-agent-credentials"}'
RESPONSE=$(curl -s -X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TOKEN" \
-d "$PAYLOAD" \
"$API_URL")