
I will try:
1. update the agent with these values
2. run Argo with those changes
cat values-prod.yaml
agent:
  api:
    web_server: ""
    api_server: ""
    files_server: ""
    credentials:
      access_key: "8888TMDLWYY7ZQJJ0I7R2X2RSP8XFT"
      secret_key: "oNODbBkDGhcDscTENQyr-GM0cE8IO7xmpaPdqyfsfaWearo1S8EQ8eBOxu-opW8dVUU"
for now:
- name: clearml-access-key
  value: CLEARML8AGENT9KEY1234567890ABCD
- name: clearml-secret-key
  value: CLEARML-AGENT-SECRET-1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ123456
- name: admin-password
  value: clearml123!
Just to check: is this the intended image? docker.io/allegroai/clearml-agent-k8s-base:1.24-2
It got an error, then backed off.
Following up on this: I was unable to fix the issue, but I ended up finding another complication. When uploading an ONNX model using the upload command, it keeps getting tagged as a TensorFlow model, even with the correct file structure, and that leads back to the previous issue, since the serving module will then search for a different format than ONNX.
As far as I could see this comes from the helper inside the Triton engine, but as of right now I could not fix it.
Is there anything I might be doing ...
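One hedged workaround sketch, assuming the clearml-serving CLI is doing the upload: force the framework tag explicitly instead of relying on auto-detection. The service ID, names, and the "onnx" framework string below are illustrative and should be checked against your clearml-serving version:

# upload the model with an explicit framework instead of auto-detection
clearml-serving --id "$SERVING_SERVICE_ID" model upload \
  --name "my-onnx-model" \
  --project "serving-examples" \
  --framework "onnx" \
  --path model.onnx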
Sorry, we had a short delay on the deployment, but with these values:
clearml:
  agentk8sglueKey: "8888TMDLWYY7ZQJJ0I7R2X2RSP8XFT"
  agentk8sglueSecret: "oNODbBkDGhcDscTENQyr-GM0cE8IO7xmpaPdqyfsfaWearo1S8EQ8eBOxu-opW8dVUU"
  clearmlConfig: |-
    api {
      web_server:
      api_server:
      files_server:
      credentials {
        "access_key" = "8888TMDLWYY7ZQJJ0I7R2X2RSP8XFT"
        "secret_key" = "oNODbBkDGhcDscTENQyr-...
If I run helm get values clearml-agent -n clearml-prod
the output is the following:
USER-SUPPLIED VALUES:
agentk8sglue:
  apiServerUrlReference: None
  clearmlcheckCertificate: false
  createQueueIfNotExists: true
  fileServerUrlReference: None
  image:
    pullPolicy: Always
    repository: allegroai/clearml-agent-k8s-base
    tag: 1.25-1
  queue: default
  resources:
    limits:
      cpu: 500m
      memory: 1Gi
    requests...
Do you have any idea what might cause this?
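One way to see what the chart actually resolved those values to, rather than just the user-supplied overrides, is Helm's --all flag:

# dump computed values (chart defaults merged with overrides)
helm get values clearml-agent -n clearml-prod --all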
Yeah, that is also my understanding; I have it on a separate machine, as resources are not an issue.
But as of right now I can access and serve models, but cannot get them to be listed on the server interface.
I will get back to you in 15 min if that's ok.
Since with Argo I can pass them as params.
I also see these logs:
/root/entrypoint.sh: line 28: /root/clearml.conf: Read-only file system
This indicates that the container's filesystem is mounted as read-only, preventing the agent from writing its configuration file.
From
podSecurityContext:
  readOnlyRootFilesystem: true  # This causes the issue
or from cluster-level policies such as the following (a quick way to check which one applies is sketched below):
- PodSecurityPolicies
- Security Context Constraints (OpenShift)
- Admission controllers enforcing read-only filesystems
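A quick check of which security context is actually in effect on the running agent pod (the pod name is illustrative; use the one from kubectl get pods):

# print each container's effective securityContext
kubectl get pod <agent-pod-name> -n clearml-prod \
  -o jsonpath='{range .spec.containers[*]}{.name}{": "}{.securityContext}{"\n"}{end}'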
Ok! Perfect, that was what I was looking for. Thank you so much.
What I intended to do was the same thing via API calls so I can automate it.
I will try to create them in the UI and only run the agent task on Argo or so, to see if it helps.
I also tried to match it with the secure.conf.
Defaulted container "clearml-apiserver" out of: clearml-apiserver, init-apiserver (init)
{
  "http": {
    "session_secret": {
      "apiserver": "V8gcW3EneNDcNfO7G_TSUsWe7uLozyacc9_I33o7bxUo8rCN31VLRg"
    }
  },
  "auth": {
    "fixed_users": {
      "enabled": true,
      "pass_hashed": false,
      "users": [
        {"username": "admin", "password": "clearml123!", "name": "Administrator"...
# Step 4: Using the configured admin credentials for initial authentication
curl -s -X POST \
  -H "Content-Type: application/json" \
  -u "admin:mypassword123" \
  -d "$USER_PAYLOAD" \
  "$CREATE_USER_URL"
I had those set in the config file, but I can provide what I am using for the server and agent config if it helps. I got lost in the configs, so I tried everything 🤣
Yes, I am using those; they are hardcoded for now because at a later stage I will generate them via a secure method.
@<1729671499981262848:profile|CooperativeKitten94> @<1857232027015712768:profile|PompousCrow47>
I figured it out. For future reference, this is an error regarding the Kubernetes support on the agent: None
As for getting the credentials to launch the agent, the only way I can do it is manually via the UI; I could not find a way to get them via code.
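For the credentials-via-code part, one hedged option: the ClearML REST API exposes an auth.create_credentials endpoint (the web UI uses it when you create credentials there). The exact endpoint version and response shape should be verified against your server, so treat this as a sketch:

# create a key/secret pair for the logged-in user; $TOKEN would come from a
# prior auth.login call (see the login snippet later in this thread)
curl -s -X POST \
  -H "Authorization: Bearer $TOKEN" \
  "http://${NODE_IP}:30080/api/v2.31/auth.create_credentials"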
As far as I can test, the server is doing ok. I had some issues with resources not loading, but I solved those. The bigger issue for now is the agent, and it could probably propagate to the serving. Later on I plan on also adding GPU resources to both, so I'm not entirely sure on that part.
clearml-apiserver-866ccf75f7-zr5wx 1/1 Running 0 37m
clearml-apiserver-asyncdelete-8dfb574b8-8gbcv 1/1 Running 0 37m
clearml-elastic-master-0 ...
then
helm install clearml clearml/clearml \
  --namespace "$NS" \
  --values /tmp/server-values.yaml \
  --wait \
  --timeout "$TMO"
# Step 2: Login via web UI API
LOGIN_URL="http://${NODE_IP}:30080/api/v2.31/auth.login"
LOGIN_PAYLOAD='{"username": "k8s-agent"}'
LOGIN_RESPONSE=$(curl -s -X POST \
  -H "Content-Type: application/json" \
  -d "$LOGIN_PAYLOAD" \
  "$LOGIN_URL")
parameters:
  - name: namespace
    value: clearml-prod
  - name: node-ip
    value: "192.168.70.211"
  - name: force-cleanup
    value: "false"
  - name: install-server
    value: "true"
  - name: install-agent
    value: "true"
  - name: install-serving
    value: "true"
  - name: diagnose-only
    value: "false"
  - name: storage-class
    value: openebs-hostpath
  - name: helm-timeout
    value: 900s
  - nam...
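These parameters can then be overridden at submit time from the CLI (the workflow file name below is illustrative):

# submit the workflow, overriding selected parameters
argo submit clearml-deploy.yaml \
  -n clearml-prod \
  -p namespace=clearml-prod \
  -p install-agent=true \
  -p diagnose-only=false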