
From what I could find, since the serving endpoint is not treated as an independent environment, the packages are being installed into a Python 3.8.10 environment, and the endpoint is trying to load them from another Python version that does not contain the packages. But I cannot change the version of either, and I don't understand why...
# Step 3: Create credentials using the JWT token via the web server
API_URL="http://${NODE_IP}:30080/api/v2.31/auth.create_credentials"
PAYLOAD='{"label": "k8s-agent-credentials"}'
RESPONSE=$(curl -s -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d "$PAYLOAD" \
  "$API_URL")
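In case it helps the automation: once that call succeeds, the key pair has to be pulled out of the JSON reply. A minimal sketch, assuming the reply nests the pair under `data.credentials` (field names are an assumption about the ClearML reply shape, worth verifying against your server version), exercised on a canned response:

```shell
# Canned auth.create_credentials reply for illustration; the real one is
# $RESPONSE from the curl above. data.credentials.{access_key,secret_key}
# is an assumed path - check it against an actual reply.
RESPONSE='{"meta":{"result_code":200},"data":{"credentials":{"access_key":"ABC123","secret_key":"XYZ789"}}}'
ACCESS_KEY=$(echo "$RESPONSE" | jq -r '.data.credentials.access_key')
SECRET_KEY=$(echo "$RESPONSE" | jq -r '.data.credentials.secret_key')
echo "access_key=$ACCESS_KEY"
```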
So I'm setting up the server like this:
global:
  defaultStorageClass: $STORAGE_CLASS
apiserver:
  replicaCount: 1
  resources:
    requests:
      cpu: "200m"
      memory: "512Mi"
    limits:
      cpu: "2000m"
      memory: "4Gi"
  service:
    type: NodePort
    nodePort: 30008
    port: 8008
  ad...
then

helm install clearml clearml/clearml \
  --namespace "$NS" \
  --values /tmp/server-values.yaml \
  --wait \
  --timeout "$TMO"
With this I have full access to the running server, and can create the credentials via the UI with no problem.
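For the scripted path it may be worth probing the apiserver directly before any auth calls, to rule out plain connectivity issues. A small sketch, assuming the 30008 apiserver NodePort from the values above and the server's `debug.ping` endpoint (hedged: verify that endpoint exists on your version); the network call is left commented:

```shell
NODE_IP="192.168.70.211"                         # node IP used elsewhere in this thread
PING_URL="http://${NODE_IP}:30008/debug.ping"    # apiserver NodePort from the values above
# curl -s "$PING_URL"                            # uncomment when the cluster is reachable
echo "$PING_URL"
```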
Yes, that was my understanding, but:
From browser network analysis:
- Create User: POST /api/v2.31/auth.create_user via port 30080
  - Payload: {"email": "...", "name": "...", "company": "...", "given_name": "...", "family_name": "..."}
  - Response: User ID
- Login: POST /api/v2.31/auth.login via port 30080
  - Payload: {"username": "username"}
  - Response: JWT token
- Create Credentials: POST /api/v2.31/auth.create_credentials via port 30080
  - Headers: `Authorization: Bearer <JWT_TO...
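Between the login and create-credentials steps, the JWT has to be extracted from the login reply. A hedged sketch (the `.data.token` path is an assumption about the reply shape; check it against what the browser's network tab actually shows), exercised here on a canned response:

```shell
# Pull the token out of an auth.login reply; prints nothing if absent.
# .data.token is an assumed path, not confirmed against a live server.
extract_token() { jq -r '.data.token // empty'; }

# Canned reply for illustration; the real one would come from the auth.login curl.
LOGIN_RESPONSE='{"meta":{"result_code":200},"data":{"token":"eyJexample"}}'
TOKEN=$(echo "$LOGIN_RESPONSE" | extract_token)
echo "$TOKEN"
```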
This is how I am doing it, and it's probably something simple that I am missing.
But on all of them I get Unauthorized (missing credentials).
What I intended was to do the same via API calls, so I can automate it.
Examples that I tested:
# Step 1: Create a user via the web UI API
CREATE_USER_URL="http://${NODE_IP}:30080/api/v2.31/auth.create_user"
USER_PAYLOAD='{"email": "k8s-agent@clearml.ai", "name": "k8s-agent", "company": "clearml", "given_name": "k8s", "family_name": "agent"}'
CREATE_USER_RESPONSE=$(curl -s -X POST \
  -H "Content-Type: application/json" \
  -d "$USER_PAYLOAD" \
  "$CREATE_USER_URL")
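One thing worth guarding in that script: the ClearML apiserver reports failures inside `meta.result_code` (as in the 401 reply quoted elsewhere in this thread), so curl can "succeed" while the create-user step fails silently. A sketch of checking it, shown on a canned failure reply:

```shell
# Canned failing reply, mirroring the Unauthorized response seen in this thread.
CREATE_USER_RESPONSE='{"meta":{"result_code":401,"result_subcode":20,"result_msg":"Unauthorized (missing credentials)"},"data":{}}'
CODE=$(echo "$CREATE_USER_RESPONSE" | jq -r '.meta.result_code')
MSG=$(echo "$CREATE_USER_RESPONSE" | jq -r '.meta.result_msg')
if [ "$CODE" != "200" ]; then
  echo "auth.create_user failed ($CODE): $MSG" >&2
fi
```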
I have separated out the most crucial part. It's a container that runs the standard helm commands.
example:
....
cat > /tmp/server-values.yaml <<EOF
global:
  defaultStorageClass: $STORAGE_CLASS
apiserver:
...
helm install clearml clearml/clearml \
  --namespace "$NS" \
  --values /tmp/server-values.yaml \
  --wait \
  --timeout "$TMO"
...
helm install clearml-agent clearml/clearml-agent \
  -...
I will try to create them in the UI and only run the agent task on Argo or similar, to see if that helps.
Following up on this: I was unable to fix the issue, but I ended up finding another complication. When uploading an ONNX model using the upload command, it keeps getting tagged as a TensorFlow model, even with the correct file structure, and that leads back to the previous issue, since the serving module will search for a different format than ONNX.
As far as I could see this comes from the helper inside the Triton engine, but as of right now I could not fix it.
Is there anything I might be doing ...
# Step 2: Login via the web UI API
LOGIN_URL="http://${NODE_IP}:30080/api/v2.31/auth.login"
LOGIN_PAYLOAD='{"username": "k8s-agent"}'
LOGIN_RESPONSE=$(curl -s -X POST \
  -H "Content-Type: application/json" \
  -d "$LOGIN_PAYLOAD" \
  "$LOGIN_URL")
# Step 4: Using the configured admin credentials for initial authentication
curl -s -X POST \
  -H "Content-Type: application/json" \
  -u "admin:mypassword123" \
  -d "$USER_PAYLOAD" \
  "$CREATE_USER_URL"
cat values-prod.yaml
agent:
  api:
    web_server: ""
    api_server: ""
    files_server: ""
    credentials:
      access_key: "8888TMDLWYY7ZQJJ0I7R2X2RSP8XFT"
      secret_key: "oNODbBkDGhcDscTENQyr-GM0cE8IO7xmpaPdqyfsfaWearo1S8EQ8eBOxu-opW8dVUU"
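Once a key pair like the one above exists, it can itself be exchanged for a session token: as far as I know, the apiserver's `auth.login` accepts the access/secret pair as HTTP basic auth (this is what the SDK does internally, but worth verifying on your version). A sketch, with the network call left commented:

```shell
ACCESS_KEY="8888TMDLWYY7ZQJJ0I7R2X2RSP8XFT"
SECRET_KEY="oNODbBkDGhcDscTENQyr-GM0cE8IO7xmpaPdqyfsfaWearo1S8EQ8eBOxu-opW8dVUU"
AUTH="${ACCESS_KEY}:${SECRET_KEY}"
# TOKEN=$(curl -s -u "$AUTH" "http://${NODE_IP}:30008/auth.login" | jq -r '.data.token')
echo "${AUTH%%:*}"   # sanity check: the user part of the basic-auth pair is the access key
```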
Hi, thanks for the reply!
I have set it and it's downloading (I checked the container logs), but when I try to POST I get this error:
curl -s -X POST \
  -H "Content-Type: application/json" \
  -d '{"username": "admin", "password": "clearml123!"}' \
  ""
{"meta":{"id":"db21b4b3124f4bdda14c00f60c621599","trx":"db21b4b3124f4bdda14c00f60c621599","endpoint":{"name":"auth.login","requested_version":"2.31","actual_version":"1.0"},"result_code":401,"result_subcode":20,"result_msg":"Unauthorized (missing credentials)","error_stack":null,"error_data":{}},"data":{}}
parameters:
  - name: namespace
    value: clearml-prod
  - name: node-ip
    value: "192.168.70.211"
  - name: force-cleanup
    value: "false"
  - name: install-server
    value: "true"
  - name: install-agent
    value: "true"
  - name: install-serving
    value: "true"
  - name: diagnose-only
    value: "false"
  - name: storage-class
    value: openebs-hostpath
  - name: helm-timeout
    value: 900s
  - nam...
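For reference, those parameters would be consumed inside the workflow's script template via Argo's templating; a minimal sketch (the double-brace syntax is Argo's, and the variable names simply mirror the list above):

```shell
# Values Argo substitutes before the script runs; at runtime these become
# "clearml-prod", "192.168.70.211", "900s" per the parameter list above.
NS="{{workflow.parameters.namespace}}"
NODE_IP="{{workflow.parameters.node-ip}}"
TMO="{{workflow.parameters.helm-timeout}}"
echo "installing into $NS (timeout $TMO)"
```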
Ok! Perfect, that was what I was looking for. Thank you so much!
Yeah, I know... that's what I did for the GitHub implementation, but for this I need them to be generated on the fly, or via a CLI that I can run from Argo, if that's possible.
With no success. @<1523701070390366208:profile|CostlyOstrich36> I hope this provides a clear idea of what I am trying to do; any help is fantastic.
Since with Argo I can pass them as params.
Because when I check, it references something from 3 years ago, and I am following this: None
With these values on helm:
helm get values clearml-agent -n clearml-prod
USER-SUPPLIED VALUES:
agentk8sglue:
  apiServerUrlReference:
  clearmlcheckCertificate: false
  createQueueIfNotExists: true
  fileServerUrlReference:
  image:
    pullPolicy: Always
    repository: allegroai/clearml-agent-k8s-base
    tag: latest
  queue: default
  resources:
    limits:
      cpu: 500m
      memory: 1Gi
    requests:
      cpu: 100m
      memory: 256Mi
  webServerUrlRefe...
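Those empty `*UrlReference` values stand out; if the glue agent is handed blank server URLs, the sre_parse crash in the agent pod would not be surprising. A hedged sketch of filling them in with helm (the ports are guesses from the NodePorts mentioned in this thread, and the chart key names should be verified; the helm call itself is left commented):

```shell
NODE_IP="192.168.70.211"
API_URL="http://${NODE_IP}:30008"    # apiserver NodePort from the server values
WEB_URL="http://${NODE_IP}:30080"    # webserver NodePort used elsewhere in the thread
# helm upgrade clearml-agent clearml/clearml-agent -n clearml-prod --reuse-values \
#   --set agentk8sglue.apiServerUrlReference="$API_URL" \
#   --set agentk8sglue.webServerUrlReference="$WEB_URL" \
#   --set agentk8sglue.fileServerUrlReference="http://${NODE_IP}:30081"  # assumed fileserver port
echo "$API_URL"
```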
Hi, I'm trying to add the agent to a running server and facing the same issue.
Defaulted container "k8s-glue" out of: k8s-glue, init-k8s-glue (init)
p = sre_compile.compile(pattern, flags)
File "/usr/lib/python3.6/sre_compile.py", line 562, in compile
p = sre_parse.parse(p, flags)
File "/usr/lib/python3.6/sre_parse.py", line 855, in parse
p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
File "/usr/lib/python3.6/sre_parse.py", line 416, in _parse_sub
not neste...
Do you have any idea what might cause this?
I had those set in the config file, but I can provide what I am using for the server and agent configs if it helps. I got lost in the configs, so I tried everything 🤣
jcarvalho@kharrinhao:~$ kubectl get pods -n clearml-prod -l app.kubernetes.io/name=clearml-agent
NAME READY STATUS RESTARTS AGE
clearml-agent-547584497c-xf98z 0/1 Error 4 (60s ago) 2m8s
jcarvalho@kharrinhao:~$ kubectl logs -n clearml-prod -l app.kubernetes.io/name=clearml-agent
Defaulted container "k8s-glue" out of: k8s-glue, init-k8s-glue (init)
not nested and not items))
File "/usr/lib/python3.6/sre_parse.py", line 765, in _parse
...