I wouldn’t say it’s related to RBAC, because the issue seems networking related (connection timed out)
but it’s just a quick guess, not sure if I’m right
Thanks for pitching in JuicyFox94 . For the connectivity, I used the "public" names for the various servers
(e.g. we set clearml.internal.domain.name, clearml-apiserver.internal.domain.name and clearml-fileserver.internal.domain.name)
So in the agent values.yaml I set the following parameters to the actual domain names:

```yaml
# -- Reference to Api server url
apiServerUrlReference: " "
# -- Reference to File server url
fileServerUrlReference: " "
# -- Reference to Web server url
webServerUrlReference: " "
```
Should I try using the internal Kubernetes domain names instead, like `clearml-fileserver.clearml.svc.cluster.local`?
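Something like this is what I have in mind (just a sketch; the service names and ports below are my guesses based on the chart defaults, I haven't verified them):

```yaml
# hypothetical in-cluster values; service names/ports assume the default
# clearml chart deployed in the "clearml" namespace -- adjust to your setup
apiServerUrlReference: "http://clearml-apiserver.clearml.svc.cluster.local:8008"
fileServerUrlReference: "http://clearml-fileserver.clearml.svc.cluster.local:8081"
webServerUrlReference: "http://clearml-webserver.clearml.svc.cluster.local:8080"
```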
I don’t think it’s related to how the agent talks with the apiserver or fileserver. It’s more related to the fact that kubectl inside the agent pod cannot contact the Kubernetes apiserver
I suggest trying to exec into the agent pod and running some kubectl command, like a simple kubectl get pod
you will probably see it’s not capable of doing it, and that should be related to the k8s config
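for example (the pod name and namespace here are placeholders, substitute your actual agent pod):

```sh
# find the agent pod (assuming it runs in the "clearml" namespace)
kubectl get pods -n clearml

# open a shell in it and try a kubectl call from inside the pod
kubectl exec -it <agent-pod-name> -n clearml -- kubectl get pods
```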
Indeed, kubectl commands don't work from within the agent pod; I'll try to figure out why
JuicyFox94 apparently to make it work I'll have to add a "kubeconfig" file, but I can't see any obvious way to mount it in the agent pod, am I wrong?
accessing the apiserver from a pod doesn’t require a kubeconfig, the mounted ServiceAccount credentials are enough:
```sh
# Point to the internal API server hostname
APISERVER=https://kubernetes.default.svc

# Path to ServiceAccount token
SERVICEACCOUNT=/var/run/secrets/kubernetes.io/serviceaccount

# Read this Pod's namespace
NAMESPACE=$(cat ${SERVICEACCOUNT}/namespace)

# Read the ServiceAccount bearer token
TOKEN=$(cat ${SERVICEACCOUNT}/token)

# Reference the internal certificate authority (CA)
CACERT=${SERVICEACCOUNT}/ca.crt

# Explore the API with TOKEN
curl --cacert ${CACERT} --header "Authorization: Bearer ${TOKEN}" -X GET ${APISERVER}/api
```
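if the ServiceAccount route works, that curl should return the cluster's API versions, roughly like this (illustrative output, not from your cluster):

```json
{
  "kind": "APIVersions",
  "versions": ["v1"],
  "serverAddressByClientCIDRs": [
    {"clientCIDR": "0.0.0.0/0", "serverAddress": "10.0.0.1:443"}
  ]
}
```

a timeout or a 503 there instead points at the in-pod connectivity problem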
Thanks Valeriano, so by copying the .kube/config file from a node where I can run kubectl, I can run kubectl commands correctly
yep, but this is not how it should work from inside a pod
if it returns 503, it’s not the network but something on top of it
OK, I'll report that, and see if we can get it fixed
you can work around the issue by mounting the kubeconfig, but I guess the root cause still needs to be investigated
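if you go that route, a minimal sketch (all names are illustrative, assuming the agent runs in the "clearml" namespace) would be to put a working kubeconfig in a Secret and mount it into the agent pod:

```sh
# create a Secret from a working kubeconfig (hypothetical names)
kubectl create secret generic agent-kubeconfig \
  --from-file=config=$HOME/.kube/config -n clearml
```

```yaml
# pod spec fragment for the agent (illustrative only)
volumes:
  - name: kubeconfig
    secret:
      secretName: agent-kubeconfig
containers:
  - name: clearml-agent
    volumeMounts:
      - name: kubeconfig
        mountPath: /root/.kube   # kubectl looks for /root/.kube/config by default
        readOnly: true
```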
that's what I wanted to ask: while the proper networking is being set up (I don't manage the cluster),
can I do tests using the .kube/config?
I guess yes but honestly I’m not sure you will get the right results
because while I can run kubectl commands from within the agent pod, clearml doesn't seem to pick up the right value:
```
2022-08-05 12:09:47 task 29f1645fbe1a4bb29898b1e71a8b1489 pulled from 51f5309bfb1940acb514d64931ffddb9 by worker k8s-agent-cpu
2022-08-05 12:12:59 Running kubectl encountered an error: Unable to connect to the server: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2022-08-05 15:15:07 task 29f1645fbe1a4bb29898b1e71a8b1489 pulled from 51f5309bfb1940acb514d64931ffddb9 by worker k8s-agent-cpu
```
OK, thanks a lot, I'll try to get the networking thing sorted (and then I'm sure I'll have many more doubts 😂 )