Hi folks, I just deployed a ClearML agent using the Helm chart. I have a few doubts:

After the deployment, I see a new queue called k8s_scheduler, which I didn't create. Is this normal?

In the Helm chart, I specified that I want the AgentK8sGlue to process the queue cpu. When I enqueue an experiment to this queue, I see a worker called "k8s-agent", with an IP associated, but the "currently executing" column stays empty. How can I look at the logs of this agent? I can't see any deployment or pod in the same namespace as the agent, so I don't know where it is running.

Also, I see an experiment enqueued in the "default" queue. If I delete it, it "kills" the one that was enqueued in the "cpu" queue. Is this a bug?

  
  
Posted 2 years ago

Answers 31


[image attachment]

  
  
Posted 2 years ago

I wouldn't say it's related to RBAC, because the issue seems networking-related: the connection timed out.

  
  
Posted 2 years ago

But it's just a quick guess; I'm not sure if I'm right.

  
  
Posted 2 years ago

Thanks for pitching in, JuicyFox94. For the connectivity, I used the "public" names for the various servers
(e.g. we set clearml.internal.domain.name, clearml-apiserver.internal.domain.name and clearml-apiserver.internal.domain.name).

So in the agent values.yaml I set these parameters to the actual domain names:

# -- Reference to Api server url
apiServerUrlReference: " "
# -- Reference to File server url
fileServerUrlReference: " "
# -- Reference to Web server url
webServerUrlReference: " "

Should I try using the internal Kubernetes domain names instead, like
clearml-fileserver.clearml.svc.cluster.local?

  
  
Posted 2 years ago

I don't think it's related to how the agent talks to the apiserver or fileserver. It's more related to the fact that kubectl inside the agent pod cannot contact the Kubernetes API server.

  
  
Posted 2 years ago

I suggest you exec into the agent pod and run a kubectl command there, such as a simple kubectl get pod.

  
  
Posted 2 years ago

You will probably see that it can't do it, and that should be related to the k8s config.

  
  
Posted 2 years ago

ah I see, I'll give it a try then

  
  
Posted 2 years ago

thanks for the help!

  
  
Posted 2 years ago

Indeed, kubectl commands don't work from within the agent pod. I'll try to figure out why.

  
  
Posted 2 years ago

JuicyFox94 apparently to make it work I'll have to add a "kubeconfig" file, but I can't see any obvious way to mount it in the agent pod, am I wrong?

  
  
Posted 2 years ago

accessing apiserver from a pod doesn’t require kubeconfig

  
  
Posted 2 years ago

Try this inside the pod:

  
  
Posted 2 years ago

# Point to the internal API server hostname
APISERVER=
# Path to ServiceAccount token
SERVICEACCOUNT=/var/run/secrets/kubernetes.io/serviceaccount
# Read this Pod's namespace
NAMESPACE=$(cat ${SERVICEACCOUNT}/namespace)
# Read the ServiceAccount bearer token
TOKEN=$(cat ${SERVICEACCOUNT}/token)
# Reference the internal certificate authority (CA)
CACERT=${SERVICEACCOUNT}/ca.crt
# Explore the API with TOKEN
curl --cacert ${CACERT} --header "Authorization: Bearer ${TOKEN}" -X GET ${APISERVER}/api

  
  
Posted 2 years ago

Thanks Valeriano. After copying the .kube/config file from a node on which kubectl works, I could run kubectl commands correctly.

  
  
Posted 2 years ago

Yep, but this is not how it should work in-pod.

  
  
Posted 2 years ago

Probably the internal API server address is not accessible from your pod.

  
  
Posted 2 years ago

this means network issues at some level

  
  
Posted 2 years ago

yes, the curl returned a 503 error

  
  
Posted 2 years ago

If it returns 503, it's not the network but something on top of it.
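The distinction made in this exchange (no response at all points at the network, while a 503 means something answered and the problem sits above the network) can be sketched as a small shell check. This is only an illustration under assumptions: classify_status is a hypothetical helper, not part of kubectl or ClearML, and https://kubernetes.default.svc is assumed as the conventional in-cluster API server address, which may differ on your cluster.

```shell
#!/bin/sh
# Sketch: classify the HTTP status of an in-pod call to the Kubernetes API.
# ASSUMPTION: https://kubernetes.default.svc is the in-cluster API address;
# set APISERVER beforehand to override it.
APISERVER="${APISERVER:-https://kubernetes.default.svc}"
SERVICEACCOUNT=/var/run/secrets/kubernetes.io/serviceaccount

# classify_status is a hypothetical helper written for this example.
classify_status() {
  case "$1" in
    000) echo "no HTTP response: network-level problem (DNS, routing, firewall, timeout)" ;;
    2??) echo "API server reachable and request accepted" ;;
    401|403) echo "API server reachable: check the ServiceAccount token and RBAC" ;;
    5??) echo "an endpoint answered, so the failure is on top of the network" ;;
    *) echo "unexpected status: inspect the response body" ;;
  esac
}

# Only probe when running inside a pod, where the token file is mounted.
if [ -r "${SERVICEACCOUNT}/token" ]; then
  TOKEN=$(cat "${SERVICEACCOUNT}/token")
  # -w prints only the status code; curl reports 000 when no response arrived.
  STATUS=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 \
    --cacert "${SERVICEACCOUNT}/ca.crt" \
    --header "Authorization: Bearer ${TOKEN}" \
    "${APISERVER}/api")
  echo "HTTP ${STATUS}: $(classify_status "${STATUS}")"
fi
```

Run inside the agent pod, it prints the status code with a rough diagnosis; outside a pod it does nothing, because the ServiceAccount token file is missing.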

  
  
Posted 2 years ago

btw it’s an infra issue

  
  
Posted 2 years ago

OK, I'll report that, and see if we can get it fixed

  
  
Posted 2 years ago

You can work around the issue by mounting the kubeconfig, but I guess the underlying issue still needs to be investigated.

  
  
Posted 2 years ago

just my two cents

  
  
Posted 2 years ago

That's what I wanted to ask: while the proper networking is being set up (I don't manage the cluster),
can I run tests using the .kube/config?

  
  
Posted 2 years ago

I guess yes but honestly I’m not sure you will get the right results

  
  
Posted 2 years ago

because while I can run kubectl commands from within the agent pod, clearml doesn't seem to pick the right value:

2022-08-05 12:09:47 task 29f1645fbe1a4bb29898b1e71a8b1489 pulled from 51f5309bfb1940acb514d64931ffddb9 by worker k8s-agent-cpu
2022-08-05 12:12:59 Running kubectl encountered an error: Unable to connect to the server: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2022-08-05 15:15:07 task 29f1645fbe1a4bb29898b1e71a8b1489 pulled from 51f5309bfb1940acb514d64931ffddb9 by worker k8s-agent-cpu

  
  
Posted 2 years ago

exactly

  
  
Posted 2 years ago

OK, thanks a lot. I'll try to get the networking thing sorted (and then I am sure I'll have lots more doubts 😂 )

  
  
Posted 2 years ago