Answered
Hi Folks, I Just Deployed A Clearml Agent Using The Helm Chart. I Have A Few Doubts:

Hi folks, I just deployed a ClearML agent using the Helm chart. I have a few doubts:
- After the deployment, I see a new queue called k8s_scheduler, which I didn't create. Is this normal?
- In the Helm chart, I specified that I want the AgentK8sGlue to process the queue cpu. When I enqueue an experiment to this queue, I see a worker called "k8s-agent" with an IP associated, but the "currently executing" column stays empty.
- How can I look at the logs of this agent? I can't see any deployment or pod in the same namespace as the agent, so I don't know where it is running.
- Also, I see an experiment enqueued in the "default" queue. If I delete it, it "kills" the one that was enqueued in the "cpu" queue. Is this a bug?
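For reference, locating and tailing the glue agent's logs might look like the sketch below; the `clearml` namespace and the `app.kubernetes.io/name=clearml-agent` label are assumptions (adjust them to whatever your Helm release actually uses):

```shell
# Assumed namespace and pod label for the Helm release; adjust to yours.
NAMESPACE=clearml
SELECTOR="app.kubernetes.io/name=clearml-agent"

# Guarded so the snippet is a no-op outside a cluster.
if command -v kubectl >/dev/null 2>&1; then
  kubectl -n "$NAMESPACE" get pods -l "$SELECTOR"
  kubectl -n "$NAMESPACE" logs -l "$SELECTOR" --tail=100
fi
```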

  
  
Posted one year ago
Votes Newest

Answers 31


Yes, the curl returned a 503 error.

  
  
Posted one year ago

Thanks Valeriano. So, after copying the .kube/config file from a node where I can run kubectl, I could run kubectl commands correctly.

  
  
Posted one year ago

Yep, but this is not how it should work with the in-pod method.

  
  
Posted one year ago

Probably it is not accessible from your pod.

  
  
Posted one year ago

This means network issues at some level.

  
  
Posted one year ago

You can work around the issue by mounting the kubeconfig, but I guess the root cause still needs to be investigated.

  
  
Posted one year ago

I wouldn’t say it’s related to RBAC, because the issue seems networking-related (the connection timed out).

  
  
Posted one year ago

But it’s just a quick guess; not sure if I’m right.

  
  
Posted one year ago

I don’t think it’s related to how the agent talks with the apiserver or fileserver. It’s more about the fact that kubectl inside the agent pod cannot contact the Kubernetes apiserver.

  
  
Posted one year ago

I suggest you exec into the agent pod and try some kubectl command, like a simple kubectl get pod.
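A sketch of that check, assuming a hypothetical pod name and namespace (find the real pod name with `kubectl get pods` first):

```shell
# Hypothetical pod name and namespace; adjust to your deployment.
AGENT_POD=clearml-agent-0
NAMESPACE=clearml

# Guarded so the snippet is a no-op outside a cluster.
if command -v kubectl >/dev/null 2>&1; then
  # Run a simple kubectl command from inside the agent pod to see
  # whether the apiserver is reachable from there.
  kubectl -n "$NAMESPACE" exec "$AGENT_POD" -- kubectl get pods
fi
```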

  
  
Posted one year ago

ah I see, I'll give it a try then

  
  
Posted one year ago

JuicyFox94 apparently to make it work I'll have to add a "kubeconfig" file, but I can't see any obvious way to mount it in the agent pod, am I wrong?

  
  
Posted one year ago

Accessing the apiserver from a pod doesn’t require a kubeconfig.

  
  
Posted one year ago

Try this inside the pod:

  
  
Posted one year ago

thanks for the help!

  
  
Posted one year ago

OK, thanks a lot, I'll try to get the networking thing sorted (and then I am sure I'll have lots more doubts 😂 )

  
  
Posted one year ago

Effectively, kubectl commands don't work from within the agent pod; I'll try to figure out why.

  
  
Posted one year ago

Probably you will see it’s not capable of doing it, and it should be related to the k8s config.

  
  
Posted one year ago

Because kubectl inside the pod uses the in-pod (in-cluster) method.
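With no kubeconfig present, kubectl falls back to in-cluster configuration: environment variables injected by the kubelet plus the service-account files mounted into every pod. A quick way to check those pieces from inside the pod might be:

```shell
# Standard mount point for the pod's service-account credentials.
SA_DIR=/var/run/secrets/kubernetes.io/serviceaccount

# Guarded so the snippet is a no-op outside a pod.
if [ -d "$SA_DIR" ]; then
  env | grep '^KUBERNETES_SERVICE'   # KUBERNETES_SERVICE_HOST / _PORT
  ls "$SA_DIR"                       # ca.crt, namespace, token
fi
```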

  
  
Posted one year ago


  
  
Posted one year ago

Thanks for pitching in JuicyFox94. For the connectivity, I used the "public" names for the various servers
(e.g. we set clearml.internal.domain.name, clearml-apiserver.internal.domain.name and clearml-apiserver.internal.domain.name).

So in the agent values.yaml I set these parameters to the actual domain names:

```
# -- Reference to Api server url
apiServerUrlReference: ""

# -- Reference to File server url
fileServerUrlReference: ""

# -- Reference to Web server url
webServerUrlReference: ""
```

Should I try using the internal Kubernetes domain names instead, like clearml-fileserver.clearml.svc.cluster.local?
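For what it's worth, in-cluster DNS names follow the `<service>.<namespace>.svc.cluster.local` pattern; assuming a hypothetical `clearml` namespace and the usual chart service names, they would come out as:

```shell
# Assumed namespace and service names; adjust to your install.
NS=clearml
for SVC in clearml-apiserver clearml-fileserver clearml-webserver; do
  echo "${SVC}.${NS}.svc.cluster.local"
done
```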

  
  
Posted one year ago

```shell
# Point to the internal API server hostname
APISERVER=

# Path to ServiceAccount token
SERVICEACCOUNT=/var/run/secrets/kubernetes.io/serviceaccount

# Read this Pod's namespace
NAMESPACE=$(cat ${SERVICEACCOUNT}/namespace)

# Read the ServiceAccount bearer token
TOKEN=$(cat ${SERVICEACCOUNT}/token)

# Reference the internal certificate authority (CA)
CACERT=${SERVICEACCOUNT}/ca.crt

# Explore the API with TOKEN
curl --cacert ${CACERT} --header "Authorization: Bearer ${TOKEN}" -X GET ${APISERVER}/api
```

  
  
Posted one year ago

If it returns 503, it’s not the network but something on top of it.

  
  
Posted one year ago

btw it’s an infra issue

  
  
Posted one year ago

Because while I can run kubectl commands from within the agent pod, ClearML doesn't seem to pick up the right value:

2022-08-05 12:09:47 task 29f1645fbe1a4bb29898b1e71a8b1489 pulled from 51f5309bfb1940acb514d64931ffddb9 by worker k8s-agent-cpu
2022-08-05 12:12:59 Running kubectl encountered an error: Unable to connect to the server: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2022-08-05 15:15:07 task 29f1645fbe1a4bb29898b1e71a8b1489 pulled from 51f5309bfb1940acb514d64931ffddb9 by worker k8s-agent-cpu

  
  
Posted one year ago

That's what I wanted to ask: while the proper networking gets set up (I don't manage the cluster), can I run tests using the .kube/config?

  
  
Posted one year ago

I guess yes, but honestly I’m not sure you will get the right results.

  
  
Posted one year ago

OK, I'll report that, and see if we can get it fixed

  
  
Posted one year ago

just my two cents

  
  
Posted one year ago