Good Morning Folks, I Am Setting Up Clearml On A (Self-Hosted) K8S Cluster Using The

Good morning folks, I am setting up ClearML on a (self-hosted) K8s cluster using the Helm chart at https://github.com/allegroai/clearml-helm-charts/blob/main/charts/clearml as the basis.
I managed to get the basics working (authentication, replacing tokens, etc.) and I would now like to configure some agents to add to the cluster. In the chart's values.yaml (https://github.com/allegroai/clearml-helm-charts/blob/4422cf433d3bf30699ae7094296a1eaa65fb3787/charts/clearml/values.yaml#L208) I see there are several parameters related to the agents, but I am not really sure if that fits our needs or what the recommended way to proceed is.

Our cluster has nodes with different GPUs, so agents on different nodes will probably require images with different CUDA versions.

How would you recommend configuring the agents?

  
  
Posted 2 years ago

Answers 11


so agents on different nodes will probably require images with different CUDA versions.

That makes sense SarcasticSquirrel56
I would edit the Helm chart (or deploy manually) and use a selector that targets the different nodes/GPUs and assigns the correct containers (i.e. matching CUDA versions to the different GPUs/drivers).
BTW: you can also play around with the k8s glue, which dynamically spins up pods based on ClearML Tasks.
wdyt?
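
To make the selector idea concrete, here is a minimal sketch (not taken from the Helm chart) that generates one agent Pod manifest per GPU type, pins it to matching nodes with a nodeSelector, and has it listen on its own queue. The node label `gpu-type`, the image names, and the queue names are hypothetical placeholders, and the sketch assumes each image already has `clearml-agent` installed and that the NVIDIA device plugin exposes `nvidia.com/gpu`.

```python
import json

# Hypothetical mapping: node label value -> (container image, ClearML queue).
GPU_PROFILES = {
    "cuda11": ("registry.example.com/clearml-agent:cuda11", "CUDA11_GPUx1"),
    "cuda10": ("registry.example.com/clearml-agent:cuda10", "CUDA10_GPUx1"),
}


def agent_pod(gpu_type: str) -> dict:
    """Build a Pod manifest for one agent pinned to nodes labelled gpu-type=<gpu_type>."""
    image, queue = GPU_PROFILES[gpu_type]
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": f"clearml-agent-{gpu_type}"},
        "spec": {
            # Schedule only on nodes carrying the (assumed) gpu-type label.
            "nodeSelector": {"gpu-type": gpu_type},
            "containers": [
                {
                    "name": "agent",
                    "image": image,
                    # The agent pulls Tasks from exactly one queue.
                    "command": ["clearml-agent", "daemon", "--queue", queue, "--foreground"],
                    "resources": {"limits": {"nvidia.com/gpu": 1}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests, so the output can be piped to `kubectl apply -f -`.
    for gpu_type in GPU_PROFILES:
        print(json.dumps(agent_pod(gpu_type), indent=2))
```

The server connection settings and credentials are left out here; they would still have to be provided to each container.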

  
  
Posted 2 years ago

Hi Martin, thanks. My doubt is:
If I manually configure the pods for the different nodes, how do I make the ClearML server aware that those agents exist? This step is really not clear to me from the documentation (it talks about users, and it uses interactive commands, which would mean logging into the agents manually). I will also try the k8s glue, but first I would like to understand how to configure a fixed number of agents manually.

  
  
Posted 2 years ago

I guess to achieve what I want, I could disable the agent in the Helm chart's values.yaml
and then define pods for each of the agents on their respective nodes

  
  
Posted 2 years ago

Right now I see the default agent that comes with the helm chart...

  
  
Posted 2 years ago

Thanks Martin, so if I understand correctly, when I run the clearml-agent init command (I have to check the syntax) and provide the apiserver, webserver and fileserver URLs, the agents will be registered with the ClearML cluster?

  
  
Posted 2 years ago

Correct (if this is running on k8s, these will most likely be passed via env variables, CLEARML_WEB_HOST etc.)
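
As a rough illustration of the env-variable route, the sketch below builds the `env` section of a Kubernetes container spec for an agent. The variable names are the ones `clearml-agent` reads; the service URLs in the usage example and the Secret name `clearml-agent-credentials` (with keys `access_key` / `secret_key`) are assumptions to adapt to your deployment.

```python
def agent_env(web_host: str, api_host: str, files_host: str,
              secret_name: str = "clearml-agent-credentials") -> list:
    """Return the `env` list for an agent container.

    `secret_name` is a hypothetical Kubernetes Secret holding the API
    credentials generated in the ClearML web UI.
    """
    def from_secret(key: str) -> dict:
        return {"secretKeyRef": {"name": secret_name, "key": key}}

    return [
        {"name": "CLEARML_WEB_HOST", "value": web_host},
        {"name": "CLEARML_API_HOST", "value": api_host},
        {"name": "CLEARML_FILES_HOST", "value": files_host},
        # Credentials come from a Secret rather than being inlined.
        {"name": "CLEARML_API_ACCESS_KEY", "valueFrom": from_secret("access_key")},
        {"name": "CLEARML_API_SECRET_KEY", "valueFrom": from_secret("secret_key")},
    ]


if __name__ == "__main__":
    # Placeholder in-cluster URLs; replace with your own service addresses.
    for entry in agent_env(
        web_host="http://clearml-webserver:8080",
        api_host="http://clearml-apiserver:8008",
        files_host="http://clearml-fileserver:8081",
    ):
        print(entry)
```

With these variables set, the agent does not need an interactive `clearml-agent init` run inside the pod.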

  
  
Posted 2 years ago

SarcasticSquirrel56

If I manually configure the pods for the different nodes, how do I make the ClearML server aware that those agents exist?

Basically, the agents register themselves on your clearml-server, and they register which Queue(s) they listen to. In other words, the way to choose between the different types of machines/GPUs is to enqueue the Task to different queues.
For example: Queue(1): "CUDA11_GPUx1", Queue(2): "CUDA10_GPUx1"
Makes sense?
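
A minimal sketch of that workflow from the client side, using the two example queue names above. The mapping and the helper names are hypothetical; `Task.get_task` and `Task.enqueue` are the ClearML SDK calls used to push an existing Task onto a queue, and the import is deferred so the queue-selection logic can be tried without a server.

```python
# Example queue names from above; the CUDA-major-version keys are an assumption.
QUEUE_BY_CUDA_MAJOR = {"11": "CUDA11_GPUx1", "10": "CUDA10_GPUx1"}


def queue_for_cuda(cuda_version: str) -> str:
    """Pick the queue whose agents run images built for this CUDA major version."""
    major = cuda_version.split(".")[0]
    if major not in QUEUE_BY_CUDA_MAJOR:
        raise ValueError(f"no queue configured for CUDA {cuda_version}")
    return QUEUE_BY_CUDA_MAJOR[major]


def enqueue_for_cuda(task_id: str, cuda_version: str) -> str:
    """Enqueue an existing Task on the matching queue and return the queue name."""
    from clearml import Task  # imported lazily; needs a configured clearml.conf or env vars

    queue_name = queue_for_cuda(cuda_version)
    task = Task.get_task(task_id=task_id)
    Task.enqueue(task, queue_name=queue_name)
    return queue_name
```

Whichever agent listens on that queue (and therefore runs on the matching nodes) picks the Task up.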

EDIT:

I guess to achieve what I want, I could disable the agent in the Helm chart's values.yaml
and then define pods for each of the agents on their respective nodes

That might work; I have to admit I can't remember how flexible the Helm chart is in this respect...

  
  
Posted 2 years ago

Hi AgitatedDove14, I have spent some time going through the Helm charts, but I admit it still isn't clear to me how things should work.

I see that with the default values (mostly what I am using), the K8s Glue agent is deployed (which is what you suggested to use).

  
  
Posted 2 years ago

Thanks, I'll try to understand how the default agent that comes with the Helm chart is configured, and then use that as a template for setting up a different one.

  
  
Posted 2 years ago

What I still don't get is how you would create different queues that target different nodes with different GPUs, and have them use the appropriate CUDA image.
Looking at the template, I don't understand how that's possible.

  
  
Posted 2 years ago

is how you would create different queues,

SarcasticSquirrel56 you can create them from the UI once the server is running
(if you are asking how to create them during the first installation, then yes, you are correct, this is possible in the helm chart, I think 😞 )
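
Besides the UI, queues can also be created programmatically once the server is up. The sketch below is one possible approach using the ClearML `APIClient`; the helper names are hypothetical, and it assumes credentials are already configured for the machine running it.

```python
def missing_queues(wanted: list, existing: list) -> list:
    """Return the wanted queue names that do not exist yet, preserving order."""
    existing_names = set(existing)
    return [name for name in wanted if name not in existing_names]


def ensure_queues(wanted: list) -> list:
    """Create any queue in `wanted` that the server does not have; return the created names."""
    # Imported lazily; requires a reachable server and valid credentials.
    from clearml.backend_api.session.client import APIClient

    client = APIClient()
    existing = [queue.name for queue in client.queues.get_all()]
    to_create = missing_queues(wanted, existing)
    for name in to_create:
        client.queues.create(name=name)
    return to_create
```

For example, `ensure_queues(["CUDA11_GPUx1", "CUDA10_GPUx1"])` could be run once after installation, before starting the agents.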

  
  
Posted 2 years ago