Answered
In order for a new worker to come online in my k8 cluster, do I need an EC2 startup script to init the agent/config and then start the daemon? Do I have to do this manually, or is there a better way?

In order for a new worker to come online in my k8 cluster, do I need to have an EC2 startup script init the agent/config, and then start the daemon? Do I have to do this manually, or is there a better way?

  
  
Posted one year ago

Answers 30


Basically just change the queue in the helm yaml:
`queue: my_second_queue_name_here`

  
  
Posted one year ago

No, I'm not tracking. I'm pretty new to k8s, so this might be beyond my current knowledge. Maybe if I rephrase my goals it will make more sense. Essentially, I want to enqueue an experiment, pick a queue (gpu), and have a GPU EC2 node provisioned in response. The experiment is then initialized and executed on that new GPU EC2 node. When the work is completed, I want the GPU EC2 node to terminate after x amount of time.

  
  
Posted one year ago

I got everything working using the default queue. I can submit an experiment, and a new GPU node is provisioned, all good

  
  
Posted one year ago

Okay, seems like there are ways to do it, just need to be a bit clever

  
  
Posted one year ago

Also, how do I associate that new queue with a worker?

  
  
Posted one year ago

Made some progress getting the GPU nodes to provision, but got this error on my task:
K8S glue status: Unschedulable (0/4 nodes are available: 1 node(s) had taint {nvidia.com/gpu: true}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity/selector.)
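For anyone hitting the same message: the first half means the pod needs a toleration for the `nvidia.com/gpu` taint, and the second half means its nodeSelector did not match the other nodes. A rough sketch of what the pod template might need is below, written as a Python dict and dumped as JSON. The `tolerations` key sitting under `podTemplate` is an assumption about the helm chart layout and is not confirmed in this thread; the nodeSelector label is the one quoted elsewhere in the thread.

```python
import json


def toleration_for(taint_key):
    """Build a toleration that matches any taint with this key.

    With operator "Exists" and no value/effect, Kubernetes tolerates
    every taint carrying this key, whatever its value or effect.
    """
    return {"key": taint_key, "operator": "Exists"}


# ASSUMPTION: the chart accepts `tolerations` next to `nodeSelector`
# under `podTemplate`; check the values.yaml of your chart version.
pod_template = {
    "nodeSelector": {"purpose": "gpu-nvidia-t4-c8-m32-g1-od"},
    "tolerations": [toleration_for("nvidia.com/gpu")],
}

print(json.dumps({"podTemplate": pod_template}, indent=2))
```

A narrower toleration (a specific `value` and `effect`) would also work if you know how the node group was tainted.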

  
  
Posted one year ago

The agents are docker containers, how do I modify the startup script so it creates a queue?

Hmm actually not sure about that, might not be part of the helm chart.
So maybe the easiest is:
`from clearml.backend_api.session.client import APIClient
c = APIClient()
c.queues.create(name="new_queue")`
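If this runs from a startup script, it probably should not fail or duplicate when the queue already exists. A minimal sketch of an idempotent version follows. The helper name `ensure_queues` is made up for illustration, and it assumes `client.queues.get_all()` returns objects with a `.name` attribute, so treat it as a starting point and not the official way.

```python
def ensure_queues(client, names):
    """Create every queue in `names` that is missing; return the names created.

    `client` is expected to behave like clearml's APIClient:
    ASSUMPTION: `client.queues.get_all()` yields objects with `.name`,
    and `client.queues.create(name=...)` creates a queue.
    """
    existing = {q.name for q in client.queues.get_all()}
    created = []
    for name in names:
        if name not in existing:
            client.queues.create(name=name)
            existing.add(name)  # avoid creating the same name twice
            created.append(name)
    return created
```

With a configured `clearml.conf`, it would be called as `ensure_queues(APIClient(), ["default", "gpu"])`, using the same `APIClient` import as above.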

  
  
Posted one year ago

Okay fixed that taint restriction

  
  
Posted one year ago

yes, I see in the UI how to create a new queue. How do I associate that queue with a nodeSelector though?

  
  
Posted one year ago

yea, does the enterprise version have more functionality like this?

  
  
Posted one year ago

For instance, if I wanted the default queue and gpu queue that I create, how do I do that?

  
  
Posted one year ago

Are you able to do screenshare to discuss this? I'm not sure I understand the k8 glue agent purpose.

  
  
Posted one year ago

In other words, I'd like to create 3 queues via helm install, each with its own podTemplate. Is this possible?

  
  
Posted one year ago

How do I set up the clearml k8s glue?

  
  
Posted one year ago

So that it spins up nodes

  
  
Posted one year ago

My next question, how do I add more queues?

  
  
Posted one year ago

Yes, this is exactly how the clearml k8s glue works. Note that the resource allocation (spinning nodes up/down) is done by k8s, which can sometimes take a while. If you only need "bare metal nodes" on the cloud, it might be more efficient to use the AWS autoscaler, which essentially does the same thing.

  
  
Posted one year ago

For example, in my agent helm yaml, I have
`queue: default

podTemplate:
  nodeSelector:
    purpose: gpu-nvidia-t4-c8-m32-g1-od`

  
  
Posted one year ago

Also how do I provide the k8 glue agent permissions to spin up/down ec2 nodes?

  
  
Posted one year ago

So I'd create the queue in the UI, then update the helm yaml as above, and install? How would I add a 3rd queue?

Same process?!

Also I'd like to create the queues programmatically, is that possible?

Yes, you can. You can also have the agent create the queue if it does not already exist: just add --create-queue to the agent execution command line.
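As a rough illustration of the flag, here is a small sketch that builds one agent command line per queue. The thread does not show the exact command the helm chart runs inside the container, so the plain `clearml-agent daemon --queue ...` form is an assumption; only `--create-queue` comes from the answer above.

```python
import shlex


def agent_command(queue, create_queue=True):
    """Return a shell-safe agent command line for one queue.

    ASSUMPTION: the agent is started as `clearml-agent daemon`;
    the k8s glue container may use a different entry point.
    """
    cmd = ["clearml-agent", "daemon", "--queue", queue]
    if create_queue:
        # Ask the agent to create the queue if it does not exist yet
        cmd.append("--create-queue")
    return shlex.join(cmd)  # requires Python 3.8+


for queue_name in ["default", "gpu"]:
    print(agent_command(queue_name))
```

Each printed line could then go into the container entry point or a node startup script.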

  
  
Posted one year ago

Yep 🙂
Also maybe worth changing the entry point of the agent docker to always create a queue if it is missing?

  
  
Posted one year ago

Also I'd like to create the queues programmatically, is that possible?

  
  
Posted one year ago

yea, does the enterprise version have more functionality like this?

yes, all sorts of bits and pieces for easier DevOps / K8s etc.

  
  
Posted one year ago

Would I copy and paste this block to produce another queue and k8 glue agent?

  
  
Posted one year ago

I got everything working using the default queue. I can submit an experiment, and a new GPU node is provisioned, all good

Nice!

My next question, how do I add more queues?

You can create new queues in the UI and spin up a new glue for each queue (basically, think of a queue as an abstraction for a specific type of resource).
Make sense?
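To make the "one glue per queue" idea concrete, here is a sketch that generates one values mapping per queue, each with its own nodeSelector, so each can be installed as a separate helm release. It only mirrors the two keys quoted in this thread (`queue` and `podTemplate.nodeSelector`); any parent key your chart version nests them under, and the second queue name and label, are hypothetical.

```python
import json

# Queue name -> node selector labels for the pods of that queue.
QUEUE_NODE_SELECTORS = {
    "default": {"purpose": "gpu-nvidia-t4-c8-m32-g1-od"},  # label from the thread
    "cpu": {"purpose": "hypothetical-cpu-node-group"},     # HYPOTHETICAL example
}


def values_for(queue, node_selector):
    """Build the values override for the glue agent serving one queue."""
    return {
        "queue": queue,
        "podTemplate": {"nodeSelector": dict(node_selector)},
    }


for queue, selector in QUEUE_NODE_SELECTORS.items():
    # JSON is valid YAML, so the output can be saved as a values file
    print(f"# values for queue '{queue}'")
    print(json.dumps(values_for(queue, selector), indent=2))
```

Adding a third queue is then one more entry in the mapping and one more helm release with its own values file.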

  
  
Posted one year ago

The agents are docker containers, how do I modify the startup script so it creates a queue? It seems like having additional queues beyond default is not handled by helm installs?

  
  
Posted one year ago

So I'd create the queue in the UI, then update the helm yaml as above, and install? How would I add a 3rd queue?

  
  
Posted one year ago

How would I do something similar with a new queue?

  
  
Posted one year ago