
On a related note, but more complicated: how can we ask the Autoscaler to queue, say, N jobs on one N-GPU machine, please? For example, on AWS, NVIDIA A100 GPUs are only available on instances with 8x A100, which is overkill for a single-GPU job, so we might as well use that instance for other jobs too.
(And yes, this raises the question of optimal packing/scheduling, so it might be a complicated can of worms.)
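To make the packing question concrete, here is a minimal sketch of the classic first-fit-decreasing heuristic. This is my own choice for illustration: it is not optimal in general, and it is not what the ClearML autoscaler does. Each job is described only by how many GPUs it needs, and `first_fit_decreasing` is a hypothetical helper that greedily places jobs on as few 8-GPU instances as it can.

```python
from typing import List


def first_fit_decreasing(job_gpus: List[int], gpus_per_node: int = 8) -> List[List[int]]:
    """Greedy packing of GPU jobs onto identical multi-GPU nodes (hypothetical helper).

    job_gpus[i] is the number of GPUs job i needs. Returns one list of job
    indices per node. First-fit-decreasing is a heuristic, not an optimal solver.
    """
    if any(g <= 0 or g > gpus_per_node for g in job_gpus):
        raise ValueError("each job must need between 1 and gpus_per_node GPUs")
    # Place the biggest jobs first; sorted() is stable, so ties keep their input order.
    order = sorted(range(len(job_gpus)), key=lambda i: job_gpus[i], reverse=True)
    nodes: List[List[int]] = []
    free: List[int] = []  # free GPUs remaining on each node
    for i in order:
        need = job_gpus[i]
        for n, available in enumerate(free):
            if available >= need:
                # First node with enough free GPUs takes the job.
                nodes[n].append(i)
                free[n] -= need
                break
        else:
            # No existing node fits: spin up a new one.
            nodes.append([i])
            free.append(gpus_per_node - need)
    return nodes


if __name__ == "__main__":
    # Ten single-GPU jobs plus one 4-GPU job on 8x A100 instances.
    print(first_fit_decreasing([1] * 10 + [4]))
```

Here the 4-GPU job and four single-GPU jobs share the first instance, and the remaining six single-GPU jobs land on a second one, so two 8x A100 instances cover all eleven jobs.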

  
  
Posted one year ago

Answers 7


@BattyCrocodile47

Is that instance only able to handle one task at a time?

You could run multiple agents on the same machine, each with its own dedicated GPU, but you will not be able to change the allocation (e.g. "now I want 2 GPUs on one agent") without restarting the agents on the instance. Either way, that setup is for a "bare-metal" machine; in the AWS autoscaler case, this falls under the "dynamic" GPU allocation mentioned earlier in the thread.
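For the bare-metal setup described above, a minimal sketch of the "one agent per GPU" layout might look like the following. It only builds the command lines; `agent_commands` is a hypothetical helper, and it assumes the `clearml-agent daemon` options `--queue`, `--gpus` and `--detached` (check `clearml-agent daemon --help` for your version).

```python
import shlex
from typing import List


def agent_commands(num_gpus: int, gpus_per_agent: int = 1,
                   queue_name: str = "default") -> List[List[str]]:
    """Build one `clearml-agent daemon` argv per group of GPUs (hypothetical helper).

    It does not start anything; changing gpus_per_agent later means stopping the
    running agents and starting a new set, as described above.
    """
    if gpus_per_agent < 1 or num_gpus % gpus_per_agent:
        raise ValueError("num_gpus must be a multiple of a positive gpus_per_agent")
    commands = []
    for first in range(0, num_gpus, gpus_per_agent):
        # e.g. "3" for a single-GPU agent, or "0,1" for a two-GPU agent
        devices = ",".join(str(g) for g in range(first, first + gpus_per_agent))
        commands.append(
            ["clearml-agent", "daemon", "--queue", queue_name,
             "--gpus", devices, "--detached"]
        )
    return commands


if __name__ == "__main__":
    # Print the commands; to actually start the agents, pass each list to subprocess.run().
    for cmd in agent_commands(8):
        print(shlex.join(cmd))
```

On an 8x A100 box, `agent_commands(8)` gives eight single-GPU agents (the first is `clearml-agent daemon --queue default --gpus 0 --detached`), while `agent_commands(8, 2)` gives four agents pinned to `0,1`, `2,3`, `4,5` and `6,7`.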

  
  
Posted one year ago

My understanding may be off. Say I have a single EC2 instance. Is that instance only able to handle one task at a time?

Or can I start multiple clearml-agent processes on it and then have one task per agent?

And if that's the case, can multiple agents on the EC2 instance listen to the same queue, e.g. default? Or would this only work if they were listening to different queues?

  
  
Posted one year ago

We could use our 8xA100 instance as 8 workers, running 8 single-GPU jobs, each faster than it would run on a 1xV100.

@SolidGoose91 I think that, to get that flexibility, you need the "dynamic" GPU allocation, which is only part of the "enterprise" offering 😞
That said, why not just allocate a machine with a single A100?

  
  
Posted one year ago

@CostlyOstrich36 Any idea, please? We could use our 8xA100 instance as 8 workers, running 8 single-GPU jobs, each faster than it would run on a 1xV100.

  
  
Posted one year ago

Yes, it's pretty lame that a clearml-agent can only process one task at a time if it's not listening to a services queue 🤔
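A toy model may help picture the "one task at a time per agent" behavior when several agents listen to the same queue. This is not ClearML code: it assumes, as I understand it, that each dequeued task is handed to exactly one agent, and it uses Python threads and `queue.Queue` as stand-ins for agents and a ClearML queue. `run_workers` is a hypothetical name.

```python
import queue
import threading
from typing import Dict, List


def run_workers(num_tasks: int, num_workers: int) -> Dict[int, List[int]]:
    """Toy model: single-task workers pulling from one shared FIFO queue.

    Returns the task ids each worker ran. Stand-in only, not ClearML code.
    """
    tasks: "queue.Queue[int]" = queue.Queue()
    for task_id in range(num_tasks):
        tasks.put(task_id)
    # Each worker appends only to its own list, so no extra locking is needed.
    ran: Dict[int, List[int]] = {w: [] for w in range(num_workers)}

    def worker(worker_id: int) -> None:
        while True:
            try:
                # get_nowait() hands each task to exactly one worker.
                task_id = tasks.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker goes idle
            # The task would run to completion here before the worker pulls another one.
            ran[worker_id].append(task_id)

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ran


if __name__ == "__main__":
    print(run_workers(num_tasks=20, num_workers=4))
```

Every task is run exactly once, but how the tasks split across workers depends on timing, which mirrors why N single-GPU agents on one queue can keep an N-GPU machine busy without any single agent running more than one task.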

  
  
Posted one year ago

I see. Is it possible for two agents to be utilizing the same GPU? (e.g. if the machine has only one GPU, but a very powerful one?)

  
  
Posted one year ago

Is it possible for two agents to be utilizing the same GPU?

It is, as long as their tasks do not run out of GPU memory competing with one another.
(If you are using k8s with ClearML Enterprise, it also supports GPU slicing and dynamic memory allocation.)
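A rough way to sanity-check the "as long as memory-wise they do not limit one another" condition before pointing two agents at one GPU is to add up each task's expected peak memory. This is a minimal sketch: `fits_on_gpu`, the per-task headroom default and the example sizes are all hypothetical, and you would need to measure real peak usage for your own jobs.

```python
from typing import Sequence


def fits_on_gpu(peak_mem_gib: Sequence[float], gpu_mem_gib: float,
                headroom_per_task_gib: float = 1.0) -> bool:
    """Rough check that tasks sharing one GPU will not exhaust its memory.

    peak_mem_gib: estimated peak memory of each task (one entry per agent).
    headroom_per_task_gib: hypothetical per-process margin for the CUDA context
    and allocator fragmentation.
    """
    if any(m < 0 for m in peak_mem_gib) or gpu_mem_gib <= 0:
        raise ValueError("task sizes must be non-negative and the GPU size positive")
    needed = sum(peak_mem_gib) + headroom_per_task_gib * len(peak_mem_gib)
    return needed <= gpu_mem_gib


if __name__ == "__main__":
    print(fits_on_gpu([30, 35], gpu_mem_gib=80))  # 65 + 2 = 67 <= 80
    print(fits_on_gpu([40, 40], gpu_mem_gib=80))  # 80 + 2 = 82 > 80
```

Note that the tasks also share the GPU's compute, so each one will typically run slower than it would alone. If the jobs use PyTorch, `torch.cuda.set_per_process_memory_fraction` is one way to cap how much memory each process may allocate.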

  
  
Posted one year ago