Answered
Heya, Is There Any Plan For ClearML To Leverage The New A100 Multi-Instance GPU (MIG) Tech?

Heya, is there any plan for ClearML to leverage the new Multi-Instance GPU (MIG) tech introduced with the A100 GPU ( https://developer.nvidia.com/blog/getting-the-most-out-of-the-a100-gpu-with-multi-instance-gpu/ ), so we can configure a ClearML agent deployed on a machine equipped with an A100 to count as an arbitrary number of workers and dispatch tasks in the queue to multiple GPU instances of the same machine?

  
  
Posted 2 years ago

Answers 7


Oh wow, would definitely try it out if there were an Autoscaler App integrating it with ClearML

  
  
Posted 2 years ago

Hey, I'm a SaaS user on the PRO tier and I was wondering if this feature is available in the auto-scaler apps, so I could improve the cost-efficiency of my provisioned GCP A100 instances.

  
  
Posted 2 years ago

I think it's supposed to be out early Nov 🙂

  
  
Posted 2 years ago

Hi FierceHamster54
This is already supported; unfortunately, the open-source version only supports static allocation (i.e. you can spin up multiple agents and connect each one to a specific set of GPUs), while the dynamic option (where a single agent allocates jobs to multiple GPUs / MIG slices) is only part of the enterprise edition.
(There is a hidden assumption there that if you spent so much on a DGX, you are probably not a small team 🙂)
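
For reference, the static allocation described above amounts to running one clearml-agent daemon per GPU (or per MIG slice), each listening on its own queue. A minimal sketch in Python, assuming the queue names a100_slice_0 / a100_slice_1 (placeholders, not from this thread) and the documented --queue / --gpus / --detached flags of clearml-agent:

```python
# Minimal sketch: statically splitting one machine into two ClearML workers,
# each pinned to its own GPU. Queue names and the MIG UUID mentioned below
# are placeholders, not values from this thread.
import os
import subprocess

workers = [
    {"queue": "a100_slice_0", "gpus": "0"},  # plain GPU index
    {"queue": "a100_slice_1", "gpus": "1"},
]

for w in workers:
    env = dict(os.environ)
    # Assumption: to pin a MIG slice rather than a whole GPU, one would
    # typically set env["CUDA_VISIBLE_DEVICES"] = "MIG-<uuid>" instead of
    # relying on --gpus (support varies by agent version).
    subprocess.run(
        ["clearml-agent", "daemon",
         "--queue", w["queue"],
         "--gpus", w["gpus"],
         "--detached"],
        env=env,
        check=True,
    )
```

Each daemon registers as its own worker, so from the queue's point of view the machine behaves as N independent agents.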

  
  
Posted 2 years ago

I could improve the cost-efficiency of my provisioned GCP A100 instances

But their pricing is linear; if you do not need an A100, why not just get a cheaper instance?

  
  
Posted 2 years ago

There is a gap in the GPU offering on GCP: there is no modern middle ground with more than 16GB of VRAM and less than 40GB. So sometimes we need to provision an A100 to get the training speed we want, but we don't use all the VRAM. I figured that if we could batch 2 training tasks on the same A100 instance, we would still come out ahead in terms of CUDA cores and get the most out of the GPU time we're paying for.
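
With a static split like the one sketched above in place, batching two trainings onto the same physical A100 could roughly look like this with the ClearML SDK (the project, task, and queue names are hypothetical placeholders, not from this thread):

```python
# Sketch: clone one training task and enqueue a copy on each per-slice
# queue, so both copies run concurrently on the same physical A100.
from clearml import Task

base = Task.get_task(project_name="examples", task_name="my_training")
for queue in ("a100_slice_0", "a100_slice_1"):
    clone = Task.clone(source_task=base, name=f"{base.name} ({queue})")
    Task.enqueue(clone, queue_name=queue)
```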

  
  
Posted 2 years ago

there is no modern middle ground with more than 16GB of VRAM and less than 40GB, so sometimes we need to provision an A100 to get the training speed we want, but we don't use all the VRAM

Oh that makes sense...
Just saw this one, this might help?
https://www.globenewswire.com/news-release/2022/10/24/2539924/0/en/ClearML-and-Genesis-Cloud-Announce-New-MLOps-Partnership-Delivering-100-Green-Energy-Compute-Solution-for-Machine-Learning.html

  
  
Posted 2 years ago