Answered
Heya, Is There Any Plan For Clearml To Leverage The New

Heya, is there any plan for ClearML to leverage the new Multi-Instance GPU (MIG) tech (https://developer.nvidia.com/blog/getting-the-most-out-of-the-a100-gpu-with-multi-instance-gpu/) introduced with the A100 GPU, so we can configure a ClearML agent deployed on a machine equipped with an A100 to count as an arbitrary number of workers and dispatch tasks in the queue to multiple GPU instances of the same machine?

  
  
Posted one year ago

Answers 7


for a GPU with more than 16GB of GPU memory and less than 40GB, so sometimes we need to provision an A100 to get the training speed we want but we don't use all the GPU memory

Oh that makes sense...
Just saw this one, this might help?
https://www.globenewswire.com/news-release/2022/10/24/2539924/0/en/ClearML-and-Genesis-Cloud-Announce-New-MLOps-Partnership-Delivering-100-Green-Energy-Compute-Solution-for-Machine-Learning.html

  
  
Posted one year ago

I think it's supposed to be out early Nov 🙂

  
  
Posted one year ago

Hi FierceHamster54
This is already supported. Unfortunately, the open-source version only supports static allocation (i.e. you can spin up multiple agents and connect each one to a specific number of GPUs). The dynamic option (where a single agent allocates jobs to multiple GPUs / slices) is only part of the enterprise edition.
(there is the hidden assumption there that if you spent so much on a DGX you are probably not a small team 🙂 )
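For illustration, the static allocation described above could be scripted roughly as below. This is only a sketch: it builds one `clearml-agent daemon` command per GPU index without running anything. The helper name `build_agent_commands` and the queue name `gpu_queue` are made up for the example, and the `--queue`, `--gpus` and `--detached` flags should be checked against `clearml-agent daemon --help` for your version. It does not cover MIG slices.

```python
import shlex


def build_agent_commands(gpu_ids, queue="default"):
    """Return one `clearml-agent daemon` command (as an argv list) per GPU index.

    Hypothetical helper for illustration: each agent is pinned to a single GPU,
    which is the static allocation described in the answer above.
    """
    commands = []
    for gpu_id in gpu_ids:
        commands.append([
            "clearml-agent", "daemon",
            "--queue", queue,        # queue the agent pulls tasks from (assumed name)
            "--gpus", str(gpu_id),   # pin this agent to one GPU index
            "--detached",            # run the agent in the background
        ])
    return commands


if __name__ == "__main__":
    # Print the commands instead of executing them, so they can be reviewed first.
    for cmd in build_agent_commands([0, 1], queue="gpu_queue"):
        print(shlex.join(cmd))
```

Each printed line can then be run by hand on the machine, giving two workers that pull from the same queue, each tied to its own GPU.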

  
  
Posted one year ago

Hey, I'm a SaaS user on the PRO tier and I was wondering if this feature is available in the auto-scaler apps, so I could improve the cost-efficiency of my provisioned GCP A100 instances

  
  
Posted one year ago

There is a gap in the GPU offering on GCP: there is no modern middle ground for a GPU with more than 16GB of GPU memory and less than 40GB, so sometimes we need to provision an A100 to get the training speed we want but we don't use all the GPU memory. So I figured that if we could batch 2 training tasks on the same A100 instance, we would still come out ahead in terms of CUDA cores and get the most out of the GPU time we're paying for.

  
  
Posted one year ago

I could improve the cost-efficiency of my provisioned GCP A100 instances

But their pricing is linear; if you do not need an A100, why not get a cheaper instance?

  
  
Posted one year ago

Oh wow, would definitely try it out if there were an Autoscaler App integrating it with ClearML

  
  
Posted one year ago
577 Views