Answered
Hi ClearML, does ClearML orchestration have the ability to break GPU devices into virtual ones?

Posted 2 years ago

Answers 9


Ok - thanks AgitatedDove14

Posted 2 years ago

What is your use case?

Posted 2 years ago

Hi, do you mean out-of-the-box virtualization of your GPU, or using virtual GPUs on the machine?

Posted 2 years ago

Hi, I mean something like what run.ai is doing - or how would you work together with http://run.ai ?

Posted 2 years ago

Hi BattyLizard6

does clearml orchestration have the ability to break gpu devices into virtual ones?

So this is fully supported on the A100 with MIG slices (see the sketch after the links below). That said, dynamic multi-tenant GPU sharing on Kubernetes is a Kubernetes issue... We do support multiple agents on the same GPU on bare metal, or shared GPU instances over k8s with:
https://github.com/nano-gpu/nano-gpu-agent
https://github.com/intel/intel-device-plugins-for-kubernetes/tree/main/cmd/gpu_plugin#fractional-resources
https://github.com/NTHU-LSALAB/KubeShare
https://github.com/AliyunContainerService/gpushare-scheduler-extender
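For the MIG route mentioned above, here is a rough sketch of carving an A100 into slices on the host (run as root; the profile IDs are assumptions for an A100-40GB and differ per GPU model - list the valid ones with `nvidia-smi mig -lgip`):

```bash
# Enable MIG mode on GPU 0 (may require draining workloads / a GPU reset)
sudo nvidia-smi -i 0 -mig 1

# Create four GPU instances using profile 19 (1g.5gb on an A100-40GB),
# with -C also creating the default compute instance inside each
sudo nvidia-smi mig -cgi 19,19,19,19 -C

# List the resulting MIG devices - each is schedulable like its own GPU
nvidia-smi -L
```

Each slice then shows up as a separate device, so a separate agent (or a k8s device plugin) can be pointed at each one.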

Posted 2 years ago

Sure thing - any specific reason for asking about multiple pods per GPU?
Is this for a remote development process?
BTW: the funny thing is, on bare-metal machines multiple agents sharing a GPU work out of the box, and deploying it with bare-metal clearml-agents is very simple.
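As a minimal sketch of that bare-metal setup (the queue names are made up for illustration), two agent daemons can simply be bound to the same physical GPU:

```bash
# Two agents on one machine, both bound to physical GPU 0.
# Tasks pulled from either queue run concurrently on the same card.
clearml-agent daemon --queue shared_gpu_a --gpus 0 --detached
clearml-agent daemon --queue shared_gpu_b --gpus 0 --detached
```

Note this shares the whole card: nothing partitions the GPU memory between the two tasks, which is exactly the caveat discussed below.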

Posted 2 years ago

We want to have many people working on a cluster of machines, and we want to be able to allocate a fraction of a GPU to specific jobs, to avoid starvation.

Posted 2 years ago

So basically development on a "shared" GPU?

Posted 2 years ago

BattyLizard6 to my knowledge, the main issue with fractional GPUs is that there is no real restriction on GPU memory allocation (with the exception of MIG slices, which are limited in other ways).
Basically, one process/container can consume all the GPU RAM on the allocated card (this also includes the http://run.ai fractional solution, at least from what I understand).
This means that developer A can allocate memory so that developer B on the same GPU will start getting out-of-memory errors.
(Notice that in a few k8s solutions you can request a specific amount of GPU RAM, but at runtime there are no actual restrictions.)
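To make that concrete, here is a hypothetical request against the gpushare scheduler extender linked above (the pod name, image, and 3 GiB figure are made-up examples). The `aliyun.com/gpu-mem` limit drives scheduling-time bookkeeping only; the CUDA runtime inside the container is not stopped from allocating past it:

```bash
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-share-demo
spec:
  containers:
  - name: trainer
    image: nvidia/cuda:11.8.0-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        aliyun.com/gpu-mem: 3   # "reserve" 3 GiB - advisory, not enforced at runtime
EOF
```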

Posted 2 years ago