BattyLizard6 to my knowledge, the main issue with fractional GPUs is that there is no real restriction on GPU memory allocation (with the exception of MIG slices, which are limited in other ways).
Basically, one process/container can consume all of the GPU RAM on the allocated card (this also includes the http://run.ai fractional solution, at least from what I understand).
This means that developer A can allocate so much memory that developer B, on the same GPU, will start getting out-of-memory errors.
(Note that a few k8s solutions let you request a specific amount of GPU RAM, but at runtime there is no actual enforcement.)
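To make the point above concrete, here is a minimal toy model in plain Python (no real GPU involved). The `SharedGpu` class, the `enforce_requests` switch and the memory sizes are all hypothetical and only illustrate the difference between a request that is checked at scheduling time and a limit that is enforced at runtime; it does not reflect how any specific k8s plugin or run.ai is implemented.

```python
from dataclasses import dataclass, field


class GpuOutOfMemory(RuntimeError):
    """Raised when a simulated allocation cannot be served."""


@dataclass
class SharedGpu:
    """Toy model of one physical card shared by several jobs (hypothetical)."""
    total_mb: int
    enforce_requests: bool = False  # assumption: False mimics "no runtime restriction"
    requested: dict = field(default_factory=dict)
    used: dict = field(default_factory=dict)

    def schedule(self, owner, request_mb):
        # Scheduling-time check: the sum of requests must fit on the card.
        if sum(self.requested.values()) + request_mb > self.total_mb:
            raise ValueError("request does not fit on this card")
        self.requested[owner] = request_mb
        self.used[owner] = 0

    def allocate(self, owner, size_mb):
        # Runtime check against the job's own request, only if enforcement is on.
        if self.enforce_requests and self.used[owner] + size_mb > self.requested[owner]:
            raise GpuOutOfMemory(f"{owner} exceeded its own request")
        # The physical limit of the card always applies.
        if sum(self.used.values()) + size_mb > self.total_mb:
            raise GpuOutOfMemory(f"{owner}: card is full")
        self.used[owner] += size_mb


def run_scenario(enforce):
    """Return the list of jobs that hit an out-of-memory error."""
    gpu = SharedGpu(total_mb=16000, enforce_requests=enforce)
    gpu.schedule("dev_a", 8000)
    gpu.schedule("dev_b", 8000)
    failures = []
    # dev_a overshoots its 8000 MB request; dev_b stays well within its own.
    for owner, size_mb in [("dev_a", 14000), ("dev_b", 4000)]:
        try:
            gpu.allocate(owner, size_mb)
        except GpuOutOfMemory:
            failures.append(owner)
    return failures


if __name__ == "__main__":
    print("without enforcement:", run_scenario(False))
    print("with enforcement:   ", run_scenario(True))
```

Without enforcement, the well-behaved job (`dev_b`) is the one that fails, which is the starvation scenario described above; with enforcement, the failure lands on the job that overshot its request.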
So basically development on a "shared" GPU?
We want to have many people working on a cluster of machines, and we want to be able to allocate a fraction of a GPU to specific jobs to avoid starvation.
Sure thing. Any specific reason for asking about multiple pods per GPU?
Is this for a remote development process?
BTW: the funny thing is, on bare-metal machines multi GPU works out of the box, and deploying it with bare-metal clearml-agents is very simple.
Hi BattyLizard6
Does ClearML orchestration have the ability to break GPU devices into virtual ones?
So this is fully supported on A100 with MIG slices. That said, dynamic multi-tenant GPU on Kubernetes is a Kubernetes issue... We do support multiple agents on the same GPU on bare metal, or over shared GPU instances over k8s with:
https://github.com/nano-gpu/nano-gpu-agent
https://github.com/intel/intel-device-plugins-for-kubernetes/tree/main/cmd/gpu_plugin#fractional-resources
https://github.com/NTHU-LSALAB/KubeShare
https://github.com/AliyunContainerService/gpushare-scheduler-extender
Hi, I mean something like what run.ai is doing. Or how would you work together with http://run.ai?
Hi, do you mean out-of-the-box virtualization of your GPU, or using virtual GPUs on the machine?