Hi Martin, I admit I don't know about MIG I'll have to ask some of our engineers.
As for the memory, yes the reasoning is clear, the main thing we'll have to see is hot define the limits, because we have nodes with quite different resources available, and this might get tricky, but I'll try and let's see what happens 🙂
We actually plan to create different queues for different types of workloads, we are a bit seeing what the actual usage is to define what type of workloads make sense for us.
Containers (and Pods) do not share GPUs. There's no overcommitting of GPUs.
Actually I am as well, this is Kubernets doing the resource scheduling and actually Kubernetes decided it is okay to run two pods on the Same GPU, which is cool, but I was not aware Nvidia already added this feature (I know it was in beta for a long time)
https://developer.nvidia.com/blog/improving-gpu-utilization-in-kubernetes/
I also see thety added dynamic slicing and Memory Proteciton:
Notice you can control the number of pods per GPU
This mechanism for enabling “time-sharing” of GPUs in Kubernetes allows a system administrator to define a set of “replicas” for a GPU, each of which can be handed out independently to a pod to run workloads on. Unlike MIG, there is no memory or fault-isolation between replicas, but for some workloads this is better than not being able to share at all. Internally, GPU time-slicing is used to multiplex workloads from replicas of the same underlying GPU.
https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/gpu-sharing.html#introduction
https://github.com/NVIDIA/k8s-device-plugin#shared-access-to-gpus-with-cuda-time-slicing
Lastly are you using MIG enabled devices? If so you can limit the memory per shared Pod:
https://github.com/NVIDIA/k8s-device-plugin/blob/e2b4ff39b5b4cebe702c8aa102b914b03f6eb81d/README.md#configuration-option-details
Back to the original remark SarcasticSquirrel56 limiting the Pod allocation can also be done via the general k8s "memory limit" requirement, which will take place on top of the GPU plugin, so essentially let's say we have a node with 100GB RAM we can have a pod limit of 25GB, which means no more than 4 pods will be running on the same node.
Does that make sense ?
Hi Martin, thanks for the explanation! I work with Maggie and help with the ClearML setup.
Just to be sure, currently, the PodTemplate contains:
resources: limits: nvidia.com/gpu: 1
you are suggesting to add also, something like:requests: memory: "100Mi" limits: memory: "200Mi"
is that correct?
On a related note, I am a bit puzzled by the fact that all the 4 GPUs are visible.
In the https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/ , it says:Containers (and Pods) do not share GPUs. There's no overcommitting of GPUs.
so I am surprised that this situation happens.
Oh that makes sense, This depends on how you setup the clearml k8s glue, (becuase the resource allocation is done by k8s) a good hack to limit the number of containers per GPU is to set a RAM limitation per pod, then k8s will know to limit the number of pods on the same GPU machine,
wdty?
We actually plan to create different queues for different types of workloads, we are a bit seeing what the actual usage is to define what type of workloads make sense for us.
That sounds like a great path to take, it will make it very clear fro users on what they will be getting and why they should use specific queue.
As for the memory, yes the reasoning is clear, the main thing we'll have to see is hot define the limits, because we have nodes with quite different resources available, and this might get tricky, but I'll try and let's see what happens
Not sure if it helps, but I think this resource limitation (or book-keeping if you will) is one of the advanced feature of the enterprise version, But I would probably start with something simple just to get going before jumping to it.
DefeatedMoth52 how many agents do you have running on the same GPU ?
AgitatedDove14 Since the agents are running on the server and set-up by k8s, more than 1 agent can run on the same GPU. So far I’ve tried running up to 4 concurrent tasks. This means if they all get cloned to use the same GPU, max agent would be 4.