Hi All! Question Around Resource Management Using


"Containers (and Pods) do not share GPUs. There's no overcommitting of GPUs."
Actually, I am as well. This is Kubernetes doing the resource scheduling, and Kubernetes decided it is okay to run two pods on the same GPU, which is cool, but I was not aware NVIDIA had already added this feature (I know it was in beta for a long time).
https://developer.nvidia.com/blog/improving-gpu-utilization-in-kubernetes/
I also see they added dynamic slicing and memory protection:
Notice you can control the number of pods per GPU (see the config sketch after the links below):

This mechanism for enabling “time-sharing” of GPUs in Kubernetes allows a system administrator to define a set of “replicas” for a GPU, each of which can be handed out independently to a pod to run workloads on. Unlike MIG, there is no memory or fault-isolation between replicas, but for some workloads this is better than not being able to share at all. Internally, GPU time-slicing is used to multiplex workloads from replicas of the same underlying GPU.

https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/gpu-sharing.html#introduction
https://github.com/NVIDIA/k8s-device-plugin#shared-access-to-gpus-with-cuda-time-slicing
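For a concrete picture, the time-slicing setup from the README above is driven by a small config file for the device plugin. A minimal sketch (the replica count of 4 is just an example, and exact fields can vary between plugin versions):

```yaml
# device-plugin config: advertise each physical GPU as 4 schedulable replicas
version: v1
sharing:
  timeSlicing:
    resources:
    - name: nvidia.com/gpu
      replicas: 4   # up to 4 pods can share one GPU; no memory/fault isolation
```

With that applied, the node reports 4x as many nvidia.com/gpu resources, and the scheduler will pack up to four pods onto the same physical GPU.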
Lastly, are you using MIG-enabled devices? If so, you can limit the memory per shared Pod:
https://github.com/NVIDIA/k8s-device-plugin/blob/e2b4ff39b5b4cebe702c8aa102b914b03f6eb81d/README.md#configuration-option-details
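If they are MIG devices, the memory cap comes from requesting a specific MIG slice. A sketch assuming the mixed MIG strategy on an A100 with 1g.5gb profiles (the resource name and image are examples and depend on how the GPU is partitioned):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mig-example
spec:
  restartPolicy: Never
  containers:
  - name: worker
    image: nvidia/cuda:11.8.0-base-ubuntu22.04   # example image
    command: ["nvidia-smi", "-L"]                # just lists the visible MIG device
    resources:
      limits:
        nvidia.com/mig-1g.5gb: 1   # one 1g.5gb slice, i.e. roughly 5 GB of GPU memory
```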

Back to the original remark, SarcasticSquirrel56: limiting the pod allocation can also be done via the general k8s "memory limit" requirement, which takes effect on top of the GPU plugin. So essentially, if we have a node with 100 GB of RAM and give each pod a 25 GB memory limit (and matching request), no more than 4 such pods will be running on the same node.
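A sketch of what that pod spec could look like (pod name and image are placeholders; the scheduler packs pods by their memory requests, so request and limit are set to the same 25 GB here):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-worker
spec:
  containers:
  - name: worker
    image: nvidia/cuda:11.8.0-base-ubuntu22.04   # example image
    resources:
      requests:
        memory: "25Gi"        # scheduler counts requests: at most 4 such pods per 100 GB node
        nvidia.com/gpu: 1
      limits:
        memory: "25Gi"        # container is OOM-killed if it goes above this
        nvidia.com/gpu: 1
```

On a node with roughly 100 GB of allocatable RAM, the scheduler will fit at most four of these, independently of how the GPU itself is shared.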
Does that make sense?

  
  