this seems to be confirmed by this documentation: "If you have not changed the default runtime on your GPU nodes, you must explicitly request the NVIDIA runtime by setting `runtimeClassName: nvidia` in the Pod spec."
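For context, a minimal sketch of what that docs passage describes, assuming the `nvidia` RuntimeClass already exists on the cluster (the Pod name and image below are placeholders, not from the original thread):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test                  # hypothetical name, for illustration only
spec:
  runtimeClassName: nvidia        # explicitly request the NVIDIA container runtime
  containers:
    - name: cuda
      image: nvcr.io/nvidia/cuda:12.3.1-base-ubuntu22.04  # example image
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1       # request one GPU from the device plugin
```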
Hello AntsyElk37 🙂
You are right, the `spec.runtimeClassName` field is not supported in the Agent at the moment; I'll work on your Pull Request ASAP.
Could you elaborate a bit on why you need Task Pods to specify the runtime class to use GPUs?
Usually, you'd need to specify a Pod's container with, for example, `resources.limits.nvidia.com/gpu: 1`, and the NVIDIA Device Plugin would itself assign the correct device to the container. Will that work?
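In Pod-spec terms, that suggestion would look roughly like the fragment below, without any `runtimeClassName`; the container name and image are placeholders I'm assuming for illustration:

```yaml
containers:
  - name: trainer                 # hypothetical container name
    image: nvcr.io/nvidia/cuda:12.3.1-base-ubuntu22.04  # example image
    resources:
      limits:
        nvidia.com/gpu: 1         # the device plugin schedules and exposes the GPU
```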
I'm still trying to understand why it was needed in our case. I have the NVIDIA GPU Operator installed with mostly the default values on our on-prem cluster. I found there is an option, CONTAINERD_SET_AS_DEFAULT, in the operator which, when enabled, makes the NVIDIA runtime the default for all Pods. We didn't enable that option; maybe if we had enabled it, it would have worked.
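If I remember the GPU Operator Helm chart correctly, that option is passed to the container toolkit via an env entry, something like the sketch below (the exact values structure may differ between chart versions, so treat this as an assumption to verify against your chart):

```yaml
# values.yaml fragment for the NVIDIA GPU Operator Helm chart (sketch,
# assuming the option is exposed under toolkit.env)
toolkit:
  env:
    - name: CONTAINERD_SET_AS_DEFAULT
      value: "true"   # make the NVIDIA runtime the containerd default for all Pods
```

With that default in place, Pods wouldn't need `runtimeClassName: nvidia` at all, which would explain why only clusters without it have to set the field explicitly.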