Answered
Hello! I had trouble running clearml-agent on K8s. I fixed it by modifying the Helm chart to allow specifying runtimeClassName (which is needed when using the NVIDIA GPU Operator).

Hello! I had trouble running clearml-agent on K8s. I fixed it by modifying the Helm chart to allow specifying runtimeClassName (which is needed when using the NVIDIA GPU Operator). I did this: None. It's trivial. Should I do anything more than this? Is anybody else running the ClearML Agent on a Kubernetes cluster with the NVIDIA GPU Operator?
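For reference, the change is roughly the following sketch (the values key name and template placement are illustrative only, not the chart's actual structure):

    # values.yaml (hypothetical key, for illustration only)
    taskPod:
      runtimeClassName: nvidia

    # pod template snippet (sketch): render the field only when a value is set
    spec:
      {{- with .Values.taskPod.runtimeClassName }}
      runtimeClassName: {{ . }}
      {{- end }}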

  
  
Posted 5 months ago

Answers 6


Hello @<1523708147405950976:profile|AntsyElk37> 🙂
You are right, the spec.runtimeClassName field is not supported in the Agent at the moment; I'll work on your Pull Request ASAP.
Could you elaborate a bit on why you need Task Pods to specify the runtimeClass in order to use GPUs?
Usually, you'd only need to set a resource limit on the Pod's container, for example resources.limits with nvidia.com/gpu: 1, and the NVIDIA Device Plugin would itself assign the correct device to the container. Will that work?
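Something like this minimal sketch (pod name and image are illustrative, not taken from the ClearML chart):

    apiVersion: v1
    kind: Pod
    metadata:
      name: gpu-task                 # illustrative name
    spec:
      containers:
        - name: task
          image: nvidia/cuda:12.2.0-base-ubuntu22.04   # any CUDA-enabled image
          resources:
            limits:
              nvidia.com/gpu: 1      # the NVIDIA Device Plugin assigns a GPU to this container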

  
  
Posted 5 months ago

This seems to be confirmed by this documentation: None. "If you have not changed the default runtime on your GPU nodes, you must explicitly request the NVIDIA runtime by setting runtimeClassName: nvidia in the Pod spec."
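In practice that looks roughly like this (a sketch, assuming the GPU Operator created a RuntimeClass named nvidia):

    apiVersion: v1
    kind: Pod
    metadata:
      name: gpu-task                 # illustrative name
    spec:
      runtimeClassName: nvidia       # explicitly request the NVIDIA container runtime
      containers:
        - name: task
          image: nvidia/cuda:12.2.0-base-ubuntu22.04
          resources:
            limits:
              nvidia.com/gpu: 1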

  
  
Posted 5 months ago

Hi @<1729671499981262848:profile|CooperativeKitten94>, did I convince you with my argument? Do you think having runtimeClass configurable is worth it?

  
  
Posted 5 months ago

I'm still trying to understand why it was needed in our case. I have the NVIDIA GPU Operator installed with mostly default values on our on-prem cluster. I found there is an option, CONTAINERD_SET_AS_DEFAULT, in the operator which, when enabled, makes the NVIDIA runtime the default for all Pods. We didn't enable that option; maybe if we had, it would have worked without setting runtimeClassName.
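If I read the GPU Operator docs correctly, that option is set through the toolkit environment in the operator's Helm values, roughly like this (a sketch; verify against your operator/chart version):

    # gpu-operator Helm values (sketch)
    toolkit:
      env:
        - name: CONTAINERD_SET_AS_DEFAULT
          value: "true"              # make the NVIDIA runtime the default containerd runtime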

  
  
Posted 5 months ago

Hi @<1523708147405950976:profile|AntsyElk37> - There are a few points missing for the PR to be completed; let's follow up on GitHub. See my comments here: None

  
  
Posted 5 months ago

Hi @<1523708147405950976:profile|AntsyElk37> - Yes, having the runtimeClass configurable makes sense. I'll be handling your PR soon 🙂

  
  
Posted 5 months ago