Answered
Has anyone had experience setting up ClearML k8s-based agents to create k8s jobs connected to the node's GPU?

Hey!
Has anyone had experience setting up ClearML k8s-based agents to create k8s jobs that are connected to the node's GPU?
We're running k3s on a local server.
Thanks, as this is currently blocking us.

  
  
Posted 15 days ago

Answers 4


Hi @HungryFrog27, what seems to be the issue?

  
  
Posted 14 days ago

Hi @CostlyOstrich36,
I tried setting requests and limits in the clearml-agent Helm chart values, under the k8sGlue configuration, to force the pods to pick up the GPU from the server. I also chose a CUDA-enabled pod image for the k8s jobs (we're using nvidia/cuda:12.4.1 for testing).

The job is created, but it simply can't detect a GPU. Attaching the value overrides I'm using for the chart:

agentk8sglue:
  apiServerUrlReference: "http://<server-ip>:30008"
  fileServerUrlReference: "http://<server-ip>:30081"
  webServerUrlReference: "http://<server-ip>:30080"
  queue: "qubo-emulator"
  replicaCount: 1
  basePodTemplate:
    resource:
      requests:
        cpu: "2"
        memory: "4Gi"
        nvidia.com/gpu: "1"
      limits:
        cpu: "2"
        memory: "4Gi"
        nvidia.com/gpu: "1"
clearml:
  agentk8sglueKey: "<key>"
  agentk8sglueSecret: "<secret>"
sessions:
  svcType: "NodePort"
  startingPort: 30000
  maxServices: 20
  externalIP: "<node's IP>"
  
  
Posted 14 days ago

@HungryFrog27 have you installed the NVIDIA GPU Operator so that GPUs are advertised to Kubernetes?
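
One way to check whether the node actually advertises a GPU, independently of ClearML, is to run a throwaway pod that requests `nvidia.com/gpu` and calls `nvidia-smi`. The manifest below is only a sketch: the pod name `gpu-smoke-test` and the full image tag are assumptions (the thread only mentions nvidia/cuda:12.4.1), and the commented-out `runtimeClassName` line is something some k3s setups may need, not something confirmed in this thread.

```yaml
# Hypothetical smoke test, not part of the ClearML chart.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test            # assumed name, pick any
spec:
  restartPolicy: Never            # run once, then stop
  # runtimeClassName: nvidia      # assumption: may be required on some k3s installs
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04   # assumed tag of the CUDA image
      command: ["nvidia-smi"]     # prints the visible GPUs, then exits
      resources:
        limits:
          nvidia.com/gpu: 1       # schedulable only if the node advertises this resource
```

If this pod stays Pending with an "Insufficient nvidia.com/gpu" scheduling event, the node is not advertising the GPU yet, which points at the device plugin or GPU Operator rather than at the agent configuration.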

  
  
Posted 13 days ago

Not yet, I tried making it work manually. Might give it a try, thanks!

  
  
Posted 13 days ago