Hey!
Has anyone had experience with setting up ClearML K8s-based agents to create K8s jobs connected to the node's GPU?
We're running K3s on a local server.
Thanks, as this is currently blocking us.
Hi @<1523701070390366208:profile|CostlyOstrich36> ,
I tried setting requests & limits under the k8sGlue configuration in the clearml-agent Helm chart values in order to force the pods to pick up the GPU from the server, while of course choosing a pod image for the K8s jobs that includes GPU support (we're using nvidia/cuda:12.4.1 for testing).
The job is created, but it simply can't detect a GPU. Attaching the value overrides I'm using for the chart:
agentk8sglue:
  apiServerUrlReference: "http://<server-ip>:30008"
  fileServerUrlReference: "http://<server-ip>:30081"
  webServerUrlReference: "http://<server-ip>:30080"
  queue: "qubo-emulator"
  replicaCount: 1
  basePodTemplate:
    resource:
      requests:
        cpu: "2"
        memory: "4Gi"
        nvidia.com/gpu: "1"
      limits:
        cpu: "2"
        memory: "4Gi"
        nvidia.com/gpu: "1"

clearml:
  agentk8sglueKey: "<key>"
  agentk8sglueSecret: "<secret>"

sessions:
  svcType: "NodePort"
  startingPort: 30000
  maxServices: 20
  externalIP: "<node's IP>"
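
For reference, a sketch of an override worth comparing against the one above, with two assumptions called out: the standard Kubernetes key in a pod template is resources (plural), so if the chart follows that convention the resource: block may be silently ignored and the GPU request never reaches the pod spec; and on K3s with the NVIDIA container toolkit the job pod usually also needs the nvidia RuntimeClass, which only applies if the chart's basePodTemplate passes that field through to the generated pod:

agentk8sglue:
  basePodTemplate:
    # Assumption: "resources" (plural), matching a standard pod spec;
    # check the chart's values.yaml for the exact key it expects.
    resources:
      requests:
        cpu: "2"
        memory: "4Gi"
      limits:
        cpu: "2"
        memory: "4Gi"
        # Extended resources such as GPUs only need to appear under limits.
        nvidia.com/gpu: "1"
    # Assumption: the K3s node has the NVIDIA container toolkit and device
    # plugin installed, exposing an "nvidia" RuntimeClass for pods to use.
    runtimeClassName: nvidia

Independently of the chart, the node has to advertise the GPU before Kubernetes can schedule it: kubectl describe node <node> should list nvidia.com/gpu under Capacity/Allocatable. If it doesn't, the NVIDIA device plugin isn't running on the K3s node and no chart-level override will make the job see a GPU.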