Hey @<1523701070390366208:profile|CostlyOstrich36>, thanks for the suggestion!
Yes, I did manually run the same code on the worker node (e.g., using python3 llm_deployment.py), and it successfully utilized the GPU as expected.
What I’m observing is that when I deploy the workload directly on the worker node like that, everything works fine: the task picks up the GPU, logs stream back properly, and execution behaves normally.
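For reference, the direct run is just something like this (watching nvidia-smi in a second shell is simply how I confirm utilization):
```bash
# Run directly on the GPU worker node -- this works as expected
python3 llm_deployment.py

# In a second shell, watch GPU memory/utilization while the script runs
watch -n 1 nvidia-smi
```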
However, when I submit the same code using clearml-task from the control node (which schedules it to the same GPU-enabled worker), the task starts and even detects the GPU (e.g., it sees cuda:0), but doesn’t actually utilize it.
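The submission looks roughly like this (project, task name, and queue are placeholders rather than my exact values):
```bash
# Run from the control node; the queue is served by the GPU-enabled worker
clearml-task \
  --project llm_project \
  --name llm_deployment \
  --script llm_deployment.py \
  --queue gpu_queue
```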
Let me know if I might be missing something in the configuration. Really appreciate the help!