Hi @<1769534182561681408:profile|ReassuredFrog10>, do you have a GPU available? If not, maybe try the other docker compose file, the one without Triton, since the Triton setup is specifically built for GPU inference.
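For reference, a minimal sketch of how the two setups are usually brought up, assuming the file and env names shipped in the clearml-serving repository (verify against your checkout):

```
# Minimal sketch, assuming the standard clearml-serving repo layout.
# CPU-only serving (no Triton engine):
docker compose --env-file example.env -f docker/docker-compose.yml up -d

# Triton with GPU support (needs an NVIDIA GPU and the NVIDIA container toolkit):
docker compose --env-file example.env -f docker/docker-compose-triton-gpu.yml up -d
```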
Hi everyone,
I’m encountering a CPU bottleneck while performing inference with ClearML Serving and am hoping to get some assistance.
Setup: I have successfully deployed a ClearML Server and configured ClearML Serving following the instructions provided here: ClearML Serving Setup. I'm specifically using the docker-compose-triton.yml file, as I'm working with ONNX models.
Issue: During inference on ClearML Serving, I’ve noticed that only a single CPU core is being utilized, while the remaining cores remain idle. This causes the inference to time out each time. Is there a way to distribute the workload across multiple CPU cores to improve performance?
Thanks in advance for any help or suggestions!
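For context, Triton itself exposes per-model CPU parallelism through instance_group in the model's config.pbtxt. A minimal sketch, where the model name, backend, and instance count are illustrative and not taken from the thread:

```
# Minimal sketch of a Triton model config enabling multiple CPU instances.
# Name, platform, and count are illustrative; adjust to the actual ONNX model.
name: "my_onnx_model"
platform: "onnxruntime_onnx"
max_batch_size: 8
instance_group [
  {
    count: 4        # run 4 copies of the model, each pinned to CPU
    kind: KIND_CPU
  }
]
```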