Hi @<1769534182561681408:profile|ReassuredFrog10>, do you have a GPU available? If not, maybe try the other docker compose file, the one without Triton, since the Triton setup is specifically built for GPU inference.
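For reference, a minimal sketch of how the two setups are usually brought up, assuming the file and env names shipped in the clearml-serving repository (verify against your checkout):

```
# Minimal sketch, assuming the standard clearml-serving repo layout.
# CPU-only serving (no Triton engine):
docker compose --env-file example.env -f docker/docker-compose.yml up -d

# Triton with GPU support (needs an NVIDIA GPU and the NVIDIA container toolkit):
docker compose --env-file example.env -f docker/docker-compose-triton-gpu.yml up -d
```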
Hi everyone,
I’m encountering a CPU bottleneck while performing inference with ClearML Serving and am hoping to get some assistance.
Setup: I have successfully deployed a ClearML Server and configured ClearML Serving following the instructions provided here: ClearML Serving Setup. I'm specifically using the docker-compose-triton.yml file, as I'm working with ONNX models.
Issue: During inference on ClearML Serving, I’ve noticed that only a single CPU core is being utilized, while the remaining cores remain idle. This causes the inference to time out each time. Is there a way to distribute the workload across multiple CPU cores to improve performance?
Thanks in advance for any help or suggestions!
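For context, Triton itself exposes per-model CPU parallelism through instance_group in the model's config.pbtxt. A minimal sketch, where the model name, backend, and instance count are illustrative and not taken from the thread:

```
# Minimal sketch of a Triton model config enabling multiple CPU instances.
# Name, platform, and count are illustrative; adjust to the actual ONNX model.
name: "my_onnx_model"
platform: "onnxruntime_onnx"
max_batch_size: 8
instance_group [
  {
    count: 4        # run 4 copies of the model, each pinned to CPU
    kind: KIND_CPU
  }
]
```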