Hi @ReassuredFrog10, do you have a GPU available? Maybe try the other docker compose file, the one without Triton, since the Triton one is specifically built for GPU inference.
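If it helps, switching compose files would look roughly like this, assuming the layout of the clearml-serving repo where the compose files and example.env live under docker/ (the exact paths and file names here are assumptions on my side, adjust to your checkout):

```
# stop the Triton-based stack (sketch; paths assumed from the clearml-serving repo layout)
cd clearml-serving/docker
docker-compose --env-file example.env -f docker-compose-triton.yml down

# bring up the plain serving stack without Triton
docker-compose --env-file example.env -f docker-compose.yml up -d
```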
Hi everyone,
I’m encountering a CPU bottleneck while performing inference with ClearML Serving and am hoping to get some assistance.
Setup: I have successfully deployed a ClearML Server and configured ClearML Serving following the instructions provided here: ClearML Serving Setup. I'm specifically using the docker-compose-triton.yml file, as I'm working with ONNX models.
Issue: During inference with ClearML Serving, I've noticed that only a single CPU core is being utilized while the remaining cores sit idle, which causes every inference request to time out. Is there a way to distribute the workload across multiple CPU cores to improve performance?
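For reference, my understanding is that Triton itself can run several copies of a model on CPU via an instance_group entry in the model's config.pbtxt, which is the kind of multi-core parallelism I'm after. A sketch of what I mean (values are examples only, and I haven't worked out how to pass this through ClearML Serving):

```
# Triton model config sketch for an ONNX model (example values only)
platform: "onnxruntime_onnx"
max_batch_size: 8

# run several model instances on CPU so requests can be handled in parallel
instance_group [
  {
    count: 4        # number of parallel CPU instances (assumed value)
    kind: KIND_CPU
  }
]
```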
Thanks in advance for any help or suggestions!