So instead of updating the GPU drivers, can we install a lower, compatible version of CUDA inside the docker container for clearml-serving?
Also, when I checked the log file I found this:
agent.default_docker.image = nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
agent.enable_task_env = false
agent.git_user =
agent.default_python = 3.8
agent.cuda_version = 112
This might be a dumb question, but I'm confused about which CUDA version is being installed here: is it 10.1 (from the first line) or 11.2 (from the last line)?
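As a sanity check I can run something like the sketch below to see which toolkit actually ends up inside the container vs. what the host driver supports (just a sketch, assuming Docker is available on the host and the image above is the one the agent uses; the version.txt path is what NVIDIA's CUDA 10.x images ship with, not verified for other tags):

```bash
# CUDA toolkit baked into the agent's default image (image name taken from the conf excerpt above)
docker run --rm nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04 cat /usr/local/cuda/version.txt

# Highest CUDA version the host driver supports (shown in the nvidia-smi banner)
nvidia-smi
```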
I already shared the log from the UI; anyway, I'm sharing the log for a recently tried experiment, please find the attachment.
Tried installing the latest clearml-serving from git, but still no luck, the same error persists.
I have attached both the serving service and serving engine (Triton) console logs from the clearml-server, please have a look at them.
By default clearml-serving is installing Triton version 21.03; can we somehow override this to install some other version? I tried to configure it but could not find anything related to tritonserver in the clearml.conf file, so can you please guide me on this?
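The only related knob I could find is the agent's default docker image; for illustration, assuming the Triton engine task inherits it, an override in clearml.conf might look like the sketch below (the tritonserver tag is just a placeholder, not something I've verified against clearml-serving):

```
agent {
    default_docker: {
        # Placeholder example: point the agent at an NVIDIA Triton image
        # instead of the plain CUDA runtime image. The tag would have to
        # match whatever the host driver / CUDA setup can actually run.
        image: "nvcr.io/nvidia/tritonserver:21.03-py3"
    }
}
```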
The server I'm using has GPU driver version 455.23.05 and CUDA version 11.1; Conda is also installed. clearml-serving is installing CUDA 10.1, for which the GPU driver should be >= 418.39, so I guess a version mismatch is not the problem, and currently I can't update the GPU drivers since other processes are running.
And also I tried overriding the clearml.conf file and changed the default docker image by modifying the line below:
image: "nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04"
to this:
imag...
This solved the tritonserver not found issue, but now a new error is occurring: UNAVAILABLE: Internal: unable to create stream: the provided PTX was compiled with an unsupported toolchain
Please check the attached log file for the complete console log.
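As far as I understand, that PTX error usually means the CUDA toolkit inside the container is newer than what the host driver supports, so a quick comparison like the sketch below could confirm it (<serving-image> is a placeholder for whatever image the engine task actually runs in, and CUDA_VERSION is only set if the image defines it, as NVIDIA's CUDA base images do):

```bash
# Host side: driver version and the highest CUDA version it supports
nvidia-smi --query-gpu=driver_version --format=csv,noheader
nvidia-smi | head -n 4   # banner shows "CUDA Version: <max supported>"

# Container side: the CUDA toolkit the image ships with
docker run --rm <serving-image> bash -c 'echo $CUDA_VERSION'
```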
And also I am facing an issue while initializing the serving server and Triton engine using the below two commands:
clearml-serving triton --project "serving" --name "serving ex1"
clearml-serving triton --endpoint "inference" --model-project "ser...
Hi AgitatedDove14, thanks for the reply!
It's not the same issue that you just pointed to; in fact, the issue is raised after launching inference onto the queue using the commands below:
clearml-serving triton --project "serving" --name "serving example"
clearml-serving triton --endpoint "keras_mnist" --model-project "examples" --model-name "Keras MNIST serve example - serving_model"
clearml-serving launch --queue default
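Once the launch task is actually running, a basic liveness check against the Triton engine could look like this (a sketch assuming Triton's default HTTP port 8000 on whatever machine the engine task landed on; these are the standard Triton v2 endpoints, not anything clearml-serving specific):

```bash
# Returns HTTP 200 once the Triton server is up and ready
curl -v http://localhost:8000/v2/health/ready

# Lists the models Triton has loaded (standard v2 repository API)
curl -s -X POST http://localhost:8000/v2/repository/index
```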
Yeah, uncommenting that line worked. Thanks for the help AgitatedDove14 🙂