Hi
I'm trying to set up ClearML Serving, but the clearml-serving-triton container
keeps failing to start.
I've attached the full docker-compose up log and the docker-compose.yml I used.
Here are the main error snippets:
clearml-serving-triton | Starting server: ['tritonserver', '--model-control-mode=poll', '--model-repository=/models', '--repository-poll-secs=60.0', '--metrics-port=8002', '--allow-metrics=true', '--allow-gpu-metrics=true']
clearml-serving-triton | Traceback (most recent call last):
clearml-serving-triton | File "clearml_serving/engines/triton/triton_helper.py", line 588, in <module>
clearml-serving-triton | main()
clearml-serving-triton | File "clearml_serving/engines/triton/triton_helper.py", line 580, in main
clearml-serving-triton | helper.maintenance_daemon(
clearml-serving-triton | File "clearml_serving/engines/triton/triton_helper.py", line 274, in maintenance_daemon
clearml-serving-triton | raise ValueError("triton-server process ended with error code {}".format(error_code))
clearml-serving-triton | ValueError: triton-server process ended with error code 1
clearml-serving-triton | Error: Failed to initialize NVML
clearml-serving-triton | W0401 15:36:36.685770 47 metrics.cc:571] DCGM unable to start: DCGM initialization error
This is my NVIDIA-SMI output.
This is the nvcc -V output:
▶ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
I don't know what is causing the problem.
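In case it helps narrow things down, here is a minimal sketch of a check that could be run both on the host and inside the clearml-serving-triton container to compare results. It only calls `nvidia-smi -L` and reports whether it succeeded. The function name `gpu_visible` is made up for illustration. My assumption (not confirmed) is that if it succeeds on the host but fails inside the container, the "Failed to initialize NVML" error comes from the container not having GPU access rather than from Triton itself.

```python
import shutil
import subprocess


def gpu_visible(cmd="nvidia-smi"):
    """Return (ok, message); ok is True if `<cmd> -L` runs and exits with code 0.

    `gpu_visible` is a hypothetical helper name, not part of ClearML or Triton.
    """
    path = shutil.which(cmd)
    if path is None:
        # The binary is not installed or not on PATH in this environment.
        return False, f"{cmd} not found on PATH"
    try:
        # `nvidia-smi -L` lists the GPUs the driver can see.
        result = subprocess.run(
            [path, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, f"{cmd} could not be run: {exc}"
    # Prefer stdout; fall back to stderr when stdout is empty.
    output = (result.stdout or result.stderr).strip()
    return result.returncode == 0, output


if __name__ == "__main__":
    ok, message = gpu_visible()
    print("GPU visible:" if ok else "GPU NOT visible:", message)
```

Running it in both places and comparing the two messages should at least show whether the driver is reachable from inside the container.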
Any help would be appreciated! Thanks!
