Answered
Hi I'm Trying To Setup ClearML Serving However The

Hi

I'm trying to set up ClearML Serving, but the clearml-serving-triton container keeps failing to start.

I've attached the full docker-compose up log and the docker-compose.yml I used.

Here are the main error snippets:

clearml-serving-triton        | Starting server: ['tritonserver', '--model-control-mode=poll', '--model-repository=/models', '--repository-poll-secs=60.0', '--metrics-port=8002', '--allow-metrics=true', '--allow-gpu-metrics=true']
clearml-serving-triton        | Traceback (most recent call last):
clearml-serving-triton        |   File "clearml_serving/engines/triton/triton_helper.py", line 588, in <module>
clearml-serving-triton        |     main()
clearml-serving-triton        |   File "clearml_serving/engines/triton/triton_helper.py", line 580, in main
clearml-serving-triton        |     helper.maintenance_daemon(
clearml-serving-triton        |   File "clearml_serving/engines/triton/triton_helper.py", line 274, in maintenance_daemon
clearml-serving-triton        |     raise ValueError("triton-server process ended with error code {}".format(error_code))
clearml-serving-triton        | ValueError: triton-server process ended with error code 1
clearml-serving-triton        | Error: Failed to initialize NVML
clearml-serving-triton        | W0401 15:36:36.685770 47 metrics.cc:571] DCGM unable to start: DCGM initialization error

This is my NVIDIA-SMI output.
This is nvcc -V output.

▶ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0

I don't know what is causing the problem.
Any help would be appreciated! Thanks!
image

  
  
Posted 5 days ago

Answers 3


Hi SuperficialDolphin93
The error suggests that NVML fails to initialize inside the container. You can test this by running nvidia-smi inside the container and checking whether it works.
Regarding the CUDA version: ClearML Serving inherits from the Triton container. Could you try building a new one with the latest Triton container (I think 25)? The docker compose is in the clearml-serving git repo. wdyt?
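
For the nvidia-smi check, something like the following could work. This is only a minimal sketch, not part of clearml-serving: the function names check_gpu_tool and main are made up for illustration, and it assumes nvidia-smi is expected on the PATH of whatever environment you run it in (the host or the container).

```python
import shutil
import subprocess
import sys


def check_gpu_tool(cmd=("nvidia-smi",), timeout=30):
    """Run a GPU query command and report whether it succeeded.

    Returns a tuple (ok, output). `ok` is True only if the command exists
    and exits with status 0. Hypothetical helper, for illustration only.
    """
    if shutil.which(cmd[0]) is None:
        return False, f"{cmd[0]} not found on PATH"
    try:
        result = subprocess.run(
            list(cmd), capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return False, f"{cmd[0]} timed out after {timeout}s"
    # nvidia-smi reports NVML failures in its output with a non-zero exit code.
    output = (result.stdout + result.stderr).strip()
    return result.returncode == 0, output


def main():
    """Print the command output and return a shell-style exit code."""
    ok, output = check_gpu_tool()
    print(output, file=sys.stdout if ok else sys.stderr)
    return 0 if ok else 1
```

Calling main() on the host and then inside the container should show whether the GPU is visible in both places. If you only want the quick check, running docker exec clearml-serving-triton nvidia-smi does the same thing, assuming the container name matches the prefix in your log and the container stays up long enough.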

  
  
Posted 3 days ago

Thanks AgitatedDove14
Works well

  
  
Posted 14 hours ago

Hi,
My conclusion is that these errors are probably caused by a CUDA version mismatch.
The latest clearml-serving-triton Docker container is built for CUDA 11.7,
while my machine is configured with CUDA 12.X.

Are there any plans to support ClearML serving with CUDA 12.X in the near future?
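
To compare versions without reading them by eye, a small parser for the nvcc -V output could help. This is a rough sketch: parse_nvcc_release is a hypothetical helper name, and the sample text is the nvcc output posted earlier in this thread. Note that nvcc reports the installed toolkit version, which can differ from the driver-supported CUDA version that nvidia-smi shows, so treat the result as one data point rather than proof of a mismatch.

```python
import re

# nvcc -V output as posted in the question above.
SAMPLE_NVCC_OUTPUT = """\
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
"""


def parse_nvcc_release(nvcc_output):
    """Return (major, minor) from `nvcc -V` text, or None if no release is found."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


toolkit_version = parse_nvcc_release(SAMPLE_NVCC_OUTPUT)
```

With the output above, toolkit_version is (11, 5), which can then be set against the CUDA version the container image was built for.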

  
  
Posted 3 days ago