Answered
Hi Anyone

Hi, has anyone tried clearml-serving? Can you please help me debug the error below when launching a ClearML Triton inference server (using the command "clearml-serving launch --queue default")? The error I get in the clearml-agent log is the following:

```
Update model v4 in /models/keras_mnist/4
Starting server: ['tritonserver', '--model-control-mode=poll', '--model-repository=/models', '--repository-poll-secs=600.0', '--metrics-port=8002', '--allow-metrics=true', '--allow-gpu-metrics=true']
Traceback (most recent call last):
  File "clearml_serving/triton_helper.py", line 214, in <module>
    main()
  File "clearml_serving/triton_helper.py", line 209, in main
    metric_frequency_sec=args.metric_frequency*60.0,
  File "clearml_serving/triton_helper.py", line 117, in maintenance_daemon
    proc = subprocess.Popen(cmd)
  File "/anaconda3/lib/python3.7/subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "/anaconda3/lib/python3.7/subprocess.py", line 1551, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'tritonserver': 'tritonserver'
```

Note: I have installed the required Triton version (it can be seen in the log file). Please check the attached file for the complete log.
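For context, `subprocess.Popen` raises exactly this `FileNotFoundError: [Errno 2]` when the executable is not on the `PATH` of the launching process. A minimal sketch of the failure mode and a pre-flight check (the helper name `launch_if_available` and the placeholder binary name are hypothetical, not the actual `triton_helper.py` code):

```python
import shutil
import subprocess

def launch_if_available(binary, args):
    """Launch `binary` only if it is on PATH; otherwise report clearly.

    subprocess.Popen raises FileNotFoundError ([Errno 2]) when the
    executable cannot be found, which is the error in the log above.
    """
    path = shutil.which(binary)
    if path is None:
        print(f"{binary!r} not found on PATH - is the Triton engine installed in this environment?")
        return None
    return subprocess.Popen([path] + list(args))

# A deliberately nonexistent binary name reproduces the failure mode:
try:
    subprocess.Popen(["definitely-not-tritonserver"])
except FileNotFoundError as e:
    print("Popen failed:", e)
```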

  
  
Posted 2 years ago

Answers 19


AstonishingWorm64 I found the issue.
clearml-serving assumes the agent is working in docker mode, as it has to run inside the Triton docker image (where the Triton engine is installed).
Since you are running in venv mode, `tritonserver` is not installed, hence the error.

  
  
Posted 2 years ago

Hi AstonishingWorm64
Is this the same?
https://github.com/allegroai/clearml-serving/issues/1
(I think it was fixed on the later branch; we are releasing 0.3.2 later today with a fix.)
Can you try:
pip install git+

  
  
Posted 2 years ago

(I'll make sure we reply on the issue as well later)

  
  
Posted 2 years ago

Hi AgitatedDove14, thanks for the reply!

It's not the same issue you pointed to; the error is actually raised after launching inference onto the queue using the commands below:

```
clearml-serving triton --project "serving" --name "serving example"

clearml-serving triton --endpoint "keras_mnist" --model-project "examples" --model-name "Keras MNIST serve example - serving_model"

clearml-serving launch --queue default
```

  
  
Posted 2 years ago

Tried installing the latest clearml-serving from git, but still no luck; the same error persists.

I have attached both the serving service and serving engine (Triton) console logs from clearml-server, please have a look at them.

  
  
Posted 2 years ago

`FileNotFoundError: [Errno 2] No such file or directory: 'tritonserver': 'tritonserver'`

This is odd.
Can you retry with the latest from GitHub?
pip install git+

  
  
Posted 2 years ago

This solved the tritonserver-not-found issue, but now a new error occurs:
`UNAVAILABLE: Internal: unable to create stream: the provided PTX was compiled with an unsupported toolchain`

Please check the attached log file for the complete console log.

I am also facing an issue while initializing the serving server and Triton engine using the two commands below:

```
clearml-serving triton --project "serving" --name "serving ex1"
clearml-serving triton --endpoint "inference" --model-project "serving" --model-name "exp_v1"
```

After the second command I see this error:
`Error: No projects found when searching for DevOps`
But when I combined these two commands into the single command below, the error disappeared, so I went on and launched the service and engine. Could running them as separate commands have caused the error above?

```
clearml-serving triton --project "serving" --name "serving ex1" --endpoint "inference" --model-project "serving" --model-name "exp_v1"
```

  
  
Posted 2 years ago

Bottom line: the driver version on the host machine does not support the CUDA version you have in the docker container.

  
  
Posted 2 years ago

The server I'm using has GPU driver version 455.23.05 and CUDA version 11.1. Conda is also installed, and clearml-serving is installing CUDA 10.1, for which the GPU drivers should be >= 418.39, so I guess a version mismatch is not the problem; and currently I can't update the GPU drivers since other processes are running.
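The compatibility argument above boils down to a simple version comparison: the host driver must be at least the minimum required by the CUDA toolkit inside the container. A small sketch (the 418.39 minimum for CUDA 10.1 is taken from the message above; the function names are illustrative only):

```python
def parse_version(v):
    """Turn a dotted driver version like '455.23.05' into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

def driver_supports(host_driver, minimum_required):
    """True if the host driver is at least what the CUDA toolkit needs."""
    return parse_version(host_driver) >= parse_version(minimum_required)

# Host driver 455.23.05 vs. the >= 418.39 needed for CUDA 10.1 (from the post above)
print(driver_supports("455.23.05", "418.39"))  # True: CUDA 10.1 itself is fine here
```

The PTX error suggests the opposite direction, though: something inside the container was built with a CUDA toolchain *newer* than what the 455 host driver supports.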

I also tried overriding the clearml.conf file and changed the default docker image by modifying this line:
`image: "nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04"`
to this:
`image: "nvidia/cuda:11.1-cudnn8-runtime-ubuntu18.04"`
But the same error (`the provided PTX was compiled with an unsupported toolchain`) still occurred while launching the Triton engine.

  
  
Posted 2 years ago

So instead of updating the GPU drivers, can we install a lower, compatible version of CUDA inside the docker container for clearml-serving?

Also, when I checked the log file I found this:

```
agent.default_docker.image = nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
agent.enable_task_env = false
agent.git_user =
agent.default_python = 3.8
agent.cuda_version = 112
```

This might be a dumb question, but I'm confused about the CUDA version being installed here: is it 10.1 (from the first line) or 11.2 (from the last line)?
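If it helps, clearml-agent appears to encode the *detected* CUDA version as a single integer, so `agent.cuda_version = 112` would mean CUDA 11.2 was detected on the machine, while `agent.default_docker.image` is only the image the agent would use in docker mode. A tiny decoding sketch (this major*10+minor convention is my assumption from reading agent logs; verify against the agent documentation):

```python
def decode_agent_cuda_version(value):
    """Decode clearml-agent's integer CUDA encoding, e.g. 112 -> '11.2'.

    Assumption: the agent stores major*10 + minor (101 -> 10.1, 112 -> 11.2).
    """
    major, minor = divmod(int(value), 10)
    return f"{major}.{minor}"

print(decode_agent_cuda_version(112))  # '11.2' -> the detected toolkit version
print(decode_agent_cuda_version(101))  # '10.1' -> matches the default docker image tag
```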

  
  
Posted 2 years ago

That is a good question. Usually the CUDA version is automatically detected, unless you override it with the conf file or an OS environment variable. What's your setup? Are you using conda as the package manager? (Conda actually installs CUDA drivers; if the original Task was executed on a machine with conda, it will take the CUDA version automatically. The reason is to match the CUDA/Torch/TF versions.)

  
  
Posted 2 years ago

By default clearml-serving installs Triton version 21.03; can we somehow override this to install some other version? I tried to configure it, but could not find anything related to tritonserver in the clearml.conf file. Can you please guide me on this?

  
  
Posted 2 years ago

Hi AstonishingWorm64
I think you are correct, there is no external interface to change the docker image.
Could you open a GitHub issue so we do not forget to add an interface for that?
As a temporary hack, you can manually clone the "triton serving engine" task and edit the container image (under the Execution tab).
wdyt?

  
  
Posted 2 years ago

yeah sounds good

  
  
Posted 2 years ago

The latest image seems to require host drivers of 460+.
Try this one:
https://docs.nvidia.com/deeplearning/triton-inference-server/release-notes/rel_20-12.html#rel_20-12
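The suggestion above follows NVIDIA's release matrix: each Triton container release is built against a specific CUDA version with a minimum host driver. A small lookup sketch; the exact minimums in the table below are my assumptions from reading NVIDIA's release notes, so check the linked page before relying on them:

```python
# Assumed mapping: Triton container release -> (CUDA version, min host driver).
# These numbers are illustrative; verify against NVIDIA's release notes.
TRITON_RELEASES = {
    "21.03": ("11.2", "460.32"),
    "20.12": ("11.1", "455.23"),
}

def compatible_releases(host_driver):
    """Return the Triton releases whose minimum driver this host satisfies."""
    host = tuple(int(p) for p in host_driver.split("."))
    return [
        rel for rel, (_cuda, min_drv) in TRITON_RELEASES.items()
        if host >= tuple(int(p) for p in min_drv.split("."))
    ]

print(compatible_releases("455.23.05"))  # ['20.12']: this host is too old for 21.03
```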

  
  
Posted 2 years ago

I already shared the log from the UI; anyway, I'm sharing the log from a recently run experiment, please find the attachment.

  
  
Posted 2 years ago

AstonishingWorm64 can you share the full log (In the UI under Results/Console there is a download button)?

  
  
Posted 2 years ago