Answered

Hey, Just Trying Out Clearml-Serving And Getting The Following Error

Hey,

I'm just trying out clearml-serving and getting the following error in the clearml-serving-triton container: `the provided PTX was compiled with an unsupported toolchain`. My guess is that it comes from converting the PyTorch code to TorchScript. I get this error when trying examples/pytorch. SuccessfulKoala55 or JitteryCoyote63, it would be great if you could help 🙂

  				
Posted 3 years ago by RobustFox47


Answers 18

Thanks RobustRat47!
Should we document this requirement (i.e. the NVIDIA driver version) somewhere?
Is it really a must?

  				
Posted 3 years ago by AgitatedDove14

RobustRat47
What exactly is the error you are getting? (As far as I remember, only the latest Triton release solved some issue there.)

  				
Posted 3 years ago by AgitatedDove14

I'm using the "allegroai/clearml-serving-triton:latest" container. I was just debugging using the base image.

  				
Posted 3 years ago by RobustFox47

RobustRat47, which Triton container are you using?
BTW, the Triton error is:
model_repository_manager.cc:1152] failed to load 'test_model_pytorch' version 1: Internal: unable to create stream: the provided PTX was compiled with an unsupported toolchain.
https://github.com/triton-inference-server/server/issues/3877

  				
Posted 3 years ago by AgitatedDove14

$ curl -X 'POST' ' ' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{ "url": " " }'
{"digit":5}

  				
Posted 3 years ago by RobustFox47

I'll add a more detailed response once it's working

  				
Posted 3 years ago by RobustFox47

Yes, I already tried that, but there seems to be some form of mismatch with a C/C++ lib.

  				
Posted 3 years ago by RobustFox47

Just for reference, in case anyone else hits this issue: I had to update the NVIDIA drivers on my host OS to version 510.

` docker run --gpus=0 -it nvcr.io/nvidia/tritonserver:22.02-py3

=============================
== Triton Inference Server ==

NVIDIA Release 22.02 (build 32400308)

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:

ERROR: This container was built for NVIDIA Driver Release 510.39 or later, but
version 470.103.01 was detected and compatibility mode is UNAVAILABLE.

   [[Forward compatibility was attempted on non supported HW (CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE) cuInit()=804]]

root@36a9fc676a25:/opt/tritonserver# `
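
If you want to catch this from a script instead of reading the banner by eye, the two version numbers can be pulled out of the error line. This is only a minimal sketch: the function name `find_driver_mismatch` is made up for illustration, and the pattern assumes the message wording stays exactly as in the banner above, which other Triton releases may not guarantee.

```python
import re

# Pattern for the driver-mismatch line, based on the banner quoted above.
# The banner wraps across lines, so whitespace is matched loosely.
_MISMATCH = re.compile(
    r"built for NVIDIA Driver Release\s+(?P<required>\d+(?:\.\d+)*)\s+or later,"
    r"\s+but\s+version\s+(?P<detected>\d+(?:\.\d+)*)\s+was detected"
)


def find_driver_mismatch(log_text):
    """Return (required, detected) driver versions, or None if no mismatch line is found."""
    match = _MISMATCH.search(log_text)
    if match is None:
        return None
    return match.group("required"), match.group("detected")


banner = """ERROR: This container was built for NVIDIA Driver Release 510.39 or later, but
version 470.103.01 was detected and compatibility mode is UNAVAILABLE."""
print(find_driver_mismatch(banner))
```

With the banner from this post, the function returns the required release and the detected release as a pair of strings. It returns `None` for output that contains no such line.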

  				
Posted 3 years ago by RobustFox47

I can raise this as an issue on the repo if that is useful?

  				
Posted 3 years ago by RobustFox47

I can raise this as an issue on the repo if that is useful?

I think this is a good idea, at the very least for increased visibility 🙂
Please do 🙏

  				
Posted 3 years ago by AgitatedDove14

Okay, just for clarity:

Originally, my NVIDIA drivers were at a version that is incompatible with the Triton server:
This container was built for NVIDIA Driver Release 510.39 or later, but version 470.103.01 was detected and compatibility mode is UNAVAILABLE.
To fix this, I updated the drivers on my base OS, i.e.
`sudo apt install nvidia-driver-510 -y`
`sudo reboot`
Then it worked. The docker-compose logs from the clearml-serving-triton container (i.e. from running `docker-compose -f docker/docker-compose-triton-gpu.yml logs -f`) did not make this clear. It might be good to surface this as an error in the logs 🙂

AgitatedDove14 let me know if there's anything else I can provide that is useful for you.
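
In case it helps anyone, the host driver can also be checked before starting the containers. The snippet below is a rough sketch, not part of clearml-serving: the helper names are hypothetical, and the default minimum of 510.39 is simply the value from the Triton 22.02 banner above, so other container releases will need a different number. It relies on `nvidia-smi --query-gpu=driver_version --format=csv,noheader` being available on the host.

```python
import subprocess


def parse_version(text):
    """Turn '470.103.01' into (470, 103, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_new_enough(detected, required="510.39"):
    """True if the detected driver version is at least the required one.

    The default is the minimum reported by the Triton 22.02 banner (assumption:
    adjust it for other container releases).
    """
    return parse_version(detected) >= parse_version(required)


def installed_driver_version():
    """Ask nvidia-smi for the host driver version (first GPU listed)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()[0].strip()
```

On a machine with an NVIDIA GPU, `driver_is_new_enough(installed_driver_version())` gives a quick yes/no. For the versions in this thread, 470.103.01 fails the check, while any 510.39 or later driver passes.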

  				
Posted 3 years ago by RobustFox47

Still debugging... The driver update fixed the issue with the
`nvcr.io/nvidia/tritonserver:22.02-py3` container which now returns
` =============================
== Triton Inference Server ==

NVIDIA Release 22.02 (build 32400308)

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:

root@c702b766ba35:/opt/tritonserver# `

I'm now testing whether the clearml-serving repo works. I'll keep this thread updated 🙂

  				
Posted 3 years ago by RobustFox47

Hi RobustRat47

My guess is it's something from the converting PyTorch code to TorchScript. I'm getting this error when trying the

I think you are correct, see here:
https://github.com/allegroai/clearml-serving/blob/d15bfcade54c7bdd8f3765408adc480d5ceb4b45/examples/pytorch/train_pytorch_mnist.py#L136
You have to convert the model to TorchScript for Triton to serve it.

  				
Posted 3 years ago by AgitatedDove14

RobustRat47, are you saying that updating the NVIDIA drivers solved the issue?

  				
Posted 3 years ago by AgitatedDove14

It might only be a requirement for the docker/docker-compose-triton-gpu.yml file, but I'd need to check.

  				
Posted 3 years ago by RobustFox47

The latest commit to the repo uses 22.02-py3 ( https://github.com/allegroai/clearml-serving/blob/d15bfcade54c7bdd8f3765408adc480d5ceb4b45/clearml_serving/engines/triton/Dockerfile#L2 ). I will have a look at versions now 🙂

  				
Posted 3 years ago by RobustFox47

Yay 🥳

  				
Posted 3 years ago by RobustFox47

Notice that we are using the same version:
https://github.com/allegroai/clearml-serving/blob/d15bfcade54c7bdd8f3765408adc480d5ceb4b45/clearml_serving/engines/triton/Dockerfile#L2
The reason is that the previous version did not support TorchScript (it gave an error similar to the one you reported).
My question is, why don't you use the "allegroai/clearml-serving-triton:latest" container?

  				
Posted 3 years ago by AgitatedDove14
