Specifying the 'platform' for a model in clearml-serving (k8s setup, version 1.3.1)

Answered

Hi there! Can anybody help me with specifying the 'platform' for a model in clearml-serving? I am using the k8s clearml-serving setup (version 1.3.1).

I already tried a bunch of variants like:
clearml-serving --id 9902b88513644fa0bb89eba35f1f4d99 model add --engine triton --endpoint "advanced_basic_classifier.pytorch" --preprocess "src/preprocessing/preprocess.py" --model-id 837276fc8d8a443fb91f48d722300b0a ...

... --aux-config ".\config.pbtxt" or
... --input-size 1 64 --input-name "INPUT__0" --input-type float32 --output-size 1 11 --output-name "OUTPUT__0" --output-type float32 --aux-config ".\config.pbtxt" or
... --input-size 1 64 --input-name "INPUT__0" --input-type float32 --output-size 1 11 --output-name "OUTPUT__0" --output-type float32 --aux-config platform="pytorch_libtorch"
... --input-size 1 64 --input-name "INPUT__0" --input-type float32 --output-size 1 11 --output-name "OUTPUT__0" --output-type float32 --aux-config platform=\\"pytorch_libtorch\\"
But I always get the error
Poll failed for model directory 'advanced_basic_classifier.pytorch': unexpected 'platform' and 'backend' pair, got:, pytorch
This is probably because the 'platform' is always put inside an 'auxiliary_cfg' entry in the endpoint configuration and not at the top level, where it should be.

  				
Posted one year ago by NuttyCamel41


Answers 13

Hi @AgitatedDove14, thanks for your hint! I already converted the model to TorchScript using tracing. Everything around the model should be fine, since it already worked with the docker clearml-serving setup.
I think the real issue is that I am not able to specify a platform for the model: the error above tells me that no platform is given, no matter how I try to pass it.

  				
Posted one year ago by NuttyCamel41

Hi @AgitatedDove14, the config.pbtxt for variant 1 looks like this (because I do not specify the input and output type and size in the command):

  backend: "pytorch"
  platform: "pytorch_libtorch"
  input [
    {
      name: "INPUT__0"
      data_type: TYPE_FP32
      dims: [1, 64]
    }
  ]
  output [
    {
      name: "OUTPUT__0"
      data_type: TYPE_FP32
      dims: [1, 11]
    }
  ]

while the config.pbtxt for variant 2 looks like this (because everything else is already specified in the command):

  backend: "pytorch"
  platform: "pytorch_libtorch"
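
For comparing variants of the file, here is a minimal Python sketch that renders a config.pbtxt like the ones above from a few tensor specs. The names `TensorSpec` and `render_config` are hypothetical helpers made up for this illustration; they are not part of clearml-serving or Triton, and whether the resulting file is accepted still depends on the serving setup.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class TensorSpec:
    """One input or output tensor (hypothetical helper type)."""
    name: str
    data_type: str          # Triton type name, e.g. "TYPE_FP32"
    dims: Sequence[int]


def _render_tensors(section: str, tensors: Sequence[TensorSpec]) -> List[str]:
    """Render an `input [...]` or `output [...]` section as text lines."""
    lines = [f"{section} ["]
    for i, t in enumerate(tensors):
        dims = ", ".join(str(d) for d in t.dims)
        closing = "  }" + ("," if i < len(tensors) - 1 else "")
        lines += [
            "  {",
            f'    name: "{t.name}"',
            f"    data_type: {t.data_type}",
            f"    dims: [{dims}]",
            closing,
        ]
    lines.append("]")
    return lines


def render_config(
    inputs: Sequence[TensorSpec],
    outputs: Sequence[TensorSpec],
    platform: Optional[str] = None,
) -> str:
    """Build the config text; the platform line is only written if given."""
    lines: List[str] = []
    if platform is not None:
        lines.append(f'platform: "{platform}"')
    lines += _render_tensors("input", inputs)
    lines += _render_tensors("output", outputs)
    return "\n".join(lines) + "\n"


text = render_config(
    [TensorSpec("INPUT__0", "TYPE_FP32", [1, 64])],
    [TensorSpec("OUTPUT__0", "TYPE_FP32", [1, 11])],
    platform="pytorch_libtorch",
)
print(text)
```

Passing `platform=None` leaves the platform line out, which makes it easy to try the file both with and without it.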

  				
Posted one year ago by NuttyCamel41

"I think the real issue is that I am not able to specify a platform for the model"

There is no need to specify it. Remove it from the config.pbtxt; clearml-serving will automatically add the backend.

  				
Posted one year ago by AgitatedDove14

What do you mean by "How are you creating the model?"? I ran a PyTorch model training and saved a traced version of the model, so it was stored with the executed task. This was also no problem with the docker container setup.

  				
Posted one year ago by NuttyCamel41

Hi @AgitatedDove14, now some interesting things are happening. As I wrote before, I got the error message, but one minute later the model was added successfully nonetheless. The log says:

E0603 09:43:01.652550 41 model_repository_manager.cc:996] Poll failed for model directory 'test_model_pytorch': Invalid model name: Could not determine backend for model 'test_model_pytorch' with no backend in model configuration. Expected model name of the form 'model.<backend_name>'.
I0603 09:44:01.654376 41 model_lifecycle.cc:459] loading: test_model_pytorch:1
I0603 09:44:02.619246 41 libtorch.cc:1983] TRITONBACKEND_Initialize: pytorch
I0603 09:44:02.619271 41 libtorch.cc:1993] Triton TRITONBACKEND API version: 1.10
I0603 09:44:02.619278 41 libtorch.cc:1999] 'pytorch' TRITONBACKEND API version: 1.10
I0603 09:44:02.619304 41 libtorch.cc:2032] TRITONBACKEND_ModelInitialize: test_model_pytorch (version 1)
W0603 09:44:02.619939 41 libtorch.cc:284] skipping model configuration auto-complete for 'test_model_pytorch': not supported for pytorch backend
I0603 09:44:02.620389 41 libtorch.cc:313] Optimized execution is enabled for model instance 'test_model_pytorch'
I0603 09:44:02.620404 41 libtorch.cc:332] Cache Cleaning is disabled for model instance 'test_model_pytorch'
I0603 09:44:02.620411 41 libtorch.cc:349] Inference Mode is disabled for model instance 'test_model_pytorch'
I0603 09:44:02.620418 41 libtorch.cc:444] NvFuser is not specified for model instance 'test_model_pytorch'
I0603 09:44:02.620474 41 libtorch.cc:2076] TRITONBACKEND_ModelInstanceInitialize: test_model_pytorch (CPU device 0)
I0603 09:44:02.665851 41 model_lifecycle.cc:693] successfully loaded 'test_model_pytorch' version 1

So why is it that for the models I try to register no loading process is started?
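
To make the timing in the log above easier to see, here is a small Python sketch that parses two of those lines and measures the gap between the failed poll and the successful load. The regular expression assumes the line layout shown above (severity letter, month and day, time, thread id, source location, message); `parse_line` and `LogEntry` are hypothetical names used only for this illustration.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Layout assumed from the log above: E0603 09:43:01.652550 41 file.cc:996] message
LOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<thread>\d+) "
    r"(?P<source>[^\]]+)\] (?P<message>.*)$"
)


class LogEntry(NamedTuple):
    level: str
    time: datetime
    source: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one log line; return None if it does not match the layout."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    # The log carries no year, so only the time of day is parsed here.
    t = datetime.strptime(m["time"], "%H:%M:%S.%f")
    return LogEntry(m["level"], t, m["source"], m["message"])


sample = [
    "E0603 09:43:01.652550 41 model_repository_manager.cc:996] Poll failed for "
    "model directory 'test_model_pytorch': Invalid model name: Could not "
    "determine backend for model 'test_model_pytorch' with no backend in model "
    "configuration. Expected model name of the form 'model.<backend_name>'.",
    "I0603 09:44:02.665851 41 model_lifecycle.cc:693] successfully loaded "
    "'test_model_pytorch' version 1",
]

entries = [e for e in map(parse_line, sample) if e is not None]
failed = next(e for e in entries if e.level == "E" and "Poll failed" in e.message)
loaded = next(e for e in entries if "successfully loaded" in e.message)
gap = (loaded.time - failed.time).total_seconds()
print(f"{gap:.1f} seconds between the failed poll and the successful load")
```

In the log above, loading starts almost exactly 60 seconds after the failed poll. That would be consistent with the model repository being polled periodically, though this is only an inference from the timestamps.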

  				
Posted one year ago by NuttyCamel41

Hi @NuttyCamel41
So sorry, I just realized I have not answered this!

"I just tried the pytorch example from the clearml-serving repo and got the error about the wrong model name"

Okay, that is odd. Are you using the exact same containers / docker-compose? What is the difference?

I0603 09:44:02.665851 41 model_lifecycle.cc:693] successfully loaded 'test_model_pytorch' version 1

Does that mean that, even though there is a warning, you can curl the endpoint and it works?

  				
Posted one year ago by AgitatedDove14

Ohh! I see now
@NuttyCamel41 the backend: "pytorch" setting is not really supported, because it does not use the optimized Triton engine (which is the reason to run the Triton server).
In order to use PyTorch you need to convert the model to TorchScript and then deploy it.

  				
Posted one year ago by AgitatedDove14

Hi @AgitatedDove14, exactly!
I just tried the pytorch example from the clearml-serving repo and got the error about the wrong model name:
Poll failed for model directory 'test_model_pytorch': Invalid model name: Could not determine backend for model 'test_model_pytorch' with no backend in model configuration. Expected model name of the form 'model.<backend_name>'.

  				
Posted one year ago by NuttyCamel41

Hi @AgitatedDove14, thanks for coming back to my issue. Unfortunately I have a lot of other stuff on my desk right now, so I have to postpone finishing this issue. I will reach out to you again as soon as possible (especially if I am able to find a solution).

  				
Posted one year ago by NuttyCamel41

Hi @NuttyCamel41
How are you creating the model? Specifically, what do you have in "config.pbtxt"?
Note that any Python code should be in the pre/post processing code (which does not actually run on the GPU instance).

  				
Posted one year ago by AgitatedDove14

Hi @AgitatedDove14, you are right for the docker setup. But with the k8s setup, when I do not specify the platform, I get this error:
Poll failed for model directory 'advanced_basic_classifier.pytorch': unexpected 'platform' and 'backend' pair, got:, pytorch
That sounds like I should specify the platform.

By the way, if I do not name the model after the 'model.<backend_name>' convention, I get this error:
Poll failed for model directory 'advanced_basic_classifier': Invalid model name: Could not determine backend for model 'advanced_basic_classifier' with no backend in model configuration. Expected model name of the form 'model.<backend_name>'.
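
The 'model.<backend_name>' form mentioned in that error can be illustrated with a short Python sketch. It only mirrors the naming pattern quoted in the message; `backend_from_endpoint` is a hypothetical helper, not Triton's actual implementation, and it does not check whether the suffix is a backend that Triton knows.

```python
from typing import Optional


def backend_from_endpoint(endpoint: str) -> Optional[str]:
    """Return the suffix after the last dot, or None if the name has no
    'model.<backend_name>' shape (no dot, or an empty part on either side)."""
    model, sep, backend = endpoint.rpartition(".")
    if not sep or not model or not backend:
        return None
    return backend


for name in (
    "advanced_basic_classifier.pytorch",
    "advanced_basic_classifier",
    "test_model_pytorch",
):
    print(name, "->", backend_from_endpoint(name))
```

Under this reading, only the first name carries a backend suffix. The other two are the names that produced the "Could not determine backend" message in this thread.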

  				
Posted one year ago by NuttyCamel41

I'm assuming those errors are from the Triton containers? Were you able to run the simple PyTorch MNIST serving example from the repo?

  				
Posted one year ago by AgitatedDove14

My pre- and postprocessing code should be correct, because it already worked when I used the docker container clearml-serving setup. But in case you want to have a look, here it is:

  				
Posted one year ago by NuttyCamel41
