Answered
Hi there! Can anybody help me with specifying the 'platform' for a model in clearml-serving? I am using the k8s clearml-serving setup (version 1.3.1).

I already tried a bunch of variants like
clearml-serving --id 9902b88513644fa0bb89eba35f1f4d99 model add --engine triton --endpoint "advanced_basic_classifier.pytorch" --preprocess "src/preprocessing/preprocess.py" --model-id 837276fc8d8a443fb91f48d722300b0a ...

  • ... --aux-config ".\config.pbtxt" or
  • ... --input-size 1 64 --input-name "INPUT__0" --input-type float32 --output-size 1 11 --output-name "OUTPUT__0" --output-type float32 --aux-config ".\config.pbtxt" or
  • ... --input-size 1 64 --input-name "INPUT__0" --input-type float32 --output-size 1 11 --output-name "OUTPUT__0" --output-type float32 --aux-config platform="pytorch_libtorch"
  • ... --input-size 1 64 --input-name "INPUT__0" --input-type float32 --output-size 1 11 --output-name "OUTPUT__0" --output-type float32 --aux-config platform=\\"pytorch_libtorch\\"

But I always get the error
Poll failed for model directory 'advanced_basic_classifier.pytorch': unexpected 'platform' and 'backend' pair, got:, pytorch
This is probably because the 'platform' is always put within an 'auxiliary_cfg' entry in the endpoint configuration and not at the top level, where it should be.
  
  
Posted 6 months ago
Answers 13


Hi @<1523701205467926528:profile|AgitatedDove14> , now some interesting things are happening. As I wrote before, I got the error message, but one minute later the model was added successfully nonetheless. The log says:

E0603 09:43:01.652550 41 model_repository_manager.cc:996] Poll failed for model directory 'test_model_pytorch': Invalid model name: Could not determine backend for model 'test_model_pytorch' with no backend in model configuration. Expected model name of the form 'model.<backend_name>'.
I0603 09:44:01.654376 41 model_lifecycle.cc:459] loading: test_model_pytorch:1
I0603 09:44:02.619246 41 libtorch.cc:1983] TRITONBACKEND_Initialize: pytorch
I0603 09:44:02.619271 41 libtorch.cc:1993] Triton TRITONBACKEND API version: 1.10
I0603 09:44:02.619278 41 libtorch.cc:1999] 'pytorch' TRITONBACKEND API version: 1.10
I0603 09:44:02.619304 41 libtorch.cc:2032] TRITONBACKEND_ModelInitialize: test_model_pytorch (version 1)
W0603 09:44:02.619939 41 libtorch.cc:284] skipping model configuration auto-complete for 'test_model_pytorch': not supported for pytorch backend
I0603 09:44:02.620389 41 libtorch.cc:313] Optimized execution is enabled for model instance 'test_model_pytorch'
I0603 09:44:02.620404 41 libtorch.cc:332] Cache Cleaning is disabled for model instance 'test_model_pytorch'
I0603 09:44:02.620411 41 libtorch.cc:349] Inference Mode is disabled for model instance 'test_model_pytorch'
I0603 09:44:02.620418 41 libtorch.cc:444] NvFuser is not specified for model instance 'test_model_pytorch'
I0603 09:44:02.620474 41 libtorch.cc:2076] TRITONBACKEND_ModelInstanceInitialize: test_model_pytorch (CPU device 0)
I0603 09:44:02.665851 41 model_lifecycle.cc:693] successfully loaded 'test_model_pytorch' version 1

So why is it that for the models I try to register no loading process is started?
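To make the timing in the log excerpt above easier to see, here is a minimal Python sketch that parses three of the quoted lines and measures the gap between the poll error and the start of loading. The regular expression is an assumption based only on the line format visible in this post, `parse_line` is a hypothetical helper, and the year is a placeholder because the log prefix carries only month and day. It quantifies the delay; it does not explain it.

```python
import re
from datetime import datetime

# Assumed prefix layout, inferred from the lines quoted in this thread:
# <severity letter><MMDD> <HH:MM:SS.micro> <thread id> <file>:<line>] <message>
LOG_LINE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+) (?P<src>[\w.]+:\d+)\] (?P<msg>.*)$"
)

# The log prefix has no year; this placeholder only makes the date parseable.
ASSUMED_YEAR = "2000"

# Three lines copied from the log above.
SAMPLE = """
E0603 09:43:01.652550 41 model_repository_manager.cc:996] Poll failed for model directory 'test_model_pytorch': Invalid model name: Could not determine backend for model 'test_model_pytorch' with no backend in model configuration. Expected model name of the form 'model.<backend_name>'.
I0603 09:44:01.654376 41 model_lifecycle.cc:459] loading: test_model_pytorch:1
I0603 09:44:02.665851 41 model_lifecycle.cc:693] successfully loaded 'test_model_pytorch' version 1
""".strip()


def parse_line(line):
    """Split one log line into severity, timestamp, source and message."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    stamp = datetime.strptime(
        f"{ASSUMED_YEAR}{m['date']} {m['time']}", "%Y%m%d %H:%M:%S.%f"
    )
    return {
        "severity": m["sev"],
        "time": stamp,
        "source": m["src"],
        "message": m["msg"],
    }


records = [r for r in map(parse_line, SAMPLE.splitlines()) if r is not None]
errors = [r for r in records if r["severity"] == "E"]
load_start = next(r for r in records if r["message"].startswith("loading:"))

# Seconds between the first error line and the line announcing the load.
gap_seconds = (load_start["time"] - errors[0]["time"]).total_seconds()
print(f"{len(errors)} error line(s), load started {gap_seconds:.3f}s later")
```

Running the same parser over a full log would show whether every failed poll is followed by a load roughly a minute later, or only this one.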

  
  
Posted 6 months ago

Hi @<1523701205467926528:profile|AgitatedDove14> , exactly!
I just tried the pytorch example from the clearml-serving repo and got the error about the wrong model name:
Poll failed for model directory 'test_model_pytorch': Invalid model name: Could not determine backend for model 'test_model_pytorch' with no backend in model configuration. Expected model name of the form 'model.<backend_name>'.

  
  
Posted 6 months ago

Hi @<1526371965655322624:profile|NuttyCamel41>
How are you creating the model? Specifically, what do you have in "config.pbtxt"?
Also, any Python code should be in the pre/post-processing code (which does not actually run on the GPU instance).

  
  
Posted 6 months ago

I'm assuming those errors are from the Triton containers? Were you able to run the simple PyTorch MNIST serving example from the repo?

  
  
Posted 6 months ago

Hi @<1523701205467926528:profile|AgitatedDove14> you are right for the docker setup. But with the k8s setup I get the error Poll failed for model directory 'advanced_basic_classifier.pytorch': unexpected 'platform' and 'backend' pair, got:, pytorch when I do not specify the platform, which sounds like I should specify the platform.

Btw if I do not name the model after the 'model.<backend_name>' convention then I get this error
Poll failed for model directory 'advanced_basic_classifier': Invalid model name: Could not determine backend for model 'advanced_basic_classifier' with no backend in model configuration. Expected model name of the form 'model.<backend_name>'.
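The naming convention quoted in that error message, 'model.<backend_name>', can be illustrated with a small Python sketch. `split_backend_suffix` is a hypothetical helper written only for this post; how Triton actually parses directory names is not shown anywhere in the thread, so treat this as a reading of the error text rather than the real implementation.

```python
def split_backend_suffix(model_dir_name):
    """Split 'model.<backend_name>' into (model, backend).

    Returns (model_dir_name, None) when the name has no usable
    '.<backend_name>' suffix, which is the case the second error
    message above complains about.
    """
    base, sep, suffix = model_dir_name.rpartition(".")
    if not sep or not base or not suffix:
        return model_dir_name, None
    return base, suffix


for name in ("advanced_basic_classifier.pytorch", "advanced_basic_classifier"):
    base, backend = split_backend_suffix(name)
    print(f"{name!r}: model={base!r}, backend={backend!r}")
```

Under this reading, the first name carries a backend hint ('pytorch') and the second carries none, which matches the two different errors reported above.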

  
  
Posted 6 months ago

Hi @<1523701205467926528:profile|AgitatedDove14> the config.pbtxt for 1. looks like this: (because I do not specify input and output type and size within the command)

  backend: "pytorch"
  platform: "pytorch_libtorch"
  input [
    {
      name: "INPUT__0"
      data_type: TYPE_FP32
      dims: [1, 64]
    }
  ]
  output [
    {
      name: "OUTPUT__0"
      data_type: TYPE_FP32
      dims: [1, 11]
    }
  ]

while the config.pbtxt for 2. looks like this: (because everything else is already specified in the command)

  backend: "pytorch"
  platform: "pytorch_libtorch"
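
To show how the command-line values in variant 2 line up with the input/output section of the first config.pbtxt, here is a minimal Python sketch that renders that section from a name, type and size. `render_tensor_section` and the `TRITON_TYPES` table are hypothetical and written for illustration; this is not how clearml-serving generates its configuration.

```python
# Assumed mapping from the type names used on the command line to the
# data_type names used in config.pbtxt.
TRITON_TYPES = {
    "float32": "TYPE_FP32",
    "float16": "TYPE_FP16",
    "int32": "TYPE_INT32",
    "int64": "TYPE_INT64",
}


def render_tensor_section(kind, name, dtype, dims):
    """Render one 'input [...]' or 'output [...]' section as pbtxt text."""
    if kind not in ("input", "output"):
        raise ValueError("kind must be 'input' or 'output'")
    dims_text = ", ".join(str(d) for d in dims)
    return "\n".join([
        f"{kind} [",
        "  {",
        f'    name: "{name}"',
        f"    data_type: {TRITON_TYPES[dtype]}",
        f"    dims: [{dims_text}]",
        "  }",
        "]",
    ])


# Values taken from the --input-* / --output-* flags in the question.
config_text = "\n".join([
    render_tensor_section("input", "INPUT__0", "float32", [1, 64]),
    render_tensor_section("output", "OUTPUT__0", "float32", [1, 11]),
])
print(config_text)
```

The printed text has the same shape as the input/output part of the first config above, which is why the second config can omit it when the flags are given.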
  
  
Posted 6 months ago

What do you mean by "How are you creating the model?"? I ran a PyTorch model training and saved a traced version of the model, so that it was stored with the executed task. This was also no problem with the docker container setup.

  
  
Posted 6 months ago

Hi @<1523701205467926528:profile|AgitatedDove14> , thanks for coming back to my issue. Unfortunately I have a lot of other stuff on my desk right now so I have to postpone finishing this issue. I will reach out to you again as soon as possible (especially if I was able to find a solution).

  
  
Posted 6 months ago

Hi @<1523701205467926528:profile|AgitatedDove14> thanks for your hint! I already converted it to TorchScript using tracing. Everything around the model should be fine, since it already worked with the docker clearml-serving setup.
I think the real issue is that I am not able to specify a platform for the model, as the error above tells me that no platform is given no matter how I try to pass it.

  
  
Posted 6 months ago

Hi @<1526371965655322624:profile|NuttyCamel41>
So sorry, I just realized I have not answered it!

I just tried the pytorch example from the clearml-serving repo and got the error about the wrong model name

Okay, that is odd. Are you using the exact same containers / docker-compose? What is the difference?

I0603 09:44:02.665851 41 model_lifecycle.cc:693] successfully loaded 'test_model_pytorch' version 1

Does that mean that even though there is a warning there, you can curl the endpoint and it would work?

  
  
Posted 6 months ago

I think the real issue is that I am not able to specify a platform for the model,

There is no need to specify it; remove it from the config.pbtxt. clearml-serving will automatically add the backend.
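
As a rough illustration of that suggestion, the Python sketch below removes the `platform` line from a config.pbtxt text before it is passed on. `drop_scalar_fields` is a hypothetical helper, and it assumes one field per line as in the configs posted in this thread; it works line by line and would also drop a nested field with the same name, so it is not a general pbtxt parser.

```python
import re


def drop_scalar_fields(pbtxt_text, keys=("platform",)):
    """Return pbtxt_text without lines of the form '<key>: ...' for the given keys.

    Assumes one field per line. Matching is purely line-based, so a nested
    field with the same name would be removed as well.
    """
    pattern = re.compile(r"^\s*(?:%s)\s*:" % "|".join(re.escape(k) for k in keys))
    kept = [line for line in pbtxt_text.splitlines() if not pattern.match(line)]
    return "\n".join(kept)


# Start of the first config.pbtxt posted in this thread.
original = """backend: "pytorch"
platform: "pytorch_libtorch"
input [
  {
    name: "INPUT__0"
    data_type: TYPE_FP32
    dims: [1, 64]
  }
]"""

cleaned = drop_scalar_fields(original)
print(cleaned)
```

Passing `keys=("platform", "backend")` would strip both lines in one call, if both turn out to be unnecessary.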

  
  
Posted 6 months ago

My pre- and postprocessing code should be correct, because it already worked when I used the docker container clearml-serving setup. But in case you want to have a look, here it is:

  
  
Posted 6 months ago

Ohh! I see now
@<1526371965655322624:profile|NuttyCamel41> the backend: "pytorch" setting is not really supported, because it does not use the optimized Triton engine (which is the reason to run the Triton server).
In order to use PyTorch you need to convert the model to TorchScript and then deploy it, see example here:
None
None

  
  
Posted 6 months ago