I'm assuming those errors are from the Triton containers? Were you able to run the simple PyTorch MNIST serving example from the repo?
Hi @<1523701205467926528:profile|AgitatedDove14> thanks for your hint! I already converted it to TorchScript using tracing. Everything around the model should be fine, since it already worked with the docker clearml-serving setup.
I think the real issue is that I am not able to specify a platform for the model, as the error above tells me that no platform is given no matter how I try to pass it.
Ohh! I see now
@<1526371965655322624:profile|NuttyCamel41> the backend: "pytorch" setting is not really supported because it does not use the optimized Triton engine (which is the reason to run the Triton server)
In order to use pytorch you need to convert it to torchscript and then deploy, see example here:
What do you mean by "How are you creating the model?"? I ran a PyTorch model training and saved a traced version of the model, so that it is saved with the executed task. This was also no problem with the docker container setup.
Hi @<1523701205467926528:profile|AgitatedDove14> you are right for the docker setup. But with the k8s setup, when I do not specify the platform, I get the error:
Poll failed for model directory 'advanced_basic_classifier.pytorch': unexpected 'platform' and 'backend' pair, got:, pytorch
which sounds like I should specify the platform.
Btw, if I do not name the model after the 'model.<backend_name>' convention, then I get this error: Poll failed for model directory 'advanced_basic_classifier': Invalid model name: Could not determine backend for model 'advanced_basic_classifier' with no backend in model configuration. Expected model name of the form 'model.<backend_name>'.
I think the real issue is that I am not able to specify a platform for the model,
There is no need to specify it, remove it from the config.pbtxt. clearml-serving will automatically add the backend.
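To make the advice above concrete, here is a minimal sketch of removing the platform and backend lines from a config.pbtxt before registering the model. The helper name strip_platform_and_backend is hypothetical and not part of clearml-serving. It assumes each of these settings sits on its own line, as in the configs posted in this thread.

```python
import re

# Matches a line that consists only of `platform: "..."` or `backend: "..."`.
# Assumption: each setting is written on its own line, as in this thread.
_KEY_LINE = re.compile(r'^\s*(platform|backend)\s*:\s*"[^"]*"\s*$')


def strip_platform_and_backend(config_text: str) -> str:
    """Return config_text without its platform/backend lines (hypothetical helper)."""
    kept = [line for line in config_text.splitlines() if not _KEY_LINE.match(line)]
    result = "\n".join(kept).strip()
    return result + "\n" if result else ""


example_config = '''backend: "pytorch"
platform: "pytorch_libtorch"
input [
{
name: "INPUT__0"
data_type: TYPE_FP32
dims: [1, 64]
}
]
'''

cleaned_config = strip_platform_and_backend(example_config)
print(cleaned_config)
```

The input and output sections pass through unchanged. A config that contains nothing but the two removed lines becomes an empty string.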
Hi @<1526371965655322624:profile|NuttyCamel41>
So sorry, I just realized I have not answered it!
I just tried the pytorch example from the clearml-serving repo and got the error about the wrong model name
Okay, that is odd. Are you using the exact same containers / docker-compose? What is the difference?
I0603 09:44:02.665851 41 model_lifecycle.cc:693] successfully loaded 'test_model_pytorch' version 1
Does that mean that even though there is a warning there, you can curl the endpoint and it would work?
Hi @<1523701205467926528:profile|AgitatedDove14> , now there are some interesting things happening: As I wrote before, I got the error message, but one minute later the model was added successfully nonetheless. The log says
E0603 09:43:01.652550 41 model_repository_manager.cc:996] Poll failed for model directory 'test_model_pytorch': Invalid model name: Could not determine backend for model 'test_model_pytorch' with no backend in model configuration. Expected model name of the form 'model.<backend_name>'.
I0603 09:44:01.654376 41 model_lifecycle.cc:459] loading: test_model_pytorch:1
I0603 09:44:02.619246 41 libtorch.cc:1983] TRITONBACKEND_Initialize: pytorch
I0603 09:44:02.619271 41 libtorch.cc:1993] Triton TRITONBACKEND API version: 1.10
I0603 09:44:02.619278 41 libtorch.cc:1999] 'pytorch' TRITONBACKEND API version: 1.10
I0603 09:44:02.619304 41 libtorch.cc:2032] TRITONBACKEND_ModelInitialize: test_model_pytorch (version 1)
W0603 09:44:02.619939 41 libtorch.cc:284] skipping model configuration auto-complete for 'test_model_pytorch': not supported for pytorch backend
I0603 09:44:02.620389 41 libtorch.cc:313] Optimized execution is enabled for model instance 'test_model_pytorch'
I0603 09:44:02.620404 41 libtorch.cc:332] Cache Cleaning is disabled for model instance 'test_model_pytorch'
I0603 09:44:02.620411 41 libtorch.cc:349] Inference Mode is disabled for model instance 'test_model_pytorch'
I0603 09:44:02.620418 41 libtorch.cc:444] NvFuser is not specified for model instance 'test_model_pytorch'
I0603 09:44:02.620474 41 libtorch.cc:2076] TRITONBACKEND_ModelInstanceInitialize: test_model_pytorch (CPU device 0)
I0603 09:44:02.665851 41 model_lifecycle.cc:693] successfully loaded 'test_model_pytorch' version 1
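To see at a glance whether a model that first failed polling was loaded later, the log above can be summarized per model. This is only a sketch: the regular expressions are derived from the log lines pasted in this thread, not from a documented Triton log format, and the names model_events and eventually_loaded are hypothetical.

```python
import re
from collections import defaultdict

# Line prefix as seen in the pasted log: level+date, time, pid, source] message
LINE = re.compile(
    r"^(?P<level>[IWE])\d{4} (?P<time>\d{2}:\d{2}:\d{2})\.\d+ \d+ [^\]]+\] (?P<msg>.*)$"
)

# Message patterns taken from the log lines in this thread (an assumption,
# other Triton versions may word these messages differently).
EVENTS = [
    ("poll_failed", re.compile(r"Poll failed for model directory '([^']+)'")),
    ("loading", re.compile(r"loading: ([^:\s]+):\d+")),
    ("loaded", re.compile(r"successfully loaded '([^']+)'")),
]


def model_events(log_text):
    """Map each model name to its (time, event) pairs in log order."""
    events = defaultdict(list)
    for raw in log_text.splitlines():
        match = LINE.match(raw.strip())
        if not match:
            continue
        for event_name, pattern in EVENTS:
            hit = pattern.search(match.group("msg"))
            if hit:
                events[hit.group(1)].append((match.group("time"), event_name))
                break
    return dict(events)


def eventually_loaded(events, model_name):
    """True if the last recorded event for model_name is a successful load."""
    history = events.get(model_name, [])
    return bool(history) and history[-1][1] == "loaded"


sample_log = """\
E0603 09:43:01.652550 41 model_repository_manager.cc:996] Poll failed for model directory 'test_model_pytorch': Invalid model name: Could not determine backend for model 'test_model_pytorch' with no backend in model configuration. Expected model name of the form 'model.<backend_name>'.
I0603 09:44:01.654376 41 model_lifecycle.cc:459] loading: test_model_pytorch:1
I0603 09:44:02.665851 41 model_lifecycle.cc:693] successfully loaded 'test_model_pytorch' version 1
"""

summary = model_events(sample_log)
print(summary)
print(eventually_loaded(summary, "test_model_pytorch"))
```

For the three sample lines, the summary shows the poll failure at 09:43:01 followed by a load that completes at 09:44:02, which matches the behaviour described above. A model whose history ends with poll_failed never started loading.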
So why is it that for the models I try to register no loading process is started?
Hi @<1523701205467926528:profile|AgitatedDove14> , exactly!
I just tried the PyTorch example from the clearml-serving repo and got the error about the wrong model name: Poll failed for model directory 'test_model_pytorch': Invalid model name: Could not determine backend for model 'test_model_pytorch' with no backend in model configuration. Expected model name of the form 'model.<backend_name>'.
My pre- and postprocessing code should be correct, because it already worked when I used the docker container clearml-serving setup. But in case you want to have a look, here it is:
Hi @<1523701205467926528:profile|AgitatedDove14> the config.pbtxt for 1. looks like this: (because I do not specify input and output type and size within the command)
backend: "pytorch"
platform: "pytorch_libtorch"
input [
{
name: "INPUT__0"
data_type: TYPE_FP32
dims: [1, 64]
}
]
output [
{
name: "OUTPUT__0"
data_type: TYPE_FP32
dims: [1, 11]
}
]
while the config.pbtxt for 2. looks like this: (because everything else is already specified in the command)
backend: "pytorch"
platform: "pytorch_libtorch"
Hi @<1523701205467926528:profile|AgitatedDove14> , thanks for coming back to my issue. Unfortunately I have a lot of other stuff on my desk right now so I have to postpone finishing this issue. I will reach out to you again as soon as possible (especially if I was able to find a solution).
Hi @<1526371965655322624:profile|NuttyCamel41>
How are you creating the model? Specifically, what do you have in "config.pbtxt"?
specifically any python code should be in the pre/post processing code (actually not running on the GPU instance)