Found the custom backend aspect of Triton - https://github.com/triton-inference-server/python_backend
Is that the right way?
Hi TrickySheep9, can you provide more info on your specific use-case?
Hey SuccessfulKoala55, like I mentioned, I have a spaCy NER model that I need to serve for inference.
Ah, just saw from the example that even that is doing the config.pbtxt stuff - https://github.com/allegroai/clearml-serving/blob/main/examples/keras/keras_mnist.py#L51
Hi TrickySheep9, is this model registered in your ClearML app?
Yes. It's a pickle file that I have added via OutputModel
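Something along these lines (the project, model, and file names here are placeholders, not the exact code):
from clearml import Task, OutputModel

task = Task.init(project_name="NER", task_name="train spacy ner model")

# register the pickled spaCy pipeline as an output model of the task
output_model = OutputModel(task=task, name="spacy ner model", framework="spaCy")
output_model.update_weights(weights_filename="ner_model.pkl")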
So to add a model to be served with an endpoint, you can use:
clearml-serving triton --endpoint "<your endpoint>" --model-project "<your project>" --model-name "<your model name>"
When the model gets updated, it should use the new one.
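For example (hypothetical endpoint/project/model names, just to illustrate the flags):
clearml-serving triton --endpoint "ner_inference" --model-project "NER" --model-name "spacy ner model"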
But you have to do the config.pbtxt stuff, right?
Here’s an example error I get trying it out on one of the example models:
Error: Requested Model project=ClearML Examples name=autokeras imdb example with scalars tags=None not found. 'config.pbtxt' could not be inferred. please provide specific config.pbtxt definition.
'config.pbtxt' could not be inferred. please provide specific config.pbtxt definition.
This basically means there is no configuration on how to serve the model, i.e. the size/type of the input layer and the output layer.
You can either store the configuration on the creating Task, as is done here:
https://github.com/allegroai/clearml-serving/blob/b5f5d72046f878bd09505606ca1147d93a5df069/examples/keras/keras_mnist.py#L51
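A minimal sketch of that option, assuming the config is attached to the creating Task with Task.connect_configuration (project and file names are placeholders; see the linked example for the real code):
from pathlib import Path
from clearml import Task

task = Task.init(project_name="examples", task_name="train model")

# attach a local config.pbtxt (describing the model's input/output layers)
# to the Task that creates the model, so clearml-serving can read it later
task.connect_configuration(configuration=Path("config.pbtxt"), name="config.pbtxt")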
Or you can provide it as a standalone file when registering the model with clearml-serving; here is an example config.pbtxt:
https://github.com/triton-inference-server/server/blob/main/qa/python_models/identity_fp32/config.pbtxt
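For reference, such a standalone config.pbtxt looks roughly like this (the model/layer names, types and dims below are illustrative placeholders, not copied from the linked file):
name: "my_model"
backend: "python"
max_batch_size: 0
input [
  {
    name: "INPUT__0"
    data_type: TYPE_FP32
    dims: [ -1 ]
  }
]
output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [ -1 ]
  }
]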
Just to confirm AgitatedDove14 - ClearML doesn’t do any “magic” in regard to this for TensorFlow, PyTorch, etc., right?
And another question - is clearml-serving ready for serious use?
ClearML doesn’t do any “magic” in regard to this for TensorFlow, PyTorch, etc., right?
No 😞 and if you have an idea on how, that would be great.
Basically the problem is that there is no "standard" way to know which layer is the input and which is the output.
And another question - is clearml-serving ready for serious use?
Define serious use? KFServing support is in the pipeline, if that helps.
Notice that clearml-serving is basically a control plane for the serving engine - not to neglect its importance, but the heavy lifting is done by Triton 🙂 (or any other backend we will integrate with, maybe Seldon).