1- Thanks, I need to experiment with that code. I guess I should use the AI model in the "process" function, correct? (I sketched below what I have in mind.)
2- Regarding the REST API authentication: I have a multi-tenant application, so I need different auth keys for each endpoint. Any ideas?
3- I admit I don't know the best practices for authorizing users with serving engines. If I am creating my own REST API, that is easy and flexible. But how do people solve this for TorchServe, Triton, etc.? Using load-balancer auth, as you suggested?
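To make sure we mean the same thing by the "process" function, here is a rough sketch of a custom-engine module. The `Preprocess` class and method signatures follow the clearml-serving custom examples as I understand them, so please correct me if they have changed; the TorchScript loading and the "input" key are just placeholders.
```python
# Rough sketch of a clearml-serving custom-engine preprocess.py, assuming the
# Preprocess class convention from the clearml-serving "custom" examples.
# The model format (TorchScript) and the request key "input" are placeholders.
from typing import Any, Callable, Optional

import torch


class Preprocess(object):
    def __init__(self):
        # Instantiated once per endpoint
        self._model = None

    def load(self, local_file_name: str) -> Optional[Any]:
        # Called with the locally downloaded model artifact
        self._model = torch.jit.load(local_file_name)
        self._model.eval()
        return self._model

    def process(self, data: Any, state: dict,
                collect_custom_statistics_fn: Optional[Callable[[dict], None]] = None) -> Any:
        # "data" is the deserialized request body; run the model here
        with torch.no_grad():
            tensor = torch.tensor(data["input"], dtype=torch.float32)
            output = self._model(tensor)
        return {"output": output.tolist()}
```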
Thanks, these are good tips, I will think about them. Another approach is creating my own API on top of the clearml-serving endpoints, where I control each tenant's authentication, roughly like the sketch below.
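(A hypothetical sketch only; the key store, the serving URL, and the endpoint names are placeholders for whatever the real deployment uses.)
```python
# Thin FastAPI gateway in front of clearml-serving: check a per-tenant API
# key, then forward the request to the endpoint owned by that tenant.
import httpx
from fastapi import FastAPI, Header, HTTPException

app = FastAPI()

SERVING_BASE_URL = "http://clearml-serving:8080/serve"  # assumed address
TENANT_KEYS = {  # in practice: a database or a secrets manager
    "key-for-tenant-a": "tenant_a_model",
    "key-for-tenant-b": "tenant_b_model",
}


@app.post("/predict")
async def predict(payload: dict, x_api_key: str = Header(...)):
    endpoint = TENANT_KEYS.get(x_api_key)
    if endpoint is None:
        raise HTTPException(status_code=401, detail="invalid API key")
    # Each key can only reach the single endpoint mapped to it
    async with httpx.AsyncClient() as client:
        response = await client.post(f"{SERVING_BASE_URL}/{endpoint}", json=payload)
    return response.json()
```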
Maybe these would be good features to include in ClearML: https://docs.bentoml.org/en/latest/guides/securing_endpoints.html or https://docs.bentoml.org/en/latest/guides/server.html .
Thank you for your attention and the short discussion!
yes, every tenant has their own serving endpoint
AgitatedDove14 The pipeline example is a bit unclear; I would like to know how to serve a model even if I do not use any serving engine. Can you please put together a more complete example of that? Besides that, how can I manage authorization across multiple endpoints? How can I manage API keys to control who can access my endpoints?
Many thanks!
Hello AgitatedDove14, I will summarize the responses:
1- Exactly, custom code
2 and 3 - I want to manage access control over the REST API
AgitatedDove14 is it possible for every endpoint to control its own JWT token in clearml-serving? In my multi-tenant application, tenants can only access their own endpoints, which is why I need to authenticate them before use. (A rough sketch of the check I have in mind is below.)
Regarding the load-balancer authorizing users with JWT, do you have links/resources where I can take a deeper look? Thanks
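(This is only what I imagine, not an existing clearml-serving feature as far as I know; the claim names, secrets, and per-tenant scheme are made up for illustration.)
```python
# Per-tenant JWT check that a gateway / load-balancer auth service could run:
# read the tenant id from the token, verify the signature with that tenant's
# secret, and allow access only to the endpoint named in the token's claims.
import jwt  # PyJWT

TENANT_SECRETS = {"tenant_a": "secret-a", "tenant_b": "secret-b"}  # placeholder store


def authorize(token: str, requested_endpoint: str) -> bool:
    # Peek at the tenant claim without verifying, then verify properly
    unverified = jwt.decode(token, options={"verify_signature": False})
    secret = TENANT_SECRETS.get(unverified.get("tenant"))
    if secret is None:
        return False
    try:
        claims = jwt.decode(token, secret, algorithms=["HS256"])
    except jwt.InvalidTokenError:
        return False
    # A tenant may only hit its own endpoint
    return claims.get("endpoint") == requested_endpoint
```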
Agreed! I was trying to avoid this, because I wanted each tenant to access the serving endpoint directly, to maximize performance. But I guess I will lose just a few ms by separating the auth layer from the execution layer.
Besides that, what are your impressions of these serving engines? Are they much better than just creating my own API + ONNX, or even my own API + normal PyTorch inference? (I sketched below what I mean by those two options.)
For example, if I decide to use `clearml-serving --engine custom`, what would be the drawbacks?
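(Just to illustrate the two "own API" options; the model paths, input shapes, and TorchScript format are placeholders.)
```python
# The two inference options being compared, side by side.
import numpy as np
import onnxruntime as ort
import torch

# Option 1: plain PyTorch inference behind my own API
model = torch.jit.load("model.pt")  # placeholder model file
model.eval()
with torch.no_grad():
    torch_out = model(torch.randn(1, 3, 224, 224))

# Option 2: ONNX Runtime inference behind my own API
session = ort.InferenceSession("model.onnx")  # placeholder model file
input_name = session.get_inputs()[0].name
onnx_out = session.run(None, {input_name: np.random.randn(1, 3, 224, 224).astype(np.float32)})
```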