Has any progress been made on the clearml-serving repo?
Hi JitteryCoyote63
Yes, though things are progressing slower than expected; I expect actual work to be pushed in early Jan. On the bright side, we are trying to work closely with the TorchServe team and Nvidia Triton to expand capabilities.
Currently it seems the setup will be a "proxy server container" for pre/post processing, then a serving engine container (Triton/TorchServe), with a monitoring container as the control plane (i.e. collecting stats and storing the model state, as it is today).
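Roughly speaking, the request flow would look something like this (just a sketch, nothing is final; the endpoint follows Triton's standard KServe v2 HTTP protocol, and the tensor names and pre/post hooks are placeholders):
```python
# Sketch of the request flow, nothing here is final:
# client -> proxy container (preprocess) -> serving engine (Triton) -> proxy (postprocess) -> client
import requests
import numpy as np

TRITON_URL = "http://triton:8000"  # serving engine container, placeholder address


def preprocess(raw_request: dict) -> np.ndarray:
    # user-supplied code, e.g. tokenization / normalization
    return np.asarray(raw_request["data"], dtype=np.float32)


def postprocess(raw_output: np.ndarray) -> dict:
    # user-supplied code, e.g. argmax / label mapping
    return {"prediction": int(raw_output.argmax())}


def handle_request(raw_request: dict, model_name: str = "my_model") -> dict:
    batch = preprocess(raw_request)
    # Triton's HTTP/REST inference endpoint (KServe v2 protocol)
    payload = {
        "inputs": [{
            "name": "INPUT__0",          # placeholder tensor name
            "shape": list(batch.shape),
            "datatype": "FP32",
            "data": batch.flatten().tolist(),
        }]
    }
    resp = requests.post(f"{TRITON_URL}/v2/models/{model_name}/infer", json=payload)
    resp.raise_for_status()
    output = np.asarray(resp.json()["outputs"][0]["data"])
    return postprocess(output)
```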
The main hurdles are:
- Deciding on an abstract class for the proxy server (basically allowing users to write pre/post processing Python code); see the sketch after this list.
- Connecting the proxy/serving machines (i.e. configuring external endpoints and making sure internal requests are routed). We are trying to think how this could easily be done, but currently the only solution we can think of is setting up (connecting to) a k8s cluster...
Feel free to help with both outstanding points, it will really accelerate the process.
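For the first point, something along these lines is what I have in mind (just a sketch, the class name and signatures are completely up for discussion):
```python
from abc import ABC, abstractmethod
from typing import Any


class ServingPreprocessor(ABC):
    """User-implemented pre/post processing hooks loaded by the proxy container.
    (Name and signatures are placeholders, nothing is decided yet.)"""

    @abstractmethod
    def preprocess(self, request_body: dict) -> Any:
        """Convert the raw request into the tensor(s) the serving engine expects."""

    @abstractmethod
    def postprocess(self, model_output: Any) -> dict:
        """Convert the raw engine output back into a JSON-serializable response."""


# Example user implementation shipped alongside the model
class MyModelPreprocessor(ServingPreprocessor):
    def preprocess(self, request_body: dict):
        return [float(x) / 255.0 for x in request_body["pixels"]]

    def postprocess(self, model_output):
        # simple argmax over a list of scores
        return {"label": int(max(range(len(model_output)), key=model_output.__getitem__))}
```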
What did you have in mind?
- Yes, the challenge is mostly around defining the interface. Regarding packaging, I'm thinking of a similar approach to the pipeline decorator (see the first sketch after this list), wdyt?
- ClearML agents will be running on k8s, but the main caveat is that I cannot think of a way to help with the deployment; in the end it will be kubectl that users will have to call in order to spin up the containers with the agents. Maybe a simple CLI to do that for you (second sketch below)?
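To clarify the decorator idea from the first point, something in the spirit of `PipelineDecorator.component` (this is a hypothetical `serving_endpoint` decorator, nothing like it exists yet):
```python
from functools import wraps

_ENDPOINT_REGISTRY = {}  # hypothetical registry the proxy container would load at startup


def serving_endpoint(model_name: str):
    """Hypothetical decorator: register a user function as the pre/post processing
    step for a given model, the same way a pipeline component is registered today
    (purely illustrative, not an existing API)."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)
        _ENDPOINT_REGISTRY[model_name] = wrapper
        return wrapper
    return decorator


@serving_endpoint(model_name="my_model")
def preprocess(request_body: dict) -> list:
    # the decorated code (and its requirements) would be packaged and shipped
    # to the proxy container, just like a pipeline component is today
    return [float(x) for x in request_body["data"]]
```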
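And the "simple CLI" from the second point would basically just wrap kubectl for you, something like this sketch (the manifest name and flags are made up):
```python
#!/usr/bin/env python3
# Minimal sketch of a CLI that shells out to kubectl to spin up the serving
# containers with the agents (flags and manifest path are hypothetical).
import argparse
import subprocess


def main():
    parser = argparse.ArgumentParser(description="Deploy clearml-serving pods on an existing k8s cluster")
    parser.add_argument("--namespace", default="clearml-serving")
    parser.add_argument("--manifest", default="clearml-serving.yaml", help="k8s manifest to apply")
    args = parser.parse_args()

    # users still need kubectl configured against their cluster;
    # the CLI just saves them from writing the commands by hand
    subprocess.run(["kubectl", "create", "namespace", args.namespace], check=False)
    subprocess.run(["kubectl", "apply", "-n", args.namespace, "-f", args.manifest], check=True)


if __name__ == "__main__":
    main()
```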
Hi AgitatedDove14 , that’s super exciting news! 🤩 🚀
Regarding the two outstanding points:
In my case, I'd maintain a client Python package that takes care of the pre/post processing of each request, so that I only send raw data to the inference service and post-process the raw output of the model it returns (see the sketch at the end of this message). But I understand why it might be desirable for users to have these steps happen on the server. What is challenging in this context? Defining how the user should ship this code, or what the abstract class should look like?
K8s sounds like the right choice, the tricky part being to abstract that away from the user. Maybe have a service task monitoring the cluster and scaling up when needed: the service task spins up new clearml-agents (similar to the AWS autoscaler) that act as k8s nodes and connect to the master node?
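To illustrate the client package idea, a rough sketch (the endpoint, labels and names are just placeholders from my side):
```python
# Sketch of the client-side package I have in mind: pre/post processing stays
# on the client, the inference service only ever sees raw tensors.
import requests


class MyModelClient:
    def __init__(self, endpoint: str = "http://inference-service:8080/predict"):
        self.endpoint = endpoint  # placeholder inference service address

    def _preprocess(self, image_bytes: bytes) -> list:
        # client-side preprocessing, e.g. decode + normalize
        return [b / 255.0 for b in image_bytes]

    def _postprocess(self, raw_output: list) -> str:
        # client-side postprocessing, e.g. map scores to a label
        labels = ["cat", "dog"]
        return labels[max(range(len(raw_output)), key=raw_output.__getitem__)]

    def predict(self, image_bytes: bytes) -> str:
        response = requests.post(self.endpoint, json={"inputs": self._preprocess(image_bytes)})
        response.raise_for_status()
        return self._postprocess(response.json()["outputs"])
```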