Hi AgitatedDove14 , that’s super exciting news! 🤩 🚀
Regarding the two outstanding points:
On pre/post processing: in my case, I’d maintain a client Python package that handles the pre/post processing of each request, so that I only send raw data to the inference service and post-process the raw model output it returns. That said, I understand why it might be desirable to have these steps happen on the server. What is challenging in this context? Defining how the user should ship this code, or what the abstract class should look like? (Rough sketch of the latter below.)

On scaling: K8s sounds like the right choice, the tricky part being to abstract it away from the user. Maybe have a service task monitoring the cluster and scaling up when needed: the service task spins up new clearml-agents (similar to the AWS autoscaler) that act as k8s nodes and connect to the master node. (Also sketched below.)
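Just to make the first point concrete, here’s a rough sketch of what that abstract class could look like — all names here are made up for illustration, not an actual ClearML API:

```python
from abc import ABC, abstractmethod
from typing import Any


class ModelEndpoint(ABC):
    """Hypothetical interface a user would ship alongside their model
    so that pre/post processing runs on the inference server."""

    @abstractmethod
    def preprocess(self, request: Any) -> Any:
        """Turn the raw request payload into model-ready input."""
        ...

    @abstractmethod
    def postprocess(self, raw_output: Any) -> Any:
        """Turn the raw model output into the response payload."""
        ...
```

The user would subclass this, implement the two methods, and ship it with the model; the serving code only ever talks to the interface.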
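And for the second point, the service task could be little more than a polling loop — again a minimal sketch, where `get_queue_backlog` and `spin_up_agent_node` are placeholders for whatever mechanism actually queries the queue and launches agents, not real ClearML calls:

```python
import time

POLL_INTERVAL_SEC = 30
BACKLOG_THRESHOLD = 10  # scale up when more than 10 requests are waiting


def get_queue_backlog() -> int:
    """Placeholder: return the number of pending inference requests."""
    raise NotImplementedError


def spin_up_agent_node() -> None:
    """Placeholder: start a clearml-agent that joins the k8s cluster."""
    raise NotImplementedError


def autoscale_loop() -> None:
    # Service task: poll the cluster and scale up when the backlog grows,
    # similar in spirit to the AWS autoscaler.
    while True:
        if get_queue_backlog() > BACKLOG_THRESHOLD:
            spin_up_agent_node()
        time.sleep(POLL_INTERVAL_SEC)
```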