Hi IntriguedGoldfish14
Yes, the way to do that is to use the custom engine example as you did (rough sketch below), and you are also correct about the env var for adding catboost to the container.
You can of course create your own custom container from the base one and pre-install any required packages, to speed up the container spin-up time.
One of the design decisions was to support multiple models from a single container, which means there needs to be one environment for all of them. The main issue is when some packages collide, but I think this is relatively rare. Is this an issue for you?
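For concreteness, a minimal sketch of what that custom-engine preprocess module could look like for a catboost model, modeled on the clearml-serving custom engine example. The method names and signatures follow that example at the time of writing (verify against your clearml-serving version), and the `"features"` request field is an illustrative assumption, not part of the API:

```python
# preprocess.py -- sketch of a custom-engine serving module for a catboost
# model, modeled on the clearml-serving "custom" engine example.
# Method names/signatures follow that example; verify against the
# clearml-serving version you are running.
from typing import Any, Optional

from catboost import CatBoostClassifier  # assumes catboost is installed in the container


class Preprocess(object):
    # clearml-serving looks for a class with this exact name
    def __init__(self):
        # called once per endpoint, not per request
        self._model = None

    def load(self, local_file_name: str) -> Optional[Any]:
        # receives the locally downloaded copy of the registered model
        # (CatBoostClassifier is an illustrative choice; use your model type)
        self._model = CatBoostClassifier()
        self._model.load_model(local_file_name)

    def preprocess(self, body: dict, state: dict, collect_custom_statistics_fn=None) -> Any:
        # map the request payload to model input; "features" is a made-up
        # field name for this sketch, not part of the serving API
        return body["features"]

    def process(self, data: Any, state: dict, collect_custom_statistics_fn=None) -> Any:
        # with the custom engine, the inference itself happens here
        return self._model.predict(data).tolist()
```

On the stock container, the extra package can come in through the compose environment (I believe the variable is `CLEARML_EXTRA_PYTHON_PACKAGES`, e.g. set to `catboost`), which is what makes a pre-baked custom image mainly a spin-up-time optimization.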
Hi,
Yes, that is an issue for me. Even if we could centralize an environment today, it leaves the concern that whenever we add a model, package changes could cause issues with older models. It would also be nice to have a more direct link between the saved model objects and their serving environment.
Would the recommendation be to spin up multiple inference containers? And is there a built-in way to separate the preprocessing and model inference into separate containers? Part of the package issue is on the preprocessing side.
Yeah, changing the environment on the fly is tricky; it basically means spinning up an internal HTTP service per model...
Notice that you can have many clearml-serving sessions; they are not limited, so you can always spin up a new serving session with a new environment. The limitation is changing an environment on the fly.
Kind of, yes: spin up multiple clearml-serving sessions. Essentially each session has its own environment, and within that environment you can add/remove models on the fly.
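Concretely, that would be something like creating another session with `clearml-serving create --name "second-env"` and then attaching models to it with `clearml-serving --id <service_id> model add ...` (the name here is illustrative; check `clearml-serving --help` on your version for the exact flags).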
I see, but to actually serve both models/sessions at the same time, it would require two inference containers, since each inference container can only serve one session at a time?
Sorry IntriguedGoldfish14, just noticed your reply.
Yes, two inference containers running simultaneously on the cluster. As you said, each one with its own environment (assuming here that the requirements of the models collide).
Makes sense.