What would be the best way to add another model from another project, say C, to the same Triton server that is serving the previous model?
You can make multiple calls to clearml-serving, each one with a new endpoint and a new project/model to watch. When you launch it, it will set up all the endpoints on a single Triton server (model loading and optimization are handled by Triton anyway).
Suppose that a new model version 2 is trained, but it does not meet our target metrics. Is it possible to just save the model to the model repository and not serve it, if model version 1 is already being served?
Sure, just do not "publish" the model. It will be stored in the model repository and remain fully accessible, but clearml-serving will not serve it 🙂
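To illustrate the "do not publish" approach, here is a minimal sketch of gating publication on a metric. The function name, the `metric_value` and `target` parameters, and the "higher is better" comparison are assumptions made for illustration; `model` is assumed to be a ClearML model object (or anything else exposing a `publish()` method).

```python
def publish_if_good_enough(model, metric_value, target):
    """Publish `model` only if it reaches the target metric.

    Hypothetical helper: assumes a higher metric is better and that
    `model` exposes publish(), as ClearML model objects do.
    Returns True if the model was published, False otherwise.
    """
    if metric_value >= target:
        model.publish()
        return True
    # Below target: the model stays unpublished in the model repository,
    # so it is still stored and accessible but is not picked up for serving.
    return False
```

In a training script this would typically be called once at the end, for example with the last entry of `task.models["output"]` and the validation score.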
What would be the best way to get all the models trained by a certain Task? I know we can use query_models to filter models by Project and Task, but is that the best way?
On the Task object itself you have all the models.
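As a rough sketch of what that looks like in code, assuming the `models` property of a ClearML `Task` (a mapping with `"input"` and `"output"` lists); the helper names below are hypothetical, and the exact attribute layout may vary between clearml versions:

```python
def output_models(task):
    """Return the models a task produced, as a plain list.

    Assumes `task.models` is a mapping with an "output" entry,
    which is how the ClearML Task object exposes its models.
    """
    return list(task.models["output"])


def fetch_output_models(task_id):
    """Look up a task by ID and return its output models.

    clearml is imported lazily so the helper above stays usable
    without the package installed.
    """
    from clearml import Task

    return output_models(Task.get_task(task_id=task_id))
```

Calling `fetch_output_models("<your task id>")` needs a configured ClearML environment; the first helper works on any task object you already hold.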
Suppose that serving project A is serving model version 1, then a new model is trained and it starts serving model version 2, but at runtime, for some reason, we need to revert to model version 1. What would be the best way to achieve this?
If you archive the model, then clearml-serving will pick the "latest" non-archived model, essentially reverting to the previous version. Also note that it supports multiple versions on a single endpoint (again, a feature of Triton that it exposes and manages).
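A minimal sketch of the archive-based rollback, assuming the model objects expose an `archive()` method (recent clearml versions provide one on their model classes, but treat that as an assumption and check your installed version). The helper name and the oldest-to-newest ordering of `models` are hypothetical; the serving side does its own selection, and this only shows which model would become the latest one.

```python
def revert_to_previous(models):
    """Archive the newest model and return the one that becomes latest.

    `models` is assumed to be ordered oldest to newest and to contain
    only non-archived models. Returns None if nothing remains to serve.
    """
    if not models:
        return None
    # Archiving the newest model takes it out of the "latest" selection.
    models[-1].archive()
    return models[-2] if len(models) > 1 else None
```

To roll forward again later, the archived model would need to be restored from the archive.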