Thank you for your answer. I added ~100 models to the serving session, and when I send a POST request it loads the requested model to perform inference. I would like to be able to send a request to unload a model (because I cannot load all the models on the GPU, only 7-8), or, as @<1690896098534625280:profile|NarrowWoodpecker99> suggests, add a timeout, or unload models once GPU memory reaches a limit. Do you have a suggestion on how I could achieve that? Thanks!
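To make the idea concrete, here is a minimal sketch of the LRU-style eviction I have in mind. This is plain Python, not the clearml-serving API; the `load` and `unload` callables are placeholders for whatever the serving engine actually provides:

```python
from collections import OrderedDict
from typing import Any, Callable

class GPUModelCache:
    """Keep at most max_loaded models in GPU memory; before loading a new
    model, evict the least-recently-used one."""

    def __init__(self, load: Callable[[str], Any], unload: Callable[[Any], None], max_loaded: int = 7):
        self._load = load        # engine-specific "load model to GPU" hook
        self._unload = unload    # engine-specific "free GPU memory" hook
        self._max_loaded = max_loaded
        self._loaded: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, model_id: str) -> Any:
        if model_id in self._loaded:
            self._loaded.move_to_end(model_id)  # mark as most recently used
            return self._loaded[model_id]
        # at capacity: unload the least-recently-used model(s) first
        while len(self._loaded) >= self._max_loaded:
            _, stale = self._loaded.popitem(last=False)
            self._unload(stale)
        model = self._load(model_id)
        self._loaded[model_id] = model
        return model
```

A timeout-based variant would instead store a last-used timestamp per model and unload anything idle longer than the threshold.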
Thank you for your answer. For the moment I am calling request_processor.add_endpoint(...) to create an endpoint to be used with the Triton engine. Do I not need to call add_model_monitoring? What is the advantage of adding model monitoring instead of an endpoint?
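As far as I understand the docs, add_endpoint pins one fixed model_id to a serving URL, while add_model_monitoring watches a model query (project/name/tags) and keeps the endpoint pointed at the newest matching model. Conceptually it automates something like the sketch below. Model.query_models is the real clearml call; the newest-first ordering of its results is my assumption:

```python
from clearml import Model

def latest_model_id(project: str, name: str):
    """What model monitoring automates: instead of serving one pinned
    model_id, repeatedly resolve the newest published model matching a
    query and serve that."""
    models = Model.query_models(
        project_name=project,
        model_name=name,
        only_published=True,  # serve only vetted model versions
    )
    # assumption: results come back newest-first
    return models[0].id if models else None
```

So the advantage of model monitoring would be that retrained models are picked up automatically, whereas an add_endpoint entry keeps serving the same model until you update it by hand.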
Hi, thank you for your answer. This command calls the method func_model_remove, which removes the endpoint, model_monitoring, and canary_endpoint, but it does not remove the OutputModel and the py_code_mymodel.py (preprocessing) from the serving service.
It does, but not the OutputModel and the preprocess artifact. I managed to do it by adding:
# delete the uploaded preprocessing artifact, then the registered model
if _task.artifacts.get(model_endpoint.preprocess_artifact):
    _task.delete_artifacts([model_endpoint.preprocess_artifact])
Model.remove(model_endpoint.model_id)
Maybe this should be added to the func_model_remove method?
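For anyone hitting the same issue, here is the whole cleanup in one place. remove_model_completely is my own hypothetical helper, and it assumes model_endpoint exposes .serving_url, .preprocess_artifact and .model_id, and that remove_endpoint is the call func_model_remove already makes internally:

```python
from clearml import Model, Task

def remove_model_completely(request_processor, task: Task, model_endpoint) -> None:
    """Remove a served model and everything it left behind."""
    # what func_model_remove already does: drop endpoint / monitoring / canary
    request_processor.remove_endpoint(model_endpoint.serving_url)
    # missing piece 1: delete the uploaded preprocessing code (py_code_*.py)
    if task.artifacts.get(model_endpoint.preprocess_artifact):
        task.delete_artifacts([model_endpoint.preprocess_artifact])
    # missing piece 2: remove the OutputModel from the model registry
    Model.remove(model_endpoint.model_id)
```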