Thank you for your answer. I added hundreds of models to the serving session, and when I send a POST request it loads the requested model to perform inference. I would like to be able to send a request to unload a model (because I cannot load all the models on the GPU, only 7-8), or, as @<1690896098534625280:profile|NarrowWoodpecker99> suggests, add a timeout, or unload models once GPU memory reaches a limit. Do you have a suggestion on how I could achieve that? Thanks!
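
To make the idea concrete, here is a minimal sketch of the eviction policy I am after. Everything in it is an assumption: load_model_to_gpu / unload_model_from_gpu stand in for whatever the serving engine actually does, and the capacity of 7 is just the rough limit from my GPU:

from collections import OrderedDict

MAX_LOADED_MODELS = 7  # roughly how many models fit on my GPU (assumption)

def load_model_to_gpu(model_id):
    # hypothetical helper: whatever the serving engine does to load a model
    return object()

def unload_model_from_gpu(model_id, model):
    # hypothetical helper: free the GPU memory held by this model
    pass

class LRUModelCache:
    # keep at most `capacity` models loaded, evicting the least recently used
    def __init__(self, capacity=MAX_LOADED_MODELS):
        self.capacity = capacity
        self._loaded = OrderedDict()  # model_id -> loaded model handle

    def get(self, model_id):
        if model_id in self._loaded:
            self._loaded.move_to_end(model_id)  # mark as most recently used
            return self._loaded[model_id]
        if len(self._loaded) >= self.capacity:
            evicted_id, evicted = self._loaded.popitem(last=False)
            unload_model_from_gpu(evicted_id, evicted)
        model = load_model_to_gpu(model_id)
        self._loaded[model_id] = model
        return model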
Thank you for your answer. For the moment I am doing request_processor.add_endpoint(...) to create an endpoint to be used with the Triton engine. Do I not need to do add_model_monitoring? What is the advantage of adding model monitoring instead of an endpoint?
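
For reference, this is the shape of what I am doing now. The import paths and the ModelEndpoint fields are from memory of the clearml-serving Triton examples, so please treat them as assumptions rather than the exact API:

# assumed import paths from the clearml-serving source tree
from clearml_serving.serving.model_request_processor import ModelRequestProcessor
from clearml_serving.serving.endpoints import ModelEndpoint

# attach to the existing serving service task (placeholder id)
request_processor = ModelRequestProcessor(task_id="<serving_service_task_id>")

# register a Triton endpoint for one model (field names are assumptions)
request_processor.add_endpoint(
    endpoint=ModelEndpoint(
        engine_type="triton",
        serving_url="my_model",
        model_id="<model_id>",
    ),
)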
It does, but not the OutputModel and the preprocess artifact. I managed to do it by adding:

from clearml import Model

# drop the preprocess code artifact from the serving task, if it is still attached
if _task.artifacts.get(model_endpoint.preprocess_artifact):
    _task.delete_artifacts([model_endpoint.preprocess_artifact])
# remove the registered model itself
Model.remove(model_endpoint.model_id)

Maybe this should be added to the func_model_remove method?
Hi, thank you for your answer. This command calls the method func_model_remove, which removes the endpoint, model_monitoring, and canary_endpoint, but it does not remove the OutputModel and the py_code_mymodel.py (preprocessing) from the serving service.
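
For anyone hitting the same thing, this is roughly how I checked what was left behind after the removal. It is only a sketch: the task id, the model id, and the artifact name py_code_mymodel.py are placeholders for my setup, and the exception type raised for a deleted model is a guess:

from clearml import Task, Model

# placeholder: the serving service task that holds the preprocess code artifact
serving_task = Task.get_task(task_id="<serving_service_task_id>")

# the preprocess code shows up as an artifact on the serving task
print("preprocess artifact still attached:",
      "py_code_mymodel.py" in serving_task.artifacts)

# accessing a field forces a backend lookup, which fails if the model is gone
try:
    print("OutputModel still registered:", Model(model_id="<output_model_id>").name)
except Exception:
    print("OutputModel was removed")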