Answered
Hello, How Do You Manage To Unload A Model From Clearml-Serving Api? I Am Trying To Unload A Model Through Grpc Via

Hello, how do you manage to unload a model from the clearml-serving API?
I am trying to unload a model through gRPC via clearml-serving, because models are loaded when I send a request to the endpoint but are never unloaded (the GPU memory keeps increasing when I infer with a new model). clearml-serving does not seem to have a solution for unloading models. So I was wondering how I should adapt clearml-serving-triton to unload a model, or whether I should adapt TritonPreprocessRequest to be able to unload models on a specific request?

  
  
Posted 2 months ago

Answers 3


Hi @<1683648242530652160:profile|ApprehensiveSeaturtle9>

I send a request to the endpoint but the models never unload (the GPU memory keeps increasing when I infer with a new model).

They are not unloaded after the request is done. See the discussion here: None
You can, however, remove the model from the serving session (but I do not think this is what you meant).
I'm assuming you want to run multiple models on a single GPU that does not have enough memory for all of them?

  
  
Posted 2 months ago

I would like to be able to send a request to unload the model (because I cannot load all the models in GPU memory, only 7-8)

Hmm, is this part of the gRPC interface of Triton? If it is, we should be able to add that quite easily.
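For reference, a minimal sketch of what that call looks like against Triton's model-control API via the tritonclient package; this assumes Triton itself is started with --model-control-mode=explicit and its gRPC port (default 8001) is reachable, and "my_model" is only a placeholder name:

```python
# Minimal sketch, not the clearml-serving API: this talks to Triton's gRPC
# model-control endpoint directly via the tritonclient package.
# Assumes Triton runs with --model-control-mode=explicit on localhost:8001;
# "my_model" is a placeholder model name.
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

client.load_model("my_model")      # load the model into GPU memory
# ... run inference requests against the endpoint ...
client.unload_model("my_model")    # free the GPU memory again

# List what is currently loaded; state == "READY" means resident in memory.
for model in client.get_model_repository_index().models:
    print(model.name, model.state)
```

So the calls exist on the Triton side; the question is only how to surface them through clearml-serving-triton.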

  
  
Posted 2 months ago

Thank you for your answer. I added hundreds of models to the serving session, and when I send a POST request it loads the requested model to perform the inference. I would like to be able to send a request to unload a model (because I cannot load all the models in GPU memory, only 7-8), or, as @<1690896098534625280:profile|NarrowWoodpecker99> suggests, add a timeout? Or unload all the models if the GPU memory reaches a limit? Do you have a suggestion on how I could achieve that? Thanks!
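For illustration only, a rough sketch of the "unload when GPU memory reaches a limit" idea, assuming pynvml and tritonclient are installed, Triton runs in explicit model-control mode, and the caller keeps loaded_models in least-recently-used order (the 0.8 threshold is hypothetical):

```python
# Rough sketch of a memory-threshold unload policy (not part of clearml-serving).
# Assumes Triton runs with --model-control-mode=explicit and pynvml is installed.
import pynvml
import tritonclient.grpc as grpcclient

MEM_LIMIT_FRACTION = 0.8  # hypothetical threshold

def maybe_unload(client: grpcclient.InferenceServerClient,
                 loaded_models: list) -> None:
    """Unload the least recently used model when GPU 0 is above the threshold."""
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    if mem.used / mem.total > MEM_LIMIT_FRACTION and loaded_models:
        oldest = loaded_models.pop(0)   # caller keeps this list in LRU order
        client.unload_model(oldest)
    pynvml.nvmlShutdown()
```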

  
  
Posted 2 months ago