Answered
Hello, How Do You Manage To Unload A Model From Clearml-Serving Api? I Am Trying To Unload A Model Through Grpc Via

Hello, how do you manage to unload a model from the clearml-serving API?
I am trying to unload a model through gRPC via clearml-serving, because models are loaded when I send a request to the endpoint but are never unloaded (GPU memory keeps increasing when I run inference with a new model). clearml-serving does not seem to offer a way to unload models, so I was wondering how I should adapt clearml-serving-triton to unload them. Or should I adapt TritonPreprocessRequest to be able to unload models on a specific request?
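For reference, if the underlying Triton server is started with explicit model control (`--model-control-mode=explicit`), models can be loaded and unloaded on demand through Triton's own client API; whether the clearml-serving Triton container is launched that way is something you would need to check or adapt. A minimal sketch (the address, port and model name are placeholders):

```python
# Sketch only: assumes Triton runs with --model-control-mode=explicit and
# its gRPC port (default 8001) is reachable; "my_model" is a placeholder.
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

client.load_model("my_model")                 # load into GPU memory on demand
# ... run inference ...
client.unload_model("my_model")               # release the model's GPU memory
print(client.is_model_ready("my_model"))      # False once the unload completes
```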

  
  
Posted one year ago

Answers 3


I would like to be able to send a request to unload the model (because I cannot load all the models on the GPU, only 7-8) ...

Hmm, is this part of the gRPC interface of Triton? If it is, we should be able to add that quite easily.
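For what it's worth, the Triton gRPC protocol does include a model-repository extension with `RepositoryModelLoad` / `RepositoryModelUnload` RPCs, which are only honoured when the server runs in explicit model-control mode. A rough sketch against the raw stubs bundled with `tritonclient` (module layout and address are assumptions and may differ between versions):

```python
# Sketch only: module paths may differ between tritonclient versions.
import grpc
from tritonclient.grpc import service_pb2, service_pb2_grpc

channel = grpc.insecure_channel("localhost:8001")          # Triton gRPC port
stub = service_pb2_grpc.GRPCInferenceServiceStub(channel)

# Ask Triton to unload a model; this is rejected unless the server was
# started with --model-control-mode=explicit.
stub.RepositoryModelUnload(
    service_pb2.RepositoryModelUnloadRequest(model_name="my_model")
)
```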

  
  
Posted one year ago

Thank you for your answer. I added hundreds of models to the serving session, and when I send a POST request it loads the requested model to perform inference. I would like to be able to send a request to unload a model (because I cannot load all the models on the GPU, only 7-8), or, as @<1690896098534625280:profile|NarrowWoodpecker99> suggests, add a timeout, or unload all the models once GPU memory reaches a limit. Do you have a suggestion on how I could achieve that? Thanks!
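One possible direction (a sketch, not existing clearml-serving functionality): keep a small LRU table of resident models and, before serving a model that is not loaded, unload the least recently used one via Triton's client. Everything below (the ModelEvictor name, the URL, the limit of 7) is hypothetical and assumes explicit model control is enabled on the Triton server.

```python
# Hypothetical helper, not part of clearml-serving: evicts the least recently
# used model when more than `max_loaded` models would be resident on the GPU.
from collections import OrderedDict

import tritonclient.grpc as grpcclient


class ModelEvictor:
    def __init__(self, url: str = "localhost:8001", max_loaded: int = 7):
        self.client = grpcclient.InferenceServerClient(url=url)
        self.max_loaded = max_loaded
        self._lru = OrderedDict()  # model name -> None, ordered by recency

    def ensure_loaded(self, model_name: str) -> None:
        """Call before forwarding an inference request for `model_name`."""
        if model_name in self._lru:
            self._lru.move_to_end(model_name)     # mark as most recently used
            return
        while len(self._lru) >= self.max_loaded:  # make room on the GPU
            victim, _ = self._lru.popitem(last=False)
            self.client.unload_model(victim)
        self.client.load_model(model_name)
        self._lru[model_name] = None
```

A timeout-based variant would look similar: record a last-used timestamp per model and unload anything idle longer than the threshold.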

  
  
Posted one year ago

Hi @<1683648242530652160:profile|ApprehensiveSeaturtle9>

I send a request to the endpoint but the models are never unloaded (GPU memory keeps increasing when I infer with a new model).

They are not unloaded after the request is done. See the discussion here: None
You can, however, remove the model from the serving session (but I do not think that is what you meant).
I am assuming you want to run multiple models on a single GPU that does not have enough memory for all of them?

  
  
Posted one year ago