Unanswered
For Clearml Serving, If I Am Trying To Deploy 100 Models On A Gpu That Can Handle 5 Concurrently, But Each One Will Be Sporadically Used (Fine Tuned Models Trained For Different Customers), Can Clearml-Serving Automatically Load And Unload Models Based Up


I checked Triton and found these references:

  • None
  • None

It appears that "they sell that" as Triton Management Service, part of None. It is possible to do through their API, but it would need to be explicit. Moreover, there are likely a few different algorithms that could be used to maximize usage and minimize downtime. It would be nice to have at least a simple algorithm baked into ClearML for serving models at a smallish scale, such as:
  • Assume:
    - All models are the same size when loaded
    - The max number of instances of an individual model is 1
  • Config:
    - Number of seconds to assess usage over (rule of thumb -> 5x model loading time?)
    - Auto-unload a model if it has not been used for x minutes (default 5?)
    - Number of models that need to be unloaded within x minutes before a new auto-scaled instance is added (default 5?)
  • Load the model with the largest number of elements in its queue, and only pull in one at a time
  • If there is not enough space, unload the model with the oldest "last inference" time if it is over n (60?) seconds ago
  • Else, unload the model that has an empty queue and also has the fewest incoming requests over the past n (60?) seconds
  • If the frequency of unloading models is greater than the threshold, add another auto-scaled instance
  • If the loaded models can fit on fewer instances than are currently scaled, gracefully consolidate
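
To make the load/unload rules above concrete, here is a minimal Python sketch of one scheduling step on a single instance. It is not ClearML or Triton API: `ModelState`, `pick_model_to_load`, `pick_model_to_unload`, and `schedule_step` are hypothetical names, and `capacity` and `idle_after` stand in for the "5 concurrent models" and "n (60?) seconds" values. It assumes, as in the list, that all models are the same size and that each model has at most one instance. Auto-scaling and consolidation are left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class ModelState:
    """Bookkeeping for one model (hypothetical, not a ClearML type)."""
    name: str
    queue_len: int = 0           # requests currently waiting for this model
    recent_requests: int = 0     # requests seen during the assessment window
    last_inference: float = 0.0  # timestamp of the last inference (seconds)
    loaded: bool = False


def pick_model_to_load(models: Dict[str, ModelState]) -> Optional[ModelState]:
    """Return the unloaded model with the longest queue, or None if none is waiting."""
    waiting = [m for m in models.values() if not m.loaded and m.queue_len > 0]
    return max(waiting, key=lambda m: m.queue_len, default=None)


def pick_model_to_unload(
    models: Dict[str, ModelState], now: float, idle_after: float = 60.0
) -> Optional[ModelState]:
    """Choose a loaded model to evict, following the two rules in the post."""
    loaded = [m for m in models.values() if m.loaded]
    if not loaded:
        return None
    # Rule 1: the model with the oldest last inference, if it is old enough.
    oldest = min(loaded, key=lambda m: m.last_inference)
    if now - oldest.last_inference > idle_after:
        return oldest
    # Rule 2: among models with an empty queue, the one with the fewest recent requests.
    empty = [m for m in loaded if m.queue_len == 0]
    return min(empty, key=lambda m: m.recent_requests, default=None)


def schedule_step(
    models: Dict[str, ModelState],
    capacity: int,
    now: float,
    idle_after: float = 60.0,
) -> Tuple[Optional[str], Optional[str]]:
    """Load at most one model, evicting at most one. Returns (loaded, unloaded) names."""
    candidate = pick_model_to_load(models)
    if candidate is None:
        return None, None
    unloaded = None
    if sum(m.loaded for m in models.values()) >= capacity:
        victim = pick_model_to_unload(models, now, idle_after)
        if victim is None:
            # Nothing can be evicted; a caller could count this toward scale-up.
            return None, None
        victim.loaded = False
        unloaded = victim.name
    candidate.loaded = True
    # Assumption: treat load time as activity so a fresh model is not evicted at once.
    candidate.last_inference = now
    return candidate.name, unloaded
```

A caller would run `schedule_step` periodically, perform the actual load and unload through whatever serving API is in use, and track how often the second element of the result is not `None` to decide when to add another instance.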
  
  
Posted one year ago
127 Views
0 Answers