EmbarrassedWalrus44
Hi everyone, I'm using clearml-serving with Triton and have a couple of questions regarding model management:

Hi Martin, thanks for the answer. Ah, so the delay in unloading causes a timeout. That speed depends on model sizes, right?

As a workaround, how about a simpler approach: unload the least-used models after X minutes of sitting unused, enough to free up memory for any model to load? Hope that makes sense. This would not work under heavy load, but we have models that are used only once a week, for example. They would just stay unloaded until used, and could be offloaded again afterwards.
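A minimal sketch of that unload-after-idle idea. All names here (ModelCache, touch, evict_idle) are made up for illustration; this is not a clearml-serving or Triton API, just the bookkeeping the policy would need:

```python
import time

class ModelCache:
    """Hypothetical tracker that decides which models have sat idle
    longer than `idle_limit` seconds and should be unloaded."""

    def __init__(self, idle_limit, clock=time.monotonic):
        self.idle_limit = idle_limit   # seconds a model may sit unused
        self.clock = clock             # injectable clock, handy for testing
        self.last_used = {}            # model name -> last-use timestamp

    def touch(self, name):
        """Record that `name` was just used (e.g. served a request)."""
        self.last_used[name] = self.clock()

    def evict_idle(self):
        """Return the models idle longer than the limit and forget them.
        In a real deployment this is where you would trigger the actual
        unload for each returned name."""
        now = self.clock()
        idle = [n for n, t in self.last_used.items()
                if now - t > self.idle_limit]
        for n in idle:
            del self.last_used[n]
        return idle
```

With an injected fake clock you can check that a model untouched for longer than the limit gets evicted while a recently used one stays.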

6 months ago

Thanks for asking about this - I have the exact same issue. Could the Triton model management API be used to load/unload the models?
https://github.com/triton-inference-server/server/issues/5345
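For reference, Triton does expose explicit load/unload over its repository API (`POST /v2/repository/models/<name>/load` and `/unload`), provided the server runs with `--model-control-mode=explicit`. A hedged stdlib-only sketch of calling it; the base URL is an assumption about your deployment:

```python
import urllib.request

def control_url(base, model_name, action):
    """Build the Triton repository-API URL for loading or unloading a model."""
    if action not in ("load", "unload"):
        raise ValueError(f"unsupported action: {action}")
    return f"{base}/v2/repository/models/{model_name}/{action}"

def control(base, model_name, action, timeout=120):
    """POST the load/unload request. Triton answers 200 on success;
    loads can take a while for big models, hence the generous timeout."""
    req = urllib.request.Request(
        control_url(base, model_name, action),
        data=b"{}",
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status

# Example (assumes a local Triton with explicit model control):
#   control("http://localhost:8000", "resnet50", "unload")
```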

6 months ago

Unless you set a very long timeout. Usually all models load in less than a minute, smaller ones much faster. It would not work for huge LLM-style models, though.

6 months ago

Models that fit into around 8-24 GB of memory are quite common, at least here. If they are used rarely and you have a lot of them, that is a lot of wasted GPU resources. They can take about 10-40 seconds to load. Hot swapping would be ideal, but as a fallback, unloading the least-used models to keep enough VRAM free to load any model on request would work. Tricky issue!
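That fallback could be sketched as: evict least-recently-used models until the free VRAM covers the largest model you might need. Model sizes and the free-memory figure would come from your own bookkeeping (e.g. pynvml); all names here are hypothetical:

```python
def pick_unloads(loaded_lru_first, sizes_gb, free_gb, headroom_gb):
    """Choose which models to unload so that free VRAM reaches
    `headroom_gb` (e.g. the size of the largest model in the repo).

    loaded_lru_first: loaded model names, least-recently-used first
    sizes_gb:         model name -> approximate VRAM footprint in GB
    free_gb:          currently free VRAM in GB
    """
    to_unload = []
    for name in loaded_lru_first:
        if free_gb >= headroom_gb:
            break  # enough room already, stop evicting
        to_unload.append(name)
        free_gb += sizes_gb[name]
    return to_unload
```

Keeping headroom equal to the biggest model means any incoming request can be served after at most one load, at the cost of idling that much VRAM.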

6 months ago