Unanswered
Hi Everyone,
I'm using clearml-serving with Triton and have a couple of questions regarding model management:
Hi Martin. Thanks for the answer. Ah, so the delay in unloading causes a timeout. That speed depends on model size, right?
As a workaround, how about a simpler approach: unload the least-used models after X minutes of sitting unused, enough to free up memory for any model to load? Hope that makes sense. This would not work under heavy load, but e.g. we have models that are used only once a week. They would just stay unloaded until needed, and could be offloaded again afterwards.
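To make the idea concrete, here is a rough sketch of the idle-timeout logic in plain Python. It is not clearml-serving or Triton code: the class name `IdleModelUnloader`, the `touch`/`sweep` methods, and the `unload_fn` callback are all hypothetical names made up for illustration, and how the unload is actually triggered on the serving side is left open.

```python
import time
from typing import Callable, Dict, List


class IdleModelUnloader:
    """Tracks last-use times and unloads models idle longer than a timeout.

    Hypothetical helper for illustration; not part of clearml-serving or Triton.
    """

    def __init__(
        self,
        idle_timeout_s: float,
        unload_fn: Callable[[str], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.idle_timeout_s = idle_timeout_s
        self.unload_fn = unload_fn  # assumed hook that really unloads the model
        self.clock = clock          # injectable so the logic can be tested
        self.last_used: Dict[str, float] = {}

    def touch(self, model_name: str) -> None:
        """Record that a model was just loaded or served a request."""
        self.last_used[model_name] = self.clock()

    def sweep(self) -> List[str]:
        """Unload every model idle for at least the timeout, longest idle first."""
        now = self.clock()
        idle = [
            (now - used_at, name)
            for name, used_at in self.last_used.items()
            if now - used_at >= self.idle_timeout_s
        ]
        unloaded = []
        for _, name in sorted(idle, reverse=True):
            self.unload_fn(name)
            del self.last_used[name]
            unloaded.append(name)
        return unloaded
```

The serving layer would call `touch()` on every request and `sweep()` from a periodic timer, with X minutes passed as `idle_timeout_s`. A model used once a week would be unloaded on the first sweep after the timeout and stay unloaded until its next request triggers a load.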
6 months ago