Unanswered
Hi Everyone,
I'm using clearml-serving with Triton and have a couple of questions regarding model management:
Models that need around 8-24 GB of GPU memory are quite common, at least here. If you have a lot of them and each is used only rarely, keeping them all loaded wastes a lot of GPU resources. They can take about 10-40 seconds to load. Hot swapping would be ideal, but as a fallback, unloading the least-used models to keep enough GPU memory free to load any model on request would work. Tricky issue!
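To make the fallback idea concrete, here is a minimal sketch of what I have in mind. It is not clearml-serving or Triton code: the `LruModelCache` class, the per-model size table, and the `load`/`unload` callbacks are all hypothetical names for illustration. It also approximates "least used" with least recently used (LRU), which is simpler to track than usage counts. In a real Triton setup the callbacks could presumably wrap the client's model load/unload calls with the server running in explicit model control mode, but that wiring is an assumption on my side.

```python
from collections import OrderedDict
from typing import Callable, Dict, List


class LruModelCache:
    """Hypothetical helper: keeps loaded models within a GPU memory budget
    by unloading the least recently used ones first."""

    def __init__(
        self,
        budget_gb: float,
        sizes_gb: Dict[str, float],
        load: Callable[[str], None],
        unload: Callable[[str], None],
    ) -> None:
        self.budget_gb = budget_gb      # total GPU memory we allow models to use
        self.sizes_gb = sizes_gb        # assumed known footprint per model
        self._load = load               # callback that actually loads a model
        self._unload = unload           # callback that actually unloads a model
        # Insertion order doubles as recency order: oldest entry first.
        self._loaded: "OrderedDict[str, float]" = OrderedDict()

    def used_gb(self) -> float:
        return sum(self._loaded.values())

    def loaded(self) -> List[str]:
        return list(self._loaded)

    def ensure_loaded(self, name: str) -> None:
        """Call before each inference request for `name`."""
        if name in self._loaded:
            # Already resident: just mark it as most recently used.
            self._loaded.move_to_end(name)
            return
        size = self.sizes_gb[name]
        if size > self.budget_gb:
            raise ValueError(f"{name} ({size} GB) exceeds the {self.budget_gb} GB budget")
        # Evict least recently used models until the new one fits.
        while self._loaded and self.used_gb() + size > self.budget_gb:
            victim, _ = self._loaded.popitem(last=False)
            self._unload(victim)
        self._load(name)
        self._loaded[name] = size
```

With a 24 GB budget and models of 8, 8, and 16 GB, requesting the 16 GB model while both 8 GB models are loaded would unload only the one that was used longest ago. The first request after an eviction still pays the 10-40 second load time, so this trades latency on rarely used models for GPU memory.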
5 months ago