Unanswered
Hi Everyone,
I'M Using Clearml-Serving With Triton And Have A Couple Of Questions Regarding Model Management:
Hi @<1713001673095385088:profile|EmbarrassedWalrus44>
So Triton has load/unload model, but these are slowwww, meaning you cannot use them inside a request (you'll just hit the request timeout every time it tries to load the model)
as you can see this is classified as "wish-list" , this is not trivial to implement and requires large CPU RAM to store the entire model, so "loading" becomes moving CPU to GPU memory (which also is not the fastest but the best you can do). As far as I understand there is no "target date" to this feature 😞
54 Views
0
Answers
4 months ago
4 months ago