For ClearML Serving, If I Am Trying To Deploy 100 Models On A GPU That Can Handle 5 Concurrently, But Each One Will Be Sporadically Used (Fine-Tuned Models Trained For Different Customers), Can ClearML-Serving Automatically Load And Unload Models Based Up

Unanswered

It appears that "they sell that" as Triton Management Service, part of

. It is possible to do this through their API, but the loading and unloading would need to be explicit.

We support that, but the models are not dynamically loaded. This is just removing and adding models; it does not swap them out of GPU RAM (GRAM).
That's the main issue: when we unload a model, it is unloaded completely. To make this dynamic, they would need to be able to keep the model in RAM while unloading it from GRAM, and that's the feature that is missing on all Triton deployments.
Does that make sense?
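
To make the "explicit" part concrete, here is a minimal sketch of what driving the loads and unloads yourself could look like. It assumes a Triton server started with `--model-control-mode=explicit` and reachable over HTTP, and it uses Triton's model repository endpoints (`/v2/repository/models/<name>/load` and `/unload`). The `LruModelPool` class, the `triton_repo_call` helper, and the base URL are hypothetical names for illustration only; none of this is part of clearml-serving.

```python
from collections import OrderedDict
import urllib.request


def triton_repo_call(base_url, model_name, action):
    """POST to Triton's model repository endpoint.

    `action` is "load" or "unload". Assumes the server runs with
    --model-control-mode=explicit; base_url is an assumption,
    e.g. "http://localhost:8000".
    """
    url = f"{base_url}/v2/repository/models/{model_name}/{action}"
    req = urllib.request.Request(url, data=b"{}", method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.status


class LruModelPool:
    """Keep at most `capacity` models loaded, evicting the least recently used."""

    def __init__(self, capacity, load, unload):
        self.capacity = capacity
        self.load = load        # callable(model_name)
        self.unload = unload    # callable(model_name)
        self.loaded = OrderedDict()  # model_name -> True, oldest first

    def ensure_loaded(self, model_name):
        if model_name in self.loaded:
            # Already resident: just mark it as most recently used.
            self.loaded.move_to_end(model_name)
            return
        # Make room before loading so we never exceed capacity.
        while len(self.loaded) >= self.capacity:
            victim, _ = self.loaded.popitem(last=False)
            self.unload(victim)
        self.load(model_name)
        self.loaded[model_name] = True
```

With the numbers from the question, you would create something like `LruModelPool(5, load=lambda m: triton_repo_call(url, m, "load"), unload=lambda m: triton_repo_call(url, m, "unload"))` and call `ensure_loaded(name)` before each inference request. Note that this has exactly the limitation described above: an evicted model is fully unloaded, so the next request for it pays the full load cost again rather than a fast RAM-to-GRAM copy.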

  				
Posted 2 years ago

AgitatedDove14
