For Clearml Serving, If I Am Trying To Deploy 100 Models On A Gpu That Can Handle 5 Concurrently, But Each One Will Be Sporadically Used (Fine Tuned Models Trained For Different Customers), Can Clearml-Serving Automatically Load And Unload Models Based Upon Usage?
Let's see if I understand:
- Triton server deployments only support manual, static loading of models for inference (without the enterprise version)
- ClearML can load and unload models based upon usage, presumably by driving Triton's model-control API (see the sketch after this list), but each load has to read the model from disk
- Triton server does not support caching models in system RAM for faster loading/unloading
- Therefore, we can currently deploy 100 models when only 5 can be loaded concurrently, but each automatic load/unload (handled by ClearML) will take a few seconds because the weights are read from the SSD, with the exact time depending on model size.
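As I understand it, the load/unload mechanics come down to Triton's model-repository HTTP API, which is available when the server runs with `--model-control-mode=explicit`. Below is a minimal sketch of those calls with a hypothetical server address; this is how I imagine the swaps are driven under the hood, not something I've confirmed in the ClearML source:

```python
# Sketch of Triton's explicit model-control HTTP API.
# Assumes a Triton server started with --model-control-mode=explicit,
# reachable at the (hypothetical) address below.
import requests

TRITON_URL = "http://localhost:8000"  # hypothetical address

def load_model(name: str) -> None:
    # Ask Triton to load `name` from its model repository into GPU memory.
    r = requests.post(f"{TRITON_URL}/v2/repository/models/{name}/load")
    r.raise_for_status()

def unload_model(name: str) -> None:
    # Evict `name` from GPU memory; the files stay in the repository on disk.
    r = requests.post(f"{TRITON_URL}/v2/repository/models/{name}/unload")
    r.raise_for_status()

def is_ready(name: str) -> bool:
    # HTTP 200 means the model is loaded and ready to serve.
    return requests.get(f"{TRITON_URL}/v2/models/{name}/ready").status_code == 200

# e.g. swap one customer's fine-tuned model in for another:
unload_model("customer_a_model")
load_model("customer_b_model")
```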
If this is the case, that should be acceptable for our application.
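For the "few seconds" figure, here's the rough arithmetic I'm assuming; every number below is a placeholder to be replaced with our actual model sizes and hardware:

```python
# Rough back-of-envelope for per-swap latency; all numbers below are
# assumptions, not measurements - substitute your own model sizes and hardware.
model_size_gb = 2.0     # assumed size of one fine-tuned model on disk
ssd_read_gbps = 3.0     # assumed NVMe SSD sequential read throughput (GB/s)
pcie_gbps = 16.0        # assumed host-to-GPU transfer rate (GB/s, ~PCIe 4.0 x16)
init_overhead_s = 1.0   # assumed framework/Triton model initialization time

swap_latency_s = (model_size_gb / ssd_read_gbps   # read weights from the SSD
                  + model_size_gb / pcie_gbps     # copy weights to the GPU
                  + init_overhead_s)              # build the runtime/session

print(f"~{swap_latency_s:.1f} s per model swap")  # ~1.8 s with these numbers
```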