Hi Everyone, I'M Using Clearml-Serving With Triton And Have A Couple Of Questions Regarding Model Management:

Unanswered

The models that fit into around 8-24Gb mem are quite common, at least here . If they are used rarely, and you have a lot, that is a lot of wasted gpu ressources . They can take about 10-40 secs to load . Hot swapping would be ideal, but as a fallback, unloading least used models to keep enough VMEM free to load any model on request . Tricky issue!

  				
Posted 
	one year ago

					More
				  		
  Report
		
					EmbarrassedWalrus44
				
					0

193 Views

0 Answers

one year ago