Hi everyone, I wanted to inquire if it's possible to have some type of model unloading. I know there was a discussion here about it, but after reviewing it, I didn't find an answer. So, I am curious: is it possible to explicitly unload a model (by calling …)?
Suppose that I have three models, and these models can't all be loaded into GPU memory simultaneously.
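If the Triton backend in question sits on top of NVIDIA Triton Inference Server (an assumption on my part), the server already supports explicit model control when started with `--model-control-mode=explicit`, and the Python client can load and unload models on demand. A minimal sketch of the kind of swap the question describes:

```python
# Minimal sketch, assuming NVIDIA Triton Inference Server started with
# --model-control-mode=explicit; whether this applies to the backend discussed
# here is an assumption. The server URL and model names are illustrative.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

def swap_models(unload_name: str, load_name: str) -> None:
    """Free the GPU memory held by one model before loading another."""
    if client.is_model_ready(unload_name):
        client.unload_model(unload_name)   # releases the model's GPU memory
    client.load_model(load_name)           # loads the replacement from the model repository

# Example: only one of the three models fits on the GPU at a time.
swap_models("model_a", "model_b")
```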
Oh!!!
For now, this is the behavior I observe: Suppose I have two models, A and B. ....
Correct
Yes, this is a current limitation of the Triton backend, BUT!
We are working on a new version that does exactly what you mentioned (because it is such a common case that some models are not used very frequently).
The main caveat is loading time: re-loading models from disk currently takes far too long (meaning you might hit a timeout on the request), and we are trying to speed up the process (for example, by caching the model in RAM instead of GPU memory). But we have made good progress, and I'm sure the next version will be able to address that.
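As a plain PyTorch illustration of that RAM-caching idea (not the backend's actual code; the paths and helper names below are made up), an evicted model can be parked in host memory so that re-activating it is a host-to-device copy rather than a full reload from disk:

```python
# Minimal PyTorch sketch of "cache the model in RAM instead of re-reading it from
# disk". All names here (model_cache, load_from_disk, the /models path) are
# illustrative assumptions, not the backend's real implementation.
import torch

model_cache: dict[str, torch.nn.Module] = {}  # evicted models stay in host RAM
active: str | None = None                     # model currently resident on the GPU

def load_from_disk(name: str) -> torch.nn.Module:
    # Slow path: deserialize the full module from storage (paid only once per model).
    return torch.load(f"/models/{name}.pt", map_location="cpu")

def activate(name: str) -> torch.nn.Module:
    """Put the requested model on the GPU; park the previous one in CPU RAM."""
    global active
    if active == name:
        return model_cache[name]
    if active is not None:
        model_cache[active] = model_cache[active].to("cpu")  # evict, but keep weights cached
    if name not in model_cache:
        model_cache[name] = load_from_disk(name)
    model_cache[name] = model_cache[name].to("cuda")  # RAM -> GPU copy, much faster than disk
    active = name
    return model_cache[name]
```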