Hi Everyone, I'M Using Clearml-Serving With Triton And Have A Couple Of Questions Regarding Model Management:

Unanswered

Hi @<1690896098534625280:profile|NarrowWoodpecker99>

Once a model is loaded into GPU memory for the first time, does it stay loaded across subsequent requests,

yes it does.

Are there configuration options available that allow us to control this behavior?

I'm assuming your're thinking dynamic loading/unloading models from memory based on requests?
I wish Triton added that 🙂 this is not trivial and in reality to be fast enough the model has to leave in RAM then moved to GPU (which actually takes a while)

  				
Posted 
	one year ago

					More
				  		
  Report
		
					AgitatedDove14
				
					0
					 × 1

195 Views

0 Answers

one year ago