Unanswered
Hello everyone! I'm trying to deploy an online model with clearml-serving. For this model, incoming requests need to be processed through a queue, because inference requires a GPU and takes about one minute per request, while more than 10 requests can arrive simultaneously. How can I set up a bounded queue for a model deployed with clearml-serving? I couldn't find any information about queue usage with clearml-serving in the documentation.
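Since the documentation doesn't seem to cover this, here is a minimal sketch of the pattern being asked about: a bounded gate that serializes access to the GPU and rejects requests once too many are already waiting. This is not a built-in clearml-serving feature; all names here (MAX_WAITING, GPU_SLOTS, run_inference, QueueFullError) are illustrative, and the idea is that logic like this could sit in a custom serving wrapper in front of the actual model call.

```python
# A generic bounded-queue gate for a slow GPU model.
# NOT a clearml-serving API -- an illustrative Python pattern only.

import threading

MAX_WAITING = 10   # hypothetical cap on queued requests
GPU_SLOTS = 1      # one request on the GPU at a time (~1 min each)

_gpu_lock = threading.Semaphore(GPU_SLOTS)          # serializes GPU access
_waiting_slots = threading.Semaphore(MAX_WAITING)   # bounds queue length


class QueueFullError(Exception):
    """Raised when more than MAX_WAITING requests are already queued."""


def serve(request, run_inference):
    """Run `run_inference(request)` with at most MAX_WAITING callers queued.

    `run_inference` stands in for whatever actually calls the model;
    it is an assumed callable, not part of clearml-serving.
    """
    # Fail fast instead of queuing indefinitely when the queue is full.
    if not _waiting_slots.acquire(blocking=False):
        raise QueueFullError("too many requests already waiting")
    try:
        with _gpu_lock:  # block here until the GPU is free
            return run_inference(request)
    finally:
        _waiting_slots.release()
```

With ~1-minute inferences and a single GPU slot, a gate like this caps the worst-case wait at roughly MAX_WAITING minutes and fails fast beyond that, rather than letting requests pile up without bound.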