Hey ClearML community. A while back I was asking how one can perform inference on a video with clearml-serving, including an ensemble, preprocessing, and postprocessing.
Back then @<1523701205467926528:profile|AgitatedDove14> suggested that we also override the process() function: copy-paste the original process() implementation ( here ), rename it _process(), send each frame to the model individually and asynchronously, and await the results for every batch_size frames (there’s a sketch of this at the end of the post).
However, we’ve come across some serious performance issues compared to setting this up on vanilla Triton.
I’m not entirely sure why, but the gRPC client setup I’ve seen in the Triton examples is different from the one used in clearml-serving. For instance, each frame (image) takes ~2 seconds just to flatten() ( link ).
Overall, inference on ClearML takes ~16 seconds for a single batch (size=8) using the above approach, versus only ~0.2 s on vanilla Triton. GPU usage is also substantially lower and more sporadic on the ClearML side.
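One small thing we noticed while digging into it: numpy’s flatten() always allocates and copies, while ravel() returns a view when the array is already contiguous. A quick illustrative micro-benchmark (dummy frame, not our actual data; the ~2 s we see per frame suggests something beyond the copy itself is also at play):

```python
import time
import numpy as np

# Dummy full-HD RGB frame, float32, C-contiguous.
frame = np.random.rand(1080, 1920, 3).astype(np.float32)

t0 = time.perf_counter()
flat_copy = frame.flatten()   # always allocates a new copy
t1 = time.perf_counter()
flat_view = frame.ravel()     # view (no copy) for contiguous arrays
t2 = time.perf_counter()

print(f"flatten(): {(t1 - t0) * 1e3:.3f} ms")
print(f"ravel():   {(t2 - t1) * 1e3:.3f} ms")
```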
We’d like to keep contributing to this community and even help improve it, so I just wanted to bring this up, brainstorm, and hear any insights others might have. Thanks!
This is basically what I follow for setting up my own Triton server:
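(Condensed sketch; the model name, input/output tensor names, and shapes below are placeholders rather than our actual config:)

```python
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

# One batch of 8 frames, NCHW float32 -- placeholder shape.
batch = np.random.rand(8, 3, 224, 224).astype(np.float32)

infer_input = grpcclient.InferInput("input__0", list(batch.shape), "FP32")
infer_input.set_data_from_numpy(batch)
infer_output = grpcclient.InferRequestedOutput("output__0")

result = client.infer(
    model_name="my_ensemble",  # placeholder name
    inputs=[infer_input],
    outputs=[infer_output],
)
predictions = result.as_numpy("output__0")
print(predictions.shape)
```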
This is the gist of our current setup using the recommended approach
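(Again condensed; this assumes the async custom Preprocess interface from the clearml-serving examples, and _process() stands in for the copy of the original process() gRPC call mentioned above, so treat the exact signatures as approximate:)

```python
import asyncio
from typing import Any


class Preprocess(object):
    # Custom clearml-serving preprocess class (sketch).

    async def _process(self, frame: Any, state: dict, collect_custom_statistics_fn=None) -> Any:
        # Copied from the original clearml-serving process() implementation:
        # the single-frame Triton gRPC call. Body omitted here.
        ...

    async def process(self, data: Any, state: dict, collect_custom_statistics_fn=None) -> Any:
        # Send every frame individually, awaiting results per batch_size chunk.
        batch_size = 8  # matches the batch size mentioned above
        frames = data   # assumed: a list of decoded video frames
        results = []
        for i in range(0, len(frames), batch_size):
            chunk = frames[i:i + batch_size]
            results.extend(await asyncio.gather(
                *(self._process(f, state, collect_custom_statistics_fn) for f in chunk)
            ))
        return results
```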