Hello! Question About

Answered

Hello! Question about clearml-serving:

Trying to do model inference on a video, so first step in Preprocess class is to extract frames. However, once this is done, we have ~700 frames, and the max batch_size we can set is ~8-16. How can we deal with this?

  				
Posted 
	one year ago

TimelyRabbit96

Answers 23

Note that this is per (single) frame, not per batch of 8.

  				
AgitatedDove14

Hi TimelyRabbit96

Trying to do model inference on a video, so first step in Preprocess class is to extract frames.

Basically this depends on the REST API; usually you would be sending a link to the data to be processed, and the result is returned synchronously.
What you should have is a custom endpoint doing the extraction, which sends the raw data to another endpoint doing the model inference. Basically, think "pipeline" endpoints:
None

  				
AgitatedDove14

I see, very interesting. I know this is pseudo-code, but are you suggesting sending the requests to Triton frame-by-frame?

Or perhaps np_frame = np.array(frame) itself could be a slice of the total_frames ?

Like:

Dataset: [700, x, y, 3]
Batch: [8, x, y, 3]

I think that makes sense, and in the end deploy this endpoint like the pipeline example.
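
For reference, slicing the ~700 extracted frames into batches of 8 could look roughly like the sketch below. It is only an illustration: `iter_batches` is a hypothetical helper name, and plain integers stand in for the real `[x, y, 3]` frame arrays.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(frames: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `batch_size` frames."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(frames)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# 700 placeholder "frames" (integers standing in for image arrays)
batches = list(iter_batches(range(700), batch_size=8))
print(len(batches), len(batches[0]), len(batches[-1]))  # prints: 88 8 4
```

The last batch is smaller than the rest (4 frames here), so whatever consumes the batches has to accept a partial one.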

  				
TimelyRabbit96

So actually while we’re at it, we also need to return back a string from the model, which would be where the results are uploaded to (S3).

I was able to send back a URL with Triton directly, but the input/output shape mapping doesn’t seem to support strings in ClearML. I have opened an issue for it: None

Am I missing something?

  				
TimelyRabbit96

Or do we need to set up the dependencies every time the experiment is run?

  				
HealthyDove84

HealthyDove84, if you want, you can PR a fix. It should be very simple, basically:
None

        elif np_dtype == str:
            return "STRING"
        elif np_dtype == np.object_ or np_dtype.type == np.bytes_:
            return "BYTES"
        return None

  				
AgitatedDove14

can we use a currently setup virtualenv by any chance?

You mean, whether the clearml-agent needs to set up a new venv each time? Are you running in docker mode?
(by default it is caching the venv so the second time it is using a precached full venv, installing nothing)

  				
AgitatedDove14

notice that even inside docker the venv is cached on the host machine 🙂

  				
AgitatedDove14

Gotcha, thanks a lot AgitatedDove14 . One issue that I see is that the Dockerfile inside the agent container is what's being used and doesn't seem like it can be replaced by any of these:

      CLEARML_AGENT_DEFAULT_BASE_DOCKER: "nvidia/cuda:11.6.1-runtime-ubuntu20.04"
      TRAINS_AGENT_DEFAULT_BASE_DOCKER: "nvidia/cuda:11.6.1-runtime-ubuntu20.04"
      TRAINS_DOCKER_IMAGE: "nvidia/cuda:11.6.1-runtime-ubuntu20.04"

Are we missing something?

  				
HealthyDove84

no mention of STRING type ...

  				
AgitatedDove14

So actually while we’re at it, we also need to return back a string from the model, which would be where the results are uploaded to (S3).

Is this being returned from your Triton Model? or the pre/post processing code?

  				
AgitatedDove14

and of course if your docker has packages preinstalled they are automatically used (not reinstalled)

  				
AgitatedDove14

but are you suggesting sending the requests to Triton frame-by-frame?

Yes! The Triton backend will do the autobatching, and in an enterprise deployment the gRPC load balancer will split it across multiple GPU nodes 🙂

  				
AgitatedDove14

Great, will try that, thanks AgitatedDove14 !

  				
HealthyDove84

Never mind, figured it out, it was using a cached container for some reason 🙂

  				
HealthyDove84

Thanks for your response AgitatedDove14, this would be from the model. Something like the TYPE_STRING that Triton accepts.

  				
HealthyDove84

Hi AgitatedDove14 , thanks for the always-fast response! 🙂

Yep, so I am sending a link to an S3 bucket, and I have set up a Triton ensemble within clearml-serving.

This is the gist of what I’m doing:

None

So essentially I am sending raw data, but I can only send the first 8 frames (L45), since I can’t really send the data in a list or something?

  				
TimelyRabbit96

Perfect, thank you so much!! 🙏

HealthyDove84 This is how we’d tackle the video-to-frame ratio issue

  				
TimelyRabbit96

I see, actually what you should do is a fully custom endpoint,

preprocessing -> download the video
processing -> extract frames and send them to Triton with gRPC (see below how)
post processing -> return a human readable answer

Regarding the processing itself, what you need is to take this function (copy paste):
None
have it as internal _process(numpy_frame) and then have something along the lines of this pseudo code

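import numpy as np

BATCH_SIZE = 8

def process(self, data, state, collect_custom_statistics_fn=None):
    # `data` is assumed to be the local path of the downloaded video
    results, pending = [], []
    for frame in my_video_frame_extractor(data):
        # one request per frame; Triton autobatches them on the server side
        np_frame = np.array(frame)
        pending.append(self.executor.submit(self._process, data=np_frame))

        if len(pending) == BATCH_SIZE:
            # collect all the results back (in order) and clear the batch
            results.extend(future.result() for future in pending)
            pending = []

    # collect what is left of the last, partial batch
    results.extend(future.result() for future in pending)
    return results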

This will scale the GPU pods horizontally, as well as autobatch the inference 🙂
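
To try the three stages end to end without a Triton server, here is a self-contained sketch with stubs. Everything in it is a stand-in: `VideoEndpoint`, `extract_frames`, and the `"url"` field are hypothetical names, `_process` just doubles an integer instead of making the gRPC call, and the method signatures are simplified rather than the ones clearml-serving expects.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Iterator, List

BATCH_SIZE = 8  # how many in-flight requests to wait on at once


def extract_frames(video_path: str) -> Iterator[int]:
    """Hypothetical stand-in for a real frame extractor; yields 20 fake frames."""
    yield from range(20)


class VideoEndpoint:
    def __init__(self) -> None:
        self.executor = ThreadPoolExecutor(max_workers=BATCH_SIZE)

    def _process(self, data: Any) -> Any:
        """Stand-in for the single-frame inference request."""
        return data * 2

    def preprocess(self, body: dict) -> str:
        # A real endpoint would download the video here; this sketch only
        # passes the (assumed) "url" field through as the local path.
        return body["url"]

    def process(self, video_path: str) -> List[Any]:
        results: List[Any] = []
        pending = []
        for frame in extract_frames(video_path):
            # submit one request per frame
            pending.append(self.executor.submit(self._process, data=frame))
            if len(pending) == BATCH_SIZE:
                # wait for this group, keeping submission (frame) order
                results.extend(future.result() for future in pending)
                pending = []
        # collect the last, partial group
        results.extend(future.result() for future in pending)
        return results

    def postprocess(self, results: List[Any]) -> dict:
        # return a small human readable summary
        return {"frames": len(results), "first": results[0], "last": results[-1]}


endpoint = VideoEndpoint()
video_path = endpoint.preprocess({"url": "video.mp4"})
results = endpoint.process(video_path)
summary = endpoint.postprocess(results)
endpoint.executor.shutdown()
print(summary)
```

Because the futures are read back in the order they were submitted, the results stay aligned with the frame order even if individual requests finish out of order.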

  				
AgitatedDove14

AgitatedDove14 Regarding the clearml-agent, can we use a currently setup virtualenv by any chance?

  				
HealthyDove84

I see, trying to A/B test the virtualenv vs docker.

  				
HealthyDove84

Something like the TYPE_STRING that Triton accepts.

I saw the GitHub issue. This is so odd; look at the Triton Python package:
https://github.com/triton-inference-server/client/blob/4297c6f5131d540b032cb280f1e[…]1fe2a0744f8e1/src/python/library/tritonclient/utils/init.py

  				
AgitatedDove14

One issue that I see is that the Dockerfile inside the agent container

Not sure I follow; these are settings for the default container to be used when the agent spins up a Task for you.
How are you running the agent itself?

  				
AgitatedDove14
