Answered
Hello! Question About

hello! question about clearml-serving :

Trying to do model inference on a video, so first step in Preprocess class is to extract frames. However, once this is done, we have ~700 frames, and the max batch_size we can set is ~8-16. How can we deal with this?

  
  
Posted one year ago

Answers 23


but are you suggesting sending the requests to Triton frame-by-frame?

yes! The Triton backend will do the auto-batching, and in an enterprise deployment the gRPC load balancer will split it across multiple GPU nodes 🙂
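For reference, Triton's auto-batching is enabled per model through the dynamic batcher in the model's config.pbtxt; a minimal sketch (the values here are illustrative, not a recommendation):

```
max_batch_size: 8
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}
```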

  
  
Posted one year ago

Hi @<1547028116780617728:profile|TimelyRabbit96>

Trying to do model inference on a video, so first step in

Preprocess

class is to extract frames.

Basically this depends on the RestAPI; usually you would be sending a link to the data to be processed and returned synchronously.
What you should have is a custom endpoint doing the extraction, sending the raw data into another endpoint doing the model inference; basically think "pipeline" endpoints:
None
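A rough sketch of what chaining such endpoints could look like (all names here are illustrative stand-ins, not the actual clearml-serving API):

```python
# Hypothetical two-stage "pipeline" of endpoints (names are illustrative):
# stage 1 extracts frames from the video, stage 2 runs per-frame inference.

def inference_endpoint(frame):
    # stand-in for the model-inference endpoint (e.g. a Triton-backed model)
    return frame.upper()

def extraction_endpoint(video, infer):
    # stand-in for the frame-extraction endpoint: split the "video" into
    # frames, forward each frame to the inference endpoint, collect results
    frames = list(video)
    return [infer(frame) for frame in frames]
```

Calling `extraction_endpoint("ab", inference_endpoint)` forwards each "frame" through the second stage and collects the results.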

  
  
Posted one year ago

I see, actually what you should do is a fully custom endpoint,

  • preprocessing -> download video
  • processing -> extract frames and send them to Triton with gRPC (see below how)
  • post processing -> return a human-readable answer
    Regarding the processing itself, what you need is to take this function (copy-paste):
    None
    have it as an internal _process(numpy_frame) and then have something along the lines of this pseudo-code:
def process(...):
    results_batch = []
    for frame in my_video_frame_extractor(file_name_here):
        np_frame = np.array(frame)
        # submit each frame to the executor (non-blocking)
        result = self.executor.submit(self._process, data=np_frame)
        results_batch += [result]

        if len(results_batch) == BATCH_SIZE:
            # collect all the results back
            # and clear the batch
            results_batch = []

This will scale horizontally the GPU pods, as well as autobatch the inference 🙂
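Filled in, the submit-and-collect pattern above could look like this runnable sketch (my_video_frame_extractor and the Triton call are replaced with stand-ins; the real _process would do the gRPC inference):

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 8

def _process(np_frame):
    # stand-in for the real per-frame inference call sent to Triton
    return np_frame * 2

def process(frames):
    executor = ThreadPoolExecutor(max_workers=4)
    results, batch = [], []
    for frame in frames:
        # submit() is non-blocking, so frames queue up while inference runs
        batch.append(executor.submit(_process, frame))
        if len(batch) == BATCH_SIZE:
            # collect the batch results back and clear the batch
            results.extend(f.result() for f in batch)
            batch = []
    # drain any leftover partial batch
    results.extend(f.result() for f in batch)
    executor.shutdown()
    return results
```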

  
  
Posted one year ago

can we use a currently setup virtualenv by any chance?

You mean, does the clearml-agent need to set up a new venv each time? Are you running in docker mode?
(by default it caches the venv, so the second time it uses a precached full venv, installing nothing)
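For reference, the venv cache is controlled in clearml.conf on the agent's machine; a sketch of the relevant section (the exact defaults may differ between agent versions):

```
agent {
    # cache fully installed virtual environments for reuse across tasks
    venvs_cache: {
        max_entries: 10
        free_space_threshold_gb: 2.0
        path: ~/.clearml/venvs-cache
    }
}
```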

  
  
Posted one year ago

@<1560074028276781056:profile|HealthyDove84> if you want you can PR a fix, it should be very simple basically:
None

        elif np_dtype == str:
            return "STRING"
        elif np_dtype == np.object_ or np_dtype.type == np.bytes_:
            return "BYTES"
        return None
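As a self-contained function, the proposed mapping might look like this (a sketch only; the actual helper in clearml-serving's Triton engine may differ):

```python
import numpy as np

def np_to_triton_dtype_sketch(np_dtype):
    # map a numpy dtype to a Triton type string, including the
    # proposed STRING/BYTES cases from the fix above
    if np_dtype == bool:
        return "BOOL"
    elif np_dtype == np.float32:
        return "FP32"
    elif np_dtype == str:
        return "STRING"
    elif np_dtype == np.object_ or np_dtype.type == np.bytes_:
        return "BYTES"
    return None
```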
  
  
Posted one year ago

Perfect, thank you so much!! 🙏

@<1560074028276781056:profile|HealthyDove84> This is how we’d tackle the video-to-frame ratio issue

  
  
Posted one year ago

Gotcha, thanks a lot @<1523701205467926528:profile|AgitatedDove14> . One issue that I see is that the Dockerfile inside the agent container is what's being used, and it doesn't seem like it can be replaced by any of these:

      CLEARML_AGENT_DEFAULT_BASE_DOCKER: "nvidia/cuda:11.6.1-runtime-ubuntu20.04"
      TRAINS_AGENT_DEFAULT_BASE_DOCKER: "nvidia/cuda:11.6.1-runtime-ubuntu20.04"
      TRAINS_DOCKER_IMAGE: "nvidia/cuda:11.6.1-runtime-ubuntu20.04" 

Are we missing something?

  
  
Posted one year ago

Nevermind, figured it out, it was using a cached container for some reason 🙂

  
  
Posted one year ago

notice that even inside docker the venv is cached on the host machine 🙂

  
  
Posted one year ago

Thanks for your response @<1523701205467926528:profile|AgitatedDove14> , this would be from the model. Something like the TYPE_STRING that Triton accepts.

  
  
Posted one year ago

and of course if your docker has packages preinstalled they are automatically used (not reinstalled)

  
  
Posted one year ago

Hi @<1523701205467926528:profile|AgitatedDove14> , thanks for the always-fast response! 🙂

Yep, so I am sending a link to an S3 bucket, and set up a Triton ensemble within clearml-serving.

This is the gist of what i’m doing:

None

so essentially I am sending raw data, but I can only send the first 8 frames (L45) since I can't really send the data as a list or something?

  
  
Posted one year ago

One issue that I see is that the Dockerfile inside the agent container

Not sure I follow, these are settings for the default container to be used when the agent spins a Task for you.
How are you running the agent itself ?

  
  
Posted one year ago

@<1523701205467926528:profile|AgitatedDove14> Regarding the clearml-agent, can we use a currently setup virtualenv by any chance?

  
  
Posted one year ago

I see, trying to A/B test the virtualenv vs docker.

  
  
Posted one year ago

So actually while we’re at it, we also need to return back a string from the model, which would be where the results are uploaded to (S3).

Is this being returned from your Triton Model? or the pre/post processing code?

  
  
Posted one year ago

Great, will try that, thanks @<1523701205467926528:profile|AgitatedDove14> !

  
  
Posted one year ago

Or we need to setup the dependencies every time the experiment is run?

  
  
Posted one year ago

Something like the TYPE_STRING that Triton accepts.

I saw the github issue, this is so odd, look at the triton python package:
https://github.com/triton-inference-server/client/blob/4297c6f5131d540b032cb280f1e[…]1fe2a0744f8e1/src/python/library/tritonclient/utils/__init__.py

  
  
Posted one year ago

So actually while we’re at it, we also need to return back a string from the model, which would be where the results are uploaded to (S3).

I was able to send back a URL with Triton directly, but the input/output shape mapping doesn’t seem to support strings in Clearml. I have opened an issue for it: None

Am i missing something?

  
  
Posted one year ago

no mention of STRING type ...

  
  
Posted one year ago

I see, very interesting. I know this is pseudo-code, but are you suggesting sending the requests to Triton frame-by-frame?

Or perhaps np_frame = np.array(frame) itself could be a slice of the total_frames ?

Like:

Dataset: [700, x, y, 3]
Batch: [8, x, y, 3]

I think that makes sense, and in the end deploy this endpoint like the pipeline example.
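Slicing the stacked frames into fixed-size batches, as described above, could look like this (shapes follow the [N, x, y, 3] layout from the example; names are illustrative):

```python
import numpy as np

def iter_batches(total_frames, batch_size=8):
    # yield consecutive [batch_size, x, y, 3] slices of the frame stack;
    # the last batch may be smaller than batch_size
    for start in range(0, len(total_frames), batch_size):
        yield total_frames[start:start + batch_size]
```

With a [700, x, y, 3] array and batch_size=8 this yields 87 full batches plus one partial batch of 4 frames.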

  
  
Posted one year ago

Notice this is per frame (a single frame), not per batch of 8

  
  
Posted one year ago