Hello! Question About

Hello! Question about clearml-serving:

Trying to do model inference on a video, so first step in Preprocess class is to extract frames. However, once this is done, we have ~700 frames, and the max batch_size we can set is ~8-16. How can we deal with this?

  
  
Posted one year ago

Answers 23


Thanks for your response @<1523701205467926528:profile|AgitatedDove14>, this would be from the model. Something like the TYPE_STRING that Triton accepts.

  
  
Posted one year ago

Great, will try that, thanks @<1523701205467926528:profile|AgitatedDove14> !

  
  
Posted one year ago

I see, very interesting. I know this is pseudo-code, but are you suggesting sending the requests to Triton frame-by-frame?

Or perhaps np_frame = np.array(frame) itself could be a slice of the total_frames?

Like:

Dataset: [700, x, y, 3]
Batch: [8, x, y, 3]

I think that makes sense, and in the end we'd deploy this endpoint like the pipeline example.
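A minimal sketch of the slicing idea above, assuming the extracted frames are already stacked into one NumPy array of shape [700, x, y, 3] and the batch size is 8. The helper name `iter_batches` and the tiny 4x4 frame size are illustrative only; nothing here is part of the clearml-serving API.

```python
import numpy as np


def iter_batches(frames: np.ndarray, batch_size: int = 8):
    """Yield consecutive slices of `frames` along axis 0, each holding at most batch_size frames."""
    for start in range(0, len(frames), batch_size):
        # slicing past the end is safe in NumPy: the last batch is simply shorter
        yield frames[start:start + batch_size]


# hypothetical video: 700 small RGB frames (x = y = 4 to keep the example light)
dataset = np.zeros((700, 4, 4, 3), dtype=np.uint8)
batches = list(iter_batches(dataset, batch_size=8))
print(len(batches), batches[0].shape, batches[-1].shape)
```

With 700 frames and a batch size of 8 this gives 87 full batches plus a final batch of 4 frames, so the tail of the video is not dropped.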

  
  
Posted one year ago

Something like the TYPE_STRING that Triton accepts.

I saw the GitHub issue; this is so odd. Look at the Triton Python package:
https://github.com/triton-inference-server/client/blob/4297c6f5131d540b032cb280f1e[…]1fe2a0744f8e1/src/python/library/tritonclient/utils/init.py

  
  
Posted one year ago

Or do we need to set up the dependencies every time the experiment is run?

  
  
Posted one year ago

Hi @<1523701205467926528:profile|AgitatedDove14> , thanks for the always-fast response! 🙂

Yep, so I am sending a link to an S3 bucket, and I set up a Triton ensemble within clearml-serving.

This is the gist of what I'm doing:

None

So essentially I am sending raw data, but I can only send the first 8 frames (line 45 of the gist), since I can't really send the data in a list or something?

  
  
Posted one year ago

...but are you suggesting sending the requests to Triton frame-by-frame?

Yes! The Triton backend will do the auto-batching, and in an enterprise deployment the gRPC load balancer will split it across multiple GPU nodes 🙂

  
  
Posted one year ago

can we use a currently setup virtualenv by any chance?

You mean, does the clearml-agent need to set up a new venv each time? Are you running in Docker mode?
(By default it caches the venv, so the second time it uses a pre-cached full venv and installs nothing.)

  
  
Posted one year ago

One issue that I see is that the Dockerfile inside the agent container

Not sure I follow, these are settings for the default container to be used when the agent spins a Task for you.
How are you running the agent itself?

  
  
Posted one year ago

Never mind, figured it out: it was using a cached container for some reason 🙂

  
  
Posted one year ago

Notice this is per frame (single) not per 8

  
  
Posted one year ago

Gotcha, thanks a lot @<1523701205467926528:profile|AgitatedDove14>. One issue that I see is that the Dockerfile inside the agent container is what's being used, and it doesn't seem like it can be overridden by any of these:

      CLEARML_AGENT_DEFAULT_BASE_DOCKER: "nvidia/cuda:11.6.1-runtime-ubuntu20.04"
      TRAINS_AGENT_DEFAULT_BASE_DOCKER: "nvidia/cuda:11.6.1-runtime-ubuntu20.04"
      TRAINS_DOCKER_IMAGE: "nvidia/cuda:11.6.1-runtime-ubuntu20.04" 

Are we missing something?

  
  
Posted one year ago

@<1560074028276781056:profile|HealthyDove84> If you want, you can PR a fix; it should be very simple, basically:
None

        elif np_dtype == str:
            return "STRING"
        elif np_dtype == np.object_ or np_dtype.type == np.bytes_:
            return "BYTES"
        return None
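For context on the "BYTES" branch above: Triton carries string tensors as BYTES, which on the NumPy side is typically an `np.object_` array holding `bytes` values. Here is a minimal sketch of packing a result URL (such as the S3 location discussed in this thread) into such an array and reading it back. The helper names `url_to_bytes_tensor` / `bytes_tensor_to_url` and the example URL are hypothetical.

```python
import numpy as np


def url_to_bytes_tensor(url: str) -> np.ndarray:
    # shape [1] object array holding the UTF-8 encoded URL
    return np.array([url.encode("utf-8")], dtype=np.object_)


def bytes_tensor_to_url(tensor: np.ndarray) -> str:
    # take the first element and decode it back to str
    value = tensor.reshape(-1)[0]
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


out = url_to_bytes_tensor("s3://my-bucket/results/video_0001.json")
# this dtype satisfies the `np_dtype == np.object_` check in the snippet above
print(out.dtype, out.dtype == np.object_)
print(bytes_tensor_to_url(out))
```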
  
  
Posted one year ago

And of course, if your Docker image has packages preinstalled, they are used automatically (not reinstalled).

  
  
Posted one year ago

Notice that even inside Docker, the venv is cached on the host machine 🙂

  
  
Posted one year ago

I see, I'm trying to A/B test the virtualenv vs. Docker.

  
  
Posted one year ago

So actually while we’re at it, we also need to return back a string from the model, which would be where the results are uploaded to (S3).

I was able to send back a URL with Triton directly, but the input/output shape mapping doesn't seem to support strings in ClearML. I have opened an issue for it: None

Am I missing something?

  
  
Posted one year ago

So actually while we’re at it, we also need to return back a string from the model, which would be where the results are uploaded to (S3).

Is this being returned from your Triton model, or from the pre/post-processing code?

  
  
Posted one year ago

Perfect, thank you so much!! 🙏

@<1560074028276781056:profile|HealthyDove84> This is how we’d tackle the video-to-frame ratio issue

  
  
Posted one year ago

No mention of a STRING type...

  
  
Posted one year ago

@<1523701205467926528:profile|AgitatedDove14> Regarding the clearml-agent, can we use a currently setup virtualenv by any chance?

  
  
Posted one year ago

I see. Actually, what you should do is build a fully custom endpoint:

  • preprocessing -> download the video
  • processing -> extract frames and send them to Triton with gRPC (see below for how)
  • post-processing -> return a human-readable answer
    Regarding the processing itself, what you need is to take this function (copy-paste):
    None
    have it as an internal _process(numpy_frame), and then have something along the lines of this:
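def process(self, data, state, collect_custom_statistics_fn=None):
    # my_video_frame_extractor, file_name_here and BATCH_SIZE are placeholders
    results = []
    results_batch = []
    for frame in my_video_frame_extractor(file_name_here):
        np_frame = np.array(frame)
        # send each frame as its own request; Triton auto-batches concurrent requests
        future = self.executor.submit(self._process, data=np_frame)
        results_batch.append(future)

        if len(results_batch) == BATCH_SIZE:
            # collect all the results back and clear the batch
            results.extend(f.result() for f in results_batch)
            results_batch = []

    # collect the last, possibly partial, batch
    results.extend(f.result() for f in results_batch)
    return results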

This will horizontally scale the GPU pods, as well as auto-batch the inference 🙂
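To try the frame-by-frame pattern outside clearml-serving, here is a standalone sketch with a stub in place of the Triton call. Frames are submitted one at a time to a `ThreadPoolExecutor`, at most `batch_size` requests are in flight, and results come back in frame order. `fake_infer` and `process_frames` are hypothetical names. In the real endpoint the submitted function would be the copied gRPC helper (`_process`), and the grouping of concurrent requests into GPU batches would happen on the Triton side.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def fake_infer(data: np.ndarray) -> float:
    # hypothetical stand-in for one gRPC request to Triton (one frame per request)
    return float(data.mean())


def process_frames(frames, infer_fn, executor, batch_size=8):
    results, pending = [], []
    for frame in frames:
        # submit each frame on its own; concurrent requests are what the server can batch
        pending.append(executor.submit(infer_fn, np.asarray(frame)))
        if len(pending) == batch_size:
            # wait for this window of requests, keeping frame order
            results.extend(f.result() for f in pending)
            pending = []
    # collect the final, possibly partial, window
    results.extend(f.result() for f in pending)
    return results


frames = [np.full((4, 4, 3), i, dtype=np.float32) for i in range(700)]
with ThreadPoolExecutor(max_workers=8) as pool:
    results = process_frames(frames, fake_infer, pool, batch_size=8)
print(len(results), results[:3], results[-1])
```

Waiting on each window before submitting more keeps memory and in-flight requests bounded. A larger window gives the server more concurrent requests to batch.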

  
  
Posted one year ago

Hi @<1547028116780617728:profile|TimelyRabbit96>

Trying to do model inference on a video, so first step in Preprocess class is to extract frames.

Basically, this depends on the REST API; usually you would send a link to the data to be processed, and the result is returned synchronously.
What you should have is a custom endpoint doing the extraction, which sends raw data into another endpoint doing the model inference. Basically, think "pipeline" endpoints:
None

  
  
Posted one year ago
906 Views