Nevermind, figured it out, it was using a cached container for some reason 🙂
Not sure I follow, these are settings for the default container to be used when the agent spins a Task for you.
How are you running the agent itself?
Gotcha, thanks a lot @<1523701205467926528:profile|AgitatedDove14> . One issue that I see is that the Dockerfile inside the agent container is what's being used and doesn't seem like it can be replaced by any of these:
CLEARML_AGENT_DEFAULT_BASE_DOCKER: "nvidia/cuda:11.6.1-runtime-ubuntu20.04"
TRAINS_AGENT_DEFAULT_BASE_DOCKER: "nvidia/cuda:11.6.1-runtime-ubuntu20.04"
TRAINS_DOCKER_IMAGE: "nvidia/cuda:11.6.1-runtime-ubuntu20.04"
Are we missing something?
and of course if your docker has packages preinstalled they are automatically used (not reinstalled)
notice that even inside docker the venv is cached on the host machine 🙂
I see, trying to A/B test the virtualenv vs docker.
You mean, whether the clearml-agent needs to set up a new venv each time? Are you running in docker mode?
(by default it is caching the venv so the second time it is using a precached full venv, installing nothing)
Or we need to setup the dependencies every time the experiment is run?
@<1523701205467926528:profile|AgitatedDove14> Regarding the clearml-agent, can we use a currently setup virtualenv by any chance?
Great, will try that, thanks @<1523701205467926528:profile|AgitatedDove14> !
@<1560074028276781056:profile|HealthyDove84> if you want you can PR a fix, it should be very simple basically:
None
    elif np_dtype == str:
        return "STRING"
    elif np_dtype == np.object_ or np_dtype.type == np.bytes_:
        return "BYTES"
    return None
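For context, here is a minimal, self-contained sketch of the kind of dtype mapping the snippet above belongs to. The function name `np_to_triton_dtype_sketch` and the exact table are assumptions made for illustration, not the actual clearml-serving or tritonclient code; the "STRING" branch simply follows the suggestion above.

```python
import numpy as np

# Illustrative table only; the real mapping may cover more types.
_FIXED_TYPES = {
    np.dtype(np.bool_): "BOOL",
    np.dtype(np.int8): "INT8",
    np.dtype(np.int16): "INT16",
    np.dtype(np.int32): "INT32",
    np.dtype(np.int64): "INT64",
    np.dtype(np.uint8): "UINT8",
    np.dtype(np.float16): "FP16",
    np.dtype(np.float32): "FP32",
    np.dtype(np.float64): "FP64",
}


def np_to_triton_dtype_sketch(np_dtype):
    """Return a Triton datatype name for a NumPy dtype, or None if unsupported."""
    dt = np.dtype(np_dtype)
    if dt in _FIXED_TYPES:
        return _FIXED_TYPES[dt]
    if dt.kind == "U":
        # Unicode strings of any length, following the proposed fix above.
        return "STRING"
    if dt.kind in ("S", "O"):
        # Raw bytes and generic Python objects.
        return "BYTES"
    return None
```

Checking `dt.kind` rather than comparing against `str` keeps the check independent of string length, since an array such as `np.array(["s3://bucket/key"])` has a fixed-width unicode dtype.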
I saw the github issue, this is so odd , look at the triton python package:
https://github.com/triton-inference-server/client/blob/4297c6f5131d540b032cb280f1e[…]1fe2a0744f8e1/src/python/library/tritonclient/utils/init.py
Thanks for your response @<1523701205467926528:profile|AgitatedDove14> , this would be from the model. Something like the TYPE_STRING that Triton accepts.
Is this being returned from your Triton Model? or the pre/post processing code?
So actually while we’re at it, we also need to return back a string from the model, which would be where the results are uploaded to (S3).
I was able to send back a URL with Triton directly, but the input/output shape mapping doesn’t seem to support strings in ClearML. I have opened an issue for it: None
Am I missing something?
Perfect, thank you so much!! 🙏
@<1560074028276781056:profile|HealthyDove84> This is how we’d tackle the video-to-frame ratio issue
Notice this is per frame (single) not per 8
yes! The Triton backend will do the autobatching, and in an enterprise deployment the gRPC load balancer will split it across multiple GPU nodes 🙂
I see, very interesting. I know this is pseudo-code, but are you suggesting sending the requests to Triton frame-by-frame?
Or perhaps np_frame = np.array(frame) itself could be a slice of the total_frames?
Like:
Dataset: [700, x, y, 3]
Batch: [8, x, y, 3]
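To make the slicing idea concrete, here is a small sketch of cutting a [700, x, y, 3] array into batches of 8 along the first axis. The helper name `iter_batches` and the tiny 4x4 frame size are assumptions chosen only to keep the example light.

```python
import numpy as np


def iter_batches(frames, batch_size=8):
    """Yield consecutive slices of `frames` along the first axis."""
    for start in range(0, len(frames), batch_size):
        yield frames[start:start + batch_size]


# Hypothetical stand-in for the decoded video: 700 frames of 4x4 RGB.
total_frames = np.zeros((700, 4, 4, 3), dtype=np.uint8)
batches = list(iter_batches(total_frames, batch_size=8))
```

Since 700 is not a multiple of 8, the last slice holds only 4 frames, so whatever consumes the batches has to accept a shorter final batch.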
I think that makes sense, and in the end deploy this endpoint like the pipeline example.
I see, actually what you should do is a fully custom endpoint,
- preprocessing -> download video
- processing -> extract frames and send them to Triton with gRPC (see below how)
- post processing, return a human readable answer
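The three stages above could be laid out roughly like this. It is only a sketch: the class name, the stubbed helpers (`_download`, `_extract_frames`, `_infer`) and the request field `url` are hypothetical, and the method signatures are an assumption based on how clearml-serving custom preprocess classes are usually written, so check them against the template you are using.

```python
from typing import Any, Callable, List, Optional


class VideoEndpointSketch:
    """Three-stage custom endpoint; all helpers are stubs for illustration."""

    def preprocess(self, body: dict, state: dict,
                   collect_custom_statistics_fn: Optional[Callable] = None) -> str:
        # Stage 1: fetch the video referenced by the request (stubbed, no real download).
        state["url"] = body["url"]
        return self._download(body["url"])

    def process(self, data: str, state: dict,
                collect_custom_statistics_fn: Optional[Callable] = None) -> List[Any]:
        # Stage 2: extract frames and run inference on each one (stubbed).
        return [self._infer(frame) for frame in self._extract_frames(data)]

    def postprocess(self, data: List[Any], state: dict,
                    collect_custom_statistics_fn: Optional[Callable] = None) -> dict:
        # Stage 3: turn raw results into a human readable answer.
        return {"url": state["url"], "frames": len(data), "labels": data}

    def _download(self, url: str) -> str:
        return "/tmp/" + url.rsplit("/", 1)[-1]

    def _extract_frames(self, path: str) -> List[int]:
        return [0, 1, 2]

    def _infer(self, frame: int) -> str:
        return f"label-{frame}"
```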
Regarding the processing itself, what you need is to take this function (copy paste):
None
have it as internal_process(numpy_frame)
and then have something along the lines of this pseudo code
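def process(self, file_name_here):
    results = []
    results_batch = []
    for frame in my_video_frame_extractor(file_name_here):
        np_frame = np.array(frame)
        # submit one frame per request, Triton does the batching
        results_batch.append(self.executor.submit(self._process, data=np_frame))
        if len(results_batch) == BATCH_SIZE:
            # collect all the results back and clear the batch
            results += [r.result() for r in results_batch]
            results_batch = []
    # collect whatever is left from the last partial batch
    results += [r.result() for r in results_batch]
    return results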
This will scale horizontally the GPU pods, as well as autobatch the inference 🙂
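A runnable version of the same submit-and-collect pattern, using only the standard library, might look like the sketch below. `fake_infer` is a stand-in for the real per-frame request to Triton and `process_frames` is a hypothetical name; neither comes from clearml-serving.

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 8


def fake_infer(frame):
    # Stand-in for the real per-frame inference request.
    return frame * 2


def process_frames(frames, executor, batch_size=BATCH_SIZE):
    """Submit one request per frame and collect results every `batch_size` frames."""
    results = []
    pending = []
    for frame in frames:
        pending.append(executor.submit(fake_infer, frame))
        if len(pending) == batch_size:
            # Block until the in-flight requests finish, keeping frame order.
            results.extend(f.result() for f in pending)
            pending = []
    # Collect the last partial batch, if any.
    results.extend(f.result() for f in pending)
    return results


with ThreadPoolExecutor(max_workers=4) as pool:
    outputs = process_frames(range(20), pool)
```

Collecting in batches caps the number of in-flight requests at `BATCH_SIZE`, while the server side stays free to group them however it likes.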
Hi @<1523701205467926528:profile|AgitatedDove14> , thanks for the always-fast response! 🙂
Yep so I am sending a link to an S3 bucket, and set up a Triton ensemble within clearml-serving.
This is the gist of what I’m doing:
So essentially I am sending raw data, but I can only send the first 8 frames (L45) since I can’t really send the data in a list or something?
Hi @<1547028116780617728:profile|TimelyRabbit96>
Trying to do model inference on a video, so first step in Preprocess class is to extract frames.
Basically this depends on the REST API; usually you would be sending a link to the data to be processed, and the result is returned synchronously.
What you should have is a custom endpoint doing the extraction, which sends the raw data into another endpoint doing the model inference. Basically think "pipeline" endpoints:
None