@<1523701118159294464:profile|ExasperatedCrab78> , would you have any idea about the above? Triton itself supports ensembling, so I was wondering if we can somehow support this as well?
Hi @<1547028116780617728:profile|TimelyRabbit96> Awesome that you managed to get it working!
@<1547028116780617728:profile|TimelyRabbit96>
Pipelines have little to do with serving, so let's not focus on that for now. Instead, if you need an ensemble_scheduling block, you can use the CLI's --aux-config command to add any extra stuff that needs to be in the config.pbtxt. For example here, under the Setup section step 2, we use the --aux-config flag to add a dynamic batching block.
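For a concrete picture, the entries added that way end up in the config.pbtxt as a block roughly like this (the batch sizes and delay below are placeholder values, not taken from that example):
```
dynamic_batching {
  preferred_batch_size: [ 1, 2, 4, 8 ]
  max_queue_delay_microseconds: 100
}
```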
Hi there!
Technically there should be nothing stopping you from deploying a python backend model. I just checked the source code and ClearML basically just downloads the model artifact and renames it based on the inferred type of model.
As far as I'm aware (could definitely be wrong here!), the Triton Python backend essentially requires a folder containing e.g. a model.py file. I propose the following steps:
- Given the code above, if you package the model.py file as a folder in ClearML, clearml-serving will detect this and simply extract the folder in the right place for you. Then you have to adjust the config.pbtxt using the command line arguments to properly load the Python file.
- If this does not work, an extra if/else check should be added in the code above, also checking for "python" in the framework, similar to e.g. pytorch or onnx.
- However it is done, once the Python file is in the right position and the config.pbtxt is properly set up, Triton should just take it from there and everything should work as expected.
Could you try this approach? If this works, it would be an interesting example to add to the repo! Thanks 😄
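In case it's useful, here's a minimal sketch of the first step, packaging the model.py folder as a single ClearML model (the folder name, project/task names and the framework label are placeholders/assumptions, not taken from the repo):
```python
# Hypothetical example: "triton_python_model/" is a local folder containing model.py
# (the standard Triton Python backend entry point). Packaging the whole folder as
# one ClearML model lets clearml-serving download and extract it in one piece.
from clearml import Task, OutputModel

task = Task.init(project_name="serving examples", task_name="python backend model")

output_model = OutputModel(task=task, framework="python")  # framework label is an assumption
output_model.update_weights_package(weights_path="triton_python_model/")
```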
oh actually it seems like this is possible already from the code!
I can see Pipelines, but I'm not sure if that applies to Triton directly; it seems more of a DAG approach?
Hey! Thanks for all the work you're putting in and the awesome feedback 😄
So, it's weird you get the shm error; this is most likely our fault for not configuring the containers correctly 😞 The containers are brought up using the docker-compose file, so you'll have to add it in there. The service you want is called clearml-serving-triton, you can find it here.
Check the docker docs here for the right key to add in the docker compose. It looks like it's called shm_size; set it to something higher. On the other hand, if I'm not mistaken, setting ipc: host instead should also work and is probably better for performance! Would you mind adding that? So adding ipc: host to the clearml-serving-triton service, on the same level as image or ports for example.
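Something like this, for example (only the ipc / shm_size lines are the point; the image tag and ports shown here are placeholders):
```yaml
services:
  clearml-serving-triton:
    image: allegroai/clearml-serving-triton:latest  # placeholder tag
    ipc: host            # share the host IPC namespace
    # shm_size: "512m"   # alternative: explicitly raise the shared memory size
    ports:
      - "8001:8001"
```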
Thanks for your response! I see, yep from an initial view it could work. Will certainly give it a try 🙂
However, to give you more context: in order to set up an ensemble within Triton, you also need to add an ensemble_scheduling block to the config.pbtxt file, which would be something like this:
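(Roughly along these lines; the model and tensor names below are just placeholders:)
```
name: "detection_ensemble"
platform: "ensemble"
ensemble_scheduling {
  step [
    {
      model_name: "detection_preprocess"
      model_version: -1
      input_map { key: "RAW_IMAGE" value: "RAW_IMAGE" }
      output_map { key: "PREPROCESSED" value: "preprocessed_image" }
    },
    {
      model_name: "detection_model"
      model_version: -1
      input_map { key: "INPUT" value: "preprocessed_image" }
      output_map { key: "DETECTIONS" value: "DETECTIONS" }
    }
  ]
}
```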
I’m guessing this’ll be difficult given the current functionality of the CLI?
Thank you for all the answers! Yep, that worked, though is it usually safe to add this option instead of --shm-size?
Also, I now managed to send an image through curl using a local image (@img.png in curl). Seems to work through this! I'm still hitting the same gRPC size limit, but it seems there's a new commit that addressed it! 🎉
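For anyone curious, the curl call was roughly of this shape (endpoint name and host are placeholders, and the exact content type depends on your preprocess code):
```bash
# Placeholder endpoint/host; sends a local PNG as the raw request body
curl -X POST "http://127.0.0.1:8080/serve/detection_ensemble" \
     -H "Content-Type: image/png" \
     --data-binary @img.png
```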
Okay, sorry for spamming here, but I feel like other people would find this useful: I was able to deploy the ensemble model, and I guess to complete this, I would need to add all the other "endpoints" independently, right?
As in, to reach something like below within Triton:
Yes, you will indeed need to add all ensemble endpoints separately 🙂
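Something along these lines for each sub-model, if that helps (endpoint names, shapes, types and the preprocess script below are placeholders; check the CLI help for the full argument list):
```bash
# One "model add" call per sub-model referenced by the ensemble (all values are placeholders)
clearml-serving --id <serving_service_id> model add \
  --engine triton \
  --endpoint "detection_preprocess" \
  --preprocess "preprocess.py" \
  --input-name "RAW_IMAGE" --input-type uint8 --input-size -1 -1 3 \
  --output-name "PREPROCESSED" --output-type float32 --output-size 3 640 640
```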
Hi @<1523701118159294464:profile|ExasperatedCrab78> , so I've started looking into setting up the Triton backends now, as we first discussed.
I was able to structure the folders correctly and deploy the endpoints. However, when I spin up the containers, I get the following error:
clearml-serving-triton | | detection_preprocess | 1 | UNAVAILABLE: Internal: Unable to initialize shared memory key 'triton_python_backend_shm_region_1' to requested size (67108864 bytes). If you are running Triton inside docker, use '--shm-size' flag to control the shared memory region size. Each Python backend model instance requires at least 64MBs of shared memory. Error: No such file or directory
I then wanted to debug this a little further, to see if this is the issue. I passed --t-log-verbose=2 in CLEARML_TRITON_HELPER_ARGS to get more logs, but Triton didn't like it:
tritonserver: unrecognized option '--t_log_verbose=2'
Usage: tritonserver [options]
...
So I'm wondering, is there any way to increase the shared memory size as well? I believe we have to do this when running/starting the container, but I couldn't figure out how the container is brought up here. When running Triton directly it would be something like:
docker run --name triton --gpus=all -it --shm-size=512m -p8000:8000 -p8001:8001 -p8002:8002 -v $(pwd)/model_repository:/models image_path
I see, yep, aux-config seems useful for sure. Would it be possible to pass a file, perhaps, to replace the config.pbtxt completely? Formatting all the input/output shapes, and now the ensemble stuff, is starting to get quite complicated 🙂
@<1523701118159294464:profile|ExasperatedCrab78> So this is something like what I mean. If you think it'd be okay, I can properly implement this:
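(Purely a hypothetical sketch of the idea, e.g. letting --aux-config also accept a path to a local config.pbtxt and using the file contents verbatim:)
```python
# Hypothetical sketch, not the actual clearml-serving code: if the single
# --aux-config argument points at an existing file, read it as a full
# config.pbtxt; otherwise keep treating the arguments as key=value entries.
from pathlib import Path

def resolve_aux_config(aux_config_args):
    if len(aux_config_args) == 1 and Path(aux_config_args[0]).is_file():
        return Path(aux_config_args[0]).read_text()
    return list(aux_config_args)
```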