@<1523701205467926528:profile|AgitatedDove14> Okay, we got to the bottom of this. It was actually because of the load balancer timeout setting we had, which was also 30 seconds and was confusing us.
We didn’t end up needing the above configs after all.
I’m not exactly sure yet, but the instance seems to be breaking down, which is why I thought about this. Will investigate further and let you know.
@<1657918706052763648:profile|SillyRobin38>
Seems like this still doesn’t solve the problem. How can we verify this setting has been applied correctly? Other than checking the clearml.conf file on the container, that is.
oh actually it seems like this is possible already from the code!
This is the gist of our current setup using the recommended approach
I see, very interesting. I know this is pseudo-code, but are you suggesting sending the requests to Triton frame-by-frame?
Or perhaps np_frame = np.array(frame) itself could be a slice of the total_frames ?
Like:
Dataset: [700, x, y, 3]
Batch: [8, x, y, 3]
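i.e. something like this on my end (just a rough sketch, assuming the model takes a max batch of 8):
import numpy as np

BATCH = 8  # assumed max batch size of the Triton model

def iter_batches(total_frames: np.ndarray):
    # total_frames has shape [700, x, y, 3]; yield slices of shape [<=8, x, y, 3]
    for start in range(0, total_frames.shape[0], BATCH):
        yield total_frames[start:start + BATCH]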
I think that makes sense, and in the end we’d deploy this endpoint like in the pipeline example.
Hi @<1523701118159294464:profile|ExasperatedCrab78> , so I’ve started looking into setting up the TritonBackends now, as we first discussed.
I was able to structure the folders correctly and deploy the endpoints. However, when I spin up the containers, I get the following error:
clearml-serving-triton | | detection_preprocess | 1 | UNAVAILABLE: Internal: Unable to initialize shared memory key 'triton_python_backend_shm_region_1' to requested size (67108864 bytes). If yo...
@<1523701118159294464:profile|ExasperatedCrab78> So this is what I mean. If you think it’d be okay, I can properly implement this:
If you’re wondering about the case where no optional config.pbtxt is provided, I guess the logic would be pretty much the same as above:
model_name = f"{model_name}_{version}"
But then after looking at create_config_pbtxt(), it seems like this is not being constructed at all, making me realize that this may have been optional - [confirming name is an optional property](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#name...
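Roughly the logic I have in mind (illustrative sketch, not the actual create_config_pbtxt() code):
def resolve_triton_model_name(model_name: str, version: str, user_config_pbtxt=None) -> str:
    unique_name = f"{model_name}_{version}"
    if user_config_pbtxt is not None:
        # a user-supplied config.pbtxt has to carry the same name, otherwise Triton rejects the model
        assert f'name: "{unique_name}"' in user_config_pbtxt
    # without a config.pbtxt, `name` is optional and Triton falls back to the model directory name,
    # so naming the directory "<model>_<version>" is enough
    return unique_name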
I’m running it through docker-compose . Tried both with and without triton
Hmm still facing same issue actually…
print("This runs!")
predict_a = self.send_request(endpoint="/test_model_sklearn_a/", version=None, data=data)
predict_b = self.send_request(endpoint="/test_model_sklearn_b/", version=None, data=data)
print("Doesn't get here", predict_a, predict_b)
And still, hitting the endpoints independently using curl works. Are you able to replicate this?
@<1523701205467926528:profile|AgitatedDove14> this file is not getting mounted when using the docker-compose file for the clearml-serving pipeline, do we also have to mount it somehow?
The only place I can see this file being used is in the README, like so:
Spin the inference container:
docker run -v ~/clearml.conf:/root/clearml.conf -p 8080:8080 -e CLEARML_SERVING_TASK_ID=<service_id> -e CLEARML_SERVING_POLL_FREQ=5 clearml-serving-inference:latest
This is basically what I follow for setting up my own Triton server:
Hi @<1523701205467926528:profile|AgitatedDove14> , thanks for the always-fast response! 🙂
Yep, so I am sending a link to an S3 bucket, and set up a Triton ensemble within clearml-serving.
This is the gist of what I’m doing:
So essentially I am sending raw data, but I can only send the first 8 frames (L45), since I can’t really send the data as a list or anything?
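To make it concrete, the frame extraction side is roughly this (very simplified, names are illustrative):
import cv2
import numpy as np

def video_to_batch(local_video_path: str, max_frames: int = 8) -> np.ndarray:
    # decode the video we pulled from the S3 link and keep only the first 8 frames
    cap = cv2.VideoCapture(local_video_path)
    frames = []
    while len(frames) < max_frames:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(np.array(frame))
    cap.release()
    return np.stack(frames)  # shape: [8, x, y, 3]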
ahh yepp, that makes sense! Thank you so much!
Thanks @<1523701205467926528:profile|AgitatedDove14>, this seems to solve the issue. I guess the main issue is that the delimiter is a _ instead of a /. This did work; however, as you can see from the model endpoint deployment snippet, we also provide a custom aux-config file. We also had to make sure to update the name inside config.pbtxt so that Triton is happy:
From:
name: "mmdet"
To:
name: "mmdet_VERSION" -> "mmdet_1"
I can see pipelines, but I’m not sure if that applies to Triton directly; it seems more of a DAG approach?
Yep, that makes sense. @<1671689437261598720:profile|FranticWhale40> plz give that a try
- Haven’t changed it, although I did change the host port to 8081 instead of 8080? Everything else seems to work fine tho.
- Sorry what do you mean? I basically just followed the tutorial
Thanks for your response! I see, yep from an initial view it could work. Will certainly give it a try 🙂
However, to give you more context, in order to set up an ensemble within Triton, you also need to add an ensemble_scheduling block to the config.pbtxt file, which would be something like this:
I’m guessing this’ll be diffic...
I see, yep aux-config seems useful for sure. Would it be possible to pass a file perhaps to replace config.pbtxt completely? Formatting all the input/output shapes, and now the ensemble stuff is starting to get quite complicated 🙂
@<1523701118159294464:profile|ExasperatedCrab78>, would you have any idea about the above? Triton itself supports ensembling; I was wondering if we can somehow support this as well?
I see. So what would be the reason for one using a load balancer in this case? 🙂
Okay, sorry for spamming here, but I feel like other people would find this useful. I was able to deploy the ensemble model, and I guess to complete this, I would need to add all the other “endpoints” individually, right?
As in, to reach something like below within Triton:
So actually, while we’re at it, we also need to return a string from the model, which would be where the results are uploaded to (S3).
I was able to send back a URL with Triton directly, but the input/output shape mapping doesn’t seem to support strings in ClearML (rough snippet of the Triton side below). I have opened an issue for it: None
Am I missing something?
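For context, on the plain Triton side the string output was roughly this (illustrative names; assumes the output is declared as TYPE_STRING in config.pbtxt):
import numpy as np
import triton_python_backend_utils as pb_utils  # provided by the Triton python backend at runtime

def build_response(results_url: str):
    # return the S3 location of the results as a single string tensor
    out = pb_utils.Tensor("results_url", np.array([results_url.encode("utf-8")], dtype=np.object_))
    return pb_utils.InferenceResponse(output_tensors=[out])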
I see, okay so using
shm_size: '2gb'
we still need to modify the inference logic to register an input and an output on shmem, no?
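Just to spell out what I mean by registering an input and output on shmem (rough sketch based on the Triton client shared-memory example; names, shapes and keys are illustrative):
import numpy as np
import tritonclient.http as httpclient
import tritonclient.utils.shared_memory as shm

client = httpclient.InferenceServerClient("localhost:8000")
data = np.zeros((8, 224, 224, 3), dtype=np.float32)
byte_size = data.size * data.itemsize

# create a system shared-memory region, copy the input into it, and register it with Triton
handle = shm.create_shared_memory_region("input_data", "/input_shm", byte_size)
shm.set_shared_memory_region(handle, [data])
client.register_system_shared_memory("input_data", "/input_shm", byte_size)

# point the inference input at the shared-memory region instead of sending the bytes over HTTP
infer_input = httpclient.InferInput("INPUT__0", list(data.shape), "FP32")
infer_input.set_shared_memory("input_data", byte_size)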
Hi @<1523701205467926528:profile|AgitatedDove14>, I already did the scikit-learn examples, and they work.
Also both endpoint_a and endpoint_b work when hitting them directly within the pipeline example. But not the pipeline itself.
Or rather any pointers to debug the problem further? Our GCP instances have a pretty fast internet connection, and we haven’t faced that problem on those instances. It’s only on this specific local machine that we’re facing this truncated download.
I say truncated because we checked the model.onnx size on the container, and it was for example 110MB whereas the original one is around 160MB.
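Roughly the kind of check (paths are illustrative; the checksum is just an extra sanity check on top of the size comparison we did):
import hashlib, os

def file_info(path: str):
    # size in bytes plus md5, to confirm the copy is truncated rather than just packed differently
    with open(path, "rb") as f:
        return os.path.getsize(path), hashlib.md5(f.read()).hexdigest()

print(file_info("/models/model.onnx"))  # copy inside the container (~110MB here)
print(file_info("./model.onnx"))        # original artifact (~160MB)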
Thank you for all the answers! Yep that worked, though is it usually safe to add this option? Instead of --shm-size
Also, now I managed to send an image through curl using a local image (@img.png in curl). Seems to work this way! Getting the same gRPC size limit issue, but it seems like there’s a new commit that addressed it! 🎉
@<1657918706052763648:profile|SillyRobin38> ^
Perfect, thank you so much!! 🙏
@<1560074028276781056:profile|HealthyDove84> This is how we’d tackle the video-to-frame ratio issue