Initially it was complaining about it, but then when I did the connect_configuration it started working
Was wondering how it can handle 10s, 100s of models.
Yes, it supports dynamically loading/unloading models based on requests
(load balancing multiple nodes is disconnected from it, but assuming they are under diff endpoints, the load balancer can be configured to route accordingly)
Do we launch multiple groups of these in different projects?
Actually Triton can serve multiple models, and the endpoints/models are controlled from clearml-serving.
The only issue is adding a load-balancer in front of multiple nodes to balance the requests between them. wdyt?
I think you are correct, it seems like it is missing requirements to boto/azure/google (I will make sure this is added). In the meantime, you can stop the "triton serving engine" Task, reset it, add boto3 to the installed packages and relaunch.
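If you prefer doing that from code rather than the UI, something like this should work (a rough sketch, assuming a recent clearml SDK where Task.set_packages is available; the task id, queue name and package list here are placeholders):

from clearml import Task

engine_task = Task.get_task(task_id="<triton-serving-engine-task-id>")  # placeholder id
engine_task.mark_stopped()   # stop it if it is still running
engine_task.reset()          # clear the previous run so it can be re-enqueued
# note: set_packages overrides the whole requirements list, so include the
# original packages plus boto3 (or simply edit "Installed Packages" in the UI)
engine_task.set_packages(["clearml-serving", "boto3"])
Task.enqueue(engine_task, queue_name="default")  # relaunch through an agent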
That said, your main issue might be packaging the python model. Basically you need to create a model from the entire folder (with whatever is inside it), then Triton should be able to run it (if the config.pbtxt is correct):
m = OutputModel()
m.update_weights_package(weights_path='path/goes/here/')
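A slightly fuller version of that snippet (just a sketch; the project/task names and folder path are illustrative):

from clearml import Task, OutputModel

task = Task.init(project_name="examples", task_name="package python model")  # illustrative names
# the folder should contain everything Triton needs for the model,
# e.g. the model.py and the config.pbtxt
m = OutputModel(task=task)
m.update_weights_package(weights_path="path/goes/here/")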
Also btw, is this supposed to be a screenshot from the community version?
Hmm, seems like a screenshot from an enterprise version, I'll ask them to update 🙂
I am also not understanding how clearml-serving is doing the versioning for models in Triton.
Basically you have two Tasks, one is the "controller" checking model changes and updating itself.
The other is the engine, which checks the "controller" Task to see which models it needs to download/configure, and replaces them.
This way you can have multiple engines controlled from the same "controller" Task
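Very roughly, the engine side works something like this (an illustrative sketch only, not the actual clearml-serving code; the configuration object name and the endpoint-to-model mapping format are assumptions):

import json, time
from clearml import Task, Model

CONTROLLER_TASK_ID = "<controller-task-id>"  # placeholder
MODEL_REPOSITORY = "/models"                 # Triton model repository path (assumed)

def sync_once():
    controller = Task.get_task(task_id=CONTROLLER_TASK_ID)
    # assume the controller keeps an endpoint -> model-id mapping as a JSON
    # configuration object called "endpoints" (hypothetical name/format)
    raw = controller.get_configuration_object("endpoints") or "{}"
    for endpoint, model_id in json.loads(raw).items():
        local_copy = Model(model_id=model_id).get_local_copy()
        # ...unpack local_copy into f"{MODEL_REPOSITORY}/{endpoint}/1/" so Triton
        # picks it up as a new model/version

while True:
    sync_once()
    time.sleep(60)  # poll the controller periodically for changes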
Model says PACKAGE, that means it’s fine right?
I used .update_weights(path) with path being the model dir containing the model.py and the config.pbtxt. Should I use update_weights_package?
The agent ip? Generally what’s the expected pattern to deploy and scale this for multiple models?
Yes, the agent's IP. With multiple agents, one would probably use k8s for the nodes, then configure an ingress. This is the next step for clearml-serving, adding support for KFServing or manually configuring the ingress. wdyt?
On my to do list, but will have to wait for later this week (feel free to ping on this thread to remind me).
Regarding the issue at hand, let me check the requirements it is using.
forking and using the latest code fixes the boto issue at least
Think I will have to fork and play around with it 🙂
Yes, I have no experience with Triton - does it do lazy loading? Was wondering how it can handle 10s, 100s of models. If we load balance across a set of these engine containers with, say, 100 models and all of these models get traffic but the distribution is not even, will each of those engine containers load all 100 models?
Think I will have to fork and play around with it 🙂
NICE! (BTW: if you manage to get it working I'll be more than happy to help push the PR)
Maybe the quickest win is to store just the .py as the model?
Got the engine running.
curl <serving-engine-ip>:8000/v2/models/keras_mnist/versions/1
What’s the serving-engine-ip supposed to be?
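In case it helps, the same call from Python (a sketch; the host below is just a placeholder for wherever the Triton engine container is reachable):

import requests

TRITON_HOST = "http://<serving-engine-ip>:8000"  # placeholder for the Triton engine container address

# server readiness check (standard Triton HTTP/REST v2 API)
print(requests.get(f"{TRITON_HOST}/v2/health/ready").status_code)

# model metadata for a specific model/version, same as the curl above
print(requests.get(f"{TRITON_HOST}/v2/models/keras_mnist/versions/1").json())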
For now that's a quick thing, but for actual use I will need a proper model (pkl) and the .py
Without some sort of automation on top it feels a bit fragile
Should I use update_weights_package?
Yes
BTW, config.pbtxt you should pass when "registering" the endpoint with the CLI
That makes sense - one part I am confused about is - the Triton engine container hosts all the models, right? Do we launch multiple groups of these in different projects?
AgitatedDove14 - looks like the serving is doing the savemodel stuff?
https://github.com/allegroai/clearml-serving/blob/main/clearml_serving/serving_service.py#L554
AgitatedDove14 - it does have boto, but the clearml-serving installation and code refer to an older commit hash, and hence the task was not using them - https://github.com/allegroai/clearml-serving/blob/main/clearml_serving/serving_service.py#L217
Also btw, is this supposed to be a screenshot from the community version? https://github.com/manojlds/clearml-serving/blob/main/docs/webapp_screenshots.gif