Can Someone Help Me Deploy This Example Model (From Triton Inference Server) In Clearml-Serving? Too Many Random Errors For Me To Figure It Out

Answered

Can someone help me deploy this example model (from Triton Inference Server) in clearml-serving? I am hitting too many seemingly random errors to figure it out.

https://github.com/triton-inference-server/server/tree/main/qa/python_models/add_sub

For now seeing this error in the triton serving engine task:

File "/opt/tritonserver/backends/python/startup.py", line 382, in <module>
    python_host = PythonHost(module_path=FLAGS.model_path)
File "/opt/tritonserver/backends/python/startup.py", line 172, in __init__
    spec.loader.exec_module(module)
File "<frozen importlib._bootstrap_external>", line 779, in exec_module
File "<frozen importlib._bootstrap_external>", line 915, in get_code
File "<frozen importlib._bootstrap_external>", line 972, in get_data
FileNotFoundError: [Errno 2] No such file or directory: '/models/test-project-nbdev/16/model.py'

I created the model as follows:

Have a model folder with the config.pbtxt and model.py from the example
task.connect_configuration(configuration=Path('model/config.pbtxt'), name='config.pbtxt')
output_model.save("model")
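
For reference, the path in the traceback follows Triton's model repository layout: `<repository>/<model name>/<version>/model.py`, with `config.pbtxt` one level up next to the version folder. The helper below is only a sketch of that layout for a local folder; `lay_out_python_model` and its arguments are hypothetical names, and how clearml-serving picks the model name and version (`16` in the error) is not covered here.

```python
import shutil
from pathlib import Path


def lay_out_python_model(src_dir: Path, repo_root: Path, name: str, version: int = 1) -> Path:
    """Copy model.py and config.pbtxt into the layout Triton expects.

    <repo_root>/<name>/config.pbtxt
    <repo_root>/<name>/<version>/model.py
    """
    model_root = repo_root / name
    version_dir = model_root / str(version)
    version_dir.mkdir(parents=True, exist_ok=True)
    # config.pbtxt sits beside the version folders, model.py inside one of them
    shutil.copy(src_dir / "config.pbtxt", model_root / "config.pbtxt")
    shutil.copy(src_dir / "model.py", version_dir / "model.py")
    return version_dir / "model.py"
```

Calling it with the `model` folder above and a scratch repository directory returns the path where Triton would look for `model.py`, which makes it easy to compare against the path in the error.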

  				
Posted 3 years ago by TrickySheep9


Answers 30

Got the engine running.

curl <serving-engine-ip>:8000/v2/models/keras_mnist/versions/1

What’s the serving-engine-ip supposed to be?

  				
Posted 3 years ago by TrickySheep9

Progress with boto3 added, but fails:

  				
Posted 3 years ago by TrickySheep9

Model says PACKAGE, that means it’s fine right?

  				
Posted 3 years ago by TrickySheep9

AgitatedDove14 - it does have boto, but the clearml-serving installation and code refer to an older commit hash, so the task was not using them - https://github.com/allegroai/clearml-serving/blob/main/clearml_serving/serving_service.py#L217

  				
Posted 3 years ago by TrickySheep9

Also btw, is this supposed to be a screenshot from the community version? https://github.com/manojlds/clearml-serving/blob/main/docs/webapp_screenshots.gif

  				
Posted 3 years ago by TrickySheep9

It did pick it from the task?

  				
Posted 3 years ago by TrickySheep9

Also btw, is this supposed to be a screenshot from the community version

Hmm seems like screenshot from an enterprise version, I'll ask them to update 🙂

I am also not understanding how clearml-serving is doing the version for models in triton.

Basically you have two Tasks, one is the "controller" checking model changes and updating itself.
The other is the engine, checking on the "controller" Task, which models it needs to download/configure and replaces them.
This way you can have multiple engines controlled from the same "controller" Task
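
A toy model of that controller/engine pattern, just to make the flow concrete. This is not clearml-serving code; `Controller`, `Engine`, and `sync` are made-up names, and a real engine would download and configure the model where the comment indicates.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Stand-in for the "controller" Task: maps endpoint -> model id."""
    endpoints: dict = field(default_factory=dict)

    def update(self, endpoint: str, model_id: str) -> None:
        self.endpoints[endpoint] = model_id


@dataclass
class Engine:
    """Stand-in for one serving engine that polls the controller."""
    loaded: dict = field(default_factory=dict)

    def sync(self, controller: Controller) -> list:
        """Align local models with the controller; return the endpoints that changed."""
        changed = [ep for ep, mid in controller.endpoints.items()
                   if self.loaded.get(ep) != mid]
        removed = [ep for ep in self.loaded if ep not in controller.endpoints]
        for ep in changed:
            # a real engine would download/configure the model here
            self.loaded[ep] = controller.endpoints[ep]
        for ep in removed:
            del self.loaded[ep]
        return sorted(changed + removed)
```

Several `Engine` objects can call `sync` against the same `Controller`, which mirrors the "multiple engines, one controller" setup described above.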

  				
Posted 3 years ago by AgitatedDove14

Yeah

  				
Posted 3 years ago by TrickySheep9

damn I think this is the issue:
https://github.com/allegroai/clearml-serving/blob/b5f5d72046f878bd09505606ca1147d93a5df069/clearml_serving/serving_service.py#L553

  				
Posted 3 years ago by AgitatedDove14

model.savedmodel ?

  				
Posted 3 years ago by AgitatedDove14

Initially it was complaining about it, but then when I did the connect_configuration it started working

  				
Posted 3 years ago by TrickySheep9

The agent ip? Generally what’s the expected pattern to deploy and scale this for multiple models?

  				
Posted 3 years ago by TrickySheep9

Do we launch multiple groups of these in different projects?

Actually Triton can serve multiple models and the endpoints/models are controlled from the clearml-serving.
The only issue is adding a load-balancer in front of multiple nodes to balance the requests between them. wdyt?

  				
Posted 3 years ago by AgitatedDove14

For now that's a quick thing, but for actual use I will need a proper model (pkl) and the .py

  				
Posted 3 years ago by TrickySheep9

On my to do list, but will have to wait for later this week (feel free to ping on this thread to remind me).
Regarding the issue at hand, let me check the requirements it is using.

  				
Posted 3 years ago by AgitatedDove14

Should I use update_weights_package ?

Yes
BTW, you should pass config.pbtxt when "registering" the endpoint with the CLI

  				
Posted 3 years ago by AgitatedDove14

I used .update_weights(path) with path being the model dir containing the model.py and the config.pbtxt. Should I use update_weights_package ?

  				
Posted 3 years ago by TrickySheep9

Think I will have to fork and play around with it 🙂

  				
Posted 3 years ago by TrickySheep9

don’t know what’s happening there

  				
Posted 3 years ago by TrickySheep9

I think you are correct, it seems like it is missing requirements to boto/azure/google (I will make sure this is added). In the meantime, you can stop the "triton serving engine" Task, reset it, add boto3 to the installed packages and relaunch.
That said your main issue might be packaging the python model. Basically you need to create a model from the entire folder (with whatever there is inside the folder), then Triton should be able to run it (if the config.pbtxt is correct).
from clearml import OutputModel
m = OutputModel()
m.update_weights_package(weights_path='path/goes/here/')
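
A slightly fuller sketch of the same idea, assuming the folder holds `model.py` and `config.pbtxt` as in the question and that ClearML is already configured on the machine. `missing_model_files` and `register_model_package` are hypothetical helper names, and the project/task names are whatever you pass in; only `Task.init`, `OutputModel`, and `update_weights_package` come from the clearml SDK.

```python
from pathlib import Path

# Files the question's model folder is expected to contain (an assumption)
REQUIRED_FILES = ("model.py", "config.pbtxt")


def missing_model_files(model_dir: Path) -> list:
    """Return the required files that are absent from model_dir."""
    return [name for name in REQUIRED_FILES if not (model_dir / name).is_file()]


def register_model_package(model_dir: Path, project: str, task_name: str):
    """Upload the whole folder as one model package (needs a configured ClearML setup)."""
    missing = missing_model_files(model_dir)
    if missing:
        raise FileNotFoundError(f"{model_dir} is missing: {', '.join(missing)}")

    # Imported lazily so the folder check works without clearml installed
    from clearml import OutputModel, Task

    task = Task.init(project_name=project, task_name=task_name)
    model = OutputModel(task=task)
    model.update_weights_package(weights_path=str(model_dir))
    return model
```

The folder check runs before anything is uploaded, so a missing `model.py` shows up locally instead of as a `FileNotFoundError` inside the Triton engine.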

  				
Posted 3 years ago by AgitatedDove14

yes I'm with you, we need to fix it asap

  				
Posted 3 years ago by AgitatedDove14

forking and using the latest code fixes the boto issue at least

  				
Posted 3 years ago by TrickySheep9

AgitatedDove14 - looks like the serving is doing the savemodel stuff?

https://github.com/allegroai/clearml-serving/blob/main/clearml_serving/serving_service.py#L554

  				
Posted 3 years ago by TrickySheep9

Yes. I have no experience with Triton; does it do lazy loading? Was wondering how it can handle 10s, 100s of models. If we load balance across a set of these engine containers with, say, 100 models, and all of these models get traffic but the distribution is not even, will each of those engine containers load all 100 models?

  				
Posted 3 years ago by TrickySheep9

Think I will have to fork and play around with it

NICE! (BTW: if you manage to get it working I'll be more than happy to help push the PR)
Maybe the quickest win is to store just the .py as model ?

  				
Posted 3 years ago by AgitatedDove14

Was wondering how it can handle 10s, 100s of models.

Yes, it supports dynamically loading/unloading models based on requests
(load balancing multiple nodes is disconnected from it, but assuming they are under different endpoints, the load balancer can be configured to route accordingly)
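
For context, Triton itself exposes load/unload through its model repository HTTP endpoints when the server runs in explicit model control mode. The sketch below only builds those requests; whether clearml-serving relies on this mechanism is not stated in this thread, and `model_control_request` plus the host value in the usage note are hypothetical.

```python
import urllib.request


def model_control_request(host: str, model: str, action: str,
                          port: int = 8000) -> urllib.request.Request:
    """Build (but do not send) a Triton model load/unload request."""
    if action not in ("load", "unload"):
        raise ValueError(f"unsupported action: {action!r}")
    url = f"http://{host}:{port}/v2/repository/models/{model}/{action}"
    return urllib.request.Request(
        url,
        data=b"{}",  # empty JSON body
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is a matter of `urllib.request.urlopen(model_control_request("<serving-engine-ip>", "keras_mnist", "load"))` against a running engine.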

  				
Posted 3 years ago by AgitatedDove14

Without some sort of automation on top feels a bit fragile

  				
Posted 3 years ago by TrickySheep9

That makes sense - one part I am confused on is this: the Triton engine container hosts all the models, right? Do we launch multiple groups of these in different projects?

  				
Posted 3 years ago by TrickySheep9

I am also not understanding how clearml-serving is doing the version for models in triton.

  				
Posted 3 years ago by TrickySheep9

The agent ip? Generally what’s the expected pattern to deploy and scale this for multiple models?

Yes, the agent's IP, and with multiple agents, one would probably use k8s for the nodes, then configure ingress. This is the next step for clearml-serving: adding support for KFServing or manually configuring the ingress. wdyt?
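
As a rough illustration of the check asked about earlier in the thread, here is the same metadata query as the curl command, written in Python. It assumes the engine's HTTP port is 8000 as in that command; the helper names are hypothetical, and the host is whatever IP the agent running the engine has.

```python
import json
import urllib.request


def model_metadata_url(host: str, model: str, version=None, port: int = 8000) -> str:
    """URL of the model metadata endpoint, optionally pinned to a version."""
    url = f"http://{host}:{port}/v2/models/{model}"
    if version is not None:
        url += f"/versions/{version}"
    return url


def fetch_model_metadata(host: str, model: str, version=None, timeout: float = 5.0) -> dict:
    """GET the metadata from a running engine and parse the JSON reply."""
    with urllib.request.urlopen(model_metadata_url(host, model, version),
                                timeout=timeout) as resp:
        return json.load(resp)
```

With a load balancer or ingress in front of several engines, the same call would target that address instead of a single agent's IP.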

  				
Posted 3 years ago by AgitatedDove14
