Answered

I'm new to ClearML and I'd like to deploy an inference service based on my trained model, something like what BentoML does wrapping a Flask API... Is there a way to do it within ClearML?

I'm new to ClearML and I'd like to deploy an inference service based on my trained model, something like what BentoML does wrapping Flask API... is there a way to do it within ClearML?

  				
Posted 3 years ago by ContemplativeBeetle39


Answers 4

ContemplativeCockroach39 unfortunately not directly as part of ClearML 😞
I can recommend the Nvidia Triton Inference Server (I'm hoping we will have out-of-the-box integration soon).
Meanwhile, you can run it manually; see the docs:
https://developer.nvidia.com/nvidia-triton-inference-server
The Docker image is here:
https://ngc.nvidia.com/catalog/containers/nvidia:tritonserver

  				
Posted 3 years ago by AgitatedDove14

Hi ContemplativeCockroach39
Assuming you wrap your model with a Flask app (or use any other serving solution), you usually need to:
1. Get the model
2. Add some metrics on runtime performance
3. Package it in a Docker image

Getting a pretrained model is straightforward once you know either the creating Task or the Model ID:
```
from clearml import Task, Model

# Download the last output model registered by the training Task
model_file_from_task = Task.get_task(task_id='<task_id>').models['output'][-1].get_local_copy()
```
or
```
from clearml import Model

# Download the model file directly by its Model ID
model_file_from_model = Model(model_id='<model_id>').get_local_copy()
```
Add performance metrics:
```
from clearml import Task

task = Task.init(project_name='inference', task_name='runtime')
# some_counter_here is your own running request counter
task.get_logger().report_scalar(title='performance', series='latency', value=0.123, iteration=some_counter_here)
```
Once you have run it once, you have a Task of the inference code in the system. You can then either enqueue it to a clearml-agent or package it as a standalone Docker image.
Packaging to a Docker image:
```
clearml-agent build --id <task_id_here> --docker --target docker_image_name
```
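To make the "add some metrics on runtime performance" step more concrete, here is a minimal sketch of a wrapper that times each prediction and hands the latency to a reporting callback. It uses only the standard library so it runs on its own; `make_timed_predictor`, `dummy_predict`, and the `reported` list are hypothetical names for illustration, not part of ClearML. In a real service the callback would presumably forward to `task.get_logger().report_scalar(...)` as in the snippet above.

```python
import time
from typing import Any, Callable, List, Sequence, Tuple

Rows = Sequence[Sequence[float]]


def make_timed_predictor(
    predict: Callable[[Rows], Any],
    report_latency: Callable[[float, int], None],
) -> Callable[[Rows], Any]:
    """Wrap `predict` so that every call reports its wall-clock latency."""
    counter = 0  # running request counter, used as the metric iteration

    def timed_predict(rows: Rows) -> Any:
        nonlocal counter
        start = time.perf_counter()
        result = predict(rows)
        latency = time.perf_counter() - start
        report_latency(latency, counter)
        counter += 1
        return result

    return timed_predict


def dummy_predict(rows: Rows) -> List[float]:
    """Stand-in model (assumption): a real service would load the file
    returned by get_local_copy() with its own framework."""
    return [sum(row) for row in rows]


# Collect (iteration, latency) pairs locally instead of sending them anywhere.
reported: List[Tuple[int, float]] = []
predictor = make_timed_predictor(
    dummy_predict,
    lambda latency, iteration: reported.append((iteration, latency)),
)
predictions = predictor([[5.1, 3.5, 1.4, 0.2]])
```

With ClearML available, the callback could be something like `lambda latency, i: task.get_logger().report_scalar(title='performance', series='latency', value=latency, iteration=i)`, and `predictor` could be called from inside the Flask route handler.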

  				
Posted 3 years ago by AgitatedDove14

GrumpyPenguin23, AgitatedDove14 thanks for replying! Basically I'm looking for a real-time inference endpoint exposing a prediction API method, something like:
curl -i \
  --header "Content-Type: application/json" \
  --request POST \
  --data '[[5.1, 3.5, 1.4, 0.2]]' \
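For reference, the same kind of request could be built from Python roughly as follows. This is only a sketch: the curl command above is cut off before its URL, so `ENDPOINT_URL` is a made-up placeholder, and `build_prediction_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Hypothetical placeholder: the original command does not show the endpoint URL.
ENDPOINT_URL = "http://localhost:5000/predict"


def build_prediction_request(rows, url=ENDPOINT_URL):
    """Build a JSON POST request carrying a batch of feature rows."""
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_prediction_request([[5.1, 3.5, 1.4, 0.2]])

# Sending it requires a running service at ENDPOINT_URL:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```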

  				
Posted 3 years ago by ContemplativeBeetle39

It depends on what you mean by deployment and what kind of inference you plan to do (i.e. real-time vs. batched, etc.).
Overall, though, serving itself is currently not handled by the open-source offering, mainly because there are so many variables and frameworks to consider.
Can you share some more details about the capabilities you are looking for? Some essentials, like staging and model versioning, are handled very well...

  				
Posted 3 years ago by GrumpyPenguin23
