GrumpyPenguin23, AgitatedDove14 thanks for replying! Basically I'm looking for a real-time inference endpoint exposing a prediction API method, something like:
curl -i \
  --header "Content-Type: application/json" \
  --request POST \
  --data '[[5.1, 3.5, 1.4, 0.2]]' \
Hi ContemplativeCockroach39
Assuming you wrap your model with a Flask app (or use any other serving solution), you usually need to:
1. Get the model
2. Add some metrics on runtime performance
3. Package it in a Docker image
Getting a pretrained model is straightforward once you know either the creating Task or the Model ID:
` from clearml import Task, Model
# Take the most recent output model of the Task that created it
model_file_from_task = Task.get_task(task_id='<task_id>').models['output'][-1].get_local_copy() `
or
` # Fetch the model directly by its Model ID
model_file_from_model = Model(model_id='<model_id>').get_local_copy() `
Add performance metrics:
` from clearml import Task
task = Task.init(project_name='inference', task_name='runtime')
# some_counter_here is your own running request counter
task.get_logger().report_scalar(title='performance', series='latency', value=0.123, iteration=some_counter_here) `
Once you have run it once, you have a Task of the inference code in the system. You can then either enqueue it to a clearml-agent or package it as a standalone Docker image.
Packaging to a Docker image:
` clearml-agent build --id <task_id_here> --docker --target docker_image_name `
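To make the three steps above concrete, here is a minimal sketch of a prediction endpoint that accepts the JSON payload from the curl example and times each prediction. It uses only the Python standard library rather than Flask so that it runs without extra dependencies. The `/predict` path, the `predict` stand-in model, and the `report_latency` hook are all assumptions made for illustration; in a real service `predict` would call the model file downloaded from ClearML, and `report_latency` would forward to `task.get_logger().report_scalar(...)`.

```python
import json
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def predict(rows):
    # Hypothetical stand-in for a real model: returns the sum of each row.
    return [sum(row) for row in rows]


def report_latency(seconds, iteration):
    # Placeholder hook. With ClearML this is where a call such as
    # task.get_logger().report_scalar(title='performance', series='latency',
    #                                 value=seconds, iteration=iteration)
    # would go.
    print(f"request {iteration}: {seconds:.6f}s")


class PredictHandler(BaseHTTPRequestHandler):
    request_counter = 0
    counter_lock = threading.Lock()

    def do_POST(self):
        if self.path != "/predict":  # assumed endpoint path
            self.send_error(404, "unknown endpoint")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            rows = json.loads(self.rfile.read(length))
            start = time.perf_counter()
            predictions = predict(rows)
            elapsed = time.perf_counter() - start
        except (ValueError, TypeError):
            self.send_error(400, "expected a JSON list of numeric rows")
            return
        # Count requests under a lock because the server is multi-threaded.
        with PredictHandler.counter_lock:
            PredictHandler.request_counter += 1
            iteration = PredictHandler.request_counter
        report_latency(elapsed, iteration)
        body = json.dumps({"predictions": predictions}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request access log


def make_server(host="127.0.0.1", port=0):
    # port=0 lets the OS pick a free port; pass a fixed port for real use.
    return ThreadingHTTPServer((host, port), PredictHandler)
```

Calling `make_server(port=8080).serve_forever()` would start the service, after which the curl command from the question could be pointed at `http://127.0.0.1:8080/predict`. Running this script once under `Task.init` is what would register the inference code as a Task that `clearml-agent build` can package.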
It depends on what you mean by deployment, and what kind of inference you plan to do (i.e. real-time vs. batched, etc.)
But overall currently serving itself is not handled by the open source offering, mainly because there are so many variables and frameworks to consider.
Can you share some more details about the capabilities you are looking for? Some essentials like staging and model versioning are handled very well...
ContemplativeCockroach39 unfortunately not directly as part of clearml 😞
I can recommend the Nvidia Triton serving (I'm hoping we will have the out-of-the-box integration soon)
Meanwhile you can run it manually, see the docs:
https://developer.nvidia.com/nvidia-triton-inference-server
The Docker image is here:
https://ngc.nvidia.com/catalog/containers/nvidia:tritonserver
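For reference, once a Triton server is running manually, a prediction request like the one in the question would go through Triton's HTTP inference API (the KServe v2 protocol). The sketch below only builds the request body and URL and does not send anything. The input tensor name `input__0` and the model name `iris` are hypothetical and depend on the model's configuration in the Triton model repository; port 8000 is Triton's default HTTP port.

```python
import json


def build_infer_request(rows, input_name="input__0", datatype="FP32"):
    """Build a v2 inference request body for a 2-D batch of numeric rows.

    input_name is a hypothetical tensor name; use the one from your model config.
    """
    if not rows or any(len(row) != len(rows[0]) for row in rows):
        raise ValueError("rows must be a non-empty list of equal-length lists")
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": [len(rows), len(rows[0])],
                "datatype": datatype,
                # Tensor contents flattened in row-major order.
                "data": [value for row in rows for value in row],
            }
        ]
    }


def infer_url(host, model_name, port=8000):
    # Inference endpoint of the v2 protocol for a given model name.
    return f"http://{host}:{port}/v2/models/{model_name}/infer"


if __name__ == "__main__":
    payload = build_infer_request([[5.1, 3.5, 1.4, 0.2]])
    print(infer_url("localhost", "iris"))  # "iris" is a hypothetical model name
    print(json.dumps(payload))
```

The printed JSON could then be sent with the same kind of curl POST shown in the question, with `Content-Type: application/json`.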