Has any progress been made on the clearml-serving repo?
Hi JitteryCoyote63
Yes, though things are progressing slower than expected; I expect actual work to be pushed in early Jan. On the bright side, we are trying to work closely with the TorchServe team and Nvidia Triton to expand capabilities.
Currently it seems the setup will be a "proxy server container" for pre/post processing, then a serving engine container (Triton/TorchServe), with a monitoring container as the control plane (i.e. collecting stats and storing the model state, as it is today).
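Roughly speaking, the request flow would look something like this (just a sketch, nothing is final; the endpoint follows Triton's standard KServe v2 HTTP protocol, and the tensor names and pre/post hooks are placeholders):
```python
# Sketch of the request flow, nothing here is final:
# client -> proxy container (preprocess) -> serving engine (Triton) -> proxy (postprocess) -> client
import requests
import numpy as np

TRITON_URL = "http://triton:8000"  # serving engine container, placeholder address


def preprocess(raw_request: dict) -> np.ndarray:
    # user-supplied code, e.g. tokenization / normalization
    return np.asarray(raw_request["data"], dtype=np.float32)


def postprocess(raw_output: np.ndarray) -> dict:
    # user-supplied code, e.g. argmax / label mapping
    return {"prediction": int(raw_output.argmax())}


def handle_request(raw_request: dict, model_name: str = "my_model") -> dict:
    batch = preprocess(raw_request)
    # Triton's HTTP/REST inference endpoint (KServe v2 protocol)
    payload = {
        "inputs": [{
            "name": "INPUT__0",          # placeholder tensor name
            "shape": list(batch.shape),
            "datatype": "FP32",
            "data": batch.flatten().tolist(),
        }]
    }
    resp = requests.post(f"{TRITON_URL}/v2/models/{model_name}/infer", json=payload)
    resp.raise_for_status()
    output = np.asarray(resp.json()["outputs"][0]["data"])
    return postprocess(output)
```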
The main hurdles are:
- Deciding on an abstract class for the proxy server (basically allowing users to write pre/post processing Python code); see the sketch after this list.
- Connecting the proxy/serving machines (i.e. configuring external endpoints and making sure internal requests are routed). We are trying to think how this could easily be done, but currently the only solution we can think of is setting up (connecting to) a k8s cluster...
Feel free to help with both outstanding points, it will really accelerate the process.
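For the first point, something along these lines is what I have in mind (just a sketch, the class name and signatures are completely up for discussion):
```python
from abc import ABC, abstractmethod
from typing import Any


class ServingPreprocessor(ABC):
    """User-implemented pre/post processing hooks loaded by the proxy container.
    (Name and signatures are placeholders, nothing is decided yet.)"""

    @abstractmethod
    def preprocess(self, request_body: dict) -> Any:
        """Convert the raw request into the tensor(s) the serving engine expects."""

    @abstractmethod
    def postprocess(self, model_output: Any) -> dict:
        """Convert the raw engine output back into a JSON-serializable response."""


# Example user implementation shipped alongside the model
class MyModelPreprocessor(ServingPreprocessor):
    def preprocess(self, request_body: dict):
        return [float(x) / 255.0 for x in request_body["pixels"]]

    def postprocess(self, model_output):
        # simple argmax over a list of scores
        return {"label": int(max(range(len(model_output)), key=model_output.__getitem__))}
```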
What did you have in mind?
- Yes, the challenge is mostly around defining the interface. Regarding packaging, I'm thinking of a similar approach to the pipeline decorator (see the first sketch after this list), wdyt?
- ClearML agents will be running on k8s, but the main caveat is that I cannot think of a way to help with the deployment; in the end it will be kubectl that users will have to call in order to spin up the containers with the agents. Maybe a simple CLI to do that for you (second sketch below)?
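To clarify the decorator idea from the first point, something in the spirit of `PipelineDecorator.component` (this is a hypothetical `serving_endpoint` decorator, nothing like it exists yet):
```python
from functools import wraps

_ENDPOINT_REGISTRY = {}  # hypothetical registry the proxy container would load at startup


def serving_endpoint(model_name: str):
    """Hypothetical decorator: register a user function as the pre/post processing
    step for a given model, the same way a pipeline component is registered today
    (purely illustrative, not an existing API)."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)
        _ENDPOINT_REGISTRY[model_name] = wrapper
        return wrapper
    return decorator


@serving_endpoint(model_name="my_model")
def preprocess(request_body: dict) -> list:
    # the decorated code (and its requirements) would be packaged and shipped
    # to the proxy container, just like a pipeline component is today
    return [float(x) for x in request_body["data"]]
```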
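And the "simple CLI" from the second point would basically just wrap kubectl for you, something like this sketch (the manifest name and flags are made up):
```python
#!/usr/bin/env python3
# Minimal sketch of a CLI that shells out to kubectl to spin up the serving
# containers with the agents (flags and manifest path are hypothetical).
import argparse
import subprocess


def main():
    parser = argparse.ArgumentParser(description="Deploy clearml-serving pods on an existing k8s cluster")
    parser.add_argument("--namespace", default="clearml-serving")
    parser.add_argument("--manifest", default="clearml-serving.yaml", help="k8s manifest to apply")
    args = parser.parse_args()

    # users still need kubectl configured against their cluster;
    # the CLI just saves them from writing the commands by hand
    subprocess.run(["kubectl", "create", "namespace", args.namespace], check=False)
    subprocess.run(["kubectl", "apply", "-n", args.namespace, "-f", args.manifest], check=True)


if __name__ == "__main__":
    main()
```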
Hi AgitatedDove14 , that’s super exciting news! 🤩 🚀
Regarding the two outstanding points:
In my case, I'd maintain a client Python package that takes care of the pre/post processing of each request, so that I only send raw data to the inference service and post-process the raw output of the model it returns (see the sketch at the end of this message). But I understand why it might be desirable for users to have these steps happen on the server. What is challenging in this context? Defining how the user should ship this code, or what the abstract class should look like?
K8s sounds like the right choice, the tricky part being to abstract that away from the user. Maybe have a service task monitoring the cluster and scaling up when needed: the service task spins up new clearml-agents (similar to the AWS autoscaler) that act as k8s nodes and connect to the master node?
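To illustrate the client package idea, a rough sketch (the endpoint, labels and names are just placeholders from my side):
```python
# Sketch of the client-side package I have in mind: pre/post processing stays
# on the client, the inference service only ever sees raw tensors.
import requests


class MyModelClient:
    def __init__(self, endpoint: str = "http://inference-service:8080/predict"):
        self.endpoint = endpoint  # placeholder inference service address

    def _preprocess(self, image_bytes: bytes) -> list:
        # client-side preprocessing, e.g. decode + normalize
        return [b / 255.0 for b in image_bytes]

    def _postprocess(self, raw_output: list) -> str:
        # client-side postprocessing, e.g. map scores to a label
        labels = ["cat", "dog"]
        return labels[max(range(len(raw_output)), key=raw_output.__getitem__)]

    def predict(self, image_bytes: bytes) -> str:
        response = requests.post(self.endpoint, json={"inputs": self._preprocess(image_bytes)})
        response.raise_for_status()
        return self._postprocess(response.json()["outputs"])
```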