Answered
Can we use the simple docker-compose.yml file for ClearML Serving on a Hugging Face model (not processed to TensorRT)?


  
  
Posted one year ago

Answers 12


Sorry to come back to this! Regarding the Kubernetes Serving helm chart, I can see horizontal scaling of docker containers. What about vertical scaling? Is it implemented? More specifically, where is the SKU of the VMs in use defined?

  
  
Posted one year ago

That wasn't my intention! Not a dumb question, just a logical one 😄

  
  
Posted one year ago

Sorry, I jumped the gun before I fully understood your question 🙂 So by the simple docker compose file, you mean you don't want to use the docker-compose-triton.yaml file, and instead want to run the Hugging Face model on CPU rather than on Triton?

Or do you want to know if the general docker compose version is able to handle a Hugging Face model?

  
  
Posted one year ago

Prerequisites: PyTorch models require Triton engine support, so please use docker-compose-triton.yml / docker-compose-triton-gpu.yml, or, if running on Kubernetes, the matching helm chart.

  
  
Posted one year ago

I would like to know if it is possible to run any PyTorch model with the basic docker compose file? Without Triton?

  
  
Posted one year ago

Thank you! I will try this 🙂

  
  
Posted one year ago

Sure! This is an example of running a custom model. It basically boils down to defining a preprocess, process, and postprocess function. The process function can contain anything, including just a basic call to Hugging Face to run inference 🙂
I have not tested this myself, mind you, but I see no reason why it wouldn't work!
In fact, I think even Triton itself supports running on CPU these days, so you still have that option :)
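To make that concrete, here is a rough sketch of such a preprocess module, loosely following the custom-model idea above. The class/method shape mirrors the preprocess/process/postprocess steps, but the exact signatures can differ between clearml-serving versions, and the Hugging Face task and model name are just placeholders.

```python
# Rough sketch of a clearml-serving custom-model preprocess module.
# Method names follow the preprocess/process/postprocess idea above;
# exact signatures may differ per clearml-serving version, and the
# Hugging Face task/model below are placeholders.
from typing import Any

from transformers import pipeline  # plain Hugging Face, no Triton/TensorRT needed


class Preprocess(object):
    def __init__(self):
        # Load the model once when the serving instance starts (runs on CPU).
        self._pipe = pipeline(
            "sentiment-analysis",
            model="distilbert-base-uncased-finetuned-sst-2-english",
        )

    def preprocess(self, body: dict, *args, **kwargs) -> Any:
        # Pull the raw text out of the request payload.
        return body["text"]

    def process(self, data: Any, *args, **kwargs) -> Any:
        # Anything can go here -- in this case a plain Hugging Face call.
        return self._pipe(data)

    def postprocess(self, data: Any, *args, **kwargs) -> dict:
        # Return a JSON-serializable response.
        return {"predictions": data}
```

The endpoint would then be registered with this module as its preprocessing code, the same way the custom example does it.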

  
  
Posted one year ago

As I understand it, vertical scaling means giving each container more resources to work with. This should always be possible in a k8s context, because you decide which types of machines go in your pool and you define the requirements for each container yourself 🙂 So if you want to set the container to use 10,000 CPUs, feel free! Unless you mean something else by this, in which case please counter!
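Purely as an illustration of what "defining the requirements for each container" means in Kubernetes terms (not the helm chart's actual values layout), here is a sketch using the official kubernetes Python client; the container name, image tag, and numbers are made-up examples, and with the clearml-serving chart you would normally set the equivalent requests/limits through the chart's values.

```python
# Illustration only: per-container resource requests/limits are what drive
# "vertical" sizing in Kubernetes. All names and numbers here are examples.
from kubernetes import client

serving_container = client.V1Container(
    name="clearml-serving-inference",                    # hypothetical container name
    image="allegroai/clearml-serving-inference:latest",  # example image tag
    resources=client.V1ResourceRequirements(
        requests={"cpu": "2", "memory": "4Gi"},  # guaranteed baseline per pod
        limits={"cpu": "4", "memory": "8Gi"},    # vertical ceiling per pod
    ),
)
```

Which VM SKUs those pods can actually land on is then decided by the node pool you attach to the cluster, not by the chart itself.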

  
  
Posted one year ago

Usually those models are PyTorch, right? So, yeah, you should be able to; feel free to follow the PyTorch example if you want to know how 🙂

  
  
Posted one year ago

Thanks, my question is dumb indeed 🙂 Thanks for the reply!

  
  
Posted one year ago

I basically would like to know if we can serve the model without the TensorRT format, which is highly efficient but more complicated to obtain.

  
  
Posted one year ago

In production, we should use the clearml-helm-charts, right? Docker-compose in clearml-serving is more for local testing.

  
  
Posted one year ago