Can we use the simple docker-compose.yml file for ClearML Serving on a Hugging Face model (not processed to TensorRT)?

Answered

Can we use the simple docker-compose.yml file for clearml serving on a huggingface model (not processed to tensorrt)?

  				
Posted one year ago by SuccessfulRaven86


Answers 12

Sorry, I jumped the gun before I fully understood your question 🙂 So by "simple docker compose file", do you mean that you don't want to use the docker-compose-triton.yml file, and instead want to run the Hugging Face model on CPU rather than on Triton?

Or do you want to know if the general docker compose version is able to handle a huggingface model?

  				
Posted one year ago by ExasperatedCrab78

Sorry to come back to this! Regarding the Kubernetes Serving helm chart, I can see horizontal scaling of the Docker containers. What about vertical scaling? Is it implemented? More specifically, where is the SKU of the VMs in use defined?

  				
Posted one year ago by SuccessfulRaven86

Usually those models are PyTorch, right? So yes, you should be able to. Feel free to follow the PyTorch example if you want to know how 🙂

  				
Posted one year ago by ExasperatedCrab78

Prerequisites: PyTorch models require Triton engine support. Please use docker-compose-triton.yml / docker-compose-triton-gpu.yml or, if running on Kubernetes, the matching helm chart.

  				
Posted one year ago by SuccessfulRaven86

Basically, I would like to know whether we can serve the model without converting it to TensorRT format, which is highly efficient but more complicated to produce.

  				
Posted one year ago by SuccessfulRaven86

Thanks, my question is dumb indeed 🙂 Thanks for the reply!

  				
Posted one year ago by SuccessfulRaven86

Thank you! I will try this 🙂

  				
Posted one year ago by SuccessfulRaven86

As I understand it, vertical scaling means giving each container more resources to work with. This should always be possible in a k8s context, because you decide which types of machines go in your pool and you define the requirements for each container yourself 🙂 So if you want to set the container to use 10,000 CPUs, feel free! Unless you mean something else by this, in which case please counter!
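
To make "define the requirements for each container yourself" concrete, here is a minimal sketch of the `resources` mapping that a Kubernetes container spec accepts (`requests` and `limits` for `cpu` and `memory`). The helper name `container_resources` and the sizes are hypothetical, and the thread does not say under which key the serving helm chart exposes this, so check the chart's values file. The VM SKU itself comes from the node pool you provision, not from this mapping.

```python
import json


def container_resources(cpu: str, memory: str, limit_cpu: str, limit_memory: str) -> dict:
    """Build the `resources` mapping of a Kubernetes container spec."""
    return {
        # What the scheduler reserves for the container.
        "requests": {"cpu": cpu, "memory": memory},
        # The hard ceiling the container may not exceed.
        "limits": {"cpu": limit_cpu, "memory": limit_memory},
    }


# Hypothetical sizing for an inference container; adjust to your node pool.
resources = container_resources("2", "4Gi", "4", "8Gi")
print(json.dumps({"resources": resources}, indent=2))
```

Raising these numbers is the vertical-scaling knob. A pod only schedules if some node in the pool is large enough to satisfy the requests.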

  				
Posted one year ago by ExasperatedCrab78

In production, we should use the clearml-helm-charts, right? The docker-compose setup in clearml-serving is more for local testing.

  				
Posted one year ago by SuccessfulRaven86

I would like to know whether it is possible to run any PyTorch model with the basic docker compose file, without Triton.

  				
Posted one year ago by SuccessfulRaven86

That wasn't my intention! Not a dumb question, just a logical one 😄

  				
Posted one year ago by ExasperatedCrab78

Sure! This is an example of running a custom model. It basically boils down to defining a preprocess, a process and a postprocess function. The process function can contain anything, including just a basic call to Hugging Face to run inference 🙂
I have not tested this myself, mind you, but I see no reason why it wouldn't work!
In fact, I think even Triton itself supports running on CPU these days, so you still have that option :)
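
A minimal, untested sketch of what such a custom handler could look like. The `Preprocess` class name and the `load` / `preprocess` / `process` / `postprocess` method names follow the clearml-serving custom example as I understand it, so verify the signatures against the version you deploy. The `"text"` request field, the `text-classification` task and the assumption that the registered model is a directory saved with `save_pretrained()` are illustrative choices, not something stated in this thread.

```python
from typing import Any, Callable, Optional


class Preprocess(object):
    """Custom-engine handler that runs a Hugging Face pipeline on CPU."""

    def __init__(self):
        self._model = None  # populated by load()

    def load(self, local_file_name: str) -> Any:
        # Imported lazily so this module can be imported without transformers.
        from transformers import pipeline

        # Assumption: local_file_name is a local directory with a saved model.
        self._model = pipeline("text-classification", model=local_file_name)
        return self._model

    def preprocess(
        self,
        body: dict,
        state: dict,
        collect_custom_statistics_fn: Optional[Callable[[dict], None]] = None,
    ) -> Any:
        # Assumption: the request JSON looks like {"text": "..."}.
        return body["text"]

    def process(
        self,
        data: Any,
        state: dict,
        collect_custom_statistics_fn: Optional[Callable[[dict], None]] = None,
    ) -> Any:
        # Plain PyTorch inference through the pipeline; no Triton, no TensorRT.
        return self._model(data)

    def postprocess(
        self,
        data: Any,
        state: dict,
        collect_custom_statistics_fn: Optional[Callable[[dict], None]] = None,
    ) -> dict:
        # A text-classification pipeline returns a list of {"label", "score"} dicts.
        top = data[0]
        return {"label": top["label"], "score": float(top["score"])}
```

Because the model runs inside the serving container's own Python process, this path should only need the basic docker-compose.yml, provided `transformers` and `torch` are installed in that container.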

  				
Posted one year ago by ExasperatedCrab78
