Answered
Can we use the simple docker-compose.yml file for ClearML Serving on a Hugging Face model (not processed to TensorRT)?


  
  
Posted one year ago

Answers 12


Sorry to come back to this! Regarding the Kubernetes Serving helm chart, I can see horizontal scaling of docker containers. What about vertical scaling? Is it implemented? More specifically, where is the SKU of the VMs in use defined?

  
  
Posted one year ago

That wasn't my intention! Not a dumb question, just a logical one 😄

  
  
Posted one year ago

Sorry, I jumped the gun before I fully understood your question 🙂 So by the simple docker compose file, you mean you don't want to use the docker-compose-triton.yaml file, and instead want to run the Hugging Face model on CPU rather than on Triton?

Or do you want to know if the general docker compose version is able to handle a Hugging Face model?

  
  
Posted one year ago

Prerequisites: PyTorch models require Triton engine support, so please use docker-compose-triton.yml / docker-compose-triton-gpu.yml, or, if running on Kubernetes, the matching helm chart.

  
  
Posted one year ago

I would like to know if it is possible to run any PyTorch model with the basic docker compose file? Without Triton?

  
  
Posted one year ago

Thank you! I will try this 🙂

  
  
Posted one year ago

Sure! This is an example of running a custom model. It basically boils down to defining a preprocess, process, and postprocess function. The process function can contain anything, including just a basic call to Hugging Face to run inference 🙂
I have not tested this myself, mind you, but I see no reason why it wouldn't work!
In fact, I think even Triton itself supports running on CPU these days, so you still have that option :)
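To make that concrete, here is a rough sketch of such a preprocess module, loosely following the custom-model idea above. The class/method shape mirrors the preprocess/process/postprocess steps, but the exact signatures can differ between clearml-serving versions, and the Hugging Face task and model name are just placeholders.

```python
# Rough sketch of a clearml-serving custom-model preprocess module.
# Method names follow the preprocess/process/postprocess idea above;
# exact signatures may differ per clearml-serving version, and the
# Hugging Face task/model below are placeholders.
from typing import Any

from transformers import pipeline  # plain Hugging Face, no Triton/TensorRT needed


class Preprocess(object):
    def __init__(self):
        # Load the model once when the serving instance starts (runs on CPU).
        self._pipe = pipeline(
            "sentiment-analysis",
            model="distilbert-base-uncased-finetuned-sst-2-english",
        )

    def preprocess(self, body: dict, *args, **kwargs) -> Any:
        # Pull the raw text out of the request payload.
        return body["text"]

    def process(self, data: Any, *args, **kwargs) -> Any:
        # Anything can go here -- in this case a plain Hugging Face call.
        return self._pipe(data)

    def postprocess(self, data: Any, *args, **kwargs) -> dict:
        # Return a JSON-serializable response.
        return {"predictions": data}
```

The endpoint would then be registered with this module as its preprocessing code, the same way the custom example does it.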

  
  
Posted one year ago

As I understand it, vertical scaling means giving each container more resources to work with. This should always be possible in a k8s context, because you decide which types of machines go in your pool and you define the requirements for each container yourself 🙂 So if you want to set the container to use 10,000 CPUs, feel free! Unless you mean something else by this, in which case please counter!
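Purely as an illustration of what "defining the requirements for each container" means in Kubernetes terms (not the helm chart's actual values layout), here is a sketch using the official kubernetes Python client; the container name, image tag, and numbers are made-up examples, and with the clearml-serving chart you would normally set the equivalent requests/limits through the chart's values.

```python
# Illustration only: per-container resource requests/limits are what drive
# "vertical" sizing in Kubernetes. All names and numbers here are examples.
from kubernetes import client

serving_container = client.V1Container(
    name="clearml-serving-inference",                    # hypothetical container name
    image="allegroai/clearml-serving-inference:latest",  # example image tag
    resources=client.V1ResourceRequirements(
        requests={"cpu": "2", "memory": "4Gi"},  # guaranteed baseline per pod
        limits={"cpu": "4", "memory": "8Gi"},    # vertical ceiling per pod
    ),
)
```

Which VM SKUs those pods can actually land on is then decided by the node pool you attach to the cluster, not by the chart itself.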

  
  
Posted one year ago

Usually those models are PyTorch, right? So, yeah, you should be able to; feel free to follow the PyTorch example if you want to know how 🙂

  
  
Posted one year ago

Thanks, my question is dumb indeed 🙂 Thanks for the reply!

  
  
Posted one year ago

I basically would like to know if we can serve the model without the TensorRT format, which is highly efficient but more complicated to obtain.

  
  
Posted one year ago

In production, we should use the clearml-helm-charts, right? Docker-compose in clearml-serving is more for local testing.

  
  
Posted one year ago