
Hello channel,

I have a question regarding ClearML Serving in production.

I have different environments and different models, each of them linked to a use case.
I would like to spin up one Kubernetes cluster (from the Triton GPU docker-compose) covering all my use cases.
That means I would have one endpoint per environment and model.

First question: Is this relevant and scalable? Can I have hundreds of endpoints?
Second question: My models are based on different frameworks (PyTorch / Hugging Face / scikit-learn). Some of them require Triton, others do not. Is it possible to have one Kubernetes cluster based on Triton EVEN when some models do not use it? Would I need to convert my scikit-learn model file to TensorRT format?

Thanks a lot,

  
  
Posted one year ago

Answers 4


To be honest, I'm not completely sure, as I've never tried hundreds of endpoints myself. In theory, yes, it should be possible: Triton, FastAPI, and Intel oneAPI (ClearML Serving's building blocks) all claim they can handle that kind of load, but again, I haven't tested it myself.

To answer the second question, yes! You can basically use the "type" of model to decide where it should be run. You always have the custom model option if you want to run it yourself too 🙂
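To make the distinction concrete, here is a rough sketch with the clearml-serving CLI, assuming a serving service was already created with `clearml-serving create`. The endpoint names and model IDs are placeholders, and the flags follow the public clearml-serving examples, so double-check them against your version:

```bash
# scikit-learn model: served by the CPU inference pods, Triton is not involved
clearml-serving --id <SERVICE_ID> model add \
  --engine sklearn \
  --endpoint "churn_model_prod" \
  --model-id <SKLEARN_MODEL_ID> \
  --preprocess preprocess.py

# PyTorch model: routed to the Triton engine (TorchScript weights, no TensorRT
# conversion needed; the input/output shapes here are made up for this example)
clearml-serving --id <SERVICE_ID> model add \
  --engine triton \
  --endpoint "sentiment_model_prod" \
  --model-id <PYTORCH_MODEL_ID> \
  --preprocess preprocess.py \
  --input-name "INPUT__0" --input-type float32 --input-size 1 128 \
  --output-name "OUTPUT__0" --output-type float32 --output-size -1 2
```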

  
  
Posted one year ago

@<1523701118159294464:profile|ExasperatedCrab78> do you have any inputs for this one? 🙂

  
  
Posted one year ago

Thanks! So regarding question 2, it means that I can spin up a K8s cluster with Triton enabled, and by specifying the type of model while creating the endpoint, it will use the Triton engine or not.
Related to that, is the Triton engine expecting the TensorRT format, or is that just an optional improvement over other model-weight formats?

Finally, last question (I swear 😛): what is the serving-on-Kubernetes flow supposed to look like? Is it something like this:

  • Create the endpoints with clearml-serving CLI commands (uploaded to the ClearML server).
  • The K8s cluster runs the ClearML Serving helm chart, an ingress controller is set up to link the outside world to the cluster, and users make curl requests to this ingress resource, which routes the request to the clearml-serving-inference pod (a rough sketch of such a request is shown right after this list)? It is not clear to me. Many thanks.
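For concreteness, a request through the ingress could look roughly like this; the hostname and endpoint name are placeholders, and the `/serve/<endpoint>` path is the one used in the clearml-serving examples:

```bash
# Hypothetical ingress hostname; the endpoint name matches whatever was
# registered with `clearml-serving model add`.
curl -X POST "http://serving.example.com/serve/churn_model_prod" \
  -H "Content-Type: application/json" \
  -d '{"x0": 1, "x1": 2}'
```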
  
  
Posted one year ago

In the ClearML helm-charts repo, can we use the clearml-serving chart on its own?
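In case it helps, a minimal sketch of installing only that chart, assuming the chart name and repo URL from the public clearml-helm-charts repository (the release name, namespace, and values file below are placeholders):

```bash
# Add the ClearML helm repo and install only the serving chart; the required
# values (API credentials, serving service ID, ...) go in the values file --
# check the chart's values.yaml for the exact keys.
helm repo add clearml https://clearml.github.io/clearml-helm-charts
helm repo update
helm install clearml-serving clearml/clearml-serving \
  --namespace clearml-serving --create-namespace \
  -f my-serving-values.yaml
```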

  
  
Posted one year ago