Hello channel,
I have a question regarding ClearML Serving in production.
I have different environments and different models, each of them linked to a use case.
I would like to spin up one Kubernetes cluster (from the Triton GPU docker-compose), taking into …
Thanks! So regarding question 2, it means that I can spin up a K8s cluster with Triton enabled, and by specifying the type of model while creating the endpoint, it will or will not use the Triton engine.
Linked to that: does the Triton engine expect the TensorRT format, or is it just an optimization step compared to other model weight formats?
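For context, selecting the engine per endpoint could look roughly like this with the clearml-serving CLI (a sketch only; the service ID, model ID, endpoint names, and tensor shapes below are placeholders, and the exact flags should be checked against the clearml-serving docs):

```shell
# Hypothetical IDs, names, and shapes -- replace with your own values.

# Endpoint routed through the Triton (GPU) engine:
clearml-serving --id <service_id> model add \
    --engine triton \
    --endpoint "my_triton_model" \
    --model-id <model_id> \
    --input-size 1 28 28 --input-name "input" --input-type float32 \
    --output-size -1 10 --output-name "output" --output-type float32

# Endpoint served by a plain (non-Triton) engine, e.g. sklearn:
clearml-serving --id <service_id> model add \
    --engine sklearn \
    --endpoint "my_sklearn_model" \
    --model-id <model_id>
```

The idea being that the `--engine` value chosen at endpoint-creation time decides whether the request is handled by Triton or by the regular Python inference path.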
Finally, last question (I swear 😛): how is the serving-on-Kubernetes flow supposed to look? Is it something like this:
- Create the endpoint with the clearml-serving CLI commands (uploaded to the ClearML server)
- The K8s cluster runs the ClearML Serving Helm chart; an ingress controller is set up to create a link between the outside world and the cluster, and users make curl requests to this ingress resource, which routes the request to the clearml-serving-inference pod? It is not clear to me. Many thanks
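If the flow above is right, the final user-facing step could be sketched as below (the ingress host, endpoint name, and payload are hypothetical placeholders; the clearml-serving inference service exposes model endpoints under a `/serve/<endpoint>` path):

```shell
# Hypothetical ingress host and endpoint -- adjust to your deployment.
# The ingress forwards this to the clearml-serving-inference pod.
curl -X POST "http://serving.example.com/serve/my_triton_model" \
     -H "Content-Type: application/json" \
     -d '{"input": [[0.1, 0.2, 0.3]]}'
```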