Hi, I'm gonna hijack this thread a bit. My community uses ClearML and is looking at various model deployment strategies. We are looking for a seamless integration with Triton, but noted that Triton does not support deployment strategies. clearml-serving seems to, but the strategies are rather limited. Is there a roadmap to expand clearml-serving?
SubstantialElk6, when you say "Triton does not support deployment strategies", what exactly do you mean?
BTW: the updated documentation is already up here:
This is the latest clearml-serving (stable release coming at GTC at the end of the month)
Generally speaking, clearml-serving provides a control plane, preprocessing, and ML inference, with Nvidia Triton handling DL inference (fully transparent).
It allows you to spin up an entire, fully dynamic & scalable serving setup on top of a k8s cluster. Once you spin up the base containers, you can configure them live with a CLI. This includes adding new model-serving endpoints together with their preprocessing code.
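To give a rough idea of what "preprocessing code" for an endpoint can look like, here is a minimal sketch. It assumes the `Preprocess` class convention (a `preprocess` and a `postprocess` method) used in the clearml-serving examples; the request fields `x0` and `x1` and the response key `y` are hypothetical, so check the signatures against the version you run.

```python
from typing import Any, Callable, Optional


class Preprocess(object):
    """Per-endpoint request/response hooks.

    Assumption: the class and method names follow the convention in the
    clearml-serving examples. The field names below are made up.
    """

    def preprocess(
        self,
        body: dict,
        state: dict,
        collect_custom_statistics_fn: Optional[Callable[[dict], None]] = None,
    ) -> Any:
        # Turn the JSON request body into the feature matrix the model expects.
        # "x0" and "x1" are hypothetical request fields.
        return [[body.get("x0"), body.get("x1")]]

    def postprocess(
        self,
        data: Any,
        state: dict,
        collect_custom_statistics_fn: Optional[Callable[[dict], None]] = None,
    ) -> dict:
        # Wrap the raw model output in a JSON-serializable response.
        # Array-like outputs (e.g. numpy arrays) expose tolist(); plain
        # sequences are copied into a list.
        values = data.tolist() if hasattr(data, "tolist") else list(data)
        return {"y": values}
```

A file like this is what you hand to the CLI when adding the endpoint, so the serving containers can pick it up without a restart.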