Besides that, what are your impressions of these serving engines? Are they much better than just building my own API + ONNX, or even my own API + plain PyTorch inference?
I would separate ML frameworks from DL frameworks.
With ML frameworks, the main advantage is serving multiple models from a single container, which is more cost-effective than deploying each model separately. Another advantage is the ability to quickly update models from the ClearML model repository: just tag and publish a model, and the serving endpoints will refresh themselves automatically. There is no actual inference performance gain per model, but globally it is more efficient.
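To make the multi-model and auto-refresh idea a bit more concrete, here is a minimal stdlib-only sketch. It is not ClearML's API: `ModelRegistry`, `MultiModelServer`, and the integer version counter standing in for "tag + publish" are hypothetical names used only for illustration. One process holds several models and swaps in a newer version on the next request after it is published.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# A "model" here is just a function from a list of features to a list of outputs.
Model = Callable[[list], list]


@dataclass
class ModelRegistry:
    """Hypothetical stand-in for a model repository: name -> (version, model)."""
    published: Dict[str, Tuple[int, Model]] = field(default_factory=dict)

    def publish(self, name: str, model: Model) -> None:
        # Each publish bumps the version number for that model name.
        version = self.published.get(name, (0, None))[0] + 1
        self.published[name] = (version, model)


class MultiModelServer:
    """One process ("container") serving several models from the same registry."""

    def __init__(self, registry: ModelRegistry):
        self.registry = registry
        self.loaded: Dict[str, Tuple[int, Model]] = {}

    def refresh(self) -> None:
        # Load any model whose published version is newer than what we hold.
        for name, (version, model) in self.registry.published.items():
            if self.loaded.get(name, (0, None))[0] < version:
                self.loaded[name] = (version, model)

    def predict(self, name: str, features: list) -> list:
        self.refresh()
        return self.loaded[name][1](features)


registry = ModelRegistry()
registry.publish("doubler", lambda xs: [2 * x for x in xs])
registry.publish("negate", lambda xs: [-x for x in xs])
server = MultiModelServer(registry)

before = server.predict("doubler", [1, 2])  # [2, 4]
registry.publish("doubler", lambda xs: [3 * x for x in xs])  # new version
after = server.predict("doubler", [1, 2])   # [3, 6], picked up without a restart
negated = server.predict("negate", [5])     # [-5], same server, different model
```

A real deployment would poll or subscribe to the repository rather than check on every request, but the idea is the same: one container, many models, and no redeploy when a model changes.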
With DL, all the ML advantages obviously still hold, but the main value is that we separate preprocessing onto a CPU instance and DL inference onto a GPU instance, which is a huge performance boost. On top of that, the GPU instance can serve multiple models at the same time (again, cost-effective). The actual per-model DL inference boost comes from using Triton as the engine: NVIDIA works hard to keep it highly optimized for inference, and they did a great job with it.
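A rough sketch of the CPU/GPU split, again stdlib-only and purely illustrative. `preprocess`, `InferenceWorker`, and `serve` are hypothetical names, not ClearML or Triton APIs, and a thread plus a queue stand in for what would really be two separate instances talking over the network. The point it shows: cheap parsing happens on the "CPU" side, while the "GPU" side only receives ready inputs, groups whatever is waiting into a batch per model, and serves several models from one worker.

```python
import queue
import threading
from typing import Callable, Dict, List, Tuple

BatchModel = Callable[[List[List[float]]], List[float]]


def preprocess(raw: str) -> List[float]:
    """CPU-side step (hypothetical): parse a comma-separated request into floats."""
    return [float(tok) for tok in raw.split(",")]


class InferenceWorker(threading.Thread):
    """Stand-in for the GPU instance: pulls preprocessed inputs off a queue,
    batches whatever is waiting, and runs each model once per batch."""

    def __init__(self, models: Dict[str, BatchModel], max_batch: int = 8):
        super().__init__(daemon=True)
        self.models = models
        self.max_batch = max_batch
        self.inbox: "queue.Queue[Tuple[str, List[float], queue.Queue]]" = queue.Queue()

    def run(self) -> None:
        while True:
            # Block for one request, then grab any others already queued.
            batch = [self.inbox.get()]
            while len(batch) < self.max_batch:
                try:
                    batch.append(self.inbox.get_nowait())
                except queue.Empty:
                    break
            # Group requests by model so each model sees a single batched call.
            by_model: Dict[str, list] = {}
            for name, x, reply in batch:
                by_model.setdefault(name, []).append((x, reply))
            for name, items in by_model.items():
                outputs = self.models[name]([x for x, _ in items])
                for (_, reply), out in zip(items, outputs):
                    reply.put(out)


def serve(worker: InferenceWorker, model: str, raw: str) -> float:
    """Front-end call: preprocess on the CPU side, then hand off for inference."""
    reply: queue.Queue = queue.Queue(maxsize=1)
    worker.inbox.put((model, preprocess(raw), reply))
    return reply.get(timeout=5)


worker = InferenceWorker({
    "sum": lambda batch: [sum(x) for x in batch],
    "max": lambda batch: [max(x) for x in batch],
})
worker.start()
total = serve(worker, "sum", "1,2,3")  # 6.0
largest = serve(worker, "max", "4,1")  # 4.0
```

In practice Triton takes care of the batching side itself (its dynamic batcher does this grouping on the GPU server), so the sketch is only meant to show why keeping preprocessing off the GPU box lets that box spend its time on batched model execution.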