Hi VexedCat68
Yes, the serving is a bit complicated. Let me try to explain the underlying setup before going into more detail.
There are three pieces:

- clearml-serving CLI -> a tool to launch / set up the serving. It does the configuration and enqueuing, not the actual serving.
- Control plane Task -> stores the state of the serving (i.e. which endpoints need to be served, what models are used) and collects stats. This Task has no actual communication with the serving requests/replies. (Running on the services queue)
- Serving Task -> the actual Task doing the serving (supports multiple instances). This is where the requests are routed to, and where the inference happens. It pulls the configuration from the control plane Task and configures itself based on it; it also reports its performance stats back to the control plane. This is where the Triton engine is running, inside the Triton container, with clearml-serving running inside the same container, pulling the actual models and feeding them to the Triton server. (Running on a GPU/CPU queue)
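To make the routing concrete, here is a minimal client-side sketch: inference requests go straight to the serving instance, never to the control plane Task. The host/port, the `/serve/<endpoint>` route, and the `test_model` endpoint name here are assumptions based on the clearml-serving examples, so adjust them to your own deployment:

```python
import requests

# Hypothetical endpoint registered via the clearml-serving CLI.
# The request hits the serving Task (Triton container) directly;
# the control plane Task never sees this traffic.
response = requests.post(
    "http://127.0.0.1:8080/serve/test_model",
    json={"x0": 1, "x1": 2},
)
print(response.json())
```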
Does that make sense?