ConvolutedSealion94 Let me try to explain how it works; I hope this helps with debugging.
There are two different entities here:
1. clearml-server: In this context, clearml-server acts as a control plane. It stores the configuration of the different endpoints, models, preprocessing code, etc. It does NOT perform any compute or serving.
2. clearml-serving-inference ( https://github.com/allegroai/clearml-serving/blob/e09e6362147da84e042b3c615f167882a58b8ac7/docker/docker-compose-triton-gpu.yml#L77 ): This is the actual container that does the serving, serving multiple models from different endpoints. It is spun up by the docker-compose (or Helm chart).

The design supports multiple different sets of clearml-serving-inference containers (i.e. each one can serve a different set of models, for example for different frameworks or hardware requirements). So for each copy of clearml-serving-inference you need to specify which models it should serve, and that is what the ClearML-Serving Session ID is for: it is the UID of the Task that stores the configuration for that specific clearml-serving-inference. (You can also run multiple instances of the same clearml-serving-inference for load balancing, but I will not get into that here.)

Basically, the CLI (i.e. the clearml-serving command line) configures the clearml-server (i.e. the control plane). It does NOT, however, spin up the actual serving containers (clearml-serving-inference); it only configures them. In order to configure a specific clearml-serving-inference, the CLI needs to be given the same "clearml-serving session ID" that this container was spun up with ( https://github.com/allegroai/clearml-serving/blob/e09e6362147da84e042b3c615f167882a58b8ac7/docker/example.env#L6 ).
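If it helps, here is a toy Python model of the same split. It is a sketch, not ClearML code: ControlPlane, configure_endpoint and InferenceContainer are hypothetical stand-ins for clearml-server, the clearml-serving CLI and a clearml-serving-inference container, and the variable name CLEARML_SERVING_TASK_ID is my assumption about what the linked example.env line sets, so check your own file. What it shows is that all configuration is keyed by session ID, so an endpoint added under one session ID is invisible to a container started with another.

```python
"""Toy model of the clearml-serving control-plane / inference split (hypothetical names)."""
import os
import uuid
from collections.abc import Mapping
from dataclasses import dataclass, field


@dataclass
class ControlPlane:
    """Hypothetical stand-in for clearml-server: it stores configuration, never serves."""
    # session ID -> {endpoint name -> model name}
    sessions: dict[str, dict[str, str]] = field(default_factory=dict)

    def create_session(self) -> str:
        # A serving session is just a UID pointing at one stored configuration.
        session_id = uuid.uuid4().hex
        self.sessions[session_id] = {}
        return session_id


def configure_endpoint(server: ControlPlane, session_id: str,
                       endpoint: str, model: str) -> None:
    """Hypothetical stand-in for the CLI: writes config for one session, starts nothing."""
    if session_id not in server.sessions:
        raise KeyError(f"unknown serving session {session_id!r}")
    server.sessions[session_id][endpoint] = model


@dataclass
class InferenceContainer:
    """Hypothetical stand-in for one clearml-serving-inference container."""
    server: ControlPlane
    session_id: str

    @classmethod
    def from_env(cls, server: ControlPlane,
                 env: Mapping[str, str] = os.environ) -> "InferenceContainer":
        # ASSUMPTION: variable name modeled on docker/example.env; check your own file.
        return cls(server, env["CLEARML_SERVING_TASK_ID"])

    def endpoints(self) -> dict[str, str]:
        # Serves only what the control plane lists under its own session ID.
        return dict(self.server.sessions.get(self.session_id, {}))


server = ControlPlane()
gpu_session = server.create_session()  # e.g. a GPU container
cpu_session = server.create_session()  # e.g. a CPU-only container

gpu_container = InferenceContainer.from_env(
    server, {"CLEARML_SERVING_TASK_ID": gpu_session})
cpu_container = InferenceContainer(server, cpu_session)

# Configuring an endpoint only touches the session ID you pass in.
configure_endpoint(server, gpu_session, "test_model_pytorch", "pytorch-model")
configure_endpoint(server, cpu_session, "test_model_sklearn", "sklearn-model")

print(gpu_container.endpoints())  # {'test_model_pytorch': 'pytorch-model'}
print(cpu_container.endpoints())  # {'test_model_sklearn': 'sklearn-model'}
```

If an endpoint you added never shows up in a container, this model points at the first thing to compare: the session ID you used with the CLI versus the one in that container's env file.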
Does this help?