Answered

1st of all I want to thank the developer team of ClearML - you are awesome and so is your product!
I had some frustration setting up the stuff and still struggle, but all in all - it's the best product on the MLOps market.
More of an information request - the ClearML install features Redis, Elastic and MongoDB, and I have not found in the docs what each one is responsible for.
I guess that:
Redis is serving a queue for agents, Elastic is making all metadata search possible, and Mongo is storing some artifacts.
Can you change/add details on that?

  
  
Posted 2 years ago

Answers 4


Thanks Jake SuccessfulKoala55!
I used to have problems with ClearML agents and multi-GPU training - I have put that on hold.
Now my problem is with ClearML serving.
I have managed to run the demo at https://clear.ml/docs/latest/docs/clearml_serving/clearml_serving_tutorial but had problems:
clearml-serving --id c605bf64db3740989afdd9bee87e6353 model add --engine sklearn --endpoint "test_model_sklearn" --preprocess "examples/sklearn/preprocess.py" --name "initial model training" --project "serving in action"
This command never completed successfully since it could not find a model - I am specifying there the project and task name that were created while training a model with python3 examples/sklearn/train_model.py.
I have managed to go further by specifying the model-id parameter. Otherwise I would get the following:
` $ clearml-serving --id c605bf64db3740989afdd9bee87e6353 model add --engine sklearn --endpoint "test_model_sklearn" --preprocess "examples/sklearn/preprocess.py" --name "initial model training" --project "serving in action"
clearml-serving - CLI for launching ClearML serving engine
Serving service Task c605bf64db3740989afdd9bee87e6353, Adding Model endpoint '/test_model_sklearn/'
Info: syncing model endpoint configuration, state hash=d3290336c62c7fb0bc8eb4046b60bc7f

Error: Could not fine any Model to serve {'project_name': 'serving in action', 'model_name': 'initial model training', 'tags': None, 'only_published': False, 'include_archived': False} `
A minor thing - please change the error message to "Could not find any Model".
I think the problem was due to the fact that the registered model name = experiment name + sklearn model name - see the screenshot.
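For what it's worth, a quick way to check the exact name and id the trained model was registered under is to query it with the ClearML SDK - a minimal sketch, assuming the project name from the command above; the printed id is what can be passed via the model-id parameter:
` from clearml import Model

# List the models ClearML registered under the project used in the command above.
for model in Model.query_models(project_name="serving in action"):
    print(model.id, model.name)
`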

All in all the test serving worked out and I have a bunch of tasks in the DevOps project.
In the tutorial I am lacking an explanation of what is happening under the hood, i.e.:
What does the serving service controller do?
What do all the containers do for serving:
Container clearml-serving-alertmanager
Container clearml-serving-inference
Container clearml-serving-statistics
I have a bunch of tasks running in the DevOps project - some with identical names - is that normal?

  
  
Posted 2 years ago

Hi GentleSwallow91 ! Thanks for the warm words! 🙏 😍
Basically:
Redis is used for various server temporary state caches and workers state management
Elastic is used for metrics storage and indexing (logs, scalars, plots, debug images etc.)
Mongo is used for all other metadata and artifact references
Actual artifact data / debug images and model weights are stored in various object storage solutions, starting with the built-in ClearML fileserver and including S3, GCS, Azure storage and similar solutions.
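To make the last point concrete, here is a minimal sketch (project and task names are made up for illustration) showing that the artifact data itself goes to the files server or configured object storage, while the backend only keeps the reference and metadata:
` from clearml import Task

# Illustrative names only - not from this thread.
task = Task.init(project_name="examples", task_name="artifact demo")

# The dict is serialized and uploaded to the files server (or S3/GCS/Azure if
# configured as the output destination); the backend stores only the reference to it.
task.upload_artifact(name="stats", artifact_object={"rows": 100})
task.close()
`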

  
  
Posted 2 years ago

Also, if you want to share any frustration and feedback on setting up, we're always looking to improve and provide a better experience 🙂

  
  
Posted 2 years ago

Hi GentleSwallow91 let me try and answer your questions 😄

The serving service controller is basically the main Task that controls the serving functionality itself.
AFAIK:
clearml-serving-alertmanager - a container that runs the Alertmanager by Prometheus ( https://prometheus.io/docs/alerting/latest/alertmanager/ )
clearml-serving-inference - the container that runs the inference code
clearml-serving-statistics - I believe it runs software that reports to Prometheus, either generic statistics or user-defined ones
Are you sure these are not old runs that were not terminated? Or does terminating your clearml-serving close all of them?
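To give a rough idea of what the clearml-serving-inference container runs around the model, here is a sketch of the user preprocessing code it loads, modeled on examples/sklearn/preprocess.py from the tutorial (the exact method signatures may differ between clearml-serving versions):
` from typing import Any


class Preprocess(object):
    # Loaded by the clearml-serving-inference container for the endpoint.

    def preprocess(self, body: dict, state: dict, collect_custom_statistics_fn=None) -> Any:
        # Turn the REST request body into the model's input format.
        return [[body.get("x0"), body.get("x1")]]

    def postprocess(self, data: Any, state: dict, collect_custom_statistics_fn=None) -> dict:
        # Turn the raw model output into a JSON-serializable response.
        return dict(y=data.tolist() if hasattr(data, "tolist") else data)
`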

  
  
Posted 2 years ago