
Hi all,

We've been using ClearML for quite some time now. Our deployment is a simple docker container on a dedicated EC2 instance. More recently, we have been running out of storage, and it looks to be that ElasticSearch is the main culprit.

What data is stored in ElasticSearch, and are there any tips for implementing some kind of retention policy to ensure ElasticSearch does not consume too much storage? Wary of removing data which is essential for the backend to operate. Thanks in advance!

Posted one year ago

Answers 2

Hi TenseOstrich47 ,
In ClearML Server, ES does not contain management-critical data, only raw (indexed) data, such as experiment metrics (plots, scalars, logs, debug image references) and performance statistics (queue usage statistics, worker metrics, etc.).
Losing ES data should not destabilize the server, but will simply lose some historical data (not that this is a good thing 😕).
Since ES does not really provide any retention policy mechanisms, you can implement maintenance scripts yourself, to handle various aspects of data collection.
In general, the indices used for queue metrics and worker stats can be safely deleted (they are usually rotated every month, so you can probably always delete the previous months' indices).
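As a sketch of the rotation idea above: monthly indices can be picked out by a date suffix and dropped once they fall outside a retention window. Note the `queue_metrics_` prefix and `YYYY-MM` suffix below are assumptions for illustration — list your actual indices first (e.g. `GET /_cat/indices`) to confirm the naming before deleting anything.

```python
from datetime import date

def indices_to_delete(index_names, today, keep_months=2):
    """Return index names whose trailing YYYY-MM suffix falls outside
    the last `keep_months` months (counting the current month)."""
    cutoff = (today.year * 12 + today.month - 1) - (keep_months - 1)
    stale = []
    for name in index_names:
        try:
            year, month = map(int, name.rsplit("_", 1)[1].split("-"))
        except (IndexError, ValueError):
            continue  # no parsable date suffix; leave the index alone
        if year * 12 + month - 1 < cutoff:
            stale.append(name)
    return stale

# With today = 2021-04-15 and keep_months=2, anything up to 2021-02 is stale:
names = ["queue_metrics_2021-01", "queue_metrics_2021-03", "events-log"]
print(indices_to_delete(names, date(2021, 4, 15)))  # → ['queue_metrics_2021-01']
```

Each stale index can then be removed with the standard ES delete-index API, e.g. `curl -X DELETE "http://<es-host>:9200/queue_metrics_2021-01"`.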
Task data (plots, scalars, logs and debug image references) is not rotated, so the only "nice" way of managing retention is deleting old or unwanted tasks (or resetting them, which essentially cleans all of their indexed data). You can do that using a cron job that queries the server using the SDK, the Python APIClient, or simply the REST API.
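A minimal sketch of such a cron-driven cleanup using the ClearML SDK. The project name, the 30-day cutoff, and the exact filter keys (which follow the `tasks.get_all` REST parameters) are assumptions here — verify them against your server before running, since this deletes tasks:

```python
from datetime import datetime, timedelta

def stale_cutoff(days=30, now=None):
    """ISO timestamp: tasks whose status last changed before this are stale."""
    now = now or datetime.utcnow()
    return (now - timedelta(days=days)).strftime("%Y-%m-%dT%H:%M:%S")

if __name__ == "__main__":
    # Requires a configured clearml.conf pointing at your server.
    from clearml import Task

    stale_tasks = Task.get_tasks(
        project_name="my-project",  # assumption: adjust to your project
        task_filter={
            # only finished/failed tasks untouched for 30+ days
            "status": ["completed", "failed"],
            "status_changed": ["<{}".format(stale_cutoff(30))],
        },
    )
    for task in stale_tasks:
        task.delete()  # or task.reset() to keep the task but drop indexed data
```

Run from cron (or an Airflow DAG) on a schedule; ClearML also ships a cleanup-service example along these lines that is worth comparing against.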

Posted one year ago

SuccessfulKoala55 thanks for your help as always. I will try to create a DAG in Airflow using the SDK to implement some form of retention policy that removes anything unnecessary. We independently store metadata on the artefacts we produce, and mostly use ClearML as the experiment manager, so a lot of the events data can be cleared.

Posted one year ago