Hey All, We Have A Self-Hosted Clearml Server, Today We Launched ~40 Workers To Run Training Jobs On Our Queue, However We Started Getting Errors With Elasticsearch Connection Pool Being Full, And Subsequent Timeouts And Failed Tasks. Each Training Task

Hey all,
We have a self-hosted ClearML server. Today we launched ~40 workers to run training jobs on our queue, but we started getting errors about the Elasticsearch connection pool being full, followed by timeouts and failed tasks.

Each training task plots a lot of scalar metrics at every step. We noticed the server's disk usage was very high, especially reads. CPU and memory usage were up too, but they didn't seem concerning.

I'm wondering if anyone understands how Elasticsearch is used by the ClearML server. Is every single scalar reported in its own API call, followed by its own Elasticsearch transaction? Or are metrics collated and processed in batches?

We noticed in the error messages that the Elasticsearch connection pool size is only 10. Is there any way to increase this?

Thanks in advance 😊

Posted 9 months ago

Answers 2

I'd like to understand this as well. I moved my data and model versioning to AWS S3, so can I get rid of the fileserver? And can I use CloudWatch to handle logs instead of Elasticsearch (which I assume is what handles them now)?

Posted 9 months ago

Additionally, I'd like to understand what is stored in Elasticsearch versus Mongo, Redis, etc. From my understanding, it is the metrics and console logs that are stored in Elasticsearch?

I’m thinking the solution may be to reduce the amount of metrics logged by averaging them locally and only reporting them once every 60s or so?

Or is there a way to tune the Elasticsearch config so that it can handle the high volume of requests?
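
To make the local-averaging idea concrete, here is a rough sketch of what I have in mind. It is not ClearML code: `ScalarAverager` and its `report` callback are hypothetical names I made up, and I'm assuming the callback would be wired to something like `Logger.report_scalar(title, series, value, iteration)`. It buffers values per (title, series) and sends one mean per series roughly every 60 seconds.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, Tuple

Key = Tuple[str, str]  # (title, series)


class ScalarAverager:
    """Buffer scalars locally and report one mean per series per interval.

    Hypothetical helper, not part of ClearML. `report` is whatever function
    actually sends a scalar to the server.
    """

    def __init__(
        self,
        report: Callable[[str, str, float, int], None],
        interval: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._report = report
        self._interval = interval
        self._clock = clock
        self._last_flush = clock()
        self._sums: Dict[Key, float] = defaultdict(float)
        self._counts: Dict[Key, int] = defaultdict(int)

    def add(self, title: str, series: str, value: float, iteration: int) -> None:
        """Record one value; flush everything if the interval has elapsed."""
        key = (title, series)
        self._sums[key] += value
        self._counts[key] += 1
        if self._clock() - self._last_flush >= self._interval:
            self.flush(iteration)

    def flush(self, iteration: int) -> None:
        """Report the mean of each buffered series, then reset the buffers."""
        for (title, series), total in self._sums.items():
            mean = total / self._counts[(title, series)]
            self._report(title, series, mean, iteration)
        self._sums.clear()
        self._counts.clear()
        self._last_flush = self._clock()
```

In the training loop this would replace the direct per-step report calls with `averager.add(...)`, plus one `averager.flush(iteration)` at the end so the last partial window is not dropped. The obvious trade-off is losing per-step resolution in the plots.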

Posted 9 months ago