
Hi again, my ClearML api-server has a memory leak. Each time I restart it, its RAM consumption grows until it goes OOM; it is not killed and makes the EC2 instance crash

  
  
Posted 2 years ago

30 Answers


Yeah, should be this:
GET /_search
{
  "aggs": {
    "tasks": { "terms": { "field": "task" } }
  }
}
See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html

  
  
Posted 2 years ago

This https://stackoverflow.com/questions/65109764/wildcard-search-issue-with-long-datatype-in-elasticsearch says long types can be converted to string to do the search

  
  
Posted 2 years ago

well I still see some ES errors in the logs
clearml-apiserver | [2021-07-07 14:02:17,009] [9] [ERROR] [clearml.service_repo] Returned 500 for events.add_batch in 65750ms, msg=General data error: err=('500 document(s) failed to index.', [{'index': {'_index': 'events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b', '_type': '_doc', '_id': 'c2068648d2fe5da975665985f44c20b6', 'status':..., extra_info=[events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b][0]] containing [500] requests and a refresh]

  
  
Posted 2 years ago

However, 504 is very extreme; I'm not sure it's related to a timeout on the server side. You might want to increase the ELB timeout

  
  
Posted 2 years ago

So it looks like it tries to register a batch of 500 documents

  
  
Posted 2 years ago

but not as much as the ELB reports

  
  
Posted 2 years ago

more than 120s?

  
  
Posted 2 years ago

SuccessfulKoala55 I deleted all :monitor:machine and :monitor:gpu series, but that only deleted ~20M documents out of 320M documents in the events-training_debug_image-xyz index. I would now like to understand which experiments contain most of the documents, so I can delete them. I would like to aggregate the number of documents per experiment. Is there a way to do that using the ES REST API?

  
  
Posted 2 years ago

SuccessfulKoala55 Thanks to that I was able to identify the most expensive experiments. How can I count the number of documents for a specific series? I.e., I suspect that the loss, which is logged every iteration, is responsible for most of the documents, and I want to make sure of that

  
  
Posted 2 years ago

Maybe the agent could be adapted to have a max_batch_size parameter?

  
  
Posted 2 years ago

You might need to specify the number of buckets if you don't get all of the experiments, but since it's a single shard, I think it'll be ordered by descending bucket size anyway
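If it helps, a minimal sketch of what that could look like, just adding the standard size parameter to the terms aggregation from the query above (1000 is an arbitrary value here):
GET /_search
{
  "aggs": {
    "tasks": {
      "terms": { "field": "task", "size": 1000 }
    }
  }
}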

  
  
Posted 2 years ago

Just do a sub-aggregation for the metric field (and if you like more details, a sub-sub aggregation for the variant field)
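A rough sketch of what that nesting could look like; "variant" appears in the other queries in this thread, while the "metric" field name is my assumption:
GET /_search
{
  "aggs": {
    "tasks": {
      "terms": { "field": "task" },
      "aggs": {
        "metrics": {
          "terms": { "field": "metric" },
          "aggs": {
            "variants": {
              "terms": { "field": "variant" }
            }
          }
        }
      }
    }
  }
}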

  
  
Posted 2 years ago

but the issue was the shard not being active, it's not the number of documents
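In case it's useful, the shard and cluster state can be checked with the standard Elasticsearch endpoints (same localhost:9200 address as the other examples in this thread):
curl "localhost:9200/_cluster/health?pretty"
curl "localhost:9200/_cat/shards?v"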

  
  
Posted 2 years ago

Thanks a lot, I will play with that!

  
  
Posted 2 years ago

500 is relatively low...

  
  
Posted 2 years ago

Something like that?
curl "localhost:9200/events-training_stats_scalar-adx3r00cad1bdfvsw2a3b0sa5b1e52b/_search?pretty" -H 'Content-Type: application/json' -d' { "query": { "bool": { "must": [ { "match": { "variant": "loss_model" } }, { "match": { "task": "8f88e4b8cff84f23bde74ed4b7213ec6" } } ] } }, "aggs": { "series": { "terms": { "field": "iter" } } } } '

  
  
Posted 2 years ago

Why not do:
curl "localhost:9200/events-training_stats_scalar-adx3r00cad1bdfvsw2a3b0sa5b1e52b/_search?pretty" -H 'Content-Type: application/json' -d' { "query": { "bool": { "must": [ { "match": { "variant": "loss_model" } } ] } }, "aggs": { "terms": { "field": "task" } } }For all tasks?

  
  
Posted 2 years ago

Now I know which experiments have the most metrics. I want to downsample these metrics by 10, i.e. only keep iterations that are multiples of 10. How can I query (to delete) only the documents whose iteration ends with 0?

  
  
Posted 2 years ago

I guess using a delete by query with a match on the field value suffix or something similar?

  
  
Posted 2 years ago

Here I have to do it for each task, is there a way to do it for all tasks at once?

  
  
Posted 2 years ago

Hmm, that's something I don't know 🙂

  
  
Posted 2 years ago

Why do you do aggs on the "iter" field?

  
  
Posted 2 years ago

Ha nice, good one! Thanks!

  
  
Posted 2 years ago

From what I can find there's a prefix query, but not a suffix - this can be done using a regex or a wildcard, but that's relatively expensive
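For reference, a sketch of what such a wildcard query could look like against the index from the earlier curl examples; as the next reply shows, Elasticsearch rejects this here because iter is a long field, so it would only work if iter were a keyword or text field (the same query body could be sent to the _delete_by_query endpoint to actually delete the matches):
curl "localhost:9200/events-training_stats_scalar-adx3r00cad1bdfvsw2a3b0sa5b1e52b/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "wildcard": { "iter": "*0" }
  }
}'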

  
  
Posted 2 years ago

"Can only use wildcard queries on keyword and text fields - not on [iter] which is of type [long]"

  
  
Posted 2 years ago

But I would need to reindex everything, right? Is that an expensive operation?
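For context, reindexing would mean creating a new index whose mapping stores iter in a string-searchable form and then copying the data over with the standard _reindex API; the destination index name below is made up for illustration:
POST _reindex
{
  "source": { "index": "events-training_stats_scalar-adx3r00cad1bdfvsw2a3b0sa5b1e52b" },
  "dest": { "index": "events-training_stats_scalar-adx3r00cad1bdfvsw2a3b0sa5b1e52b-v2" }
}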

  
  
Posted 2 years ago

There is no way to filter on long types? I can't believe it

  
  
Posted 2 years ago

Ok, I guess I'll just delete the whole loss series. Thanks!

  
  
Posted 2 years ago

Reindex is very expensive 🙂

  
  
Posted 2 years ago

Same for regexp, damn

  
  
Posted 2 years ago