Hmm, that's something I don't know
There is no way to filter on long types? I can't believe it
"Can only use wildcard queries on keyword and text fields - not on [iter] which is of type [long]"
I guess using a delete by query with a match on the field value suffix or something similar?
Ok, I guess I'll just delete the whole loss series. Thanks!
but the issue was the shard not being active, it's not the number of documents
So it looks like it tries to register a batch of 500 documents
This https://stackoverflow.com/questions/65109764/wildcard-search-issue-with-long-datatype-in-elasticsearch says long types can be converted to string to do the search
But I would need to reindex everything, right? Is that an expensive operation?
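For reference, a rough sketch of what such a reindex could look like — the index names and the iter_str field here are made up for illustration, and the destination index would need a keyword mapping for that field first:
POST /_reindex
{
  "source": { "index": "events-training_stats_scalar-old" },
  "dest": { "index": "events-training_stats_scalar-new" },
  "script": { "source": "ctx._source.iter_str = ctx._source.iter.toString()" }
}
It is essentially a full copy of the index, so the cost scales with the number of documents.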
You might need to specify the number of buckets if you don't get all of the experiments, but since it's a single shard, I think it'll be ordered by descending bucket size anyway
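If needed, the bucket count can be raised with the size parameter of the terms aggregation — a sketch (500 is an arbitrary example value):
"aggs": {
  "tasks": {
    "terms": { "field": "task", "size": 500 }
  }
}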
SuccessfulKoala55 Thanks to that I was able to identify the most expensive experiments. How can I count the number of documents for a specific series? I.e. I suspect that the loss, which is logged every iteration, is responsible for most of the logged documents, and I want to make sure of that
From what I can find there's a prefix query, but not a suffix - this can be done using a regex or a wildcard, but that's relatively expensive
Just do a sub-aggregation for the metric field (and if you like more details, a sub-sub aggregation for the variant field)
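A sketch of what that nesting could look like (field names assumed from the messages above):
"aggs": {
  "tasks": {
    "terms": { "field": "task" },
    "aggs": {
      "metrics": {
        "terms": { "field": "metric" },
        "aggs": {
          "variants": {
            "terms": { "field": "variant" }
          }
        }
      }
    }
  }
}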
Yeah, should be this:
GET /_search
{
  "aggs": {
    "tasks": {
      "terms": { "field": "task" }
    }
  }
}
See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html
Why do you do  aggs  on the "iter" field?
Something like that?
curl "localhost:9200/events-training_stats_scalar-adx3r00cad1bdfvsw2a3b0sa5b1e52b/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        { "match": { "variant": "loss_model" } },
        { "match": { "task": "8f88e4b8cff84f23bde74ed4b7213ec6" } }
      ]
    }
  },
  "aggs": {
    "series": {
      "terms": { "field": "iter" }
    }
  }
}
'
well I still see some ES errors in the logs:
clearml-apiserver | [2021-07-07 14:02:17,009] [9] [ERROR] [clearml.service_repo] Returned 500 for events.add_batch in 65750ms, msg=General data error: err=('500 document(s) failed to index.', [{'index': {'_index': 'events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b', '_type': '_doc', '_id': 'c2068648d2fe5da975665985f44c20b6', 'status':..., extra_info=[events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b][0]] containing [500] requests and a refresh]
Now I know which experiments have the most metrics. I want to downsample these metrics by 10, i.e. only keep iterations that are multiples of 10. How can I query (to delete) only the documents ending with 0?
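One possible shape for that would be a regexp inside a delete by query — note this assumes a string/keyword copy of the iteration field (called iter_str here, which doesn't exist unless you add it, since regexp won't run on the long field), and <task-id> is a placeholder:
POST /events-training_stats_scalar-adx3r00cad1bdfvsw2a3b0sa5b1e52b/_delete_by_query
{
  "query": {
    "bool": {
      "must": [
        { "match": { "task": "<task-id>" } },
        { "regexp": { "iter_str": ".*[1-9]" } }
      ]
    }
  }
}
This deletes every document whose iteration ends in 1-9, which keeps only the multiples of 10 (and iteration 0).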
SuccessfulKoala55 I deleted all :monitor:machine and :monitor:gpu series, but that only deleted ~20M documents out of the 320M documents in events-training_debug_image-xyz. I would now like to understand which experiments contain most of the documents, to delete them. I would like to aggregate the number of documents per experiment. Is there a way to do that using the ES REST API?
Here I have to do it for each task, is there a way to do it for all tasks at once?
however 504 is very extreme, I'm not sure it's related to the timeout on the server side, you might want to increase the ELB timeout
Why not do:
curl "localhost:9200/events-training_stats_scalar-adx3r00cad1bdfvsw2a3b0sa5b1e52b/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        { "match": { "variant": "loss_model" } }
      ]
    }
  },
  "aggs": {
    "tasks": {
      "terms": { "field": "task" }
    }
  }
}
'
For all tasks?
Maybe the agent could be adapted to have a max_batch_size parameter?