There is no way to filter on long types? I can't believe it
"Can only use wildcard queries on keyword and text fields - not on [iter] which is of type [long]"
I guess using a delete by query with a match on the field value suffix or something similar?
This https://stackoverflow.com/questions/65109764/wildcard-search-issue-with-long-datatype-in-elasticsearch says long types can be converted to string to do the search
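If that helps, here is a minimal sketch of that approach: a script query that converts the long iter value to a string and checks the suffix (index name taken from the queries in this thread; it works because long fields have doc values by default, but it is relatively slow since the script runs on every candidate document):
curl "localhost:9200/events-training_stats_scalar-adx3r00cad1bdfvsw2a3b0sa5b1e52b/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "filter": {
        "script": {
          "script": {
            "lang": "painless",
            "source": "Long.toString(doc[\"iter\"].value).endsWith(params.suffix)",
            "params": { "suffix": "0" }
          }
        }
      }
    }
  }
}'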
SuccessfulKoala55 Thanks to that I was able to identify the most expensive experiments. How can I count the number of documents for a specific series? I.e., I suspect that the loss, which is logged every iteration, is responsible for most of the documents logged, and I want to make sure of that
Something like that?
curl "localhost:9200/events-training_stats_scalar-adx3r00cad1bdfvsw2a3b0sa5b1e52b/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        { "match": { "variant": "loss_model" } },
        { "match": { "task": "8f88e4b8cff84f23bde74ed4b7213ec6" } }
      ]
    }
  },
  "aggs": {
    "series": {
      "terms": { "field": "iter" }
    }
  }
}'
Now I know which experiments have the most metrics. I want to downsample these metrics by 10, i.e., only keep iterations that are multiples of 10. How can I query (to delete) only the documents whose iter ends with 0?
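If the goal is to keep only multiples of 10, one possible sketch is a delete_by_query with a script filter that matches iterations not divisible by 10 (reusing the index, task id, and variant from the query above; I'd test it on a single task first):
curl -X POST "localhost:9200/events-training_stats_scalar-adx3r00cad1bdfvsw2a3b0sa5b1e52b/_delete_by_query?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        { "match": { "task": "8f88e4b8cff84f23bde74ed4b7213ec6" } },
        { "match": { "variant": "loss_model" } }
      ],
      "filter": {
        "script": {
          "script": {
            "lang": "painless",
            "source": "doc[\"iter\"].value % 10 != 0"
          }
        }
      }
    }
  }
}'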
Here I have to do it for each task; is there a way to do it for all tasks at once?
However, 504 is very extreme; I'm not sure it's related to the timeout on the server side. You might want to increase the ELB timeout
So it looks like it tries to register a batch of 500 documents
Hmm, that's something I don't know
But I would need to reindex everything, right? Is that an expensive operation?
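For reference, a reindex would look roughly like this; the destination index here is hypothetical and would have to be created with the desired mapping first, and on an index this size it can be a heavy operation:
curl -X POST "localhost:9200/_reindex?pretty" -H 'Content-Type: application/json' -d'
{
  "source": { "index": "events-training_stats_scalar-adx3r00cad1bdfvsw2a3b0sa5b1e52b" },
  "dest": { "index": "events-training_stats_scalar-adx3r00cad1bdfvsw2a3b0sa5b1e52b-new" }
}'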
Well, I still see some ES errors in the logs:
clearml-apiserver | [2021-07-07 14:02:17,009] [9] [ERROR] [clearml.service_repo] Returned 500 for events.add_batch in 65750ms, msg=General data error: err=('500 document(s) failed to index.', [{'index': {'_index': 'events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b', '_type': '_doc', '_id': 'c2068648d2fe5da975665985f44c20b6', 'status':..., extra_info=[events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b][0]] containing [500] requests and a refresh]
Why do you do  aggs  on the "iter" field?
Yeah, should be this:
GET /_search
{
  "aggs": {
    "tasks": {
      "terms": { "field": "task" }
    }
  }
}
See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html
But the issue was the shard not being active, not the number of documents
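To confirm that, something like this should show the shard states for the events indices and, for any unassigned shard, the reason (column names per the _cat/shards API):
curl "localhost:9200/_cat/shards/events-*?v&h=index,shard,prirep,state,unassigned.reason"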
Ok, I guess I'll just delete the whole loss series. Thanks!
Just do a sub-aggregation for the  metric  field (and if you like more details, a sub-sub aggregation for the  variant  field)
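Building on the earlier example, a sketch of the nested aggregation (bucket counts left at the defaults):
GET /_search
{
  "aggs": {
    "tasks": {
      "terms": { "field": "task" },
      "aggs": {
        "metrics": {
          "terms": { "field": "metric" },
          "aggs": {
            "variants": {
              "terms": { "field": "variant" }
            }
          }
        }
      }
    }
  }
}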
Maybe the agent could be adapted to have a max_batch_size parameter?
SuccessfulKoala55 I deleted all :monitor:machine and :monitor:gpu series, but that only deleted ~20M documents out of the 320M documents in events-training_debug_image-xyz. I would now like to understand which experiments contain most of the documents, so that I can delete them. I would like to aggregate the number of documents per experiment. Is there a way to do that using the ES REST API?
You might need to specify the number of buckets if you don't get all of the experiments, but since it's a single shard, I think it'll be ordered by descending bucket size anyway
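For example, with an arbitrary bucket count:
GET /_search
{
  "aggs": {
    "tasks": {
      "terms": { "field": "task", "size": 500 }
    }
  }
}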
From what I can find, there's a prefix query but not a suffix query; this can be done using a regex or a wildcard, but that's relatively expensive
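For illustration, a suffix wildcard would look like this, but only against a keyword/text field; iter_str here is a hypothetical keyword copy of iter, since the wildcard query is rejected on the long field itself:
curl "localhost:9200/events-training_stats_scalar-adx3r00cad1bdfvsw2a3b0sa5b1e52b/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "wildcard": {
      "iter_str": { "value": "*0" }
    }
  }
}'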
Why not do:
curl "localhost:9200/events-training_stats_scalar-adx3r00cad1bdfvsw2a3b0sa5b1e52b/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        { "match": { "variant": "loss_model" } }
      ]
    }
  },
  "aggs": {
    "tasks": {
      "terms": { "field": "task" }
    }
  }
}'
For all tasks?