SuccessfulKoala55 I deleted all :monitor:machine
and :monitor:gpu
series, but only deleted ~20M documents out of 320M documents in the events-training_debug_image-xyz
. I would like now to understand which experiments contain most of the document to delete them. I would like to aggregate the number of document per experiment. Is there a way do that using the ES REST api?
however 504 is very extreme, I'm not sure it's related to the timeout on the server side, you might want to increase the ELB timeout
There is no way to filter on long types? I canโt believe it
Maybe the agent could be adapted to have a max_batch_size parameter?
But I would need to reindex everything right? Is that a expensive operation?
You might need to specify number of buckets if you don't get all of the experiments, but since it's a single shard, I think it'll be ordered by descending bucket size anyway
but the issue was the shard not being active, it's not the number of documents
So it looks like it tries to register a batch of 500 documents
Why not do:curl "localhost:9200/events-training_stats_scalar-adx3r00cad1bdfvsw2a3b0sa5b1e52b/_search?pretty" -H 'Content-Type: application/json' -d' { "query": { "bool": { "must": [ { "match": { "variant": "loss_model" } } ] } }, "aggs": { "terms": { "field": "task" } } }
For all tasks?
SuccessfulKoala55 Thanks to that I was able to identify the most expensive experiments. How can I count the number of documents for a specific series? Ie. I suspect that the loss, that is logged every iteration, is responsible for most of the documents logged, and I want to make sure of that
This https://stackoverflow.com/questions/65109764/wildcard-search-issue-with-long-datatype-in-elasticsearch says long types can be converted to string to do the search
Just do a sub-aggregation for the metric
field (and if you like more details, a sub-sub aggregation for the variant
field)
"Can only use wildcard queries on keyword and text fields - not on [iter] which is of type [long]"
Ok, I guess Iโll just delete the whole loss series. Thanks!
well I still see some ES errors in the logsclearml-apiserver | [2021-07-07 14:02:17,009] [9] [ERROR] [clearml.service_repo] Returned 500 for events.add_batch in 65750ms, msg=General data error: err=('500 document(s) failed to index.', [{'index': {'_index': 'events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b', '_type': '_doc', '_id': 'c2068648d2fe5da975665985f44c20b6', 'status':..., extra_info=[events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b][0]] containing [500] requests and a refresh]
Why do you do aggs
on the "iter" field?
Something like that?curl "localhost:9200/events-training_stats_scalar-adx3r00cad1bdfvsw2a3b0sa5b1e52b/_search?pretty" -H 'Content-Type: application/json' -d' { "query": { "bool": { "must": [ { "match": { "variant": "loss_model" } }, { "match": { "task": "8f88e4b8cff84f23bde74ed4b7213ec6" } } ] } }, "aggs": { "series": { "terms": { "field": "iter" } } } } '
From what I can find there's a prefix query, but not a suffix - this can be done using a regex or a wildcard, but that's relatively expensive
Yeah, should be this:GET /_search { "aggs": { "tasks": { "terms": { "field": "task" } } } }
See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html
I guess using a delete by query with a match on the field value suffix or something similar?
Hmm, that's something I don't know ๐
Now, I know the experiments having the most metrics. I want to downsample these metrics by 10, ie only keep iterations that are multiple of 10. How can I query (to delete) only the documents ending with 0?
Here I have to do it for each task, is there a way to do it for all tasks at once?