Basically, deleting worker_stats_* and queue_metrics_* is perfectly safe. I think you'll solve your space issues by deleting those 🙂
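For example, something like this should do it (just a sketch - it assumes the default setup with Elasticsearch exposed on localhost:9200, and uses index names taken from your list below; double-check each name before deleting):

curl -X DELETE "http://localhost:9200/worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2021-03"
curl -X DELETE "http://localhost:9200/queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2021-03"

Depending on your Elasticsearch version, wildcard deletes (e.g. worker_stats_*) may be blocked unless action.destructive_requires_name is set to false, so deleting by explicit index name is the safer route.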
What do you mean by drop of many GB? Can you please elaborate on what happens exactly?
I know that Elasticsearch data can sometimes get corrupted and that it requires regular backups...
SubstantialElk6 it basically depends on the amount of data you store there... There's no server-side process that should suddenly impact the ES storage. I would start by listing the ES indices and deleting any old ones that are not needed any more (for example, old queue metrics and worker stats)
Thanks SuccessfulKoala55, how might I do this cleanup? Does this grow with more use of ClearML? And to add, we save all artifacts to a remote S3 server.
I would suggest first looking at the indices list and deciding. In general - if that data is related to experiments, and you do not want to delete them (which makes sense), then yes - more disk space.
Well, some indices contain experiment data (metrics) which you clean up by deleting (or resetting) experiments.
Other indices, which are indeed added over time, hold historical data and can be deleted.
You can start by running curl "http://localhost:9200/_cat/indices?v=true" to see the list of indices - you can post it here if you'd like 🙂
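If the list is long, sorting by size makes the biggest offenders easier to spot (the _cat API accepts an s parameter for sorting, assuming a reasonably recent Elasticsearch version):

curl "http://localhost:9200/_cat/indices?v=true&s=store.size:desc"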
Thank you SuccessfulKoala55. Is there a flag in docker-compose that we can include so Elasticsearch only stores 2-3 months of indices and clears anything older than 3 months?
Hi SuccessfulKoala55, can I check - is it possible to remove events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b? What does it actually store?
I'm afraid Elasticsearch doesn't have this option, but it can be handled by a small daily (or monthly) maintenance cron script using a few simple curl commands
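Something along these lines could serve as a starting point (just a sketch - it assumes Elasticsearch is reachable on localhost:9200, that the monthly indices end with a _YYYY-MM suffix as in your list, and that GNU date is available; adjust the retention period and index prefixes as needed):

#!/bin/bash
# Delete monthly worker_stats_* and queue_metrics_* indices older than 3 months.
ES_HOST="http://localhost:9200"
CUTOFF=$(date -d "3 months ago" +%Y-%m)   # GNU date; on macOS use: date -v-3m +%Y-%m

for idx in $(curl -s "${ES_HOST}/_cat/indices/worker_stats_*,queue_metrics_*?h=index"); do
    month="${idx: -7}"                    # last 7 characters, e.g. 2021-10
    if [[ "$month" < "$CUTOFF" ]]; then
        echo "Deleting old index: ${idx}"
        curl -s -X DELETE "${ES_HOST}/${idx}"
        echo
    fi
done

You could then schedule it with something like a monthly crontab entry (e.g. 0 3 1 * * /path/to/clean_clearml_es_indices.sh).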
Ok, thanks. This would mean that increasing the disk space for my ClearML server is the only option, as we are not at liberty to delete data.
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2021-10 rDH57uOvTOCoRpUv53Ub2g 1 1 12288020 0 768.8mb 768.8mb
yellow open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2022-01 lBQrjobDSf-7peKdOX8tlw 1 1 11067622 0 681.9mb 681.9mb
yellow open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2021-11 2U6CliNdTiqd0VaSiTdBLQ 1 1 10634974 0 650.8mb 650.8mb
yellow open events-plot- rJWReTYsSTKpFkps1AB1qA 1 1 161 0 362.8kb 362.8kb
red open events-log-d1bd92a3b039400cbafc60a7a5b1e52b PSIKjKrKR9OsCVJ4IFd78w 1 1
yellow open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2021-12 GlKMe1iSTa-s0L1HibtHcQ 1 1 10357452 0 630.5mb 630.5mb
yellow open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2022-02 2GLFKyrrR3O0ikq2eSw9Pw 1 1 6172245 0 373.5mb 373.5mb
yellow open events-log- iAbKcLsrQ1ecVlD4vfeIFg 1 1 1387 0 314.7kb 314.7kb
yellow open events-training_debug_image- ZlQoHuAfSh2nlCm00PmZEg 1 1 196 0 124.8kb 124.8kb
yellow open events-plot-d1bd92a3b039400cbafc60a7a5b1e52b RJ81T9MsTZ-pNOZNisg1oQ 1 1 317444 10 4gb 4gb
yellow open events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b lsf6IJ95RbasjxoLbdNAgw 1 1 106971071 2107378 13.9gb 13.9gb
yellow open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2021-03 Vgr_NJ07RYGTDog_l1Lsaw 1 1 12914 0 676.8kb 676.8kb
yellow open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2021-04 7DvSIRnpRguKwIAOMh7I7A 1 1 715087 0 37.4mb 37.4mb
yellow open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2021-10 XzYvsbNxReuikWH_aPj92A 1 1 26028975 0 1.9gb 1.9gb
yellow open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2022-01 8IhLc__BTDCUSamUOikGOA 1 1 20578483 0 1.4gb 1.4gb
yellow open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2021-11 x6rpG5c3S4uBsnYGIrlfmg 1 1 20925616 1 1.5gb 1.5gb
yellow open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2022-02 DO1iDZ2EQCOtBITunz1hRw 1 1 9236665 0 629mb 629mb
yellow open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2021-12 qxyRBfWoSTScLi_maCW-JA 1 1 22121020 0 1.6gb 1.6gb
yellow open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2021-07 _sbQX0HAT3WHj5CCOHwjGw 1 1 5404825 0 322.8mb 322.8mb
yellow open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2021-08 7RAQ1VtDSR2H-ZmnBI0WUg 1 1 8850700 0 534.5mb 534.5mb
yellow open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2021-05 PGlYuKRJQSeaXYPy46KmXw 1 1 1219959 0 67.6mb 67.6mb
yellow open events-training_debug_image-d1bd92a3b039400cbafc60a7a5b1e52b _-NjgfrjQVGt6Xu12VjF6w 1 1 933336 27243 169.4mb 169.4mb
yellow open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2021-06 3s4gCWojToqKTXe1TamzAA 1 1 1531239 0 88.8mb 88.8mb
yellow open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2021-06 21l9rDA2TkuMK0vtj2YUfg 1 1 3620978 0 259.5mb 259.5mb
yellow open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2021-07 GysxIqZTRNanaprJFY_xLA 1 1 13820105 0 1gb 1gb
yellow open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2021-09 eGmfj0rCT6aajIc5rZ88jw 1 1 12272473 0 765.7mb 765.7mb
yellow open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2021-08 QXBh1RSGTguL2_h59YLOvw 1 1 19336383 0 1.4gb 1.4gb
yellow open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2021-09 zdjlsiBmTqarhYexRLC9aQ 1 1 23450059 0 1.6gb 1.6gb
yellow open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2021-03 Lq01ZCD1QQi8ABCQNHJ4yQ 1 1 13008 0 786.6kb 786.6kb
yellow open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2021-04 LxztXhOlTkyVb7eXOkP3bA 1 1 947917 0 61.6mb 61.6mb
yellow open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2021-05 Ke8b8Gd9Sy6xlw6T5vyhpA 1 1 1553156 0 99.7mb 99.7mb
yellow open events-training_stats_scalar- HSpWH1c9T52EsbnVSCYL_w 1 1 3312 0 455.9kb 455.9kb
This is the index list we extracted from the Elasticsearch docker container. Can I ask which ones are safe to delete?