Answered
Hi, we have recurring disk space issues on our ClearML server (drop of many GB in a few days). After some analysis, we noted

Hi, we have recurring disk space issues on our ClearML server (a drop of many GB in a few days). After some analysis, we noted /opt/clearml/data/elastic_7 to be the issue. Our ClearML version is 1.1.1-135, 1.1.1-2.14.
Is this common? What can we do to limit this? It looks like the index and translog folders under elastic_7 have had the worst impact so far.

  
  
Posted 2 years ago

Answers 11


health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2021-10 rDH57uOvTOCoRpUv53Ub2g 1 1 12288020 0 768.8mb 768.8mb
yellow open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2022-01 lBQrjobDSf-7peKdOX8tlw 1 1 11067622 0 681.9mb 681.9mb
yellow open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2021-11 2U6CliNdTiqd0VaSiTdBLQ 1 1 10634974 0 650.8mb 650.8mb
yellow open events-plot- rJWReTYsSTKpFkps1AB1qA 1 1 161 0 362.8kb 362.8kb
red open events-log-d1bd92a3b039400cbafc60a7a5b1e52b PSIKjKrKR9OsCVJ4IFd78w 1 1
yellow open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2021-12 GlKMe1iSTa-s0L1HibtHcQ 1 1 10357452 0 630.5mb 630.5mb
yellow open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2022-02 2GLFKyrrR3O0ikq2eSw9Pw 1 1 6172245 0 373.5mb 373.5mb
yellow open events-log- iAbKcLsrQ1ecVlD4vfeIFg 1 1 1387 0 314.7kb 314.7kb
yellow open events-training_debug_image- ZlQoHuAfSh2nlCm00PmZEg 1 1 196 0 124.8kb 124.8kb
yellow open events-plot-d1bd92a3b039400cbafc60a7a5b1e52b RJ81T9MsTZ-pNOZNisg1oQ 1 1 317444 10 4gb 4gb
yellow open events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b lsf6IJ95RbasjxoLbdNAgw 1 1 106971071 2107378 13.9gb 13.9gb
yellow open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2021-03 Vgr_NJ07RYGTDog_l1Lsaw 1 1 12914 0 676.8kb 676.8kb
yellow open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2021-04 7DvSIRnpRguKwIAOMh7I7A 1 1 715087 0 37.4mb 37.4mb
yellow open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2021-10 XzYvsbNxReuikWH_aPj92A 1 1 26028975 0 1.9gb 1.9gb
yellow open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2022-01 8IhLc__BTDCUSamUOikGOA 1 1 20578483 0 1.4gb 1.4gb
yellow open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2021-11 x6rpG5c3S4uBsnYGIrlfmg 1 1 20925616 1 1.5gb 1.5gb
yellow open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2022-02 DO1iDZ2EQCOtBITunz1hRw 1 1 9236665 0 629mb 629mb
yellow open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2021-12 qxyRBfWoSTScLi_maCW-JA 1 1 22121020 0 1.6gb 1.6gb
yellow open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2021-07 _sbQX0HAT3WHj5CCOHwjGw 1 1 5404825 0 322.8mb 322.8mb
yellow open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2021-08 7RAQ1VtDSR2H-ZmnBI0WUg 1 1 8850700 0 534.5mb 534.5mb
yellow open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2021-05 PGlYuKRJQSeaXYPy46KmXw 1 1 1219959 0 67.6mb 67.6mb
yellow open events-training_debug_image-d1bd92a3b039400cbafc60a7a5b1e52b _-NjgfrjQVGt6Xu12VjF6w 1 1 933336 27243 169.4mb 169.4mb
yellow open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2021-06 3s4gCWojToqKTXe1TamzAA 1 1 1531239 0 88.8mb 88.8mb
yellow open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2021-06 21l9rDA2TkuMK0vtj2YUfg 1 1 3620978 0 259.5mb 259.5mb
yellow open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2021-07 GysxIqZTRNanaprJFY_xLA 1 1 13820105 0 1gb 1gb
yellow open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2021-09 eGmfj0rCT6aajIc5rZ88jw 1 1 12272473 0 765.7mb 765.7mb
yellow open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2021-08 QXBh1RSGTguL2_h59YLOvw 1 1 19336383 0 1.4gb 1.4gb
yellow open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2021-09 zdjlsiBmTqarhYexRLC9aQ 1 1 23450059 0 1.6gb 1.6gb
yellow open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2021-03 Lq01ZCD1QQi8ABCQNHJ4yQ 1 1 13008 0 786.6kb 786.6kb
yellow open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2021-04 LxztXhOlTkyVb7eXOkP3bA 1 1 947917 0 61.6mb 61.6mb
yellow open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2021-05 Ke8b8Gd9Sy6xlw6T5vyhpA 1 1 1553156 0 99.7mb 99.7mb
yellow open events-training_stats_scalar- HSpWH1c9T52EsbnVSCYL_w 1 1 3312 0 455.9kb 455.9kb

This is the listing we extracted from the Elasticsearch Docker container. Can I ask which of these are safe to delete?

  
  
Posted 2 years ago

Well, some indices contain experiment data (metrics), which you can only clean up by deleting (or resetting) experiments.
Other indices, which are indeed added over time, hold historical data and can be deleted.
You can start by doing curl http://localhost:9200/_cat/indices?v=true to see the list of indices - you can post it here if you'd like 🙂

  
  
Posted 2 years ago

OK, thanks. This would mean that increasing the disk space for our ClearML server is the only option, as we are not at liberty to delete.

  
  
Posted 2 years ago

Thank you SuccessfulKoala55. Is there a flag in Docker Compose that we can include to let Elasticsearch store only 2-3 months of indices and clear anything older than 3 months?

  
  
Posted 2 years ago

SubstantialElk6 it basically depends on the amount of data you store there... There's no server-side process that should suddenly impact the ES storage. I would start by listing the ES indices and deleting any old ones that are not needed any more (for example, old queue metrics and worker stats)
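To make the big offenders easy to spot, the `_cat/indices` listing can be sorted by size server-side with the `s=` parameter. A minimal sketch (the host and port are assumptions, adjust to wherever your ClearML Elasticsearch container is reachable):

```shell
# List indices sorted largest-first; 's=' sorts by the named column.
ES_URL="${ES_URL:-http://localhost:9200}"
QUERY='_cat/indices?v=true&s=store.size:desc'
curl -s "$ES_URL/$QUERY" || true   # '|| true' keeps a dead host from aborting a script
```

The largest indices then appear at the top of the output, which makes deciding what to delete much quicker than scanning the unsorted list.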

  
  
Posted 2 years ago

I'm afraid Elasticsearch doesn't have this option, but it can be handled by a small daily (or monthly) maintenance cron script using a few simple curl commands.
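As a sketch of what such a maintenance script could look like (the host, the 3-month retention, and the use of GNU date are all assumptions, adjust to your deployment), this relies on the fact that the monthly worker_stats_* and queue_metrics_* indices end in a YYYY-MM suffix, so a simple lexical comparison against a cutoff month works:

```shell
#!/bin/bash
# Hypothetical retention script -- ES_URL and RETENTION_MONTHS are assumptions.
ES_URL="${ES_URL:-http://localhost:9200}"
RETENTION_MONTHS=3

# Compute the cutoff month, e.g. "2021-11" (GNU date syntax).
cutoff=$(date -d "-${RETENTION_MONTHS} months" +%Y-%m)

# List index names, keep only monthly worker/queue stats indices,
# and delete any whose month suffix is older than the cutoff.
curl -s "$ES_URL/_cat/indices?h=index" |
  grep -E '^(worker_stats|queue_metrics)_.*_[0-9]{4}-[0-9]{2}$' |
  while read -r idx; do
    month="${idx##*_}"               # trailing YYYY-MM suffix
    if [[ "$month" < "$cutoff" ]]; then
      echo "deleting $idx"
      curl -s -XDELETE "$ES_URL/$idx"
    fi
  done
```

Dropped into /etc/cron.monthly (or an equivalent crontab entry), this keeps only the most recent months of worker and queue statistics while leaving the experiment event indices untouched.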

  
  
Posted 2 years ago

What do you mean by drop of many GB? Can you please elaborate on what happens exactly?

I know that Elasticsearch can sometimes suffer disk corruption and requires regular backups...

  
  
Posted 2 years ago

Hi SuccessfulKoala55, can I check whether it is possible to remove events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b? What does it actually store?

  
  
Posted 2 years ago

Basically, deleting worker_stats_* and queue_metrics_* is perfectly safe. I think you'll solve your space issues by deleting those 🙂
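For example, a sketch of those deletions (the host is an assumption, and a DRY_RUN guard is added here so nothing is removed by accident):

```shell
# Hypothetical one-off cleanup -- ES_URL is an assumption, adjust as needed.
ES_URL="${ES_URL:-http://localhost:9200}"
DRY_RUN="${DRY_RUN:-1}"   # set DRY_RUN=0 to actually issue the deletes

for pattern in 'worker_stats_*' 'queue_metrics_*'; do
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: curl -XDELETE $ES_URL/$pattern"
  else
    # Elasticsearch expands the wildcard server-side across matching indices.
    curl -s -XDELETE "$ES_URL/$pattern"
  fi
done
```

Note that Elasticsearch can be configured to refuse wildcard deletes (the action.destructive_requires_name setting), in which case each index has to be named explicitly.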

  
  
Posted 2 years ago

I would suggest first looking at the indices list and deciding. In general, if that data is related to experiments and you do not want to delete them (which makes sense), then yes - more disk space.

  
  
Posted 2 years ago

Thanks SuccessfulKoala55, how might I do this cleanup? Does this grow with more use of ClearML? To add, we save all artifacts to a remote S3 server.

  
  
Posted 2 years ago