Answered
Hi Everyone, Has Anyone Ever Had Issues With An Elasticsearch Index Being Corrupted? We Are Unable To Load The "Scalars" Tab On Any Experiment Without Getting The Error

Hi everyone, has anyone ever had issues with an Elasticsearch index being corrupted? We are unable to load the "scalars" tab on any experiment without getting the error: Error 100 : General data error (ApiError(503, 'search_phase_execution_exception', None)). Digging into this a bit more, running curl None gives us the following:

{
  "note" : "No shard was specified in the explain API request, so this response explains a randomly chosen unassigned shard. There may be other unassigned shards in this cluster which cannot be assigned for different reasons. It may not be possible to assign this shard until one of the other shards is assigned correctly. To explain the allocation of other shards (whether assigned or unassigned) you must specify the target shard in the request to this API.",
  "index" : "events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b",
  "shard" : 0,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "CLUSTER_RECOVERED",
    "at" : "2024-08-28T09:57:26.523Z",
    "last_allocation_status" : "no_valid_shard_copy"
  },
  "can_allocate" : "no_valid_shard_copy",
  "allocate_explanation" : "cannot allocate because all found copies of the shard are either stale or corrupt",
  "node_allocation_decisions" : [
    {
      "node_id" : "85c1ZE3gTrqvov4AY2LXnQ",
      "node_name" : "clearml",
      "transport_address" : "172.19.0.4:9300",
      "node_attributes" : {
        "ml.machine_memory" : "67360030720",
        "xpack.installed" : "true",
        "transform.node" : "true",
        "ml.max_open_jobs" : "512",
        "ml.max_jvm_size" : "33285996544"
      },
      "node_decision" : "no",
      "store" : {
        "in_sync" : true,
        "allocation_id" : "0mE00e0yQSyTtJGQSPeJeQ",
        "store_exception" : {
          "type" : "corrupt_index_exception",
          "reason" : "failed engine (reason: [merge failed]) (resource=preexisting_corruption)",
          "caused_by" : {
            "type" : "i_o_exception",
            "reason" : "failed engine (reason: [merge failed])",
            "caused_by" : {
              "type" : "corrupt_index_exception",
              "reason" : "checksum failed (hardware problem?) : expected=f0199c51 actual=508854e (resource=BufferedChecksumIndexInput(MMapIndexInput(path=\"/usr/share/elasticsearch/data/nodes/0/indices/DIrYFcq5SW6yCFUBVwV-SQ/0/index/_lvu1b.cfs\") [slice=_lvu1b.fdt]))"
            }
          }
        }
      }
    }
  ]
}
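
For anyone reading a similar response, the useful part is buried at the bottom of the nested "caused_by" chain. Below is a minimal Python sketch, assuming the explain output has been parsed into a dict; the helper name root_causes is hypothetical, and the sample is a trimmed copy of the response above rather than a live API call.

```python
def root_causes(explain):
    """For each node decision, follow store_exception -> caused_by
    down to the innermost exception and report it."""
    results = []
    for decision in explain.get("node_allocation_decisions", []):
        exc = decision.get("store", {}).get("store_exception")
        if exc is None:
            continue  # this node reported no store-level error
        while "caused_by" in exc:
            exc = exc["caused_by"]
        results.append({
            "node": decision.get("node_name"),
            "type": exc.get("type"),
            "reason": exc.get("reason"),
        })
    return results


# Trimmed copy of the response above (only the fields the helper reads).
sample = {
    "index": "events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b",
    "node_allocation_decisions": [
        {
            "node_name": "clearml",
            "store": {
                "store_exception": {
                    "type": "corrupt_index_exception",
                    "reason": "failed engine (reason: [merge failed]) (resource=preexisting_corruption)",
                    "caused_by": {
                        "type": "i_o_exception",
                        "reason": "failed engine (reason: [merge failed])",
                        "caused_by": {
                            "type": "corrupt_index_exception",
                            "reason": 'checksum failed (hardware problem?) : expected=f0199c51 actual=508854e (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/usr/share/elasticsearch/data/nodes/0/indices/DIrYFcq5SW6yCFUBVwV-SQ/0/index/_lvu1b.cfs") [slice=_lvu1b.fdt]))',
                        },
                    },
                }
            },
        }
    ],
}

for cause in root_causes(sample):
    print(cause["node"], cause["type"], cause["reason"])
```

Here the innermost cause is the checksum failure on _lvu1b.cfs, which names the segment file that failed verification.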

I can also see the corrupted file with

$ ls /opt/clearml/data/elastic_7/nodes/0/indices/DIrYFcq5SW6yCFUBVwV-SQ/0/index/
corrupted_A6n_6MHlRcyDZ68HdB7B6w ...
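
As far as I know, a corrupted_* file is a marker that Elasticsearch writes into the shard's index directory once it detects corruption, so it flags the shard rather than being the damaged segment itself. A minimal Python sketch for listing such markers follows; the function name corruption_markers is hypothetical, and the path is the host-side directory from the listing above.

```python
from pathlib import Path


def corruption_markers(index_dir):
    """Return the names of corrupted_* marker files in a shard index directory."""
    directory = Path(index_dir)
    if not directory.is_dir():
        return []  # nothing to report if the directory is missing
    return sorted(p.name for p in directory.glob("corrupted_*") if p.is_file())


if __name__ == "__main__":
    # Host-side path taken from the listing above; adjust for your own deployment.
    shard_dir = "/opt/clearml/data/elastic_7/nodes/0/indices/DIrYFcq5SW6yCFUBVwV-SQ/0/index/"
    print(corruption_markers(shard_dir))
```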

Does anyone know why this might have happened, or whether there is any way to recover the index and avoid data loss? Many thanks 🙂

  
  
Posted 5 months ago

Answers 2


Hi @MysteriousParrot48, I'm afraid this looks like a pure Elasticsearch issue. I'd suggest checking the ES forums for help with this.

  
  
Posted 5 months ago

Thanks @CostlyOstrich36, I thought it might be! I'll have a look over there.

  
  
Posted 5 months ago