Answered
Hi All, I Have An Elasticsearch Problem On My Clearml Server. The Error Message I Get On The Clearml Webapp Is

Hi all, I have an Elasticsearch problem on my ClearML server. The error message I get on the ClearML webapp is General data error (TransportError(503, 'search_phase_execution_exception')), which appears on any operation that uses Elasticsearch. I have looked into the Elasticsearch docker container and there is an index with status red. This issue started after the ClearML server ran out of disk storage last night. Logs can be found in the thread.
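
For reference, this is roughly how I spotted the red index (a sketch, assuming the default setup where Elasticsearch is published on localhost:9200):

# overall cluster status plus the list of red indices
curl -s 'localhost:9200/_cluster/health?pretty'
curl -s 'localhost:9200/_cat/indices?v&health=red'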

  
  
Posted 3 years ago

Answers 30


I already increased the memory to 8GB after reading about similar issues here on the Slack

Just making sure, how exactly did you do that?

  
  
Posted 3 years ago

That's it? no apparent error?

  
  
Posted 3 years ago

Also, how much memory is allocated for ES? (it's in the docker-compose file)

I already increased the memory to 8GB after reading about similar issues here on the Slack

  
  
Posted 3 years ago

I meant sudo docker logs clearml-elastic

  
  
Posted 3 years ago

Solving the replica issue now allowed me to get better insights into why the one index is red.
{ "index" : "events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b", "shard" : 0, "primary" : true, "current_state" : "unassigned", "unassigned_info" : { "reason" : "CLUSTER_RECOVERED", "at" : "2021-11-09T22:30:47.018Z", "last_allocation_status" : "no_valid_shard_copy" }, "can_allocate" : "no_valid_shard_copy", "allocate_explanation" : "cannot allocate because all found copies of the shard are either stale or corrupt", "node_allocation_decisions" : [ { "node_id" : "CldaHbiyQWaNcpWtVab35w", "node_name" : "clearml", "transport_address" : "172.28.0.5:9300", "node_attributes" : { "ml.machine_memory" : "34244124672", "xpack.installed" : "true", "ml.max_open_jobs" : "20" }, "node_decision" : "no", "store" : { "in_sync" : true, "allocation_id" : "emdrRuHVQ8asg5LU_HVkGw", "store_exception" : { "type" : "corrupt_index_exception", "reason" : "failed engine (reason: [refresh failed source[refresh_flag_index]]) (resource=preexisting_corruption)", "caused_by" : { "type" : "i_o_exception", "reason" : "failed engine (reason: [refresh failed source[refresh_flag_index]])", "caused_by" : { "type" : "corrupt_index_exception", "reason" : "codec footer mismatch (file truncated?): actual footer=0 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path=\"/usr/share/elasticsearch/data/nodes/0/indices/T5e15fRWTvW69oI3Cm2BeQ/0/index/_e1il.fdt\")))" } } } } } ] }

  
  
Posted 3 years ago

since it is a single node, I guess it will not be possible to recover or even partially recover the index, right?

  
  
Posted 3 years ago

So is this a corrupt storage issue?

  
  
Posted 3 years ago

I will try to recover it, but anyway the learning is to fully separate the fileserver and any output location from mongo, redis and elastic. Also maybe it makes sense to improve the ES setup to have replicas

  
  
Posted 3 years ago

I usually use different partitions. The replicas are always a good idea, but they do require more memory and disk space, so this is not in the default configuration
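
If you do want replicas later, the setting itself is a one-liner, but note that on the default single-node setup the replica shards would simply stay unassigned until a second data node exists (a sketch, assuming ES on localhost:9200):

# ask for one replica per shard on all existing indices
curl -XPUT -H 'Content-Type: application/json' 'localhost:9200/_settings' -d '{"index" : {"number_of_replicas" : 1}}'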

  
  
Posted 3 years ago

Cool! 🙂

  
  
Posted 3 years ago

I already increased the memory to 8GB after reading about similar issues here on the Slack

Just making sure, how exactly did you do that?

docker-compose down
(edit in docker-compose.yml)

elasticsearch:
  networks:
    - backend
  container_name: clearml-elastic
  environment:
    ES_JAVA_OPTS: -Xms8g -Xmx8g

docker-compose up -d

  
  
Posted 3 years ago

I'm not sure, but it's possible you can't recover it - 100% disk usage is always a major problem

  
  
Posted 3 years ago

Try to restart ES and see if it helps

docker-compose down / up does not help

  
  
Posted 3 years ago

What version of ClearML is your server running?

the docker-compose uses clearml:latest

  
  
Posted 3 years ago

Very good news!

  
  
Posted 3 years ago

root@ubuntu:/opt/clearml# sudo docker logs clearml-elastic
OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
{"type": "server", "timestamp": "2021-11-09T12:49:13,403Z", "level": "INFO", "component": "o.e.e.NodeEnvironment", "cluster.name": "clearml", "node.name": "clearml", "message": "using [1] data paths, mounts [[/usr/share/elasticsearch/data (//some_ip/clearml-server-data)]], net usable_space [3.4tb], net total_space [6.9tb], types [cifs]" }
{"type": "server", "timestamp": "2021-11-09T12:49:13,407Z", "level": "INFO", "component": "o.e.e.NodeEnvironment", "cluster.name": "clearml", "node.name": "clearml", "message": "heap size [7.9gb], compressed ordinary object pointers [true]" }
{"type": "server", "timestamp": "2021-11-09T12:49:14,529Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "clearml", "node.name": "clearml", "message": "node name [clearml], node ID [CldaHbiyQWaNcpWtVab35w], cluster name [clearml]" }
{"type": "server", "timestamp": "2021-11-09T12:49:14,529Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "clearml", "node.name": "clearml", "message": "version[7.6.2], pid[1], build[default/docker/ef48eb35cf30adf4db14086e8aabd07ef6fb113f/2020-03-26T06:34:37.794943Z], OS[Linux/5.4.0-89-generic/amd64], JVM[AdoptOpenJDK/OpenJDK 64-Bit Server VM/13.0.2/13.0.2+8]" }
{"type": "server", "timestamp": "2021-11-09T12:49:14,530Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "clearml", "node.name": "clearml", "message": "JVM home [/usr/share/elasticsearch/jdk]" }
{"type": "server", "timestamp": "2021-11-09T12:49:14,530Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "clearml", "node.name": "clearml", "message": "JVM arguments [-Des.networkaddress.cache.ttl=60, -Des.networkaddress.cache.negative.ttl=10, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dio.netty.allocator.numDirectArenas=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.locale.providers=COMPAT, -Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -Djava.io.tmpdir=/tmp/elasticsearch-8140206772120400095, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=data, -XX:ErrorFile=logs/hs_err_pid%p.log, -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m, -Des.cgroups.hierarchy.override=/, -Xms8g, -Xmx8g, -XX:MaxDirectMemorySize=4294967296, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/usr/share/elasticsearch/config, -Des.distribution.flavor=default, -Des.distribution.type=docker, -Des.bundled_jdk=true]" }

  
  
Posted 3 years ago

docker-compose down / up does not help

Did you wait for all the other indices to reach yellow status?
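
For reference, one way to wait for that explicitly rather than watching the UI (a sketch, assuming ES on localhost:9200):

# blocks until the cluster is at least yellow, or until the timeout expires
curl -s 'localhost:9200/_cluster/health?wait_for_status=yellow&timeout=120s&pretty'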

  
  
Posted 3 years ago

Yes, this happened when the disk got filled up to 100%

  
  
Posted 3 years ago

1. ssh into the elasticsearch container
2. identify the id of the index that seems to be broken
3. run /usr/share/elasticsearch/jdk/bin/java -cp lucene-core*.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /usr/share/elasticsearch/data/nodes/0/indices/your-id/0/index/ -verbose -exorcise (see the sketch below). This can be dangerous but is the only option if you assume that the data is lost anyway.
4. either running 3. repairs broken segments, or it shows, as in my case, "No problems were detected with this index."
5. if it shows "no problems detected", just go to the index folder and remove any file starting with corrupted_*
6. restart elasticsearch and the whole cluster turns green
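
Roughly what steps 1-3 (and the cleanup in 5-6) look like as commands. This is only a sketch: it assumes the container name clearml-elastic from the compose file above, and that lucene-core*.jar sits in /usr/share/elasticsearch/lib as in the stock ES image.

# open a shell inside the ES container
sudo docker exec -it clearml-elastic bash

# inside the container: run Lucene's CheckIndex against the broken shard's index directory
cd /usr/share/elasticsearch/lib
/usr/share/elasticsearch/jdk/bin/java -cp lucene-core*.jar -ea:org.apache.lucene... \
    org.apache.lucene.index.CheckIndex \
    /usr/share/elasticsearch/data/nodes/0/indices/your-id/0/index/ -verbose -exorcise

# only if it reported "No problems were detected with this index":
# remove the corrupted_* marker file(s), leave the container and restart ES
rm /usr/share/elasticsearch/data/nodes/0/indices/your-id/0/index/corrupted_*
exit
sudo docker restart clearml-elastic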

  
  
Posted 3 years ago

Using top inside the elasticsearch container shows
elastic+ 20  0  17.0g  8.7g 187584 S  2.3 27.2  1:09.18 java
i.e. the 8g are actually reserved. So setting ES_JAVA_OPTS: -Xms8g -Xmx8g should work.
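
A less ambiguous check is to ask ES itself how much heap it was given (a sketch, assuming it is reachable on localhost:9200):

# prints the configured max heap plus current usage per node
curl -s 'localhost:9200/_cat/nodes?v&h=name,heap.max,heap.current,heap.percent'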

  
  
Posted 3 years ago

Well, I just took a look at the log, and it looks like the configuration is for 1GB only (see -Xms1g, -Xmx1g ) - perhaps that's the reason?

  
  
Posted 3 years ago

Did you wait for all the other indices to reach yellow status?

yes I waited until everything was yellow

  
  
Posted 3 years ago

I'm not entirely sure, but it may help

  
  
Posted 3 years ago

elasticsearch:
  networks:
    - backend
  container_name: clearml-elastic
  environment:
    ES_JAVA_OPTS: -Xms8g -Xmx8g
    bootstrap.memory_lock: "true"
    cluster.name: clearml
    cluster.routing.allocation.node_initial_primaries_recoveries: "500"
    cluster.routing.allocation.disk.watermark.low: 500mb
    cluster.routing.allocation.disk.watermark.high: 500mb
    cluster.routing.allocation.disk.watermark.flood_stage: 500mb
    discovery.zen.minimum_master_nodes: "1"
    discovery.type: "single-node"
    http.compression_level: "7"
    node.ingest: "true"
    node.name: clearml
    reindex.remote.whitelist: '*.*'
    xpack.monitoring.enabled: "false"
    xpack.security.enabled: "false"
  ulimits:
    memlock:
      soft: -1
      hard: -1
    nofile:
      soft: 65536
      hard: 65536
  image: docker.elastic.co/elasticsearch/elasticsearch:7.6.2
  restart: unless-stopped
  ports:
    - "9200:9200"
  volumes:
    - /storage/data/elastic_7:/usr/share/elasticsearch/data
    - /usr/share/elasticsearch/logs

  
  
Posted 3 years ago

curl -XPUT -H 'Content-Type: application/json' 'localhost:9200/_settings' -d '{"index" : {"number_of_replicas" : 0}}'

This command made all my indices, besides the broken one which is still red, come green again. It comes from https://stackoverflow.com/questions/63403972/elasticsearch-index-in-red-health/63405623#63405623 .
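
To double-check that only the broken shard is left unassigned afterwards, something like this should do (again assuming ES on localhost:9200):

# list all shards and keep only the ones not assigned to any node
curl -s 'localhost:9200/_cat/shards?v' | grep UNASSIGNED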

  
  
Posted 3 years ago

Can you send some more comprehensive log - perhaps there are other messages that are related

which logs do you need?

  
  
Posted 3 years ago

The output seen above indicates that the index is corrupt and probably lost, but that is not necessarily the case

  
  
Posted 3 years ago

That's it? no apparent error?

After the logs at the top there were only "info"-level logs from the PluginsService

  
  
Posted 3 years ago

SuccessfulKoala55 so you say deleting other old indices that I don't need could help?
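
If so, I'd drop them with something along these lines (just my assumption that a plain index DELETE is the right way to do it):

# delete an index that is no longer needed (irreversible)
curl -XDELETE 'localhost:9200/<old-index-name>'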

  
  
Posted 3 years ago

 so you say deleting other old indices that I don't need could help?

This did not help, I still have the same issue

  
  
Posted 3 years ago