DefeatedCrab47 this issue has come up here several times. It is caused by low disk space on your server machine, which makes Elasticsearch go into read-only mode.
Just free up more space on your server disk. By default, Elasticsearch switches to this mode when less than 5 percent of the disk is free.
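To see how close a machine is to that threshold, a minimal sketch along these lines could help. It assumes the 5 percent figure mentioned above is unchanged on your server; the function names and the path being checked are illustrative, so point it at whichever disk actually holds the Elasticsearch data.

```python
import shutil

# Assumption: the default flood-stage watermark (95% used, i.e. 5% free) is in effect.
FLOOD_STAGE_FREE_FRACTION = 0.05


def free_fraction(path: str) -> float:
    """Return the fraction of the disk holding `path` that is still free (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def below_flood_stage(free: float, threshold: float = FLOOD_STAGE_FREE_FRACTION) -> bool:
    """True when the free fraction is low enough to trigger read-only mode."""
    return free < threshold


if __name__ == "__main__":
    # Hypothetical path: replace with the mount point of the Elasticsearch data directory.
    path = "."
    frac = free_fraction(path)
    state = "below" if below_flood_stage(frac) else "above"
    print(f"{path}: {frac:.1%} free, {state} the flood-stage threshold")
```

Run it on the server host itself, since the free space that matters is that of the disk where the Elasticsearch data lives.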
It seems to be related to trains-apiserver, based on the Docker Compose log:
trains-apiserver | [2020-11-10 04:40:14,133] [8] [ERROR] [trains.service_repo] Returned 500 for queues.get_next_task in 20ms, msg=General data error: err=('1 document(s) failed to index.', [{'index': {'_index': 'queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2020-11', '_type': '_doc', '_id': 'rkh0sHUBwyiZSyeZUAov', 'status': 403, 'error': {'type': 'cluster_block_exception', 'reason': 'index [queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2020-11] blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];'}, 'data': {'timestamp': 1604983214115, 'queue': '789a8744857746de84db036d65de8c65', 'average_waiting_time': 0, 'queue_length': 0}}}]), extra_info=index [queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2020-11] blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];
Sure. Just for reference, here's a related GitHub issue: https://github.com/allegroai/trains-server/issues/58
SuccessfulKoala55 Thank you. I was fixated on trains-apiserver, but by coincidence I found this message:
trains-elastic | {"type": "server", "timestamp": "2020-11-10T06:11:08,956Z", "level": "WARN", "component": "o.e.c.r.a.DiskThresholdMonitor", "cluster.name": "trains", "node.name": "trains", "message": "flood stage disk watermark [95%] exceeded on [QyZ2i1mxTG6yR7uhVWjV9Q][trains][/usr/share/elasticsearch/data/nodes/0] free: 43.3gb[4.7%], all indices on this node will be marked read-only", "cluster.uuid": "sDf_05oOQmm5euASjIp3Fw", "node.id": "QyZ2i1mxTG6yR7uhVWjV9Q" }
So I was about to post that it's likely due to our disk getting full.
Thank you for your insights!
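If the indices stay read-only after space has been freed, the block named in the error (read-only / allow delete) corresponds to the Elasticsearch index setting `index.blocks.read_only_allow_delete`, which can be reset through the index settings API. Newer Elasticsearch releases are expected to lift it on their own once disk usage drops, so this may not be needed. The sketch below is only an illustration: the base URL `http://localhost:9200` and the helper names are assumptions, not something taken from the Trains setup.

```python
import json
import urllib.request


def build_unblock_request(base_url: str = "http://localhost:9200") -> urllib.request.Request:
    """Build a PUT request that resets the read-only/allow-delete block on all indices.

    Assumption: Elasticsearch is reachable at `base_url` without authentication.
    """
    # Setting the value to null (None in Python) restores the default, i.e. no block.
    body = json.dumps({"index.blocks.read_only_allow_delete": None}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/_all/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def clear_read_only_block(base_url: str = "http://localhost:9200") -> dict:
    """Send the request and return the parsed JSON response from Elasticsearch."""
    with urllib.request.urlopen(build_unblock_request(base_url), timeout=10) as resp:
        return json.load(resp)
```

Calling `clear_read_only_block()` only makes sense after the disk has room again; otherwise Elasticsearch will re-apply the block on its next disk check.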
Also, I'll try to make sure that starting from the next version the server will incorporate a better error heuristic (for example, adding text saying "Check your server disk space" or something to that effect 🙂)
Even when I do a "clean install" (renamed the /opt/trains folder and followed the instructions to set up TRAINS), the error appears.