It seems to be related to trains-apiserver, based on the log inside the Docker compose:
trains-apiserver | [2020-11-10 04:40:14,133] [8] [ERROR] [trains.service_repo] Returned 500 for queues.get_next_task in 20ms, msg=General data error: err=('1 document(s) failed to index.', [{'index': {'_index': 'queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2020-11', '_type': '_doc', '_id': 'rkh0sHUBwyiZSyeZUAov', 'status': 403, 'error': {'type': 'cluster_block_exception', 'reason': 'index [queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2020-11] blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];'}, 'data': {'timestamp': 1604983214115, 'queue': '789a8744857746de84db036d65de8c65', 'average_waiting_time': 0, 'queue_length': 0}}}]), extra_info=index [queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2020-11] blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];
Even when I do a "clean install" (renamed the /opt/trains folder and followed the instructions to set up TRAINS), the error appears.
DefeatedCrab47 this issue has come up here several times. It is caused by low disk space on your server machine, which makes Elasticsearch go into read-only mode.
Just free up more space on your server disk. By default, Elasticsearch switches to this mode when less than 5 percent of the disk is free.
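To check whether a disk is past that limit, something like the following minimal sketch could be run on the server. It assumes the default threshold of 95 percent used (5 percent free); the function names and the path passed in are hypothetical, so point it at whichever disk actually holds the Elasticsearch data.

```python
import shutil

# Assumed default: Elasticsearch marks indices read-only once a disk is 95% used.
FLOOD_STAGE_PERCENT = 95.0


def used_percent(total: int, free: int) -> float:
    """Return the percentage of the disk that is in use."""
    return 100.0 * (total - free) / total


def exceeds_flood_stage(path: str = ".", threshold: float = FLOOD_STAGE_PERCENT) -> bool:
    """Return True if the disk holding `path` is at or above the threshold."""
    usage = shutil.disk_usage(path)
    return used_percent(usage.total, usage.free) >= threshold


if __name__ == "__main__":
    # "." is a placeholder; use the directory that stores the Elasticsearch data.
    usage = shutil.disk_usage(".")
    print(f"used: {used_percent(usage.total, usage.free):.1f}%")
    print("read-only risk:", exceeds_flood_stage("."))
```

If this reports a value at or above the threshold, freeing disk space is the first thing to try.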
SuccessfulKoala55 Thank you. I stared myself dead at trains-apiserver, but by coincidence I found this message:
trains-elastic | {"type": "server", "timestamp": "2020-11-10T06:11:08,956Z", "level": "WARN", "component": "o.e.c.r.a.DiskThresholdMonitor", "cluster.name": "trains", "node.name": "trains", "message": "flood stage disk watermark [95%] exceeded on [QyZ2i1mxTG6yR7uhVWjV9Q][trains][/usr/share/elasticsearch/data/nodes/0] free: 43.3gb[4.7%], all indices on this node will be marked read-only", "cluster.uuid": "sDf_05oOQmm5euASjIp3Fw", "node.id": "QyZ2i1mxTG6yR7uhVWjV9Q" }
So I was about to post that it's likely due to our disk getting full.
Thank you for your insights!
Sure. Just for reference, here's a related GitHub issue: https://github.com/allegroai/trains-server/issues/58
Also, I'll try to make sure that starting from the next version the server will incorporate a better error heuristic (for example, adding text saying "Check your server disk space" or something to that effect 🙂)
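For anyone landing here later: depending on the Elasticsearch version, the read-only block may not be lifted automatically once space is freed. A hedged sketch of clearing it through the index settings API follows. The base URL http://localhost:9200 and the helper name build_unblock_request are assumptions, not something taken from the TRAINS setup, so adjust them to your deployment.

```python
import json
import urllib.request


def build_unblock_request(base_url: str = "http://localhost:9200") -> urllib.request.Request:
    """Build a PUT request that resets index.blocks.read_only_allow_delete on all indices."""
    # Setting the value to null (None) removes the block set by the disk watermark check.
    body = json.dumps({"index.blocks.read_only_allow_delete": None}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_all/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = build_unblock_request()
    print(req.get_method(), req.full_url)
    # To actually send it (needs a reachable Elasticsearch instance):
    # urllib.request.urlopen(req)
```

The request is only built here, not sent, so it is safe to run as-is. If the disk is still above the watermark, the block will simply be applied again.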