Answered
Suddenly all experiments we try to log run into an error

Suddenly all experiments we try to log run into an error. I think it's an issue with the server on our side, because as far as I know nothing changed about Trains (we didn't update or anything), and yesterday it was working fine.

Can anyone provide some insight into what exactly is going wrong in the following message?
2020-11-10 12:56:03,492 - trains.log - WARNING - failed logging task to backend (1 lines, <500/100: events.add_batch/v1.0 (General data error: err=('1 document(s) failed to index.', [{'index': {'_index': 'events-log-d1bd92a3b039400cbafc60a7a5b1e52b', '_type': '_doc', '_id': 'c0c9cbbf1a154690b71f2623b7c15ada', 'status': 403, 'error': {'type': 'cluster_block_exception', 'reason': 'index [events-log-d1bd92a3b039400cbafc60a7a5b1e52b] blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];'}, 'data': {'timestamp': 1604980561463, 'type': 'log', 'task': '9544164723554d3085346f7cd33580d1', 'level': 'info', 'worker': 'ubuntu-user', 'msg': 'TRAINS Task: created new task id=9544164723554d3085346f7cd33580d1\nTRAINS results page: \n======> WARNING! UNCOMMITTED CHANGES IN REPOSITORY <======', '@timestamp': '2020-11-10T03:56:03.481Z', 'metric': '', 'variant': ''}}}]), extra_info=index [events-log-d1bd92a3b039400cbafc60a7a5b1e52b] blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];)>)

I have a feeling this part of the message:

'status': 403

provides some useful info.

Any tips on how to debug / where to look to solve this problem?

  
  
Posted 4 years ago

Answers 7


It seems to be related to trains-apiserver, based on the following log line from the Docker Compose output:

trains-apiserver | [2020-11-10 04:40:14,133] [8] [ERROR] [trains.service_repo] Returned 500 for queues.get_next_task in 20ms, msg=General data error: err=('1 document(s) failed to index.', [{'index': {'_index': 'queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2020-11', '_type': '_doc', '_id': 'rkh0sHUBwyiZSyeZUAov', 'status': 403, 'error': {'type': 'cluster_block_exception', 'reason': 'index [queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2020-11] blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];'}, 'data': {'timestamp': 1604983214115, 'queue': '789a8744857746de84db036d65de8c65', 'average_waiting_time': 0, 'queue_length': 0}}}]), extra_info=index [queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2020-11] blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];
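A minimal Python sketch to confirm this is the Elasticsearch read-only block (the host/port are an assumption - they match the default trains-server docker-compose mapping - and the index name is taken from the error above):

import requests

# Index name copied from the error message; localhost:9200 is assumed.
index = "queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2020-11"
settings = requests.get(f"http://localhost:9200/{index}/_settings").json()
blocks = settings[index]["settings"]["index"].get("blocks", {})
print(blocks)  # {'read_only_allow_delete': 'true'} would explain the 403s above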

  
  
Posted 4 years ago

Even when I do a "clean install" (I renamed the /opt/trains folder) and follow the instructions to set up TRAINS, the error appears.

  
  
Posted 4 years ago

DefeatedCrab47 this issue has come up here several times - it is caused by low disk space on your server machine, which makes Elasticsearch switch to read-only mode

  
  
Posted 4 years ago

Just clear up more space on your server disk - by default, Elasticsearch will switch to this mode when less than 5 percent of the disk is free
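Note that, depending on the Elasticsearch version, the block may not be lifted automatically once space has been freed. A minimal Python sketch that clears it on all indices (again assuming Elasticsearch is reachable on localhost:9200):

import requests

# Setting index.blocks.read_only_allow_delete to null (None in Python) removes the block.
resp = requests.put(
    "http://localhost:9200/_all/_settings",
    json={"index.blocks.read_only_allow_delete": None},
)
print(resp.status_code, resp.json())  # expect 200 and {'acknowledged': True}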

  
  
Posted 4 years ago

SuccessfulKoala55 Thank you. I was completely focused on trains-apiserver, but by coincidence I found this message:

trains-elastic | {"type": "server", "timestamp": "2020-11-10T06:11:08,956Z", "level": "WARN", "component": "o.e.c.r.a.DiskThresholdMonitor", "cluster.name": "trains", "node.name": "trains", "message": "flood stage disk watermark [95%] exceeded on [QyZ2i1mxTG6yR7uhVWjV9Q][trains][/usr/share/elasticsearch/data/nodes/0] free: 43.3gb[4.7%], all indices on this node will be marked read-only", "cluster.uuid": "sDf_05oOQmm5euASjIp3Fw", "node.id": "QyZ2i1mxTG6yR7uhVWjV9Q" }

So I was about to post that it's likely due to our disk getting full.

Thank you for your insights!

  
  
Posted 4 years ago

Sure. Just for reference, here's a related GitHub issue: https://github.com/allegroai/trains-server/issues/58

  
  
Posted 4 years ago

Also, I'll try to make sure that, starting from the next version, the server will incorporate a better error heuristic (for example, adding text saying "Check your server disk space", or something to that effect 🙂)

  
  
Posted 4 years ago
930 Views
7 Answers