Answered
Greetings! Could You Help Me? I’Ve Just Tried Delete Old Experiment (Year Ago) But Got The Following Error:

Greetings!
Could you help me?
I’ve just tried to delete an old experiment (from a year ago) but got the following error:
apiserver [2022-06-17 13:36:59,636] [10] [WARNING] [elasticsearch] POST [status: N/A request:60.055s]
apiserver Traceback (most recent call last):
apiserver   File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 449, in _make_request
apiserver     six.raise_from(e, None)
apiserver   File "<string>", line 3, in raise_from
apiserver   File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 444, in _make_request
apiserver     httplib_response = conn.getresponse()
apiserver   File "/usr/lib64/python3.6/http/client.py", line 1346, in getresponse
apiserver     response.begin()
apiserver   File "/usr/lib64/python3.6/http/client.py", line 307, in begin
apiserver     version, status, reason = self._read_status()
apiserver   File "/usr/lib64/python3.6/http/client.py", line 268, in _read_status
apiserver     line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
apiserver   File "/usr/lib64/python3.6/socket.py", line 586, in readinto
apiserver     return self._sock.recv_into(b)
apiserver socket.timeout: timed out
apiserver During handling of the above exception, another exception occurred:
apiserver Traceback (most recent call last):
apiserver   File "/usr/local/lib/python3.6/site-packages/elasticsearch/connection/http_urllib3.py", line 252, in perform_request
apiserver     method, url, body, retries=Retry(False), headers=request_headers, **kw
apiserver   File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 786, in urlopen
apiserver     method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
apiserver   File "/usr/local/lib/python3.6/site-packages/urllib3/util/retry.py", line 525, in increment
apiserver     raise six.reraise(type(error), error, _stacktrace)
apiserver   File "/usr/local/lib/python3.6/site-packages/urllib3/packages/six.py", line 770, in reraise
apiserver     raise value
apiserver   File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 710, in urlopen
apiserver     chunked=chunked,
apiserver   File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 451, in _make_request
apiserver     self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
apiserver   File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 341, in _raise_timeout
apiserver     self, url, "Read timed out. (read timeout=%s)" % timeout_value
apiserver urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='elasticsearch-service', port='9200'): Read timed out. (read timeout=60)
Can I increase the timeout?

  
  
Posted 2 years ago

Answers 29


The infrastructure runs in k8s,
but when I check the health of the cluster, I get a green status:
curl localhost:9200/_cluster/health
{"cluster_name":"clearml","status":"green","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":41,"active_shards":41,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":0,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":100.0}
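For anyone reading along, here is a minimal Python sketch of what that health response does and does not tell you. The sample string is copied from the output above (trimmed to the fields used); the function name `summarize_health` is made up for illustration. A green status only means that all shards are allocated, so it says nothing about how long individual queries take.

```python
import json

# Trimmed copy of the health response quoted in this post.
HEALTH_JSON = (
    '{"cluster_name":"clearml","status":"green","timed_out":false,'
    '"number_of_nodes":1,"number_of_data_nodes":1,'
    '"active_primary_shards":41,"active_shards":41,'
    '"unassigned_shards":0,"number_of_pending_tasks":0}'
)


def summarize_health(raw: str) -> dict:
    """Pick out the fields that matter for a quick sanity check."""
    health = json.loads(raw)
    return {
        "status": health["status"],
        "single_node": health["number_of_nodes"] == 1,
        "active_shards": health["active_shards"],
        "unassigned_shards": health["unassigned_shards"],
        "pending_tasks": health["number_of_pending_tasks"],
    }


summary = summarize_health(HEALTH_JSON)
print(summary)
```

Here the cluster is a single node holding all 41 shards, so every request, including the slow deletes, lands on the same machine.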

  
  
Posted 2 years ago

What’s interesting is that ClearML can delete new experiments without any problems,
but it won’t remove old archived experiments.

  
  
Posted 2 years ago

sure
First command output
curl -XGET
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2021-10 xjVdUpdDReCv5g11c4IGFw 1 0 10248782 0 536.6mb 536.6mb
green open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2021-11 YuxjrptlTh2MlOCU7ykMkA 1 0 13177592 0 695.6mb 695.6mb
green open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2022-01 CXZ8edSSR_C3f-264gPSxw 1 0 17178186 0 891.8mb 891.8mb
green open events-log-d1bd92a3b039400cbafc60a7a5b1e52b Urte-26hTRmm9syCc3lIGQ 1 0 37510243 6511399 12.8gb 12.8gb
green open events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b 70zX8fwURuyXdjHcc6TNaQ 1 0 374684303 24869857 51.4gb 51.4gb
green open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2022-04 oY8hM0BUTP6Zki-krHkEJg 1 0 12258567 0 634.5mb 634.5mb
green open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2022-05 9FWIKsugQf2XF2asGkZcTA 1 0 10015124 0 513.9mb 513.9mb
green open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2021-10 5GouX7CiTqy0KnqLe-jGUQ 1 0 39513094 0 2.4gb 2.4gb
green open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2022-01 Nz8T5sd0QNW9dJQM0UoOnw 1 0 40993955 0 2.5gb 2.5gb
green open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2022-02 aw6X3LPASLahZ-EMWSkYRA 1 0 15713573 0 807.5mb 807.5mb
green open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2021-11 Empmo9cdQ9eYqPiqVakAOA 1 0 39530759 0 2.4gb 2.4gb
green open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2021-12 PfrlVBsRSHiBaB-C13AuFw 1 0 8801479 0 459.2mb 459.2mb
green open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2022-03 G9gsKlLqTLmSfFRIUKxhpA 1 0 12396061 0 640.1mb 640.1mb
green open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2022-02 vJ-XUAEfSbaUS-DlLz23Zg 1 0 37301997 0 2.2gb 2.2gb
green open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2021-12 981MwI1nT8KxQJ_Cjkb0uA 1 0 30484228 0 1.9gb 1.9gb
green open events-training_debug_image-d1bd92a3b039400cbafc60a7a5b1e52b 2oiWS6VHRuuT6m9OtvOYIg 1 0 135153 56191 31.7mb 31.7mb
green open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2022-06 hW4mi0bDQA2S-jM5KXGILQ 1 0 4273551 0 245.4mb 245.4mb
green open .geoip_databases iYPbj6vsS0-Tm_PGo49UHw 1 0 41 41 38.9mb 38.9mb
green open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2022-03 5MS5I7fGRLGQgM3S8EbF1A 1 0 40349234 0 2.4gb 2.4gb
green open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2022-04 1C4QazTaTWyuo8HSNSzRmw 1 0 33531158 0 2gb 2gb
green open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2022-05 YPe4zRb7Q92DeaSSvTlGdg 1 0 32807469 0 1.9gb 1.9gb
green open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2022-06 hu3N2iQgRGC9xYQi84NCsw 1 0 17636277 0 1.1gb 1.1gb
green open events-plot-d1bd92a3b039400cbafc60a7a5b1e52b l4BpBPIeRfyUfodRxIzRtg 1 0 43640 3967 95.6mb 95.6mb
Second command output
index shard prirep state docs store ip node
worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2021-11 0 p STARTED 39530759 2.4gb elastic-ip clearml
queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2021-12 0 p STARTED 8801479 459.2mb elastic-ip clearml
queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2022-03 0 p STARTED 12396061 640.1mb elastic-ip clearml
queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2022-05 0 p STARTED 10015124 513.9mb elastic-ip clearml
worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2022-05 0 p STARTED 32807469 1.9gb elastic-ip clearml
worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2022-04 0 p STARTED 33531158 2gb elastic-ip clearml
.ds-.logs-deprecation.elasticsearch-default-2022.01.25-000004 0 p STARTED elastic-ip clearml
.ds-.logs-deprecation.elasticsearch-default-2021.12.14-000001 0 p STARTED elastic-ip clearml
events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b 0 p STARTED 374684303 51.4gb elastic-ip clearml
.ds-ilm-history-5-2022.06.12-000010 0 p STARTED elastic-ip clearml
worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2022-03 0 p STARTED 40349234 2.4gb elastic-ip clearml
events-plot-d1bd92a3b039400cbafc60a7a5b1e52b 0 p STARTED 43640 95.6mb elastic-ip clearml
worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2021-12 0 p STARTED 30484228 1.9gb elastic-ip clearml
.ds-.logs-deprecation.elasticsearch-default-2022.02.22-000006 0 p STARTED elastic-ip clearml
.ds-.logs-deprecation.elasticsearch-default-2022.04.05-000009 0 p STARTED elastic-ip clearml
.ds-ilm-history-5-2022.03.14-000004 0 p STARTED elastic-ip clearml
.ds-.logs-deprecation.elasticsearch-default-2022.04.19-000010 0 p STARTED elastic-ip clearml
.ds-.logs-deprecation.elasticsearch-default-2021.12.28-000002 0 p STARTED elastic-ip clearml
worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2021-10 0 p STARTED 39513094 2.4gb elastic-ip clearml
queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2021-11 0 p STARTED 13177592 695.6mb elastic-ip clearml
worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2022-06 0 p STARTED 17636637 1.1gb elastic-ip clearml
events-training_debug_image-d1bd92a3b039400cbafc60a7a5b1e52b 0 p STARTED 135153 31.7mb elastic-ip clearml
.ds-.logs-deprecation.elasticsearch-default-2022.06.15-000014 0 p STARTED elastic-ip clearml
queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2021-10 0 p STARTED 10248782 536.6mb elastic-ip clearml
events-log-d1bd92a3b039400cbafc60a7a5b1e52b 0 p STARTED 37510244 12.8gb elastic-ip clearml
.ds-.logs-deprecation.elasticsearch-default-2022.03.08-000007 0 p STARTED elastic-ip clearml
queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2022-06 0 p STARTED 4273551 245.4mb elastic-ip clearml
.ds-.logs-deprecation.elasticsearch-default-2022.02.08-000005 0 p STARTED elastic-ip clearml
.ds-.logs-deprecation.elasticsearch-default-2022.05.03-000011 0 p STARTED elastic-ip clearml
worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2022-02 0 p STARTED 37301997 2.2gb elastic-ip clearml
.ds-.logs-deprecation.elasticsearch-default-2022.05.31-000013 0 p STARTED elastic-ip clearml
.ds-.logs-deprecation.elasticsearch-default-2022.03.22-000008 0 p STARTED elastic-ip clearml
.ds-ilm-history-5-2022.04.13-000006 0 p STARTED elastic-ip clearml
worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2022-01 0 p STARTED 40993955 2.5gb elastic-ip clearml
.ds-.logs-deprecation.elasticsearch-default-2022.01.11-000003 0 p STARTED elastic-ip clearml
queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2022-02 0 p STARTED 15713573 807.5mb elastic-ip clearml
queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2022-04 0 p STARTED 12258567 634.5mb elastic-ip clearml
.ds-.logs-deprecation.elasticsearch-default-2022.05.17-000012 0 p STARTED elastic-ip clearml
.geoip_databases 0 p STARTED 41 38.9mb elastic-ip clearml
queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2022-01 0 p STARTED 17178186 891.8mb elastic-ip clearml
.ds-ilm-history-5-2022.05.13-000008 0 p STARTED elastic-ip clearml
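To see at a glance which indices dominate, the size column can be converted to bytes and sorted. This is only a sketch: the helper names are made up, the rows are a small hand-picked subset of the first output above, and it assumes the size suffixes are 1024-based.

```python
import re

# Assumed 1024-based multipliers for the size suffixes in the listing.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}


def size_to_bytes(text: str) -> int:
    """Convert a size such as '51.4gb' or '95.6mb' to a byte count."""
    match = re.fullmatch(r"([\d.]+)(b|kb|mb|gb|tb)", text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


# (index, docs.count, store.size) copied from a few rows of the listing.
ROWS = [
    ("events-plot-d1bd92a3b039400cbafc60a7a5b1e52b", 43640, "95.6mb"),
    ("worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2022-01", 40993955, "2.5gb"),
    ("events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b", 374684303, "51.4gb"),
    ("events-log-d1bd92a3b039400cbafc60a7a5b1e52b", 37510243, "12.8gb"),
]


def largest_first(rows):
    """Sort rows by store size, biggest index first."""
    return sorted(rows, key=lambda row: size_to_bytes(row[2]), reverse=True)


for name, docs, size in largest_first(ROWS):
    print(f"{size:>8}  {docs:>11}  {name}")
```

The scalar events index is by far the largest, and it sits on a single primary shard, so any delete-by-query against it has to work through that one shard.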

  
  
Posted 2 years ago

What are the env vars passed to ES in k8s?

  
  
Posted 2 years ago

I’ve tried with these two:
>>> client.tasks.get_all(system_tags=["archived"])
+----------------------------------+------------------------------------------------------------+
| id | name |
+----------------------------------+------------------------------------------------------------+
| 378c8e80c3dd4ff8901f04f00824acbd | ab-ai-767-easy |
| c575db3f302441c6a977f52c060c135d | ab-ai-767-hard |
This is the output for the first task, ab-ai-767-easy:
# curl -XGET " "
{
  "completed" : true,
  "task" : {
    "node" : "gjlBdFdETTqe3snnYbTcGQ",
    "id" : 9856290,
    "type" : "transport",
    "action" : "indices:data/write/delete/byquery",
    "status" : { "total" : 0, "updated" : 0, "created" : 0, "deleted" : 0, "batches" : 0, "version_conflicts" : 0, "noops" : 0, "retries" : { "bulk" : 0, "search" : 0 }, "throttled_millis" : 0, "requests_per_second" : -1.0, "throttled_until_millis" : 0 },
    "description" : "delete-by-query [events-*-d1bd92a3b039400cbafc60a7a5b1e52b]",
    "start_time_in_millis" : 1655723441902,
    "running_time_in_nanos" : 19219813692,
    "cancellable" : true,
    "cancelled" : false,
    "headers" : { }
  },
  "response" : { "took" : 19217, "timed_out" : false, "total" : 0, "updated" : 0, "created" : 0, "deleted" : 0, "batches" : 0, "version_conflicts" : 0, "noops" : 0, "retries" : { "bulk" : 0, "search" : 0 }, "throttled" : "0s", "throttled_millis" : 0, "requests_per_second" : -1.0, "throttled_until" : "0s", "throttled_until_millis" : 0, "failures" : [ ] }
}
and for the second:
root@elasticsearch-7859849f67-8755p:/usr/share/elasticsearch# curl -XPOST -H "Content-Type: application/json" " " -d'{"query": {"term": {"task": "c575db3f302441c6a977f52c060c135d"}}}'
{"task":"gjlBdFdETTqe3snnYbTcGQ:9857749"}
root@elasticsearch-7859849f67-8755p:/usr/share/elasticsearch#
root@elasticsearch-7859849f67-8755p:/usr/share/elasticsearch# curl -XGET " "
{
  "completed" : true,
  "task" : {
    "node" : "gjlBdFdETTqe3snnYbTcGQ",
    "id" : 9857749,
    "type" : "transport",
    "action" : "indices:data/write/delete/byquery",
    "status" : { "total" : 0, "updated" : 0, "created" : 0, "deleted" : 0, "batches" : 0, "version_conflicts" : 0, "noops" : 0, "retries" : { "bulk" : 0, "search" : 0 }, "throttled_millis" : 0, "requests_per_second" : -1.0, "throttled_until_millis" : 0 },
    "description" : "delete-by-query [events-*-d1bd92a3b039400cbafc60a7a5b1e52b]",
    "start_time_in_millis" : 1655723651286,
    "running_time_in_nanos" : 16276854116,
    "cancellable" : true,
    "cancelled" : false,
    "headers" : { }
  },
  "response" : { "took" : 16276, "timed_out" : false, "total" : 0, "updated" : 0, "created" : 0, "deleted" : 0, "batches" : 0, "version_conflicts" : 0, "noops" : 0, "retries" : { "bulk" : 0, "search" : 0 }, "throttled" : "0s", "throttled_millis" : 0, "requests_per_second" : -1.0, "throttled_until" : "0s", "throttled_until_millis" : 0, "failures" : [ ] }
}
But I still see these tasks in the web interface and in the output from the API,
although in the output above I see that these tasks were removed successfully:
"completed" : true,

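A note on reading that response: "completed" only says the Elasticsearch operation finished, not that anything was removed. The sketch below uses a trimmed copy of the first response above, with a made-up helper name, and pulls out the fields that show what the delete-by-query actually did.

```python
import json

# Trimmed copy of the first task response quoted above.
SAMPLE = """
{"completed": true,
 "task": {"action": "indices:data/write/delete/byquery",
          "description": "delete-by-query [events-*-d1bd92a3b039400cbafc60a7a5b1e52b]"},
 "response": {"took": 19217, "timed_out": false, "total": 0, "deleted": 0,
              "version_conflicts": 0, "failures": []}}
"""


def summarize_delete_task(raw: str) -> dict:
    """Separate 'the operation finished' from 'documents were deleted'."""
    doc = json.loads(raw)
    response = doc.get("response", {})
    return {
        "finished": doc.get("completed", False),
        "matched": response.get("total", 0),      # documents the query found
        "deleted": response.get("deleted", 0),    # documents actually removed
        "seconds": response.get("took", 0) / 1000,
        "failures": len(response.get("failures", [])),
    }


summary = summarize_delete_task(SAMPLE)
print(summary)
```

In this case the query matched zero event documents and still took about 19 seconds. It also only touches the events indices in ES, so the task entries shown in the web interface are not affected by it.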
  
  
Posted 2 years ago

- env:
  - name: bootstrap.memory_lock
    value: "true"
  - name: cluster.name
    value: clearml
  - name: cluster.routing.allocation.node_initial_primaries_recoveries
    value: "500"
  - name: cluster.routing.allocation.disk.watermark.low
    value: 500mb
  - name: cluster.routing.allocation.disk.watermark.high
    value: 500mb
  - name: cluster.routing.allocation.disk.watermark.flood_stage
    value: 500mb
  - name: discovery.zen.minimum_master_nodes
    value: "1"
  - name: discovery.type
    value: "single-node"
  - name: http.compression_level
    value: "1"
  - name: node.ingest
    value: "true"
  - name: node.name
    value: clearml
  - name: reindex.remote.whitelist
    value: '*.*'
  - name: xpack.monitoring.enabled
    value: "false"
  - name: xpack.security.enabled
    value: "false"
  - name: ES_JAVA_OPTS
    value: "-Xms8g -Xmx8g -Dlog4j2.formatMsgNoLookups=true"

  
  
Posted 2 years ago

Yeah, we're constantly trying to improve that... 🙂

  
  
Posted 2 years ago

Recently, the PV ran out of free space and the cluster switched to read_only_allow_delete. I tried removing old experiments, but it didn’t help and I got the same error.
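For reference, Elasticsearch sets the index.blocks.read_only_allow_delete block when the flood-stage disk watermark is exceeded, and the block can be cleared through the index settings API once space has been freed. Below is a hedged sketch that only builds the request and does not send it. The base URL is an assumption (localhost:9200, as in the health check earlier in this thread), and the helper name is made up.

```python
import json
import urllib.request


def build_unblock_request(base_url: str = "http://localhost:9200") -> urllib.request.Request:
    """Build a PUT that resets the read-only block on all indices.

    Setting the value to null restores the default (no block).
    base_url is an assumption; point it at your own ES service.
    """
    body = json.dumps({"index.blocks.read_only_allow_delete": None}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_all/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_unblock_request()
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request, timeout=30)
```

Recent ES versions are documented to release the block automatically once disk usage drops below the high watermark, so this may not be needed after the PV is enlarged.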

  
  
Posted 2 years ago

Then I changed the size of the PV and added an extra 50Gb
Looks like it helped and now the service is working, but I still get this bug.

  
  
Posted 2 years ago

Anyway, it would be very cool if the site had additional information on troubleshooting or backups.

  
  
Posted 2 years ago

Yet the experiments stopped normally. The experiment page says it was aborted, but at the same time I still see it on the dashboard.

  
  
Posted 2 years ago

and I still see this error in the logs
[2022-06-20 13:24:27,777] [9] [WARNING] [elasticsearch] POST [status:N/A request :60.060s]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 449, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 444, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib64/python3.6/http/client.py", line 1346, in getresponse
    response.begin()
  File "/usr/lib64/python3.6/http/client.py", line 307, in begin
    version, status, reason = self._read_status()
  File "/usr/lib64/python3.6/http/client.py", line 268, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib64/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
socket.timeout: timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/elasticsearch/connection/http_urllib3.py", line 252, in perform_request
    method, url, body, retries=Retry(False), headers=request_headers, **kw
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 786, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/usr/local/lib/python3.6/site-packages/urllib3/util/retry.py", line 525, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.6/site-packages/urllib3/packages/six.py", line 770, in reraise
    raise value
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 710, in urlopen
    chunked=chunked,
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 451, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 341, in _raise_timeout
    self, url, "Read timed out. (read timeout=%s)" % timeout_value
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='elasticsearch-service', port='9200'): Read timed out. (read timeout=60)

  
  
Posted 2 years ago

Developers complain that experiments hang in the Pending status for a long time,
more than 10 minutes.

  
  
Posted 2 years ago

value: "-Xms8g -Xmx8g -Dlog4j2.formatMsgNoLookups=true"
I would recommend using at least:
value: "-Xms16g -Xmx16g -Dlog4j2.formatMsgNoLookups=true"

  
  
Posted 2 years ago

ResponsiveCamel97 , can you send the output of:
curl -XGET
and:
curl -XGET

  
  
Posted 2 years ago

apiserver [2022-06-19 08:32:51,912] [10] [WARNING] [urllib3.connectionpool] Connection pool is full, discarding connection: elasticsearch-service. Connection pool size: 10

This is just a warning and can be disregarded - it only means an unused connection is discarded, nothing more.

  
  
Posted 2 years ago

Only when you try to delete these tasks?

  
  
Posted 2 years ago

With what memory setting do you run ES? How much memory and cpu is currently occupied by ES container?

  
  
Posted 2 years ago

I recovered the ES data from the backup
It helped.

  
  
Posted 2 years ago

At the moment ES has the following resources:
Limits:
  cpu: 2
  memory: 10G
Requests:
  cpu: 2
  memory: 10G
We launched ES with these parameters at the time of the problems.

  
  
Posted 2 years ago

And adjusting the pod allocation accordingly

  
  
Posted 2 years ago

It seems your server has issues with the ES service. This should be a general problem and not related to the delete itself. Can you try running sudo docker ps ?

  
  
Posted 2 years ago

I also tried deleting tasks through the API, like this:
` >>> from clearml_agent import APIClient
>>> client = APIClient()
>>> client.tasks.get_all(system_tags=["archived"])
+----------------------------------+------------------------------------------------------------+
| id | name |
+----------------------------------+------------------------------------------------------------+
| 41cb804da24747abb362fb5ca0414fe6 | 15.0.95 |
>>> client.tasks.delete('41cb804da24747abb362fb5ca0414fe6')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/site-packages/clearml_agent/backend_api/session/client/client.py", line 374, in new_func
    return Response(self.session.send(request_cls(*args, **kwargs)))
  File "/usr/local/lib/python3.9/site-packages/clearml_agent/backend_api/session/client/client.py", line 122, in send
    raise APIError(result)
clearml_agent.backend_api.session.client.client.APIError: APIError: code 400/101: Invalid task id: id=41cb804da24747abb362fb5ca0414fe6, company=d1bd92a3b039400cbafc60a7a5b1e52b `
But it doesn’t work either.

  
  
Posted 2 years ago

Hi ResponsiveCamel97 , the shards and indices stats look fine. Can you please try the async delete of the task data? You can run the following line in the shell inside the apiserver container. Just replace <task_id> with your actual task id
curl -XPOST -H "Content-Type: application/json" " " -d'{"query": {"term": {"task": "<task_id>"}}}'
You should get something like this in response:
{"task":"p6350SG7STmQALxH-E3CLg:1426125"}
Then you can periodically ping ES for the status of the running operation:
curl -XGET " <copy here the ES task that you received above>"
Let's see how much time the async delete task will eventually take and how much data will be deleted.
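The two steps above can also be sketched in Python. The URLs in the commands did not survive in this post, so everything about the endpoints here is an assumption: the sketch uses the standard Elasticsearch `_delete_by_query?wait_for_completion=false` and `_tasks/<id>` APIs, the host and port from the error log, and the index pattern that appears in the task description elsewhere in this thread. It only builds the requests and does not send them.

```python
import json
import urllib.request

# Assumptions: host/port as in the error log, index pattern as in the
# "delete-by-query [...]" task description quoted in this thread.
ES_URL = "http://elasticsearch-service:9200"
INDEX_PATTERN = "events-*-d1bd92a3b039400cbafc60a7a5b1e52b"


def build_async_delete(task_id: str) -> urllib.request.Request:
    """POST a delete-by-query that returns an ES task id immediately."""
    body = json.dumps({"query": {"term": {"task": task_id}}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{ES_URL}/{INDEX_PATTERN}/_delete_by_query?wait_for_completion=false",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def build_status_check(es_task: str) -> urllib.request.Request:
    """GET the status of a running or finished ES task, e.g. 'node:12345'."""
    return urllib.request.Request(url=f"{ES_URL}/_tasks/{es_task}", method="GET")


delete_req = build_async_delete("<task_id>")
status_req = build_status_check("p6350SG7STmQALxH-E3CLg:1426125")
print(delete_req.get_method(), delete_req.full_url)
print(status_req.get_method(), status_req.full_url)
# To actually send: urllib.request.urlopen(delete_req, timeout=30)
```

Running the delete asynchronously avoids the 60 second read timeout on the client side, because the request returns as soon as ES has registered the task.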

  
  
Posted 2 years ago

Delete, reset

It looks like something is wrong with the index:
index shard time type stage source_host source_node target_host target_node repository snapshot files files_recovered files_percent files_total bytes bytes_recovered bytes_percent bytes_total translog_ops translog_ops_recovered translog_ops_percent
events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b 0 2.4h existing_store done n/a n/a 10.18.13.96 clearml n/a n/a 0 0 100.0% 238 0 0 100.0% 55032286631 959750 959750 100.0%
The high recovery time, translog_ops and translog_ops_recovered values are very confusing.
We have the same ClearML setup in a staging environment for tests, and when Elasticsearch is restarted there, everything is fine:
index shard time type stage source_host source_node target_host target_node repository snapshot files files_recovered files_percent files_total bytes bytes_recovered bytes_percent bytes_total translog_ops translog_ops_recovered translog_ops_percent
events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b 0 5s existing_store done n/a n/a 10.18.11.137 clearml n/a n/a 0 0 100.0% 253 0 0 100.0% 53429363732 0 0 100.0%
How can I solve this problem with the index without deleting it?

And there are many of the following warnings in the API logs:
apiserver [2022-06-19 08:32:51,912] [10] [WARNING] [urllib3.connectionpool] Connection pool is full, discarding connection: elasticsearch-service. Connection pool size: 10

  
  
Posted 2 years ago

And developers complain to me that they can’t start experiments:
APIError: code 500/100: General data error (ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='elasticsearch-service', port='9200'): Read timed out. (read timeout=60)))
Failed deleting old session ffaa2192fb9045359e7c9827ff5e1e55
APIError: code 500/100: General data error (ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='elasticsearch-service', port='9200'): Read timed out. (read timeout=60)))
Failed deleting old session 63bd918c23d74108ae1c74a373435f01

  
  
Posted 2 years ago

The tasks themselves will stay until you succeed in deleting them from the client. Here we tried to see why deleting their data from ES timed out. From what I see, no data was actually deleted (most likely because the previous delete attempts did delete the data, even though they caused a timeout in the apiserver). What seems problematic is the amount of time that each operation took (19 and 16 seconds). It may be due to insufficient memory/CPU allocation to the ES container or to the 50Gb index size.

  
  
Posted 2 years ago

I just hid the elastic IP in the second output.

  
  
Posted 2 years ago

OK, let’s try,
but it’s a lot of resources.

  
  
Posted 2 years ago