It looks like the problem is related to one of the indices:
` index shard time type stage source_host source_node target_host target_node repository snapshot files files_recovered files_percent files_total bytes bytes_recovered bytes_percent bytes_total translog_ops translog_ops_recovered translog_ops_percent
events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b 0 2.4h existing_store done n/a n/a 10.18.13.96 cle...
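The columns above match the _cat/recovery API; a command like the following should reproduce this view, assuming the same local ES endpoint as the health check below:
` # list shard recoveries with column headers
curl -s "localhost:9200/_cat/recovery?v"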
What's interesting is that ClearML deletes new experiments without any problems, but it refuses to remove the old archived experiments.
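For what it's worth, the event indices for the old experiments can be listed directly to confirm they still exist (the events-* pattern is taken from the index name in the recovery output above, and the same local endpoint is assumed):
` # list event indices with document count and on-disk size
curl -s "localhost:9200/_cat/indices/events-*?v&h=index,health,docs.count,store.size"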
The infrastructure runs in k8s.
But when I check the health of the cluster with curl localhost:9200/_cluster/health, I get a green status:
` {"cluster_name":"clearml","status":"green","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"active_primary_shards":41,"active_shards":41,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":0,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_nu...
I just hid the Elastic IP in the second output.
` - env:
    - name: bootstrap.memory_lock
      value: "true"
    - name: cluster.name
      value: clearml
    - name: cluster.routing.allocation.node_initial_primaries_recoveries
      value: "500"
    - name: cluster.routing.allocation.disk.watermark.low
      value: 500mb
    - name: cluster.routing.allocation.disk.watermark.high
      value: 500mb
    - name: cluster.routing.allocation.disk.watermark.flood_stage
      value: 500mb
  ...
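Since the watermarks are set to absolute 500mb values, the effective disk allocation settings can also be dumped straight from the cluster; a minimal check, assuming the same local endpoint:
` # show effective settings (defaults + overrides), filtered to the disk watermarks
curl -s "localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true&pretty" | grep watermark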
At the moment ES has the following resources:
` Limits:
    cpu: 2
    memory: 10G
  Requests:
    cpu: 2
    memory: 10G
ES was launched with these parameters at the time the problems occurred.