👍 I would suggest either deploying an Elasticsearch cluster consisting of several nodes with replication, or doing daily backups:
https://www.elastic.co/guide/en/elasticsearch/reference/7.17/snapshot-restore.html
Apart from that, it is worth making sure that ES is running in a stable environment (no abrupt restarts) and with enough RAM.
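For the daily backup option, a minimal sketch of what a snapshot request could look like, using only the Python standard library. It assumes a snapshot repository has already been registered in Elasticsearch; the repository name `clearml_backup`, the `daily-YYYY.MM.DD` naming scheme and the helper name are hypothetical and not taken from the linked guide.

```python
import datetime
import urllib.request


def build_snapshot_request(es_url, repository, day):
    """Build (but do not send) a PUT request that creates a dated snapshot.

    Assumption: `repository` is a snapshot repository that was already
    registered in Elasticsearch. The snapshot name format is illustrative.
    """
    snapshot_name = f"daily-{day:%Y.%m.%d}"
    url = (
        f"{es_url.rstrip('/')}/_snapshot/{repository}/{snapshot_name}"
        "?wait_for_completion=true"
    )
    return urllib.request.Request(url, method="PUT")


if __name__ == "__main__":
    req = build_snapshot_request(
        "http://localhost:9200", "clearml_backup", datetime.date.today()
    )
    print(req.get_method(), req.full_url)
    # To actually trigger the snapshot, pass the request to
    # urllib.request.urlopen(req) from a daily scheduled job.
```

Building the request separately from sending it makes it easy to check the URL before pointing the script at a live cluster.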
Hi ExasperatedCrocodile76, what version of the ClearML server are you using? You can see it in the bottom right corner of the Settings screen.
Hi ReassuredTiger98, what version of the apiserver are you using?
It seems that the index events-log-d1bd92a3b039400cbafc60a7a5b1e52b got corrupted. If there are no backups, the only option is to delete this index from Elasticsearch.
MassiveHippopotamus56 The data that you posted from the browser developer tools seems to come from the "Headers" tab. Can you please post the data from the "Payload" and "Response" tabs? These are the tab names in Chrome. In other browsers the tabs may have different names.
Another option that should work for the upgrade script is to pass an environment variable that disables xpack (the feature that requires licensing) for the ES5 docker container. It can be done as follows:
python elastic_upgrade.py --extra-source-env xpack.security.enabled=false
Hi QuaintJellyfish58. For issue #229: we found and fixed the problem. The fix will be available in the coming patch for the v1.14 release. For issue #228, I requested more info from you on GitHub.
Hi H4dr1en, there is a chance that the problem is in the parallel reindexing of data. You can try replacing parallel=max(docker_resources.cpus // 2, 1)
at line 190 with
parallel=1
I think you will need to remove the /opt/trains/data/elastic_7 folder before restarting the script.
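To see how much parallelism the default expression quoted in that message gives, here is a small sketch that evaluates it for a few CPU counts. The function name `default_parallel` is hypothetical; only the expression `max(cpus // 2, 1)` comes from the message.

```python
def default_parallel(cpus):
    """Evaluate the expression quoted from the upgrade script.

    Half of the available CPUs, rounded down, but never less than 1.
    """
    return max(cpus // 2, 1)


if __name__ == "__main__":
    for cpus in (1, 2, 4, 8):
        print(f"cpus={cpus} -> parallel={default_parallel(cpus)}")
```

On a machine with 4 or more CPUs the default is therefore at least 2 parallel workers, which is why hard-coding parallel=1 is a useful way to rule out concurrency as the cause.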
I am not sure about the reasons. What you can do is back up your /opt/trains/data folder periodically (preferably stopping docker compose first). Another possibility is to configure your Elasticsearch to run as a cluster with 2 or more nodes on the same or different machines. This will allow Elasticsearch to replicate your indices to other nodes.
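A periodic folder backup along those lines could be sketched as follows. The function name, the destination directory and the timestamped archive naming are assumptions for illustration; the sketch does not stop or start docker compose, so that step would still need to be done around it.

```python
import tarfile
import time
from pathlib import Path


def backup_folder(data_dir, backup_dir):
    """Pack data_dir into a timestamped .tar.gz archive inside backup_dir.

    Assumption: the services writing to data_dir have been stopped, so the
    files are not changing while they are archived.
    """
    data_dir = Path(data_dir)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = backup_dir / f"{data_dir.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Store paths relative to the folder name, not the absolute path.
        tar.add(data_dir, arcname=data_dir.name)
    return archive


if __name__ == "__main__":
    # Hypothetical destination; adjust to a disk with enough free space.
    print(backup_folder("/opt/trains/data", "/opt/trains/backups"))
```

Running this from a scheduled job after stopping the containers gives a set of dated archives to restore from if an index becomes corrupted.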
Hi WittyOwl57, there is a chance that the reason is in this setting: Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log ...
First, it reports an invalid log option, which may require further investigation. Second, the file that it tries to write to is logs/gc.log, and it is not under the $clearml folder where you gave write permissions to the ES user. I would try disabling JVM logging altogether or specifying the full path to the file so that it would be under the folder that has 1000:1000 o...
Can you run 'ls -al' in the /opt/trains/data folder and also in the /opt/trains/data/elastic_7 folder and send the output?
Sure, you can delete it with the following command:
curl -XDELETE "http://localhost:9200/events-plot-d1bd92a3b039400cbafc60a7a5b1e52b"
Once deleted, it will be automatically recreated by the API server, and you should see the plots from the new tasks that you run afterwards.
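The same delete call can be sketched in Python with the standard library, which helps avoid stray whitespace ending up in the URL. The helper name is hypothetical, and the sketch only builds the request; sending it permanently removes the index, so it is left as an explicit, commented-out step.

```python
import urllib.request


def build_delete_index_request(es_url, index_name):
    """Build (but do not send) a DELETE request for one Elasticsearch index.

    Surrounding whitespace is stripped so that a copy-pasted URL or index
    name does not produce a malformed request.
    """
    url = f"{es_url.strip().rstrip('/')}/{index_name.strip()}"
    return urllib.request.Request(url, method="DELETE")


if __name__ == "__main__":
    req = build_delete_index_request(
        "http://localhost:9200",
        "events-plot-d1bd92a3b039400cbafc60a7a5b1e52b",
    )
    print(req.get_method(), req.full_url)
    # Destructive: uncomment only when you are sure the index should go.
    # urllib.request.urlopen(req)
```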