Do you see any error in the browser network tab?
Yes, it is safe to set number_of_replicas to 0 and refresh_interval to -1 on the target index before the reindex and then restore them once the reindex is finished
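A minimal sketch with curl, assuming ES is reachable on localhost:9200 and <target_index> stands in for your index name (the values restored afterwards, 1 replica and the default refresh interval, are only an example):
curl -XPUT "localhost:9200/<target_index>/_settings" -H 'Content-Type: application/json' -d '{"index": {"number_of_replicas": 0, "refresh_interval": "-1"}}'
# ... run the reindex ...
curl -XPUT "localhost:9200/<target_index>/_settings" -H 'Content-Type: application/json' -d '{"index": {"number_of_replicas": 1, "refresh_interval": null}}'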
Hi @<1523701868901961728:profile|ReassuredTiger98> , what version of the apiserver are you using?
IdealPanda97 Ok, I see. Can you please run the following command, then restart docker-compose and see if it makes any difference?
sudo chown -R 1000:1000 /opt/trains
Can you run 'ls -al' in the /opt/trains/data folder and also in the /opt/trains/data/elastic_7 folder and send the output?
@<1523701868901961728:profile|ReassuredTiger98> Strange :( in 1.10 we already had the code for clearing the ES scrolls created during task deletion. I would recommend upgrading to the latest release, v1.12.1, anyway. In addition, you can instruct ES to allow more open scrolls, as in the example below. By default it is limited to 500.
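Something along these lines should raise the limit (assuming ES is exposed on localhost:9200; 1000 is just an example value):
curl -XPUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d '{"persistent": {"search.max_open_scroll_context": 1000}}'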
If you run the following command 'sudo chown -R 1000:1000 /opt/trains' does it change anything?
Hi JitteryCoyote63 , are you still missing a month of data in the event logs? If you do cat indices, do you see the same number of docs in the original index and the new ones?
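In case it helps, one quick way to compare, assuming ES is exposed on localhost:9200 and the event indices follow the events-* naming, is:
curl -XGET "localhost:9200/_cat/indices/events-*?v"
The docs.count column shows the number of documents per index.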
Can you share all the error info that you get in the network tab?
Hi DilapidatedDucks58 , I am trying to reproduce the "Connection is full" warning. Do you override any apiserver environment variables in docker-compose? If yes, can you share your version of docker-compose? Do you provide a configuration file for gunicorn? Can you please share it?
This explains the issue, I think. The recovery path would be as follows (a rough shell sketch of the first steps is given after the list):
1. Put down the running containers
2. Restore both the mongo and elastic data from the backup
3. Run the old-version docker containers and make sure that all the data is there
4. Put down the containers
5. Run the upgrade script
6. Start the new version
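For steps 1-3, a very rough shell sketch, assuming the data lives in the default /opt/trains/data folder and the backup is an archive of that folder (the archive name here is hypothetical):
docker-compose down
# restore the backed-up data folder (mongo and elastic both live under /opt/trains/data)
sudo tar -xzf trains-data-backup.tar.gz -C /opt/trains
docker-compose up -d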
At some point we switched from MongoDB v3.6 to v4.4. Upgrading from old versions requires a migration of the mongo data. Did you run the upgrade script as described below? Were there any errors?
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_mongo44_migration/
SubstantialElk6 Both indices that are red are not critical for ClearML functioning and can be deleted like this:
curl -XDELETE '...'
curl -XDELETE '...'
For the analysis of the possible reasons that led to it, can you please collect the full ES logs to a file and send it here?
sudo docker logs clearml-elastic > log.txt 2>&1
Hi SoggyBeetle95 , from what version of clearml did you upgrade? About the tasks that disappeared: do you not see these tasks at all, or do you see them with no results?
Hi SubstantialElk6 , another thing that can be checked is the health of the particular ES indices. Can you please run the command below in the clearml-elastic container and post the results here?
curl -XGET ...
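For reference, a typical per-index health check of this kind, assuming ES inside the container listens on localhost:9200, looks like:
curl -XGET "localhost:9200/_cat/indices?v"
The health column shows green/yellow/red for each index.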
@<1585078752969232384:profile|FantasticDuck7> The best would be to copy this file to the host, edit it and map this file into the container instead of the original one. The single file mapping in the docker-compose file should look like this:
volumes:
  - type: bind
    source: <the path to the config file on the host>
    target: /opt/clearml/apiserver/config/default/services/storage_credentials.conf
You should do it for the async_delete service. Not for the apise...
@<1585078752969232384:profile|FantasticDuck7> What volume mappings do you have for the async_delete service in the docker-compose.yaml file?
Ok, so there is no mapping for the whole config folder or for the specific config file that you changed. That's why async_delete does not get your updated configuration. You can do one of the following: either add here a mapping for the specific file, like you did earlier, or map the whole config folder like the apiserver service does:
- /opt/clearml/config:/opt/clearml/config
The second way is probably more flexible.
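For illustration, with the whole-folder mapping the relevant part of the async_delete service would look roughly like this (all other settings of the service are omitted):
  async_delete:
    volumes:
      - /opt/clearml/config:/opt/clearml/config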
Hi VexedPeacock35 , I suspect that Elasticsearch works too hard and periodically misses timeouts on recording events. How much memory and CPU is it using? Can you increase the memory that is allocated to it and see whether this helps?
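If you want to try it, a common way is to raise the Elasticsearch heap in the docker-compose file through ES_JAVA_OPTS (the 4g below is only an example; adjust it to the memory available on your host):
  elasticsearch:
    environment:
      ES_JAVA_OPTS: -Xms4g -Xmx4g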
Actually the task logs will be lost. The tasks themselves and their reported metrics and plots would stay. The command is the following:
curl -XDELETE localhost:9200/events-log-d1bd92a3b039400cbafc60a7a5b1e52b
Are you running them on the computer that hosts the server docker containers? What is the port binding for elasticsearch in your docker-compose?
Hi IdealPanda97 , can you share the logs for the 'elastic-upgrade-7' docker container? According to the upgrade log there was some problem with Elasticsearch during indices copy.
Yes, the command would be like this: curl -XDELETE "http://localhost:9200/queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2020-08"
If you decide to delete the "red" indices, then you can proceed with the command above, issuing it for each problematic index. The queue metrics index is not very important, but the second one, "events-logs", contains all the log messages produced by your tasks in August. You will still have the debug images and scalar metrics reported by these tasks, but the log messages ...
If it returns an OK result then rerun the upgrade process again.
Setting up an Elasticsearch cluster requires some devops work. You can search for "setup elasticsearch 7 cluster" on the internet; there are some tutorials there. Stopping your docker-compose once in a certain period of time and backing up the /opt/trains/data folder is more straightforward, and it would also back up the data that we store in MongoDB.
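A minimal sketch of such a periodic backup, assuming the default /opt/trains location (the archive name is just a placeholder):
docker-compose down
sudo tar -czf trains-data-$(date +%F).tar.gz -C /opt/trains data
docker-compose up -d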
Here is the thread with solving the same issue: https://allegroai-trains.slack.com/archives/CTK20V944/p1596724607016500
Hi RattyFish27 , it seems that there is some issue with the Elasticsearch cluster. Can you please run the following commands on the server and paste their output here?
curl -XGET ...
curl -XGET ...
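For reference, typical cluster diagnostics of this kind, assuming ES is exposed on localhost:9200, would be:
curl -XGET "localhost:9200/_cluster/health?pretty"
curl -XGET "localhost:9200/_cat/indices?v"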
Sorry, I did not write it properly. You need to run the following curl command from the command line:
curl -XPOST 'http://localhost:9200/_xpack/license/start_basic'
It seems that the index events-log-d1bd92a3b039400cbafc60a7a5b1e52b got corrupted. In case there are no backups, the only choice would be to delete this index from Elasticsearch.
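If there really are no backups, the delete would take the same form as shown earlier in the thread (assuming ES is exposed on localhost:9200):
curl -XDELETE localhost:9200/events-log-d1bd92a3b039400cbafc60a7a5b1e52b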