Hi H4dr1en, there is a chance that the problem is in the parallel reindexing of the data. You can try to replace parallel=max(docker_resources.cpus // 2, 1)
at line 190 with
parallel=1
I think you will need to remove the /opt/trains/data/elastic_7 folder before restarting the script
There should be a log file in the directory where you run the script. It contains more info. Can you please send me the log?
The volumes section of the elasticsearch service looks OK to me:
/opt/trains/data/elastic_7:/usr/share/elasticsearch/data
Hi IdealPanda97, can you please check your available disk space and available RAM? According to the logs, all the services (Elastic, Mongo, Redis) fail to start.
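For example, with the standard Linux commands (the /opt/trains path is where the data is usually mounted, adjust if yours differs):
df -h /opt/trains
free -h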
I mean it is not possible to open v3.6 data in version 4.4. That's why steps 3-10 are there
Enjoy the new version:) It would still be interesting to see what caused ES7 to stop responding.
MassiveHippopotamus56 The data that you posted from the browser developer tools seems to be coming from the "Headers" tab. Can you please post the data from the "Payload" and "Response" tabs? This is in case you are using Chrome; in other browsers the tabs may have different names
Ok, so there is no mapping for the whole config folder or for the specific config file that you changed. That's why async_delete does not get your updated configuration. You can do one of the following: either add a mapping here for the specific file like you did earlier, or map the whole config folder like the apiserver service does:
- /opt/clearml/config:/opt/clearml/config
The second way is probably more flexible. See the sketch below for how it would look in the docker compose.
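A minimal sketch of the docker-compose fragment, assuming your service is named async_delete as in the default compose file (the service name and paths may differ in your deployment):
  async_delete:
    volumes:
      - /opt/clearml/config:/opt/clearml/config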
Thanks for the update. What can be seen from the log is that for some reason, after copying a couple of indices, Elasticsearch 7 becomes unavailable. I think we can find the reason in the Elasticsearch 7 logs. I can send you the instructions on how to proceed (it requires a minimal change to the upgrade script so that the upgrade containers are not removed after the script runs, and an inspection of the ES7 logs)
To run the old version of Trains, the same setting can be added to the elasticsearch environment section in the docker compose
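A sketch of what that would look like, assuming the setting in question is the xpack flag from the workaround mentioned below (your compose file layout may differ):
  elasticsearch:
    environment:
      - xpack.security.enabled=false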
IdealPanda97 Ok, I see. Can you please run the following command, then restart docker-compose and see if it makes any difference?
sudo chown -R 1000:1000 /opt/trains
Great:) Then let's try to get the logs. Maybe we can get them without changing the upgrade script. Please run 'sudo docker ps -a'. If you see the exited container with the name 'elastic-upgrade-7', then please save its logs to a file with the below command and send the file to me:
docker logs <container_id_for_elastic-upgrade-7> >& elastic_logs.txt
Hi @<1523701868901961728:profile|ReassuredTiger98> , how exactly do you override the values in the storage_credentials file? Do you prepare a new docker image with the changed file, map this file from outside with a volume mapping in the docker compose, or go through the env variables? What is also important is that you do this override for the async_delete service. It is the service that actually uses the storage credentials, not the apiserver itself
Sure, you delete it with the following command:
curl -XDELETE "http://localhost:9200/events-plot-d1bd92a3b039400cbafc60a7a5b1e52b"
Once deleted, it will be automatically recreated by the api server and you should see the plots from the new tasks that you run afterwards
IdealPanda97 What can be seen now is that some of the indices (at least queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2020-08) are in a corrupted state. This can be the result of an abnormal termination of ES or some other situation. The queue metrics index is not particularly important, but there may be other indices that are also corrupted. To map the cluster and indices state you can issue the following commands (with the running ES5 docker container). Look for the "red" statuses in the out...
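For reference, the usual commands for this (assuming ES is reachable on localhost:9200) are the standard health/cat APIs:
curl http://localhost:9200/_cluster/health?pretty
curl http://localhost:9200/_cat/indices?v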
IdealPanda97 It seems that the expired ES5 license is the reason both for the upgrade failing and for the inability to run Trains v0.15. The license is free, but the ways to renew it differ between ES5 and ES6/7. For ES5 the procedure is more complicated and is described in the medium article that I sent earlier. In the attached thread another user applied it and it solved the issue. The article describes 2 possible solutions: turning off the xpack when running Elastic and retrieving th...
If you run the following command 'sudo chown -R 1000:1000 /opt/trains', does it change anything?
There is a "License expired" message for the Elasticsearch 5. Try running the following command when your old trains docker is up:
http://localhost:9200/_xpack/license/start_basic
Thanks! In this log it mentions that the source elastic 5 has failed during the reindex process. Can you also share the logs from the 'elastic-upgrade' service?
If it returns an OK result, then rerun the upgrade process.
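For example, assuming the script name and paths from this thread (remove the partially migrated data first, as noted above):
sudo rm -rf /opt/trains/data/elastic_7
python elastic_upgrade.py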
Another option that should work for the upgrade script is to pass an environment variable that disables xpack (the feature that requires licensing) for the ES5 docker container. It can be done as follows:
python elastic_upgrade.py --extra-source-env xpack.security.enabled=false
Can you run 'ls -al' in the /opt/trains/data folder and also in the /opt/trains/data/elastic_7 folder and send the output?
Oh, I see:( It turned out that the --extra-source-env option was not officially released yet. But the script that supports it can be downloaded from here: https://github.com/allegroai/trains-server/files/5080286/upgrade.zip
Sorry, I did not write it properly. You need to run the following curl command from the command line:
curl -XPOST 'http://localhost:9200/_xpack/license/start_basic'
Hi IdealPanda97 , can you share the logs for the 'elastic-upgrade-7' docker container? According to the upgrade log there was a problem with Elasticsearch during the copying of indices.
Here is the thread where the same issue was solved: https://allegroai-trains.slack.com/archives/CTK20V944/p1596724607016500
Hi @<1523707653782507520:profile|MelancholyElk85> , what version of the apiserver are you using?
We found the issue. It will be fixed in the upcoming patch for the open-v1.14 release
Hi UnevenDolphin73 , how many artifacts do you have on this task? We are storing task metadata in Mongo and there is a limit of 16MB per single document. While the artifact itself is not stored under the task, there is some metadata (notably the uri and display_data/preview) that is stored for each artifact
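If you want to see how close the task document is to that limit, a sketch (the clearml-mongo container name and the backend db / task collection names are assumptions, they may differ in your deployment):
sudo docker exec -it clearml-mongo mongo backend
Object.bsonsize(db.task.findOne({_id: "<your_task_id>"}))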