Please run these commands and see if you have any "red" statuses in the output:
curl " http://localhost:9200/_cluster/health?pretty "
curl " http://localhost:9200/_cluster/health?level=indices&pretty "
@<1523701066867150848:profile|JitteryCoyote63> The requirements list the client library that the apiserver uses to access Elasticsearch. This library is capable of working with both Elasticsearch 7 and 8.
If you run the following command, 'sudo chown -R 1000:1000 /opt/trains', does it change anything?
MassiveHippopotamus56 The data that you posted from the browser developer tools seems to be coming from the "Headers" tab. Can you please post the data from the "Payload" and "Response" tabs? This applies if you are running Chrome; in other browsers the tabs may have different names.
Thanks for the update. What can be seen from the log is that, for some reason, after copying a couple of indices Elasticsearch 7 becomes unavailable. I think we can find the reason in the Elasticsearch 7 logs. I can send you instructions on how to proceed (it requires a minimal change to the upgrade script so that the upgrade containers are not removed after the script runs, and an inspection of the ES7 logs).
Hi QuaintJellyfish58 , I am investigating the issue. Can you please also send the request and response from projects.get_all when you are in the Team's Work view (the case where there is no undefined project)?
Enjoy the new version:) Would still be interesting to see what caused ES7 to stop responding.
There should be a log file in the directory where you run the script. It contains more info. Can you please send me the log?
SubstantialBaldeagle49 This should collect the logs: 'sudo docker logs trains-apiserver >& apiserver.logs'
SubstantialBaldeagle49 The log looks OK. Where do you see the error?
Setting up an Elasticsearch cluster requires some devops work. You can search for "setup elasticsearch 7 cluster" on the internet; there are some tutorials there. Stopping your docker-compose periodically and backing up the /opt/trains/data folder is more straightforward, and it would also back up the data that we store in MongoDB. A minimal sketch of such a backup is below.
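For illustration only (this assumes a default installation under /opt/trains with the docker-compose.yml in the current directory; the tar archive is just one way to capture the folder):
# stop the services so the data files are in a consistent state
sudo docker-compose down
# archive the data folder (contains the Elasticsearch and MongoDB data)
sudo tar czf trains-data-backup-$(date +%F).tar.gz /opt/trains/data
# bring the services back up
sudo docker-compose up -d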
Hi @<1523701457835003904:profile|AbruptHedgehog21> can you please share the logs from the async_delete service? It is responsible for the actual deletion of the data
Ok, I see. Then you can enter the apiserver container:
sudo docker exec -it clearml-apiserver /bin/bash
And run the following commands inside the container:
curl -XGET
curl -XGET
@<1523701868901961728:profile|ReassuredTiger98> Strange:( In 1.10 we already had the code for clearing ES scrolls created during task deletion. I would recommend upgrading to the latest release, v1.12.1, anyway. In addition, you can instruct ES to allow more open scrolls, as in the sketch below. By default it is limited to 500.
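Something along these lines should raise the limit (a sketch only; it assumes ES is reachable on localhost:9200, and the value 1000 is just an example):
# raise the cluster-wide limit on open scroll contexts (default is 500)
curl -XPUT "http://localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "search.max_open_scroll_context": 1000
  }
}'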
Great:) Then let's try to get the logs. Maybe we can get them without changing the upgrade script. Please run 'sudo docker ps -a'. If you see an exited container with the name 'elastic-upgrade-7', then please save its logs to a file with the below command and send the file to me:
docker logs <container_id_for_elastic-upgrade-7> >& elastic_logs.txt
Hi @<1673863788857659392:profile|HomelyRabbit25> , yes, it should include support for the async_delete service. Please provide the storage_credentials configuration to this service instead of the apiserver. To see whether the deletion works or has any issues with the provided configuration, please inspect the logs from the async_delete pod.
Hi QuaintJellyfish58 , thanks for the feedback. I am trying to compare what you send and receive in the Team's view with what you get in the My Work view. Can you please also send the data for the same requests and responses in the My Work view, structured in the same way as what you just sent for the team view?
The index "events-plot-d1bd92a3b039400cbafc60a7a5b1e52b" is red meaning that it is corrupted and elastic cannot work with it. The most straightforward solution would be to delete this index but it will result in all the plots generated so far will be lost.
No, there was a problem with that particular version migration. Creating the temporary index allowed this and all subsequent migrations to run successfully. So for now your DB is properly aligned with the latest ClearML, and future upgrades should work fine.
Can you run 'ls -al' in the /opt/trains/data folder and also in the /opt/trains/data/elastic_7 folder and send the output?
What about the UID for epdadmin user? 'id -u epdadmin'
Hi UnevenDolphin73 , how many artifacts do you have on this task? We store task metadata in Mongo, and there is a limit of 16MB per single document. While the artifact itself is not stored under the task, some metadata (notably the uri and display_data/preview) is stored for each artifact.
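You can get a rough idea of the task document size with something like the following (illustrative only; the clearml-mongo container name, the backend database, and the task collection are assumptions based on a default docker-compose deployment, and $bsonSize requires Mongo 4.4+):
# print the BSON size in bytes of a single task document; replace <task_id>
sudo docker exec -it clearml-mongo mongo backend --quiet --eval 'db.task.aggregate([{$match: {_id: "<task_id>"}}, {$project: {size: {$bsonSize: "$$ROOT"}}}]).forEach(printjson)'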
Hi @<1523701868901961728:profile|ReassuredTiger98> , how exactly do you override the values in the storage_credentials file? Do you prepare a new docker image with the changed file, map the file from outside with a volume mapping in the docker compose, or pass the values through env variables? It is also important that you apply this override to the async_delete service: it is the service that actually uses the storage credentials, not the apiserver itself.
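For reference, the volume-mapping variant could look roughly like this in docker-compose.yaml (a sketch only; the host path and the in-container config path are assumptions and may differ in your deployment):
  async_delete:
    volumes:
      - /opt/clearml/config:/opt/clearml/config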
Are you sure that it was performed fully according to the suggested sequence? The error that you posted says that v3.6 data is incompatible with v4.4 and suggests version 4.2 or earlier. Step 3 starts with Mongo 4.0, which should be able to open v3.6 data, and then a series of gradual upgrades through versions 4.0->4.2->4.4 is performed.
We just uploaded the new update script into
https://github.com/allegroai/trains-server/releases/download/0.16.0/trains-server-0.16.0-migration.zip
It has several improvements, and there is a chance that it will overcome the issue that you are facing. Also, please check that you have enough disk space for copying the ES data.
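A quick way to check the available space (assuming the default /opt/trains data location):
df -h /opt/trains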
Hi @<1686547380465307648:profile|StrongSeaturtle89> , please put the following setting in the docker-compose.yaml under elasticsearch->environment:
ingest.geoip.downloader.enabled: false
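In the compose file it would sit roughly like this (an excerpt only; the surrounding layout is assumed to follow the standard ClearML docker-compose):
  elasticsearch:
    environment:
      ingest.geoip.downloader.enabled: "false"  # quoting keeps compose happy if it insists on string values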
And then restart the docker compose. Does it help?
Hi MortifiedDove27 , you can run the following commands on the clearml server host to get the docker logs for the apiserver and elasticsearch:
sudo docker logs clearml-apiserver > apiserver.logs 2>&1
sudo docker logs clearml-elastic > elastic.logs 2>&1
The index events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b has a red status, meaning that the data for this index got corrupted. Since there are no replicas, the only feasible option is to delete this index. All the training scalar events for the old tasks will be lost then, but newly created tasks should start working fine.
curl -XDELETE
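The full delete would presumably be along these lines (assuming ES is exposed on localhost:9200; verify the index name before running it):
curl -XDELETE "http://localhost:9200/events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b"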
ReassuredTiger98 What are the memory settings for Elasticsearch in your docker compose? If it is 2GB and you have enough memory on your server, then you can try to increase it to 4GB like this: ES_JAVA_OPTS: -Xms4g -Xmx4g
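In the compose file it would sit roughly here (an excerpt, assuming the usual ClearML docker-compose layout):
  elasticsearch:
    environment:
      ES_JAVA_OPTS: -Xms4g -Xmx4g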