Hi QuaintJellyfish58, thanks for the feedback. I am trying to compare what you send and receive for the team view with what you get in the My work view. Can you please also send the same requests and responses for the My work view, structured the same way as the ones you sent for the team view?
Can you share all the error info that you get in the network tab?
Hi IdealPanda97, can you please check your available disk space and available RAM? According to the logs, all the services (Elastic, Mongo, Redis) fail to start.
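If it helps, here is a minimal Python sketch for reading both numbers on a Linux host. The function names are made up for illustration, and the path you pass in is an assumption: use whichever folder actually holds your server data.

```python
import shutil


def free_disk_gib(path="/"):
    """Return the free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 1024 ** 3


def mem_available_gib(meminfo_text):
    """Parse the contents of /proc/meminfo and return MemAvailable in GiB.

    Returns None when the MemAvailable line is missing.
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # The value is reported in kB, so divide by 1024^2 to get GiB.
            return int(line.split()[1]) / 1024 ** 2
    return None
```

On the server you could call free_disk_gib("/opt/clearml") and mem_available_gib(open("/proc/meminfo").read()) and send the two values.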
Ok, so there is no mapping for the whole config folder or for the specific config file that you changed. That's why async_delete does not get your updated configuration. You can do one of the following: either add a mapping here for the specific file, like you did earlier, or map the whole config folder like the apiserver service does:
- /opt/clearml/config:/opt/clearml/config
The second way is probably more flexible
I mean it is not possible to open v3.6 data in version 4.4. That's why the steps 3-10 are there
Ok, it seems that elasticsearch ports are open for internal communication but not for the host. Can you please add the following section to elasticsearch service in docker compose and restart the dockers?
ports:
  - "9200:9200"
After that the commands should work from host
This one is indeed dynamic but can be set as follows: "plot_len":{"type":"long"}
Hi WittyOwl57, there is a chance that the reason is in this setting: Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log ...
First, it reports an invalid log option, which may require further investigation. Second, the file that it tries to write to is logs/gc.log, and it is not under the $clearml folder where you gave write permissions to the ES user. I would try disabling JVM logging altogether or specifying the full path to the file so that it would be under the folder that has 1000:1000 o...
IdealPanda97 Is your user id 1000? If not, then this may be the reason, and chown -R 1000:1000 may help. Elasticsearch in the docker runs as user 1000. Another reason may be some other elasticsearch process or docker container running on your machine and holding the lock in the data folder. If there are any, please try stopping them. If neither of the above helps, there is the option of manually deleting the .lock files from the elastic data folder. Of course, the data should be backed up before this....
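To see both things at once (ownership and leftover lock files), something along these lines could be run on the host. It is only a sketch: check_data_folder is a hypothetical helper, it only lists files and never deletes or changes anything, and the folder you pass in depends on your own setup.

```python
import os
from pathlib import Path


def check_data_folder(data_dir, expected_uid=1000):
    """Walk data_dir and report ownership mismatches and lock files.

    Returns a tuple (wrong_owner, lock_files): paths whose owner uid differs
    from expected_uid, and regular files whose name ends with ".lock".
    """
    wrong_owner, lock_files = [], []
    root = Path(data_dir)
    for path in [root, *sorted(root.rglob("*"))]:
        # lstat() reports the owner of the entry itself, not of a symlink target.
        if path.lstat().st_uid != expected_uid:
            wrong_owner.append(str(path))
        if path.is_file() and path.name.endswith(".lock"):
            lock_files.append(str(path))
    return wrong_owner, lock_files
```

A non-empty first list suggests the chown is needed. A non-empty second list shows which lock files exist, so you can decide (after stopping other processes and backing up) whether to remove them.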
Are you running them on the computer that hosts the server docker containers? What is the port binding for elasticsearch in your docker compose?
What can be seen in the logs is that for some reason Elasticsearch had an internal failure when trying to perform the plots query. I will send you the instructions on how to check the health of the ES nodes. It may provide us with some clues
We found the issue. It will be fixed in the upcoming patch for the open-v1.14 release
Can you run 'ls -al' in the /opt/trains/data folder and also in the /opt/trains/data/elastic_7 folder and send the output?
Oh, I see :( It turns out that the --extra-source-env option was not officially released yet. But the script that supports it can be downloaded from here: https://github.com/allegroai/trains-server/files/5080286/upgrade.zip
Hi H4dr1en, there is a chance that the problem is in the parallel reindexing of data. You can try to replace
parallel=max(docker_resources.cpus // 2, 1)
at line 190 with
parallel=1
I think you will need to remove the /opt/trains/data/elastic_7 folder before script restart
Thanks for the update. What can be seen from the log is that for some reason, after copying a couple of indices, Elasticsearch 7 becomes unavailable. I think we can find the reasons in the Elasticsearch 7 logs. I can send you the instructions on how to proceed (it requires a minimal change to the upgrade script so that the upgrade containers are not removed after the script run, and then an inspection of the ES7 logs)
Enjoy the new version:) Would still be interesting to see what caused ES7 to stop responding.
Great :) Then let's try to get the logs. Maybe we can get them without changing the upgrade script. Please run 'sudo docker ps -a'. If you see an exited container with the name 'elastic-upgrade-7', then please save its logs to a file with the command below and send the file to me:
docker logs <container_id_for_elastic-upgrade-7> >& elastic_logs.txt
Did you try restarting the docker compose since the problem started happening?
ReassuredTiger98 What are the memory settings for Elasticsearch in your docker compose? If it is 2 GB and you have enough memory on your server, then you can try to increase it to 4 GB like this: ES_JAVA_OPTS: -Xms4g -Xmx4g
As long as you delete only from the deleted tasks folders it should be OK
The volumes section of elasticsearch service looks OK to me:
/opt/trains/data/elastic_7:/usr/share/elasticsearch/data
Ok, I see. Then you can enter the apiserver container:
sudo docker exec -it clearml-apiserver /bin/bash
And run the following commands inside the container:
curl -XGET
curl -XGET
To run the old version of Trains, the same setting can be added to the environment section of the elasticsearch service in the docker compose
Actually the task logs will be lost. The tasks themselves and their reported metrics and plots would stay. The command is the following:
curl -XDELETE localhost:9200/events-log-d1bd92a3b039400cbafc60a7a5b1e52b
Enjoy the new version!
Oh, I see. Then maybe we can see some more info in the browser dev tools
Please run these commands and see if you have any "red" statuses in the output:
curl "http://localhost:9200/_cluster/health?pretty"
curl "http://localhost:9200/_cluster/health?level=indices&pretty"
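If the indices output is too long to scan by eye, a small Python sketch like the one below could pick out the red ones. It assumes the response has the usual shape for level=indices (a top-level "indices" object with a "status" per index). The helper names red_indices and fetch_health are just illustrative, and the default URL assumes port 9200 is reachable from where you run it.

```python
import json
import urllib.request


def red_indices(health):
    """Return the sorted names of indices reported as "red".

    `health` is the parsed JSON of a _cluster/health?level=indices response.
    """
    return sorted(
        name
        for name, info in health.get("indices", {}).items()
        if info.get("status") == "red"
    )


def fetch_health(base_url="http://localhost:9200"):
    """Fetch and parse the per-index cluster health from Elasticsearch."""
    url = f"{base_url}/_cluster/health?level=indices"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

Running print(red_indices(fetch_health())) would then list only the problematic indices, and an empty list means none of them is red.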