No, there was a problem with that particular version migration. Creating the temporary index allowed this and all subsequent migrations to run successfully. So for now your DB is properly aligned with the latest ClearML, and future upgrades should work fine.
Hi @<1523701260895653888:profile|QuaintJellyfish58> . For issue #229: we found and fixed the problem. The fix will be available in the coming patch for the v1.14 release. For issue #228 I requested more info from you on GitHub.
Setting up an Elasticsearch cluster requires some devops work. You can search for "setup elasticsearch 7 cluster" on the internet; there are several tutorials available. Stopping your docker-compose once in a while and backing up the /opt/trains/data folder is more straightforward, and it would also back up the data that we store in MongoDB.
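If it helps, here is a minimal sketch of that backup flow, assuming docker-compose is installed, the compose file lives in /opt/trains and the archives go to /opt/backups (all of these paths are just assumptions for illustration):
```
import datetime
import os
import shutil
import subprocess

# Assumed locations; adjust them to your deployment
compose_dir = "/opt/trains"
data_dir = "/opt/trains/data"
backup_dir = "/opt/backups"

os.makedirs(backup_dir, exist_ok=True)
stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

# Stop the services so the data files are consistent, archive the data folder
# (this also covers the MongoDB data), then bring the services back up
subprocess.run(["docker-compose", "down"], cwd=compose_dir, check=True)
shutil.make_archive(os.path.join(backup_dir, f"trains-data-{stamp}"), "gztar", data_dir)
subprocess.run(["docker-compose", "up", "-d"], cwd=compose_dir, check=True)
```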
@<1523701868901961728:profile|ReassuredTiger98> Strange :( In 1.10 we already had the code for clearing the ES scrolls created during task deletion. I would recommend upgrading to the latest release v1.12.1 anyway. In addition, you can instruct ES to allow more open scrolls, as in the sketch below. By default it is limited to 500.
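For example, something along these lines should raise the limit (a sketch assuming Elasticsearch is reachable on localhost:9200 and that 1000 is a value that suits your load):
```
import requests

# Raise the cluster-wide limit on open scroll contexts (the ES default is 500).
# Host/port and the new limit are assumptions; adjust them to your deployment.
resp = requests.put(
    "http://localhost:9200/_cluster/settings",
    json={"persistent": {"search.max_open_scroll_context": 1000}},
)
print(resp.json())
```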
SubstantialBaldeagle49 This is fine. When you start docker-compose, the services take different amounts of time to start. The apiserver waits for Elasticsearch to start and proceeds once it is ready. Can you reproduce the buckets issue and share the apiserver log that contains it?
Thanks for the update. What can be seen from the log is that, for some reason, Elasticsearch 7 becomes unavailable after copying a couple of indices. I think we can find the reasons in the Elasticsearch 7 logs. I can send you instructions on how to proceed (it requires a minimal change to the upgrade script so that the upgrade containers are not removed after the script runs, plus an inspection of the ES7 logs).
Hi JitteryCoyote63 , you mentioned that downloading the task logs brings all the events. It would be interesting to compare the events that appear in the downloaded log but not in the task log screen with those that are returned in the screen. Can you please share the downloaded task log file and the request and response that you get from events.get_task_log for the same task?
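If it is easier to grab the data programmatically, a rough sketch with the ClearML Python APIClient could look like this (the task ID and batch size are placeholders, and this assumes the client exposes the events service like the other services):
```
from clearml.backend_api.session.client import APIClient

client = APIClient()
# "REPLACE_WITH_TASK_ID" is a placeholder for the task you are inspecting
res = client.events.get_task_log(task="REPLACE_WITH_TASK_ID", batch_size=1000)
print(f"returned {len(res.events)} events")
```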
Hi CooperativeFox72 , there was a typo in the index creation instructions ("comapny" instead of "company"). Please run the following sequence in the mongo shell and then start the apiserver:
use auth
db.user.createIndex({"name": 1, "company": 1})
Do you see any error in the browser network tab?
Can you share all the error info that you get in the network tab?
Hi @<1523701868901961728:profile|ReassuredTiger98> , what version of the apiserver are you using?
Hi IdealPanda97 , can you please check your available disk space and available RAM? According to the logs all the services (Elastic, Mongo, Redis) fail to start
This one is indeed dynamic, but it can be set explicitly as follows: "plot_len": {"type": "long"}
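If you want to apply it explicitly yourself, a sketch like the following could work (the host and index name are placeholders; use the actual events index from your deployment):
```
import requests

# Explicitly map the dynamic "plot_len" field as a long
index = "REPLACE_WITH_YOUR_EVENTS_INDEX"  # placeholder
resp = requests.put(
    f"http://localhost:9200/{index}/_mapping",
    json={"properties": {"plot_len": {"type": "long"}}},
)
print(resp.json())
```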
I am not sure about the reasons. What you can do is back up your /opt/trains/data folder periodically (preferably stopping the docker-compose before doing it). Another possibility is to configure your Elasticsearch to run as a cluster with 2 or more nodes, on the same or different machines. This will allow Elastic to replicate your indices to the other nodes.
Can you run 'ls -al' in the /opt/trains/data folder and also in the /opt/trains/data/elastic_7 folder and send the output?
Hi H4dr1en, there is a chance that the problem is in the parallel reindexing of the data. You can try replacing parallel=max(docker_resources.cpus // 2, 1)
at line 190 with
parallel=1
I think you will need to remove the /opt/trains/data/elastic_7 folder before restarting the script
Yes, it is safe to set number_of_replicas to 0 and refresh_interval to -1 for the target index before the reindex and then put them back once the reindex is finished
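Roughly like this (a sketch assuming ES on localhost:9200; the target index name is a placeholder, and the restored values below are the ES defaults, so use whatever your index had before):
```
import requests

ES = "http://localhost:9200"            # adjust to your deployment
TARGET = "REPLACE_WITH_TARGET_INDEX"    # placeholder for the target index name

# Before the reindex: no replicas, no periodic refresh
requests.put(f"{ES}/{TARGET}/_settings",
             json={"index": {"number_of_replicas": 0, "refresh_interval": -1}})

# ... run the reindex here ...

# After the reindex: put the settings back
requests.put(f"{ES}/{TARGET}/_settings",
             json={"index": {"number_of_replicas": 1, "refresh_interval": "1s"}})
```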
Yes exactly, can you please verify that you use /home/orpat/trains/data/elastic_7 in the docker compose of 1.5?
According to the sizes the data is there and ES sees it.
@<1523701066867150848:profile|JitteryCoyote63> The requirements list the client library that the apiserver uses to access Elasticsearch. This library is capable of working with both Elasticsearch 7 and 8.
Hi QuaintJellyfish58 , in the latest data that you sent I see only the responses (some of them are marked as payloads, but they are actually responses). What would be very interesting is to see the requests (payloads) that resulted in the following empty responses:
```
# response
{"meta":{"id":"aaaffe49ace64f1a8b0211925afcfd32","trx":"aaaffe49ace64f1a8b0211925afcfd32","endpoint":{"name":"projects.get_all_ex","requested_version":"2.20","actual_version":"1.0"},"result_code":200,"result_subcode":0,...
```
Hi ImmenseMole52 , did you make any changes to the docker-compose file? If yes, can you please send your version of the file?
Oh, I see. Then maybe we can see some more info in the browser dev tools
Hi UnevenDolphin73 , how many artifacts do you have on this task? We store task metadata in Mongo, and there is a limit of 16MB per single document. While the artifact itself is not stored under the task, some metadata (notably the uri and display_data/preview) is stored for each artifact.
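If you want a quick check from Python, something like this should do (a sketch; the task ID is a placeholder):
```
from clearml import Task

# The metadata of every artifact lives inside the single Mongo document of the task,
# so a large number of artifacts (or very large previews) can hit the 16MB limit
task = Task.get_task(task_id="REPLACE_WITH_TASK_ID")
print(f"{len(task.artifacts)} artifacts registered on this task")
```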
Hi DefeatedCrab47 , the ES docker requires that its data folder belongs to the 1000:1000 user and group. If you want to transfer your existing data from trains 15.1, then please follow the guide https://allegro.ai/docs/deploying_trains/trains_server_es7_migration/
The script that is run in this guide should create the elastic_7 folder with the correct permissions and transfer all your existing data
Hi ExasperatedCrocodile76 , what version of the clearml server are you using? You can see it in the bottom right corner of the Settings screen
Hi @<1523701868901961728:profile|ReassuredTiger98> , how exactly do you override the values in the storage_credentials file? Do you prepare a new docker image with the changed file, map the file from outside with a volume mapping in the docker-compose, or pass it through env variables? It is also important that you apply this override to the async_delete service, since it is the service that actually uses the storage credentials, not the apiserver itself.
As long as you delete only from the deleted tasks folders it should be OK