No, there was a problem with the particular version migration. The temporary index creation allowed to this and all subsequent migrations to run successfully. So for now your DB is properly aligned with the latest ClearML and the future upgrades should work fine.
@<1668065560107159552:profile|VivaciousPenguin20> Did you re import the example projects after upgrading to v1.14.1? The problem was in the import procedure itself. The tasks that were imported in the previous versions will not have task results
Hi MortifiedDove27 , you can run the following commands on the clearml server host to get the docker logs for the apiserver and elasticsearch:sudo docker logs clearml-apiserver > apiserver.logs 2>&1 sudo docker logs clearml-elastic > elastic.logs 2>&1
Hi WittyOwl57 , there is a chance that the reason is in this setting: Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log ...
First it say says about invalid log option that may require further investigation. Second the file that it tries to write to is logs/gc.log and it is not under the $clearml folder where you give the write permissions to the ES user. I would try cancelling JVM logging at all or specifying the full path to the file so that it would be under the folder that has 1000:1000 o...
Are you running your dockers on Linux or Windows?
Hi SubstantialElk6 , another thing that can be checked is the health of the particular ES indices. Can you please run the below command in the clearml-elastic container and post the results here?curl -XGET
SubstantialElk6 Both indices that are red are not critical for the ClearML functioning and can be deleted like this:curl -XDELETE '
' curl -XDELETE '
'
For the analysis of the possible reasons that lead to it can you please collect the full ES logs to the file and send it here?sudo docker logs clearml-elastic > log.txt 2>&1
Hi ImmenseMole52 , did you do any changes in the docker compose file? If yes, then can you please send your version of the file?
Did you try restarting the docker compose since the problem start happening?
Can you try deleting the application cookie? While being on the trains page in the browser devtools you navigate to Application->Cookies and under it delete any trains cookies that are there. I believe that you will need to login after that
Oh, I see. Then maybe we can see some more info in the browser dev tools
Ok, I see. Then you can enter the apiserver container:sudo docker exec -it clearml-apiserver /bin/bash
And run the following commands inside the containercurl -XGET
curl -XGET
Hi @<1523707653782507520:profile|MelancholyElk85> , what version of the apiserver are you using?
Thanks, I think that I see the problem,
Yes exactly, can you please verify that you use /home/orpat/trains/data/elastic_7 in the docker compose of 1.5?
Can you please run the following in the command line of the hosting server and share the results?curl -XGET
Ok, I see. And if you run a new experiment in the new version do you see its logs?
If you open the browser developer tools and navigate to the task console logs for one of the tasks that you do not get the logs anymore. Do you see any errors (red lines) in the api calls? Can you share the payload and response from the events.get_task_log call?
According to the sizes the data is there and ES sees it.
Hi RotundSquirrel78 , can you please check that your docker compose file has the correct volume mapping for elasticsearch service? From the output of the upgrade script I assume it should be from /home/orpat/trains/data/elastic_7 into /usr/share/elasticsearch/data
The 1.10 version handles files deletion differently so there is chance that it fixes the issue. If you use the default apiserver port then I would try upgrading. If you override the apiserver port then please wait for the hotfix version 1.10.1 that should be released soon
Hi @<1558986867771183104:profile|ShakyKangaroo32> , can you please share the logs from the async_delete docker container?
Hi SarcasticSparrow10 , I am trying to understand whether we have some gaps in the instructions. In the upgrade process did you perform the steps 3-10 of the below instruction? Were there any errors when performing these steps?
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_mongo44_migration
Are you sure that it was performed fully according to the suggested sequence? The error that you posted says that v3.6 data is incompatible with v4.4 and suggests version 4.2 or earlier. Step 3 starts with mongo 4.0 that should be able to open v3.6 data. And then a number of gradual updates through versions 4.0->4.2->4.4 is performed
I mean it is not possible to open v3.6 data in version 4.4. That's why the steps 3-10 are there
I am not sure about the reasons. What you can do is to backup your /opt/trains/data folder periodically (preferably stopping the docker compose before it). Another possibility is to configure your elasticsearch to run as a cluster with 2 or more nodes on the same or different machine. This will allow elastic to replicate your indices to other nodes.
The index "events-plot-d1bd92a3b039400cbafc60a7a5b1e52b" is red meaning that it is corrupted and elastic cannot work with it. The most straightforward solution would be to delete this index but it will result in all the plots generated so far will be lost.