RotundSquirrel78 the mongo_4 folder was likely created when running 1.5, but it should be created as part of the migration when running the script as detailed in https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_mongo44_migration/
I think AgitatedDove14 is right, and you missed the migration part for mongodb. You will need to restore the original data since as far as I know mongo will corrupt old data if started using new versions. See https://github.com/allegroai/clearml-server/releases/tag/1.2.0 for the migration instructions.
My original trains server version was 0.14, if I remember correctly. Is there anywhere I can check it now that the upgrade has been done?
My new clearml server is 1.5. I get that from http://localhost:8080/version.json but if there is somewhere else I should look, let me know.
AgitatedDove14 SuccessfulKoala55 , after I ran elastic_update.py (stage 5 as described above), I saw there was a new folder named data/mongo_4. Doesn't that mean mongodb was already migrated?
I will try it and keep you posted. Thanks!
The upgrade is from /home/orpat/trains/data/elastic into /home/orpat/trains/data/elastic_7. Do you see different paths in the log? Where?
That's the ES - are you sure you've already performed the ES migration and that the data is in the right folder?
I would say restore the original ES data folder and run the ES migration on it
Yes, that's what I mean. Let me take a look.
The clearml dockers are down right now because I started a new ES migration (elastic_upgrade.py). I started it before you contacted me and I don't want to break it now. So I cannot look at the console right now.
It will probably finish 30 hours from now. If the same problems repeat, we will continue this chat then.
You should do that after applying the ES7 migration. Note however that if you already ran the server with the unmigrated mongodb database, you'll need to restore the original mongodb data before running the mongodb migration as the data is likely corrupted.
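For the restore step mentioned above, here is a minimal sketch of putting a backed-up data folder back in place while keeping the current (possibly corrupted) one aside. The function name and the example paths are hypothetical, not part of ClearML, and it assumes the server containers are stopped before any files are touched.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def restore_from_backup(backup, target) -> Optional[Path]:
    """Copy `backup` to `target`, renaming any existing `target` out of the way.

    Returns the path the old folder was renamed to, or None if there was none.
    """
    backup, target = Path(backup), Path(target)
    if not backup.is_dir():
        raise FileNotFoundError(f"backup folder not found: {backup}")

    set_aside = None
    if target.exists():
        # Keep the current folder instead of deleting it, in case it is needed later.
        set_aside = target.with_name(f"{target.name}.pre-restore-{int(time.time())}")
        target.rename(set_aside)

    # copytree requires that `target` does not exist, which the rename guarantees.
    shutil.copytree(backup, target)
    return set_aside


# Hypothetical usage (paths are placeholders, adjust to your own layout):
# restore_from_backup("/path/to/backup/mongo", "/path/to/data/mongo")
```

Setting the old folder aside rather than deleting it keeps the step reversible if the backup turns out to be the wrong one.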
AppetizingMouse58 , SuccessfulKoala55 and AgitatedDove14 , after running the ES migration for the 2nd time the problem is solved 🎉 . Thank you all for your help! 🙏
I am not sure it matters for the following output, but anyway please note that the clearml dockers are down right now.
sigalr@momo : ~ $ curl -XGET http://localhost:9200/_cat/indices
yellow open queue_metrics_d1bd92a3b039400cbafc60a7a5b1e52b_2022-06 2F6APbQWSvajTZQ5JxXY1Q 1 1 59 0 26.2kb 26.2kb
yellow open events-plot-d1bd92a3b039400cbafc60a7a5b1e52b bZMKKCaKRXCys6VD_9oDDw 1 1 8556 0 4.1mb 4.1mb
yellow open worker_stats_d1bd92a3b039400cbafc60a7a5b1e52b_2022-06 c85DhBR4R9KZUo5qx7EMjQ 1 1 242 0 28.9kb 28.9kb
yellow open events-training_debug_image-d1bd92a3b039400cbafc60a7a5b1e52b gNvSuqmnQQ-12lk7bjGmrA 1 1 60239 0 18.4mb 18.4mb
yellow open .monitoring-es-6-2020.08.14 EV9OzkBfSh6FniGjUPZBdQ 1 1 18 0 83.5kb 83.5kb
yellow open .monitoring-es-6-2022.06.29 I_B77gPTRjuK75Fs-v0Vcg 1 1 304 0 140kb 140kb
yellow open events-training_stats_scalar-d1bd92a3b039400cbafc60a7a5b1e52b hj6mfnwrSoKrqFj2pNicWg 1 1 588773646 0 105.7gb 105.7gb
yellow open events-log-d1bd92a3b039400cbafc60a7a5b1e52b tBIG8bWNS9egKFvzByrZLQ 1 1 25241848 33 5gb 5gb
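To compare index sizes and document counts before and after a migration, output like the above can be parsed with a short script. This is only a sketch: it assumes the default ten-column layout of `_cat/indices` (health, status, index, uuid, pri, rep, docs.count, docs.deleted, store.size, pri.store.size), and the helper names are made up for illustration.

```python
from typing import List, NamedTuple

UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3, "tb": 1024 ** 4}


class IndexRow(NamedTuple):
    health: str
    status: str
    name: str
    docs: int
    size_bytes: float


def parse_size(text: str) -> float:
    """Convert a size such as '26.2kb' or '5gb' to bytes."""
    text = text.lower()
    # Check two-letter suffixes first so 'kb' is not mistaken for 'b'.
    for suffix in ("kb", "mb", "gb", "tb", "b"):
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * UNITS[suffix]
    raise ValueError(f"unrecognized size: {text!r}")


def parse_cat_indices(output: str) -> List[IndexRow]:
    """Parse default-format `_cat/indices` text; lines without 10 fields are skipped."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 10:
            continue
        health, status, name, _uuid, _pri, _rep, docs, _deleted, size, _pri_size = fields
        rows.append(IndexRow(health, status, name, int(docs), parse_size(size)))
    return rows


# Two lines copied from the output above.
SAMPLE = """\
yellow open events-log-d1bd92a3b039400cbafc60a7a5b1e52b tBIG8bWNS9egKFvzByrZLQ 1 1 25241848 33 5gb 5gb
yellow open events-plot-d1bd92a3b039400cbafc60a7a5b1e52b bZMKKCaKRXCys6VD_9oDDw 1 1 8556 0 4.1mb 4.1mb
"""

if __name__ == "__main__":
    for row in parse_cat_indices(SAMPLE):
        print(f"{row.name}: {row.docs} docs, {row.size_bytes / 1024 ** 2:.1f} MiB, {row.health}")
```

A non-empty events-log index with millions of documents, as here, suggests the data is present and the problem lies elsewhere (for example in the volume mapping).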
Hi RotundSquirrel78 , can you please check that your docker compose file has the correct volume mapping for the elasticsearch service? From the output of the upgrade script I assume it should map /home/orpat/trains/data/elastic_7 to /usr/share/elasticsearch/data
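One way to check that mapping without reading the whole file by eye is a small script that lists which host folders are mounted at the Elasticsearch data path. This is a rough sketch, not an official tool: it only understands short-syntax volume entries (`- host:container`), does plain line matching rather than real YAML parsing, and the sample fragment below is made up for illustration.

```python
from typing import List

ES_DATA = "/usr/share/elasticsearch/data"


def find_host_paths(compose_text: str, container_path: str) -> List[str]:
    """Return host paths mounted at `container_path` in short-syntax volume entries."""
    hosts = []
    for line in compose_text.splitlines():
        item = line.strip()
        if not item.startswith("-"):
            continue
        # Drop the list dash and any surrounding quotes.
        item = item[1:].strip().strip("\"'")
        parts = item.split(":")
        # Expect host:container or host:container:mode.
        if len(parts) >= 2 and parts[1] == container_path:
            hosts.append(parts[0])
    return hosts


# Made-up fragment; a real docker-compose.yml has many more services and keys.
SAMPLE_COMPOSE = """\
services:
  elasticsearch:
    volumes:
      - /home/orpat/trains/data/elastic_7:/usr/share/elasticsearch/data
      - /usr/share/elasticsearch/logs
"""

if __name__ == "__main__":
    print(find_host_paths(SAMPLE_COMPOSE, ES_DATA))
```

If the path it reports differs from the folder the upgrade script wrote to, the server is starting against an empty or old Elasticsearch data folder.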
Just to make sure, by running ES migration you mean running elastic_upgrade.py again. Correct?
It took ~36 hours two days ago.
The ES migration log is attached in the 1st message of this thread. Do you see any problems in it?
Is there any way to check whether the ES migration results are good or not?
Ok, I see. And if you run a new experiment in the new version do you see its logs?
Is it ok to restore data/mongo from my backup, and leave all the other files that were created by elastic_upgrade.py (e.g., data/elastic_7) untouched?
What I mean is: Do I need to run elastic_upgrade.py again, or just the mongo upgrade (clearml-server-1.2.0-migration.py)?
Yes exactly, can you please verify that you use /home/orpat/trains/data/elastic_7 in the docker compose of 1.5?
According to the sizes the data is there and ES sees it.
Is it ok to restore data/mongo from my backup, and leave all the other files that were created by elastic_upgrade.py (e.g., data/elastic_7) untouched?
Assuming you didn't run anything since migrating Elastic (I mean running new experiments and logging new data), then yes, sure 🙂
If you open the browser developer tools and navigate to the console logs of one of the tasks whose logs no longer appear, do you see any errors (red lines) in the API calls? Can you share the payload and response from the events.get_task_log call?
Yes I've performed the ES migration. The data is in clearml/data/elastic_7.
In the docker-compose.yml file I replaced every occurrence of /opt/clearml/data/elastic_7 with /home/orpat/clearml/data/elastic_7.
Can you please run the following in the command line of the hosting server and share the results? curl -XGET
The sequence is unclear then:
I followed the instructions in https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_es7_migration/ .
Stage 5 ("python elastic_upgrade.py") ended successfully.
Then I skipped "Upgrading to ClearML Server v.1.2. or Newer" and went straight to "Completing the Installation".
Did I do something wrong? What should I do to fix it?
Is there any log that maybe details the problem?
Update: I ran the mongo migration script (clearml-server-1.2.0-migration.py) and now I can see my projects! 👏
Now there is a new problem: I don't see any of the logs: console, artefacts, scalars, plots.
Can you help?