SuccessfulKoala55 and AppetizingMouse58, thank you very much!!
I have a question about the future:
Could this fix cause problems in a future clearml-server upgrade?
Or what is the best practice for upgrading after applying it?
It seems the automatic MongoDB migration failed on startup
The index creation:
[ec2-user@ip-172-31-26-41 ~]$ sudo docker exec -it clearml-mongo /bin/bash
root@3fc365193ed0:/# mongo
MongoDB shell version v3.6.5
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.6.5
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
Questions? Try the support group
Server has startup warnings:
2021-01-25T05:58:37.309+0000 I CONTROL [initandlisten]
2021-01-25T05:58:37.309+0000 I CONTROL [initandlisten] ** WARNING: Access control is not enabled for the database.
2021-01-25T05:58:37.309+0000 I CONTROL [initandlisten] ** Read and write access to data and configuration is unrestricted.
2021-01-25T05:58:37.309+0000 I CONTROL [initandlisten]
use auth
switched to db auth
db.user.createIndex({"name": 1, "comapny": 1})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 2,
"numIndexesAfter" : 3,
"ok" : 1
}
bye
root@3fc365193ed0:/# exit
Someone in my company started a training run 😥, so I'll do it after it finishes and will update.
Thanks, you are the best 🙏
then start the server again and see if you get the errors in the log
Hi SuccessfulKoala55,
I took the server down:
[ec2-user@ip-172-31-26-41 ~]$ sudo docker-compose -f /opt/clearml/docker-compose.yml down
WARNING: The CLEARML_HOST_IP variable is not set. Defaulting to a blank string.
WARNING: The CLEARML_AGENT_GIT_USER variable is not set. Defaulting to a blank string.
WARNING: The CLEARML_AGENT_GIT_PASS variable is not set. Defaulting to a blank string.
Stopping clearml-webserver ... done
Stopping clearml-agent-services ... done
Stopping clearml-apiserver ... done
Stopping clearml-redis ... done
Stopping clearml-fileserver ... done
Stopping clearml-mongo ... done
Stopping clearml-elastic ... done
Removing clearml-webserver ... done
Removing clearml-agent-services ... done
Removing clearml-apiserver ... done
Removing clearml-redis ... done
Removing clearml-fileserver ... done
Removing clearml-mongo ... done
Removing clearml-elastic ... done
Removing network clearml_backend
Removing network clearml_frontend
Then I tried the command:
[ec2-user@ip-172-31-26-41 ~]$ sudo docker exec -it clearml-mongo /bin/bash
Error: No such container: clearml-mongo
What did I do wrong?
Hi CooperativeFox72, there was a typo in the index creation instructions ("comapny" instead of "company"). Please run the following sequence in the mongo shell and then start the apiserver:
use auth
db.user.createIndex({"name": 1, "company": 1})
Of course, do that while the server is down
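For anyone who prefers to script the corrected sequence instead of typing it into an interactive shell, here is a minimal sketch that only builds the command line. The container name, database, and index fields come from this thread; the function name is hypothetical, and passing the database as an argument together with --eval is an assumption about how to run it non-interactively. The clearml-mongo container has to exist and be running for docker exec to reach it.

```python
import shlex


def index_fix_command(container="clearml-mongo"):
    """Build a docker exec command that creates the index with the corrected field name."""
    # Mongo shell statement from the thread, with the field spelled "company".
    statement = 'db.user.createIndex({"name": 1, "company": 1})'
    # "auth" is the database selected with `use auth` in the thread.
    # Assumption: the mongo shell accepts the database name as an argument plus --eval.
    return ["sudo", "docker", "exec", container, "mongo", "auth", "--eval", statement]


# Print a copy-pasteable, shell-quoted version of the command.
print(shlex.join(index_fix_command()))
```

The printed line can be pasted into the host shell, or the list can be handed to subprocess.run if you want to run it from Python.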
[2021-01-24 17:02:25,660] [8] [INFO] [trains.service_repo] Returned 200 for queues.get_all in 2ms
[2021-01-24 17:02:25,674] [8] [INFO] [trains.service_repo] Returned 200 for queues.get_next_task in 8ms
[2021-01-24 17:02:26,696] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 36ms
[2021-01-24 17:02:26,742] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 78ms
[2021-01-24 17:02:27,169] [8] [INFO] [trains.service_repo] Returned 200 for projects.get_all_ex in 3ms
[2021-01-24 17:02:27,638] [8] [INFO] [trains.service_repo] Returned 200 for tasks.get_by_id in 100ms
[2021-01-24 17:02:28,923] [8] [INFO] [trains.service_repo] Returned 200 for projects.get_all_ex in 12ms
[2021-01-24 17:02:28,963] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 105ms
[2021-01-24 17:02:29,960] [8] [INFO] [trains.service_repo] Returned 200 for tasks.get_by_id in 138ms
[2021-01-24 17:02:30,684] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 30ms
[2021-01-24 17:02:30,691] [8] [INFO] [trains.service_repo] Returned 200 for queues.get_all in 2ms
[2021-01-24 17:02:30,707] [8] [INFO] [trains.service_repo] Returned 200 for queues.get_next_task in 8ms
[2021-01-24 17:02:31,611] [8] [INFO] [trains.service_repo] Returned 200 for projects.get_all_ex in 2ms
[2021-01-24 17:02:31,738] [8] [INFO] [trains.service_repo] Returned 200 for tasks.get_by_id_ex in 26ms
[2021-01-24 17:02:31,821] [8] [ERROR] [trains.service_repo] 'list' object has no attribute 'values'
Traceback (most recent call last):
  File "/opt/trains/apiserver/service_repo/service_repo.py", line 273, in handle_call
    ret = endpoint.func(call, company, call.data_model)
  File "/opt/trains/apiserver/services/tasks.py", line 197, in get_by_id_ex
    unprepare_from_saved(call, tasks)
  File "/opt/trains/apiserver/services/tasks.py", line 349, in unprepare_from_saved
    artifacts_unprepare_from_saved(fields=data)
  File "/opt/trains/apiserver/bll/task/artifacts.py", line 43, in artifacts_unprepare_from_saved
    value=sorted(artifacts.values(), key=itemgetter("key", "mode")),
AttributeError: 'list' object has no attribute 'values'
[2021-01-24 17:02:31,821] [8] [ERROR] [trains.service_repo] Returned 500 for tasks.get_by_id_ex in 121ms, msg='list' object has no attribute 'values'
[2021-01-24 17:02:31,824] [8] [INFO] [trains.service_repo] Returned 200 for events.get_task_log in 119ms
[2021-01-24 17:02:32,167] [8] [INFO] [trains.service_repo] Returned 200 for tasks.ping in 5ms
[2021-01-24 17:02:32,475] [8] [INFO] [trains.service_repo] Returned 200 for tasks.get_by_id in 87ms
[2021-01-24 17:02:32,675] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 77ms
[2021-01-24 17:02:32,697] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 28ms
[2021-01-24 17:02:32,902] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 63ms
[2021-01-24 17:02:34,773] [8] [INFO] [trains.service_repo] Returned 200 for tasks.get_by_id in 98ms
[2021-01-24 17:02:35,721] [8] [INFO] [trains.service_repo] Returned 200 for queues.get_all in 2ms
[2021-01-24 17:02:35,739] [8] [INFO] [trains.service_repo] Returned 200 for queues.get_next_task in 11ms
[2021-01-24 17:02:36,386] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 73ms
[2021-01-24 17:02:36,715] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 27ms
[2021-01-24 17:02:36,750] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 73ms
[2021-01-24 17:02:36,792] [8] [INFO] [trains.service_repo] Returned 200 for projects.get_all_ex in 6ms
[2021-01-24 17:02:36,795] [8] [INFO] [trains.service_repo] Returned 200 for tasks.get_by_id_ex in 6ms
[2021-01-24 17:02:36,933] [8] [INFO] [trains.service_repo] Returned 200 for tasks.get_by_id_ex in 154ms
[2021-01-24 17:02:37,034] [8] [INFO] [trains.service_repo] Returned 200 for tasks.get_by_id in 88ms
[2021-01-24 17:02:37,096] [8] [INFO] [trains.service_repo] Returned 200 for events.get_task_log in 13ms
[2021-01-24 17:02:38,642] [8] [INFO] [trains.service_repo] Returned 200 for projects.get_all_ex in 3ms
[2021-01-24 17:02:39,320] [8] [INFO] [trains.service_repo] Returned 200 for tasks.get_by_id in 82ms
[2021-01-24 17:02:40,108] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 74ms
[2021-01-24 17:02:40,694] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 24ms
[2021-01-24 17:02:40,758] [8] [INFO] [trains.service_repo] Returned 200 for queues.get_all in 6ms
[2021-01-24 17:02:40,771] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 62ms
[2021-01-24 17:02:40,781] [8] [INFO] [trains.service_repo] Returned 200 for queues.get_next_task in 6ms
[2021-01-24 17:02:41,263] [8] [INFO] [trains.service_repo] Returned 200 for tasks.get_configuration_names in 8ms
[2021-01-24 17:02:41,264] [8] [INFO] [trains.service_repo] Returned 200 for projects.get_all_ex in 2ms
[2021-01-24 17:02:41,419] [8] [INFO] [trains.service_repo] Returned 200 for projects.get_all_ex in 4ms
[2021-01-24 17:02:41,574] [8] [INFO] [trains.service_repo] Returned 200 for tasks.get_by_id in 86ms
[2021-01-24 17:02:43,873] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 156ms
[2021-01-24 17:02:43,897] [8] [INFO] [trains.service_repo] Returned 200 for tasks.get_by_id in 138ms
[2021-01-24 17:02:44,644] [8] [INFO] [trains.non_responsive_tasks_watchdog] Starting cleanup cycle for running tasks last updated before 2021-01-24 15:02:44.644426
[2021-01-24 17:02:44,646] [8] [INFO] [trains.non_responsive_tasks_watchdog] 0 non-responsive tasks found
[2021-01-24 17:02:44,646] [8] [INFO] [trains.non_responsive_tasks_watchdog] 0 non-responsive tasks stopped
[2021-01-24 17:02:44,686] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 23ms
[2021-01-24 17:02:45,795] [8] [INFO] [trains.service_repo] Returned 200 for queues.get_all in 2ms
[2021-01-24 17:02:45,812] [8] [INFO] [trains.service_repo] Returned 200 for queues.get_next_task in 11ms
[2021-01-24 17:02:46,196] [8] [INFO] [trains.service_repo] Returned 200 for tasks.get_by_id in 122ms
[2021-01-24 17:02:47,425] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 73ms
[2021-01-24 17:02:48,166] [8] [INFO] [trains.service_repo] Returned 200 for projects.get_all_ex in 3ms
[2021-01-24 17:02:48,325] [8] [INFO] [trains.service_repo] Returned 200 for tasks.get_by_id_ex in 19ms
[2021-01-24 17:02:48,375] [8] [ERROR] [trains.service_repo] 'list' object has no attribute 'values'
Traceback (most recent call last):
  File "/opt/trains/apiserver/service_repo/service_repo.py", line 273, in handle_call
    ret = endpoint.func(call, company, call.data_model)
  File "/opt/trains/apiserver/services/tasks.py", line 197, in get_by_id_ex
    unprepare_from_saved(call, tasks)
  File "/opt/trains/apiserver/services/tasks.py", line 349, in unprepare_from_saved
    artifacts_unprepare_from_saved(fields=data)
  File "/opt/trains/apiserver/bll/task/artifacts.py", line 43, in artifacts_unprepare_from_saved
    value=sorted(artifacts.values(), key=itemgetter("key", "mode")),
AttributeError: 'list' object has no attribute 'values'
[2021-01-24 17:02:48,379] [8] [ERROR] [trains.service_repo] Returned 500 for tasks.get_by_id_ex in 109ms, msg='list' object has no attribute 'values'
[2021-01-24 17:02:48,454] [8] [INFO] [trains.service_repo] Returned 200 for tasks.get_by_id in 77ms
[2021-01-24 17:02:48,687] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 25ms
[2021-01-24 17:02:48,769] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 76ms
[2021-01-24 17:02:50,709] [8] [INFO] [trains.service_repo] Returned 200 for tasks.get_by_id in 81ms
[2021-01-24 17:02:50,824] [8] [INFO] [trains.service_repo] Returned 200 for queues.get_all in 2ms
[2021-01-24 17:02:50,842] [8] [INFO] [trains.service_repo] Returned 200 for queues.get_next_task in 8ms
[2021-01-24 17:02:51,075] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 71ms
[2021-01-24 17:02:52,552] [8] [INFO] [trains.service_repo] Returned 200 for tasks.get_by_id in 176ms
[2021-01-24 17:02:52,699] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 36ms
[2021-01-24 17:02:52,717] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 52ms
[2021-01-24 17:02:53,006] [8] [INFO] [trains.service_repo] Returned 200 for tasks.get_by_id in 120ms
[2021-01-24 17:02:53,007] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 115ms
[2021-01-24 17:02:54,691] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 27ms
[2021-01-24 17:02:54,772] [8] [INFO] [trains.service_repo] Returned 200 for workers.status_report in 12ms
[2021-01-24 17:02:54,797] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 73ms
[2021-01-24 17:02:55,286] [8] [INFO] [trains.service_repo] Returned 200 for tasks.get_by_id in 77ms
[2021-01-24 17:02:55,853] [8] [INFO] [trains.service_repo] Returned 200 for queues.get_all in 2ms
[2021-01-24 17:02:55,866] [8] [INFO] [trains.service_repo] Returned 200 for queues.get_next_task in 7ms
[2021-01-24 17:02:56,718] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 50ms
[2021-01-24 17:02:57,531] [8] [INFO] [trains.service_repo] Returned 200 for tasks.get_by_id in 71ms
[2021-01-24 17:02:58,561] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 79ms
[2021-01-24 17:02:58,708] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 24ms
[2021-01-24 17:02:58,810] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 76ms
[2021-01-24 17:02:59,814] [8] [INFO] [trains.service_repo] Returned 200 for tasks.get_by_id in 83ms
[2021-01-24 17:03:00,882] [8] [INFO] [trains.service_repo] Returned 200 for queues.get_all in 2ms
[2021-01-24 17:03:00,901] [8] [INFO] [trains.service_repo] Returned 200 for queues.get_next_task in 13ms
[2021-01-24 17:03:02,081] [8] [INFO] [trains.service_repo] Returned 200 for tasks.get_by_id in 70ms
[2021-01-24 17:03:02,216] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 62ms
[2021-01-24 17:03:02,712] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 25ms
[2021-01-24 17:03:02,736] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 73ms
[2021-01-24 17:03:04,280] [8] [INFO] [trains.service_repo] Returned 200 for tasks.ping in 7ms
[2021-01-24 17:03:04,524] [8] [INFO] [trains.service_repo] Returned 200 for tasks.get_by_id in 76ms
[2021-01-24 17:03:05,913] [8] [INFO] [trains.service_repo] Returned 200 for queues.get_all in 2ms
[2021-01-24 17:03:05,934] [8] [INFO] [trains.service_repo] Returned 200 for queues.get_next_task in 14ms
[2021-01-24 17:03:06,020] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 98ms
[2021-01-24 17:03:06,787] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 96ms
[2021-01-24 17:03:06,866] [8] [INFO] [trains.service_repo] Returned 200 for events.add_batch in 178ms
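The two 500 responses in this log both come from artifacts.values() being called on a list. A minimal sketch of that failure mode follows, assuming the affected task records still store artifacts as a plain list while the code in the traceback expects a dict. The function name and sample records are hypothetical. This only illustrates the mismatch; it is not the ClearML code, and the fix discussed in this thread is getting the database migration to run.

```python
from operator import itemgetter


def sorted_artifacts(artifacts):
    """Sort artifacts by (key, mode), accepting either storage shape."""
    # The call in the traceback assumes a dict and uses artifacts.values().
    # Assumption: records that were not migrated still hold a plain list.
    items = artifacts.values() if isinstance(artifacts, dict) else artifacts
    return sorted(items, key=itemgetter("key", "mode"))


# Hypothetical sample records, for illustration only.
legacy = [{"key": "model", "mode": "output"}, {"key": "data", "mode": "input"}]
migrated = {"a": legacy[0], "b": legacy[1]}

try:
    # Same call shape as in the traceback, applied to list-shaped data.
    sorted(legacy.values(), key=itemgetter("key", "mode"))
except AttributeError as exc:
    print(exc)

# Both shapes yield the same ordering once the shape is checked first.
print(sorted_artifacts(legacy) == sorted_artifacts(migrated))
```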
Is it OK that it looks for files in /opt/trains? We moved everything to /opt/clearml, no?
File "/opt/trains/apiserver/mongo/initialize/migration.py"
Are you sure you previously had 0.16.1? From the log it seems you either had an empty database or that you had a Trains Server <0.14.0
I updated to the new version 0.16.1 a few weeks ago using elastic_upgrade.py, and it worked.
Is this after you've created the index using the instructions I sent?
Can you share the apiserver logs? Use docker logs clearml-apiserver
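When the log is long, it can help to share only the error entries. Below is a minimal sketch, assuming every entry starts with a bracketed timestamp, a process id, and a level, as in the apiserver output pasted in this thread. The function names are hypothetical, and the text is whatever docker logs clearml-apiserver printed, saved to a string.

```python
import re

# An entry starts with a timestamp such as "[2021-01-24 17:02:31,821]".
ENTRY_START = re.compile(r"\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\]")
# Header layout assumed from the pasted log: [timestamp] [pid] [LEVEL].
HEADER = re.compile(r"\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\] \[\d+\] \[(\w+)\]")


def split_entries(log_text):
    """Split raw log text into entries, keeping traceback text with its entry."""
    starts = [m.start() for m in ENTRY_START.finditer(log_text)]
    if not starts:
        return []
    bounds = starts + [len(log_text)]
    return [log_text[a:b].strip() for a, b in zip(bounds, bounds[1:])]


def error_entries(log_text):
    """Return only the entries whose level field is ERROR."""
    result = []
    for entry in split_entries(log_text):
        header = HEADER.match(entry)
        if header and header.group(1) == "ERROR":
            result.append(entry)
    return result
```

Because entries are split at timestamps and not at newlines, this also works on output that was pasted as one long line.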
No, there was a problem with that particular version's migration. Creating the temporary index allowed this and all subsequent migrations to run successfully. So your DB is now properly aligned with the latest ClearML, and future upgrades should work fine.
Obviously you have to have the server up when you do that... 🙂
I did it and I'm still getting the same error 😥
Thanks CooperativeFox72, looking into it
Anyway, a quick fix could be to create the mongo index that's failing the migration.
First, go into the MongoDB docker instance using:
sudo docker exec -it clearml-mongo /bin/bash
Then, inside the docker, start the MongoDB CLI using:
mongo
Then, enter these two commands:
use auth
db.user.createIndex({"name": 1, "comapny": 1})