My computer just did an automatic suspend, as simple as that
We've updated everything now, launched a new experiment and we're tracking the logs. I'll tell you if I find anything
No sorry, I found where the logs are, and there don't seem to be any errors in them:
[2022-10-14 17:22:50,771] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all in 3ms
[2022-10-14 17:22:50,784] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_by_id in 7ms
[2022-10-14 17:22:50,853] [9] [INFO] [clearml.service_repo] Returned 200 for events.add_batch in 182ms
[2022-10-14 17:22:50,874] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.edit in 28ms
[2022-10-14 17:22:51,687] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all in 3ms
[2022-10-14 17:22:53,703] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all in 3ms
[2022-10-14 17:22:55,719] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all in 3ms
[2022-10-14 17:22:55,985] [9] [INFO] [clearml.service_repo] Returned 200 for events.add_batch in 106ms
[2022-10-14 17:22:57,733] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.ping in 4ms
[2022-10-14 17:22:57,750] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all in 5ms
[2022-10-14 17:22:59,767] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all in 4ms
[2022-10-14 17:23:04,233] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_by_id in 7ms
[2022-10-14 17:23:04,254] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.edit in 7ms
[2022-10-14 17:23:04,266] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.completed in 6ms
[2022-10-14 17:24:42,199] [9] [INFO] [clearml.non_responsive_tasks_watchdog] Starting cleanup cycle for running tasks last updated before 2022-10-14 15:24:42.199266
[2022-10-14 17:24:42,203] [9] [INFO] [clearml.non_responsive_tasks_watchdog] 0 non-responsive tasks found
[2022-10-14 17:24:42,203] [9] [INFO] [clearml.non_responsive_tasks_watchdog] 0 non-responsive tasks stopped
[...]
[clearml.non_responsive_tasks_watchdog] Starting cleanup cycle for running tasks last updated before 2022-10-14 20:24:43.828925
[2022-10-14 22:24:43,833] [9] [INFO] [clearml.non_responsive_tasks_watchdog] 0 non-responsive tasks found
[2022-10-14 22:24:43,833] [9] [INFO] [clearml.non_responsive_tasks_watchdog] 0 non-responsive tasks stopped
[2022-10-14 22:25:38,225] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all_ex in 227ms
[2022-10-14 22:25:38,284] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_by_id_ex in 5ms
[2022-10-14 22:25:41,341] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_by_id_ex in 11ms
[2022-10-14 22:25:43,417] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_by_id_ex in 10ms
[2022-10-14 22:25:44,118] [9] [INFO] [clearml.service_repo] Returned 200 for events.get_task_log in 664ms
[2022-10-14 22:25:44,128] [9] [INFO] [clearml.service_repo] Returned 200 for events.get_task_log in 634ms
[2022-10-14 22:25:48,249] [9] [INFO] [clearml.service_repo] Returned 200 for projects.get_project_tags in 58ms
[2022-10-14 22:25:48,321] [9] [INFO] [clearml.service_repo] Returned 200 for projects.get_all_ex in 218ms
[2022-10-14 22:25:49,702] [9] [INFO] [clearml.service_repo] Returned 200 for projects.get_all_ex in 40ms
[2022-10-14 22:25:49,725] [9] [INFO] [clearml.service_repo] Returned 200 for projects.get_all_ex in 39ms
[2022-10-14 22:25:49,781] [9] [INFO] [clearml.service_repo] Returned 200 for projects.get_task_tags in 6ms
[2022-10-14 22:25:49,812] [9] [INFO] [clearml.service_repo] Returned 200 for projects.get_task_parents in 57ms
[2022-10-14 22:25:49,848] [9] [INFO] [clearml.service_repo] Returned 200 for projects.get_all_ex in 26ms
[2022-10-14 22:25:49,857] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_types in 96ms
[2022-10-14 22:25:49,873] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all_ex in 29ms
[2022-10-14 22:25:49,919] [9] [INFO] [clearml.service_repo] Returned 200 for users.get_all_ex in 11ms
[2022-10-14 22:25:50,204] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_by_id_ex in 84ms
[2022-10-14 22:25:51,017] [9] [INFO] [clearml.service_repo] Returned 200 for projects.get_task_tags in 4ms
[2022-10-14 22:25:51,023] [9] [INFO] [clearml.service_repo] Returned 200 for projects.get_model_tags in 16ms
[2022-10-14 22:25:52,438] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_by_id_ex in 20ms
[2022-10-14 22:25:54,162] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all_ex in 12ms
[2022-10-14 22:25:56,878] [9] [INFO] [clearml.service_repo] Returned 200 for events.get_task_plots in 1161ms
[2022-10-14 22:25:57,178] [9] [INFO] [clearml.service_repo] Returned 200 for events.debug_images in 1445ms
[2022-10-14 22:26:04,168] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all_ex in 12ms
[2022-10-14 22:26:14,195] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all_ex in 35ms
[2022-10-14 22:39:43,911] [9] [INFO] [clearml.non_responsive_tasks_watchdog] Starting cleanup cycle for running tasks last updated before 2022-10-14 20:39:43.911278
[2022-10-14 22:39:43,915] [9] [INFO] [clearml.non_responsive_tasks_watchdog] 0 non-responsive tasks found
[2022-10-14 22:39:43,915] [9] [INFO] [clearml.non_responsive_tasks_watchdog] 0 non-responsive tasks stopped
[2022-10-14 22:54:43,956] [9] [INFO] [clearml.non_responsive_tasks_watchdog] Starting cleanup cycle for running tasks last updated before 2022-10-14 20:54:43.956715
[2022-10-14 22:54:43,961] [9] [INFO] [clearml.non_responsive_tasks_watchdog] 0 non-responsive tasks found
[2022-10-14 22:54:43,961] [9] [INFO] [clearml.non_responsive_tasks_watchdog] 0 non-responsive tasks stopped
[...]
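With a log this long, a short script can check for failed or unusually slow calls instead of reading it by eye. This is a minimal sketch, assuming lines in the `clearml.service_repo` format pasted above. `find_suspicious` and the 1000 ms threshold are hypothetical choices for illustration, not part of ClearML.

```python
import re

# Matches apiserver lines of the form seen in the paste, e.g.
# [2022-10-14 17:22:50,771] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all in 3ms
LINE_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\] "
    r"\[\d+\] \[\w+\] \[clearml\.service_repo\] "
    r"Returned (?P<status>\d{3}) for (?P<endpoint>[\w.]+) in (?P<ms>\d+)ms"
)


def find_suspicious(log_text, slow_ms=1000):
    """Return (timestamp, endpoint, status, ms) for non-200 or slow calls.

    slow_ms is an arbitrary threshold (assumption), not a ClearML setting.
    """
    hits = []
    # finditer also works when entries are run together on one line
    for m in LINE_RE.finditer(log_text):
        status, ms = int(m["status"]), int(m["ms"])
        if status != 200 or ms >= slow_ms:
            hits.append((m["ts"], m["endpoint"], status, ms))
    return hits


# Three entries copied from the log above, joined the way they were pasted
sample = (
    "[2022-10-14 17:22:50,771] [9] [INFO] [clearml.service_repo] "
    "Returned 200 for tasks.get_all in 3ms "
    "[2022-10-14 22:25:56,878] [9] [INFO] [clearml.service_repo] "
    "Returned 200 for events.get_task_plots in 1161ms "
    "[2022-10-14 22:25:57,178] [9] [INFO] [clearml.service_repo] "
    "Returned 200 for events.debug_images in 1445ms"
)

for hit in find_suspicious(sample):
    print(hit)
```

On the lines pasted here this would only flag the two slow `events.*` calls (1161 ms and 1445 ms). Every response is a 200, which fits the observation that the apiserver itself logged no errors.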
Do you know where I can find the logs for that?
Can you see if there are errors in the apiserver?
Not really, it's an Ubuntu desktop machine that I just update from time to time. I also have a few pipelines running during my training runs. Do you know any tools that I could use to analyze network errors?
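One low-tech way to look for intermittent network problems is to open a TCP connection to the server repeatedly and record failures and connect times. This is a minimal sketch using only the Python standard library. `probe` is a hypothetical helper, and the host and port in the usage comment are assumptions to adjust for your own setup (8008 is the port a ClearML API server commonly listens on).

```python
import socket
import time


def probe(host, port, attempts=5, timeout=2.0, pause=1.0):
    """Try to connect `attempts` times.

    Returns a list of (ok, detail) tuples: detail is the connect time in
    seconds on success, or the error message on failure.
    """
    results = []
    for i in range(attempts):
        start = time.monotonic()
        try:
            # create_connection raises OSError on refusal, timeout, DNS failure
            with socket.create_connection((host, port), timeout=timeout):
                results.append((True, time.monotonic() - start))
        except OSError as exc:
            results.append((False, str(exc)))
        if i < attempts - 1:
            time.sleep(pause)
    return results


# Example usage (host and port are assumptions, adjust to your server):
# for ok, detail in probe("localhost", 8008, attempts=10):
#     print("ok" if ok else "FAILED", detail)
```

Leaving something like this running in a loop during a training run and looking at when the failures occur can help show whether the drops line up with the machine suspending or with something else on the network.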
Hi SmugSnake6, this looks like a network issue. Did you make any changes to your network/server recently?