Hi SmugSnake6, looks like a network issue. Did you make any changes to your network/server recently?
Not really, it's an Ubuntu desktop machine that I just update from time to time. I've also got a few pipelines running during my training runs. Do you know any tools I could use to analyze network errors?
Can you see if there are errors in the apiserver logs?
Do you know where I can find the logs for that?
Never mind, I found where the logs are, and there don't seem to be any errors in them:
[2022-10-14 17:22:50,771] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all in 3ms
[2022-10-14 17:22:50,784] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_by_id in 7ms
[2022-10-14 17:22:50,853] [9] [INFO] [clearml.service_repo] Returned 200 for events.add_batch in 182ms
[2022-10-14 17:22:50,874] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.edit in 28ms
[2022-10-14 17:22:51,687] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all in 3ms
[2022-10-14 17:22:53,703] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all in 3ms
[2022-10-14 17:22:55,719] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all in 3ms
[2022-10-14 17:22:55,985] [9] [INFO] [clearml.service_repo] Returned 200 for events.add_batch in 106ms
[2022-10-14 17:22:57,733] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.ping in 4ms
[2022-10-14 17:22:57,750] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all in 5ms
[2022-10-14 17:22:59,767] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all in 4ms
[2022-10-14 17:23:04,233] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_by_id in 7ms
[2022-10-14 17:23:04,254] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.edit in 7ms
[2022-10-14 17:23:04,266] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.completed in 6ms
[2022-10-14 17:24:42,199] [9] [INFO] [clearml.non_responsive_tasks_watchdog] Starting cleanup cycle for running tasks last updated before 2022-10-14 15:24:42.199266
[2022-10-14 17:24:42,203] [9] [INFO] [clearml.non_responsive_tasks_watchdog] 0 non-responsive tasks found
[2022-10-14 17:24:42,203] [9] [INFO] [clearml.non_responsive_tasks_watchdog] 0 non-responsive tasks stopped
[...]
[clearml.non_responsive_tasks_watchdog] Starting cleanup cycle for running tasks last updated before 2022-10-14 20:24:43.828925
[2022-10-14 22:24:43,833] [9] [INFO] [clearml.non_responsive_tasks_watchdog] 0 non-responsive tasks found
[2022-10-14 22:24:43,833] [9] [INFO] [clearml.non_responsive_tasks_watchdog] 0 non-responsive tasks stopped
[2022-10-14 22:25:38,225] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all_ex in 227ms
[2022-10-14 22:25:38,284] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_by_id_ex in 5ms
[2022-10-14 22:25:41,341] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_by_id_ex in 11ms
[2022-10-14 22:25:43,417] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_by_id_ex in 10ms
[2022-10-14 22:25:44,118] [9] [INFO] [clearml.service_repo] Returned 200 for events.get_task_log in 664ms
[2022-10-14 22:25:44,128] [9] [INFO] [clearml.service_repo] Returned 200 for events.get_task_log in 634ms
[2022-10-14 22:25:48,249] [9] [INFO] [clearml.service_repo] Returned 200 for projects.get_project_tags in 58ms
[2022-10-14 22:25:48,321] [9] [INFO] [clearml.service_repo] Returned 200 for projects.get_all_ex in 218ms
[2022-10-14 22:25:49,702] [9] [INFO] [clearml.service_repo] Returned 200 for projects.get_all_ex in 40ms
[2022-10-14 22:25:49,725] [9] [INFO] [clearml.service_repo] Returned 200 for projects.get_all_ex in 39ms
[2022-10-14 22:25:49,781] [9] [INFO] [clearml.service_repo] Returned 200 for projects.get_task_tags in 6ms
[2022-10-14 22:25:49,812] [9] [INFO] [clearml.service_repo] Returned 200 for projects.get_task_parents in 57ms
[2022-10-14 22:25:49,848] [9] [INFO] [clearml.service_repo] Returned 200 for projects.get_all_ex in 26ms
[2022-10-14 22:25:49,857] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_types in 96ms
[2022-10-14 22:25:49,873] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all_ex in 29ms
[2022-10-14 22:25:49,919] [9] [INFO] [clearml.service_repo] Returned 200 for users.get_all_ex in 11ms
[2022-10-14 22:25:50,204] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_by_id_ex in 84ms
[2022-10-14 22:25:51,017] [9] [INFO] [clearml.service_repo] Returned 200 for projects.get_task_tags in 4ms
[2022-10-14 22:25:51,023] [9] [INFO] [clearml.service_repo] Returned 200 for projects.get_model_tags in 16ms
[2022-10-14 22:25:52,438] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_by_id_ex in 20ms
[2022-10-14 22:25:54,162] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all_ex in 12ms
[2022-10-14 22:25:56,878] [9] [INFO] [clearml.service_repo] Returned 200 for events.get_task_plots in 1161ms
[2022-10-14 22:25:57,178] [9] [INFO] [clearml.service_repo] Returned 200 for events.debug_images in 1445ms
[2022-10-14 22:26:04,168] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all_ex in 12ms
[2022-10-14 22:26:14,195] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all_ex in 35ms
[2022-10-14 22:39:43,911] [9] [INFO] [clearml.non_responsive_tasks_watchdog] Starting cleanup cycle for running tasks last updated before 2022-10-14 20:39:43.911278
[2022-10-14 22:39:43,915] [9] [INFO] [clearml.non_responsive_tasks_watchdog] 0 non-responsive tasks found
[2022-10-14 22:39:43,915] [9] [INFO] [clearml.non_responsive_tasks_watchdog] 0 non-responsive tasks stopped
[2022-10-14 22:54:43,956] [9] [INFO] [clearml.non_responsive_tasks_watchdog] Starting cleanup cycle for running tasks last updated before 2022-10-14 20:54:43.956715
[2022-10-14 22:54:43,961] [9] [INFO] [clearml.non_responsive_tasks_watchdog] 0 non-responsive tasks found
[2022-10-14 22:54:43,961] [9] [INFO] [clearml.non_responsive_tasks_watchdog] 0 non-responsive tasks stopped
[...]
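Eyeballing a long apiserver log is easy to get wrong, so a quick script can flag anything that is not a plain success. The sketch below is only an illustration: it assumes every request line follows the `Returned <status> for <endpoint> in <N>ms` pattern seen in the excerpt above (inferred from these lines, not from ClearML documentation), and the function name `find_suspicious` and the 1000 ms "slow" threshold are hypothetical choices.

```python
import re
from typing import Iterable, List, Tuple

# Pattern inferred from the excerpt above, e.g.
# [2022-10-14 17:22:50,771] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all in 3ms
# (an assumption about the log format, not an official ClearML spec)
ENTRY_RE = re.compile(
    r"\[(?P<ts>[\d\- :,]+)\] \[\d+\] \[(?P<level>\w+)\] \[clearml\.service_repo\] "
    r"Returned (?P<status>\d{3}) for (?P<endpoint>[\w.]+) in (?P<ms>\d+)ms"
)


def find_suspicious(
    lines: Iterable[str], slow_ms: int = 1000
) -> List[Tuple[str, str, int, int]]:
    """Return (timestamp, endpoint, status, duration_ms) for every request
    that did not return 200 or took at least `slow_ms` milliseconds.

    `slow_ms` is a hypothetical threshold chosen for illustration."""
    hits = []
    for line in lines:
        # finditer also copes with several entries flattened onto one line
        for m in ENTRY_RE.finditer(line):
            status, ms = int(m["status"]), int(m["ms"])
            if status != 200 or ms >= slow_ms:
                hits.append((m["ts"], m["endpoint"], status, ms))
    return hits


# Hypothetical usage (the log path is a placeholder):
# with open("/path/to/apiserver.log", encoding="utf-8") as f:
#     for ts, endpoint, status, ms in find_suspicious(f):
#         print(f"{ts}  {endpoint}  status={status}  {ms}ms")
```

On the excerpt above it would only report the two slow-but-successful calls (`events.get_task_plots` and `events.debug_images`), which matches the conclusion that the apiserver itself was not returning errors.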
We've updated everything now, launched a new experiment, and we're tracking the logs. I'll let you know if I find anything.
Turns out my computer just went into automatic suspend, as simple as that.