Answered
Hello, we're getting a strange error while training YoloV5:

Hello, we're getting a strange error while training YoloV5:
Retrying (Retry(total=237, connect=237, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f5c6c163d90>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /v2.13/tasks.get_all
2022-10-12 20:13:56
Retrying (Retry(total=237, connect=237, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f5b6528dd20>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /v2.13/events.add_batch
Retrying (Retry(total=236, connect=236, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f5b8235e2f0>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /v2.13/tasks.get_all
2022-10-13 07:58:26
Retrying (Retry(total=236, connect=236, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f5b66bf58d0>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /v2.13/events.add_batch
2022-10-13 07:58:38
Retrying (Retry(total=237, connect=237, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f5b65295f90>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /v2.13/tasks.get_all
2022-10-13 07:58:42
Retrying (Retry(total=235, connect=235, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f5b66ba3850>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /v2.13/events.add_batch

Has anyone seen something like this? We're self-hosted and our server is up all day. This error is also shown in the ClearML logs, by the way.

  
  
Posted 2 years ago

Answers 11


We've updated everything now, launched a new experiment, and we're tracking the logs. I'll tell you if I find anything.

  
  
Posted 2 years ago

Not really, it's an Ubuntu desktop machine that I just update from time to time. I've also got a few pipelines running during my training runs. Do you know any tools that I could use to analyze network errors?
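
For anyone with the same question, one low-tech option is a small probe that runs next to the training job and records when the API server stops being reachable. The following is only a minimal sketch: the function names are made up for illustration, and the host and port in the commented usage line are assumptions (8008 is assumed to be the API server port of your deployment), so substitute your own values.

```python
import socket
import time
from datetime import datetime


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Network is unreachable", refused connections and timeouts.
        return False


def watch(host: str, port: int, interval: float = 5.0, checks: int = 10) -> list:
    """Probe repeatedly and record (timestamp, reachable) on every state change."""
    changes = []
    last = None
    for _ in range(checks):
        ok = can_connect(host, port)
        if ok != last:
            changes.append((datetime.now().isoformat(timespec="seconds"), ok))
            print(changes[-1])
            last = ok
        time.sleep(interval)
    return changes


# Hypothetical usage, with an assumed host and port:
# watch("localhost", 8008, interval=5.0, checks=1000)
```

Comparing the recorded timestamps with the "Retrying" lines in the training console should show whether the outages line up with something happening on the machine itself.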

  
  
Posted 2 years ago

Nothing strange in dmesg at least 😕

  
  
Posted 2 years ago

Found it!

  
  
Posted 2 years ago

My computer just did an automatic suspend, as simple as that.
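
A suspend like this can also be spotted from inside a Python process. The sketch below assumes Linux, where, as far as I know, `time.monotonic()` does not advance while the machine is suspended but `time.time()` does, so a large drift between the two hints at a suspend. The function names and the tolerance value are illustrative only, and a manual or NTP clock change would produce the same signal.

```python
import time


def suspend_seconds(wall_delta: float, mono_delta: float, tolerance: float = 2.0) -> float:
    """Estimate suspended time from the drift between wall and monotonic clocks.

    Returns 0.0 when the drift is within the tolerance.
    """
    drift = wall_delta - mono_delta
    return drift if drift > tolerance else 0.0


def watch_for_suspend(interval: float = 1.0, iterations: int = 60) -> list:
    """Sample both clocks and collect the estimated length of each detected gap."""
    gaps = []
    wall, mono = time.time(), time.monotonic()
    for _ in range(iterations):
        time.sleep(interval)
        new_wall, new_mono = time.time(), time.monotonic()
        gap = suspend_seconds(new_wall - wall, new_mono - mono)
        if gap:
            gaps.append(gap)
            print(f"Possible suspend of about {gap:.0f}s ending at {time.ctime(new_wall)}")
        wall, mono = new_wall, new_mono
    return gaps
```

Running `watch_for_suspend` in a background thread of a long job would leave a record of any such gap in the job's own console output.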

  
  
Posted 2 years ago

Do you know where I can find the logs for that?

  
  
Posted 2 years ago

Hi SmugSnake6, this looks like a network issue. Did you make any changes to your network/server recently?

  
  
Posted 2 years ago

No, sorry. I found where the logs are, and there don't seem to be any errors in them:
[2022-10-14 17:22:50,771] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all in 3ms
[2022-10-14 17:22:50,784] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_by_id in 7ms
[2022-10-14 17:22:50,853] [9] [INFO] [clearml.service_repo] Returned 200 for events.add_batch in 182ms
[2022-10-14 17:22:50,874] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.edit in 28ms
[2022-10-14 17:22:51,687] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all in 3ms
[2022-10-14 17:22:53,703] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all in 3ms
[2022-10-14 17:22:55,719] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all in 3ms
[2022-10-14 17:22:55,985] [9] [INFO] [clearml.service_repo] Returned 200 for events.add_batch in 106ms
[2022-10-14 17:22:57,733] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.ping in 4ms
[2022-10-14 17:22:57,750] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all in 5ms
[2022-10-14 17:22:59,767] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all in 4ms
[2022-10-14 17:23:04,233] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_by_id in 7ms
[2022-10-14 17:23:04,254] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.edit in 7ms
[2022-10-14 17:23:04,266] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.completed in 6ms
[2022-10-14 17:24:42,199] [9] [INFO] [clearml.non_responsive_tasks_watchdog] Starting cleanup cycle for running tasks last updated before 2022-10-14 15:24:42.199266
[2022-10-14 17:24:42,203] [9] [INFO] [clearml.non_responsive_tasks_watchdog] 0 non-responsive tasks found
[2022-10-14 17:24:42,203] [9] [INFO] [clearml.non_responsive_tasks_watchdog] 0 non-responsive tasks stopped
[...]
[clearml.non_responsive_tasks_watchdog] Starting cleanup cycle for running tasks last updated before 2022-10-14 20:24:43.828925
[2022-10-14 22:24:43,833] [9] [INFO] [clearml.non_responsive_tasks_watchdog] 0 non-responsive tasks found
[2022-10-14 22:24:43,833] [9] [INFO] [clearml.non_responsive_tasks_watchdog] 0 non-responsive tasks stopped
[2022-10-14 22:25:38,225] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all_ex in 227ms
[2022-10-14 22:25:38,284] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_by_id_ex in 5ms
[2022-10-14 22:25:41,341] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_by_id_ex in 11ms
[2022-10-14 22:25:43,417] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_by_id_ex in 10ms
[2022-10-14 22:25:44,118] [9] [INFO] [clearml.service_repo] Returned 200 for events.get_task_log in 664ms
[2022-10-14 22:25:44,128] [9] [INFO] [clearml.service_repo] Returned 200 for events.get_task_log in 634ms
[2022-10-14 22:25:48,249] [9] [INFO] [clearml.service_repo] Returned 200 for projects.get_project_tags in 58ms
[2022-10-14 22:25:48,321] [9] [INFO] [clearml.service_repo] Returned 200 for projects.get_all_ex in 218ms
[2022-10-14 22:25:49,702] [9] [INFO] [clearml.service_repo] Returned 200 for projects.get_all_ex in 40ms
[2022-10-14 22:25:49,725] [9] [INFO] [clearml.service_repo] Returned 200 for projects.get_all_ex in 39ms
[2022-10-14 22:25:49,781] [9] [INFO] [clearml.service_repo] Returned 200 for projects.get_task_tags in 6ms
[2022-10-14 22:25:49,812] [9] [INFO] [clearml.service_repo] Returned 200 for projects.get_task_parents in 57ms
[2022-10-14 22:25:49,848] [9] [INFO] [clearml.service_repo] Returned 200 for projects.get_all_ex in 26ms
[2022-10-14 22:25:49,857] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_types in 96ms
[2022-10-14 22:25:49,873] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all_ex in 29ms
[2022-10-14 22:25:49,919] [9] [INFO] [clearml.service_repo] Returned 200 for users.get_all_ex in 11ms
[2022-10-14 22:25:50,204] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_by_id_ex in 84ms
[2022-10-14 22:25:51,017] [9] [INFO] [clearml.service_repo] Returned 200 for projects.get_task_tags in 4ms
[2022-10-14 22:25:51,023] [9] [INFO] [clearml.service_repo] Returned 200 for projects.get_model_tags in 16ms
[2022-10-14 22:25:52,438] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_by_id_ex in 20ms
[2022-10-14 22:25:54,162] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all_ex in 12ms
[2022-10-14 22:25:56,878] [9] [INFO] [clearml.service_repo] Returned 200 for events.get_task_plots in 1161ms
[2022-10-14 22:25:57,178] [9] [INFO] [clearml.service_repo] Returned 200 for events.debug_images in 1445ms
[2022-10-14 22:26:04,168] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all_ex in 12ms
[2022-10-14 22:26:14,195] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all_ex in 35ms
[2022-10-14 22:39:43,911] [9] [INFO] [clearml.non_responsive_tasks_watchdog] Starting cleanup cycle for running tasks last updated before 2022-10-14 20:39:43.911278
[2022-10-14 22:39:43,915] [9] [INFO] [clearml.non_responsive_tasks_watchdog] 0 non-responsive tasks found
[2022-10-14 22:39:43,915] [9] [INFO] [clearml.non_responsive_tasks_watchdog] 0 non-responsive tasks stopped
[2022-10-14 22:54:43,956] [9] [INFO] [clearml.non_responsive_tasks_watchdog] Starting cleanup cycle for running tasks last updated before 2022-10-14 20:54:43.956715
[2022-10-14 22:54:43,961] [9] [INFO] [clearml.non_responsive_tasks_watchdog] 0 non-responsive tasks found
[2022-10-14 22:54:43,961] [9] [INFO] [clearml.non_responsive_tasks_watchdog] 0 non-responsive tasks stopped
[...]

  
  
Posted 2 years ago

SmugSnake6, what was it?

  
  
Posted 2 years ago

Can you see if there are errors in the apiserver?

  
  
Posted 2 years ago

😞

  
  
Posted 2 years ago