Answered
Hello, We're Getting a Strange Error While Training YOLOv5

Hello, we're getting a strange error while training YoloV5:
Retrying (Retry(total=237, connect=237, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f5c6c163d90>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /v2.13/tasks.get_all
2022-10-12 20:13:56 Retrying (Retry(total=237, connect=237, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f5b6528dd20>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /v2.13/events.add_batch
Retrying (Retry(total=236, connect=236, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f5b8235e2f0>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /v2.13/tasks.get_all
2022-10-13 07:58:26 Retrying (Retry(total=236, connect=236, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f5b66bf58d0>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /v2.13/events.add_batch
2022-10-13 07:58:38 Retrying (Retry(total=237, connect=237, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f5b65295f90>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /v2.13/tasks.get_all
2022-10-13 07:58:42 Retrying (Retry(total=235, connect=235, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f5b66ba3850>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /v2.13/events.add_batch

Has anyone seen something like this? We're self hosted and our server is up all day. This error is also shown in the ClearML logs by the way

  
  
Posted one year ago

Answers 11


No, sorry, I found where the logs are. There don't seem to be any errors in them:
[2022-10-14 17:22:50,771] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all in 3ms
[2022-10-14 17:22:50,784] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_by_id in 7ms
[2022-10-14 17:22:50,853] [9] [INFO] [clearml.service_repo] Returned 200 for events.add_batch in 182ms
[2022-10-14 17:22:50,874] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.edit in 28ms
[2022-10-14 17:22:51,687] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all in 3ms
[2022-10-14 17:22:53,703] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all in 3ms
[2022-10-14 17:22:55,719] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all in 3ms
[2022-10-14 17:22:55,985] [9] [INFO] [clearml.service_repo] Returned 200 for events.add_batch in 106ms
[2022-10-14 17:22:57,733] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.ping in 4ms
[2022-10-14 17:22:57,750] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all in 5ms
[2022-10-14 17:22:59,767] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all in 4ms
[2022-10-14 17:23:04,233] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_by_id in 7ms
[2022-10-14 17:23:04,254] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.edit in 7ms
[2022-10-14 17:23:04,266] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.completed in 6ms
[2022-10-14 17:24:42,199] [9] [INFO] [clearml.non_responsive_tasks_watchdog] Starting cleanup cycle for running tasks last updated before 2022-10-14 15:24:42.199266
[2022-10-14 17:24:42,203] [9] [INFO] [clearml.non_responsive_tasks_watchdog] 0 non-responsive tasks found
[2022-10-14 17:24:42,203] [9] [INFO] [clearml.non_responsive_tasks_watchdog] 0 non-responsive tasks stopped
[...]
[clearml.non_responsive_tasks_watchdog] Starting cleanup cycle for running tasks last updated before 2022-10-14 20:24:43.828925
[2022-10-14 22:24:43,833] [9] [INFO] [clearml.non_responsive_tasks_watchdog] 0 non-responsive tasks found
[2022-10-14 22:24:43,833] [9] [INFO] [clearml.non_responsive_tasks_watchdog] 0 non-responsive tasks stopped
[2022-10-14 22:25:38,225] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all_ex in 227ms
[2022-10-14 22:25:38,284] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_by_id_ex in 5ms
[2022-10-14 22:25:41,341] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_by_id_ex in 11ms
[2022-10-14 22:25:43,417] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_by_id_ex in 10ms
[2022-10-14 22:25:44,118] [9] [INFO] [clearml.service_repo] Returned 200 for events.get_task_log in 664ms
[2022-10-14 22:25:44,128] [9] [INFO] [clearml.service_repo] Returned 200 for events.get_task_log in 634ms
[2022-10-14 22:25:48,249] [9] [INFO] [clearml.service_repo] Returned 200 for projects.get_project_tags in 58ms
[2022-10-14 22:25:48,321] [9] [INFO] [clearml.service_repo] Returned 200 for projects.get_all_ex in 218ms
[2022-10-14 22:25:49,702] [9] [INFO] [clearml.service_repo] Returned 200 for projects.get_all_ex in 40ms
[2022-10-14 22:25:49,725] [9] [INFO] [clearml.service_repo] Returned 200 for projects.get_all_ex in 39ms
[2022-10-14 22:25:49,781] [9] [INFO] [clearml.service_repo] Returned 200 for projects.get_task_tags in 6ms
[2022-10-14 22:25:49,812] [9] [INFO] [clearml.service_repo] Returned 200 for projects.get_task_parents in 57ms
[2022-10-14 22:25:49,848] [9] [INFO] [clearml.service_repo] Returned 200 for projects.get_all_ex in 26ms
[2022-10-14 22:25:49,857] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_types in 96ms
[2022-10-14 22:25:49,873] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all_ex in 29ms
[2022-10-14 22:25:49,919] [9] [INFO] [clearml.service_repo] Returned 200 for users.get_all_ex in 11ms
[2022-10-14 22:25:50,204] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_by_id_ex in 84ms
[2022-10-14 22:25:51,017] [9] [INFO] [clearml.service_repo] Returned 200 for projects.get_task_tags in 4ms
[2022-10-14 22:25:51,023] [9] [INFO] [clearml.service_repo] Returned 200 for projects.get_model_tags in 16ms
[2022-10-14 22:25:52,438] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_by_id_ex in 20ms
[2022-10-14 22:25:54,162] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all_ex in 12ms
[2022-10-14 22:25:56,878] [9] [INFO] [clearml.service_repo] Returned 200 for events.get_task_plots in 1161ms
[2022-10-14 22:25:57,178] [9] [INFO] [clearml.service_repo] Returned 200 for events.debug_images in 1445ms
[2022-10-14 22:26:04,168] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all_ex in 12ms
[2022-10-14 22:26:14,195] [9] [INFO] [clearml.service_repo] Returned 200 for tasks.get_all_ex in 35ms
[2022-10-14 22:39:43,911] [9] [INFO] [clearml.non_responsive_tasks_watchdog] Starting cleanup cycle for running tasks last updated before 2022-10-14 20:39:43.911278
[2022-10-14 22:39:43,915] [9] [INFO] [clearml.non_responsive_tasks_watchdog] 0 non-responsive tasks found
[2022-10-14 22:39:43,915] [9] [INFO] [clearml.non_responsive_tasks_watchdog] 0 non-responsive tasks stopped
[2022-10-14 22:54:43,956] [9] [INFO] [clearml.non_responsive_tasks_watchdog] Starting cleanup cycle for running tasks last updated before 2022-10-14 20:54:43.956715
[2022-10-14 22:54:43,961] [9] [INFO] [clearml.non_responsive_tasks_watchdog] 0 non-responsive tasks found
[2022-10-14 22:54:43,961] [9] [INFO] [clearml.non_responsive_tasks_watchdog] 0 non-responsive tasks stopped
[...]
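
For anyone who wants to check a long apiserver log the same way without reading it line by line, here is a minimal sketch. It assumes only the "Returned <status> for <endpoint> in <N>ms" pattern visible in the lines above; the function name summarize is made up for illustration and is not part of ClearML.

```python
import re
from collections import Counter

# Matches the "Returned 200 for tasks.get_all in 3ms" part of each log entry.
# The pattern is inferred from the log lines in this thread (an assumption).
LINE_RE = re.compile(r"Returned (\d{3}) for (\S+) in (\d+)ms")


def summarize(log_text):
    """Count responses per HTTP status and find the slowest call."""
    statuses = Counter()
    slowest = None  # (milliseconds, endpoint)
    for status, endpoint, ms in LINE_RE.findall(log_text):
        statuses[int(status)] += 1
        if slowest is None or int(ms) > slowest[0]:
            slowest = (int(ms), endpoint)
    return statuses, slowest
```

If every counted status is 200, the server answered all the requests it received, which points at the client side or the network in between rather than at the apiserver.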

  
  
Posted one year ago

Not really, it's an Ubuntu desktop machine that I just update from time to time. I've also got a few pipelines running during my training runs. Do you know of any tools I could use to analyze network errors?
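
One low-tech option, sketched here under assumptions rather than as a recommended tool: run a small probe on the training machine that tries a TCP connection to the API server every few seconds and prints a timestamp for each attempt, so outages can be lined up against dmesg or the system journal. The helper names can_connect and watch are made up for this example, and the host and port in the usage note are placeholders (8008 is assumed to be the API server port of a default self-hosted setup).

```python
import socket
import time
from datetime import datetime


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, DNS failures and
        # "[Errno 101] Network is unreachable".
        return False


def watch(host, port, interval=5.0, checks=3):
    """Probe the server `checks` times; return a list of (timestamp, reachable)."""
    results = []
    for i in range(checks):
        ok = can_connect(host, port)
        stamp = datetime.now().isoformat(timespec="seconds")
        print(f"{stamp} {host}:{port} {'reachable' if ok else 'UNREACHABLE'}")
        results.append((stamp, ok))
        if i < checks - 1:
            time.sleep(interval)
    return results
```

For example, watch("clearml.example.local", 8008, interval=5, checks=720) would cover roughly an hour; the hostname is a placeholder for your own server.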

  
  
Posted one year ago

Hi SmugSnake6, this looks like a network issue. Did you make any changes to your network or server recently?

  
  
Posted one year ago

Can you see if there are errors in the apiserver?

  
  
Posted one year ago

Do you know where I can find the logs for that?

  
  
Posted one year ago

Nothing strange in dmesg at least 😕

  
  
Posted one year ago

We've updated everything now, launched a new experiment and we're tracking the logs. I'll tell you if I find anything

  
  
Posted one year ago

My computer just did an automatic suspend, as simple as that

  
  
Posted one year ago

SmugSnake6, what was it?

  
  
Posted one year ago

😞

  
  
Posted one year ago

Found it!

  
  
Posted one year ago