Answered
YOLO training script hangs indefinitely after training completes when using ClearML

I'm experiencing an issue with my YOLO training script when using ClearML. Although the training process itself completes successfully (as indicated by the "training is finished" message), the script appears to hang indefinitely after this point. The process doesn't terminate on its own, forcing me to use CTRL+C to stop it manually.
Code Snippet

import os
os.environ['YOLO_VERBOSE'] = 'false'
from ultralytics import YOLO
import multiprocessing as mp
from clearml import Task
task = Task.init(
    project_name='TEST',
    task_name="YOLO_TRAIN",
    output_uri=True,
)
# Load a model
model = YOLO(
    "models/yolo11n-seg.pt"
)  # load a pretrained model (recommended for training)
# Train the model
print('initializing training..')
model.train(
    data="data/YOLO_DATASETS/data.yml",
    batch=-1,
    lr0=1e-3,
    optimizer="AdamW",
    epochs=1,
    imgsz=1024,
    pretrained=True,
    verbose=False,
    workers=mp.cpu_count(),
    patience=200,
    plots=True
)
print('training is finished')
# Explicitly flush any pending events/uploads and close the ClearML task
task.flush()
task.close()

Console Output

(yolo-training) joao@LCPServer:~/Experimentos/_CLEARML/project$ python script.py 
ClearML Task: created new task id=7f690a8a560f4655a79b2d015a33c5dd
======> WARNING! Git diff too large to store (1323kb), skipping uncommitted changes <======
ClearML results page: 

2025-02-27 17:30:06,673 - clearml.model - INFO - Selected model id: 93bb56d6459a461c928ec14e493d4ded
initializing training..
/home/joao/miniconda3/envs/yolo-training/lib/python3.9/site-packages/albumentations/__init__.py:13: UserWarning:
A new version of Albumentations is available: 2.0.4 (you have 1.4.17). Upgrade using: pip install -U albumentations. To disable automatic update checks, set the environment variable NO_ALBUMENTATIONS_UPDATE to 1.
training is finished
                                             0% | 0.00/5.8 MB [00:00<?, ?MB/s]: /home/joao/miniconda3/envs/yolo-training/lib/python3.9/site-packages/tqdm/std.py:636: TqdmWarning:
clamping frac to range [0, 1]
██████████████████████████████████ 100% | 5.80/5.8 MB [00:00<00:00, 13.80MB/s]: 

At this point, the script hangs indefinitely, and I have to manually terminate it with CTRL+C, which produces the following stack trace:

^CTraceback (most recent call last):
  File "/home/joao/Experimentos/_CLEARML/project/script.py", line 37, in <module>
    task.close()
  File "/home/joao/miniconda3/envs/yolo-training/lib/python3.9/site-packages/clearml/task.py", line 2504, in close
    self.__shutdown()
  File "/home/joao/miniconda3/envs/yolo-training/lib/python3.9/site-packages/clearml/task.py", line 4656, in __shutdown
    self.flush(wait_for_uploads=True)
  File "/home/joao/miniconda3/envs/yolo-training/lib/python3.9/site-packages/clearml/task.py", line 2453, in flush
    self.__reporter.wait_for_events()
  File "/home/joao/miniconda3/envs/yolo-training/lib/python3.9/site-packages/clearml/backend_interface/metrics/reporter.py", line 337, in wait_for_events
    return report_service.wait_for_events(timeout=timeout)
  File "/home/joao/miniconda3/envs/yolo-training/lib/python3.9/site-packages/clearml/backend_interface/metrics/reporter.py", line 129, in wait_for_events
    if self._empty_state_event.wait(timeout=1.0):
  File "/home/joao/miniconda3/envs/yolo-training/lib/python3.9/site-packages/clearml/utilities/process/mp.py", line 449, in wait
    return self._event.wait(timeout=timeout)
  File "/home/joao/miniconda3/envs/yolo-training/lib/python3.9/multiprocessing/synchronize.py", line 349, in wait
    self._cond.wait(timeout)
  File "/home/joao/miniconda3/envs/yolo-training/lib/python3.9/multiprocessing/synchronize.py", line 261, in wait
    return self._wait_semaphore.acquire(True, timeout)
  File "/home/joao/miniconda3/envs/yolo-training/lib/python3.9/site-packages/clearml/utilities/process/exit_hooks.py", line 157, in signal_handler
    return org_handler if not callable(org_handler) else org_handler(sig, frame)
KeyboardInterrupt

Environment

  • Python 3.9
  • YOLO training environment (conda)
  • ClearML latest version
  • Ultralytics YOLO

Questions
  • Why does the script hang after training completion, even though "training is finished" is printed?
  • Are there any recommended configurations or changes to make the script terminate properly after training?
  • Could this be related to background processes or threads started by either YOLO or ClearML that aren't being properly closed?

Any guidance or suggestions for fixing this issue would be greatly appreciated.
  
  
Posted one month ago

Answers 3


An update: setting detect_repository=False in the auto_connect_frameworks parameter of Task.init resolved the hanging issue.
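
For reference, a minimal sketch of that workaround applied to the Task.init call from the question (the detect_repository key inside auto_connect_frameworks follows the description above; adjust to your ClearML version if needed):

from clearml import Task

task = Task.init(
    project_name='TEST',
    task_name="YOLO_TRAIN",
    output_uri=True,
    # Skip repository detection (git diff / uncommitted-changes collection),
    # which is the step the workaround above disables.
    auto_connect_frameworks={'detect_repository': False},
)

This should also avoid the "Git diff too large to store" warning seen in the console output above.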

  
  
Posted one month ago

Yes, the files are being uploaded to files.clear.ml. For example, I can see logs like this during the process:
2025-02-28 10:00:46,013 - clearml.Task - INFO - Completed model upload to None
The problem is that after this upload, the script seems to hang and doesn't terminate automatically, even with the calls to task.flush() and task.close().
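
If it helps to confirm where the wait happens, a small diagnostic sketch (wait_for_uploads=True is the same flag close() uses internally, as visible in the traceback above; the prints are only there to localize the hang):

print('flushing events and waiting for uploads..')
task.flush(wait_for_uploads=True)
print('flush returned, closing task..')
task.close()
print('task closed')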

  
  
Posted one month ago

Hi RoundLion96, it seems like the code is uploading something and waiting for it to finish - are you configuring ClearML to upload the resulting model somewhere?
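
For context, the script in the question passes output_uri=True to Task.init, which asks ClearML to upload model checkpoints to the default files server. As a sketch (not from the original posts), the automatic upload can be disabled or redirected while debugging:

from clearml import Task

task = Task.init(
    project_name='TEST',
    task_name="YOLO_TRAIN",
    # False keeps model snapshots local; a URI string (e.g. 's3://my-bucket/models',
    # an illustrative destination) would redirect the upload instead.
    output_uri=False,
)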

  
  
Posted one month ago