But then the error occurs, after the training and the validation were successfully completed.
It seems to be failing on the final eval. Could it be that the test split is missing? Is it the same dataset? Can you verify the file is there? (Notice that I see a mix of / and \ in the file name. This is odd: Windows uses \ and Linux/macOS use /, so you should never have a mix.)
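Both questions (is the file there, and does the path mix separators) can be checked before `model.train()` is called. The following is a minimal sketch, not part of ClearML or Ultralytics; `has_mixed_separators` and `preflight_check` are hypothetical helper names. It warns about mixed `/` and `\`, and fails early if the path is a folder or does not exist.

```python
import tempfile
from pathlib import Path


def has_mixed_separators(path: str) -> bool:
    """True if the path contains both forward and backward slashes."""
    return "/" in path and "\\" in path


def preflight_check(path: str) -> Path:
    """Hypothetical pre-flight check for the `data` argument of model.train()."""
    if has_mixed_separators(path):
        print(f"warning: mixed path separators in {path!r}")
    p = Path(path)
    if p.is_dir():
        # model.train() expects the data.yaml file, not the dataset folder
        raise IsADirectoryError(f"{p} is a folder; point to the data.yaml file inside it")
    if not p.is_file():
        raise FileNotFoundError(f"{p} does not exist")
    return p


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        yaml_file = Path(d) / "data.yaml"
        yaml_file.write_text("nc: 6\n")
        print(preflight_check(str(yaml_file)))  # an existing file passes
        try:
            preflight_check(d)  # a folder is rejected
        except IsADirectoryError as e:
            print(e)
```

Calling `preflight_check(data)` right before `model.train(data=data, ...)` turns the confusing tarfile/gzip traceback into a one-line message.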
Hi Martin
Thank you very much for your answer, and sorry for the late reply. I have tested a few things. The training works fine:
The only difference I see is that there is no labels.cache file in the test folder.
If the same code is run locally, it works:
Hi Martin
Thank you very much for your answer. I have already uploaded the dataset and it is visible under Datasets. The dataset is also downloaded and stored in the .clearml cache. If I try to access the data with the following code, I get a Permission denied error:
......
File "C:\Users\junke\AppData\Local\Programs\Python\Python310\lib\gzip.py", line 174, in __init__
fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
PermissionError: [Errno 13] Permission denied: 'C:/Users/junke/.clearml/cache/storage_manager/datasets/ds_30892c41582b4537bb9508f3c09ae9ed'
.........
File "C:\Users\junke\Dropbox\MIR\Ausbildung\MAS Automation Management\001_Module\005_Masterarbeit\Software\003_Test_Laptop_Yoga\venv\lib\site-packages\ultralytics\engine\trainer.py", line 120, in __init__
raise RuntimeError(emojis(f"Dataset '{clean_url(self.args.data)}' error ❌ {e}")) from e
RuntimeError: Dataset ' None ' error [Errno 13] Permission denied: 'C:/Users/junke/.clearml/cache/storage_manager/datasets/ds_30892c41582b4537bb9508f3c09ae9ed'
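This traceback and the Colab traceback further down end in the same place: Ultralytics' `check_det_dataset` calls `is_tarfile(file)` on the value of `data`, which ends up calling `open()` on the dataset folder. Opening a folder as if it were a file fails on every OS, only with a different error class: Windows usually reports `PermissionError` (Errno 13) and Linux reports `IsADirectoryError` (Errno 21). So the "permission" error is most likely not about permissions at all. A minimal sketch to observe this locally; `open_error_for_folder` is a hypothetical helper name:

```python
import tempfile


def open_error_for_folder(folder: str) -> type:
    """Return the exception class raised when a folder is opened like a file."""
    try:
        with open(folder, "rb"):
            pass
    except OSError as e:
        return type(e)
    raise AssertionError("opening a folder unexpectedly succeeded")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Typically PermissionError on Windows and IsADirectoryError on Linux/macOS
        print(open_error_for_folder(d).__name__)
```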
However, if I access the data directly with:
data = r"C:\Users\junke\.clearml\cache\storage_manager\datasets\ds_30892c41582b4537bb9508f3c09ae9ed\0013_Datenset\data.yaml"
there is no error message and the data can be accessed.
import pandas as pd
from ultralytics import YOLO
from clearml import Task, Dataset

# Creating a ClearML Task
task = Task.init(
    project_name="Training_MASAM_Modell_N",
    task_name="Datensatz_0013_Freeze_15",
    output_uri=True
)

model = YOLO("yolov8n.pt")

dataset_name = "0013_Dataset"
dataset_project = "Vehicle_Dataset"
dataset_path = Dataset.get(
    dataset_name=dataset_name,
    dataset_project=dataset_project,
    alias="0013_Dataset"
).get_local_copy()
data = dataset_path
#data = r"C:\Users\junke\.clearml\cache\storage_manager\datasets\ds_30892c41582b4537bb9508f3c09ae9ed\0013_Datenset\data.yaml"

def main():
    results = model.train(data=data,
                          epochs=80,
                          device=0,  # 0 = GPU
                          imgsz=640,
                          patience=50,  # Epochs to wait without improvement before training is stopped early
                          batch=16,  # Number of images per batch
                          save=True,
                          resume=False,  # Resume training from the last checkpoint (e.g. if it was aborted due to an error)
                          freeze=None,  # Freeze the first n layers, or a list of layers
                          pretrained=True  # Use a pretrained model; default=True
                          )

if __name__ == '__main__':
    main()
Yes. If I understand the documentation correctly, the dataset is downloaded in full the first time, and later only its incremental changes are downloaded:
2023-12-29 20:24:11
2023-12-29 19:24:06,083 - clearml.storage - INFO - Downloading: 255.00MB / 387.75MB @ 48.47MBs from None
2023-12-29 19:24:06,182 - clearml.storage - INFO - Downloading: 260.00MB / 387.75MB @ 50.53MBs from None
2023-12-29 19:24:06,270 - clearml.storage - INFO - Downloading: 265.00MB / 387.75MB @ 56.95MBs from None
2023-12-29 19:24:06,334 - clearml.storage - INFO - Downloading: 270.00MB / 387.75MB @ 78.38MBs from None
If I access the dataset on the same location directly it works fine:
#data = r"C:\Users\junke\.clearml\cache\storage_manager\datasets\ds_30892c41582b4537bb9508f3c09ae9ed\0013_Datenset\data.yaml"
'
' error [Errno 13] Permission denied:
Seems like a permission issue?
Try removing your entire ClearML cache folder None
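If you prefer to clear the cache from Python, the sketch below lists the top-level entries and deletes the folder only when `dry_run=False`. Assumptions: `clear_clearml_cache` is a hypothetical helper, and the default location `~/.clearml/cache` is taken from the paths in the logs in this thread (`C:\Users\junke\.clearml\cache`, `/root/.clearml/cache`); your `clearml.conf` may point somewhere else.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List, Optional


def clear_clearml_cache(cache_root: Optional[Path] = None, dry_run: bool = True) -> List[Path]:
    """Hypothetical helper: list the cache entries and, unless dry_run, delete the whole folder."""
    # Assumed default location, matching the paths seen in the logs
    root = cache_root if cache_root is not None else Path.home() / ".clearml" / "cache"
    if not root.is_dir():
        return []
    entries = sorted(root.iterdir())
    if not dry_run:
        # The next get_local_copy() should download the dataset again
        shutil.rmtree(root)
    return entries


if __name__ == "__main__":
    # Safe demo on a throw-away folder instead of the real cache
    with tempfile.TemporaryDirectory() as d:
        root = Path(d) / "cache"
        (root / "storage_manager" / "datasets").mkdir(parents=True)
        print(clear_clearml_cache(root, dry_run=True))
        clear_clearml_cache(root, dry_run=False)
        print(root.exists())
```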
The problem was the / and \. Now the process ended without any error:
🙂
Try this:
import os
...
dataset_path = Dataset.get(
dataset_name=dataset_name,
dataset_project=dataset_project,
alias="0013_Dataset"
).get_local_copy()
dataset_path = os.path.join(dataset_path, "data.yaml")
...
Now it seems to work. I had to add the 0013_Datenset subfolder as well:
dataset_path = Dataset.get(
dataset_name=dataset_name,
dataset_project=dataset_project,
alias="0013_Dataset"
).get_local_copy()
dataset_path = os.path.join(dataset_path, "0013_Datenset", "data.yaml")
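This join also explains the mixed separators: on Windows, `get_local_copy()` returned a path with forward slashes (see the `PermissionError` above), and `os.path.join` on Windows appends new parts with backslashes. The sketch below reproduces this on any OS with the standard-library `ntpath` module (the implementation behind `os.path` on Windows) and normalizes the result with `pathlib.PureWindowsPath`. The resulting mixed string is the same one that appears in the Colab `FileNotFoundError`.

```python
import ntpath
from pathlib import PureWindowsPath

# Folder returned by get_local_copy() on Windows (forward slashes, as in the logs)
base = "C:/Users/junke/.clearml/cache/storage_manager/datasets/ds_30892c41582b4537bb9508f3c09ae9ed"

# What os.path.join does on Windows: new parts are appended with backslashes
mixed = ntpath.join(base, "0013_Datenset", "data.yaml")
print(mixed)

# pathlib accepts both separators and normalizes them
clean = PureWindowsPath(base, "0013_Datenset", "data.yaml")
print(str(clean))        # backslashes only
print(clean.as_posix())  # forward slashes only
```

A mixed path still opens on Windows, which is why copying it into Explorer works, but it cannot exist on a Linux worker such as Colab.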
Yes, this part is correct. If I point directly to the data.yaml, the training starts without any problem
with #data = r"C:\Users\junke\.clearml\cache\storage_manager\datasets\ds_30892c41582b4537bb9508f3c09ae9ed\0013_Datenset\data.yaml"
Same issue when I try to run a clone of the same program on a Google Colab worker:
2023-12-29 19:24:12,028 - clearml - INFO - Dataset.get() did not specify alias. Dataset information will not be automatically logged in ClearML Server.
New
available 😃 Update with 'pip install -U ultralytics'
Ultralytics YOLOv8.0.225 🚀 Python-3.10.12 torch-2.1.2+cu121 CUDA:0 (Tesla T4, 15102MiB)
engine/trainer: task=detect, mode=train, model=yolov8n.pt, data=/root/.clearml/cache/storage_manager/datasets/ds_30892c41582b4537bb9508f3c09ae9ed, epochs=80, patience=50, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=0, workers=8, project=None, name=train, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, cfg=None, tracker=botsort.yaml, save_dir=runs/detect/train
Traceback (most recent call last):
File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/ultralytics/engine/trainer.py", line 116, in __init__
self.data = check_det_dataset(self.args.data)
File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/ultralytics/data/utils.py", line 257, in check_det_dataset
if zipfile.is_zipfile(file) or is_tarfile(file):
File "/usr/lib/python3.10/tarfile.py", line 2780, in is_tarfile
t = open(name)
File "/usr/lib/python3.10/tarfile.py", line 1797, in open
return func(name, "r", fileobj, **kwargs)
File "/usr/lib/python3.10/tarfile.py", line 1863, in gzopen
fileobj = GzipFile(name, mode + "b", compresslevel, fileobj)
File "/usr/lib/python3.10/gzip.py", line 174, in __init__
fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
IsADirectoryError: [Errno 21] Is a directory: '/root/.clearml/cache/storage_manager/datasets/ds_30892c41582b4537bb9508f3c09ae9ed'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/root/.clearml/venvs-builds/3.10/code/training.py", line 38, in <module>
main()
File "/root/.clearml/venvs-builds/3.10/code/training.py", line 26, in main
results = model.train(data=data,
File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/ultralytics/engine/model.py", line 333, in train
self.trainer = (trainer or self._smart_load('trainer'))(overrides=args, _callbacks=self.callbacks)
File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/ultralytics/engine/trainer.py", line 120, in __init__
raise RuntimeError(emojis(f"Dataset '{clean_url(self.args.data)}' error ❌ {e}")) from e
RuntimeError: Dataset '/root/.clearml/cache/storage_manager/datasets/ds_30892c41582b4537bb9508f3c09ae9ed' error ❌ [Errno 21] Is a directory: '/root/.clearml/cache/storage_manager/datasets/ds_30892c41582b4537bb9508f3c09ae9ed'
2023-12-29 20:24:27
Process failed, exit code 1
It worked until it was supposed to validate the training. The same error occurs here as well.
2 epochs completed in 0.174 hours.
Optimizer stripped from runs/detect/train/weights/last.pt, 136.7MB
Optimizer stripped from runs/detect/train/weights/best.pt, 136.7MB
Validating runs/detect/train/weights/best.pt...
Ultralytics YOLOv8.0.231 🚀 Python-3.10.12 torch-2.1.2+cu121 CUDA:0 (NVIDIA A100-SXM4-40GB, 40514MiB)
Model summary (fused): 268 layers, 68129346 parameters, 0 gradients, 257.4 GFLOPs
Traceback (most recent call last):
File "/root/.clearml/venvs-builds/3.10/code/training.py", line 40, in <module>
main()
File "/root/.clearml/venvs-builds/3.10/code/training.py", line 28, in main
results = model.train(data=data,
File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/ultralytics/engine/model.py", line 356, in train
self.trainer.train()
File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/ultralytics/engine/trainer.py", line 190, in train
self._do_train(world_size)
File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/ultralytics/engine/trainer.py", line 427, in _do_train
self.final_eval()
File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/ultralytics/engine/trainer.py", line 576, in final_eval
self.metrics = self.validator(model=f)
File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/ultralytics/engine/validator.py", line 139, in __call__
self.data = check_det_dataset(self.args.data)
File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/ultralytics/data/utils.py", line 253, in check_det_dataset
file = check_file(dataset)
File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/ultralytics/utils/checks.py", line 460, in check_file
raise FileNotFoundError(f"'{file}' does not exist")
FileNotFoundError: 'C:/Users/junke/.clearml/cache/storage_manager/datasets/ds_30892c41582b4537bb9508f3c09ae9ed\0013_Datenset\data.yaml' does not exist
2023-12-29 23:14:48
Process failed, exit code 1
Here as well, if I copy the following path:
C:/Users/junke/.clearml/cache/storage_manager/datasets/ds_30892c41582b4537bb9508f3c09ae9ed\0013_Datenset\data.yaml
the file opens.
If I set:
#data = r"C:\Users\junke\.clearml\cache\storage_manager\datasets\ds_30892c41582b4537bb9508f3c09ae9ed\0013_Datenset\data.yaml"
the dataset (data.yaml) can be accessed. I guess I am not opening the dataset correctly.
@GiddyHedgehong81 just to be clear, Dataset.get_local_copy returns a path to the folder containing your files.
You have to manually append the additional path to the specific file you need to use; it does not know that in advance.
That was the initial issue you had, and I assume it is the same one here. Does that make sense?
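Instead of hard-coding the `0013_Datenset` subfolder, the data.yaml can be searched for inside the folder that `get_local_copy()` returns. A minimal sketch using only `pathlib`; `find_data_yaml` is a hypothetical helper, and it assumes the dataset contains exactly one data.yaml:

```python
import tempfile
from pathlib import Path


def find_data_yaml(dataset_root: str, name: str = "data.yaml") -> Path:
    """Hypothetical helper: locate exactly one data.yaml below the get_local_copy() folder."""
    matches = sorted(Path(dataset_root).rglob(name))
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected exactly one {name} under {dataset_root}, found {len(matches)}"
        )
    return matches[0]


if __name__ == "__main__":
    # Simulate the layout of the local copy: <root>/0013_Datenset/data.yaml
    with tempfile.TemporaryDirectory() as root:
        sub = Path(root) / "0013_Datenset"
        sub.mkdir()
        (sub / "data.yaml").write_text("nc: 6\n")
        print(find_data_yaml(root))
```

In the training script this would be used as `data = str(find_data_yaml(dataset_path))`, with `dataset_path` being the value returned by `get_local_copy()` on the machine that runs the task.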
and the data.yaml file as well:
train: ../train/images
val: ../valid/images
test: ../test/images
nc: 6
names: ['S_60_aktiv', 'S_Verboten_aktiv', 'bus', 'car', 'motorcycle', 'truck']
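Since this data.yaml uses relative paths (`../train/images` etc.), it can help to check which split folders actually exist next to it. The sketch below takes the already-parsed YAML as a dict (in practice you would load it with PyYAML's `yaml.safe_load`). Simplifying assumption: relative paths are resolved against the folder containing data.yaml. Ultralytics applies its own resolution rules (for example a `path:` key), so treat a "missing" result as a hint to investigate, not a definitive answer. `missing_splits` is a hypothetical helper name.

```python
import tempfile
from pathlib import Path
from typing import Dict


def missing_splits(yaml_path: str, cfg: Dict[str, object]) -> Dict[str, Path]:
    """Return the train/val/test entries whose folders do not exist.

    Assumes relative paths are resolved against the folder that contains data.yaml.
    """
    base = Path(yaml_path).parent
    missing = {}
    for split in ("train", "val", "test"):
        rel = cfg.get(split)
        if isinstance(rel, str):
            folder = (base / rel).resolve()
            if not folder.is_dir():
                missing[split] = folder
    return missing


if __name__ == "__main__":
    # Mirrors the data.yaml above, with only train and valid folders present
    cfg = {"train": "../train/images", "val": "../valid/images", "test": "../test/images", "nc": 6}
    with tempfile.TemporaryDirectory() as root:
        (Path(root) / "train" / "images").mkdir(parents=True)
        (Path(root) / "valid" / "images").mkdir(parents=True)
        yaml_path = Path(root) / "0013_Datenset" / "data.yaml"
        print(missing_splits(str(yaml_path), cfg))  # only "test" is reported
```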
I really have a problem understanding this. I started the process from a different computer and deleted the complete .clearml folder.
However, the training starts and the validation process fails.
Hi @GiddyHedgehong81
However, for YOLOv8 (object detection with around 20k .jpg and .txt files) I need the data.yaml file:
Just add the entire folder with your files to a dataset, then get it in your code.
Add files (you can do that from the CLI, for example): None
clearml-data add --files my_folder_with_files
Then from code: None
data_path = Dataset.get(dataset_name="my dataset", alias="training dataset").get_local_copy()
# now all my files are in `data_path`
Now I am wondering whether this works on a Google Colab worker as well.
Okay, so it is downloaded to your machine and unzipped; is that part correct?
If I access the dataset on the same location directly it works fine:
Wait, I'm confused: how is the dataset there? Did it download the dataset?
Are you saying this line, for example, will fail? (Assuming you actually have a dataset by that name.)
data_path = Dataset.get(dataset_name="002_Datenset_MASAM_for_fintuning", alias="002_Datenset_MASAM_for_fintuning").get_local_copy()
Hi Martin
I just deleted the complete folder; still the same:
Ultralytics YOLOv8.0.225 🚀 Python-3.10.11 torch-2.2.0.dev20231207+cu118 CUDA:0 (NVIDIA GeForce GTX 1650 Ti with Max-Q Design, 4096MiB)
engine\trainer: task=detect, mode=train, model=yolov8n.pt, data=C:/Users/junke/.clearml/cache/storage_manager/datasets/ds_30892c41582b4537bb9508f3c09ae9ed, epochs=80, patience=50, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=0, workers=8, project=None, name=train17, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, cfg=None, tracker=botsort.yaml, save_dir=runs\detect\train17
Traceback (most recent call last):
File "C:\Users\junke\Dropbox\MIR\Ausbildung\MAS Automation Management\001_Module\005_Masterarbeit\Software\003_Test_Laptop_Yoga\venv\lib\site-packages\ultralytics\engine\trainer.py", line 116, in __init__
self.data = check_det_dataset(self.args.data)
File "C:\Users\junke\Dropbox\MIR\Ausbildung\MAS Automation Management\001_Module\005_Masterarbeit\Software\003_Test_Laptop_Yoga\venv\lib\site-packages\ultralytics\data\utils.py", line 257, in check_det_dataset
if zipfile.is_zipfile(file) or is_tarfile(file):
File "C:\Users\junke\AppData\Local\Programs\Python\Python310\lib\tarfile.py", line 2517, in is_tarfile
t = open(name)
File "C:\Users\junke\AppData\Local\Programs\Python\Python310\lib\tarfile.py", line 1632, in open
return func(name, "r", fileobj, **kwargs)
File "C:\Users\junke\AppData\Local\Programs\Python\Python310\lib\tarfile.py", line 1698, in gzopen
fileobj = GzipFile(name, mode + "b", compresslevel, fileobj)
File "C:\Users\junke\AppData\Local\Programs\Python\Python310\lib\gzip.py", line 174, in __init__
fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
PermissionError: [Errno 13] Permission denied: 'C:/Users/junke/.clearml/cache/storage_manager/datasets/ds_30892c41582b4537bb9508f3c09ae9ed'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\junke\Dropbox\MIR\Ausbildung\MAS Automation Management\001_Module\005_Masterarbeit\Software\003_Test_Laptop_Yoga\training.py", line 38, in <module>
main()
File "C:\Users\junke\Dropbox\MIR\Ausbildung\MAS Automation Management\001_Module\005_Masterarbeit\Software\003_Test_Laptop_Yoga\training.py", line 26, in main
results = model.train(data=data,
File "C:\Users\junke\Dropbox\MIR\Ausbildung\MAS Automation Management\001_Module\005_Masterarbeit\Software\003_Test_Laptop_Yoga\venv\lib\site-packages\ultralytics\engine\model.py", line 333, in train
self.trainer = (trainer or self._smart_load('trainer'))(overrides=args, _callbacks=self.callbacks)
File "C:\Users\junke\Dropbox\MIR\Ausbildung\MAS Automation Management\001_Module\005_Masterarbeit\Software\003_Test_Laptop_Yoga\venv\lib\site-packages\ultralytics\engine\trainer.py", line 120, in __init__
raise RuntimeError(emojis(f"Dataset '{clean_url(self.args.data)}' error ❌ {e}")) from e
RuntimeError: Dataset '
' error [Errno 13] Permission denied: 'C:/Users/junke/.clearml/cache/storage_manager/datasets/ds_30892c41582b4537bb9508f3c09ae9ed'
Process finished with exit code 1