Answered
Beginner question about loading data from a ClearML Dataset in PyCharm and later in Google Colab

Hi,
I have what is most probably a beginner question about loading data from a ClearML Dataset in PyCharm and, later on, in Google Colab.
I used the following example from the documentation and from the YouTube video:

# Read the data
data_path = Dataset.get(dataset_name="Fashion MNIST", alias="Fashion MNIST").get_local_copy()
fashion_mnist_test = pd.read_csv(f"{data_path}/fashion-mnist_test.csv")
X_test = np.array(fashion_mnist_test.iloc[:,1:])
y_test = np.array(fashion_mnist_test.iloc[:,0])
dtest = xgb.DMatrix(X_test, label=y_test)

However, for YOLOv8 (object detection with around 20k .jpg and .txt files) I need the data.yaml file:

model = YOLO("yolov8n.pt")
data = r"C:\Users\junke\.clearml\cache\storage_manager\datasets\ds_1019154914df4346a316c6e63a7237c9\data.yaml"
#data_path = Dataset.get(dataset_name="002_Datenset_MASAM_for_fintuning", alias="002_Datenset_MASAM_for_fintuning").get_local_copy()


def main():
    results = model.train(data=data,
                          epochs=80,

At the moment I have not found a way to locate the data.yaml file in the Dataset.

Thank you for your support and help:-)

  
  
Posted 4 months ago

Answers 31


but then the error occurs, after the training and the validation were successfully completed

It seems it is failing on the last eval? Could it be that the test set is missing? Is it the same dataset? Can you verify the file is there? (Notice I see a mix of / and \ in the file name, which is odd: Windows uses \ and Linux/macOS use /, so you should never have a mix.)
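
To illustrate the remark about mixed separators, here is a minimal sketch using Python's standard `pathlib`, with the path string taken from the FileNotFoundError reported in this thread. It only manipulates strings and reads nothing from disk. It shows that Windows rules and POSIX rules interpret the same mixed-separator string differently.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Path string as it appears in the FileNotFoundError in this thread
# (forward slashes first, then backslashes).
mixed = (
    "C:/Users/junke/.clearml/cache/storage_manager/datasets/"
    "ds_30892c41582b4537bb9508f3c09ae9ed\\0013_Datenset\\data.yaml"
)

# Under Windows rules both "/" and "\" are separators.
as_windows = PureWindowsPath(mixed)
print(as_windows.name)        # last path component under Windows rules
print(as_windows.as_posix())  # the same path written with "/" only

# Under POSIX rules only "/" separates components; "\" is an ordinary character.
as_posix = PurePosixPath(mixed)
print(as_posix.name)          # last path component under POSIX rules
print(as_posix.is_absolute()) # whether POSIX rules treat the path as absolute
```

Building the path from the value returned by `get_local_copy()` with `os.path.join` or `pathlib`, instead of hardcoding it, avoids committing to either separator.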

  
  
Posted 3 months ago

Hi Martin
Thank you very much for your answer, and sorry for the late reply. I have tested a few things. The training works fine:
image

  
  
Posted 3 months ago

but then the error occurs, after the training and the validation were successfully completed
image

  
  
Posted 3 months ago

If the same code is run locally, it works:
image

  
  
Posted 3 months ago

the folders are there
image

  
  
Posted 3 months ago

Validating as well
image

  
  
Posted 3 months ago

and the data.yaml file as well:

train: ../train/images
val: ../valid/images
test: ../test/images

nc: 6
names: ['S_60_aktiv', 'S_Verboten_aktiv', 'bus', 'car', 'motorcycle', 'truck']
  
  
Posted 3 months ago

The only difference I see is that there is no labels.cache file in the test folder.

  
  
Posted 3 months ago

I use Windows.

  
  
Posted 3 months ago

Now it seems to work. I had to add the 0013_Datenset folder to the path as well:

dataset_path = Dataset.get(
    dataset_name=dataset_name,
    dataset_project=dataset_project,
    alias="0013_Dataset"
).get_local_copy()
dataset_path = os.path.join(dataset_path, "0013_Datenset", "data.yaml")
  
  
Posted 4 months ago

The problem was the mix of / and \. Now the process ended without any error:

🙂
image

  
  
Posted 3 months ago

@<1651395720067944448:profile|GiddyHedgehong81> just to be clear, Dataset.get_local_copy returns a path to the folder holding your files.
You have to manually append the additional path to the specific files you need to use. It does not know that in advance.
That was the initial issue you had, and I assume it is the same one here. Does that make sense?
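
A minimal sketch of that idea, for the case where the folder layout inside the dataset is not known in advance. `find_in_dataset` is a hypothetical helper, not part of ClearML; it searches a folder for a file name using only the standard library.

```python
from pathlib import Path


def find_in_dataset(root, filename="data.yaml"):
    """Return the path of the single `filename` found under `root`.

    Hypothetical helper (not a ClearML API): `root` would be the folder
    returned by Dataset.get(...).get_local_copy().
    """
    matches = sorted(Path(root).rglob(filename))
    if not matches:
        raise FileNotFoundError(f"no {filename!r} found under {root}")
    if len(matches) > 1:
        # Refuse to guess when the name is ambiguous.
        raise ValueError(f"several {filename!r} files found: {matches}")
    return matches[0]
```

In the script from this thread it could be used as `data = str(find_in_dataset(dataset_path))` before calling `model.train(data=data, ...)`.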

  
  
Posted 4 months ago

Hi @<1651395720067944448:profile|GiddyHedgehong81>

However, for YOLOv8 (object detection with around 20k .jpg and .txt files) I need the data.yaml file:

Just add the entire folder with your files to a dataset, then get it in your code
Add the files (you can do that from the CLI, for example):

clearml-data add --files my_folder_with_files

Then from code:

data_path = Dataset.get(dataset_name="my dataset", alias="training dataset").get_local_copy()

# now all my files are in `data_path`
  
  
Posted 4 months ago

It worked until the final validation of the training run. Here, too, the same error occurs.

2 epochs completed in 0.174 hours.
Optimizer stripped from runs/detect/train/weights/last.pt, 136.7MB
Optimizer stripped from runs/detect/train/weights/best.pt, 136.7MB
Validating runs/detect/train/weights/best.pt...
Ultralytics YOLOv8.0.231 🚀 Python-3.10.12 torch-2.1.2+cu121 CUDA:0 (NVIDIA A100-SXM4-40GB, 40514MiB)
Model summary (fused): 268 layers, 68129346 parameters, 0 gradients, 257.4 GFLOPs
Traceback (most recent call last):
  File "/root/.clearml/venvs-builds/3.10/code/training.py", line 40, in <module>
    main()
  File "/root/.clearml/venvs-builds/3.10/code/training.py", line 28, in main
    results = model.train(data=data,
  File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/ultralytics/engine/model.py", line 356, in train
    self.trainer.train()
  File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/ultralytics/engine/trainer.py", line 190, in train
    self._do_train(world_size)
  File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/ultralytics/engine/trainer.py", line 427, in _do_train
    self.final_eval()
  File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/ultralytics/engine/trainer.py", line 576, in final_eval
    self.metrics = self.validator(model=f)
  File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/ultralytics/engine/validator.py", line 139, in __call__
    self.data = check_det_dataset(self.args.data)
  File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/ultralytics/data/utils.py", line 253, in check_det_dataset
    file = check_file(dataset)
  File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/ultralytics/utils/checks.py", line 460, in check_file
    raise FileNotFoundError(f"'{file}' does not exist")
FileNotFoundError: 'C:/Users/junke/.clearml/cache/storage_manager/datasets/ds_30892c41582b4537bb9508f3c09ae9ed\0013_Datenset\data.yaml' does not exist
2023-12-29 23:14:48
Process failed, exit code 1

Here as well, if I copy the following path:

C:/Users/junke/.clearml/cache/storage_manager/datasets/ds_30892c41582b4537bb9508f3c09ae9ed\0013_Datenset\data.yaml

the file opens.

  
  
Posted 4 months ago

Hi Martin

Thank you very much for your answer. I have already uploaded the dataset and it is visible under Datasets. The dataset is also downloaded and stored under .clearml. If I try to access the data with the following code, I get a Permission denied error.

......
File "C:\Users\junke\AppData\Local\Programs\Python\Python310\lib\gzip.py", line 174, in init
fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
PermissionError: [Errno 13] Permission denied: 'C:/Users/junke/.clearml/cache/storage_manager/datasets/ds_30892c41582b4537bb9508f3c09ae9ed'
.........
File "C:\Users\junke\Dropbox\MIR\Ausbildung\MAS Automation Management\001_Module\005_Masterarbeit\Software\003_Test_Laptop_Yoga\venv\lib\site-packages\ultralytics\engine\trainer.py", line 120, in init
raise RuntimeError(emojis(f"Dataset '{clean_url(self.args.data)}' error ❌ {e}")) from e
RuntimeError: Dataset ' None ' error [Errno 13] Permission denied: 'C:/Users/junke/.clearml/cache/storage_manager/datasets/ds_30892c41582b4537bb9508f3c09ae9ed'

However, if I access the data directly with:
data = r"C:\Users\junke\.clearml\cache\storage_manager\datasets\ds_30892c41582b4537bb9508f3c09ae9ed\0013_Datenset\data.yaml"

there is no error message and the data can be accessed.

import pandas as pd
from ultralytics import YOLO
from clearml import Task, Dataset

# Creating a ClearML Task
task = Task.init(
    project_name="Training_MASAM_Modell_N",
    task_name="Datensatz_0013_Freeze_15",
    output_uri=True
)
model = YOLO("yolov8n.pt")

dataset_name = "0013_Dataset"
dataset_project = "Vehicle_Dataset"

dataset_path = Dataset.get(
    dataset_name=dataset_name,
    dataset_project=dataset_project,
    alias="0013_Dataset"
).get_local_copy()

data=dataset_path
#data = r"C:\Users\junke\.clearml\cache\storage_manager\datasets\ds_30892c41582b4537bb9508f3c09ae9ed\0013_Datenset\data.yaml"

def main():
    results = model.train(data=data,
                          epochs=80,
                          device=0,             # 0 = GPU
                          imgsz=640,
                          patience=50,          # epochs to wait without improvement before training stops early
                          batch=16,             # number of images per batch
                          save=True,
                          resume=False,         # resume training from the last checkpoint (e.g. after an aborted run)
                          freeze=None,          # freeze the first n layers, or a list of layers
                          pretrained=True       # use a pretrained model; default=True
                          )
if __name__ == '__main__':
    main()
  
  
Posted 4 months ago

error [Errno 13] Permission denied:

Seems like a permission issue?
Try removing your entire ClearML cache folder.

  
  
Posted 4 months ago

Hi Martin
I just deleted the complete folder; still the same:

Ultralytics YOLOv8.0.225 🚀 Python-3.10.11 torch-2.2.0.dev20231207+cu118 CUDA:0 (NVIDIA GeForce GTX 1650 Ti with Max-Q Design, 4096MiB)
engine\trainer: task=detect, mode=train, model=yolov8n.pt, data=C:/Users/junke/.clearml/cache/storage_manager/datasets/ds_30892c41582b4537bb9508f3c09ae9ed, epochs=80, patience=50, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=0, workers=8, project=None, name=train17, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, cfg=None, tracker=botsort.yaml, save_dir=runs\detect\train17
Traceback (most recent call last):
  File "C:\Users\junke\Dropbox\MIR\Ausbildung\MAS Automation Management\001_Module\005_Masterarbeit\Software\003_Test_Laptop_Yoga\venv\lib\site-packages\ultralytics\engine\trainer.py", line 116, in __init__
    self.data = check_det_dataset(self.args.data)
  File "C:\Users\junke\Dropbox\MIR\Ausbildung\MAS Automation Management\001_Module\005_Masterarbeit\Software\003_Test_Laptop_Yoga\venv\lib\site-packages\ultralytics\data\utils.py", line 257, in check_det_dataset
    if zipfile.is_zipfile(file) or is_tarfile(file):
  File "C:\Users\junke\AppData\Local\Programs\Python\Python310\lib\tarfile.py", line 2517, in is_tarfile
    t = open(name)
  File "C:\Users\junke\AppData\Local\Programs\Python\Python310\lib\tarfile.py", line 1632, in open
    return func(name, "r", fileobj, **kwargs)
  File "C:\Users\junke\AppData\Local\Programs\Python\Python310\lib\tarfile.py", line 1698, in gzopen
    fileobj = GzipFile(name, mode + "b", compresslevel, fileobj)
  File "C:\Users\junke\AppData\Local\Programs\Python\Python310\lib\gzip.py", line 174, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
PermissionError: [Errno 13] Permission denied: 'C:/Users/junke/.clearml/cache/storage_manager/datasets/ds_30892c41582b4537bb9508f3c09ae9ed'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\junke\Dropbox\MIR\Ausbildung\MAS Automation Management\001_Module\005_Masterarbeit\Software\003_Test_Laptop_Yoga\training.py", line 38, in <module>
    main()
  File "C:\Users\junke\Dropbox\MIR\Ausbildung\MAS Automation Management\001_Module\005_Masterarbeit\Software\003_Test_Laptop_Yoga\training.py", line 26, in main
    results = model.train(data=data,
  File "C:\Users\junke\Dropbox\MIR\Ausbildung\MAS Automation Management\001_Module\005_Masterarbeit\Software\003_Test_Laptop_Yoga\venv\lib\site-packages\ultralytics\engine\model.py", line 333, in train
    self.trainer = (trainer or self._smart_load('trainer'))(overrides=args, _callbacks=self.callbacks)
  File "C:\Users\junke\Dropbox\MIR\Ausbildung\MAS Automation Management\001_Module\005_Masterarbeit\Software\003_Test_Laptop_Yoga\venv\lib\site-packages\ultralytics\engine\trainer.py", line 120, in __init__
    raise RuntimeError(emojis(f"Dataset '{clean_url(self.args.data)}' error ❌ {e}")) from e
RuntimeError: Dataset '
' error  [Errno 13] Permission denied: 'C:/Users/junke/.clearml/cache/storage_manager/datasets/ds_30892c41582b4537bb9508f3c09ae9ed'

Process finished with exit code 1
  
  
Posted 4 months ago

Same issue when I try to run a clone of the same program on a google colab worker:

2023-12-29 19:24:12,028 - clearml - INFO - Dataset.get() did not specify alias. Dataset information will not be automatically logged in ClearML Server.
New 
 available 😃 Update with 'pip install -U ultralytics'
Ultralytics YOLOv8.0.225 🚀 Python-3.10.12 torch-2.1.2+cu121 CUDA:0 (Tesla T4, 15102MiB)
engine/trainer: task=detect, mode=train, model=yolov8n.pt, data=/root/.clearml/cache/storage_manager/datasets/ds_30892c41582b4537bb9508f3c09ae9ed, epochs=80, patience=50, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=0, workers=8, project=None, name=train, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, cfg=None, tracker=botsort.yaml, save_dir=runs/detect/train
Traceback (most recent call last):
  File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/ultralytics/engine/trainer.py", line 116, in __init__
    self.data = check_det_dataset(self.args.data)
  File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/ultralytics/data/utils.py", line 257, in check_det_dataset
    if zipfile.is_zipfile(file) or is_tarfile(file):
  File "/usr/lib/python3.10/tarfile.py", line 2780, in is_tarfile
    t = open(name)
  File "/usr/lib/python3.10/tarfile.py", line 1797, in open
    return func(name, "r", fileobj, **kwargs)
  File "/usr/lib/python3.10/tarfile.py", line 1863, in gzopen
    fileobj = GzipFile(name, mode + "b", compresslevel, fileobj)
  File "/usr/lib/python3.10/gzip.py", line 174, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
IsADirectoryError: [Errno 21] Is a directory: '/root/.clearml/cache/storage_manager/datasets/ds_30892c41582b4537bb9508f3c09ae9ed'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/root/.clearml/venvs-builds/3.10/code/training.py", line 38, in <module>
    main()
  File "/root/.clearml/venvs-builds/3.10/code/training.py", line 26, in main
    results = model.train(data=data,
  File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/ultralytics/engine/model.py", line 333, in train
    self.trainer = (trainer or self._smart_load('trainer'))(overrides=args, _callbacks=self.callbacks)
  File "/root/.clearml/venvs-builds/3.10/lib/python3.10/site-packages/ultralytics/engine/trainer.py", line 120, in __init__
    raise RuntimeError(emojis(f"Dataset '{clean_url(self.args.data)}' error ❌ {e}")) from e
RuntimeError: Dataset '/root/.clearml/cache/storage_manager/datasets/ds_30892c41582b4537bb9508f3c09ae9ed' error ❌ [Errno 21] Is a directory: '/root/.clearml/cache/storage_manager/datasets/ds_30892c41582b4537bb9508f3c09ae9ed'
2023-12-29 20:24:27
Process failed, exit code 1
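
The traceback above ends with `check_det_dataset` trying to open the value of `data` as an archive, and that value is the dataset folder rather than a file. A minimal sketch of a guard that fails early with a clearer message follows; `require_yaml_file` is a hypothetical helper, not part of Ultralytics or ClearML.

```python
from pathlib import Path


def require_yaml_file(data):
    """Return `data` as a string if it is an existing .yaml/.yml file.

    Hypothetical pre-flight check, meant to run before model.train(data=...).
    """
    path = Path(data)
    if path.is_dir():
        raise ValueError(
            f"{path} is a folder; pass the data.yaml inside it instead"
        )
    if path.suffix.lower() not in {".yaml", ".yml"}:
        raise ValueError(f"{path} is not a .yaml file")
    if not path.is_file():
        raise FileNotFoundError(f"{path} does not exist")
    return str(path)
```

Calling `require_yaml_file(dataset_path)` on the folder returned by `get_local_copy()` would raise immediately and name the actual problem, instead of surfacing later as `IsADirectoryError` on Linux or `PermissionError` on Windows.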
  
  
Posted 4 months ago

If I access the dataset on the same location directly it works fine:

#data = r"C:\Users\junke\.clearml\cache\storage_manager\datasets\ds_30892c41582b4537bb9508f3c09ae9ed\0013_Datenset\data.yaml"
  
  
Posted 4 months ago

Yes. If I understand the documentation correctly, the dataset is downloaded in full the first time, and later on only the incremental changes are downloaded:

2023-12-29 20:24:11
2023-12-29 19:24:06,083 - clearml.storage - INFO - Downloading: 255.00MB / 387.75MB @ 48.47MBs from None
2023-12-29 19:24:06,182 - clearml.storage - INFO - Downloading: 260.00MB / 387.75MB @ 50.53MBs from None
2023-12-29 19:24:06,270 - clearml.storage - INFO - Downloading: 265.00MB / 387.75MB @ 56.95MBs from None
2023-12-29 19:24:06,334 - clearml.storage - INFO - Downloading: 270.00MB / 387.75MB @ 78.38MBs from None

  
  
Posted 4 months ago

If I access the dataset on the same location directly it works fine:

Wait, I'm confused: how is it that the dataset is there? Did it download the dataset?

Are you saying this line, for example, will fail? (Assuming you actually have a dataset by that name.)

data_path = Dataset.get(dataset_name="002_Datenset_MASAM_for_fintuning", alias="002_Datenset_MASAM_for_fintuning").get_local_copy()
  
  
Posted 4 months ago

Okay, so it is downloaded to your machine and unzipped; is that part correct?

  
  
Posted 4 months ago

If I point directly to the data.yaml, the training starts without any problem

What do you mean? How do you know where the extracted file is?
Basically:

data_path = Dataset.get(...).get_local_copy()

Then you should be able to open your file with open(data_path + "/data.yaml", "rt").
Does that work?
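
If it is not clear where the file ended up, listing the contents of the local copy is a quick check. A minimal sketch using only the standard library; `list_files` is a hypothetical helper, and `root` stands for whatever `get_local_copy()` returned.

```python
import os


def list_files(root, max_depth=1):
    """Return relative paths of files at most `max_depth` folders below `root`."""
    found = []
    root = os.path.abspath(root)
    for current, dirs, files in os.walk(root):
        rel = os.path.relpath(current, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        if depth >= max_depth:
            dirs[:] = []  # do not descend any further
        for name in sorted(files):
            found.append(os.path.normpath(os.path.join(rel, name)))
    return sorted(found)
```

The depth limit keeps the output short for a dataset with thousands of images: with the default, only files in the root folder and in its direct subfolders are listed, which is enough to see whether data.yaml sits in the root or one level below.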

  
  
Posted 4 months ago

with #data = r"C:\Users\junke\.clearml\cache\storage_manager\datasets\ds_30892c41582b4537bb9508f3c09ae9ed\0013_Datenset\data.yaml"
  
  
Posted 4 months ago

Yes, this part is correct. If I point directly to the data.yaml, the training starts without any problem.

  
  
Posted 4 months ago

My complete code is:

import pandas as pd
from ultralytics import YOLO
from clearml import Task, Dataset

# Creating a ClearML Task
task = Task.init(
    project_name="Training_MASAM_Modell_N",
    task_name="Datensatz_0013_Freeze_15",
    output_uri=True
)
model = YOLO("yolov8n.pt")

dataset_name = "0013_Dataset"
dataset_project = "Vehicle_Dataset"

dataset_path = Dataset.get(
    dataset_name=dataset_name,
    dataset_project=dataset_project,
    alias="0013_Dataset"
).get_local_copy()

data=dataset_path
#data = r"C:\Users\junke\.clearml\cache\storage_manager\datasets\ds_30892c41582b4537bb9508f3c09ae9ed\0013_Datenset\data.yaml"

def main():
    results = model.train(data=data,
                          epochs=80,
                          device=0,             # 0 = GPU
                          imgsz=640,
                          patience=50,          # epochs to wait without improvement before training stops early
                          batch=16,             # number of images per batch
                          save=True,
                          resume=False,         # resume training from the last checkpoint (e.g. after an aborted run)
                          freeze=None,          # freeze the first n layers, or a list of layers
                          pretrained=True       # use a pretrained model; default=True
                          )
if __name__ == '__main__':
    main()

If I set:

#data = r"C:\Users\junke\.clearml\cache\storage_manager\datasets\ds_30892c41582b4537bb9508f3c09ae9ed\0013_Datenset\data.yaml"

the dataset (data.yaml) can be accessed. I guess I am not opening the dataset correctly.

  
  
Posted 4 months ago

try:

import os

...

dataset_path = Dataset.get(
    dataset_name=dataset_name,
    dataset_project=dataset_project,
    alias="0013_Dataset"
).get_local_copy()
dataset_path = os.path.join(dataset_path, "data.yaml")

...
  
  
Posted 4 months ago

Now I am wondering if this works on a Google Colab worker as well.

  
  
Posted 4 months ago

and it does 😀 .

Thank you very much

  
  
Posted 4 months ago

it should

  
  
Posted 4 months ago