Hi Martin
Thank you very much for your answer. I have already uploaded the dataset and it is visible under Datasets. The dataset is also downloaded and stored under .clearml. If I try to access the data with the following code, I get a Permission denied error.
......
File "C:\Users\junke\AppData\Local\Programs\Python\Python310\lib\gzip.py", line 174, in init
fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
PermissionError: [Errno 13] Permission denied: 'C:/Users/junke/.c...
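A minimal sketch of one possible workaround, in case the error comes from the read-only ClearML cache: Dataset.get_mutable_local_copy() copies the data into a writable folder of your choice instead of serving it straight from the cache (the target folder name below is just an example):

import os
from clearml import Dataset

dataset = Dataset.get(
    dataset_name="0013_Dataset",
    dataset_project="Vehicle_Dataset",
    alias="0013_Dataset"
)
# Copy the dataset into a writable location instead of reading it
# directly from the (possibly read-only) cache under ~/.clearml.
dataset_path = dataset.get_mutable_local_copy(
    target_folder=os.path.join(os.getcwd(), "dataset_copy")  # example folder
)
print(dataset_path)  # confirm where the writable copy was placed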
and it does 😀.
Thank you very much
Now I am wondering if this works on a Google Colab worker as well.
Yes, if I understand the documentation correctly, the dataset is downloaded in full the first time, and later on only its incremental changes:
2023-12-29 20:24:11
2023-12-29 19:24:06,083 - clearml.storage - INFO - Downloading: 255.00MB / 387.75MB @ 48.47MBs from None
2023-12-29 19:24:06,182 - clearml.sto...
but then the error occurs, after the training and the validation were successfully completed.
If the same code is run locally, it works.
My complete code is:
import pandas as pd
from ultralytics import YOLO
from clearml import Task, Dataset

# Creating a ClearML Task
task = Task.init(
    project_name="Training_MASAM_Modell_N",
    task_name="Datensatz_0013_Freeze_15",
    output_uri=True
)

model = YOLO("yolov8n.pt")

dataset_name = "0013_Dataset"
dataset_project = "Vehicle_Dataset"

dataset_path = Dataset.get(
    dataset_name=dataset_name,
    dataset_project=dataset_project,
    alias="0013_Dataset"
).get_local_copy()...
and the data.yaml file as well:
train: ../train/images
val: ../valid/images
test: ../test/images
nc: 6
names: ['S_60_aktiv', 'S_Verboten_aktiv', 'bus', 'car', 'motorcycle', 'truck']
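Side note: since train, val and test are relative entries, they resolve against wherever Ultralytics thinks the yaml lives, which can differ between a local run and a worker. A rough sketch of rewriting them to absolute paths before training (assumes PyYAML is installed, dataset_path is the folder returned by get_local_copy() above, the 0013_Datenset subfolder layout is as in this thread, and data_abs.yaml is just an example output name):

import os
import yaml  # PyYAML

yaml_path = os.path.join(dataset_path, "0013_Datenset", "data.yaml")
root = os.path.dirname(yaml_path)

with open(yaml_path) as f:
    cfg = yaml.safe_load(f)

# Turn the relative ../train style entries into absolute paths so
# resolution no longer depends on the working directory.
for split in ("train", "val", "test"):
    cfg[split] = os.path.normpath(os.path.join(root, cfg[split]))

# Write to a new file outside the cache folder, in case the cache
# itself is not writable (hypothetical file name).
with open("data_abs.yaml", "w") as f:
    yaml.safe_dump(cfg, f)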
The only difference I see is that there is no labels.cache file in the test folder.
If I access the dataset directly at the same location, it works fine:
#data = r"C:\Users\junke\.clearml\cache\storage_manager\datasets\ds_30892c41582b4537bb9508f3c09ae9ed\0013_Datenset\data.yaml"
It worked until it was supposed to validate the training. There, the same error occurred again.
2 epochs completed in 0.174 hours.
Optimizer stripped from runs/detect/train/weights/last.pt, 136.7MB
Optimizer stripped from runs/detect/train/weights/best.pt, 136.7MB
Validating runs/detect/train/weights/best.pt...
Ultralytics YOLOv8.0.231 🚀 Python-3.10.12 torch-2.1.2+cu121 CUDA:0 (NVIDIA A100-SXM4-40GB, 40514MiB)
Model summary (fused): 268 layers, 68129346 parameters, 0 gradients, 257.4 GFLOPs
Traceback (...
Hi Martin
Thank you very much for your answer, and sorry for the late reply. I have tested a few things. The training works fine:
with #data = r"C:\Users\junke\.clearml\cache\storage_manager\datasets\ds_30892c41582b4537bb9508f3c09ae9ed\0013_Datenset\data.yaml"
Same issue when I try to run a clone of the same program on a Google Colab worker:
2023-12-29 19:24:12,028 - clearml - INFO - Dataset.get() did not specify alias. Dataset information will not be automatically logged in ClearML Server.
New ... available 😃 Update with 'pip install -U ultralytics'
Ultralytics YOLOv8.0.225 🚀 Python-3.10.12 torch-2.1.2+cu121 CUDA:0 (Tesla T4, 15102MiB)
engine/trainer: task=detect, mode=train, model=yolov8n.pt, data=/root/.clearml/cache/storage_m...
The problem was the / and the . in the path. Now the process ended without any error:
🙂
I really have a problem understanding this. I started the process from a different computer and deleted the complete .clearml folder.
However, the training starts and the validation process fails.
Now it seems to work. I had to append the 0013_Datenset subfolder as well:
import os  # needed for os.path.join below

dataset_path = Dataset.get(
    dataset_name=dataset_name,
    dataset_project=dataset_project,
    alias="0013_Dataset"
).get_local_copy()
dataset_path = os.path.join(dataset_path, "0013_Datenset", "data.yaml")
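For completeness, a sketch of how that resolved path then feeds into the trainer (the epochs/imgsz values are just examples):

from ultralytics import YOLO

model = YOLO("yolov8n.pt")
# dataset_path now points at .../0013_Datenset/data.yaml (see above)
results = model.train(data=dataset_path, epochs=80, imgsz=640)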
Hi Martin
I just deleted the complete folder, still the same:
Ultralytics YOLOv8.0.225 🚀 Python-3.10.11 torch-2.2.0.dev20231207+cu118 CUDA:0 (NVIDIA GeForce GTX 1650 Ti with Max-Q Design, 4096MiB)
engine\trainer: task=detect, mode=train, model=yolov8n.pt, data=C:/Users/junke/.clearml/cache/storage_manager/datasets/ds_30892c41582b4537bb9508f3c09ae9ed, epochs=80, patience=50, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=0, workers=8, project=None, name=train17, exi...
Yes, this part is correct. If I point directly to the data.yaml, the training starts without any problem.