Hi, I Am Running A Script Very Similar To The One In

Hi @CostlyOstrich36, here's sample code:

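from ultralytics import YOLO
from clearml import Task, Dataset
from jsonargparse import CLI

def train_yolo(ds_name: str = None):
    # Fetch a local copy of the ClearML dataset on the machine running the job
    dataset_path = Dataset.get(dataset_name=ds_name).get_local_copy()
    # Reuse the current task if one exists (e.g. under an agent), else create one
    task = Task.current_task()
    if task is None:
        task = Task.init(project_name="YOLO", task_name=ds_name)
    model = YOLO("yolov8n")
    # training and validation (model.val) follow; omitted from this sample

if __name__ == "__main__":
    CLI(train_yolo)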

I enqueued a job using this code (with clearml-task). It ran on machine1 and crashed at some point. I reset the job and re-enqueued it, and this time it ran on machine2. For some reason the training started fine on the ClearML dataset, but on the second access to the data (during model.val) it looked for the dataset in /home/machine1/.clearml/cache/storage_manager/datasets/... and the job crashed.
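To narrow this down, here is a minimal diagnostic sketch. It assumes (not confirmed anywhere in this thread) that an absolute path resolved on machine1 was saved into some config that survived the reset, for example a data YAML or the task's recorded arguments. The helper name find_stale_paths, the config keys, and the dataset folder name ds_example are all hypothetical; the idea is just to flag absolute paths that do not live under the dataset root returned on the current machine.

```python
from pathlib import PurePosixPath


def find_stale_paths(config: dict, dataset_root: str) -> dict:
    """Return the string entries of `config` that are absolute paths
    located outside `dataset_root` (hypothetical helper, illustration only)."""
    root = PurePosixPath(dataset_root)
    stale = {}
    for key, value in config.items():
        if not isinstance(value, str):
            continue  # skip non-path values such as class counts
        path = PurePosixPath(value)
        if not path.is_absolute():
            continue  # relative paths are resolved against the root later
        try:
            path.relative_to(root)  # raises ValueError if outside the root
        except ValueError:
            stale[key] = value
    return stale


if __name__ == "__main__":
    # Hypothetical values: the root would come from
    # Dataset.get(...).get_local_copy() on the machine running the job.
    current_root = "/home/machine2/.clearml/cache/storage_manager/datasets/ds_example"
    data_config = {
        "path": "/home/machine1/.clearml/cache/storage_manager/datasets/ds_example",
        "train": "images/train",
        "nc": 3,
    }
    print(find_stale_paths(data_config, current_root))
```

If something like this reports a machine1 path while running on machine2, the fix would presumably be to rebuild that config from the fresh get_local_copy() result at the start of every run rather than reusing a stored value.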

Posted 6 months ago