For example:
import argparse
from clearml import Task

task = Task.init(project_name='examples', task_name='PyTorch MNIST train', output_uri=True)

# Training settings
parser = argparse.ArgumentParser(description='PyTorch MNIST Example')
parser.add_argument('--ds-name', default="blabla")
args = parser.parse_args()
Okay, I'll try that. Although I am using parameters from argparse to set the task name and project, can I init with dummy values and update them afterwards?
Okay, and afterwards I can use something like task.set_name(args.ds_name)?
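A minimal sketch of that update-after-init pattern (the placeholder project/task names and the --ds-name argument are illustrative, not from the thread; Task.set_name and Task.set_project are the SDK calls assumed here):

import argparse
from clearml import Task

# Init first with dummy values so ClearML can hook the argument parser
task = Task.init(project_name='dummy-project', task_name='dummy-task')

parser = argparse.ArgumentParser()
parser.add_argument('--ds-name', default='my-dataset')  # placeholder default
args = parser.parse_args()

# Update the task with the real values once the arguments are parsed
task.set_name(args.ds_name)
task.set_project(project_name='YOLO')  # assumed project name; set_project also accepts a project_id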
Hi @<1523701070390366208:profile|CostlyOstrich36>, here's sample code:
from ultralytics import YOLO
from clearml import Task, Dataset
from jsonargparse import CLI

def train_yolo(ds_name: str = None):
    # Fetch a local copy of the ClearML dataset
    dataset_path = Dataset.get(dataset_name=ds_name).get_local_copy()
    # Reuse the task created by clearml-task / the agent, or create one when run locally
    task = Task.current_task()
    if task is None:
        task = Task.init(project_name="YOLO", task_name=ds_name)
    model = YOLO("yolov8n")
    model.train(data=dataset_path)

if __name__ == "__main__":
    CLI(train_yolo)
I enqueued a job using this code (with clearml-task). It ran on machine1 and crashed at some point. I reset the job and re-enqueued it, and it then ran on machine2. For some reason the training started fine on the ClearML dataset, but when there was a second call to the data (during model.val), it looked for a dataset in /home/machine1/.clearml/cache/storage_manager/datasets/... and the job crashed.
For more info, I am using jsonargparse to expose my params to ClearML, but it looks like it's also picking up the params directly from YOLO.
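For illustration only (an assumption on my part, not a fix confirmed in this thread): since get_local_copy() resolves the dataset into the cache of whichever machine runs the code, one way to avoid reusing a stale absolute path from machine1 is to pass the freshly resolved path explicitly to both train and val:

from ultralytics import YOLO
from clearml import Dataset

# Hypothetical example; "my-dataset" is a placeholder name.
# get_local_copy() downloads/locates the data in the *current* machine's cache,
# so the returned path is only valid on the machine that made the call.
dataset_path = Dataset.get(dataset_name="my-dataset").get_local_copy()

model = YOLO("yolov8n")
model.train(data=dataset_path)
# Pass the freshly resolved path again instead of relying on a stored absolute path
model.val(data=dataset_path)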
Hi @<1644147961996775424:profile|HurtStarfish47>, do you have a basic code snippet that reproduces this behavior?
I'd suggest running Task.init first and then exposing the dataset name through the argument parser afterwards.
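Applied to the YOLO script above, that ordering could look roughly like this (a sketch only; it assumes ClearML picks up the jsonargparse arguments once the task already exists, and the project/task names are placeholders):

from ultralytics import YOLO
from clearml import Task, Dataset
from jsonargparse import CLI

def train_yolo(ds_name: str = None):
    # The task already exists at this point; only update its name from the parsed argument
    Task.current_task().set_name(ds_name)
    dataset_path = Dataset.get(dataset_name=ds_name).get_local_copy()
    model = YOLO("yolov8n")
    model.train(data=dataset_path)

if __name__ == "__main__":
    # Init first so the jsonargparse arguments are captured by the task
    task = Task.init(project_name="YOLO", task_name="placeholder")
    CLI(train_yolo)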