For example:
import argparse
from clearml import Task

task = Task.init(project_name='examples', task_name='PyTorch MNIST train', output_uri=True)

# Training settings
parser = argparse.ArgumentParser(description='PyTorch MNIST Example')
parser.add_argument('--ds-name', default="blabla")
args = parser.parse_args()
Okay, I'll try that. Although I am using parameters from argparse to set the task name and project, can I init with dummy values and update them afterwards?
Okay, and afterwards I can use something like task.set_name(args.ds_name)?
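A minimal sketch of that update-after-init pattern (the placeholder project/task names and the --ds-name argument are illustrative, not from the thread; Task.set_name and Task.set_project are the SDK calls assumed here):

import argparse
from clearml import Task

# Init first with dummy values so ClearML can hook the argument parser
task = Task.init(project_name='dummy-project', task_name='dummy-task')

parser = argparse.ArgumentParser()
parser.add_argument('--ds-name', default='my-dataset')  # placeholder default
args = parser.parse_args()

# Update the task with the real values once the arguments are parsed
task.set_name(args.ds_name)
task.set_project(project_name='YOLO')  # assumed project name; set_project also accepts a project_id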
Hi @<1523701070390366208:profile|CostlyOstrich36>, here's sample code:
from ultralytics import YOLO
from clearml import Task, Dataset
from jsonargparse import CLI

def train_yolo(ds_name: str = None):
    # Fetch a local copy of the ClearML dataset
    dataset_path = Dataset.get(dataset_name=ds_name).get_local_copy()
    # Reuse the task created by clearml-task / the agent, or create one when run locally
    task = Task.current_task()
    if task is None:
        task = Task.init(project_name="YOLO", task_name=ds_name)
    model = YOLO("yolov8n")
    model.train(data=dataset_path)

if __name__ == "__main__":
    CLI(train_yolo)
I enqueued a job using this code (with clearml-task). It ran on machine1 and crashed at some point. I reset the job and re-enqueued it, and it then ran on machine2. For some reason the training started fine on the ClearML dataset, but when there was a second call to the data (during model.val), it looked for a dataset in /home/machine1/.clearml/cache/storage_manager/datasets/... and the job crashed.
For more info, I am using jsonargparse to expose my params to ClearML, but it looks like it's also picking up the params directly from YOLO.
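For illustration only (an assumption on my part, not a fix confirmed in this thread): since get_local_copy() resolves the dataset into the cache of whichever machine runs the code, one way to avoid reusing a stale absolute path from machine1 is to pass the freshly resolved path explicitly to both train and val:

from ultralytics import YOLO
from clearml import Dataset

# Hypothetical example; "my-dataset" is a placeholder name.
# get_local_copy() downloads/locates the data in the *current* machine's cache,
# so the returned path is only valid on the machine that made the call.
dataset_path = Dataset.get(dataset_name="my-dataset").get_local_copy()

model = YOLO("yolov8n")
model.train(data=dataset_path)
# Pass the freshly resolved path again instead of relying on a stored absolute path
model.val(data=dataset_path)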
Hi @<1644147961996775424:profile|HurtStarfish47>, do you have a basic code snippet that reproduces this behavior?
I'd suggest running Task.init first and then exposing the dataset name through the argument parser afterwards.
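Applied to the YOLO script above, that ordering could look roughly like this (a sketch only; it assumes ClearML picks up the jsonargparse arguments once the task already exists, and the project/task names are placeholders):

from ultralytics import YOLO
from clearml import Task, Dataset
from jsonargparse import CLI

def train_yolo(ds_name: str = None):
    # The task already exists at this point; only update its name from the parsed argument
    Task.current_task().set_name(ds_name)
    dataset_path = Dataset.get(dataset_name=ds_name).get_local_copy()
    model = YOLO("yolov8n")
    model.train(data=dataset_path)

if __name__ == "__main__":
    # Init first so the jsonargparse arguments are captured by the task
    task = Task.init(project_name="YOLO", task_name="placeholder")
    CLI(train_yolo)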