My Nth Question For The Day

My nth question for the day 🙂

What’s the general pattern for running a pipeline - train model, evaluate metrics and publish the model if satisfactory (based on a threshold, for example)

  				
Posted 4 years ago by TrickySheep9

Answers 7

AgitatedDove14 I'm making some progress on this. My training run saved all of these files, and Task.get_task(param['TaskA']).models['output'][-1] gets me just one of them, training_args.bin. Then [-2] gets me another, rng_state.pth.

If I just get Task.get_task(param['TaskA']).models['output'], I end up with a huge list like [<clearml.model.Model object at 0x7fec2841c880>, <clearml.model.Model object at 0x7fec2841c8e0>, <clearml.model.Model object at 0x7fec2841c820>...

So I think I have a solution here, which is to just loop backwards through the list until I find the right file I want to load.
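A minimal sketch of that backwards loop, for reference. It assumes each model object exposes a `url` attribute whose last path segment is the saved file name; the helper name `find_model_by_filename` is hypothetical and not part of ClearML.

```python
def find_model_by_filename(models, filename):
    """Return the newest model whose URL ends in `filename`, or None if absent."""
    # Walk from the most recently registered model back to the oldest.
    for model in reversed(models):
        url = getattr(model, "url", None) or ""
        # Compare only the last path segment, e.g. ".../rng_state.pth".
        if url.rstrip("/").split("/")[-1] == filename:
            return model
    return None


# Hypothetical usage (needs a configured clearml install, so left as comments):
# from clearml import Task
# models = Task.get_task(task_id=param['TaskA']).models['output']
# wanted = find_model_by_filename(models, 'pytorch_model.bin')
# local_path = wanted.get_local_copy() if wanted is not None else None
```

Returning None instead of raising makes the missing-file case (like pytorch_model.bin here) easy to detect before calling get_local_copy().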

But I just noticed that for some reason pytorch_model.bin isn't there. I'm not sure why that wasn't saved. huh

  				
Posted 4 years ago by SmallDeer34

That's cool AgitatedDove14 , will try it out and pester you a bit more. 🙂

  				
Posted 4 years ago by TrickySheep9

Very interesting, thanks! I'll look into it!

  				
Posted 4 years ago by SmallDeer34

Is there a way to do this all elegantly?

Oh yes there is, this is how TaskB's code will look:

` import torch
from clearml import Task

task = Task.init(..., 'task b')
param = {'TaskA': 'TaskAs ID HERE'}
task.connect(param)
# the last output model registered by TaskA
taska_model = Task.get_task(task_id=param['TaskA']).models['output'][-1]
model = torch.load(taska_model.get_local_copy())

# train

torch.save(model, 'modelb') `

I might have missed something there, but generally speaking this will let you:
1. Select TaskA as a parameter of TaskB's training process
2. Automagically register TaskA's model as the input model of TaskB
3. Store TaskB's model in the model repository

So basically full lineage with the ability to automate. wdyt?

  				
Posted 4 years ago by AgitatedDove14

Interesting, I wasn't aware of the possibilities you outline there at the end, where you, like, programmatically pull all the results down for all the tasks. Neat!

A more complex version of this which I'm trying to figure out:

1. I trained a model using TaskA.
2. I now need to pull that model down from the saved artifacts of TaskA and fine-tune it in TaskB.
3. That fine-tuning in TaskB spits out a metric.

Is there a way to do this all elegantly? Currently my process is to manually download the models from the UI, then manually upload them to S3, then manually pull them down from S3, and then start the code to fine-tune in TaskB.

  				
Posted 4 years ago by SmallDeer34

What’s the general pattern for running a pipeline - train model, evaluate metrics and publish the model if satisfactory (based on a threshold, for example)

Basically I would do:

Parameters for the pipeline:
TaskA = the training model Task (think of it as our template Task)
Metric = the title/series/sign we want to choose based on, where sign is max/min
Project = the project to compare performance within, so that we can decide to publish based on the best Metric

Pipeline:
1. Clone TaskA
2. Change TaskA's arguments (if needed)
3. Launch and wait until completed
4. Get the TaskA instance's Metric value:
value = Task.get_task(task_id='instance_id_111').get_last_scalar_metrics()[Metric.title][Metric.series][Metric.sign]
5. Get all Tasks with a metric above/below this one:
tasks = Task.get_tasks(project_name=, task_name=, etc...)
tasks = sorted(tasks, key=lambda x: x.get_last_scalar_metrics()[Metric.title][Metric.series][Metric.sign])
6. Pick the best one:
# best task, if this is us, publish
if tasks[-1].id == 'instance_id_111': tasks[-1].publish()

wdyt?
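The steps above could be sketched roughly like this. It is a sketch under assumptions, not a definitive implementation: the helper names (`metric_value`, `pick_best`, `run_and_maybe_publish`) and their parameters are hypothetical, the clone is assumed to land in the same project as the template, and `sign` is assumed to be 'max' or 'min'. Unlike a plain ascending sort, `pick_best` takes the smallest value when the sign is 'min', and it skips tasks that never reported the metric.

```python
def metric_value(task, title, series, sign):
    """Read one value from a task's last scalar metrics; None if it is missing."""
    metrics = task.get_last_scalar_metrics()
    return metrics.get(title, {}).get(series, {}).get(sign)


def pick_best(tasks, title, series, sign):
    """Return the task with the best metric value; `sign` is 'max' or 'min'."""
    scored = [(metric_value(t, title, series, sign), t) for t in tasks]
    # Drop tasks that never reported this metric.
    scored = [(value, t) for value, t in scored if value is not None]
    if not scored:
        return None
    chooser = max if sign == "max" else min
    return chooser(scored, key=lambda pair: pair[0])[1]


def run_and_maybe_publish(template_id, queue, project, title, series, sign):
    """Clone the template, run it, and publish the clone only if it is the best."""
    from clearml import Task  # lazy import: the helpers above need no clearml

    cloned = Task.clone(source_task=template_id)   # step 1
    # step 2 (optional): change arguments on `cloned` here
    Task.enqueue(cloned, queue_name=queue)         # step 3: launch
    cloned.wait_for_status()                       # blocks; raises if the task fails
    cloned.reload()                                # refresh the reported metrics
    candidates = Task.get_tasks(project_name=project)  # step 5
    best = pick_best(candidates, title, series, sign)  # step 6
    if best is not None and best.id == cloned.id:
        cloned.publish()
        return True
    return False
```

Keeping the comparison in `pick_best` separate from the ClearML calls means the selection rule can be checked with stand-in objects before running anything on a real queue.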

  				
Posted 4 years ago by AgitatedDove14

seconded, I'm curious about this also.

  				
Posted 4 years ago by SmallDeer34
