Answered

Hi, Another Question If You May. Is It Possible To Edit A Logged Task? For Instance - Remove All The Metrics From Some Step Onward?

Hi,
Another question, if I may. Is it possible to edit a logged task? For instance, remove all the metrics from some step onward?

  				
Posted 3 years ago by OddAlligator72


Answers 11

Manually should be the simplest, so let's start from there...

  				
Posted 3 years ago by OddAlligator72

That's great for continuing from the last checkpoint, but, unless I misunderstand you, my intention is different:

Suppose I trained a model for 30k epochs overnight and, looking at the graphs, I wish to go back to the 22k'th epoch and retrain it from there differently, while preserving all the history up to that point.
So, I start by cloning the task, and... what can I do then to "get back" to the previous epoch? This means that I would like all metrics, logs, checkpoints, etc. from the 22k'th epoch forward deleted, and then to use your approach.

  				
Posted 3 years ago by OddAlligator72

I see now.
Let's assume you know which snapshot that was:
`
from trains import Task  # "from clearml import Task" in current releases

prev_task = Task.get_task(task_id='the_first_training_task_id')
# get the second-from-last checkpoint
checkpoint_url = prev_task.models['output'][-2].url
prev_scalars = prev_task.get_reported_scalars()
new_task = Task.init('example', 'new task')
logger = new_task.get_logger()
# do some for loop and report the prev_scalars with logger.report_scalar
new_task.flush(wait_for_uploads=True)
new_task.set_initial_iteration(22000)
# start the training
`
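The "for loop" step above could be sketched roughly as follows. This is only an illustration, not an official recipe: `truncate_scalars` and `replay_scalars` are hypothetical helper names, and the code assumes `get_reported_scalars()` returns a nested dict of the form `{title: {series: {'x': [...], 'y': [...]}}}`. The returned points may also be down-sampled by the server, so the replayed history is not guaranteed to be complete.

```python
from typing import Dict, List

# Assumed layout of Task.get_reported_scalars():
#   {title: {series: {"x": [iterations...], "y": [values...]}}}
Scalars = Dict[str, Dict[str, Dict[str, List[float]]]]


def truncate_scalars(scalars: Scalars, last_iteration: int) -> Scalars:
    """Keep only the points whose iteration is <= last_iteration."""
    kept: Scalars = {}
    for title, series_map in scalars.items():
        for series, points in series_map.items():
            pairs = [
                (x, y)
                for x, y in zip(points["x"], points["y"])
                if x <= last_iteration
            ]
            if pairs:  # drop series that have no points left
                kept.setdefault(title, {})[series] = {
                    "x": [x for x, _ in pairs],
                    "y": [y for _, y in pairs],
                }
    return kept


def replay_scalars(logger, scalars: Scalars) -> int:
    """Report every point via logger.report_scalar; return how many were sent."""
    count = 0
    for title, series_map in scalars.items():
        for series, points in series_map.items():
            for x, y in zip(points["x"], points["y"]):
                logger.report_scalar(
                    title=title, series=series, value=y, iteration=int(x)
                )
                count += 1
    return count
```

With the objects from the snippet above, the call would look like `replay_scalars(logger, truncate_scalars(prev_scalars, 22000))`, followed by the `flush` and `set_initial_iteration(22000)` calls.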

  				
Posted 3 years ago by AgitatedDove14

Hey AgitatedDove14,
I wish to be able to continue a previous run, but from a certain checkpoint onward (perhaps with changed data, perhaps with a different LR...). So I wish to be able to "go back" to the epoch of the checkpoint, and continue from there while retaining the entire history up to that point.

  				
Posted 3 years ago by OddAlligator72

I see, is this what you are looking for?
https://allegro.ai/docs/task.html#trains.task.Task.init

continue_last_task='task_id'

  				
Posted 3 years ago by AgitatedDove14

OddAlligator72 let's separate the two issues:
1. Continue reporting from a previous iteration
2. Retrieving a previously stored checkpoint

Now for the details:
Are you referring to a scenario where you execute your code manually (i.e. without the trains-agent)?

  				
Posted 3 years ago by AgitatedDove14

Hi OddAlligator72

"for instance - remove all the metrics from some step onward?"

I think that as long as the Task is not published, you could do such a thing directly with the REST API (aka APIClient from Python).
What's the use case?

  				
Posted 3 years ago by AgitatedDove14

OddAlligator72 sure thing 🙂
This should sort it out:
`Task.init('examples', 'train', continue_last_task=True)`
If you want to continue a specific Task:
`continue_last_task='task_id_here'`
Getting the previous model:
`last_checkpoint = task.models['output'][-1]`
What do you think?

  				
Posted 3 years ago by AgitatedDove14

OK, that looks like a nice workaround. Thanks!

  				
Posted 3 years ago by OddAlligator72

Getting the last checkpoint can be done via:
`Task.get_task(task_id='aabbcc').models['output'][-1]`

  				
Posted 3 years ago by AgitatedDove14

I'm not sure that this is exactly that, though. I wish to continue from a given checkpoint.
Also, will this overwrite graphs starting at a given step?

  				
Posted 3 years ago by OddAlligator72
