Answered
Hi, Another Question If You May. Is It Possible To Edit A Logged Task? For Instance - Remove All The Metrics From Some Step Onward?

Hi,
Another question, if I may. Is it possible to edit a logged task? For instance, to remove all the metrics from some step onward?

  
  
Posted 3 years ago

Answers 11


Hey AgitatedDove14 ,
I wish to be able to continue a previous run, but from a certain checkpoint onward (perhaps with changed data, perhaps with a different LR...). So I wish to be able to "go back" to the epoch of the checkpoint, and continue from there while retaining the entire history up to that point.

  
  
Posted 3 years ago

Hi OddAlligator72

for instance - remove all the metrics from some step onward? 

I think that as long as the Task is not published, you could do such a thing directly with the REST API (aka APIClient from Python).
What's the use case?

  
  
Posted 3 years ago

I see, is this what you are looking for?
https://allegro.ai/docs/task.html#trains.task.Task.init

continue_last_task='task_id'

  
  
Posted 3 years ago

Getting the last checkpoint can be done via:
Task.get_task(task_id='aabbcc').models['output'][-1]

  
  
Posted 3 years ago

That's great for continuing from the last checkpoint, but, unless I misunderstand you, my intention is different:

Suppose I trained a model for 30k epochs overnight and, looking at the graphs, I wish to go back to the 22k'th epoch and retrain it from there differently, while preserving all the history up to that point.
So, I start by cloning the task, and... what can I do then to "get back" to the previous epoch? This means that I would like all metrics, logs, checkpoints, etc. from the 22k'th epoch onward deleted, and then to use your approach.

  
  
Posted 3 years ago

OddAlligator72 sure thing 🙂
This should sort it out:
Task.init('examples', 'train', continue_last_task=True)
If you want to continue a specific Task:
continue_last_task='task_id_here'
Getting the previous model:
last_checkpoint = task.models['output'][-1]
What do you think?

  
  
Posted 3 years ago

OddAlligator72 let's separate the two issues:
1. Continue reporting from a previous iteration
2. Retrieving a previously stored checkpoint
Now for the details:
Are you referring to a scenario where you execute your code manually (i.e. without the trains-agent)?

  
  
Posted 3 years ago

Manually should be the simplest, so let's start from there...

  
  
Posted 3 years ago

I see now.
Let's assume you know which snapshot that was:
` from trains import Task

prev_task = Task.get_task(task_id='the_first_training_task_id')
# get the second from last checkpoint
checkpoint_url = prev_task.models['output'][-2].url
prev_scalars = prev_task.get_reported_scalars()
new_task = Task.init('example', 'new task')
logger = new_task.get_logger()
# do some for loop and report the prev_scalars with logger.report_scalar
new_task.flush(wait_for_uploads=True)
new_task.set_initial_iteration(22000)
# start the training `
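
The "for loop" step above could look roughly like the sketch below. It is only an illustration: it assumes that get_reported_scalars() returns a nested dict shaped like {title: {series: {'x': [...], 'y': [...]}}}, which is worth checking against your installed version. The helper name scalars_up_to and the sample data are made up for the example.

```python
from typing import Dict, Iterator, List, Tuple

# Assumed (unverified) shape of Task.get_reported_scalars():
# {title: {series: {"x": [iterations...], "y": [values...]}}}
Scalars = Dict[str, Dict[str, Dict[str, List[float]]]]


def scalars_up_to(scalars: Scalars, last_iteration: int) -> Iterator[Tuple[str, str, int, float]]:
    """Yield (title, series, iteration, value) for points at or before last_iteration."""
    for title, series_map in scalars.items():
        for series, points in series_map.items():
            for x, y in zip(points["x"], points["y"]):
                if x <= last_iteration:
                    yield title, series, int(x), float(y)


# Made-up data, for illustration only
example = {"loss": {"train": {"x": [21000, 22000, 23000], "y": [0.5, 0.4, 0.9]}}}
kept = list(scalars_up_to(example, last_iteration=22000))

# With a real logger (not executed here), each kept point would be replayed as:
# for title, series, iteration, value in kept:
#     logger.report_scalar(title=title, series=series, value=value, iteration=iteration)
```

Filtering before replaying means the new task only ever receives the history up to the chosen checkpoint, so nothing needs to be deleted from the original task.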

  
  
Posted 3 years ago

I'm not sure that this is exactly that, though I wish to continue from a given checkpoint.
Also, will this overwrite graphs starting at a given step?

  
  
Posted 3 years ago

OK, that looks like a nice workaround. Thanks!

  
  
Posted 3 years ago