AgitatedDove14 This does not solve the problem unfortunately:( New exp: https://app.community.clear.ml/projects/2d68c58ff6f14403b51ff4c2d0b4f626/experiments/ec096e98ed5c4eccaf8047673023fc3e/output/execution
The image shows the eval log. The second column is `val`, the third column is `step`.
Run this example:
Once, then change line #26 to:
`task = Task.init(project_name="examples", task_name="scalar reporting", continue_last_task=True)`
and run again.
Hmmm. So if last iteration was 75, the next iteration (after we continue) will be 150 ?
I think that you do not actually need this one:
`step = step - cfg.start_epoch + 1`
You can just do
`step += 1`
ClearML will take care of the offset itself.
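Not ClearML's actual internals, but a toy sketch of the bookkeeping `continue_last_task` does for you (the class name and fields here are made up for illustration): the backend remembers the last iteration reported by the previous run and offsets new reports by it, so the training loop only needs a local counter.

```python
# Toy illustration (NOT ClearML's real code): when a task is continued,
# the backend remembers the last reported iteration and shifts new
# reports by it, so the caller only keeps a local step counter.

class ContinuedTaskLogger:
    def __init__(self, last_iteration=0):
        # last iteration reported by the previous (aborted) run
        self.offset = last_iteration
        self.points = []  # (iteration, value) pairs as the backend stores them

    def report_scalar(self, iteration, value):
        # the local `iteration` is shifted by the previous run's last iteration
        self.points.append((self.offset + iteration, value))


# Previous run stopped at iteration 50; the continued run just does step += 1
logger = ContinuedTaskLogger(last_iteration=50)
for step in range(1, 4):          # local steps 1, 2, 3
    logger.report_scalar(step, value=step * 10)

print(logger.points)              # continues at 51, 52, 53 with no gap
```

The point of the sketch: because the offset is applied server-side, adding it again in the training script produces a double shift.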
I'm not sure I follow the example... Are you sure this experiment continued a previous run?
What was the last iteration on the previous run ?
Give me a minute, I'll check something
Nice SourOx12 !
Yes, I use continue_last_task with reuse_last_task_id. The iteration number is the actual number of batches that were used, or the number of the epoch at which the training stopped. The iterations are reported sequentially, but for some reason there is a gap in this picture.
How do you set the iteration when you continue the experiment? Is it with
Okay let me check....
Hmm, I see the jump from 50 to 100. Is that consistent with the last iteration of the aborted Task (before continuing)?
Sorry to answer so late AgitatedDove14
I also thought so and tried this thing:
```python
!pip install clearml
import clearml

id_last_start = '873add629cf44cd5ab2ef383c94b1c'
if id_last_start != '':
    task = clearml.Task.get_task(task_id=id_last_start, project_name='tests', task_name='patience: 5 factor:0.5')
    task = clearml.Task.init(project_name='Exp with ROP', task_name='patience: 2 factor:0.75',
                             continue_last_task=True, reuse_last_task_id=id_last_start)
else:
    task = clearml.Task.init(project_name='tests', task_name='patience: 2 factor:0.75')
cfg.task = task

folder = path[:path.find('/')]
file = path[path.find('/') + 1:]
if step == cfg.epoch:
    step = step - cfg.start_epoch + 1
    clearml.Logger.current_logger().report_scalar(folder, file, iteration=step, value=val)
elif step == cfg.step:
    step = step - cfg.start_step + 1
    clearml.Logger.current_logger().report_scalar(folder, file, iteration=step, value=val)
```
And after restarting, I get these breaks in Scalars: https://app.community.clear.ml/projects/2d68c58ff6f14403b51ff4c2d0b4f626/experiments/873add629cf44cd5ab2ef383c94b1c9b/output/execution
So the thing is
ClearML automatically detects the last iteration of the previous run; my assumption is that you also add it yourself, hence the double shift.
SourOx12 could that be it?
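The double shift is easy to see with numbers. A sketch of the arithmetic, assuming the auto-detected offset equals the aborted run's last iteration (the variable names are illustrative, not ClearML API):

```python
# Arithmetic behind the jump from 50 to 100: the same offset applied twice.
last_iteration = 50  # last iteration of the aborted run

# The script keeps its own global counter, i.e. it already includes the offset
script_step = last_iteration + 1                   # 51

# ClearML then shifts the reported iteration by last_iteration again
server_iteration = last_iteration + script_step    # 101 -> double shift

# With a purely local counter, the point lands where you expect
local_step = 1
expected_iteration = last_iteration + local_step   # 51

print(server_iteration, expected_iteration)        # 101 51
```

So reporting with a local `step += 1` counter avoids the gap, while reporting a global step roughly doubles the iteration, matching the break in the Scalars plot.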