Sorry to answer so late AgitatedDove14
I also thought so and tried this thing:
```
!pip install clearml
import clearml

id_last_start = '873add629cf44cd5ab2ef383c94b1c'
clearml.Task.set_credentials(...)

if id_last_start != '':
    task = clearml.Task.get_task(task_id=id_last_start,
                                 project_name='tests',
                                 task_name='patience: 5 factor:0.5')
task = clearml.Task.init(project_name='Exp with ROP',
                         task_name='patience: 2 factor:0.75',
                         co...
```
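For reference, a runnable sketch of what the full snippet might have looked like; the continue_last_task / reuse_last_task_id arguments are only my guess at what the truncated `co...` stood for, and the credentials are placeholders:
```
import clearml

id_last_start = '873add629cf44cd5ab2ef383c94b1c'
# placeholder credentials -- fill in your own key/secret
clearml.Task.set_credentials(api_host='https://api.community.clear.ml',
                             key='<key>', secret='<secret>')

if id_last_start != '':
    # resume the exact previous task; these two arguments are an
    # assumption for the truncated "co..." in the original snippet
    task = clearml.Task.init(project_name='Exp with ROP',
                             task_name='patience: 2 factor:0.75',
                             continue_last_task=True,
                             reuse_last_task_id=id_last_start)
else:
    # no previous run to continue -- start a fresh task
    task = clearml.Task.init(project_name='Exp with ROP',
                             task_name='patience: 2 factor:0.75')
```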
AgitatedDove14
The gap is always equal to the number of iterations completed before continuing training
AgitatedDove14
Can you please give some code examples where the training is restored? I haven't found any. I would be very grateful
AgitatedDove14
Yes (if the value of the first iteration is 0)
AgitatedDove14 Of course, I added it when restoring the experiment. It works correctly when running on my computer, but if I use Colab, for some reason it has no effect.
AgitatedDove14
Yes, I have problems with continuing experiments in Colab. I do everything the same as on my computer, but in Colab I get gaps in the charts.
When I work through Colab and continue an experiment, I get gaps in the graphs.
For example, on the first run I create a task and run a loop:
```
for i in range(1, 100):
    clearml.Logger.current_logger().report_scalar("test", "loss", iteration=i, value=i)
```
Then, on the second run, I continue the task via continue_last_task and reuse_last_task_id and write task.set_initial_iteration(0). Then I start the loop:
```
for i in range(100, 200):
    clearml.Logger.current_logger()...
```
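A self-contained sketch of the two runs, roughly what my notebook does; each run is a separate Colab session, and the task name 'gap repro' is a placeholder:
```
import clearml

# --- run 1 (first session): create the task and log iterations 1..99 ---
task = clearml.Task.init(project_name='tests', task_name='gap repro')
for i in range(1, 100):
    clearml.Logger.current_logger().report_scalar("test", "loss", iteration=i, value=i)
task.close()

# --- run 2 (new session): continue the same task and log iterations 100..199 ---
task = clearml.Task.init(project_name='tests', task_name='gap repro',
                         continue_last_task=True,
                         reuse_last_task_id=True)
task.set_initial_iteration(0)  # reset the offset so iterations line up
for i in range(100, 200):
    clearml.Logger.current_logger().report_scalar("test", "loss", iteration=i, value=i)
```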
AgitatedDove14
The last iteration before the restore was 2; from the 3rd iteration onward it is the restored model
AgitatedDove14
I install it in Colab via `pip install clearml`, so it is probably the most up-to-date version there. The version on both my computer and Colab is 1.1.4
AgitatedDove14 Hooray, it helped! Thank you very much!!!!
Hi AgitatedDove14 I finally found a solution to the problem. I should have written task.set_initial_iteration(0)
after restoring the task. Thank you for your help
AgitatedDove14
If I use this method, then new scalar values stop being added to the graph
AgitatedDove14 Unfortunately, this does not solve the problem :( New exp: https://app.community.clear.ml/projects/2d68c58ff6f14403b51ff4c2d0b4f626/experiments/ec096e98ed5c4eccaf8047673023fc3e/output/execution
The image shows the eval log. The second column is val, the third column is step
Hi, AgitatedDove14
Yes, that sounds like my problem. But I do not know how it can help me:(
AgitatedDove14
Yes, I use continue_last_task with reuse_last_task_id. The iteration number is either the actual number of batches that were used or the epoch at which training stopped. The iterations are reported sequentially, but for some reason there is a gap in this picture
For example, when I start the pipeline, PyTorch gets installed in the services queue, but I would like it to be installed only inside the queue that the train step will run on.
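A minimal sketch of one way to arrange that, assuming a PipelineController-based pipeline; the project, task, and queue names here are placeholders. The controller itself runs in the lightweight services queue, while the train step (and its PyTorch install) is assigned its own execution queue:
```
from clearml import PipelineController

pipe = PipelineController(name='pipeline demo', project='tests', version='0.1')

# steps that need PyTorch go to a dedicated queue; only the agent
# serving that queue installs the step's requirements
pipe.add_step(name='train',
              base_task_project='tests',
              base_task_name='train step',
              execution_queue='gpu_queue')

# the controller stays in the services queue and installs nothing heavy
pipe.start(queue='services')
```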