Hi guys. Say we train a model for 10 epochs, and an interruption suddenly occurs at epoch 5. How can we continue the training using ClearML?
Hi @AttractiveFrog67
- Make sure you stored the model's checkpoint (either pass `output_uri=True` in `Task.init` or manually upload it)
- When you call `Task.init`, pass `continue_last_task=True`
- Now you can do `last_checkpoint = task.models["output"][-1].get_local_copy()`, and all you need to do is load `last_checkpoint`
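
Putting the three steps together, a minimal sketch could look like the following. The helper names `latest_output_model` and `resume_checkpoint_path` are made up for illustration and are not part of ClearML, and the project and task names are placeholders. It also assumes the interrupted run already registered at least one output model; if it did not, the function returns `None` and training starts from scratch.

```python
from typing import Any, Mapping, Optional, Sequence


def latest_output_model(models: Mapping[str, Sequence[Any]]) -> Optional[Any]:
    """Return the last entry of models["output"], or None if the list is empty.

    Hypothetical helper: `models` is expected to look like `task.models`,
    i.e. a mapping with an "output" list of model objects.
    """
    outputs = models["output"]
    return outputs[-1] if len(outputs) > 0 else None


def resume_checkpoint_path(project_name: str, task_name: str) -> Optional[str]:
    """Continue the previous task and return a local path to its last checkpoint."""
    # Imported here so the helper above can be used without clearml installed.
    from clearml import Task

    task = Task.init(
        project_name=project_name,  # placeholder, use your own project
        task_name=task_name,        # placeholder, use your own task name
        output_uri=True,            # upload checkpoints so they survive an interruption
        continue_last_task=True,    # reuse the previous run instead of creating a new task
    )
    model = latest_output_model(task.models)
    if model is None:
        return None  # nothing stored yet, train from epoch 0
    return model.get_local_copy()  # downloads the checkpoint and returns its local path
```

From there, load the returned path with your framework's own loader. Note that ClearML only gives you the file back; resuming at epoch 5 rather than epoch 0 works only if your checkpoint itself stores the epoch counter (and optimizer state, if you need it).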