Hi guys. Say that we train a model for 10 epochs and an interruption suddenly occurs at epoch 5. How can we continue the training using ClearML?
Hi @AttractiveFrog67
- Make sure the model's checkpoints are stored: either pass `output_uri=True` to `Task.init`, or upload them manually.
- When you call `Task.init`, pass `continue_last_task=True`.
- Now you can do `last_checkpoint = task.models["output"][-1].get_local_copy()`, and all you need is to load `last_checkpoint`.
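The steps above can be sketched end to end roughly as follows. This is a minimal, framework-agnostic sketch under stated assumptions, not an official ClearML recipe: the training loop, the pickle checkpoint format, and the names `save_checkpoint`, `load_checkpoint`, `latest_output_checkpoint`, `train` and `run_with_clearml` are all hypothetical. The ClearML pieces are `Task.init(..., continue_last_task=True, output_uri=True)` and `task.models["output"][-1].get_local_copy()` from the answer, plus, for the "manually upload" route, `OutputModel(task=task).update_weights(...)`, which is my assumption of how you might register each checkpoint. With a framework such as PyTorch you would more likely save with the framework's own function and let ClearML's automatic logging pick the file up.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, epoch, state):
    """Write the last completed epoch and the training state to a local file."""
    with open(path, "wb") as f:
        pickle.dump({"epoch": epoch, "state": state}, f)


def load_checkpoint(path):
    """Return (first epoch to run, state); start fresh if there is no checkpoint."""
    if path is None or not os.path.exists(path):
        return 0, {"weights": 0.0, "epochs_run": []}
    with open(path, "rb") as f:
        ckpt = pickle.load(f)
    return ckpt["epoch"] + 1, ckpt["state"]


def latest_output_checkpoint(task):
    """Local path of the task's newest output model, or None if it has none yet."""
    outputs = task.models["output"]
    return outputs[-1].get_local_copy() if len(outputs) else None


def train(num_epochs, resume_path, ckpt_dir, on_checkpoint=None, fail_at=None):
    """Hypothetical training loop that resumes right after the last saved epoch."""
    start_epoch, state = load_checkpoint(resume_path)
    for epoch in range(start_epoch, num_epochs):
        if fail_at is not None and epoch == fail_at:
            raise RuntimeError(f"simulated interruption at epoch {epoch}")
        state["weights"] += 1.0  # stand-in for one epoch of real training
        state["epochs_run"].append(epoch)
        path = os.path.join(ckpt_dir, f"checkpoint_{epoch:03d}.pkl")
        save_checkpoint(path, epoch, state)
        if on_checkpoint is not None:
            on_checkpoint(path, epoch)  # e.g. upload the file to ClearML
    return state


def run_with_clearml(num_epochs=10):
    """Wire the loop to ClearML; needs the clearml package and a configured server."""
    from clearml import OutputModel, Task

    # Placeholder project/task names; continue_last_task reopens the previous run.
    task = Task.init(
        project_name="examples",
        task_name="resumable training",
        continue_last_task=True,
        output_uri=True,
    )
    # Look up the previous checkpoint before creating a new output model.
    resume_path = latest_output_checkpoint(task)
    model = OutputModel(task=task)

    def upload(path, epoch):
        # Manual-upload route (an assumption here): register each checkpoint file.
        model.update_weights(weights_filename=path, iteration=epoch)

    return train(num_epochs, resume_path, tempfile.mkdtemp(), on_checkpoint=upload)
```

In this sketch, running `run_with_clearml()` again after a crash should find the newest uploaded checkpoint, load it, and continue from the following epoch. The `fail_at` argument exists only to simulate the interruption locally; keeping the ClearML lookup in a small helper that takes any object with a `models` mapping makes the resume logic testable without a ClearML server.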
one year ago