Unanswered
Hello, I Would Like To Use Spot Instances Together With The Aws Autoscaler To Train Models With Pytorch/Ignite And I Am Wondering How To Support Interruptions During The Training (In Case The Instance Is Terminated By Aws). Is There Anything Already Built
JitteryCoyote63
somehow the previous iterations, not sure yet if it’s coming from my code, ignite or clearml
ClearML will automatically continue reporting from the previous iteration (i.e. if before continuing the Task the last iteration was 100, then the next report with iteration =0 will actually be 101)
task.set_initial_iteration(engine.state.iteration)
Basically it is called automatically by ClearML (obviously only when you continue an aborted Task)
175 Views
0
Answers
3 years ago
one year ago