Unanswered
Hello, I Would Like To Use Spot Instances Together With The Aws Autoscaler To Train Models With Pytorch/Ignite And I Am Wondering How To Support Interruptions During The Training (In Case The Instance Is Terminated By Aws). Is There Anything Already Built
Although task.data.last_iteration
is correct when resuming, there is still this doubling effect when logging metrics after resuming 😞
176 Views
0
Answers
3 years ago
one year ago