Unanswered
Hello, I Would Like To Use Spot Instances Together With The Aws Autoscaler To Train Models With Pytorch/Ignite And I Am Wondering How To Support Interruptions During The Training (In Case The Instance Is Terminated By Aws). Is There Anything Already Built
If the reporting is done on a subprocess, I can imagine that the task.set_initial_iteration(0)
call is only effective in the main process, not in the subprocess used for reporting. Could it be the case?
165 Views
0
Answers
3 years ago
one year ago