Hi good folks here! Does ClearML allow auto-rerun of failed jobs, for example when a spot instance gets interrupted, please? (Or auto-resume, if checkpointing logic is in place.)
@SolidGoose91 Regarding spot instances, are you referring to tasks run through the AutoScaler app? If so, the autoscaler should detect the failed spot machine and launch a new spot machine, which should then start running the specific task that was interrupted.
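On the auto-resume part of the question: if the task is relaunched from the start on a new machine, the script itself has to pick up from its last checkpoint. Below is a minimal, library-agnostic sketch of that pattern using only the Python standard library. It does not use any ClearML API; the function names (`load_checkpoint`, `save_checkpoint`, `run`), the JSON checkpoint file, and the `fail_at` parameter (which only simulates an interruption) are all hypothetical, and it assumes the checkpoint path lives on storage that survives the spot machine.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def save_checkpoint(path, state):
    """Write to a temp file and rename, so an interruption mid-write
    cannot leave a half-written checkpoint behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(path, num_steps, fail_at=None):
    """Toy 'training' loop that resumes from the last saved step.

    fail_at is a hypothetical knob used only to simulate an interruption.
    """
    state = load_checkpoint(path)
    for step in range(state["step"], num_steps):
        if fail_at is not None and step == fail_at:
            raise RuntimeError("simulated spot interruption")
        state["total"] += step          # stand-in for real work
        state["step"] = step + 1        # next step to run on resume
        save_checkpoint(path, state)
    return state
```

With this shape, rerunning the same script after an interruption continues from the last completed step rather than from step 0, which is what makes an automatic relaunch useful for long jobs.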