Tagging my colleague @<1529271085315395584:profile|AmusedCat74>, who needs this as well.
Hi @<1546665634195050496:profile|SolidGoose91>, I think this capability exists when running pipelines: the pipeline controller detects steps whose spot instances failed and retries running them.
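To illustrate the retry idea in general terms, here is a minimal, self-contained sketch. It does not use ClearML's API; `SpotInterrupted`, `run_with_retries` and `make_flaky_step` are hypothetical names, and the assumption is that a reclaimed spot instance surfaces to the controller as a distinguishable failure that can be retried up to a fixed limit.

```python
from typing import Callable, Tuple


class SpotInterrupted(Exception):
    """Hypothetical signal that the spot instance running a step was reclaimed."""


def run_with_retries(step: Callable[[], str], max_retries: int = 3) -> Tuple[str, int]:
    """Run a pipeline step, re-launching it whenever its spot instance is interrupted.

    Returns the step's result and the number of attempts it took.
    Re-raises SpotInterrupted once the retry budget is exhausted.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return step(), attempt
        except SpotInterrupted:
            # First try plus max_retries retries; give up after that.
            if attempt > max_retries:
                raise


def make_flaky_step(failures: int) -> Callable[[], str]:
    """Hypothetical step that is preempted `failures` times before succeeding."""
    remaining = [failures]

    def step() -> str:
        if remaining[0] > 0:
            remaining[0] -= 1
            raise SpotInterrupted("instance reclaimed")
        return "trained"

    return step


if __name__ == "__main__":
    result, attempts = run_with_retries(make_flaky_step(2), max_retries=3)
    print(result, attempts)  # trained 3
```

Only the interruption exception triggers a retry here; any other error propagates immediately, so genuine bugs in a step are not masked by re-running it.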
Are you using the PRO or the open-source autoscaler?
Do Pipelines work with Hyperparameter search, and with single training jobs?
@<1546665634195050496:profile|SolidGoose91> regarding spot instances, are you referring to tasks run by the Autoscaler app? If so, the autoscaler app should detect the failed spot machine and spin up a new spot machine, which should then start running the specific task that was interrupted.
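The detect-and-replace behavior described above can be pictured as a reconcile step. The sketch below is a toy model under stated assumptions, not the Autoscaler app's actual implementation: `Machine`, `ToyAutoscaler`, `launch` and `reconcile` are hypothetical, and preemption is simulated by flipping an `alive` flag.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Optional, Tuple


@dataclass
class Machine:
    machine_id: int
    alive: bool = True
    task_id: Optional[str] = None  # task currently assigned to this machine


@dataclass
class ToyAutoscaler:
    """Hypothetical model of an autoscaler reconcile loop (not ClearML code)."""

    machines: Dict[int, Machine] = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def launch(self, task_id: Optional[str] = None) -> Machine:
        """Start a new (simulated) spot machine, optionally bound to a task."""
        machine = Machine(next(self._ids), task_id=task_id)
        self.machines[machine.machine_id] = machine
        return machine

    def reconcile(self) -> List[Tuple[int, int, str]]:
        """Drop preempted machines and relaunch their interrupted tasks.

        Returns (old_machine_id, new_machine_id, task_id) for each replacement.
        """
        replacements = []
        dead = [m for m in self.machines.values() if not m.alive]
        for old in dead:
            del self.machines[old.machine_id]
            if old.task_id is not None:
                new = self.launch(task_id=old.task_id)
                replacements.append((old.machine_id, new.machine_id, old.task_id))
        return replacements


if __name__ == "__main__":
    scaler = ToyAutoscaler()
    scaler.launch("task-a")
    preempted = scaler.launch("task-b")
    preempted.alive = False  # simulate the cloud provider reclaiming the spot instance
    print(scaler.reconcile())  # [(2, 3, 'task-b')]
```

Idle machines that get preempted are simply removed; only machines that were running a task trigger a replacement.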
@<1546665634195050496:profile|SolidGoose91> pipelines are yours to implement as you wish: you define what each step does. For hyperparameter search, however, there is the HPO app, which might be a quicker, ready-made solution.
Yes, we love the HPO app, and are using it :)
And yes, I was referring to tasks run by the Autoscaler app (potentially launched via the HPO app).