Hi! Does anyone have any idea on that one, please? Or could someone point me to the right place or the right person to find out? Thanks for any help!
Hi @<1546665634195050496:profile|SolidGoose91> , sorry, missed this
The Regular Instance Rollback Timeout controls when the autoscaler gives up on a spot instance and falls back to starting a regular instance. After failing to start a spot, it waits and retries, again and again; once the time it has waited exceeds the Regular Instance Rollback Timeout, it tries to start a regular instance instead. This applies to a specific attempt, where starting a spot fails and an alternative instance needs to be started.
The Spot Instance Blackout Period specifies a blackout period that follows a failed attempt to start a spot. It relates to future attempts: after a failure to start a spot, all requests to start additional spot instances are converted to attempts to start regular instances for the length of the blackout. This is a way of "easing" the spot request load on the cloud provider and not creating a "DoS" situation in the cloud account, which might cause the provider to refuse creating spots for a longer period.
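To make the difference between the two settings concrete, here is a rough sketch of the behaviour described above. It is not the autoscaler's actual code: `SpotPolicy`, `start_instance`, `retry_interval_sec`, and the simulated clock are all hypothetical names made up for illustration, and it assumes the blackout is counted from the most recent spot failure.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class SpotPolicy:
    rollback_timeout_sec: float       # "Regular Instance Rollback Timeout"
    blackout_period_sec: float        # "Spot Instance Blackout Period"
    retry_interval_sec: float = 1.0   # hypothetical wait between spot retries
    last_spot_failure: Optional[float] = None  # time of the latest spot failure

    def in_blackout(self, now: float) -> bool:
        # Assumption: the blackout runs from the most recent spot failure.
        return (self.last_spot_failure is not None
                and now - self.last_spot_failure < self.blackout_period_sec)


def start_instance(policy: SpotPolicy,
                   try_spot: Callable[[], bool],
                   now: float) -> Tuple[str, float]:
    """Return (instance kind, simulated time at which it was started)."""
    # Blackout affects *future* requests: skip spot entirely.
    if policy.in_blackout(now):
        return "regular", now

    # Rollback timeout affects *this* request: retry spot until the
    # accumulated wait reaches the timeout, then fall back to regular.
    waited = 0.0
    while True:
        if try_spot():
            return "spot", now + waited
        policy.last_spot_failure = now + waited
        if waited >= policy.rollback_timeout_sec:
            return "regular", now + waited
        waited += policy.retry_interval_sec  # simulated wait, no real sleep


policy = SpotPolicy(rollback_timeout_sec=3, blackout_period_sec=10)
first = start_instance(policy, lambda: False, now=0.0)   # spot keeps failing
second = start_instance(policy, lambda: True, now=5.0)   # inside the blackout
third = start_instance(policy, lambda: True, now=20.0)   # blackout is over
```

In this sketch the first request retries spot until the rollback timeout and then gets a regular instance, the second request never tries spot because it lands inside the blackout, and the third goes back to trying spot.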
Brilliant, thanks a lot for the answer Jake, much appreciated and clearer!
@<1529271085315395584:profile|AmusedCat74> @<1548115177340145664:profile|HungryHorse70> here we have the answer :)
Thanks for clearing that up @<1523701087100473344:profile|SuccessfulKoala55>
Is the doc on GitHub so we can copy that into a PR?
Sure, docs are in https://github.com/allegroai/clearml-docs