Why would it solve the issue? max_spin_up_time_min should be the param defining how long to wait after starting an instance, not polling_interval_time_min, right?
I'll try that and keep you updated.
Here is what happens with polling_interval_time_min=1 when I add one task to the queue: the instance takes ~5 min to start and connect. During that window, the autoscaler starts two new instances, then spins them down. So it behaves as if max_spin_up_time_min=10 is not taken into account.
Correct. polling_interval_time_min is the scaler's interval for checking the queue for tasks.
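To make the interaction between the two parameters concrete, here is a toy simulation. It is a hypothetical model, not ClearML's autoscaler code: it assumes the scaler treats a still-booting instance as capacity only while it is younger than max_spin_up_time_min, and it uses a made-up boot_time_min of 5 to match the ~5 minute start-up described in this thread. The class ToyScaler and the function simulate are illustrative names only.

```python
from dataclasses import dataclass, field


@dataclass
class ToyScaler:
    """Hypothetical polling autoscaler; NOT ClearML's actual implementation."""

    polling_interval_time_min: int
    max_spin_up_time_min: int
    boot_time_min: int           # assumed time for a new instance to connect as a worker
    count_pending: bool = True   # whether still-booting instances count as capacity
    launches: list = field(default_factory=list)  # minute each instance was started

    def connected_workers(self, now: int) -> int:
        # Instances that have finished booting and can pull from the queue.
        return sum(1 for t in self.launches if now - t >= self.boot_time_min)

    def pending_instances(self, now: int) -> int:
        # Booting instances that are still inside the spin-up grace period.
        return sum(
            1
            for t in self.launches
            if now - t < self.boot_time_min and now - t < self.max_spin_up_time_min
        )

    def poll(self, now: int, queued_tasks: int) -> None:
        capacity = self.connected_workers(now)
        if self.count_pending:
            capacity += self.pending_instances(now)
        # Start one instance per queued task not covered by existing capacity.
        for _ in range(max(0, queued_tasks - capacity)):
            self.launches.append(now)


def simulate(count_pending: bool, polling: int = 1, spin_up: int = 10,
             boot: int = 5, duration: int = 10) -> int:
    """Return how many instances get started for a single queued task."""
    scaler = ToyScaler(polling, spin_up, boot, count_pending)
    for now in range(0, duration, scaler.polling_interval_time_min):
        # The task leaves the queue once any worker has connected.
        queued = 0 if scaler.connected_workers(now) > 0 else 1
        scaler.poll(now, queued)
    return len(scaler.launches)


for kwargs in ({"count_pending": False}, {"count_pending": True},
               {"count_pending": True, "spin_up": 2},
               {"count_pending": False, "polling": 5}):
    print(kwargs, simulate(**kwargs))
```

Under these assumptions, a scaler that ignores booting instances starts 5 instances for one task with a 1-minute polling interval, while one that honors the grace period starts 1. If the grace period (spin_up=2) is shorter than the boot time, duplicates come back (3 instances). A 5-minute polling interval also hides the duplicates in this model, which is why a longer polling interval can look like a fix even though the spin-up grace period is the knob that should prevent them.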
Hi JitteryCoyote63,
can you try increasing polling_interval_time_min to 5-7 minutes? Do you get duplicate machines with that too?
I want to verify it doesn't start more than one instance for the same task.
Ok, I am asking because I often see the autoscaler starting more instances than there are experiments in the queues, so I guess I just need to increase max_spin_up_time_min.
Yes, it did spin up two instances for the same task.