Hi, I am using the AWS autoscaler and get the following error when trying to spin up spot instances:
2021-08-16 17:18:48 Spinning new instance type=v100_spot
2021-08-16 17:28:35 Error: Failed to start new instance, Waiter SpotInstanceRequestFulfilled failed: Max attempts exceeded. Previously accepted state: Matched expected service error code: InvalidSpotInstanceRequestID.NotFound
2021-08-16 17:33:36 Spinning new instance type=v100_spot
2021-08-16 17:43:23 Error: Failed to start new instance, Waiter SpotInstanceRequestFulfilled failed: Max attempts exceeded
2021-08-16 17:48:24 Spinning new instance type=v100_spot
2021-08-16 17:58:15 Error: Failed to start new instance, Waiter SpotInstanceRequestFulfilled failed: Max attempts exceeded. Previously accepted state: Matched expected service error code: InvalidSpotInstanceRequestID.NotFound
2021-08-16 18:03:16 Spinning new instance type=v100_spot
2021-08-16 18:13:03 Error: Failed to start new instance, Waiter SpotInstanceRequestFulfilled failed: Max attempts exceeded
2021-08-16 18:18:04 Spinning new instance type=v100_spot
2021-08-16 18:27:56 Error: Failed to start new instance, Waiter SpotInstanceRequestFulfilled failed: Max attempts exceeded. Previously accepted state: Matched expected service error code: InvalidSpotInstanceRequestID.NotFound
2021-08-16 18:32:51 Spinning new instance type=v100_spot
2021-08-16 18:42:43 Error: Failed to start new instance, Waiter SpotInstanceRequestFulfilled failed: Max attempts exceeded. Previously accepted state: Matched expected service error code: InvalidSpotInstanceRequestID.NotFound
2021-08-16 18:47:44 Spinning new instance type=v100_spot
2021-08-16 19:04:17 Ignoring 1 stuck instances of type v100_spot
2021-08-16 19:09:13 Ignoring 1 stuck instances of type v100_spot
2021-08-16 19:14:14 Ignoring 1 stuck instances of type v100_spot
... (it keeps logging the same message: "Ignoring 1 stuck instances of type v100_spot")
So it looks like AWS has no spot capacity available, which is fine. However, I notice that from a certain point on (here, starting at 19:04:17, after the request issued at 18:47:44, which unlike the earlier ones is never followed by an error), the AWS autoscaler only logs "Ignoring 1 stuck instances of type v100_spot" and no longer creates any new spot instance requests (I can confirm this in the AWS console). Is this expected?
If so, it means I have to restart the autoscaler periodically so that it creates new spot instance requests.
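To make the behavior I would expect more concrete: a request that never becomes a running instance should eventually be dropped from the "pending" count so that a new request can be issued, instead of being ignored forever. Below is a minimal sketch of that bookkeeping. It is purely hypothetical: `PendingSpotTracker`, its methods, the 900-second timeout and the request ID are illustrative names and values, not the autoscaler's actual code or API. In a real setup, the IDs returned by `expire()` would also be cancelled on the AWS side (e.g. via EC2's CancelSpotInstanceRequests).

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class PendingSpotTracker:
    """Hypothetical bookkeeping for spot requests not yet fulfilled."""
    max_pending_seconds: float
    # request_id -> (instance_type, time the request was issued)
    _pending: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def add(self, request_id: str, instance_type: str,
            now: Optional[float] = None) -> None:
        # Record a newly issued spot request.
        issued_at = time.time() if now is None else now
        self._pending[request_id] = (instance_type, issued_at)

    def fulfilled(self, request_id: str) -> None:
        # Drop a request once it has turned into a running instance.
        self._pending.pop(request_id, None)

    def expire(self, now: Optional[float] = None) -> List[str]:
        """Forget requests older than the timeout and return their IDs,
        so the caller can cancel them and request a new instance."""
        now = time.time() if now is None else now
        expired = [rid for rid, (_, issued_at) in self._pending.items()
                   if now - issued_at > self.max_pending_seconds]
        for rid in expired:
            del self._pending[rid]
        return expired

    def pending_count(self, instance_type: str) -> int:
        # Number of requests of this type still counted as "in flight".
        return sum(1 for itype, _ in self._pending.values()
                   if itype == instance_type)
```

For example, with `tracker = PendingSpotTracker(max_pending_seconds=900)` and `tracker.add("sir-example1", "v100_spot", now=0)`, calling `tracker.expire(now=600)` keeps the request pending, while `tracker.expire(now=1000)` returns `["sir-example1"]` and the pending count for `v100_spot` drops back to 0, so the scaler is free to try again without a restart.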