I encountered a weird edge case with the AWS auto-scaler and am wondering if there are any solutions or if this is a known issue. The following happened:

CostlyOstrich36 I'm not sure what is keeping it from spinning down. Unfortunately, I was not around when this happened. Maybe AWS was taking a while to terminate the instance, or maybe the termination was just taking a while to register in the autoscaler.

The logs looked like this:

  1. Recognizing an idle worker and spinning down.
    2022-09-19 12:27:33,197 - clearml.auto_scaler - INFO - Spin down instance cloud id 'i-058730639c72f91e1'
  2. Recognizing a new task is available, but the worker is still idle.
    2022-09-19 12:32:35,698 - clearml.auto_scaler - INFO - Found 1 tasks in queue 'aws'
    2022-09-19 12:32:35,816 - clearml.auto_scaler - INFO - idle worker: {'dynamic_worker:c5n_4xl:c5n.4xlarge:i-058730639c72f91e1': (1663590436.5344, 'c5n_4xl', <Worker: id=dynamic_worker:c5n_4xl:c5n.4xlarge:i-058730639c72f91e1>)}
  3. A few minutes later, the task is still queued, the idle worker is still active (we have a budget of 6 AWS instances on this aws queue):
    2022-09-19 12:36:37,860 - clearml.auto_scaler - INFO - Found 1 tasks in queue 'aws'
    2022-09-19 12:36:37,973 - clearml.auto_scaler - INFO - idle worker: {'dynamic_worker:c5n_4xl:c5n.4xlarge:i-058730639c72f91e1': (1663590436.5344, 'c5n_4xl', <Worker: id=dynamic_worker:c5n_4xl:c5n.4xlarge:i-058730639c72f91e1>)}
  4. A minute later, the idle worker finally shuts down and disappears from the idle worker list, and a new instance is spun up:
    2022-09-19 12:37:38,389 - clearml.auto_scaler - INFO - Found 1 tasks in queue 'aws'
    2022-09-19 12:37:38,506 - clearml.auto_scaler - INFO - Spinning new instance resource='c5n_4xl', prefix='dynamic_worker', queue='aws'
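
To put numbers on the delay, the timestamps in the log lines above can be parsed with a small script. This is only a rough sketch: it assumes the exact log format shown in this post, and `parse_events` and `first_time` are hypothetical helper names of my own, not part of ClearML.

```python
import re
from datetime import datetime

# Three of the log lines quoted above, copied verbatim.
LOG = """\
2022-09-19 12:27:33,197 - clearml.auto_scaler - INFO - Spin down instance cloud id 'i-058730639c72f91e1'
2022-09-19 12:32:35,698 - clearml.auto_scaler - INFO - Found 1 tasks in queue 'aws'
2022-09-19 12:37:38,506 - clearml.auto_scaler - INFO - Spinning new instance resource='c5n_4xl', prefix='dynamic_worker', queue='aws'
"""

# Assumed format: "<timestamp> - clearml.auto_scaler - INFO - <message>"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - clearml\.auto_scaler - INFO - (.*)$"
)


def parse_events(text):
    """Return a list of (timestamp, message) tuples for matching log lines."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            events.append((ts, match.group(2)))
    return events


def first_time(events, prefix):
    """Return the timestamp of the first message starting with `prefix`, or None."""
    for ts, msg in events:
        if msg.startswith(prefix):
            return ts
    return None


events = parse_events(LOG)
spin_down = first_time(events, "Spin down instance")
task_seen = first_time(events, "Found 1 tasks")
spin_up = first_time(events, "Spinning new instance")

# Time from the spin-down request until a replacement instance was requested.
stall_seconds = (spin_up - spin_down).total_seconds()
# Time the task sat in the queue before a new instance was requested.
wait_seconds = (spin_up - task_seen).total_seconds()

print(f"spin-down request to new instance: {stall_seconds:.0f} s")
print(f"task first seen to new instance:   {wait_seconds:.0f} s")
```

On the lines above this gives roughly 605 seconds between the spin-down request and the new instance, of which the task spent roughly 303 seconds waiting in the queue.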
Posted one year ago