I encountered a weird edge case with the AWS auto-scaler, and I'm wondering if there are any solutions or if this is a known issue.
Something like the following happened:
CostlyOstrich36 I'm not sure what was keeping it from spinning down. Unfortunately, I was not around when this happened. Maybe AWS was taking a while to terminate the instance, or maybe the termination was just taking a while to register in the autoscaler.
The logs looked like this:
1. Recognizing an idle worker and spinning down:
2022-09-19 12:27:33,197 - clearml.auto_scaler - INFO - Spin down instance cloud id 'i-058730639c72f91e1'
2. Recognizing a new task is available, but the worker is still idle:
2022-09-19 12:32:35,698 - clearml.auto_scaler - INFO - Found 1 tasks in queue 'aws'
2022-09-19 12:32:35,816 - clearml.auto_scaler - INFO - idle worker: {'dynamic_worker:c5n_4xl:c5n.4xlarge:i-058730639c72f91e1': (1663590436.5344, 'c5n_4xl', <Worker: id=dynamic_worker:c5n_4xl:c5n.4xlarge:i-058730639c72f91e1>)}
3. A few minutes later, the task is still queued, the idle worker is still active (we have a budget of 6 AWS instances on this aws queue):
2022-09-19 12:36:37,860 - clearml.auto_scaler - INFO - Found 1 tasks in queue 'aws'
2022-09-19 12:36:37,973 - clearml.auto_scaler - INFO - idle worker: {'dynamic_worker:c5n_4xl:c5n.4xlarge:i-058730639c72f91e1': (1663590436.5344, 'c5n_4xl', <Worker: id=dynamic_worker:c5n_4xl:c5n.4xlarge:i-058730639c72f91e1>)}
4. A minute later, the idle worker finally shuts down and disappears from the idle worker list, and a new instance is spun up:
2022-09-19 12:37:38,389 - clearml.auto_scaler - INFO - Found 1 tasks in queue 'aws'
2022-09-19 12:37:38,506 - clearml.auto_scaler - INFO - Spinning new instance resource='c5n_4xl', prefix='dynamic_worker', queue='aws'
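To put numbers on the delay, here is a rough sketch that parses the timestamps from the log lines above and measures the gaps between the events. It only uses Python's standard library; the helper name `parse_log_time` and the variable names are mine, not part of ClearML. Note that 12:32:35 is only the first logged sighting of the task in the excerpt, so the real queue wait may have been longer.

```python
from datetime import datetime

# Timestamp layout used by the clearml.auto_scaler log lines quoted above.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_log_time(stamp: str) -> datetime:
    """Parse a 'YYYY-MM-DD HH:MM:SS,mmm' log timestamp (hypothetical helper)."""
    return datetime.strptime(stamp, LOG_TIME_FORMAT)


# Timestamps copied from the log excerpt in this post.
spin_down_requested = parse_log_time("2022-09-19 12:27:33,197")
task_first_seen = parse_log_time("2022-09-19 12:32:35,698")
new_instance_spun_up = parse_log_time("2022-09-19 12:37:38,506")

# Time between the spin-down request and the replacement instance starting.
spin_down_to_spin_up = (new_instance_spun_up - spin_down_requested).total_seconds()

# Time the task sat in the queue, counted from its first logged sighting.
task_wait = (new_instance_spun_up - task_first_seen).total_seconds()

print(f"spin down -> new instance: {spin_down_to_spin_up:.1f} s")
print(f"task first seen -> new instance: {task_wait:.1f} s")
```

So the instance stayed in the idle worker list for roughly ten minutes after the spin-down message, and the task waited at least five minutes before a new instance was requested.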