Unanswered
Hi, where can I find the server parameter that controls when the server unregisters an agent after not receiving updates? Currently it's quite long (30 mins), and this prevents the autoscaler from launching a new agent.
Thank you for your answer.
aws_autoscaler.py works as follows (based on my experiments):
- let’s assume that the instance and the worker are started
- there are no tasks running on the worker for max_idle_time_min
- autoscaler terminates the instance
- worker stops sending updates to app.clear.ml
- worker is still shown in the UI with the message “Update Time a few minutes ago”
- autoscaler thinks that this worker is still idle because it’s returned via workers.get_all
- when I enqueue a task in this state, the autoscaler doesn’t start a new instance until the 600-second interval finishes
Does the app.clear.ml autoscaler work the same way?
Is it possible to see the app.clear.ml autoscaler sources?
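To make the scenario above concrete, here is a minimal sketch of a client-side workaround: treat a worker as gone once it has been silent for longer than some threshold, rather than waiting for the server to unregister it. This is not taken from aws_autoscaler.py. The function name `live_workers`, the `max_silence_sec` parameter, and the `last_report_time` key are all assumptions for illustration, so check what the worker entries returned by workers.get_all actually contain on your server version. That endpoint may also accept a `last_seen` filter that does the same thing server-side, which is worth verifying in the API reference.

```python
from datetime import datetime, timedelta, timezone


def live_workers(workers, max_silence_sec=120, now=None):
    """Return only the workers that reported within the last max_silence_sec seconds.

    `workers` is a list of dicts with the hypothetical keys "id" and
    "last_report_time" (a timezone-aware datetime). These names are
    assumptions, not the confirmed shape of a workers.get_all response.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(seconds=max_silence_sec)
    # Keep a worker only if its last report is at or after the cutoff.
    return [w for w in workers if w["last_report_time"] >= cutoff]
```

An autoscaler loop could pass the worker list through such a filter before counting idle workers, so that a terminated instance stops blocking new launches soon after it goes silent.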
11 months ago