Hi, I Have A Question Regarding The Aws-Autoscaler: Am I Understanding Correctly That:

Answered

Hi, I have a question regarding the aws-autoscaler. Am I understanding correctly that with:

max_idle_time_min=5
max_spin_up_time_min=10
polling_interval_time_min=1

the autoscaler will:

- Check the status of the queues/agents every minute
- Start another instance if one instance was started more than 10 minutes ago but is still not connected to the clearml-server
- Shut down an instance that has been idle for more than 5 minutes
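To make the three rules in the question concrete, here is a minimal Python sketch of how they could be read. It is not the ClearML autoscaler code: the `AutoscalerTiming` class and the helper functions `spin_up_timed_out` and `should_shut_down` are hypothetical names used only for illustration; only the three parameter names come from the question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutoscalerTiming:
    """Hypothetical container for the three timing parameters (all in minutes)."""
    max_idle_time_min: float = 5
    max_spin_up_time_min: float = 10
    polling_interval_time_min: float = 1


def spin_up_timed_out(minutes_since_launch: float, connected: bool,
                      cfg: AutoscalerTiming) -> bool:
    # Assumed reading: an instance that has not connected within the
    # spin-up window is treated as failed, so a replacement may be started.
    return (not connected) and minutes_since_launch > cfg.max_spin_up_time_min


def should_shut_down(idle_minutes: float, cfg: AutoscalerTiming) -> bool:
    # Assumed reading: an instance idle for longer than the limit is spun down.
    return idle_minutes > cfg.max_idle_time_min


cfg = AutoscalerTiming()
print(spin_up_timed_out(11, connected=False, cfg=cfg))  # past the 10-minute window
print(should_shut_down(6, cfg))                         # past the 5-minute idle limit
```

Under this reading, `polling_interval_time_min` only controls how often these checks run; it does not change either threshold.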

  				
Posted 3 years ago by JitteryCoyote63


Answers 11

👍 checking it

  				
Posted 3 years ago by TimelyPenguin76 (Administrator)

Hi JitteryCoyote63,

Can you try increasing polling_interval_time_min to 5-7 minutes for the check? Do you still get double machines with it?

  				
Posted 3 years ago by TimelyPenguin76 (Administrator)

Yes, it did spin up two instances for the same task.

  				
Posted 3 years ago by JitteryCoyote63

1.1.2

  				
Posted 3 years ago by JitteryCoyote63

What's the clearml version?

  				
Posted 3 years ago by TimelyPenguin76 (Administrator)

Why would it solve the issue? max_spin_up_time_min should be the param defining how long to wait after starting an instance, not polling_interval_time_min, right?

  				
Posted 3 years ago by JitteryCoyote63

Correct 🙂
polling_interval_time_min = the scaler interval for checking tasks in the queue

  				
Posted 3 years ago by TimelyPenguin76 (Administrator)

I want to verify it doesn’t start more than one instance for the same task

  				
Posted 3 years ago by TimelyPenguin76 (Administrator)

Ok, I am asking because I often see the autoscaler starting more instances than the number of experiments in the queues, so I guess I just need to increase the max_spin_up_time_min

  				
Posted 3 years ago by JitteryCoyote63

I will try with that and keep you updated

  				
Posted 3 years ago by JitteryCoyote63

Here is what happens with polling_interval_time_min=1 when I add one task to the queue: the instance takes ~5 mins to start and connect. During this timeframe, the autoscaler starts two new instances, then spins them down. So it acts as if max_spin_up_time_min=10 is not taken into account.
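The symptom described here can be reproduced with a small simulation. This is a sketch under stated assumptions, not the actual ClearML logic: `simulate` and its `count_pending` flag are hypothetical, and the model assumes a single queued task and a fixed connection delay. It compares a scaler that counts instances still inside their spin-up window against one that ignores them at every poll.

```python
def simulate(connect_after_min, total_min, polling_interval_min=1,
             max_spin_up_time_min=10, count_pending=True):
    """Return how many instances get launched for one queued task."""
    launches = []  # launch times (in minutes) of every instance started
    t = 0
    while t <= total_min:
        # An instance counts as connected once connect_after_min has elapsed.
        if any(t - start >= connect_after_min for start in launches):
            break  # the task is picked up; no further launches are needed
        # Instances that are still within their allowed spin-up window.
        pending = [s for s in launches if t - s <= max_spin_up_time_min]
        # Launch unless we are counting pending instances and one exists.
        if not (count_pending and pending):
            launches.append(t)
        t += polling_interval_min
    return len(launches)


# Instance needs ~5 minutes to connect, polling every minute.
print(simulate(5, 15, count_pending=True))   # 1
print(simulate(5, 15, count_pending=False))  # 5
```

If the spin-up window were honored as in the `count_pending=True` case, one task would lead to one launch. Extra launches during the first five minutes are consistent with pending instances not being counted at each poll, which would also explain why a longer polling interval hides the problem.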

  				
Posted 3 years ago by JitteryCoyote63
