Hello community. I'd like to try the AWS autoscaler (I actually prefer to try the GCP one but I think it's broken or, at least, I've failed to make it work so far). I can't find documentation on what permissions would be required from an AWS sub-account

Answered

Hello community.

I'd like to try the AWS autoscaler (I would actually prefer to try the GCP one, but I think it's broken or, at least, I've failed to make it work so far).

I can't find documentation on what permissions would be required from an AWS sub-account for the AWS Autoscaler to function. Can someone here tell me what roles that account would need to have?

  				
Posted 3 years ago by PanickyMoth78


Answers 8

Just updating here that I got the AWS autoscaler working with CostlyOstrich36's generous help 🎉

I thought I'd share some details here in case others experience similar difficulties.

With regard to permissions, this is the list of actions that the autoscaler uses, which your AWS account would need to permit:
- GetConsoleOutput
- RequestSpotInstances
- DescribeSpotInstanceRequests
- RunInstances
- DescribeInstances
- TerminateInstances

The instance image ami-04c0416d6bd8e4b1f is no longer available to new users. You will need different images that match:
- The machine architectures of your chosen machines
- The region you specified in your AWS credentials
- The region that you specified in your resource definitions (I think the AWS credentials region and this one have to match)

Otherwise you'll get "image ... does not exist" errors, or an "image doesn't match the instance architecture" error once the image is found.
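For anyone who needs to hand this to an IT team, the action list above could be turned into an IAM policy document along these lines. This is only a sketch: the `ec2:` prefix is how IAM names EC2 actions, but the `Sid` label, the function name, and the `"Resource": "*"` scope are my own assumptions and not something the autoscaler documentation specifies. Check with your administrators whether the scope can be narrowed further.

```python
import json

# Action names exactly as listed in the post above.
AUTOSCALER_ACTIONS = [
    "GetConsoleOutput",
    "RequestSpotInstances",
    "DescribeSpotInstanceRequests",
    "RunInstances",
    "DescribeInstances",
    "TerminateInstances",
]


def build_autoscaler_policy(actions=AUTOSCALER_ACTIONS):
    """Return an IAM policy document (as a dict) allowing the given EC2 actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ClearMLAutoscalerEC2",  # hypothetical label
                "Effect": "Allow",
                "Action": [f"ec2:{name}" for name in actions],
                # Assumption: "*" keeps the sketch simple; narrow it if you can.
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # Print JSON that can be pasted into the IAM policy editor.
    print(json.dumps(build_autoscaler_policy(), indent=2))
```

The printed JSON can be attached to the user or role whose credentials are given to the autoscaler.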

As I understand it, when using pipelines, you'd probably want CPU-only instances for the services queue and GPU-sporting instances for the default queue (or any queue that runs pipeline components). This means different machine architectures and different instance images as well!

The default docker image that currently appears in the definition popups, nvidia/cuda:10.2-runtime-ubuntu18.04, is somewhat outdated. Aside from possible problems with current packages that use GPUs, it uses Python 3.6, and this led to package install failures when the ClearML agent is brought up within the docker container and starts installing Python packages.

Here is a configuration (as reported on the autoscaler task's configuration tab under resource_configurations) that worked for me with AWS credentials that specify us-east-1 as the region:

[
  {
    "resource_name": "aws_default",
    "instance_type": "g4dn.2xlarge",
    "cpu_only": false,
    "is_spot": false,
    "regular_instance_rollback": false,
    "regular_instance_rollback_timeout": null,
    "availability_zone": "us-east-1a",
    "ami_id": "ami-003f25e6e2d2db8f1",
    "num_instances": 3,
    "queue_name": "default",
    "tags": "owner=lavi",
    "ebs_device_name": "/dev/sda1",
    "ebs_volume_size": 500,
    "ebs_volume_type": "gp3",
    "key_name": null,
    "security_group_ids": null,
    "subnet_id": null
  },
  {
    "resource_name": "aws_services",
    "instance_type": "m5.large",
    "cpu_only": true,
    "is_spot": false,
    "regular_instance_rollback": false,
    "regular_instance_rollback_timeout": null,
    "availability_zone": "us-east-1a",
    "ami_id": "ami-040d909ea4e56f8f3",
    "num_instances": 2,
    "queue_name": "services",
    "tags": "owner=lavi",
    "ebs_device_name": "/dev/sda1",
    "ebs_volume_size": 500,
    "ebs_volume_type": "gp3",
    "key_name": null,
    "security_group_ids": null,
    "subnet_id": null
  }
]

using base docker image nvidia/cuda:11.2.2-runtime-ubuntu20.04
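Since a region mismatch was one of the things that tripped me up, a quick sanity check before launching might look like the sketch below. It assumes that availability zone names begin with their region name (for example us-east-1a is in us-east-1); the function name is hypothetical and not part of ClearML.

```python
def resources_outside_region(resource_configurations, credentials_region):
    """Return resource_name values whose availability_zone is not in the region.

    A resource with a missing or null availability_zone is also reported.
    """
    mismatched = []
    for resource in resource_configurations:
        zone = resource.get("availability_zone") or ""
        if not zone.startswith(credentials_region):
            mismatched.append(resource["resource_name"])
    return mismatched


# Trimmed copy of the configuration above (only the fields this check reads).
resources = [
    {"resource_name": "aws_default", "availability_zone": "us-east-1a"},
    {"resource_name": "aws_services", "availability_zone": "us-east-1a"},
]

print(resources_outside_region(resources, "us-east-1"))  # []
print(resources_outside_region(resources, "us-west-2"))  # ['aws_default', 'aws_services']
```

An empty list means every resource definition sits in the same region as the credentials.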

Perhaps the defaults https://github.com/allegroai/clearml/blob/master/examples/services/aws-autoscaler/aws_autoscaler.py should be updated?

  				
Posted 3 years ago by PanickyMoth78

PanickyMoth78, please try with us-east-1a

  				
Posted 3 years ago by CostlyOstrich36

Trying the AWS Autoscaler for the first time, I get this error on instance spin-up:
An error occurred (InvalidAMIID.NotFound) when calling the RunInstances operation: The image id '[ami-04c0416d6bd8e4b1f]' does not exist

I tried both us-west-2 and us-east-1b (thinking it might be zone specific).

I'm not sure if this is a permissions issue or a config issue.

The same occurs when I try a different image:
ami-06bafe528da33cdb8
(an AWS public image)
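One way to tell a config issue from a permissions issue is to ask EC2 directly whether the image is visible from a given region. The sketch below assumes boto3 is installed and credentials are configured; `ami_architecture` is a hypothetical helper, and calling `describe_images` needs its own permission (DescribeImages), which is not in the autoscaler's action list, so you may have to run it with a different account.

```python
def ami_architecture(ec2_client, ami_id):
    """Return the AMI's architecture string, or None if the region cannot see it."""
    try:
        response = ec2_client.describe_images(ImageIds=[ami_id])
    except Exception as exc:
        # botocore's ClientError carries the AWS error code in exc.response.
        error = getattr(exc, "response", None) or {}
        code = error.get("Error", {}).get("Code", "")
        if code.startswith("InvalidAMIID"):
            return None  # image ID unknown or malformed in this region
        raise  # anything else (e.g. an authorization failure) is re-raised
    images = response.get("Images", [])
    return images[0].get("Architecture") if images else None


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   print(ami_architecture(ec2, "ami-04c0416d6bd8e4b1f"))
```

A `None` result points at the image or region choice. If an architecture string comes back, it can be compared against the instance type you picked.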

  				
Posted 3 years ago by PanickyMoth78

PanickyMoth78, let me check on that 🙂

  				
Posted 3 years ago by CostlyOstrich36

Oh, wasn't aware 🙂

  				
Posted 3 years ago by SuccessfulKoala55

I'm on the pro tier

  				
Posted 3 years ago by PanickyMoth78

Hi PanickyMoth78, the GCP autoscaler is actually part of the pro/scale solutions

  				
Posted 3 years ago by SuccessfulKoala55

I'm looking for a minimal set of permissions because we have other sensitive EC2 instances running in the same account, and our IT people are rightfully concerned about providing access to that account externally.

  				
Posted 3 years ago by PanickyMoth78
