Hello community. I'd like to try the AWS autoscaler (I actually prefer to try the GCP one but I think it's broken or, at least, I've failed to make it work so far). I can't find documentation on what permissions would be required from an AWS sub-account

Hello community.

I'd like to try the AWS autoscaler (I would actually prefer to try the GCP one, but I think it's broken or, at least, I've failed to make it work so far).

I can't find documentation on what permissions an AWS sub-account needs for the AWS Autoscaler to function. Can someone here tell me what roles that account would need to have?

Posted 10 months ago

Answers 8

PanickyMoth78 , let me check on that 🙂

Posted 10 months ago

PanickyMoth78 , please try with us-east-1a

Posted 10 months ago

Oh, wasn't aware 🙂

Posted 10 months ago

Hi PanickyMoth78 , the GCP autoscaler is actually part of the pro/scale solutions

Posted 10 months ago

I'm looking for a minimal set of permissions because we have other sensitive EC2 instances running in the same account, and our IT people are rightfully concerned about providing external access to that account.

Posted 10 months ago

I'm on the pro tier

Posted 10 months ago

Trying the AWS Autoscaler for the first time, I get this error on instance spin-up:
An error occurred (InvalidAMIID.NotFound) when calling the RunInstances operation: The image id '[ami-04c0416d6bd8e4b1f]' does not exist
I tried both us-west-2 and us-east-1b (thinking it might be zone specific).

I'm not sure if this is a permissions issue or a config issue.

The same occurs when I try a different image (an AWS public image).

Posted 10 months ago

Just updating here that I got the AWS autoscaler working with CostlyOstrich36's generous help 🎉

I thought I'd share some details here in case others experience similar difficulties.

With regard to permissions, this is the list of actions that the autoscaler uses, which your AWS account would need to permit:
GetConsoleOutput
RequestSpotInstances
DescribeSpotInstanceRequests
RunInstances
DescribeInstances
TerminateInstances

The instance image ami-04c0416d6bd8e4b1f is no longer available to new users. You will need different images that match:
The machine architectures of your chosen machines
The region you specified in your AWS credentials
The region that you specified in your resource definitions (I think that the AWS credentials region and this one have to match)

Otherwise you'll get the "image ... does not exist" error or an "image doesn't match the instance architecture" error (once the image is found).
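For anyone who needs to hand this to their IT people, the list of actions above could be written as an IAM policy document. This is only a rough sketch, not something taken from the ClearML docs: the "ec2:" prefix is the standard IAM namespace for these EC2 calls, but the Sid label and the unscoped "Resource": "*" are my own assumptions, and you would probably want to narrow the resource scope for an account that also hosts sensitive instances.

```python
import json

# Action names copied from the list in this post.
AUTOSCALER_ACTIONS = [
    "GetConsoleOutput",
    "RequestSpotInstances",
    "DescribeSpotInstanceRequests",
    "RunInstances",
    "DescribeInstances",
    "TerminateInstances",
]


def build_policy(actions):
    """Build an IAM policy document (as a dict) allowing the given EC2 actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ClearMLAutoscalerSketch",  # hypothetical label
                "Effect": "Allow",
                "Action": [f"ec2:{name}" for name in actions],
                "Resource": "*",  # assumption: unscoped; tighten as needed
            }
        ],
    }


if __name__ == "__main__":
    # Print JSON that can be pasted into the IAM policy editor.
    print(json.dumps(build_policy(AUTOSCALER_ACTIONS), indent=2))
```

I have not verified that this set is sufficient for every autoscaler setup; it only mirrors the actions listed above.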

As I understand it, when using pipelines, you'd probably want CPU-only instances for the services queue and GPU-equipped instances for the default queue (or any queue that runs pipeline components). This means different machine architectures and different instance images as well!
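Since both errors come down to the AMI (whether it is visible in the region, and whether its architecture fits the instance type), a quick check before editing the autoscaler config can save a spin-up cycle. Below is a minimal sketch, assuming boto3 is installed; ami_architecture is a hypothetical helper of mine, not part of ClearML. It takes any object that behaves like boto3.client("ec2", ...), and the credentials used for the check need the DescribeImages permission, which is not in the autoscaler's own list.

```python
def ami_architecture(ec2_client, ami_id):
    """Return the AMI's architecture string, or None if the AMI is not visible.

    ec2_client is expected to behave like boto3.client("ec2", region_name=...).
    """
    try:
        response = ec2_client.describe_images(ImageIds=[ami_id])
    except Exception as exc:
        # botocore's ClientError carries the AWS error code in exc.response.
        code = getattr(exc, "response", {}).get("Error", {}).get("Code", "")
        if code.startswith("InvalidAMIID"):
            return None  # not found (or malformed) in this region
        raise
    images = response.get("Images", [])
    return images[0].get("Architecture") if images else None


# Usage (needs boto3 and valid credentials, so it is not run here):
#   import boto3
#   client = boto3.client("ec2", region_name="us-east-1")
#   print(ami_architecture(client, "ami-003f25e6e2d2db8f1"))
```

A None result corresponds to the "does not exist" case; otherwise compare the returned architecture with the one your instance type expects.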

The default docker image that currently appears in the definition popups, nvidia/cuda:10.2-runtime-ubuntu18.04, is somewhat outdated. Aside from possible problems with current packages that use GPUs, it ships Python 3.6, and this led to package install failures when the ClearML agent was brought up within the docker container and started installing Python packages.

Here is a configuration (as reported on the autoscaler task's configuration tab under resource_configurations) that worked for me with AWS credentials that specify us-east-1 as the region:

[
  {
    "resource_name": "aws_default",
    "instance_type": "g4dn.2xlarge",
    "cpu_only": false,
    "is_spot": false,
    "regular_instance_rollback": false,
    "regular_instance_rollback_timeout": null,
    "availability_zone": "us-east-1a",
    "ami_id": "ami-003f25e6e2d2db8f1",
    "num_instances": 3,
    "queue_name": "default",
    "tags": "owner=lavi",
    "ebs_device_name": "/dev/sda1",
    "ebs_volume_size": 500,
    "ebs_volume_type": "gp3",
    "key_name": null,
    "security_group_ids": null,
    "subnet_id": null
  },
  {
    "resource_name": "aws_services",
    "instance_type": "m5.large",
    "cpu_only": true,
    "is_spot": false,
    "regular_instance_rollback": false,
    "regular_instance_rollback_timeout": null,
    "availability_zone": "us-east-1a",
    "ami_id": "ami-040d909ea4e56f8f3",
    "num_instances": 2,
    "queue_name": "services",
    "tags": "owner=lavi",
    "ebs_device_name": "/dev/sda1",
    "ebs_volume_size": 500,
    "ebs_volume_type": "gp3",
    "key_name": null,
    "security_group_ids": null,
    "subnet_id": null
  }
]

using base docker image nvidia/cuda:11.2.2-runtime-ubuntu20.04
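If you copy a configuration like this one, it may help to sanity-check the zones against your credentials region first. The sketch below is my own helper (check_zones is a hypothetical name, not a ClearML function). It uses the resource_name and availability_zone fields from the configuration above and relies on the usual AWS naming where a zone is the region name plus one letter, so it is a heuristic that would wrongly flag Local Zones.

```python
def check_zones(resources, credentials_region):
    """Return a list of problem descriptions; an empty list means none found."""
    problems = []
    for res in resources:
        name = res.get("resource_name", "<unnamed>")
        zone = res.get("availability_zone") or ""
        # Standard zone names look like "us-east-1a": region + one letter.
        in_region = zone[:-1] == credentials_region and zone[-1:].isalpha()
        if not in_region:
            problems.append(
                f"{name}: availability_zone {zone!r} is not a zone "
                f"in region {credentials_region!r}"
            )
    return problems


if __name__ == "__main__":
    resources = [
        {"resource_name": "aws_default", "availability_zone": "us-east-1a"},
        {"resource_name": "aws_services", "availability_zone": "us-west-2"},
    ]
    for problem in check_zones(resources, "us-east-1"):
        print(problem)
```

This would have caught my earlier attempt with us-west-2, which is a region name rather than a zone.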

Perhaps the defaults in https://github.com/allegroai/clearml/blob/master/examples/services/aws-autoscaler/aws_autoscaler.py should be updated?

Posted 10 months ago