Hi, I am trying to use the ClearML AWS Autoscaler service (in the PRO version), and I am getting this error:
Traceback (most recent call last):
  File "/root/.clearml/venvs-builds/3/task_repository/clearml-apps.git/apps/auto_scaler/auto_scaler.py", line 311, in supervisor
    instance_id = self.driver.spin_up_worker(resource_conf, worker_prefix, queue, task_id=task_id)
  File "/root/.clearml/venvs-builds/3/task_repository/clearml-apps.git/apps/auto_scaler/cloud_driver.py", line 150, in spin_up_worker
    instance_id, region = self._spin_up_worker(resource_conf, worker_prefix, queue_name, task_id)
  File "/root/.clearml/venvs-builds/3/task_repository/clearml-apps.git/apps/auto_scaler/aws_driver.py", line 181, in _spin_up_worker
    instances = ec2.run_instances(**launch_specification)
  File "/root/venv/lib/python3.8/site-packages/botocore/client.py", line 508, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/root/venv/lib/python3.8/site-packages/botocore/client.py", line 915, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (UnauthorizedOperation) when calling the RunInstances operation: You are not authorized to perform this operation.

The user policy contains this statement:
{
  "Sid": "RunEC2",
  "Effect": "Allow",
  "Action": [
    "ec2:RunInstances"
  ],
  "Resource": [
    "arn:aws:ec2:*:{my_user_id}:network-interface/*",
    "arn:aws:ec2:{my_region}:{my_user_id}:key-pair/{my key-pair name}",
    "arn:aws:ec2:*:*:instance/*",
    "arn:aws:ec2:*::volume/*",
    "arn:aws:ec2:{my_region}:{my_user_id}:subnet/subnet-{subnet_id}",
    "arn:aws:ec2:{my_region}:{my_user_id}:security-group/sg-{securitygroup_id}",
    "arn:aws:ec2:*::image/*"
  ]
}

This is the resource configuration:
[
  {
    "resource_name": "clearml_agent_gpu",
    "instance_type": "g4dn.4xlarge",
    "cpu_only": false,
    "is_spot": false,
    "regular_instance_rollback": false,
    "regular_instance_rollback_timeout": null,
    "availability_zone": "{region}",
    "ami_id": "ami-02ee8fd0114cd8baf",
    "num_instances": 1,
    "queue_name": "default",
    "tags": "Owner=....,Name=ClearML_agent,4M-Team=ML",
    "ebs_device_name": "/dev/sda1",
    "ebs_volume_size": 100,
    "ebs_volume_type": "gp2",
    "key_name": "{my key-pair name}",
    "security_group_ids": "sg-{securitygroup_id}",
    "subnet_id": "subnet-{subnet_id}"
  }
]

I would appreciate your help solving the issue 🙂
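The failure happens inside `ec2.run_instances`, and EC2 authorizes that call against each resource the launch touches, not only the instance. The hypothetical helper below lists, for a resource configuration like the one above, the ARNs that a resource-scoped `RunInstances` statement would need to match. It is a sketch: the resource types are the ones that appear in the working policy posted at the end of this thread, the account ID and IDs are placeholders, and depending on the launch options AWS may check further resource types.

```python
from typing import Dict, List


def run_instances_resource_arns(conf: Dict, region: str, account_id: str) -> List[str]:
    """Return ARNs that a resource-scoped RunInstances policy would need to allow.

    Hypothetical helper: the resource types are the ones used in the thread's
    final working policy; AWS may also check others (e.g. launch templates).
    """
    prefix = f"arn:aws:ec2:{region}:{account_id}"
    arns = [
        f"{prefix}:instance/*",           # the instance being launched
        f"{prefix}:volume/*",             # its EBS root volume
        f"{prefix}:network-interface/*",  # the network interface created in the subnet
        f"{prefix}:subnet/{conf['subnet_id']}",
        f"{prefix}:key-pair/{conf['key_name']}",
        f"arn:aws:ec2:{region}::image/{conf['ami_id']}",  # AMI ARNs have an empty account field
    ]
    # Assumption: several security groups would be given as a comma-separated string.
    for sg_id in conf["security_group_ids"].split(","):
        arns.append(f"{prefix}:security-group/{sg_id.strip()}")
    return arns


if __name__ == "__main__":
    # Hypothetical values standing in for the placeholders in the configuration above.
    conf = {
        "ami_id": "ami-02ee8fd0114cd8baf",
        "key_name": "my-key",
        "security_group_ids": "sg-0123",
        "subnet_id": "subnet-0abc",
    }
    for arn in run_instances_resource_arns(conf, "eu-west-1", "123456789012"):
        print(arn)
```

Comparing this list entry by entry with the `Resource` array of a statement is one way to spot a pattern that does not match what the launch actually requests.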

  
  
Posted one year ago

Answers 21


Hi SmugTurtle78,

Can you please try with "Resource": "*" ?
Also, these are the settings that I use. Some might be redundant, so consult with your DevOps guys 🙂
{
  "Sid": "EC2InstanceManagement",
  "Effect": "Allow",
  "Action": [
    "ec2:AttachClassicLinkVpc",
    "ec2:CancelSpotInstanceRequests",
    "ec2:CreateFleet",
    "ec2:CreateTags",
    "ec2:DeleteTags",
    "ec2:Describe*",
    "ec2:DetachClassicLinkVpc",
    "ec2:ModifyInstanceAttribute",
    "ec2:RequestSpotInstances",
    "ec2:RunInstances",
    "ec2:StartInstances",
    "ec2:StopInstances",
    "ec2:TerminateInstances"
  ],
  "Resource": "*"
}
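IAM action names are case-insensitive and may contain `*` and `?` wildcards, so whether one action list covers another can be checked mechanically. The sketch below approximates that matching with Python's `fnmatch` (an assumption, not AWS's evaluator) to check which actions from the minimal policy posted at the end of this thread are not granted by the broad statement above.

```python
from fnmatch import fnmatchcase

# Actions from the broad "EC2InstanceManagement" statement above.
BROAD_ACTIONS = [
    "ec2:AttachClassicLinkVpc", "ec2:CancelSpotInstanceRequests", "ec2:CreateFleet",
    "ec2:CreateTags", "ec2:DeleteTags", "ec2:Describe*", "ec2:DetachClassicLinkVpc",
    "ec2:ModifyInstanceAttribute", "ec2:RequestSpotInstances", "ec2:RunInstances",
    "ec2:StartInstances", "ec2:StopInstances", "ec2:TerminateInstances",
]

# Actions from the minimal policy posted at the end of the thread.
MINIMAL_ACTIONS = [
    "ec2:CancelSpotInstanceRequests", "ec2:RequestSpotInstances",
    "ec2:DescribeSpotInstanceRequests", "ec2:DescribeInstances",
    "ec2:TerminateInstances", "ec2:DeleteTags", "ec2:StartInstances",
    "ec2:CreateTags", "ec2:RunInstances", "ec2:StopInstances", "ec2:GetConsoleOutput",
]


def is_covered(action, patterns):
    """True if any pattern matches the action; both are lowercased first."""
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)


def uncovered(actions, patterns):
    """Actions not matched by any pattern, in their original order."""
    return [a for a in actions if not is_covered(a, patterns)]


if __name__ == "__main__":
    print(uncovered(MINIMAL_ACTIONS, BROAD_ACTIONS))  # ['ec2:GetConsoleOutput']
```

`ec2:Describe*` already covers both Describe actions, so only `ec2:GetConsoleOutput` would need to be added to the broad statement to cover everything the minimal policy uses.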

  
  

SmugTurtle78, regarding the CPU-only mode: how are you running it? Are you using the application in the PRO version, or are you running one of the examples?

  
  

It works with "Resource": "*". I will try using it, and maybe add Deny statements for specific configurations. Thanks :)
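For the "allow with `"Resource": "*"`, then deny specific configurations" approach, an explicit `Deny` always overrides an `Allow`, so a negated condition can carve out what the autoscaler must not do. A minimal sketch, assuming the `ec2:InstanceType` condition key on the `instance` resource and a hypothetical allow-list built from the instance types mentioned in this thread (`t3.medium`, `g4dn.4xlarge`):

```python
import json

# Hypothetical allow-list: the instance types mentioned in this thread.
ALLOWED_TYPES = ["t3.medium", "g4dn.4xlarge"]


def deny_other_instance_types(region, account_id, allowed_types):
    """Build a Deny statement that blocks RunInstances for any other instance type.

    An explicit Deny overrides the broad "Resource": "*" Allow. The statement
    is scoped to instance ARNs, where ec2:InstanceType is evaluated; with a
    list of values, StringNotEquals holds only if the type matches none of them.
    """
    return {
        "Sid": "DenyOtherInstanceTypes",
        "Effect": "Deny",
        "Action": "ec2:RunInstances",
        "Resource": f"arn:aws:ec2:{region}:{account_id}:instance/*",
        "Condition": {"StringNotEquals": {"ec2:InstanceType": list(allowed_types)}},
    }


if __name__ == "__main__":
    # Region and account ID are placeholders.
    stmt = deny_other_instance_types("eu-west-1", "123456789012", ALLOWED_TYPES)
    print(json.dumps(stmt, indent=2))
```

Added next to the broad Allow statement, this would make any `RunInstances` request for another instance type fail regardless of the Allow.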

  
  

Hi CostlyOstrich36, AnxiousSeal95, do you have any idea?

  
  

Also, I think there may be a bug in the CPU mode: I ran tests on an instance without a GPU, checked the option "Run in CPU mode (no gpus)", and saw in the experiment logs that it was trying to run the Docker container with the "--gpus all" option, which failed right after execution.

Which instance type did you use?

  
  

About the CPU mode: I used t3.medium.
About the specific configuration: of course. For example, I was trying this policy (when I remove the ec2:vpc condition, it works):
{
  "Sid": "GeneralEC2",
  "Effect": "Allow",
  "Action": [
    "ec2:AttachClassicLinkVpc",
    "ec2:CancelSpotInstanceRequests",
    "ec2:CreateFleet",
    "ec2:Describe*",
    "ec2:GetConsoleOutput",
    "ec2:DetachClassicLinkVpc",
    "ec2:ModifyInstanceAttribute",
    "ec2:RequestSpotInstances"
  ],
  "Resource": "*",
  "Condition": {
    "StringEquals": {
      "aws:RequestedRegion": "{region}"
    }
  }
},
{
  "Sid": "RunEC2",
  "Effect": "Allow",
  "Action": [
    "ec2:RunInstances",
    "ec2:CreateTags",
    "ec2:DeleteTags",
    "ec2:StartInstances",
    "ec2:StopInstances",
    "ec2:TerminateInstances"
  ],
  "Resource": "*",
  "Condition": {
    "StringEquals": {
      "aws:RequestedRegion": "{region}",
      "ec2:vpc": "arn:aws:ec2:{region}:{user_id}:vpc/vpc-{subnet_id}"
    }
  }
}
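One plausible reading of why removing the `ec2:vpc` condition makes this work: `RunInstances` is authorized against every resource in the request (instance, volume, image, key pair, subnet, security group, network interface), and a `StringEquals` condition on a key that is absent from a resource's request context evaluates to false. If the VPC key is only present for network-related resources, the `RunEC2` statement cannot allow the image or volume, and the whole call is rejected. The toy evaluator below illustrates that reasoning; it is a simplification rather than AWS's policy engine, and the per-resource contexts and values are assumptions.

```python
# Toy model of IAM evaluation for one RunInstances call (a simplification,
# not AWS's engine): the statement must allow every resource in the request.

VPC_ARN = "arn:aws:ec2:eu-west-1:123456789012:vpc/vpc-0abc"  # hypothetical

# Hypothetical per-resource request contexts. Which resource types expose
# ec2:Vpc is an assumption made for illustration; the instance is omitted.
REQUEST_CONTEXTS = {
    "subnet": {"aws:RequestedRegion": "eu-west-1", "ec2:Vpc": VPC_ARN},
    "security-group": {"aws:RequestedRegion": "eu-west-1", "ec2:Vpc": VPC_ARN},
    "network-interface": {"aws:RequestedRegion": "eu-west-1", "ec2:Vpc": VPC_ARN},
    "image": {"aws:RequestedRegion": "eu-west-1"},
    "volume": {"aws:RequestedRegion": "eu-west-1"},
    "key-pair": {"aws:RequestedRegion": "eu-west-1"},
}

# The two Condition blocks being compared (values filled in hypothetically).
WITH_VPC = {"aws:RequestedRegion": "eu-west-1", "ec2:vpc": VPC_ARN}
REGION_ONLY = {"aws:RequestedRegion": "eu-west-1"}


def string_equals(condition, context):
    """StringEquals: every key must be present in the context with that value.

    Key names are compared case-insensitively; a missing key makes it false.
    """
    ctx = {key.lower(): value for key, value in context.items()}
    return all(ctx.get(key.lower()) == value for key, value in condition.items())


def unmatched_resources(condition, contexts):
    """Resource types for which the Allow statement's condition does not hold."""
    return [name for name, ctx in contexts.items() if not string_equals(condition, ctx)]


if __name__ == "__main__":
    print(unmatched_resources(WITH_VPC, REQUEST_CONTEXTS))     # ['image', 'volume', 'key-pair']
    print(unmatched_resources(REGION_ONLY, REQUEST_CONTEXTS))  # []
```

This is consistent with the policy that eventually worked in this thread, which restricts the subnet and security group through resource ARNs rather than an `ec2:vpc` condition.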

  
  

No, I want to use an AWS user with much narrower permissions (only what the autoscaler's tasks need), for example Describe/RequestSpotInstances/StopInstances permissions only for the relevant subnet, security group, and instance types.

  
  

Hi SmugTurtle78, sorry for answering in slow-mo 😉 I'm not 100% sure I got the question... you want a global security group and network for the entire autoscaler instead of per instance type?

  
  

Can you try launching a new instance with CPU only and add the log here? I just tried it on PRO myself with CPU only, and it worked. Can you check which version of the application you're running? To see the version, look for the small highlighted "more" text at the top left of the application screen; if you click it, the text expands and shows the version.

  
  

I tried again, and now it is working. The version is v1.4.0.
About the subnet & security group options: I saw them already and I do use them, but I still want to give the app a narrower policy that only allows it to run in this network and with this security group. CostlyOstrich36

  
  

Also, in the application I see options for the subnet ID & security group.

  
  

Can you try creating a new instance?

  
  

CostlyOstrich36 Any idea? :)

  
  

Hi SmugTurtle78, could you spin up an instance with the same user from the AWS CLI?
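The same permission check can be done without launching anything by passing `DryRun=True` (the CLI equivalent is `--dry-run`): EC2 then answers with the error code `DryRunOperation` if the request would have succeeded, or `UnauthorizedOperation` if the policy blocks it. A minimal sketch around boto3's `run_instances`; the helper name is hypothetical, and the client is passed in so the function can be tried with a stub.

```python
"""Probe RunInstances permissions with DryRun=True (sketch; nothing is launched).

With real credentials, pass boto3.client("ec2", region_name=...) as ec2_client.
"""


def can_run_instances(ec2_client, **launch_params):
    """Return True if the caller is authorized to launch with these parameters.

    With DryRun=True, EC2 reports the outcome as an error: "DryRunOperation"
    means the request would have succeeded, "UnauthorizedOperation" means the
    policy blocks it. Any other error (e.g. a bad AMI ID) is re-raised.
    """
    try:
        ec2_client.run_instances(MinCount=1, MaxCount=1, DryRun=True, **launch_params)
    except Exception as err:  # botocore.exceptions.ClientError in practice
        code = getattr(err, "response", {}).get("Error", {}).get("Code")
        if code == "DryRunOperation":
            return True
        if code == "UnauthorizedOperation":
            return False
        raise
    # Not expected with DryRun=True; treat a clean return as success.
    return True
```

With real credentials this would be called as `can_run_instances(boto3.client("ec2", region_name="..."), ImageId="ami-02ee8fd0114cd8baf", InstanceType="g4dn.4xlarge", SubnetId="subnet-...", SecurityGroupIds=["sg-..."], KeyName="...")`, using the same user as the autoscaler and the values from its resource configuration.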

  
  

AnxiousSeal95 Maybe you could help me? 🙂

  
  

CostlyOstrich36 Through the PRO version

  
  

CostlyOstrich36 AnxiousSeal95
So, when I added a specific configuration, it failed.
Is there a way to reduce the required permissions for specific actions such as running, stopping, and starting instances? For example, restricting it to work only under conditions on a specific subnet, security group, and instance types? (I tried doing that but, as I said, it failed with this message:
An error occurred (UnauthorizedOperation) when calling the RunInstances operation: You are not authorized to perform this operation.)
Also, I think there may be a bug in the CPU mode: I ran tests on an instance without a GPU, checked the option "Run in CPU mode (no gpus)", and saw in the experiment logs that it was trying to run the Docker container with the "--gpus all" option, which failed right after execution.

  
  

Is there a way to reduce the required permissions for specific actions such as running, stopping, and starting instances? For example, restricting it to work only under conditions on a specific subnet, security group, and instance types? (I tried doing that but, as I said, it failed with this message:

Can you elaborate on the specific configuration?

  
  

CostlyOstrich36 I would like to add more conditions, such as security groups and instance types; this is only an example :)
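To add an instance-type restriction on the allow side, the condition can be attached only to the resource type it describes, so it does not fail for the other resources checked in the same `RunInstances` call. A sketch under these assumptions: `ec2:InstanceType` is evaluated on `instance` ARNs, the remaining resources are scoped by ARN in the style of the final policy at the end of this thread, and the helper name and values are hypothetical.

```python
import json


def scoped_run_instances_statements(region, account_id, instance_types, subnet_id, sg_id, ami_id):
    """Two Allow statements for ec2:RunInstances (hypothetical helper).

    The instance-type condition is attached only to instance ARNs; the other
    resources in the same request are assumed not to carry ec2:InstanceType,
    so the condition would evaluate to false for them.
    """
    prefix = f"arn:aws:ec2:{region}:{account_id}"
    return [
        {
            "Sid": "RunOnlyAllowedInstanceTypes",
            "Effect": "Allow",
            "Action": "ec2:RunInstances",
            "Resource": f"{prefix}:instance/*",
            "Condition": {"StringEquals": {"ec2:InstanceType": list(instance_types)}},
        },
        {
            "Sid": "RunInApprovedNetwork",
            "Effect": "Allow",
            "Action": "ec2:RunInstances",
            "Resource": [
                f"{prefix}:subnet/{subnet_id}",
                f"{prefix}:security-group/{sg_id}",
                f"{prefix}:network-interface/*",
                f"{prefix}:volume/*",
                f"{prefix}:key-pair/*",
                f"arn:aws:ec2:{region}::image/{ami_id}",
            ],
        },
    ]


if __name__ == "__main__":
    # Placeholder values; instance types are the ones mentioned in this thread.
    stmts = scoped_run_instances_statements(
        "eu-west-1", "123456789012", ["t3.medium", "g4dn.4xlarge"],
        "subnet-0abc", "sg-0abc", "ami-0abc",
    )
    print(json.dumps(stmts, indent=2))
```

A launch is then allowed only if the instance satisfies the first statement and every other resource in the request matches the second.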

  
  

SmugTurtle78, I'll take a look at it shortly 🙂

  
  

CornyDeer86, you are the best! AnxiousSeal95, thanks for your help as well. We had been trying to solve this problem for a long time, and now it works like magic.
Conclusions:
The minimal policy needed for the autoscaling service if we want to restrict RunInstances permissions to a specific security group and subnet, and also use spot instances (as far as we have found so far):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "ec2:CancelSpotInstanceRequests",
        "ec2:RequestSpotInstances",
        "ec2:DescribeSpotInstanceRequests",
        "ec2:DescribeInstances"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:RequestedRegion": "{region}"
        }
      }
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": [
        "ec2:TerminateInstances",
        "ec2:DeleteTags",
        "ec2:StartInstances",
        "ec2:CreateTags",
        "ec2:RunInstances",
        "ec2:StopInstances",
        "ec2:GetConsoleOutput"
      ],
      "Resource": [
        "arn:aws:ec2:{region}:{user_id}:network-interface/*",
        "arn:aws:ec2:{region}:{user_id}:subnet/subnet-{subnet_id}",
        "arn:aws:ec2:{region}:{user_id}:key-pair/*",
        "arn:aws:ec2:{region}:{user_id}:instance/*",
        "arn:aws:ec2:{region}:{user_id}:volume/*",
        "arn:aws:ec2:{region}:{user_id}:security-group/sg-{security_group_id}"
      ],
      "Condition": {
        "StringEquals": {
          "aws:RequestedRegion": "{region}"
        }
      }
    },
    {
      "Sid": "VisualEditor2",
      "Effect": "Allow",
      "Action": "ec2:RunInstances",
      "Resource": "arn:aws:ec2:{region}::image/ami-{ami_id}",
      "Condition": {
        "StringEquals": {
          "ec2:Owner": "amazon"
        }
      }
    }
  ]
}

One problem still remains: AWS doesn't support resource-level restrictions for the DescribeInstances/RequestSpotInstances actions.
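Because the final policy repeats the same placeholders ({region}, {user_id}, {subnet_id}, {security_group_id}, {ami_id}) many times, it can help to generate it from a single set of values so that none is missed. The sketch below rebuilds the policy above as a Python dict; the function name and example values are hypothetical, and the statements are copied from the policy as posted.

```python
import json


def autoscaler_policy(region, account_id, subnet_id, security_group_id, ami_id):
    """Fill in the minimal autoscaler policy posted in this thread (hypothetical helper).

    IDs are passed in full, e.g. "subnet-0abc", "sg-0abc", "ami-0abc";
    account_id plays the role of {user_id} in the posted policy.
    """
    prefix = f"arn:aws:ec2:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Actions the thread found could not be scoped to specific resources.
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "ec2:CancelSpotInstanceRequests",
                    "ec2:RequestSpotInstances",
                    "ec2:DescribeSpotInstanceRequests",
                    "ec2:DescribeInstances",
                ],
                "Resource": "*",
                "Condition": {"StringEquals": {"aws:RequestedRegion": region}},
            },
            {
                # Lifecycle actions; resource ARNs pin one subnet and one security group.
                "Sid": "VisualEditor1",
                "Effect": "Allow",
                "Action": [
                    "ec2:TerminateInstances",
                    "ec2:DeleteTags",
                    "ec2:StartInstances",
                    "ec2:CreateTags",
                    "ec2:RunInstances",
                    "ec2:StopInstances",
                    "ec2:GetConsoleOutput",
                ],
                "Resource": [
                    f"{prefix}:network-interface/*",
                    f"{prefix}:subnet/{subnet_id}",
                    f"{prefix}:key-pair/*",
                    f"{prefix}:instance/*",
                    f"{prefix}:volume/*",
                    f"{prefix}:security-group/{security_group_id}",
                ],
                "Condition": {"StringEquals": {"aws:RequestedRegion": region}},
            },
            {
                # RunInstances on a single AMI (image ARNs have no account field);
                # the ec2:Owner condition is kept as posted.
                "Sid": "VisualEditor2",
                "Effect": "Allow",
                "Action": "ec2:RunInstances",
                "Resource": f"arn:aws:ec2:{region}::image/{ami_id}",
                "Condition": {"StringEquals": {"ec2:Owner": "amazon"}},
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical values; replace with your own.
    policy = autoscaler_policy("eu-west-1", "123456789012", "subnet-0abc", "sg-0abc", "ami-0abc")
    print(json.dumps(policy, indent=2))
```

The printed JSON can then be pasted into the IAM console or saved to a file for the AWS CLI.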

  
  