ClearML FAQ | Hello, a question on using the AWS Autoscaler app in the SaaS – can a custom IAM role be used for launching the machines? I’m looking at giving custom permissions to the machine for checking out secrets / code / etc.

Answered

Hello, a question on using the AWS Autoscaler app in the SaaS – can a custom IAM role be used for launching the machines?
I’m looking at giving custom permissions to the machine for checking out secrets / code / etc.

  				
Posted 3 years ago by SparklingHedgehong28

Answers 23

Hello Jake, sorry for the delay. I found the original autoscaler script and now understand more about how things work.
Can you please help me understand how ClearML assigns jobs to queues – will there be more than one job per machine at a single point in time? I do understand that if there are 2 agents running for each GPU, then a vacant agent will probably take a job from the queue. But what’s the general case here?
Thanks so much and sorry if this was already explained – feel free to point me to a doc.

  				
Posted 3 years ago by SparklingHedgehong28

Do you mean assigning an IAM role to an EC2 instance it launches?

  				
Posted 3 years ago by SuccessfulKoala55

Regarding this one, as SuccessfulKoala55 mentioned, you can use a full IAM role (without any credentials) in higher tiers. In the regular tier, you'll need credentials to authenticate the boto3 commands for spin-up, spin-down, tagging, and similar API calls. The app is currently hosted by us, so your IAM role won’t really be available to it.

With it, the newly created instance will have the IAM role associated with it too.

  				
Posted 3 years ago by TimelyPenguin76 (Administrator)

If you're using recent versions, you can use the queues.get_num_entries endpoint to obtain the number of pending entries (jobs) in a queue, instead of pulling the entire queue contents
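As a rough sketch of that call from Python: the snippet below assumes the `clearml` package's `APIClient` exposes the endpoint as `client.queues.get_num_entries(queue=<queue id>)` and that the response carries the count in a `num` field. Both are assumptions to verify against the API reference for your server version, and `pending_entries` is a hypothetical helper name.

```python
def pending_entries(client, queue_id):
    """Return the number of pending entries (jobs) in one queue.

    Assumption: `client` behaves like clearml's APIClient, and the
    response object exposes the count as a `num` attribute.
    """
    response = client.queues.get_num_entries(queue=queue_id)
    return int(response.num)


# Usage (requires the clearml package and configured API credentials):
#   from clearml.backend_api.session.client import APIClient
#   print(pending_entries(APIClient(), "<your-queue-id>"))
```

Passing the client in as an argument keeps the helper easy to exercise with a stub, without a live server.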

  				
Posted 3 years ago by SuccessfulKoala55

Yup, seems like it. Which Server version are you using?

  				
Posted 3 years ago by SuccessfulKoala55

SparklingHedgehong28 since the application code launching the machines is running in the ClearML SaaS, we can't really use your IAM role 🙂

  				
Posted 3 years ago by SuccessfulKoala55

Hi SparklingHedgehong28 ,

also, when the AZ spec is left empty (it’s optional), machines fail to start with

Invalid type for parameter LaunchSpecification.Placement.AvailabilityZone, value: None, type: <class 'NoneType'>, valid types: <class 'str'>

I'm checking this issue. When the AZ is not specified, it should use an available one. I'll keep you posted about it.

ok, I misread. The launch code runs in the SaaS, but it uses credentials to launch machines in our cloud. What stops it then from specifying an IAM role existing in our cloud? Isn’t this just an API call?

Regarding this one, as SuccessfulKoala55 mentioned, you can use a full IAM role (without any credentials) in higher tiers. In the regular tier, you'll need credentials to authenticate the boto3 commands for spin-up, spin-down, tagging, and similar API calls. The app is currently hosted by us, so your IAM role won’t really be available to it.

  				
Posted 3 years ago by TimelyPenguin76 (Administrator)

We don't, we use the SaaS.
Exactly, that is the call I implemented and wrapped in some serverless code to export data to CloudWatch.
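A minimal sketch of what such an export could look like, assuming boto3's CloudWatch `put_metric_data` call. The namespace `ClearML/Queues`, the metric name `PendingEntries`, and the helper name are hypothetical placeholders, not the code that was actually deployed.

```python
def build_queue_depth_metric(queue_name, pending):
    """Build keyword arguments for CloudWatch put_metric_data."""
    return {
        "Namespace": "ClearML/Queues",  # hypothetical namespace
        "MetricData": [
            {
                "MetricName": "PendingEntries",  # hypothetical metric name
                "Dimensions": [{"Name": "Queue", "Value": queue_name}],
                "Value": float(pending),
                "Unit": "Count",
            }
        ],
    }


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("cloudwatch").put_metric_data(**build_queue_depth_metric("default", 4))
```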

  				
Posted 3 years ago by SparklingHedgehong28

what stops adding ‘IAM Role’ to the list of parameters of the machine group? I can’t wrap my head around this.

Actually, this option is also available with the IAM role version mentioned above (you can specify an ARN or a name for the role).
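For context, at the EC2 API level a role reaches an instance through an instance profile, which boto3 launch calls accept as `IamInstanceProfile` with either an `Arn` or a `Name` key. The sketch below only builds that argument; the helper name is hypothetical and this is not the autoscaler app's actual code.

```python
def iam_instance_profile(arn_or_name):
    """Return the IamInstanceProfile argument for an EC2 launch call.

    Values starting with "arn:" are treated as an ARN, anything else as a name.
    """
    if arn_or_name.startswith("arn:"):
        return {"Arn": arn_or_name}
    return {"Name": arn_or_name}


# Usage (requires boto3 and AWS credentials; other launch arguments omitted):
#   import boto3
#   boto3.client("ec2").run_instances(
#       ImageId="<ami-id>", InstanceType="<type>", MinCount=1, MaxCount=1,
#       IamInstanceProfile=iam_instance_profile("<profile-name-or-arn>"),
#   )
```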

  				
Posted 3 years ago by TimelyPenguin76 (Administrator)

Thank you for the reply, Alon, and for looking into the issue. I’m not speaking of the app/launcher app itself inheriting IAM permissions by assuming the role – this is absolutely understandable as it’s in your cloud. Rather, when launching an AWS autoscaler app, what stops adding ‘IAM Role’ to the list of parameters of the machine group? I can’t wrap my head around this.

  				
Posted 3 years ago by SparklingHedgehong28

Also, in scale and higher tiers I think the IAM role feature is also supported

  				
Posted 3 years ago by SuccessfulKoala55

What do you mean by "managed"?

  				
Posted 3 years ago by SuccessfulKoala55

…or running the auto-scaler on our own (with own node pools) would solve this issue?

Sorry, maybe I’m not getting the whole picture yet.

  				
Posted 3 years ago by SparklingHedgehong28

How else would it spin new instances in your account? 🙂

  				
Posted 3 years ago by SuccessfulKoala55

Yes. A pre-created one.

  				
Posted 3 years ago by SparklingHedgehong28

Oh, well, the SaaS supports that as well 🙂

  				
Posted 3 years ago by SuccessfulKoala55

Hi SparklingHedgehong28 , ClearML enqueues jobs/tasks to queues simply according to your request - when you enqueue using the UI, or from code. Any agent monitoring a queue (or more than one queue) can pull work from that queue (agents monitoring more than one queue use a round-robin scheme). In general, an agent will pull and run one job at a time. Multiple agents running on a single machine (each with a different GPU assignment) will pull and run one job at a time each.
An exception to this is the agent services mode, which is designed to spin several jobs in parallel - this mode is used in agents running CPU-only tasks that require low resources (the default services agent running as part of the server deployment is such an example)
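To illustrate the round-robin idea described above, here is a toy model of a single agent watching several queues and pulling one job at a time. It is a sketch for intuition only, not ClearML agent code: the real agent polls the server, and the function and queue names are hypothetical.

```python
from collections import deque


def round_robin_pull(queues):
    """Yield (queue_name, job) pairs, visiting non-empty queues in rotation."""
    pending = deque((name, deque(jobs)) for name, jobs in queues.items())
    while pending:
        name, jobs = pending.popleft()
        if jobs:
            # The agent takes exactly one job, then moves to the next queue.
            yield name, jobs.popleft()
        if jobs:
            # Queue still has work: put it at the back of the rotation.
            pending.append((name, jobs))


for queue_name, job in round_robin_pull({"gpu": ["train-1", "train-2"], "cpu": ["prep-1"]}):
    print(queue_name, job)
```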

  				
Posted 3 years ago by SuccessfulKoala55

Running the application code on your own is supported in scale and higher tiers, but you can always spin up your own autoscaler code based on the open-source version 🙂

  				
Posted 3 years ago by SuccessfulKoala55

Thank you for explaining. As we are implementing a custom autoscaler for the machines, it looks like the queue length, or the rate of messages arriving in the queue, is the best indicator for scaling up or down.
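A minimal sketch of a queue-length-based scaling rule, assuming each instance runs a fixed number of jobs at a time. The function name, parameters, and default limits are hypothetical; a rate-based rule would need a history of queue samples and is not shown.

```python
import math


def desired_instances(pending, busy=0, jobs_per_instance=1,
                      min_instances=0, max_instances=10):
    """Return a target instance count from the pending queue length.

    pending: jobs waiting in the queue
    busy: instances currently running jobs (kept alive)
    """
    needed = busy + math.ceil(pending / jobs_per_instance)
    # Clamp to the configured fleet limits.
    return max(min_instances, min(max_instances, needed))


print(desired_instances(pending=5, busy=2, jobs_per_instance=2))
```

Scaling down would typically also wait for an idle timeout before terminating machines, so that short gaps in the queue do not cause churn.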

  				
Posted 3 years ago by SparklingHedgehong28

ok, I misread. The launch code runs in the SaaS, but it uses credentials to launch machines in our cloud. What stops it then from specifying an IAM role existing in our cloud? Isn’t this just an API call?

  				
Posted 3 years ago by SparklingHedgehong28

Hi SuccessfulKoala55 , just wondering – you mentioned the open-source autoscaler code version, but where is it hosted? I’ve only found https://github.com/allegroai/clearml/blob/master/examples/services/aws-autoscaler/aws_autoscaler.py , but it looks like code to launch a managed auto-scaler instead 🙂

  				
Posted 3 years ago by SparklingHedgehong28

Why does the autoscaler app then ask for AWS credentials? 🙂

  				
Posted 3 years ago by SparklingHedgehong28

also, when the AZ spec is left empty (it’s optional), machines fail to start with
Invalid type for parameter LaunchSpecification.Placement.AvailabilityZone, value: None, type: <class 'NoneType'>, valid types: <class 'str'>
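That message is a botocore parameter validation error: the `AvailabilityZone` key was sent with a value of `None` instead of being left out. A common pattern, sketched below, is to add `Placement` only when a zone is actually set. This is an illustration with a hypothetical helper name, not the fix that was applied in the autoscaler app.

```python
def build_launch_spec(image_id, instance_type, availability_zone=None):
    """Build a launch specification dict, omitting Placement when no AZ is given."""
    spec = {"ImageId": image_id, "InstanceType": instance_type}
    if availability_zone:
        # Only include the key when there is a real string value;
        # botocore rejects None for string-typed parameters.
        spec["Placement"] = {"AvailabilityZone": availability_zone}
    return spec


print(build_launch_spec("<ami-id>", "<instance-type>"))
print(build_launch_spec("<ami-id>", "<instance-type>", "us-east-1a"))
```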

  				
Posted 3 years ago by SparklingHedgehong28
