Answered
Hello, a question on using the AWS Autoscaler app in the SaaS – can a custom IAM role be used for launching the machines?
I’m looking at giving custom permissions to the machine for checking out secrets / code / etc.

  
  
Posted one year ago

Answers 23


Hello Jake, sorry for the delay. I found the original auto scaler script and now understand more about how stuff works.
Can you please help me understand how ClearML assigns jobs to queues – will there be more than one job per machine at any single point in time? I do understand that if there are 2 agents running, one for each GPU, then a vacant agent will probably take a job from the queue. But what's the general case here?
Thanks so much and sorry if this was already explained – feel free to point me to a doc.

  
  
Posted one year ago

Hi SparklingHedgehong28, ClearML enqueues jobs/tasks to queues simply according to your request - when you enqueue using the UI, or from code. Any agent monitoring a queue (or more than one queue) can pull work from that queue (agents monitoring more than one queue use a round-robin scheme). In general, an agent will pull and run one job at a time. Multiple agents running on a single machine (each with a different GPU assignment) will each pull and run one job at a time.
An exception to this is the agent's services mode, which is designed to spin up several jobs in parallel - this mode is used for agents running CPU-only tasks that require few resources (the default services agent running as part of the server deployment is one example).
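As a rough illustration of the round-robin behavior described above (a toy model for intuition only, not the actual agent implementation):

```python
from collections import deque

class RoundRobinPuller:
    """Toy model of an agent monitoring several queues: each pull
    checks the queues in rotating order and takes one job at a time."""

    def __init__(self, queue_names):
        self.order = list(queue_names)

    def pull(self, queues):
        # Try each monitored queue once, starting after the last one served
        for _ in range(len(self.order)):
            name = self.order.pop(0)
            self.order.append(name)
            if queues[name]:
                return name, queues[name].popleft()
        return None  # all monitored queues are empty
```

With two non-empty queues, successive pulls alternate between them; an empty queue is simply skipped until it has work again.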

  
  
Posted one year ago

Running the application code on your own is supported in scale and higher tiers, but you can always spin up your own autoscaler code based on the open-source version 🙂

  
  
Posted one year ago

SparklingHedgehong28 since the application code launching the machines is running in the ClearML SaaS, we can't really use your IAM role 🙂

  
  
Posted one year ago

…or running the auto-scaler on our own (with own node pools) would solve this issue?

Sorry, maybe I’m not getting the whole picture yet.

  
  
Posted one year ago

Also, I think the IAM role feature is supported in scale and higher tiers

  
  
Posted one year ago

Hi SparklingHedgehong28 ,

also, when the AZ spec is left empty (it’s optional), machines fail to start with

Invalid type for parameter LaunchSpecification.Placement.AvailabilityZone, value: None, type: <class 'NoneType'>, valid types: <class 'str'>

Checking this issue – when not specifying the AZ it should use an available one; will keep you posted about it
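For anyone hitting this in the meantime, a caller-side workaround is to include the Placement block only when an AZ is actually set, instead of passing None through (a minimal sketch; the parameter names follow boto3's `run_instances`):

```python
def placement_kwargs(availability_zone=None):
    """Build the optional Placement argument for a boto3
    ec2.run_instances call. Passing AvailabilityZone=None triggers
    the error quoted above, so omit the block entirely when no AZ
    is configured and let EC2 pick an available zone."""
    if availability_zone:
        return {"Placement": {"AvailabilityZone": availability_zone}}
    return {}
```

The returned dict can then be merged into the launch call's kwargs, e.g. `ec2.run_instances(..., **placement_kwargs(az))`.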

ok, I misread. The launch code runs in the SaaS, but it uses credentials to launch machines in our cloud. What stops it then from specifying an IAM role existing in our cloud? Isn’t this just an API call?

Regarding this one, as SuccessfulKoala55 mentioned, you can have a full IAM role (without any credentials) in higher tiers; in the regular tier you'll need the credentials for auth, used by the boto3 commands for spin up, spin down, tags and other such API calls. The app is currently hosted by us, so your IAM role won't really be available

  
  
Posted one year ago

Regarding this one, as SuccessfulKoala55 mentioned, you can have a full IAM role (without any credentials) in higher tiers; in the regular tier you'll need the credentials for auth, used by the boto3 commands for spin up, spin down, tags and other such API calls. The app is currently hosted by us, so your IAM role won't really be available

With it, the newly created instance will have the IAM role associated with it too

  
  
Posted one year ago

Why does the autoscaler app then ask for AWS credentials? 🙂

  
  
Posted one year ago

also, when the AZ spec is left empty (it’s optional), machines fail to start with
Invalid type for parameter LaunchSpecification.Placement.AvailabilityZone, value: None, type: <class 'NoneType'>, valid types: <class 'str'>

  
  
Posted one year ago

ok, I misread. The launch code runs in the SaaS, but it uses credentials to launch machines in our cloud. What stops it then from specifying an IAM role existing in our cloud? Isn’t this just an API call?

  
  
Posted one year ago

Do you mean assigning an IAM role to an ec2 instance it launches?

  
  
Posted one year ago

Hi SuccessfulKoala55 , just wondering – you mentioned the open-source autoscaler code version, but where is it hosted? I’ve only found https://github.com/allegroai/clearml/blob/master/examples/services/aws-autoscaler/aws_autoscaler.py , but it looks like code to launch a managed auto-scaler instead 🙂

  
  
Posted one year ago

Thank you for the reply, Alon, and for looking into the issue. I'm not speaking of the app/launcher itself inheriting IAM permissions by assuming the role – that is absolutely understandable, as it runs in your cloud. Conversely, when launching an AWS autoscaler app, what stops you from adding 'IAM Role' to the list of parameters of the machine group? I can't wrap my head around this.

  
  
Posted one year ago

what stops adding ‘IAM Role’ to the list of parameters of the machine group? I can’t wrap my head around this.

Actually, this option is also available with the IAM role version mentioned above (you can specify an ARN or Name for the role)
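Under the hood this amounts to passing an instance profile to the EC2 launch call – a minimal sketch of what the spin-up boils down to (role values and AMI are placeholders, and the helper name is mine, not the autoscaler's):

```python
def launch_params(ami_id, instance_type, role_arn=None, role_name=None):
    """Build kwargs for a boto3 ec2.run_instances call.
    IamInstanceProfile accepts either an Arn or a Name key,
    matching the ARN/Name choice in the autoscaler config."""
    params = {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
    }
    if role_arn:
        params["IamInstanceProfile"] = {"Arn": role_arn}
    elif role_name:
        params["IamInstanceProfile"] = {"Name": role_name}
    return params

# usage sketch: boto3.client("ec2").run_instances(**launch_params("ami-...", "g4dn.xlarge", role_name="my-role"))
```

The instance profile is attached by EC2 at launch, so the machine itself gets the role's permissions even though the code issuing the call runs elsewhere.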

  
  
Posted one year ago

Yup, seems like it. Which Server version are you using?

  
  
Posted one year ago

If you're using a recent version, you can use the queues.get_num_entries endpoint to obtain the number of pending entries (jobs) in a queue, instead of pulling the entire queue contents.

  
  
Posted one year ago

Oh, well, the SaaS supports that as well 🙂

  
  
Posted one year ago

Thank you for explaining. As we are implementing a custom auto-scaler for the machines, it looks like the queue length - or the rate of messages falling into the queue - is the best indicator for scaling up or down.
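A minimal sketch of such a scaling rule (the thresholds are illustrative assumptions; `pending` would come from the queues.get_num_entries endpoint mentioned earlier):

```python
def desired_workers(pending, jobs_per_worker=2, max_workers=10):
    """Scale the worker count with the queue length, capped at
    max_workers. The jobs_per_worker and max_workers values are
    placeholders, not ClearML defaults."""
    if pending <= 0:
        return 0
    # ceil(pending / jobs_per_worker) without importing math
    return min(max_workers, -(-pending // jobs_per_worker))
```

A rate-based variant would compare enqueue rate against drain rate over a window instead of the instantaneous length, which smooths out bursts.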

  
  
Posted one year ago

Yes. A pre-created one.

  
  
Posted one year ago

What do you mean by "managed"?

  
  
Posted one year ago

We don't, we use the SaaS.
Exactly, that is the call I implemented and wrapped into some serverless code to export data to CloudWatch.

  
  
Posted one year ago

How else would it spin new instances in your account? 🙂

  
  
Posted one year ago