Hi SuccessfulKoala55 , just wondering – you mentioned the open-source autoscaler code version, but where is it hosted? I’ve only found https://github.com/allegroai/clearml/blob/master/examples/services/aws-autoscaler/aws_autoscaler.py , but it looks like code to launch a managed auto-scaler instead 🙂
Hello Jake, sorry for the delay. I found the original auto-scaler script and now understand more about how things work.
Can you please help me understand how ClearML assigns jobs to queues – will there be more than 1 job per machine at a single point in time? I do understand that if there are 2 agents running, one for each GPU, then a vacant agent will probably take a job from the queue. But what’s the general case here?
Thanks so much and sorry if this was already explained – feel free to point me to a doc.
Hi SparklingHedgehong28 , ClearML enqueues jobs/tasks to queues simply according to your request - when you enqueue using the UI, or from code. Any agent monitoring a queue (or more than one queue) can pull work from that queue (agents monitoring more than one queue use a round-robin scheme). In general, an agent will pull and run one job at a time. Multiple agents running on a single machine (each with a different GPU assignment) will pull and run one job at a time each.
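To make the round-robin behavior concrete, here is a plain-Python sketch (not ClearML’s actual agent code; the queue and job names are made up) of an agent monitoring two queues and pulling one job at a time:

```python
from collections import deque

# Hypothetical queues an agent is monitoring (names invented for illustration).
queues = {
    "gpu_queue": deque(["train_resnet", "train_bert"]),
    "services": deque(["cleanup_task"]),
}
names = list(queues)
turn = 0  # round-robin pointer over the monitored queues

def pull_next_job():
    """Return the next (queue, job) pair, checking queues in round-robin order."""
    global turn
    for i in range(len(names)):
        name = names[(turn + i) % len(names)]
        if queues[name]:
            turn = (turn + i + 1) % len(names)
            return name, queues[name].popleft()
    return None, None  # nothing to do

# An agent pulls and runs ONE job at a time:
print(pull_next_job())  # ('gpu_queue', 'train_resnet')
print(pull_next_job())  # ('services', 'cleanup_task')
print(pull_next_job())  # ('gpu_queue', 'train_bert')
```

Several agents on one machine would each run this loop independently, one job in flight per agent.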
An exception to this is the agent services mode, which is designed to spin up several jobs in parallel. This mode is used by agents running CPU-only tasks that require few resources (the default services agent running as part of the server deployment is such an example).
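As a rough stdlib-only analogy (this is not ClearML code, and the job names are invented), a services-mode agent behaves more like a worker pool with several low-resource jobs in flight at once:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def run_service_task(name):
    """Stand-in for a lightweight, CPU-only service task."""
    time.sleep(0.1)  # pretend to do some light work
    return f"{name}: done"

# A services-mode agent keeps several such jobs running in parallel,
# instead of pulling one job and blocking until it finishes.
jobs = ["cleanup", "pipeline_controller", "scheduler"]
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(run_service_task, jobs))
print(results)
```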
also, when the AZ spec is left empty (it’s optional), machines fail to start with
Invalid type for parameter LaunchSpecification.Placement.AvailabilityZone, value: None, type: <class 'NoneType'>, valid types: <class 'str'>
Checking this issue – when the AZ is not specified, it should fall back to an available one; will keep you posted about it
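The error above comes from passing `AvailabilityZone: None` into the boto3 launch spec. A defensive fix (a sketch with made-up values, not the actual autoscaler code) is to add the `Placement` key only when an AZ is actually set, so EC2 picks one itself:

```python
def build_launch_spec(image_id, instance_type, availability_zone=None):
    """Build a boto3-style LaunchSpecification dict, omitting Placement
    entirely when no availability zone is given."""
    spec = {
        "ImageId": image_id,
        "InstanceType": instance_type,
    }
    if availability_zone:  # only add Placement when AZ is a real string
        spec["Placement"] = {"AvailabilityZone": availability_zone}
    return spec

print(build_launch_spec("ami-12345678", "g4dn.xlarge"))
# Placement is absent, so boto3 never sees a None AvailabilityZone
print(build_launch_spec("ami-12345678", "g4dn.xlarge", "us-east-1a"))
```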
ok, I misread. The launch code runs in the SaaS, but it uses credentials to launch machines in our cloud. What stops it then from specifying an IAM role existing in our cloud? Isn’t this just an API call?
Regarding this one, as SuccessfulKoala55 mentioned, you can have a full IAM role (without any credentials) in higher tiers; in the regular tier you’ll need the credentials to authenticate the boto3 calls used for spin-up, spin-down, tags and other API commands. The app is currently hosted by us, so your IAM role won’t really be available
Actually this option is also available with the IAM role version mentioned above (you can specify an Arn or Name for the role)
With it, the newly created instance will have the IAM role associated with it too
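To illustrate, attaching the role to the launched instance is indeed just a parameter on the launch call – roughly like this (a sketch with made-up names; the real call would be boto3’s `ec2_client.run_instances`, whose `IamInstanceProfile` parameter accepts either an `Arn` or a `Name`):

```python
def build_run_instances_params(image_id, instance_type, iam_role=None):
    """Build kwargs for an EC2 run_instances call, optionally attaching
    an IAM instance profile by name or ARN."""
    params = {
        "ImageId": image_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
    }
    if iam_role:
        # boto3 accepts either {"Arn": ...} or {"Name": ...} here
        key = "Arn" if iam_role.startswith("arn:") else "Name"
        params["IamInstanceProfile"] = {key: iam_role}
    return params

# e.g. ec2_client.run_instances(**build_run_instances_params(...))
print(build_run_instances_params(
    "ami-12345678", "g4dn.xlarge",
    iam_role="arn:aws:iam::123456789012:instance-profile/my-role"))
```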
Thank you for the reply, Alon, and for looking into the issue. I’m not speaking of the app/launcher itself inheriting IAM permissions by assuming the role – that’s absolutely understandable, as it runs in your cloud. On the contrary: when launching an AWS auto-scaler app, what stops you from adding ‘IAM Role’ to the list of parameters of the machine group? I can’t wrap my head around this.