Hi Aleksei,
As I understood it, there is a self-hosted pro version (Enterprise), but that is not 15 USD/month.
That being said, we built a custom solution in AWS because in the beginning we were not aware of the autoscaler (and after testing the autoscaler I am still not sure whether we will stick to our solution).
Basically we built a solution outside of the server instance. A separate instance is polling the ClearML queues, spinning up instances, installing the ClearML Agent on them which...
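For reference, this is roughly what our poller instance does, heavily simplified. The queue name, AMI and instance type are placeholders, and the APIClient calls are the ones I believe the official autoscaler code uses as well, so treat it as a sketch rather than the exact implementation:

```python
# Minimal sketch of a "poller" instance: watch a ClearML queue and spin up EC2 workers.
# Assumes queues.get_all / queues.get_by_id return queue objects with an `entries` list,
# and that boto3 credentials/region come from the instance profile or environment.
import time

import boto3
from clearml.backend_api.session.client import APIClient

QUEUE_NAME = "aws_gpu"            # hypothetical queue we poll
AMI_ID = "ami-xxxxxxxxxxxxxxxxx"  # placeholder: a pre-baked image with docker installed
INSTANCE_TYPE = "g4dn.xlarge"


def pending_tasks(client: APIClient, queue_name: str) -> int:
    """Return the number of tasks waiting in the given ClearML queue."""
    queues = client.queues.get_all(name=queue_name)
    if not queues:
        return 0
    queue = client.queues.get_by_id(queue=queues[0].id)
    return len(queue.entries or [])


def spin_up_worker(queue_name: str) -> str:
    """Launch one EC2 instance whose user-data installs and starts a clearml-agent."""
    # clearml.conf / API credentials for the agent are omitted here for brevity
    user_data = (
        "#!/bin/bash\n"
        "python3 -m pip install clearml-agent\n"
        f"clearml-agent daemon --queue {queue_name} --docker\n"
    )
    ec2 = boto3.client("ec2")
    resp = ec2.run_instances(
        ImageId=AMI_ID,
        InstanceType=INSTANCE_TYPE,
        MinCount=1,
        MaxCount=1,
        UserData=user_data,
    )
    return resp["Instances"][0]["InstanceId"]


if __name__ == "__main__":
    api = APIClient()
    while True:
        if pending_tasks(api, QUEUE_NAME) > 0:
            print("Queue not empty, launching worker:", spin_up_worker(QUEUE_NAME))
        time.sleep(60)
```

There is obviously more to it in our real setup (scale-down, tagging, limits), but that is the core loop.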
Looks like the bootstrap is broken: the AMI in the documentation is deprecated, but there are some hard constraints on the image (I just used a basic Amazon AMI, which failed with docker missing, etc.).
The https://github.com/allegroai/clearml/blob/master/examples/services/aws-autoscaler/aws_autoscaler.py example references AMI "ami-04c0416d6bd8e4b1f", which does not exist anymore (referencing AMIs by ID might not be the best idea anyway, but that is a different story). So I used a plain amazon-linux-2 AMI (ami-0b920b0594b5288fb), which led to errors because of missing dependencies like docker.
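In case it helps anyone: I would expect a plain Amazon Linux 2 image to work if the missing dependencies are installed via the pre-execution script in aws_autoscaler.yaml. The field names below are taken from the example yaml in the repo, the AMI and sizes are placeholders, and I have not verified this end to end yet:

```yaml
configurations:
  resource_configurations:
    aws_default:                       # arbitrary resource name
      instance_type: m5.xlarge
      is_spot: false
      availability_zone: eu-central-1a
      ami_id: ami-0b920b0594b5288fb    # plain amazon-linux-2, as in my test
      ebs_device_name: /dev/sda1
      ebs_volume_size: 100
      ebs_volume_type: gp3
  # pre-execution bash script; here it would install the docker dependency the AMI lacks
  extra_vm_bash_script: |
    sudo amazon-linux-extras install -y docker
    sudo service docker start
    sudo usermod -a -G docker ec2-user
```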
Thanks for the fast response.
If I understood you correctly, I could set the credentials using env variables on e.g. the ClearML Server (if I use the "service" queue there) and omit them in the aws_autoscaler.yaml file? Wouldn't that make the autoscaler complain about missing credentials, if I don't mess around in the code?
Removing the AWS credentials from the aws_autoscaler.yaml and setting them as env variables seems to work, at least for the local version using the --run parameter. Took me a while because I needed to pass the subnet ID via the extra_configurations field, which is not documented (see the snippet below)... 😄
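For anyone searching later: the credentials are the standard boto3 ones, so AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_DEFAULT_REGION as environment variables are enough in the --run case. The subnet went in roughly like this (as far as I can tell the keys under extra_configurations are merged straight into the EC2 launch request; the resource name, AMI and subnet ID below are placeholders):

```yaml
configurations:
  resource_configurations:
    aws_default:
      instance_type: m5.xlarge
      ami_id: ami-0b920b0594b5288fb
      availability_zone: eu-central-1a
      ebs_device_name: /dev/sda1
      ebs_volume_size: 100
      ebs_volume_type: gp3
      is_spot: false
      extra_configurations:            # undocumented: passed through to the EC2 launch call
        SubnetId: subnet-xxxxxxxxxxxxxxxx
```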
But now I have encountered some funny behaviour. A worker node is scheduled and according to the autoscaler logs I would say it is assigned to the correct queue:
2022-11-18 14:34:34,590 - clearml.a...