
Thanks for the fast response.
If I understood you correctly, I could set the credentials as env variables on e.g. the ClearML Server (if I use the "service" queue there) and omit them from the aws_autoscaler.yaml file? Wouldn't that make the autoscaler complain about missing credentials if I don't mess around in the code?
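Concretely, something like this is what I have in mind (just a sketch; the field names are taken from the example aws_autoscaler.yaml as far as I remember them, and I am assuming that with empty fields the autoscaler lets boto3 fall back to its default credential chain, i.e. env variables):

```yaml
# hyper_params section of aws_autoscaler.yaml with the credentials intentionally left empty.
# Assumption: boto3 then picks up AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY (and the region)
# from the environment of whatever process runs the autoscaler task.
hyper_params:
  cloud_credentials_key: ""        # intentionally empty
  cloud_credentials_secret: ""     # intentionally empty
  cloud_credentials_region: eu-west-1   # placeholder region
```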
Hi Aleksei,
As I understood it, there is a self-hosted pro version (Enterprise), but that is not 15 USD / month.
That being said, we built a custom solution in AWS because in the beginning we were not aware of the autoscaler (and after testing the autoscaler I am still not sure whether we will stick with our solution).
Basically we built a solution outside of the server instance. A separate instance polls the ClearML queues, spins up instances, and installs the ClearML Agent on them, which...
The example https://github.com/allegroai/clearml/blob/master/examples/services/aws-autoscaler/aws_autoscaler.py references the AMI "ami-04c0416d6bd8e4b1f", which does not exist anymore (referencing AMIs by ID might not be the best idea anyway, but that is a different story). So I used a plain Amazon Linux 2 image (ami-0b920b0594b5288fb), which led to errors because of missing dependencies like Docker.
Removing the AWS credentials from the aws_autoscaler.yaml and setting them as env variables seems to work, at least for the local version using the --run parameter. It took me a while because I needed to pass the subnet ID via the extra_configurations field, which is not documented... 😄
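For reference, this is roughly what the relevant part of my aws_autoscaler.yaml looks like (resource name and subnet ID are placeholders, the other fields follow the example config; as far as I can tell, whatever you put under extra_configurations is merged into the boto3 run_instances call, hence the boto3-style SubnetId key):

```yaml
configurations:
  resource_configurations:
    gpu_worker:                               # placeholder resource name
      ami_id: ami-0b920b0594b5288fb           # replaces the deprecated AMI from the example
      instance_type: g4dn.xlarge
      availability_zone: eu-west-1a
      extra_configurations:
        SubnetId: subnet-0123456789abcdef0    # placeholder; boto3-style key, not documented
```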
But now I have encountered some funny behaviour. A worker node is scheduled, and according to the autoscaler logs I would say it is assigned to the correct queue:
2022-11-18 14:34:34,590 - clearml.a...
Looks like the bootstrap is broken: the AMI in the documentation is deprecated, but there are some hard constraints on the image (I just used a basic Amazon AMI, which failed because Docker was missing, etc.).
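If those image requirements stay undocumented, one possible workaround (an untested sketch, assuming the extra_vm_bash_script hook from the example config runs early enough during instance bootstrap) would be to install the missing pieces yourself on a plain Amazon Linux 2 AMI:

```yaml
configurations:
  extra_vm_bash_script: |
    # install Docker on Amazon Linux 2 so the agent can run tasks in docker mode
    sudo amazon-linux-extras install -y docker
    sudo systemctl enable --now docker
    sudo usermod -aG docker ec2-user
```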