Thanks for the fast response.
If I understood you correctly, I could set the credentials using env variables on e.g. the ClearML Server (if I use the "service" queue there) and omit them in the aws_autoscaler.yaml file? Wouldn't the autoscaler then complain about missing credentials unless I change the code?
Looks like the bootstrap is broken, as the AMI in the documentation is deprecated but there are some hard constraints on the image (I just used a basic Amazon AMI, which failed with docker missing, etc.).
The example at https://github.com/allegroai/clearml/blob/master/examples/services/aws-autoscaler/aws_autoscaler.py references AMI "ami-04c0416d6bd8e4b1f", which does not exist anymore (referencing AMIs by ID might not be the best idea anyway, but that is a different story). So I used a plain amazon-linux-2 (ami-0b920b0594b5288fb), which led to errors because of missing dependencies like docker.
Removing the AWS credentials from the aws_autoscaler.yaml and setting them as env variables seems to work, at least for the local version using the --run parameter. Took me a while because I needed to fiddle in the subnet ID using the extra_configurations field, which is not documented... 😄
But now I have encountered some funny behaviour. A worker node is scheduled and according to the autoscaler logs I would say it is assigned to the correct queue:
2022-11-18 14:34:34,590 - clearml.auto_scaler - INFO - Idle for 120.00 seconds
ClearML Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start
2022-11-18 14:36:35,106 - clearml.auto_scaler - INFO - Found 1 tasks in queue 'autoscaler_test_machines'
2022-11-18 14:36:35,207 - clearml.auto_scaler - INFO - resources: {'AutoscalerTest': 'autoscaler_test_machines'}
2022-11-18 14:36:35,208 - clearml.auto_scaler - INFO - idle worker: {}
2022-11-18 14:36:35,208 - clearml.auto_scaler - INFO - up machines: defaultdict(<class 'int'>, {'AutoscalerTest': 1}
However, in the Web UI the worker does not show up and the task does not get picked up. Any idea what went wrong?
VexedStork84 I think this is planned for one of the next versions (masking in the UI).
You can also make sure the credentials are set on the machine running the autoscaler using env vars - boto3 should be able to pick them up, I think
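A quick way to check the env var route before starting the autoscaler is a small pre-flight script like the one below. This is only a sketch: the variable names are the standard ones boto3 reads from the environment, but the helper `missing_aws_env_vars` is a made-up name for illustration, and whether the autoscaler also needs `AWS_DEFAULT_REGION` (rather than taking the region from its own config) is an assumption.

```python
import os

# Standard environment variable names read by boto3's default credential chain.
# Including AWS_DEFAULT_REGION here is an assumption; the autoscaler config may
# already provide the region.
REQUIRED_AWS_ENV_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
)


def missing_aws_env_vars(env=None):
    """Return the names of required AWS variables that are unset or empty.

    Hypothetical helper, not part of ClearML or boto3.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_ENV_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_env_vars()
    if missing:
        print("Not set:", ", ".join(missing))
    else:
        print("All AWS env vars are set; boto3 should be able to find them.")
```

Run it in the same shell (or container) that will launch the autoscaler, since the variables have to be visible to that process.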
I'm not sure, but you can check - you can also fix that and submit a PR 🙂