I did that recently - what are you trying to do exactly?
Spin up an instance using the AWS auto-scaler and use the init script to:
- get key-value pairs from AWS SSM and write them to a .env file
- clone a private git repo
- build the Docker image locally, using the .env file during docker-compose
- enter the container and spin up clearml-agent
echo -e "$(aws ssm --region=eu-west-2 get-parameter --name 'my-param' --with-decryption --query "Parameter.Value")" | tr -d '"' > .env
set -a
source .env
set +a
git clone https://${PAT}@github.com/myrepo/toolbox.git
mv .env toolbox/
cd toolbox/
docker-compose up -d --build
docker exec $(docker-compose ps -q) clearml-agent daemon --detached --gpus 0 --queue default
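The `set -a` / `source .env` / `set +a` lines in that script export every variable defined in `.env`, so processes started afterwards (`git`, `docker-compose`, ...) inherit them. A minimal sketch of that step as a reusable function, assuming `.env` holds plain `KEY=VALUE` lines; the name `load_env_file` is hypothetical and not part of ClearML or the AWS CLI:

```bash
#!/usr/bin/env bash
# Hypothetical helper: export every KEY=VALUE pair from an env file.
# `set -a` marks every variable assigned while it is active for export,
# so processes started afterwards (git, docker-compose, ...) inherit them.
load_env_file() {
  local env_file="$1"
  if [ ! -f "$env_file" ]; then
    echo "env file not found: $env_file" >&2
    return 1
  fi
  set -a
  # shellcheck disable=SC1090
  source "$env_file"
  set +a
}

# Example: write a throwaway env file, load it, and read it from a child shell.
demo_env=$(mktemp)
printf 'DEMO_VAR=hello\n' > "$demo_env"
load_env_file "$demo_env"
bash -c 'echo "DEMO_VAR seen by child: $DEMO_VAR"'   # prints: DEMO_VAR seen by child: hello
rm -f "$demo_env"
```

If stripping every `"` with `tr -d '"'` ever mangles values that legitimately contain quotes, adding `--output text` to the `aws ssm get-parameter ... --query "Parameter.Value"` call prints the value without JSON quoting.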
When I run it in the UI I get the following response: Error: Can not start new instance, An error occurred (InvalidParameterValue) when calling the RunInstances operation: Invalid availability zone: [eu-west-2]
When I run it programmatically it just stalls and I don't get any output.
did you try with another availability zone?
Error: Can not start new instance, Could not connect to the endpoint URL: "…"
For the ClearML UI:
2021-10-19 14:24:13 ClearML results page:
Spinning new instance type=aws4gpu
ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
2021-10-19 14:24:18 Error: Can not start new instance, Could not connect to the endpoint URL: "…"
Spinning new instance type=aws4gpu
2021-10-19 14:24:28 Error: Can not start new instance, Could not connect to the endpoint URL: "…"
Spinning new instance type=aws4gpu
2021-10-19 14:24:38 Error: Can not start new instance, Could not connect to the endpoint URL: "…"
Spinning new instance type=aws4gpu
2021-10-19 14:24:48 Error: Can not start new instance, Could not connect to the endpoint URL: "…"
Spinning new instance type=aws4gpu
2021-10-19 14:24:53 Error: Can not start new instance, Could not connect to the endpoint URL: "…"
2021-10-19 14:27:15 ClearML Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start
For local:
` AWS Autoscaler setup wizard
Follow the wizard to configure your AWS auto-scaler service.
Once completed, you will be able to view and change the configuration in the clearml-server web UI.
It means there is no need to worry about typos or mistakes :)
Load configurations from config file 'aws_autoscaler.yaml' [Y/n]? : Y
ClearML Task: overwriting (reusing) task id=d3406f8cc24742fe9227ea4266d8e0cd
ClearML results page:
Running AWS auto-scaler as a service
Execution log
ClearML Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start `
What about the stack trace of this error?
Error: Can not start new instance, An error occurred (InvalidParameterValue) when calling the RunInstances operation: Invalid availability zone: [eu-west-2]
I would probably leave it to the ClearML team to answer you; I am not using the UI app, and for me it worked fine with different regions. Maybe check the permissions of the key/secrets?
2021-10-19 14:19:07 Spinning new instance type=aws4gpu
Error: Can not start new instance, An error occurred (InvalidParameterValue) when calling the RunInstances operation: Invalid availability zone: [eu-west-2]
Spinning new instance type=aws4gpu
ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
Error: Can not start new instance, An error occurred (InvalidParameterValue) when calling the RunInstances operation: Invalid availability zone: [eu-west-2]
Spinning new instance type=aws4gpu
Error: Can not start new instance, An error occurred (InvalidParameterValue) when calling the RunInstances operation: Invalid availability zone: [eu-west-2]
Spinning new instance type=aws4gpu
Error: Can not start new instance, An error occurred (InvalidParameterValue) when calling the RunInstances operation: Invalid availability zone: [eu-west-2]
Spinning new instance type=aws4gpu
Error: Can not start new instance, An error occurred (InvalidParameterValue) when calling the RunInstances operation: Invalid availability zone: [eu-west-2]
2021-10-19 14:22:07 ClearML Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start
2021-10-19 14:23:08 User aborted: stopping task (1)
Thanks JitteryCoyote63, I'll double-check the permissions of the key/secrets, and if that doesn't help I'll check with the team.
RobustRat47, it can also simply be that the instance type you declared is not available in the zone you defined.
Try to spin up an instance of that type manually in that region to see whether it is available.
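One way to check this without the console is the AWS CLI's `ec2 describe-instance-type-offerings` call with `--location-type availability-zone`, which lists the zones in a region that offer a given instance type. A minimal sketch, assuming a configured AWS CLI; both function names are hypothetical, and `g4dn.xlarge` below is only an example type (`aws4gpu` in the logs appears to be the ClearML resource name, not an EC2 instance type):

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the availability zones in a region that offer
# a given EC2 instance type (whitespace-separated, as produced by --output text).
zones_offering() {
  local region="$1" instance_type="$2"
  aws ec2 describe-instance-type-offerings \
    --region "$region" \
    --location-type availability-zone \
    --filters "Name=instance-type,Values=${instance_type}" \
    --query 'InstanceTypeOfferings[].Location' \
    --output text
}

# Succeeds if the zone in $1 appears in the whitespace-separated list in $2.
zone_in_list() {
  local zone="$1" candidate
  for candidate in $2; do
    if [ "$candidate" = "$zone" ]; then
      return 0
    fi
  done
  return 1
}
```

Usage would look like `zone_in_list eu-west-2c "$(zones_offering eu-west-2 g4dn.xlarge)" && echo offered`; an empty list means the type is not offered anywhere in that region.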
I made 2x in eu-west-2 on the AWS console, but still no luck.
It doesn't help that the stacktrace isn't very verbose
Hi SuccessfulKoala55, who's the best person on the team to speak with?
RobustRat47, to make sure this is not a configuration, instance-limit, or zone issue, I would try to launch such an instance in the specified zone using the AWS CLI.
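A minimal sketch of such a check, assuming a configured AWS CLI: `ec2 run-instances --dry-run` asks EC2 to validate the request without launching anything, and the error code in the reply often hints at what is wrong. The function names, the example arguments, and the mapping from error code to likely cause are illustrative assumptions, not part of ClearML:

```bash
#!/usr/bin/env bash
# Hypothetical helper: ask EC2 to validate a launch without creating anything.
# With --dry-run, a valid and permitted request fails with DryRunOperation.
try_launch() {
  local region="$1" zone="$2" instance_type="$3" ami_id="$4"
  aws ec2 run-instances \
    --region "$region" \
    --placement "AvailabilityZone=${zone}" \
    --instance-type "$instance_type" \
    --image-id "$ami_id" \
    --count 1 \
    --dry-run 2>&1
}

# Map the error code in the CLI output to a likely cause (heuristic only).
classify_dry_run() {
  case "$1" in
    *DryRunOperation*)       echo "ok" ;;
    *UnauthorizedOperation*) echo "permissions" ;;
    *InvalidParameterValue*) echo "bad-parameter" ;;
    *Unsupported*)           echo "type-not-offered-in-zone" ;;
    *)                       echo "unknown" ;;
  esac
}
```

For example, `classify_dry_run "$(try_launch eu-west-2 eu-west-2c g4dn.xlarge ami-0123456789abcdef0)"`, where the AMI ID is a placeholder. Passing `eu-west-2` as the zone argument would likely surface the same `Invalid availability zone: [eu-west-2]` error seen in the logs.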
Sure, I'll check this out later in the week and get back to you
The issue was the availability zone: I was using 'eu-west-2' (a region) instead of 'eu-west-2c' (an availability zone in that region).
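For anyone hitting the same error: a region name such as `eu-west-2` has no trailing letter, while an availability zone inside it adds one (`eu-west-2a`, `eu-west-2c`, ...). A minimal sketch of a sanity check one could run on the zone value before starting the auto-scaler; the function names are hypothetical, and the pattern only covers standard zone names (it rejects Local Zone names such as `us-west-2-lax-1a`):

```bash
#!/usr/bin/env bash
# Succeeds if $1 looks like a standard AWS availability zone name:
# a region (e.g. eu-west-2, us-gov-west-1) followed by one zone letter.
is_availability_zone() {
  local pattern='^[a-z]{2}(-gov)?-[a-z]+-[0-9]+[a-z]$'
  [[ "$1" =~ $pattern ]]
}

# Print the region an availability zone belongs to (eu-west-2c -> eu-west-2).
region_of_zone() {
  local zone="$1"
  if ! is_availability_zone "$zone"; then
    echo "not an availability zone: $zone" >&2
    return 1
  fi
  echo "${zone%?}"
}

for value in eu-west-2 eu-west-2c; do
  if is_availability_zone "$value"; then
    echo "$value: zone in region $(region_of_zone "$value")"
  else
    echo "$value: not an availability zone"
  fi
done
# prints:
# eu-west-2: not an availability zone
# eu-west-2c: zone in region eu-west-2
```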