what about the stacktrace of the error: Error: Can not start new instance, An error occurred (InvalidParameterValue) when calling the RunInstances operation: Invalid availability zone: [eu-west-2]?
It doesn't help that the stacktrace isn't very verbose
For ClearML UI
2021-10-19 14:24:13 ClearML results page:
Spinning new instance type=aws4gpu
ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
2021-10-19 14:24:18 Error: Can not start new instance, Could not connect to the endpoint URL: "
"
Spinning new instance type=aws4gpu
2021-10-19 14:24:28 Error: Can not start new instance, Could not connect to the endpoint URL: "
"
Spinning new instance type=aws4gpu
2021-10-19 14:24:38 Error: Can not start new instance, Could not connect to the endpoint URL: "
"
Spinning new instance type=aws4gpu
2021-10-19 14:24:48 Error: Can not start new instance, Could not connect to the endpoint URL: "
"
Spinning new instance type=aws4gpu
2021-10-19 14:24:53 Error: Can not start new instance, Could not connect to the endpoint URL: "
"
2021-10-19 14:27:15 ClearML Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start
For local:
` AWS Autoscaler setup wizard
Follow the wizard to configure your AWS auto-scaler service.
Once completed, you will be able to view and change the configuration in the clearml-server web UI.
It means there is no need to worry about typos or mistakes :)
Load configurations from config file 'aws_autoscaler.yaml' [Y/n]? : Y
ClearML Task: overwriting (reusing) task id=d3406f8cc24742fe9227ea4266d8e0cd
ClearML results page:
Running AWS auto-scaler as a service
Execution log
ClearML Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start `
Hi SuccessfulKoala55, who's the best person on the team to speak with?
Error: Can not start new instance, Could not connect to the endpoint URL: "
"
echo -e $(aws ssm --region=eu-west-2 get-parameter --name 'my-param' --with-decryption --query "Parameter.Value") | tr -d '"' > .env
set -a
source .env
set +a
git clone https://${PAT}@github.com/myrepo/toolbox.git
mv .env toolbox/
cd toolbox/
docker-compose up -d --build
docker exec -it $(docker-compose ps -q) clearml-agent daemon --detached --gpus 0 --queue default
Thanks JitteryCoyote63, I'll double check the permissions of the key/secrets and if no luck I'll check with the team
Try to spin up the instance of that type manually in that region to see if it is available
I launched two of them in eu-west-2 from the AWS console, but still no luck
I was having an issue with the availability zone. I was using 'eu-west-2' (a region) instead of 'eu-west-2c' (an availability zone)
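For anyone hitting the same `Invalid availability zone` error: a region name and a zone name differ only by the trailing letter, so it is an easy mix-up. Here is a minimal sketch of a sanity check, assuming the standard naming scheme of a region code followed by one lowercase letter (Local Zones and Wavelength Zones use longer names and are not handled). `split_zone` is a hypothetical helper, not part of ClearML or boto3.

```python
import re
from typing import Tuple

# Standard availability zone names look like "<region><letter>", e.g. "eu-west-2c".
# Assumption: Local Zones / Wavelength Zones (longer names) are out of scope.
_ZONE_RE = re.compile(r"^([a-z]{2}(?:-[a-z]+)+-\d+)([a-z])$")


def split_zone(zone: str) -> Tuple[str, str]:
    """Return (region, zone_letter), or raise ValueError if `zone` is not a zone name."""
    match = _ZONE_RE.match(zone)
    if match is None:
        raise ValueError(
            f"{zone!r} does not look like an availability zone; "
            "expected a region followed by a letter, e.g. 'eu-west-2c'"
        )
    return match.group(1), match.group(2)
```

Calling `split_zone` on the zone value before starting the autoscaler would reject a bare region such as 'eu-west-2' up front instead of failing later inside RunInstances.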
I did that recently - what are you trying to do exactly?
RobustRat47 It can also simply be that the instance type you declared is not available in the zone you defined
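A rough way to check that suggestion is to ask EC2 which zones in the region offer the instance type. This is only a sketch, assuming boto3 is installed and credentials are configured; `zones_offering` is a hypothetical helper and the instance type in the usage comment is a placeholder, since the thread does not say which type `aws4gpu` maps to.

```python
def zones_offering(ec2_client, instance_type):
    """Return the sorted availability zones (in the client's region) that offer `instance_type`."""
    response = ec2_client.describe_instance_type_offerings(
        LocationType="availability-zone",
        Filters=[{"Name": "instance-type", "Values": [instance_type]}],
    )
    return sorted(o["Location"] for o in response["InstanceTypeOfferings"])


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="eu-west-2")
#   print(zones_offering(ec2, "g4dn.xlarge"))  # placeholder instance type
```

If the zone set in the autoscaler config is missing from the returned list, that instance type cannot be launched there. The client is passed in rather than created inside the function so the lookup can be pointed at any region.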
I would probably leave it to the ClearML team to answer you. I am not using the UI app, and for me it worked just fine with different regions. Maybe check the permissions of the key/secrets?
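One way to check whether the key/secret pair is allowed to launch instances at all is an EC2 dry run, which checks permissions without starting anything. Below is a minimal sketch, assuming boto3; `can_run_instances` is a hypothetical helper and the AMI ID and instance type in the usage comment are placeholders. A dry run says nothing about capacity or account limits.

```python
def can_run_instances(ec2_client, image_id, instance_type):
    """Dry-run RunInstances and return (allowed, error_code).

    With DryRun=True, EC2 always answers with an error: the code is
    "DryRunOperation" when the credentials have the required permissions,
    and e.g. "UnauthorizedOperation" when they do not.
    """
    try:
        ec2_client.run_instances(
            ImageId=image_id,
            InstanceType=instance_type,
            MinCount=1,
            MaxCount=1,
            DryRun=True,
        )
    except Exception as exc:  # boto3 raises botocore.exceptions.ClientError here
        response = getattr(exc, "response", None)
        if response is None:
            raise  # not an AWS API error (e.g. the endpoint could not be reached)
        code = response["Error"]["Code"]
        return code == "DryRunOperation", code
    return True, ""  # not expected when DryRun=True


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="eu-west-2")
#   print(can_run_instances(ec2, "ami-xxxxxxxx", "g4dn.xlarge"))  # placeholders
```

The broad `except` keeps the sketch free of a botocore import. In real code, catching `botocore.exceptions.ClientError` directly would be the more usual choice.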
2021-10-19 14:19:07 Spinning new instance type=aws4gpu
Error: Can not start new instance, An error occurred (InvalidParameterValue) when calling the RunInstances operation: Invalid availability zone: [eu-west-2]
Spinning new instance type=aws4gpu
ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
Error: Can not start new instance, An error occurred (InvalidParameterValue) when calling the RunInstances operation: Invalid availability zone: [eu-west-2]
Spinning new instance type=aws4gpu
Error: Can not start new instance, An error occurred (InvalidParameterValue) when calling the RunInstances operation: Invalid availability zone: [eu-west-2]
Spinning new instance type=aws4gpu
Error: Can not start new instance, An error occurred (InvalidParameterValue) when calling the RunInstances operation: Invalid availability zone: [eu-west-2]
Spinning new instance type=aws4gpu
Error: Can not start new instance, An error occurred (InvalidParameterValue) when calling the RunInstances operation: Invalid availability zone: [eu-west-2]
2021-10-19 14:22:07 ClearML Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start
2021-10-19 14:23:08 User aborted: stopping task (1)
did you try with another availability zone?
When I run in the UI I get the following response: Error: Can not start new instance, An error occurred (InvalidParameterValue) when calling the RunInstances operation: Invalid availability zone: [eu-west-2]
When I run programmatically it just stalls and I don't get any readout
Sure, I'll check this out later in the week and get back to you
RobustRat47 to make sure this is not a configuration/instance limit/zone issue, I would try to launch such an instance using the AWS CLI in the specified zone
Spin up an instance using the AWS auto-scaler and use the init script to:
get key-value pairs from AWS SSM and write them to a .env file, clone a private git repo, build the docker image locally (using the .env file during docker-compose), then enter the container and spin up clearml-agent