For ClearML UI:
2021-10-19 14:24:13 ClearML results page:
Spinning new instance type=aws4gpu
ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
2021-10-19 14:24:18 Error: Can not start new instance, Could not connect to the endpoint URL: ""
Spinning new instance type=aws4gpu
2021-10-19 14:24:28 Error: Can not start new instance, Could not connect to the endpoint URL: ""
Spinning new instance type=aws4gpu
2021-10-19 14:24:38 Error: Can not start new instance, Could not connect to the endpoint URL: ""
Spinning new instance type=aws4gpu
2021-10-19 14:24:48 Error: Can not start new instance, Could not connect to the endpoint URL: ""
Spinning new instance type=aws4gpu
2021-10-19 14:24:53 Error: Can not start new instance, Could not connect to the endpoint URL: ""
2021-10-19 14:27:15 ClearML Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start
For local:
` AWS Autoscaler setup wizard
Follow the wizard to configure your AWS auto-scaler service.
Once completed, you will be able to view and change the configuration in the clearml-server web UI.
It means there is no need to worry about typos or mistakes :)
Load configurations from config file 'aws_autoscaler.yaml' [Y/n]? : Y
ClearML Task: overwriting (reusing) task id=d3406f8cc24742fe9227ea4266d8e0cd
ClearML results page:
Running AWS auto-scaler as a service
Execution log
ClearML Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start `
Error: Can not start new instance, Could not connect to the endpoint URL: ""
Sure, I'll check this out later in the week and get back to you
did you try with another availability zone?
I did that recently - what are you trying to do exactly?
echo -e $(aws ssm --region=eu-west-2 get-parameter --name 'my-param' --with-decryption --query "Parameter.Value") | tr -d '"' > .env
set -a
source .env
set +a
git clone https://${PAT}@github.com/myrepo/toolbox.git
mv .env toolbox/
cd toolbox/
docker-compose up -d --build
docker exec -it $(docker-compose ps -q) clearml-agent daemon --detached --gpus 0 --queue default
When I run it in the UI I get the following response: Error: Can not start new instance, An error occurred (InvalidParameterValue) when calling the RunInstances operation: Invalid availability zone: [eu-west-2]
When I run it programmatically it just stalls and I don't get any output
I was having an issue with availability zone. I was using 'eu-west-2' instead of 'eu-west-2c'
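For anyone hitting the same `Invalid availability zone` error, here is a minimal sketch of a pre-flight check that tells a region name apart from a zone name before the value goes into the autoscaler config. `classify_location` is a hypothetical helper, not part of ClearML or the AWS CLI, and it only looks at the shape of the string (a zone is assumed to be a region name plus a trailing letter). It does not ask AWS whether the zone exists, and it does not handle Local Zone names.

```shell
# Hypothetical helper: classify a location string by its shape only.
# Assumption: standard naming, e.g. eu-west-2 (region) vs eu-west-2c (zone).
classify_location() {
  case "$1" in
    *-[0-9][a-z]) echo "zone" ;;     # ends in digit + letter, e.g. eu-west-2c
    *-[0-9])      echo "region" ;;   # ends in a digit, e.g. eu-west-2
    *)            echo "unknown" ;;
  esac
}

classify_location eu-west-2    # prints: region
classify_location eu-west-2c   # prints: zone
```

If the value meant for the availability zone setting comes back as `region`, it is probably missing its zone letter.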
RobustRat47 to make sure this is not a configuration/instance limit/zone issue, I would try to launch such an instance using the AWS CLI in the specified zone
What about the stacktrace of the error: Error: Can not start new instance, An error occurred (InvalidParameterValue) when calling the RunInstances operation: Invalid availability zone: [eu-west-2]?
It doesn't help that the stacktrace isn't very verbose
I would probably leave it to the ClearML team to answer you. I am not using the UI app, and for me it worked just fine with different regions. Maybe check the permissions of the key/secrets?
I made 2x in eu-west-2 on the AWS console but still no luck
Hi SuccessfulKoala55 who's the best person on the team to speak with?
Try to spin up the instance of that type manually in that region to see if it is available
Spin up an instance using the AWS auto-scaler and use the init script to:
- Get key-value pairs from AWS ssm and write them to a .env file
- Clone a private git repo
- Build the docker image locally and use the .env file during docker-compose
- Enter the container and spin up clearml-agent
2021-10-19 14:19:07 Spinning new instance type=aws4gpu
Error: Can not start new instance, An error occurred (InvalidParameterValue) when calling the RunInstances operation: Invalid availability zone: [eu-west-2]
Spinning new instance type=aws4gpu
ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
Error: Can not start new instance, An error occurred (InvalidParameterValue) when calling the RunInstances operation: Invalid availability zone: [eu-west-2]
Spinning new instance type=aws4gpu
Error: Can not start new instance, An error occurred (InvalidParameterValue) when calling the RunInstances operation: Invalid availability zone: [eu-west-2]
Spinning new instance type=aws4gpu
Error: Can not start new instance, An error occurred (InvalidParameterValue) when calling the RunInstances operation: Invalid availability zone: [eu-west-2]
Spinning new instance type=aws4gpu
Error: Can not start new instance, An error occurred (InvalidParameterValue) when calling the RunInstances operation: Invalid availability zone: [eu-west-2]
2021-10-19 14:22:07 ClearML Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start
2021-10-19 14:23:08 User aborted: stopping task (1)
Thanks JitteryCoyote63, I'll double check the permissions of the key/secrets and if no luck I'll check with the team
RobustRat47 It can also simply be that the instance type you declared is not available in the zone you defined
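One way to check that from the command line is sketched below. The AWS CLI has an `ec2 describe-instance-type-offerings` subcommand that can list the zones offering a given instance type. `offerings_cmd` is a hypothetical wrapper that only prints the command so it can be reviewed first, and `g4dn.xlarge` is a placeholder because the thread does not say which EC2 instance type sits behind `aws4gpu`.

```shell
# Hypothetical wrapper: print (not run) the AWS CLI command that lists the
# availability zones in a region that offer a given instance type.
# $1 = region (e.g. eu-west-2), $2 = EC2 instance type (placeholder below).
offerings_cmd() {
  printf "aws ec2 describe-instance-type-offerings --region %s --location-type availability-zone --filters Name=instance-type,Values=%s --query 'InstanceTypeOfferings[].Location' --output text\n" "$1" "$2"
}

# Review the printed command first; append "| sh" to actually run it
# (requires the AWS CLI and valid credentials).
offerings_cmd eu-west-2 g4dn.xlarge
```

If the zone set in the autoscaler config is missing from the returned list, that instance type is not offered there and a different zone or type is needed.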