Hi SuccessfulKoala55 I gave up after 20 mins and also got a notification from Firefox: "This page is slowing down Firefox. To speed up your browser, stop this page". I'm heading out soon so I could leave it on. Also, I had the same behaviour in Chrome.
Sure, I'll check this out later in the week and get back to you
echo -e $(aws ssm --region=eu-west-2 get-parameter --name 'my-param' --with-decryption --query "Parameter.Value") | tr -d '"' > .env
set -a
source .env
set +a
git clone https://${PAT}@github.com/myrepo/toolbox.git
mv .env toolbox/
cd toolbox/
docker-compose up -d --build
docker exec -it $(docker-compose ps -q) clearml-agent daemon --detached --gpus 0 --queue default
Hi SuccessfulKoala55 thanks, I didn't know it was possible to use a PAT in place of the pw. So in the .conf I can just add the git PAT instead of the pw?
git_user: ${GITHUB_USER}
git_pass: ${GITHUB_PAT}
the agent is for replicating what you run locally somewhere else, i.e. on a remote GPU machine
It doesn't help that the stacktrace isn't very verbose
Error: Can not start new instance, Could not connect to the endpoint URL: "
"
Spin up instance using AWS auto-scaler and use the init script to:
- Get key-value pairs from AWS ssm and write to .env file
- Clone private git repo
- Build docker-image locally and use .env file during docker-compose
- Enter container and spin up clearml-agent
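The first step above (pull key-value pairs from SSM and turn them into a .env file) could also be done in Python rather than shell. This is only a rough sketch: the function names are hypothetical, it assumes the parameter value is plain `KEY=VALUE` text with one pair per line, and the client is expected to be a boto3 SSM client passed in by the caller.

```python
def fetch_env_text(ssm_client, name):
    """Return the decrypted value of one SSM parameter.

    `ssm_client` is assumed to behave like boto3.client("ssm", ...).
    """
    response = ssm_client.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]


def parse_env_text(text):
    """Parse KEY=VALUE lines into a dict, skipping blanks and # comments."""
    pairs = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, _, value = line.partition("=")
        pairs[key.strip()] = value.strip()
    return pairs


def write_env_file(pairs, path=".env"):
    """Write the pairs back out in .env format, one KEY=VALUE per line."""
    with open(path, "w") as handle:
        for key, value in pairs.items():
            handle.write(f"{key}={value}\n")
```

On the instance this would presumably be called with something like `boto3.client("ssm", region_name="eu-west-2")` and the parameter name `'my-param'` from the script above; both values are taken from this thread, not verified.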
Thanks JitteryCoyote63, I'll double check the permissions of the key/secrets and if no luck I'll check with the team
When I run in the UI I get the following response:
Error: Can not start new instance, An error occurred (InvalidParameterValue) when calling the RunInstances operation: Invalid availability zone: [eu-west-2]
When I run programmatically it just stalls and I don't get any readout
nope you'll just need to install clearml
I was having an issue with availability zone. I was using 'eu-west-2' instead of 'eu-west-2c'
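A quick check like the one below might catch this mix-up before the auto-scaler is launched. It is a heuristic sketch, not anything from ClearML or boto3: the function name is made up, and it only recognises standard zone names (a region followed by a single letter), so Local Zones and Wavelength Zones would come back as "unknown".

```python
import re

# A region looks like "eu-west-2"; a standard availability zone adds one letter.
_REGION_RE = re.compile(r"^[a-z]{2}(?:-[a-z]+)+-\d+$")
_ZONE_RE = re.compile(r"^[a-z]{2}(?:-[a-z]+)+-\d+[a-z]$")


def classify_location(value):
    """Return 'availability_zone', 'region', or 'unknown' for an AWS location string."""
    if _ZONE_RE.match(value):
        return "availability_zone"
    if _REGION_RE.match(value):
        return "region"
    return "unknown"


def require_availability_zone(value):
    """Raise a readable error if a region was passed where a zone is expected."""
    kind = classify_location(value)
    if kind != "availability_zone":
        raise ValueError(
            f"{value!r} looks like a {kind}, expected an availability zone such as 'eu-west-2c'"
        )
    return value
```

With this, `require_availability_zone('eu-west-2')` raises straight away instead of failing later inside RunInstances.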
Hi SuccessfulKoala55 who's the best person on the team to speak with?
so I don't think it's an access issue
`
2021-10-19 14:19:07
Spinning new instance type=aws4gpu
Error: Can not start new instance, An error occurred (InvalidParameterValue) when calling the RunInstances operation: Invalid availability zone: [eu-west-2]
Spinning new instance type=aws4gpu
ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
Error: Can not start new instance, An error occurred (InvalidParameterValue) when calling the RunInstances operation: Invalid availability zone: [eu-west-2]
S...
For ClearML UI
2021-10-19 14:24:13
ClearML results page:
Spinning new instance type=aws4gpu
ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
2021-10-19 14:24:18
Error: Can not start new instance, Could not connect to the endpoint URL: "
"
Spinning new instance type=aws4gpu
2021-10-19 14:24:28
Error: Can not start new instance, Could not connect to the endpoint URL: "
` "
Spinning new instance type=aws4gpu
2021-10-19 14:24:38
Error: Can no...
Not sure if it's a power outage: services in London are working and Cambridge services are down 🤔 I'll keep you updated
Trying to retrieve logs now 🙂 Yes, I mean the machines are not accessible. Trying to figure out what's going on
Looks to be working 🚀 just need to test one more thing. Thank you CostlyOstrich36
Hey having a few issues with this
great thank you it's working. Just wanted to check before adding all env vars 🙂
okay so this could be a python script that generates the clearml.conf in the working dir in the container?
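Something along these lines might do it. This is only a sketch: the `git_user` / `git_pass` keys and the `GITHUB_USER` / `GITHUB_PAT` variable names come from this thread, while putting them under an `agent` section and the function names are assumptions on my part, so check them against your own clearml.conf.

```python
import os
from pathlib import Path


def render_agent_git_section(env):
    """Build a clearml.conf fragment with git credentials taken from `env`."""
    user = env["GITHUB_USER"]
    token = env["GITHUB_PAT"]
    return (
        "agent {\n"
        f'    git_user: "{user}"\n'
        f'    git_pass: "{token}"\n'
        "}\n"
    )


def write_clearml_conf(path="clearml.conf", env=None):
    """Write the fragment to `path` (defaults to the current working dir)."""
    env = os.environ if env is None else env
    target = Path(path)
    target.write_text(render_agent_git_section(env))
    return target
```

Calling `write_clearml_conf()` inside the container after the .env file has been sourced would write the file into the working dir. A real clearml.conf would presumably need more than this one section.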
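For reference:
import subprocess

for i in ['1', '2']:
    # Pass the argument list directly: with shell=True on POSIX only 'python' would run and the other arguments would be dropped
    command = ['python', 'hyp_op.py', '--testnum', i]
    process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)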
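If the runs should happen one after another and the output needs to be read back, a variant using `subprocess.run` might be easier to work with. This is a sketch: `run_sequentially` is a made-up helper, and the demo uses `sys.executable -c ...` as a stand-in for `hyp_op.py`, which isn't shown in this thread.

```python
import subprocess
import sys


def run_sequentially(commands):
    """Run each command to completion and return (returncode, combined output) pairs."""
    results = []
    for command in commands:
        completed = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout
            text=True,                 # decode bytes to str
        )
        results.append((completed.returncode, completed.stdout))
    return results


# Stand-in for ['python', 'hyp_op.py', '--testnum', i]
demo_commands = [
    [sys.executable, "-c", "import sys; print('testnum', sys.argv[1])", i]
    for i in ["1", "2"]
]
demo_results = run_sequentially(demo_commands)
```

Unlike `Popen`, `subprocess.run` blocks until each process exits, so the second run only starts once the first has finished.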
Yep just about to do that. Just annoying to add arg parser etc