It doesn't help that the stacktrace isn't very verbose
`
2021-10-19 14:19:07
Spinning new instance type=aws4gpu
Error: Can not start new instance, An error occurred (InvalidParameterValue) when calling the RunInstances operation: Invalid availability zone: [eu-west-2]
Spinning new instance type=aws4gpu
ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
Error: Can not start new instance, An error occurred (InvalidParameterValue) when calling the RunInstances operation: Invalid availability zone: [eu-west-2]
S...
I was having an issue with availability zone. I was using 'eu-west-2' instead of 'eu-west-2c'
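For anyone hitting the same `Invalid availability zone` error: a quick sanity check like the sketch below would have caught it before the autoscaler tried to spin anything up. It only assumes the usual AWS naming convention (a region such as `eu-west-2`, an availability zone being the region plus a trailing letter such as `eu-west-2c`). The function name `classify_zone` and the regexes are my own illustration, not part of ClearML or boto3, and the pattern is a heuristic that does not cover special cases like local zones.

```python
import re

# Heuristic patterns based on the common AWS naming convention (an assumption,
# not an official validator): regions look like "eu-west-2", availability
# zones append a single letter, e.g. "eu-west-2c".
_REGION_PATTERN = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d+$")
_AZ_PATTERN = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d+[a-z]$")


def classify_zone(value: str) -> str:
    """Return 'availability_zone', 'region', or 'unknown' for the given string."""
    if _AZ_PATTERN.match(value):
        return "availability_zone"
    if _REGION_PATTERN.match(value):
        return "region"
    return "unknown"


def check_availability_zone(value: str) -> None:
    """Raise a readable error if a region was passed where a zone is expected."""
    kind = classify_zone(value)
    if kind == "region":
        raise ValueError(
            f"{value!r} looks like a region; an availability zone needs a "
            f"letter suffix, e.g. {value + 'c'!r}"
        )
    if kind == "unknown":
        raise ValueError(f"{value!r} does not look like an availability zone")
```

Calling `check_availability_zone("eu-west-2")` raises with a hint, while `check_availability_zone("eu-west-2c")` passes silently.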
echo -e $(aws ssm --region=eu-west-2 get-parameter --name 'my-param' --with-decryption --query "Parameter.Value") | tr -d '"' > .env
set -a
source .env
set +a
git clone https://${PAT}@github.com/myrepo/toolbox.git
mv .env toolbox/
cd toolbox/
docker-compose up -d --build
docker exec -it $(docker-compose ps -q) clearml-agent daemon --detached --gpus 0 --queue default
Hi AgitatedDove14 ,
I noticed that ClearML parses clearml.automation.UniformParameterRange into a configuration space to be used with BOHB. When I've used BOHB previously, I could use UniformFloatHyperparameter from the configuration space package, which allows me to set a parameter in log space. That is, the range is defined by something like numpy.logspace rather than numpy.linspace
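To make the linear vs. log-space distinction concrete, here is a minimal stdlib-only sketch. The helpers `linspace`, `logspace_between` and `sample_log_uniform` are hypothetical names written for this illustration; they mimic the idea behind numpy.linspace and numpy.logspace but are not the numpy or ClearML APIs (note `logspace_between` takes the actual bounds, not exponents).

```python
import math
import random


def linspace(start: float, stop: float, num: int) -> list:
    """Evenly spaced points between start and stop (inclusive)."""
    if num == 1:
        return [float(start)]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def logspace_between(low: float, high: float, num: int) -> list:
    """Points evenly spaced in log10 between low and high (both > 0)."""
    exponents = linspace(math.log10(low), math.log10(high), num)
    return [10 ** e for e in exponents]


def sample_log_uniform(low: float, high: float, rng: random.Random) -> float:
    """Draw one value uniformly in log10 space, i.e. a log-uniform sample."""
    exponent = rng.uniform(math.log10(low), math.log10(high))
    return 10 ** exponent


# A learning rate between 1e-4 and 1e-1: the linear grid puts most points
# near the top of the range, the log grid covers each decade equally.
linear_grid = linspace(1e-4, 1e-1, 4)
log_grid = logspace_between(1e-4, 1e-1, 4)
```

With the linear grid, three of the four points are above 0.03; with the log grid there is one point per decade, which is usually what you want for something like a learning rate.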
great thank you it's working. Just wanted to check before adding all env vars 🙂
Still debugging.... That fixed the issue with the
nvcr.io/nvidia/tritonserver:22.02-py3 container which now returns
` =============================
== Triton Inference Server ==
NVIDIA Release 22.02 (build 32400308)
Copyright (c) 2018-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Co...
I've got it... I just remembered I can call task_id from the cloned task and check the status of that 🙂
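Roughly what I mean, as a sketch rather than anything official: poll the cloned task's status until it reaches a terminal state. The helper `wait_for_status` and the tuple of terminal states are my own assumptions for illustration; the ClearML calls only appear in the trailing comment so the block runs without a server.

```python
import time
from typing import Callable, Sequence

# Status strings assumed to mean "the task is not going to change any more".
TERMINAL_STATES = ("completed", "failed", "stopped", "closed", "published")


def wait_for_status(
    get_status: Callable[[], str],
    terminal: Sequence[str] = TERMINAL_STATES,
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status() repeatedly until it returns a terminal state."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still in state {status!r} after timeout")
        sleep(poll_seconds)


# Intended usage with ClearML (not executed here; the ids are placeholders):
#   from clearml import Task
#   cloned = Task.clone(source_task="<source task id>")
#   Task.enqueue(cloned, queue_name="default")
#   final = wait_for_status(
#       lambda: Task.get_task(task_id=cloned.id).get_status()
#   )
```

Passing the status getter in as a callable keeps the polling loop easy to test without touching a real task.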
Yes already tried that but it seems there's some form of mismatch with a C/C++ lib.
$ curl -X 'POST' ' ' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{ "url": " " }'
{"digit":5}
Hey having a few issues with this
(deepmirror) ryan@ryan:~$ python -c "import clearml; print(clearml.__version__)"
1.1.4
Hi SuccessfulKoala55 yes I can see the one uploaded using 1.6.1 but all old datasets have now been removed. I guess you want people to start moving over?
Thanks JitteryCoyote63 , I'll double check the permissions of key/secrets and if no luck I'll check with the team
we normally do something like that - not sure why it's freezing for you without more info
Hi SuccessfulKoala55 thanks I didn't know it was possible to use a PAT in place of the pw. So in the .conf I can just add the git PAT instead of pw?
git_user: ${GITHUB_USER}
git_pass: ${GITHUB_PAT}
Hi SuccessfulKoala55 who's the best person on the team to speak with?