Still debugging.... That fixed the issue with the
nvcr.io/nvidia/tritonserver:22.02-py3
container which now returns
=============================
== Triton Inference Server ==
NVIDIA Release 22.02 (build 32400308)
Copyright (c) 2018-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Co...
Yes, on the apps page. Is it possible to trigger programmatically?
remote execution is working now. Internal worker nodes had not spun up the agent correctly 😛
Sure, I'll check this out later in the week and get back to you
Okay, I'm going to look into this further. We had around 70 volumes that were not deleted but could have been due to something else.
great thank you it's working. Just wanted to check before adding all env vars 🙂
Hi AgitatedDove14 ,
I noticed that ClearML parses clearml.automation.UniformParameterRange
into a configuration space to be used with BOHB. When I've used BOHB previously I could use UniformFloatHyperparameter
from the ConfigSpace package, which allows me to set a parameter in logspace. That is, the range is defined by something like numpy.logspace
rather than numpy.linspace
lmk if I can expand on this more 🙂
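A minimal sketch of the log-scale vs. linear-scale contrast being described, assuming ConfigSpace's UniformFloatHyperparameter and ClearML's UniformParameterRange; the parameter names and bounds are illustrative, not taken from the thread:
```python
# Sketch only: names and bounds are placeholders.
from ConfigSpace import ConfigurationSpace
from ConfigSpace.hyperparameters import UniformFloatHyperparameter

from clearml.automation import UniformParameterRange

# ConfigSpace (what BOHB consumes) can sample a float on a log scale:
cs = ConfigurationSpace()
cs.add_hyperparameter(
    UniformFloatHyperparameter("learning_rate", lower=1e-5, upper=1e-1, log=True)
)

# ClearML's UniformParameterRange samples the same interval on a linear scale,
# i.e. more like numpy.linspace than numpy.logspace:
lr_range = UniformParameterRange(
    "General/learning_rate", min_value=1e-5, max_value=1e-1
)
```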
The Dockerfile at the latest commit to the repo uses 22.02-py3
( https://github.com/allegroai/clearml-serving/blob/d15bfcade54c7bdd8f3765408adc480d5ceb4b45/clearml_serving/engines/triton/Dockerfile#L2 ) I will have a look at versions now 🙂
Thanks JitteryCoyote63 , I'll double check the permissions of key/secrets and if no luck I'll check with the team
From SuccessfulKoala55's suggestion
Yes already tried that but it seems there's some form of mismatch with a C/C++ lib.
Hi yes all sorted ! 🙂
g4dn.xlarge (the best price for 16 GB of GPU RAM). Not so surprising they would want a switch
I make 2x in eu-west-2 on the AWS console but still no luck
It doesn't help that the stacktrace isn't very verbose
so I don't think it's an access issue
echo -e $(aws ssm --region=eu-west-2 get-parameter --name 'my-param' --with-decryption --query "Parameter.Value") | tr -d '"' > .env
set -a
source .env
set +a
git clone https://${PAT}@github.com/myrepo/toolbox.git
mv .env toolbox/
cd toolbox/
docker-compose up -d --build
docker exec -it $(docker-compose ps -q) clearml-agent daemon --detached --gpus 0 --queue default
Error: Can not start new instance, Could not connect to the endpoint URL: ""
(deepmirror) ryan@ryan:~$ python -c "import clearml; print(clearml.__version__)"
1.1.4
2021-10-19 14:19:07
Spinning new instance type=aws4gpu
Error: Can not start new instance, An error occurred (InvalidParameterValue) when calling the RunInstances operation: Invalid availability zone: [eu-west-2]
Spinning new instance type=aws4gpu
ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
Error: Can not start new instance, An error occurred (InvalidParameterValue) when calling the RunInstances operation: Invalid availability zone: [eu-west-2]
S...
For ClearML UI
2021-10-19 14:24:13
ClearML results page:
Spinning new instance type=aws4gpu
ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
2021-10-19 14:24:18
Error: Can not start new instance, Could not connect to the endpoint URL: ""
Spinning new instance type=aws4gpu
2021-10-19 14:24:28
Error: Can not start new instance, Could not connect to the endpoint URL: ""
Spinning new instance type=aws4gpu
2021-10-19 14:24:38
Error: Can no...
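The recurring "Invalid availability zone: [eu-west-2]" error usually means a region name was supplied where EC2 expects an availability zone such as eu-west-2a. A minimal boto3 sketch of that distinction; the AMI ID and instance type are placeholders, and the autoscaler's own config keys may be named differently:
```python
# Illustrative only: shows where the region vs. the availability zone go in an
# EC2 RunInstances call; values are placeholders, not taken from the thread.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-2")  # region, e.g. eu-west-2

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",                 # placeholder AMI
    InstanceType="g4dn.xlarge",
    MinCount=1,
    MaxCount=1,
    Placement={"AvailabilityZone": "eu-west-2a"},    # AZ needs the letter suffix
)
```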