I make 2x in eu-west-2 on the AWS console but still no luck
`
import os
import glob
from clearml import Dataset
DATASET_NAME = "Bug"
DATASET_PROJECT = "ProjectFolder"
TARGET_FOLDER = "clearml_bug"
S3_BUCKET = os.getenv('S3_BUCKET')
if not os.path.exists(TARGET_FOLDER):
    os.makedirs(TARGET_FOLDER)
with open(f'{TARGET_FOLDER}/data.txt', 'w') as f:
    f.writelines('Hello, ClearML')
target_files = glob.glob(TARGET_FOLDER + "/**/*", recursive=True)
# upload dataset
dataset = Dataset.create(dataset_name=DATASET_NAME, dataset_project=DATASET_PR...
Hi SuccessfulKoala55, who's the best person on the team to speak with?
(deepmirror) ryan@ryan:~$ python -c "import clearml; print(clearml.__version__)"
1.1.4
Trying to retrieve logs now 🙂 Yes, I mean the machines are not accessible. Trying to figure out what's going on
Not sure if it's a power outage: services in London are working and Cambridge services are down 🤔 I'll keep you updated
Looks like it's picking up the projects, but when I view them in the UI they disappear
I've got it... I just remembered I can call `task_id` from the cloned task and check the status of that 🙂
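For reference, a minimal sketch of that status check as a polling loop. The `wait_for_status` helper and the set of "finished" status names are my own assumptions, not part of the clearml SDK; the SDK wiring in the trailing comment is a hypothetical usage and needs a configured server.

```python
import time

# Status names assumed to mean "the task will not change any more";
# check them against what your ClearML server actually reports.
TERMINAL_STATUSES = {"completed", "failed", "stopped", "closed", "published"}


def wait_for_status(get_status, poll_seconds=5.0, timeout_seconds=600.0, sleep=time.sleep):
    """Poll get_status() until it returns a terminal status or the timeout passes."""
    waited = 0.0
    while True:
        status = str(get_status())
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"task still '{status}' after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Hypothetical wiring with the clearml SDK:
#   from clearml import Task
#   cloned = Task.clone(source_task="<template task id>")
#   Task.enqueue(cloned, queue_name="default")
#   wait_for_status(lambda: Task.get_task(task_id=cloned.id).get_status())
```

Passing the status lookup in as a callable keeps the loop testable without a server.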
echo -e $(aws ssm --region=eu-west-2 get-parameter --name 'my-param' --with-decryption --query "Parameter.Value") | tr -d '"' > .env
set -a
source .env
set +a
git clone https://${PAT}@github.com/myrepo/toolbox.git
mv .env toolbox/
cd toolbox/
docker-compose up -d --build
docker exec -it $(docker-compose ps -q) clearml-agent daemon --detached --gpus 0 --queue default
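If it helps anyone, here is a rough Python sketch of what the first step of that script does with the parameter value. It assumes the CLI prints the value as a JSON-quoted string with `\n` escapes between `KEY=VALUE` lines. `parse_env_parameter` is a hypothetical helper name, and the sample keys in the comment are made up. Unlike `tr -d '"'`, it only strips the outer quotes.

```python
import json


def parse_env_parameter(raw):
    """Turn a JSON-quoted, newline-escaped parameter value into a dict of env vars."""
    text = json.loads(raw)  # removes the outer quotes and expands \n escapes
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an '=' sign
            env[key.strip()] = value.strip()
    return env


# Example with made-up keys:
#   parse_env_parameter('"PAT=abc123\\nQUEUE=default"')
```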
It doesn't help that the stacktrace isn't very verbose
Can you try to go into 'Settings' -> 'Configuration' and verify that you have 'Show Hidden Projects' enabled?
Same with the new version:
(deepmirror) ryan@ryan:~/GitHub/deepmirror/ml-toolbox$ python -c "import clearml; print(clearml.__version__)"
1.6.1
Generating SHA2 hash for 1 files
100%|███████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2548.18it/s]
Hash generation completed
Uploading dataset changes (1 files compressed to 130 B) to BUCKET
File compression and upload completed: total size 130 B, 1 chunked stored (average size 130 B)
Okay, I'm going to look into this further. We had around 70 volumes that were not deleted, but that could have been due to something else.
Sure, I'll check this out later in the week and get back to you
Okay great thanks SuccessfulKoala55
When I run it in the UI I get the following response:
Error: Can not start new instance, An error occurred (InvalidParameterValue) when calling the RunInstances operation: Invalid availability zone: [eu-west-2]
When I run it programmatically it just stalls and I don't get any readout
Just for reference, if anyone has this issue: I had to update my CUDA drivers to 510 on the system OS
` docker run --gpus=0 -it nvcr.io/nvidia/tritonserver:22.02-py3
=============================
== Triton Inference Server ==
NVIDIA Release 22.02 (build 32400308)
Copyright (c) 2018-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
This container image and its contents are gove...
Still debugging... That fixed the issue with the `nvcr.io/nvidia/tritonserver:22.02-py3` container, which now returns
` =============================
== Triton Inference Server ==
NVIDIA Release 22.02 (build 32400308)
Copyright (c) 2018-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Co...
$ curl -X 'POST' '
' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{ "url": "
" }'
{"digit":5}
I'm using the "allegroai/clearml-serving-triton:latest" container; I was just debugging using the base image
I'll add a more detailed response once it's working
This was the error I was getting from uploads using the old SDK:
has been rejected for invalid domain. heap-2443312637.js:2:108655
Referrer Policy: Ignoring the less restricted referrer policy "no-referrer-when-downgrade" for the cross-site request:
The latest commit to the repo is 22.02-py3 ( https://github.com/allegroai/clearml-serving/blob/d15bfcade54c7bdd8f3765408adc480d5ceb4b45/clearml_serving/engines/triton/Dockerfile#L2 ). I will have a look at versions now 🙂
Yes already tried that but it seems there's some form of mismatch with a C/C++ lib.
I was having an issue with the availability zone. I was using the region name 'eu-west-2' instead of the zone name 'eu-west-2c'
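In case it saves someone else the same mistake, here is a small sketch that catches a region name passed where a zone is expected. It assumes the usual naming convention for standard zones (region name plus one letter) and does not cover Local Zones or Wavelength Zones. `split_availability_zone` is a hypothetical helper of mine, not part of boto3 or clearml.

```python
import re

# Assumed convention: region "eu-west-2" -> zone "eu-west-2c".
_REGION_RE = re.compile(r"^[a-z]{2}(?:-[a-z]+)+-\d$")
_ZONE_RE = re.compile(r"^([a-z]{2}(?:-[a-z]+)+-\d)([a-z])$")


def split_availability_zone(name):
    """Return (region, zone_letter), or raise ValueError with a hint."""
    match = _ZONE_RE.match(name)
    if match:
        return match.group(1), match.group(2)
    if _REGION_RE.match(name):
        raise ValueError(
            f"'{name}' looks like a region, not an availability zone; "
            "append a zone letter that exists in your account"
        )
    raise ValueError(f"'{name}' does not look like a standard availability zone")
```

Running this before launching the autoscaler would have flagged 'eu-west-2' straight away instead of waiting for the RunInstances error.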
Yes, it's the dependencies. At the moment I'm doing this as a workaround.
` autoscaler = AwsAutoScaler(hyper_params, configurations)
startup_bash_script = [
    '...',
]
autoscaler.startup_bash_script = startup_bash_script `
I'd prefer to run it on the Web UI. Also, we seem to have problems when it's executed remotely