Going for something like this:
` >>> queue = QueueMetrics(queue='queueid')
>>> queue.avg_waiting_times `
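A minimal sketch of how that could be approximated today with the APIClient, assuming queues.get_by_id exposes the queue's pending entries with their added timestamps (the response shape and datetime handling here are my assumptions, not a confirmed SDK surface):
`
from datetime import datetime

from clearml.backend_api.session.client import APIClient

client = APIClient()
# 'queueid' is a placeholder; the .queue/.entries response shape is an assumption
queue = client.queues.get_by_id(queue='queueid').queue
now = datetime.utcnow()
# each pending entry should carry an 'added' timestamp (assumed naive UTC datetime)
waits = [(now - entry.added).total_seconds() for entry in queue.entries]
avg_wait = sum(waits) / len(waits) if waits else 0.0
print(f'average wait: {avg_wait:.1f}s over {len(waits)} queued tasks')
`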
I'd like to get the Run Time via the task object... I think I need to calculate it manually, i.e.
task = clearml.Task.get_task(task_id)
run_time = task.data.last_update - task.data.started
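As a self-contained sketch (the task ID is a placeholder; last_update and started are datetimes on the backend task object, so the difference is a timedelta approximating wall-clock run time):
`
from clearml import Task

# placeholder ID; use a real task ID from the UI
task = Task.get_task(task_id='abc123')
# last_update - started approximates wall-clock run time as a timedelta
run_time = task.data.last_update - task.data.started
print(f'run time: {run_time}')
`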
` python upload_data_to_clearml_copy.py
Generating SHA2 hash for 1 files
100%|████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 733.91it/s]
Hash generation completed
0%| | 0/1 [00:00<?, ?it/s]
Compressing local files, chunk 1 [remaining 1 files]
100%|████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 538.77it/s]
File compression completed: t...
Same with the new version (deepmirror):
` ryan@ryan:~/GitHub/deepmirror/ml-toolbox$ python -c "import clearml; print(clearml.__version__)"
1.6.1
Generating SHA2 hash for 1 files
100%|███████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2548.18it/s]
Hash generation completed
Uploading dataset changes (1 files compressed to 130 B) to BUCKET
File compression and upload completed: total size 130 B, 1 chunked stored (average size 130 B) `
Hi SuccessfulKoala55 yes I can see the one upload using 1.6.1, but all the old datasets have now been removed. I guess you want people to start moving over?
`
import os
import glob
from clearml import Dataset
DATASET_NAME = "Bug"
DATASET_PROJECT = "ProjectFolder"
TARGET_FOLDER = "clearml_bug"
S3_BUCKET = os.getenv('S3_BUCKET')
if not os.path.exists(TARGET_FOLDER):
    os.makedirs(TARGET_FOLDER)
with open(f'{TARGET_FOLDER}/data.txt', 'w') as f:
    f.write('Hello, ClearML')
target_files = glob.glob(TARGET_FOLDER + "/**/*", recursive=True)
# upload dataset
dataset = Dataset.create(dataset_name=DATASET_NAME, dataset_project=DATASET_PR...
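For reference, a sketch of how such a script typically continues (this is the standard Dataset flow, not the author's truncated code; the output_url is whatever bucket S3_BUCKET points at):
`
dataset = Dataset.create(dataset_name=DATASET_NAME, dataset_project=DATASET_PROJECT)
dataset.add_files(path=TARGET_FOLDER)
# upload the compressed chunks to the configured bucket, then close the version
dataset.upload(output_url=S3_BUCKET)
dataset.finalize()
`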
The latest commit to the repo pins 22.02-py3
( https://github.com/allegroai/clearml-serving/blob/d15bfcade54c7bdd8f3765408adc480d5ceb4b45/clearml_serving/engines/triton/Dockerfile#L2 ) I will have a look at versions now 🙂
Okay just for clarity...
Originally, my NVIDIA drivers were on a version incompatible with the Triton server:
` This container was built for NVIDIA Driver Release 510.39 or later, but version 470.103.01 was detected and compatibility mode is UNAVAILABLE. `
To fix this I updated the drivers on my base OS, i.e.
` sudo apt install nvidia-driver-510 -y
sudo reboot `
Then it worked. The docker-compose logs from the clearml-serving-triton container did not make this clear (i.e. by r...
Yes, already tried that, but it seems there's some form of mismatch with a C/C++ lib.
I'm using the "allegroai/clearml-serving-triton:latest" container; I was just debugging using the base image.
` client.queues.get_default()
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/opt/conda/lib/python3.9/site-packages/clearml/backend_api/session/client/client.py", line 378, in new_func
return Response(self.session.send(request_cls(*args, **kwargs)))
File "/opt/conda/lib/python3.9/site-packages/clearml/backend_api/session/client/client.py", line 122, in send
raise APIError(result)
clearml.backend_api.session.client.client.APIError: APIError: code 4...
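As a connectivity sanity check, a minimal sketch that lists queues via get_all from the same APIClient (whether get_default is available can differ by server version, so treat that as an assumption):
`
from clearml.backend_api.session.client import APIClient

client = APIClient()
# list all queues as a quick check that the client/server API calls work
for q in client.queues.get_all():
    print(q.id, q.name)
`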
I can raise this as an issue on the repo if that would be useful?
Hi SuccessfulKoala55 I gave up after 20 mins and also got a notification from Firefox: "This page is slowing down Firefox. To speed up your browser, stop this page." I'm heading out soon, so I could leave it on. Also, I had the same behaviour in Chrome.
Okay great thanks SuccessfulKoala55
Great, thank you, it's working. Just wanted to check before adding all the env vars 🙂
Yep, figured this out yesterday. I had been tagging G-type instances with an alarm as a fail-safe in case the AWS autoscaler failed. The alarm only stopped the instance and didn't terminate it (termination is what deletes the drive). Thanks anyway CostlyOstrich36 and TimelyPenguin76 🙂
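For anyone hitting the same thing, a hedged sketch of a terminate-on-idle CloudWatch alarm in boto3 (the alarm name, thresholds, and instance ID are illustrative assumptions; terminate only deletes volumes that have DeleteOnTermination set):
`
import boto3

cloudwatch = boto3.client('cloudwatch', region_name='eu-west-2')
cloudwatch.put_metric_alarm(
    AlarmName='autoscaler-failsafe-terminate',  # illustrative name
    Namespace='AWS/EC2',
    MetricName='CPUUtilization',
    Dimensions=[{'Name': 'InstanceId', 'Value': 'i-0123456789abcdef0'}],  # hypothetical ID
    Statistic='Average',
    Period=300,
    EvaluationPeriods=12,  # ~1 hour of sustained idleness
    Threshold=5.0,
    ComparisonOperator='LessThanThreshold',
    # 'terminate' (rather than 'stop') releases the instance and its DeleteOnTermination volumes
    AlarmActions=['arn:aws:automate:eu-west-2:ec2:terminate'],
)
`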
For the ClearML UI:
` 2021-10-19 14:24:13 ClearML results page:
Spinning new instance type=aws4gpu
ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
2021-10-19 14:24:18 Error: Can not start new instance, Could not connect to the endpoint URL: " "
Spinning new instance type=aws4gpu
2021-10-19 14:24:28 Error: Can not start new instance, Could not connect to the endpoint URL: " "
Spinning new instance type=aws4gpu
2021-10-19 14:24:38 Error: Can no...
Error: Can not start new instance, Could not connect to the endpoint URL: " " `
Sure, I'll check this out later in the week and get back to you
I made 2x in eu-west-2 on the AWS console but still no luck.