Okay, I'm going to look into this further. We had around 70 volumes that were not deleted but could have been due to something else.
` # dataset_class.py
from PIL import Image
from torch.utils.data import Dataset as BaseDataset


class Dataset(BaseDataset):
    def __init__(
        self,
        images_fps,
        masks_fps,
        augmentation=None,
    ):
        self.augmentation = augmentation
        self.images_fps = images_fps
        self.masks_fps = masks_fps
        self.ids = len(images_fps)

    def __getitem__(self, i):
        # read data
        img = Image.open(self.images_fps[i])
        mask = Image...
For ClearML UI:
2021-10-19 14:24:13
ClearML results page:
Spinning new instance type=aws4gpu
ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
2021-10-19 14:24:18
Error: Can not start new instance, Could not connect to the endpoint URL: " "
Spinning new instance type=aws4gpu
2021-10-19 14:24:28
Error: Can not start new instance, Could not connect to the endpoint URL: " "
Spinning new instance type=aws4gpu
2021-10-19 14:24:38
Error: Can no...
` client.queues.get_default()
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/conda/lib/python3.9/site-packages/clearml/backend_api/session/client/client.py", line 378, in new_func
    return Response(self.session.send(request_cls(*args, **kwargs)))
  File "/opt/conda/lib/python3.9/site-packages/clearml/backend_api/session/client/client.py", line 122, in send
    raise APIError(result)
clearml.backend_api.session.client.client.APIError: APIError: code 4...
Can you try to go into 'Settings' -> 'Configuration' and verify that you have 'Show Hidden Projects' enabled?
Yes, it's the dependencies. At the moment I'm doing this as a workaround.
` autoscaler = AwsAutoScaler(hyper_params, configurations)
startup_bash_script = [
'...',
]
autoscaler.startup_bash_script = startup_bash_script `
I'd prefer to run it on the Web UI. Also, we seem to have problems when it's executed remotely.
I can raise this as an issue on the repo if that is useful?
I'm sure it used to be in task.artifacts but that's returning an empty dict
prev_task.artifacts {}
nope you'll just need to install clearml
so I don't think it's an access issue
Going for something like this:
` >>> queue = QueueMetrics(queue='queueid')
>>> queue.avg_waiting_times `
I'd like to get the run time via the task object... I think I need to calculate it manually,
i.e.
task = clearml.Task.get_task(id)
time = task.data.last_update - task.data.started
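A minimal sketch of that manual calculation, assuming both fields come back as `datetime` objects and that either one may be missing (for example on a task that never started). The helper name `run_time` is hypothetical; only `task.data.started` and `task.data.last_update` come from the message above. Note that `last_update` may not be the exact moment execution finished, so treat the result as an approximation.

```python
from datetime import datetime, timedelta
from typing import Optional


def run_time(started: Optional[datetime], last_update: Optional[datetime]) -> Optional[timedelta]:
    """Return last_update - started, or None if either timestamp is missing."""
    if started is None or last_update is None:
        return None
    return last_update - started


# Hypothetical usage with the fields mentioned above
# (requires clearml and a valid task id):
#   task = clearml.Task.get_task(task_id)
#   elapsed = run_time(task.data.started, task.data.last_update)

# Stand-in timestamps so the sketch runs without a ClearML server.
elapsed = run_time(
    datetime(2021, 10, 19, 14, 0, 0),
    datetime(2021, 10, 19, 14, 24, 13),
)
print(elapsed)  # 0:24:13
```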
Hi SuccessfulKoala55, I gave up after 20 mins and also got a notification from Firefox: "This page is slowing down Firefox. To speed up your browser, stop this page". I'm heading out soon so I could leave it on. Also, I had the same behaviour in Chrome.
Yep just about to do that. Just annoying to add arg parser etc
Not sure if it's a power outage: services in London are working and Cambridge services are down 🤔 I'll keep you updated
Sure, I'll check this out later in the week and get back to you
I made 2x in eu-west-2 on the AWS console but still no luck
Looks like it's picking up the projects, but when viewing them in the UI they disappear
Same with new version
(deepmirror) ryan@ryan:~/GitHub/deepmirror/ml-toolbox$ python -c "import clearml; print(clearml.__version__)"
1.6.1
Generating SHA2 hash for 1 files
100%|███████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2548.18it/s]
Hash generation completed
Uploading dataset changes (1 files compressed to 130 B) to BUCKET
File compression and upload completed: total size 130 B, 1 chunked stored (average size 130 B)
Spin up an instance using the AWS auto-scaler and use the init script to:
Get key-value pairs from AWS SSM and write them to a .env file
Clone the private git repo
Build the docker image locally and use the .env file during docker-compose
Enter the container and spin up clearml-agent
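The first step of that init script could look roughly like the sketch below. It assumes the parameters live under a single path prefix in SSM Parameter Store and that `boto3` and AWS credentials are available on the instance; the function names and the prefix layout are hypothetical. Values are written as plain `KEY=VALUE` lines with no quoting or escaping, which may not suit values containing spaces or newlines.

```python
from pathlib import Path
from typing import Dict


def fetch_ssm_parameters(path_prefix: str) -> Dict[str, str]:
    """Read all parameters under path_prefix from AWS SSM Parameter Store.

    Needs boto3 and AWS credentials; path_prefix is a hypothetical layout.
    """
    import boto3  # imported lazily so the rest of the sketch runs without it

    ssm = boto3.client("ssm")
    paginator = ssm.get_paginator("get_parameters_by_path")
    params: Dict[str, str] = {}
    for page in paginator.paginate(Path=path_prefix, Recursive=True, WithDecryption=True):
        for item in page["Parameters"]:
            # Use the last path segment as the variable name,
            # e.g. /my-app/API_KEY -> API_KEY
            params[item["Name"].rsplit("/", 1)[-1]] = item["Value"]
    return params


def write_env_file(params: Dict[str, str], env_path: str = ".env") -> str:
    """Write KEY=VALUE lines, sorted by key, and return the file contents."""
    content = "".join(f"{key}={value}\n" for key, value in sorted(params.items()))
    Path(env_path).write_text(content)
    return content
```

On the instance this would be called as `write_env_file(fetch_ssm_parameters("/my-app/"))`, where `/my-app/` is a placeholder prefix, before running docker-compose.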
Okay just for clarity...
Originally, my Nvidia drivers were running on an incompatible version for the triton server:
This container was built for NVIDIA Driver Release 510.39 or later, but version 470.103.01 was detected and compatibility mode is UNAVAILABLE.
To fix this issue I updated the drivers on my base OS, i.e.
sudo apt install nvidia-driver-510 -y
sudo reboot
Then it worked. The docker-compose logs from clearml-serving-triton container did not make this clear (i.e. by r...
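A mismatch like the one above could be caught before starting the containers with a small pre-flight check. This is only a sketch: the helper names are hypothetical, the required version (510.39) is taken from the error message quoted above, and `installed_driver_version` assumes `nvidia-smi` is on the PATH of the host.

```python
import subprocess
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '470.103.01' into (470, 103, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_is_compatible(installed: str, required: str) -> bool:
    """True if the installed driver version is at least the required one."""
    return parse_version(installed) >= parse_version(required)


def installed_driver_version() -> str:
    """Ask nvidia-smi for the driver version of the first GPU."""
    output = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        text=True,
    )
    return output.splitlines()[0].strip()


# The two versions from the error message quoted above.
print(driver_is_compatible("470.103.01", "510.39"))  # False
print(driver_is_compatible("510.47.03", "510.39"))   # True
```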
Okay great thanks SuccessfulKoala55
It might only be a req for the docker/docker-compose-triton-gpu.yml file but I'd need to check
the agent is for replicating what you run locally elsewhere, i.e. on a remote GPU machine
When I run in the UI I get the following response:
Error: Can not start new instance, An error occurred (InvalidParameterValue) when calling the RunInstances operation: Invalid availability zone: [eu-west-2]
When I run programmatically it just stalls and I don't get any readout
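The error above is consistent with a region name (eu-west-2) being passed where an availability zone is expected; standard AWS zone names are the region followed by a letter, for example eu-west-2a. A rough check along these lines could flag that before the autoscaler is launched. The patterns are a simplification and the helper names are hypothetical; Local Zones and Wavelength Zones use longer names and are not covered.

```python
import re

# Simplified patterns for standard AWS regions and availability zones.
REGION_RE = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d+$")
ZONE_RE = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d+[a-z]$")


def is_region(name: str) -> bool:
    """True for names shaped like 'eu-west-2'."""
    return bool(REGION_RE.match(name))


def is_availability_zone(name: str) -> bool:
    """True for names shaped like 'eu-west-2a'."""
    return bool(ZONE_RE.match(name))


for candidate in ("eu-west-2", "eu-west-2a"):
    kind = "zone" if is_availability_zone(candidate) else "not a zone"
    print(candidate, "->", kind)
```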