
This is the Dockerfile I created, and it is not working:
FROM pytorch/pytorch:2.2.1-cuda12.1-cudnn8-runtime
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
    libgl1-mesa-glx \
    libglib2.0-0 \
    git \
    && rm -rf /var/lib/apt/lists/*
# Install dependencies
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt
# Update CA certificates
COPY hme_root_CA.crt /usr/local/share/ca-certificates/company_roo...
sdk.apply_environment = false
sdk.apply_files = false
Executing task id [6e73e0db9cb14aa8b91e0a5439a5aac0]:
repository =
branch =
version_num =
tag =
docker_cmd =
entry_point = train_clearml.py
working_dir = .
created virtual environment CPython3.10.13.final.0-64 in 151ms
creator CPython3Posix(dest=/root/.cl...
Hi @<1523701205467926528:profile|AgitatedDove14> thanks!
I talked with my boss, and I was able to install clearml-agent directly on the training machine.
But now, when I try to run an experiment using clearml-agent daemon --gpus 0 --queue default --foreground --docker
it stalls at this point:
Installing collected packages: attrs, rpds-py, zipp, importlib-resources, referencing, jsonschema-specifications, pkgutil-resolve-name, jsonschema, psutil, six, filelock, distlib, platformdirs, virtual...
and with system_site_packages: true,
No, I'm not @<1523701070390366208:profile|CostlyOstrich36>
> clearml-data sync --project yolo_test --name test1 --folder test1
clearml-data - Dataset Management & Versioning CLI
Creating a new dataset:
2024-08-27 11:58:26,131 - clearml.storage - ERROR - Exception encountered while uploading Failed uploading object /yolo_test/.datasets/test1/test1.9a02f7a3a5924fadbbe0c4b827deafe0/artifacts/state/state.json (405): <html>
<head><title>405 Not Allowed</title></head>
<body>
<center><h1>405 Not Allowed</h1></center>
<hr><center>nginx/1.24.0</center>
...
@<1523701070390366208:profile|CostlyOstrich36> It looks great, but I would like to know specifically whether it can read bounding boxes in YOLO format (that is, separate images and labels folders, with one .txt label file per image that has the same name as the image)
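To make the layout in question concrete, here is a minimal Python sketch of reading such labels. It does not show or confirm any ClearML feature. It assumes the common YOLO convention of five whitespace-separated fields per line (class id, then normalized x center, y center, width, height), which the message above does not spell out. The names YoloBox, parse_yolo_label_text and label_path_for are hypothetical helpers made up for this illustration.

```python
from pathlib import Path
from typing import List, NamedTuple


class YoloBox(NamedTuple):
    """One bounding box, assuming the usual normalized YOLO fields."""
    class_id: int
    x_center: float
    y_center: float
    width: float
    height: float


def parse_yolo_label_text(text: str) -> List[YoloBox]:
    """Parse the contents of one label .txt file into a list of boxes."""
    boxes = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 5:
            raise ValueError(f"line {line_no}: expected 5 fields, got {len(parts)}")
        boxes.append(YoloBox(int(parts[0]), *map(float, parts[1:])))
    return boxes


def label_path_for(image_path: Path, images_dir: Path, labels_dir: Path) -> Path:
    """Map an image to its label file: same relative name, .txt suffix."""
    return labels_dir / image_path.relative_to(images_dir).with_suffix(".txt")
```

For example, label_path_for(Path("data/images/img1.jpg"), Path("data/images"), Path("data/labels")) yields data/labels/img1.txt, and the text of that file can then be passed to parse_yolo_label_text.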
Hi @<1523701205467926528:profile|AgitatedDove14> in the end we got it working
It shows a lot of warnings, but it can run the experiments hahaha
Thanks for your help
Now it tells me 🫠
Storage helper problem for clearml/vehicle_detection/.datasets/round1/round1.452107f4e6784d77be9b9c4143255579/artifacts/state/state.json: Operation returned an invalid status 'The specified blob does not exist.'
ErrorCode:BlobNotFound
Could not get dataset with ID 452107f4e6784d77be9b9c4143255579: Could not load Dataset id=452107f4e6784d77be9b9c4143255579 state
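The state paths in these errors all appear to follow one pattern: a prefix, then .datasets, then the dataset name, then name.id, then artifacts/state/state.json. As a rough sketch for checking which dataset ID a failing path refers to, the Python below splits such a path into its parts. The pattern is inferred only from the two paths quoted in this thread and is not documented ClearML behavior. DatasetStatePath and parse_dataset_state_path are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class DatasetStatePath(NamedTuple):
    prefix: str      # everything before "/.datasets" (project and/or storage prefix)
    name: str        # dataset name
    dataset_id: str  # 32-character hexadecimal dataset ID


# Assumed layout: <prefix>/.datasets/<name>/<name>.<id>/artifacts/state/state.json
_STATE_PATH = re.compile(
    r"^(?P<prefix>.*)/\.datasets/(?P<name>[^/]+)/(?P=name)\."
    r"(?P<dataset_id>[0-9a-f]{32})/artifacts/state/state\.json$"
)


def parse_dataset_state_path(path: str) -> Optional[DatasetStatePath]:
    """Return the parts of a dataset state path, or None if it does not match."""
    match = _STATE_PATH.match(path)
    if match is None:
        return None
    return DatasetStatePath(
        prefix=match.group("prefix").strip("/"),
        name=match.group("name"),
        dataset_id=match.group("dataset_id"),
    )
```

Applied to the path in the error above, this gives the name round1 and the ID 452107f4e6784d77be9b9c4143255579, which matches the ID reported in the "Could not get dataset" line.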
I used that
I have permissions; I was able to delete some datasets
But some of them give me that error
I'm doing this:
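from clearml import Dataset

# entire_dataset=True targets all versions; delete_files=False leaves the stored files in place
Dataset.delete(
    dataset_project="hme_vehicle_detection",
    dataset_name="test1",
    force=True,
    entire_dataset=True,
    delete_files=False,
)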
And I deleted the whole Azure clearml folder
@<1523701070390366208:profile|CostlyOstrich36>
Thanks for your help! I will try to solve it with that
Now it tells me No projects were found with name(s): hme_vehicle_detection/.datasets/test1
dataset_id = "..."
Dataset.get(
    dataset_id=dataset_id
)
Dataset.delete(
    dataset_id=dataset_id,
    force=True,
    entire_dataset=True,
    delete_files=False,
    delete_external_files=False
)
But I want to delete the dataset in the web UI
For now, yes, I'm using the community server
I added these lines to the config file:
api {
    # Notice: 'host' is the api server (default port 8008), not the web server.
    api_server:
    web_server:
    files_server:
}
....
azure.storage {
    # max_connections: 2
    containers: [
        {
            account_name: "account_name"
            account_key: "***"
            container_name: "container_name"
        }
    ]
}
I tried with verify_certificate: False
I SOLVED IT!
THANKS FOR YOUR HELP
I deleted the azure folder
Dataset.delete(
    dataset_id="708f4e2bb4354d58b79916a3db7f04c7",
    force=True,
)
Used
Other logs from when it installs:
Collecting pip<20.2
Using cached pip-20.1.1-py2.py3-none-any.whl (1.5 MB)
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pi...
Hi @<1523701070390366208:profile|CostlyOstrich36> in the end we got it working
It shows a lot of warnings, but it is able to run the experiments hahaha
Thanks for your help
@<1576381444509405184:profile|ManiacalLizard2>
I changed that
But it keeps trying to install the following packages:
The following additional packages will be installed:
binutils binutils-common binutils-x86-64-linux-gnu build-essential cpp cpp-9
dpkg-dev fakeroot file g++ g++-9 gcc gcc-9 gcc-9-base git-man krb5-locales
less libalgorithm-diff-perl libalgorithm-diff-xs-p...
I just want to start again because I was learning
This is the latest situation
I managed to build a Docker image that works:
FROM pytorch/pytorch:2.2.1-cuda12.1-cudnn8-runtime
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
    libgl1-mesa-glx \
    libglib2.0-0 \
    git \
    curl \
    && rm -rf /var/lib/apt/lists/*
# Install dependencies
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt
# Update CA certificates
COPY hme_root_CA.crt /usr/local/share/ca-cer...
Hi @<1576381444509405184:profile|ManiacalLizard2> in the end we got it working
It shows a lot of warnings, but it is able to run the experiments hahaha
Thanks for your help
It works if I open a /bin/bash shell inside the Docker container and run
clearml-agent daemon --queue default --foreground
But if I want to run it on the local machine with
clearml-agent daemon --queue default --docker --foreground
(set to the same Docker image that worked before)
then the problem appears and the program gets stuck
Could it be something related to conda in the Docker image?