
Our current setup is one ClearML agent per GPU on the same machine.
Code to enqueue:
from clearml import Task

task = Task.create(
    script="script.py",
    docker="ultralytics/ultralytics:latest",
    docker_args=["--network=host", "--ipc=host", "--shm-size=55G"],
)
Task.enqueue(task, queue_name="default")
@<1523701070390366208:profile|CostlyOstrich36> thank you for your help in advance
@<1523701070390366208:profile|CostlyOstrich36> do you have any ideas?
I think it might be related to the new run overwriting in this location
WebApp: 1.16.0-494 • Server: 1.16.0-494 • API: 2.30
Hey, yes it's self-deployed.
Using docker="ultralytics/ultralytics:latest" and docker_args=["--privileged"] seems to work!
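For reference, a minimal sketch of how that working combination could look with the same enqueue pattern used above (the script name and queue are the same placeholders as before):

from clearml import Task

# Sketch: same Task.create pattern as the enqueue snippet above, using the
# image and docker argument that worked in this case.
task = Task.create(
    script="script.py",
    docker="ultralytics/ultralytics:latest",
    docker_args=["--privileged"],
)
Task.enqueue(task, queue_name="default")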
We are using allegroai/clearml:latest for the API server.
to achieve running both the agent and the deployment on the same machine, adding --network=host to the run arguments solved it!
Hey, yes I can see machine statistics on the experiments themselves
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
Traceback (most recent call last):
File "/root/.clearml/venvs-builds/3.10/task_repository/script.py", line 36, in <module>
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/multiprocessing/queues.py", line 244, in _feed
obj = _ForkingPickler.dumps(obj)
File "/opt/conda/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protoco...
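If giving the container more shared memory is not an option, a hedged code-side workaround is to reduce the number of dataloader workers; workers is the Ultralytics training argument for this (0 keeps data loading in the main process), and the model object and data path below stand in for whatever the script already defines:

# Sketch: avoid the shm-backed multiprocessing queues by disabling dataloader workers.
# `model` and the data yaml path are placeholders for the script's existing objects.
results = model.train(data="data.yaml", epochs=10, device=0, workers=0)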
Trying this:
import os
import shutil

from clearml import Dataset

# config and model are defined earlier in the script
clearml_dataset = Dataset.get(
    dataset_id=config.get("dataset_id"), alias=config.get("dataset_alias")
)
# get_local_copy() returns the path to a local (cached) copy of the dataset
dataset_dir = clearml_dataset.get_local_copy()
# copy it under /datasets so training reads from a path outside the cache
destination_dir = os.path.join("/datasets", os.path.basename(dataset_dir))
shutil.copytree(dataset_dir, destination_dir)
results = model.train(
    data=destination_dir + "/data.yaml", epochs=config.get("epochs"), device=0
)
@<1523701070390366208:profile|CostlyOstrich36> same error now 😞
Environment setup completed successfully
Starting Task Execution:
/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/torch/cuda/__init__.py:128: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11020). Please update your GPU driver by downloading and installing a new version from the URL:
Alternatively, go to:
to install a PyTo...
How to replicate on ClearML:
from clearml import Task

task = Task.create(
    script="myscript.py",
    packages=["opencv-python==4.6.*", "ultralytics"],
    docker="nvcr.io/nvidia/pytorch:22.12-py3",
)
Contents of myscript.py:
from ultralytics import YOLO
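If the reproduction needs to go past the import, a minimal body along these lines would exercise training as well (the model weights and training arguments here are assumptions for illustration, not from the thread):

from ultralytics import YOLO

# Assumption: placeholder model and training arguments purely for illustration.
model = YOLO("yolov8n.pt")
model.train(data="coco128.yaml", epochs=1, device=0)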
pip install ultralytics --no-deps would also work. Is there a way to pass this to ClearML?
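One hedged option, assuming the docker_bash_setup_script parameter of Task.create is available in your clearml version, is to run that pip command inside the container before the task starts:

from clearml import Task

# Assumption: docker_bash_setup_script runs inside the container before the
# experiment starts, so the --no-deps install can happen there.
task = Task.create(
    script="myscript.py",
    docker="nvcr.io/nvidia/pytorch:22.12-py3",
    docker_bash_setup_script="pip install ultralytics --no-deps",
)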
It did work on ClearML on-prem with docker_args=["--network=host", "--ipc=host"]
Resetting and enqueuing a task which had built successfully also fails 😞, as I get a bunch of these warnings in both of the clones that failed.
How are you getting:
beautifulsoup4 @ file:///croot/beautifulsoup4-split_1681493039619/work
This comes with the docker image ultralytics/ultralytics:latest
agent.package_manager.pip_version=""
"Original PIP" is empty, since for this task we can rely on the docker image to provide the Python packages.
Setting the agent.venvs_cache path back to ~/.clearml/venvs-cache seems to have done the trick!
Thank you for your help @<1523701205467926528:profile|AgitatedDove14>
Container nvcr.io/nvidia/pytorch:22.12-py3
Thank you so much for your help @<1523701205467926528:profile|AgitatedDove14> !
@<1523701205467926528:profile|AgitatedDove14> if we go with the ultralytics case:
INSTALLED PACKAGES for working manual execution
absl-py==2.1.0
albucore==0.0.13
albumentations==1.4.14
anaconda-anon-usage @ file:///croot/anaconda-anon-usage_1710965072196/work
annotated-types==0.7.0
anyio==4.4.0
archspec @ file:///croot/archspec_1709217642129/work
astor==0.8.1
asttokens @ file:///opt/conda/conda-bld/asttokens_1646925590279/work
astunparse==1.6.3
attrs @ file:///croot/attrs_169571782329...