Setting ultralytics workers=0 seems to work as per the thread above!
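For anyone landing here, a minimal sketch of what that workaround looks like, assuming the standard Ultralytics YOLO training API (the weights file and dataset config below are placeholders):

from ultralytics import YOLO

# Placeholder pretrained weights
model = YOLO("yolov8n.pt")

# workers=0 disables DataLoader worker processes, so no tensors are passed
# through shared memory (/dev/shm); all data loading runs in the main process
model.train(data="coco128.yaml", epochs=10, workers=0)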
@<1523701070390366208:profile|CostlyOstrich36> I don't think it's related to disk, I think it's related to shm
DEBUG Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [21 lines of output]
      Traceback (most recent call last):
        File "/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_i...
It did work on ClearML on-prem with docker_args=["--network=host", "--ipc=host"]
We are using allegroai/clearml:latest
API server:
Although that's not ideal, as it turns off CPU parallelisation for data loading
[2024-08-13 16:56:36,447] [9] [INFO] [clearml.service_repo] Returned 200 for workers.get_activity_report in 342ms
[2024-08-13 16:56:36,462] [9] [INFO] [clearml.service_repo] Returned 200 for workers.get_activity_report in 261ms
WebApp: 1.16.0-494 • Server: 1.16.0-494 • API: 2.30
@<1523701070390366208:profile|CostlyOstrich36> thank you in advance for your help
Code to enqueue:
from clearml import Task

# Create a task from a local script, to run inside the Ultralytics container
task = Task.create(
    script="script.py",
    docker="ultralytics/ultralytics:latest",
    # Docker's flag is --shm-size (hyphen); --shm_size is not a valid flag
    docker_args=["--network=host", "--ipc=host", "--shm-size=55G"],
)

# enqueue() is a classmethod: pass the task and the queue name
Task.enqueue(task, queue_name="default")
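Worth noting: as far as I know, --ipc=host makes the container share the host's /dev/shm directly, in which case --shm-size is effectively ignored; either flag on its own should be enough to lift the shared-memory limit.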
Hey, yes, it's self-deployed
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
Traceback (most recent call last):
  File "/root/.clearml/venvs-builds/3.10/task_repository/script.py", line 36, in <module>
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/multiprocessing/queues.py", line 244, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/opt/conda/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protoco...
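As a sanity check, here's a quick standard-library snippet (not from the thread) you can run inside the container to see how much shared memory it actually got:

import shutil

# /dev/shm is the shared-memory mount that --shm-size / --ipc=host control;
# PyTorch DataLoader workers use it to exchange tensors between processes
total, used, free = shutil.disk_usage("/dev/shm")
print(f"/dev/shm: {free / 2**30:.1f} GiB free of {total / 2**30:.1f} GiB")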
@<1523701070390366208:profile|CostlyOstrich36> I'm now running the agent with --docker, and I'm using Task.create(docker="nvidia/cuda:11.0.3-cudnn8-runtime-ubuntu20.04")
But the process is still hanging, not proceeding to actually run the ClearML task
docker="nvidia/cuda:11.8.0-base-ubuntu20.04"
I have set agent { cuda_version: 11.2 } in clearml.conf
It's hanging at:
Installing collected packages: zipp, importlib-resources, rpds-py, pkgutil-resolve-name, attrs, referencing, jsonschema-specifications, jsonschema, certifi, urllib3, idna, charset-normalizer, requests, pyparsing, PyYAML, six, pathlib2, orderedmultidict, furl, pyjwt, psutil, python-dateutil, platformdirs, distlib, filelock, virtualenv, clearml-agent
Successfully installed PyYAML-6.0.2 attrs-23.2.0 certifi-2024.7.4 charset-normalizer-3.3.2 clearml-agent-1.8.1 distlib-0.3....
ERROR: This container was built for NVIDIA Driver Release 530.30 or later, but
version 460.32.03 was detected and compatibility mode is UNAVAILABLE.
[[System has unsupported display driver / cuda driver combination (CUDA_ERROR_SYSTEM_DRIVER_MISMATCH) cuInit()=803]]
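(For context: if I read the error correctly, the host's 460.32.03 driver corresponds to CUDA 11.2, so images built for CUDA releases that require driver 530+ cannot initialize on it; picking a CUDA 11.x base image that matches the installed driver, as tried above, is the usual way around the mismatch.)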
This has been resolved now! Thank you for your help @<1523701070390366208:profile|CostlyOstrich36>