Code to enqueue:
from clearml import Task

task = Task.create(
    script="script.py",
    docker="ultralytics/ultralytics:latest",
    docker_args=["--network=host", "--ipc=host", "--shm-size=55G"],
)
Task.enqueue(task, queue_name="default")
@<1523701070390366208:profile|CostlyOstrich36> I don't think it's related to disk, I think it's related to shm
It's hanging at
Installing collected packages: zipp, importlib-resources, rpds-py, pkgutil-resolve-name, attrs, referencing, jsonschema-specifications, jsonschema, certifi, urllib3, idna, charset-normalizer, requests, pyparsing, PyYAML, six, pathlib2, orderedmultidict, furl, pyjwt, psutil, python-dateutil, platformdirs, distlib, filelock, virtualenv, clearml-agent
Successfully installed PyYAML-6.0.2 attrs-23.2.0 certifi-2024.7.4 charset-normalizer-3.3.2 clearml-agent-1.8.1 distlib-0.3....
I can install it on the server with this command
This one seems to be compatible: [nvcr.io/nvidia/pytorch:22.04-py3](http://nvcr.io/nvidia/pytorch:22.04-py3)
I have set `agent.package_manager.pip_version=""`, which resolved that message
I can install the correct torch version with this command: `pip install --pre torchvision --force-reinstall --index-url ...`
But the process is still hanging and not proceeding to actually run the ClearML task
Collecting pip<20.2
Using cached pip-20.1.1-py2.py3-none-any.whl (1.5 MB)
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 20.0.2
Not uninstalling pip at /usr/lib/python3/dist-packages, outside environment /usr
Can't uninstall 'pip'. No files were found to uninstall.
WebApp: 1.16.0-494 • Server: 1.16.0-494 • API: 2.30
@<1523701070390366208:profile|CostlyOstrich36> do you have any ideas?
Thank you for getting back to me
To run both the agent and the deployment on the same machine, adding --network=host to the run arguments solved it!
What I don't understand is how to tell ClearML to install this version of PyTorch and torchvision, with cu118
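For what it's worth, a minimal sketch of how I'd try pinning those packages via Task.create's packages argument; the version strings and queue name below are placeholders rather than anything from this thread, and I'm assuming the agent then resolves the matching CUDA build based on the cuda_version it is configured with:
from clearml import Task

# Sketch only: version pins and queue name are placeholders, not taken from this thread.
# The agent should pick the CUDA-specific torch/torchvision wheels that match the
# cuda_version configured on the agent side (e.g. 11.8 for cu118 builds).
task = Task.create(
    script="script.py",
    docker="ultralytics/ultralytics:latest",
    packages=["torch==2.0.1", "torchvision==0.15.2", "ultralytics"],
)
Task.enqueue(task, queue_name="default")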
Hi @<1523701070390366208:profile|CostlyOstrich36> I am not specifying a version 🙂
Isn't the problem that CUDA 12 is being installed?
@<1523701070390366208:profile|CostlyOstrich36> same error now 😞
Environment setup completed successfully
Starting Task Execution:
/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/torch/cuda/__init__.py:128: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11020). Please update your GPU driver by downloading and installing a new version from the URL:
Alternatively, go to:
to install a PyTo...
Using docker="ultralytics/ultralytics:latest" and docker_args=["--privileged"] seems to work!
`pip install ultralytics --no-deps` would also work. Is there a way to pass this to ClearML?
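I'm not aware of a dedicated option for --no-deps, but one approach (a sketch, reusing the script and image from earlier as placeholders) is to run the pip command inside the container via Task.create's docker_bash_setup_script argument, which executes before the task itself starts:
from clearml import Task

# Sketch: install ultralytics without its dependencies inside the container
# before the task runs. Script name and queue name are placeholders.
task = Task.create(
    script="script.py",
    docker="ultralytics/ultralytics:latest",
    docker_bash_setup_script="pip install ultralytics --no-deps",
)
Task.enqueue(task, queue_name="default")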
I think it might be related to the new run overwriting files in this location
I have set `agent { cuda_version: 11.2 }`
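For reference, this is how I'd write both agent overrides mentioned above in clearml.conf (a sketch; the exact section layout may differ in your file):
agent {
    cuda_version: 11.2
    package_manager {
        pip_version: ""
    }
}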
