@<1523701070390366208:profile|CostlyOstrich36> same error now 😞
Environment setup completed successfully
Starting Task Execution:
/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/torch/cuda/__init__.py:128: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11020). Please update your GPU driver by downloading and installing a new version from the URL:
Alternatively, go to:
to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
False
Traceback (most recent call last):
File "facility_classifier/test_gpu.py", line 8, in <module>
assert torch.cuda.is_available()
AssertionError
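For reference, the `11020` in the warning above is CUDA's integer version encoding (1000 * major + 10 * minor), so the installed driver supports at most CUDA 11.2. Here is a minimal sketch for decoding it; `decode_cuda_version` is a hypothetical helper name, not part of PyTorch or ClearML.

```python
from typing import Tuple


def decode_cuda_version(encoded: int) -> Tuple[int, int]:
    """Split CUDA's integer version (1000 * major + 10 * minor) into (major, minor)."""
    major = encoded // 1000
    minor = (encoded % 1000) // 10
    return major, minor


# The value reported in the UserWarning above.
major, minor = decode_cuda_version(11020)
print(f"Driver supports up to CUDA {major}.{minor}")
```

If the installed PyTorch wheel was built against a newer CUDA release than this, `torch.cuda.is_available()` is likely to return `False`, as it does in the log.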
Hi @<1523701070390366208:profile|CostlyOstrich36> I am not specifying a version 🙂
ERROR: This container was built for NVIDIA Driver Release 530.30 or later, but
version 460.32.03 was detected and compatibility mode is UNAVAILABLE.
[[System has unsupported display driver / cuda driver combination (CUDA_ERROR_SYSTEM_DRIVER_MISMATCH) cuInit()=803]]
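The error above comes down to a version comparison: the container wants driver 530.30 or later and the host has 460.32.03. A rough sketch of that check is below, assuming plain dotted numeric version strings. The helper names are hypothetical, and the sketch ignores NVIDIA's forward-compatibility mode, which the message reports as unavailable here.

```python
from typing import Tuple


def parse_driver_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted driver version such as '460.32.03' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def driver_meets_minimum(detected: str, required: str) -> bool:
    """Return True when the detected driver is at least the required release."""
    return parse_driver_version(detected) >= parse_driver_version(required)


# Values taken from the container error message above.
if not driver_meets_minimum("460.32.03", "530.30"):
    print("Host driver is older than the container requires")
```

Either the host driver has to be upgraded or an image built for an older driver has to be used.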
To get both the agent and the deployment running on the same machine, adding --network=host to the run arguments solved it!
This one seems to be compatible: nvcr.io/nvidia/pytorch:22.04-py3
I suggest running it in docker mode with a docker image that already has CUDA installed.
pip install --pre torchvision --force-reinstall --index-url
docker="nvidia/cuda:11.8.0-base-ubuntu20.04"
OK, then just try the docker image I suggested 🙂