Hi @<1572032849320611840:profile|HurtRaccoon43>, I'd suggest trying this Docker image: nvcr.io/nvidia/pytorch:23.03-py3
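(Side note: a quick way to vet that image, or any other candidate, is to run a short check inside the container. A minimal sketch; the printed values are just examples:)
```python
# Quick GPU sanity check to run inside a candidate container image.
import torch

print("torch version:", torch.__version__)       # e.g. 2.1.0 or 2.1.0+cu121
print("built for CUDA:", torch.version.cuda)     # CUDA toolkit the wheel targets, or None
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```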
Seems I found the issue. On my MacBook I got torch==2.1.0 in requirements.txt, but on the AWS P3 instance I get torch==2.1.0+cu121 after reinstalling, and the GPU works fine. Hopefully it will now work in a Docker container as well.
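(One way to make that explicit, rather than relying on where pip happens to resolve torch from, is to pin the CUDA build in requirements.txt via PyTorch's wheel index. A sketch; adjust the cu tag to whatever CUDA version your driver actually supports:)
```
--extra-index-url https://download.pytorch.org/whl/cu121
torch==2.1.0+cu121
```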
What specific compatibility issues are you getting?
Thank you for the reply @<1523701070390366208:profile|CostlyOstrich36>. I will try the image.
The initial issue was the following:
```
CUDA initialization: The NVIDIA driver on your system is too old (found version 11040). Please update your GPU driver by downloading and installing a new version from the URL:
Alternatively, go to:
to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
```
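(For context: the "found version 11040" number is the CUDA driver API version, encoded as major * 1000 + minor * 10, so it means the driver supports CUDA only up to 11.4, while the default torch==2.1.0 wheel from PyPI is built against CUDA 12.1. A one-line worked example of the decoding:)
```python
# Decoding the driver version from the error message: major*1000 + minor*10.
raw = 11040
print(f"driver supports CUDA up to {raw // 1000}.{(raw % 1000) // 10}")  # -> 11.4
```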
I identified that torch==2.1.0 is not compatible with the nvidia/cuda:11.4.3-cudnn8-runtime-ubuntu20.04 image, which is the default image provided by ClearML GPU Compute.
After that I tried the nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu20.04 image and got the following error:
```
UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
```
Googling the problem, I found that it's a common pain to find compatible versions of CUDA, PyTorch, and the GPU driver. So I'd appreciate some advice on how to resolve this compatibility issue so I can actually use the GPU.
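(A rough preflight sketch, not ClearML-specific, that makes this kind of mismatch visible before training starts; it assumes libcuda.so.1 is present, i.e. the NVIDIA driver is installed on the host/container:)
```python
# Rough preflight check: does the installed driver support the CUDA
# version this torch wheel was built against?
import ctypes
import torch

if torch.version.cuda is None:
    raise SystemExit("CPU-only torch build installed; no CUDA support at all")

cuda = ctypes.CDLL("libcuda.so.1")                 # NVIDIA driver library
raw = ctypes.c_int()
cuda.cuDriverGetVersion(ctypes.byref(raw))         # e.g. 11040
driver = (raw.value // 1000, (raw.value % 1000) // 10)        # e.g. (11, 4)
wheel = tuple(int(x) for x in torch.version.cuda.split("."))  # e.g. (12, 1)

print(f"driver supports CUDA up to {driver[0]}.{driver[1]}")
print(f"torch wheel built for CUDA {torch.version.cuda}")
if wheel > driver:
    print("Mismatch: use a torch build for an older CUDA, or a newer driver/image.")
```
If that check fails, the shortest fixes are usually either installing a torch wheel that matches the driver (e.g. a +cu118 build for an 11.x driver) or switching to a base image/AMI that ships a newer driver.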