JitteryCoyote63 the agent.cuda_version (or the CUDA_VERSION env variable) tells the agent which pytorch wheel to download. The cuDNN library can be included inside any wheel and it will work as long as cuda / cudart exist on the system; for example, pytorch wheels include the cuDNN they use. agent.cudnn_version should actually be deprecated, as it is not actually used.
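For example (just a sketch; the exact value format, and whether you pin it at daemon start or per task, is an assumption here), a small launcher could set the env var before starting the agent:
```python
# Sketch: pin the CUDA version the agent will match torch wheels against,
# using the CUDA_VERSION env var mentioned above (value format assumed).
import os
import subprocess

env = dict(os.environ, CUDA_VERSION="10.1")
subprocess.run(["clearml-agent", "daemon", "--queue", "default"], env=env)
```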
For future reference, dependency order:
1. Nvidia drivers
2. CUDA library and CUDA runtime libraries (libcuda.so / libcudart.so)
3. cuDNN library
(1) & (2) are usually system-installed (or docker-installed), (3) can have multiple versions in different locations (e.g. inside python packages)
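A quick way to see which of those three layers a process can actually find (a sketch using ctypes; exact library sonames vary between installs):
```python
# Probe the three layers listed above through the dynamic linker.
# ctypes.util.find_library consults the same ldconfig cache the loader uses,
# so "not found" just means it is not on the default search path.
import ctypes.util

for label, lib in [("driver (libcuda)", "cuda"),
                   ("runtime (libcudart)", "cudart"),
                   ("cuDNN (libcudnn)", "cudnn")]:
    print(f"{label}: {ctypes.util.find_library(lib) or 'not found'}")
```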
If you are using docker you can control (2), as it will be part of the docker image.
cudnn isn't cuda, it's a separate library.
are you running in docker or on bare metal? you should have cuda installed at /usr/local/cuda-<>
I am running on bare metal, and cuda seems to be installed at /usr/lib/x86_64-linux-gnu/libcuda.so.460.39
yes - what happens in the case of installation with pip wheel files?
because I cannot locate libcudart or because cudnn_version = 0?
From my experience, I only installed the cuda drivers on my machines. I didn't use conda to install torch or cudatoolkit, I just let clearml-agent download the torch wheel file and install it
I am still confused though - on the get-started page of the pytorch website, when choosing "conda", the generated installation command includes cudatoolkit, while when choosing "pip" it only uses a wheel file.
Does that mean the wheel file contains cudatoolkit (cuda runtime)?
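One way to sanity-check that (a sketch; assumes a CUDA-enabled pip wheel of torch is installed):
```python
# If the pip wheel bundles its own CUDA runtime / cuDNN, these report
# versions even without a system-wide cudatoolkit install.
import torch

print(torch.version.cuda)              # CUDA runtime the wheel was built against, e.g. '10.1'
print(torch.backends.cudnn.version())  # cuDNN version the wheel loads
```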
I also did run sudo apt install nvidia-cuda-toolkit
JitteryCoyote63 I still don't understand which CUDA version you are actually using on your machine
this is the cuda driver api. you need libcudart.so
and with this setup I can use the GPU without any problem, meaning that the wheel does contain the cuda runtime
AgitatedDove14 According to the dependency order you shared, the original issue of this thread isn't solved: the agent mentioned using the output from nvcc (2) before checking the nvidia driver version (1)
ExcitedFish86 I have several machines with different cuda driver/runtime versions, that is why you might be confused, as I am referring to one or the other 🙂
note that the cuda version was only recently added to the nvidia-smi output
just to be clear, multiple CUDA runtime versions can coexist on a single machine, and the only thing that determines which one you are using when running an application is the library search path (which can be set either with LD_LIBRARY_PATH, or, preferably, by creating a file under /etc/ld.so.conf.d/ that contains the path to your cuda directory and then running ldconfig)
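For example, to check which runtime the loader resolves with the current search path (rough sketch, assuming some libcudart is discoverable):
```python
# Ask the resolved libcudart for its version; which copy gets picked follows
# the search order described above (LD_LIBRARY_PATH, then the ldconfig cache).
import ctypes
import ctypes.util

soname = ctypes.util.find_library("cudart")  # e.g. 'libcudart.so.10.1'
if soname:
    cudart = ctypes.CDLL(soname)
    version = ctypes.c_int(0)
    cudart.cudaRuntimeGetVersion(ctypes.byref(version))
    print(soname, "->", version.value)  # e.g. 10010 means CUDA runtime 10.1
else:
    print("no libcudart found on the current search path")
```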
and the agent says agent.cudnn_version = 0
But I can do:
```
$ python
>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.backends.cudnn.version()  # 8005 -> cuDNN 8.0.5, i.e. the cuDNN bundled with the torch wheel
8005
```
now I can do nvcc --version
and I get: Cuda compilation tools, release 10.1, V10.1.243
thanks for clarifying! Maybe this could be clarified in the agent logs of the experiments with something like the following?
agent.cuda_driver_version = ...
agent.cuda_runtime_version = ...
Ok, but when nvcc is not available, the agent uses the output from nvidia-smi, right? On one of my machines, nvcc is not installed and in the experiment logs of the agent running there, agent.cuda = shows the version reported by nvidia-smi
Already added to the next agent's version 😉
Interesting idea! (I assume for reporting only, not configuration)
Yes, for reporting only - also to understand which version the agent uses to choose the torch wheel it downloads
regarding the cuda check with nvcc, I'm not saying this is a perfect solution, I just mentioned that this is how it is currently done.
I'm actually not sure if there is an easy way to get it from the nvidia-smi interface, worth checking though ...
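(For what it's worth, a rough, untested sketch of one possible route via NVML, which is what nvidia-smi is built on; the pynvml function names here are an assumption:)
```python
# Untested sketch: query the driver version and the highest CUDA version the
# driver supports through the pynvml bindings (function names assumed).
import pynvml

pynvml.nvmlInit()
print("driver version:", pynvml.nvmlSystemGetDriverVersion())
cuda = pynvml.nvmlSystemGetCudaDriverVersion()  # encoded like 10010 -> 10.1
print(f"CUDA version supported by the driver: {cuda // 1000}.{(cuda % 1000) // 10}")
pynvml.nvmlShutdown()
```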
... the agent mentioned using the output from nvcc (2) ...
The dependencies I shared are not how the agent works, but how Nvidia CUDA works 🙂