
Hi, in one of my agents with CUDA Version: 11.1 (from nvidia-smi), clearml agent 0.17.1 detects version 100 (I can see it in the experiment logs: agent.cuda_version = 100 ). It then downloads wheels according to this wrong version:
...
Package(s) not found: torch
Warning, could not locate PyTorch torch>=1.7 matching CUDA version 100, best candidate 1.0.0
Torch CUDA 92 download page found
Trying PyTorch CUDA version 92 support
Warning, could not locate PyTorch torch>=1.7 matching CUDA version 92, best candidate 1.0.0
Found PyTorch version torch>=1.7 matching CUDA version 92
Collecting torch==1.7.1+cu92
Downloading (577.3 MB)
Saved ./.clearml/pip-download-cache/cu100/torch-1.7.1+cu92-cp36-cp36m-linux_x86_64.whl
Successfully downloaded torch
...
I can probably fix that by hardcoding agent.cuda_version = 110 in clearml.conf, right? Is there something to fix in the agent?

  
  
Posted 3 years ago

Answers 30


JitteryCoyote63 the agent.cuda_version (or CUDA_VERSION env) tells the agent which pytorch wheel to download. The CUDNN library can be included inside any wheel, and it will work as long as cuda / cudart exist on the system; for example, pytorch wheels include the cudnn they use. agent.cudnn_version should actually be deprecated, and is not actually used.

For future reference, dependency order:
(1) Nvidia drivers
(2) CUDA library and CUDA-runtime libraries (libcuda.so / libcudart.so)
(3) CUDNN library
(1) & (2) are usually system installed (or docker installed); (3) can have multiple versions in different locations (e.g. inside python packages).
If you are using dockers you can control (2), as it will be part of the docker.

  
  
Posted 3 years ago

cudnn isn't cuda, it's a separate library.
are you running on docker or on bare metal? you should have cuda installed at /usr/local/cuda-<>

  
  
Posted 3 years ago

I am running on bare metal, and cuda seems to be installed at /usr/lib/x86_64-linux-gnu/libcuda.so.460.39

  
  
Posted 3 years ago

Ok, this I cannot locate

  
  
Posted 3 years ago

yes

  
  
Posted 3 years ago

yes, what happens in the case of an installation with pip wheel files?

  
  
Posted 3 years ago

ok yea now I see it

  
  
Posted 3 years ago

so you don't have cuda installed 🙂

  
  
Posted 3 years ago

libcudart

  
  
Posted 3 years ago

JitteryCoyote63 I still don't understand what is the actual CUDA version you are using on your machine

  
  
Posted 3 years ago

this is the cuda driver api. you need libcudart.so

  
  
Posted 3 years ago

and with this setup I can use GPU without any problem, meaning that the wheel does contain the cuda runtime

  
  
Posted 3 years ago

can you initialize a tensor on the GPU?

  
  
Posted 3 years ago

ExcitedFish86 I have several machines with different cuda driver/runtime versions, that is why you might be confused, as I am referring to one or another 🙂

  
  
Posted 3 years ago

just to be clear, multiple CUDA runtime versions can coexist on a single machine, and the only thing that determines which one you are using when running an application is the library search path (which can be set either with LD_LIBRARY_PATH , or, preferably, by creating a file under /etc/ld.so.conf.d/ which contains the path to your cuda directory and executing ldconfig )
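To see which libraries the search path actually resolves to, something like the following could be used. It is only a sketch: it assumes the driver library is reachable as libcuda.so.1 and the runtime as libcudart.so (the runtime is often only present under a versioned name, so that string may need adjusting), and the helper names query_version and decode_cuda_version are made up for this example. cuDriverGetVersion and cudaRuntimeGetVersion are the CUDA entry points that report the driver and runtime versions.

```python
import ctypes
from typing import Optional, Tuple


def decode_cuda_version(raw: int) -> Tuple[int, int]:
    """CUDA reports versions as 1000 * major + 10 * minor (11010 means 11.1)."""
    return raw // 1000, (raw % 1000) // 10


def query_version(library: str, symbol: str) -> Optional[Tuple[int, int]]:
    """Load `library` via the normal search path and call `symbol(int *)`.

    Returns None when the library or the symbol cannot be found, or when
    the call reports an error.
    """
    try:
        lib = ctypes.CDLL(library)
        func = getattr(lib, symbol)
    except (OSError, AttributeError):
        return None
    value = ctypes.c_int(0)
    status = func(ctypes.byref(value))
    if status != 0:
        return None
    return decode_cuda_version(value.value)


if __name__ == "__main__":
    # Library names are assumptions; adjust them to what exists on your machine.
    print("driver :", query_version("libcuda.so.1", "cuDriverGetVersion"))
    print("runtime:", query_version("libcudart.so", "cudaRuntimeGetVersion"))
```

If the runtime line prints None while the driver line prints a version, only the driver library is visible on the search path.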

  
  
Posted 3 years ago

try:
sudo updatedb
locate libcudart
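If locate is not available, a rough equivalent can be sketched in Python. The directories searched here are assumptions (typical toolkit locations, LD_LIBRARY_PATH, and a torch/lib folder inside each Python path entry), and find_cudart is a hypothetical helper name, not part of any tool mentioned in this thread.

```python
import glob
import os
import sys
from typing import Iterable, List, Optional

# Typical system locations; these are guesses, not an exhaustive list.
SYSTEM_PATTERNS = [
    "/usr/local/cuda*/lib64/libcudart*",
    "/usr/lib/x86_64-linux-gnu/libcudart*",
]


def find_cudart(site_dirs: Optional[Iterable[str]] = None) -> List[str]:
    """Return sorted paths of libcudart files found in the searched locations."""
    patterns = list(SYSTEM_PATTERNS)
    # Directories listed in LD_LIBRARY_PATH, if any.
    for directory in os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep):
        if directory:
            patterns.append(os.path.join(glob.escape(directory), "libcudart*"))
    # A runtime bundled inside an installed torch package.
    roots = sys.path if site_dirs is None else site_dirs
    for root in roots:
        if root and os.path.isdir(root):
            patterns.append(
                os.path.join(glob.escape(root), "torch", "lib", "libcudart*")
            )
    found = set()
    for pattern in patterns:
        found.update(glob.glob(pattern))
    return sorted(found)


if __name__ == "__main__":
    for path in find_cudart():
        print(path)
```

An empty result only means nothing was found in these particular locations; the runtime could still live elsewhere.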

  
  
Posted 3 years ago

the conda sets up cuda I think

  
  
Posted 3 years ago

But I can do:
` $ python
>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.backends.cudnn.version()
8005 `

  
  
Posted 3 years ago

thanks for clarifying! Maybe this could be clarified in the agent logs of the experiments with something like the following?
agent.cuda_driver_version = ... agent.cuda_runtime_version = ...

  
  
Posted 3 years ago

Interesting idea! (I assume for reporting only, not configuration)

Yes for reporting only - Also to understand which version is used by the agent to define the torch wheel downloaded

regarding the cuda check with nvcc, I'm not saying this is a perfect solution, I just mentioned that this is how this is currently done.
I'm actually not sure if there is an easy way to get it from nvidia-smi interface, worth checking though ...

Ok, but when nvcc is not available, the agent uses the output from nvidia-smi, right? On one of my machines, nvcc is not installed, and in the experiment logs of the agent running there, agent.cuda = is the version shown with nvidia-smi

  
  
Posted 3 years ago

and the agent says agent.cudnn_version = 0

  
  
Posted 3 years ago

agent.cuda_driver_version = ...
agent.cuda_runtime_version = ...

Interesting idea! (I assume for reporting only, not configuration)

... The agent mentioned used output from nvcc (2) ...

The dependencies I shared are not how the agent works, but how Nvidia CUDA works 🙂
regarding the cuda check with nvcc , I'm not saying this is a perfect solution, I just mentioned that this is how this is currently done.
I'm actually not sure if there is an easy way to get it from nvidia-smi interface, worth checking though ...

  
  
Posted 3 years ago

From my experience, I only installed cuda drivers on my machines. I didn't use conda to install torch or cudatoolkit, I just let clearml-agent download the torch wheel file and install it

  
  
Posted 3 years ago

AgitatedDove14 According to the dependency order you shared, the original message of this thread isn't solved: the agent mentioned used output from nvcc (2) before checking the nvidia driver version (1)

  
  
Posted 3 years ago

I also did run sudo apt install nvidia-cuda-toolkit

  
  
Posted 3 years ago

note that the cuda driver was only recently added to nvidia-smi

  
  
Posted 3 years ago

because I cannot locate libcudart or because cudnn_version = 0?

  
  
Posted 3 years ago

I am still confused though - from the get started page of pytorch website, when choosing "conda", the generated installation command includes cudatoolkit, while when choosing "pip" it only uses a wheel file.
Does that mean the wheel file contains cudatoolkit (cuda runtime)?
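One way to check this empirically is to ask the installed torch which CUDA version it was built for and to try allocating a tensor on the GPU. The sketch below uses only torch attributes that exist in the public API (torch.version.cuda, torch.cuda.is_available, torch.backends.cudnn.version); the function name describe_torch_cuda is made up for the example. If this works on a machine with only the driver installed, the runtime must be coming from the wheel.

```python
def describe_torch_cuda() -> dict:
    """Collect what the installed torch reports about its CUDA support."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the wheel was built against (None for CPU-only builds).
        "built_for_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        # Allocating on the GPU exercises the driver and the CUDA runtime.
        tensor = torch.ones(2, device="cuda")
        info["tensor_device"] = str(tensor.device)
        info["cudnn_version"] = torch.backends.cudnn.version()
    return info


if __name__ == "__main__":
    for key, value in describe_torch_cuda().items():
        print(f"{key}: {value}")
```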

  
  
Posted 3 years ago

Ok, but when nvcc is not available, the agent uses the output from nvidia-smi right? On one of my machine, nvcc is not installed and in the experiment logs of the agent runnin there, agent.cuda = is the version shown with nvidia-smi

Already added to the next agent's version 😉

  
  
Posted 3 years ago

now I can do nvcc --version and I get
Cuda compilation tools, release 10.1, V10.1.243
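To compare what nvcc and nvidia-smi each report, their output can be reduced to the compact form seen in this thread (100, 92, 110). The sketch below assumes that form is simply major and minor concatenated; that is inferred from the values quoted here, not taken from the agent's code, and both function names are hypothetical. Note that the "CUDA Version" printed by nvidia-smi is the highest version the driver supports, which can differ from the toolkit version nvcc reports.

```python
import re
from typing import Optional


def cuda_version_from_nvcc(output: str) -> Optional[str]:
    """Extract e.g. '101' from the 'release 10.1' part of `nvcc --version`."""
    match = re.search(r"release\s+(\d+)\.(\d+)", output)
    return f"{match.group(1)}{match.group(2)}" if match else None


def cuda_version_from_nvidia_smi(output: str) -> Optional[str]:
    """Extract e.g. '111' from the 'CUDA Version: 11.1' header of nvidia-smi."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", output)
    return f"{match.group(1)}{match.group(2)}" if match else None


if __name__ == "__main__":
    nvcc_line = "Cuda compilation tools, release 10.1, V10.1.243"
    print(cuda_version_from_nvcc(nvcc_line))
```

On a machine where the two tools disagree, as in this thread, the two functions return different strings, which makes the mismatch easy to spot.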

  
  
Posted 3 years ago