Answered
Hey guys, I am setting up a new machine with two RTX 3070 GPUs, where I created two agents (one for each GPU). On both agents, my experiments fail with the error:
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

Running nvidia-smi shows CUDA Version: 11.1, and in the experiment logs I can see that trains correctly downloaded and installed PyTorch 1.3.1 in .../.trains/pip-download-cache/cu111/torch-1.3.1-cp36-cp36m-linux_x86_64.whl.

But in the pip summary of installed packages section of the log I can see:
- 'torch==1.3.1 # '

Which makes me think that the wrong torch package is installed.
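One way to check which torch build actually ended up in the environment is to ask torch itself. This is only a minimal sketch: the helper name `describe_torch_build` is made up for illustration, and it assumes you can run Python inside the same environment the agent created.

```python
import importlib.util


def describe_torch_build():
    """Return a small dict describing the installed torch build, if any.

    Hypothetical helper for illustration; not part of trains or torch.
    """
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}

    import torch

    return {
        "installed": True,
        "version": torch.__version__,          # version string of the installed wheel
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(describe_torch_build())
```

If `built_for_cuda` is much older than the version nvidia-smi reports, that would point to a wheel built for a different CUDA release.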

  
  
Posted 3 years ago
Answers 10


Send via PM 🙂

  
  
Posted 3 years ago

Many thanks!

  
  
Posted 3 years ago

Yes, I am preparing them 🙂

  
  
Posted 3 years ago

JitteryCoyote63 any chance you have a log of the failed torch 1.7.0 ?

  
  
Posted 3 years ago

trains was not able to pick the right wheel when I updated the torch req from 1.3.1 to 1.7.0: It downloaded wheel for cuda version 101.

Could you send a log? It should have worked 😞

  
  
Posted 3 years ago

Hi AgitatedDove14, coming by after a few experiments this morning:
Indeed, torch 1.3.1 does not support CUDA 11. I tried with 1.7.0 and it worked, BUT trains was not able to pick the right wheel when I updated the torch requirement from 1.3.1 to 1.7.0: it downloaded the wheel for CUDA 10.1 (cu101), even though the agent correctly reported the CUDA version (111) in the experiment log. I then replaced torch==1.7.0 with the direct https link to the torch wheel for CUDA 11.0 (cu110), and that worked. (I also tried specifying torch==1.7, but this failed.)
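To make the expectation concrete, here is a rough sketch of the selection I would have expected: take the newest CUDA build that is not newer than the detected CUDA version. This is not trains-agent's actual logic (the thread does not show it); the function names and the list of available builds are assumptions for illustration only.

```python
def cuda_tag(version):
    """Turn a CUDA version string such as '11.1' into a wheel tag such as 'cu111'."""
    major, minor = version.split(".")[:2]
    return f"cu{major}{minor}"


def pick_build(detected, available):
    """Pick the newest available CUDA build that is not newer than `detected`.

    Versions are (major, minor) tuples. Returns None if nothing fits.
    Hypothetical helper; not how trains-agent is known to resolve wheels.
    """
    candidates = [v for v in available if v <= detected]
    return max(candidates) if candidates else None


if __name__ == "__main__":
    detected = (11, 1)                                # what nvidia-smi reported
    available = [(9, 2), (10, 1), (10, 2), (11, 0)]   # assumed list of builds
    print(cuda_tag("11.1"), pick_build(detected, available))
```

With these made-up inputs, a machine reporting CUDA 11.1 maps to the 11.0 build, which matches the cu110 wheel that ended up working here.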

  
  
Posted 3 years ago

See here:
https://download.pytorch.org/whl/torch_stable.html
cu110/* has no torch 1.3.1, only 1.7.0
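If you want to check this without scrolling through the page, something like the sketch below can group the listed torch wheels by CUDA tag. It assumes the page text contains entries shaped like `cu110/torch-<version>...whl` (possibly URL-encoded); the function name is made up, and you would need to fetch the page text yourself.

```python
import re
from urllib.parse import unquote

# Matches entries such as "cu110/torch-1.7.0"; torchvision and others are skipped.
WHEEL_RE = re.compile(r"(?P<cuda>cu\d+|cpu)/torch-(?P<version>\d+\.\d+\.\d+)")


def torch_versions_by_cuda(index_text):
    """Map each CUDA tag to the torch versions listed under it.

    Versions are sorted as plain strings, which is enough for a quick look.
    """
    found = {}
    for match in WHEEL_RE.finditer(unquote(index_text)):
        found.setdefault(match["cuda"], set()).add(match["version"])
    return {tag: sorted(versions) for tag, versions in found.items()}
```

Calling `torch_versions_by_cuda(page_text).get("cu110")` would then list the torch versions available for that CUDA tag.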

  
  
Posted 3 years ago

JitteryCoyote63

I am setting up a new machine with two rtx 3070 GPU

Nice! you are one of the lucky few who managed to buy them 🙂

Which makes me think that the wrong torch package is installed

I think that torch 1.3.1 does not support CUDA 11 😞

  
  
Posted 3 years ago

Also, from https://lambdalabs.com/blog/install-tensorflow-and-pytorch-on-rtx-30-series/ :

As of 11/6/2020, you can't pip/conda install a TensorFlow or PyTorch version that runs on NVIDIA's RTX 30 series GPUs (Ampere). These GPUs require CUDA 11.1, and the current TensorFlow/PyTorch releases aren't built against CUDA 11.1. Right now, getting these libraries to work with 30XX GPUs requires manual compilation or NVIDIA docker containers.

But which wheel is trains downloading in that case?

  
  
Posted 3 years ago

(I use trains-agent 0.16.1 and trains 0.16.2)

  
  
Posted 3 years ago