(I use trains-agent 0.16.1 and trains 0.16.2)
JitteryCoyote63
I am setting up a new machine with two rtx 3070 GPU
Nice! you are one of the lucky few who managed to buy them 🙂
Which makes me think that the wrong torch package is installed
I think that torch 1.3.1 is does not support cuda 11 😞
Hi AgitatedDove14 , coming by after a few experiments this morning:
Indeed torch 1.3.1 does not support cuda, I tried with 1.7.0 and it worked, BUT trains was not able to pick the right wheel when I updated the torch req from 1.3.1 to 1.7.0: It downloaded wheel for cuda version 101. But in the experiment log, the agent correctly reported the cuda version (111). I then replaced the torch==1.7.0 with the direct https link to the torch wheel for cuda 110, and that worked (I also tried specifying torch==1.7, but this failed)
See here:
https://download.pytorch.org/whl/torch_stable.html
cu110/* has no torch 1.3.1 only 1.7.0
trains was not able to pick the right wheel when I updated the torch req from 1.3.1 to 1.7.0: It downloaded wheel for cuda version 101.
Could you send a log, it should have worked 😞
JitteryCoyote63 any chance you have a log of the failed torch 1.7.0 ?
Also, from https://lambdalabs.com/blog/install-tensorflow-and-pytorch-on-rtx-30-series/ :
As of 11/6/2020, you can't pip/conda install a TensorFlow or PyTorch version that runs on NVIDIA's RTX 30 series GPUs (Ampere). These GPUs require CUDA 11.1, and the current TensorFlow/PyTorch releases aren't built against CUDA 11.1. Right now, getting these libraries to work with 30XX GPUs requires manual compilation or NVIDIA docker containers.
But what wheel is downloading trains in that case?