Answered
Could You Please Explain A Bit More How Trains Adapt The Torch Version Depending On The Installed Cuda Version? Here Is My Setup:

Could you please explain a bit more how trains adapts the torch version depending on the installed CUDA version? Here is my setup:
CUDA 10.2 is installed and correctly detected by trains. In my requirements: torch==1.3.1. Since there is no wheel for this CUDA version for this torch version, what will trains do?
What if I specify a direct wheel file as: torch @ https://download.pytorch.org/whl/cu101/torch-1.3.1-cp36-cp36m-linux_x86_64.whl , will trains handle it?

  
  
Posted 4 years ago

Answers 30


I am using pip as a package manager, but I start the trains-agent inside a conda env 😄

  
  
Posted 4 years ago

Do you need to control the cuda drivers ?

  
  
Posted 4 years ago

No worries, cudatoolkit is not part of it. "trains-agent" will create a new clean venv for every experiment, and by default it will not inherit the system packages.
So basically I think you are "stuck" with the cuda drivers you have on the system

  
  
Posted 4 years ago

Yes I agree, but I get a strange error when using dataloaders:
RuntimeError: [enforce fail at context_gpu.cu:323] error == cudaSuccess. 3 vs 0. Error at: /pytorch/caffe2/core/context_gpu.cu:323: initialization error
only when I use num_workers > 0

  
  
Posted 4 years ago

What happens is a different error, but it was so weird that I thought it was related to the version installed

  
  
Posted 4 years ago

no, using the system drivers

  
  
Posted 4 years ago

You can set torch to be installed last:
post_packages: ["horovod", "torch"]
This will make sure the "trains-agent" version (the one you specified in the "installed packages") will be installed last.
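To illustrate the ordering idea only, here is a minimal sketch. It is not trains-agent code: the function name `order_for_install` and the example requirements are hypothetical, and the real agent may resolve things differently.

```python
def order_for_install(requirements, post_packages):
    """Return requirements with any post_packages entries moved to the end.

    Hypothetical helper: it only mimics the effect of listing a package
    in post_packages, it is not how trains-agent is implemented.
    """
    def base_name(req):
        # Strip version specifiers and direct references:
        # "torch==1.3.1" -> "torch"
        for sep in ("==", ">=", "<=", "~=", "!=", ">", "<", "@", "["):
            req = req.split(sep, 1)[0]
        return req.strip().lower()

    post = [p.lower() for p in post_packages]
    # Everything not listed in post_packages keeps its original order.
    first = [r for r in requirements if base_name(r) not in post]
    # Listed packages go last, in the order given by post_packages.
    last = sorted(
        (r for r in requirements if base_name(r) in post),
        key=lambda r: post.index(base_name(r)),
    )
    return first + last


# Example requirements (made up for illustration).
requirements = ["torch==1.3.1", "numpy", "horovod", "torchvision==0.4.2"]
ordered = order_for_install(requirements, ["horovod", "torch"])
print(ordered)
```

With torch installed last, a dependency that pulled in another torch version earlier gets overwritten by the pinned one.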

  
  
Posted 4 years ago

Oh, that's cool! I could place torch==1.3.1 there

  
  
Posted 4 years ago

What I mean is that I don't need to have cudatoolkit installed in the current conda env, right?

  
  
Posted 4 years ago

😞

  
  
Posted 4 years ago

What I mean is that I don't need to have cudatoolkit installed in the current conda env, right?

Wait, are you using conda as package manager ?
EDIT: meaning configured in trains.conf as package manager

  
  
Posted 4 years ago

alright I am starting to get a better picture of this puzzle

  
  
Posted 4 years ago

You can switch to docker-mode for better control over cuda drivers, or use conda and specify cudatoolkit (this feature will be part of the next RC, meanwhile it will install the cudatoolkit based on the global cuda_version).

  
  
Posted 4 years ago

Sorry, I didn't get that 😄

  
  
Posted 4 years ago

Not really: I just need to find the one that is compatible with torch==1.3.1

  
  
Posted 4 years ago

JitteryCoyote63 I think this only holds for the conda distribution.
(Actually quite interesting, I wonder what happens if you already installed cudatoolkit...)

  
  
Posted 4 years ago

My apologies, let me rephrase:
If you are using pip as package manager and not running in docker-mode, trains-agent cannot touch the cuda/cudnn driver libraries (actually the .so files).
If you want to verify you can check echo $LD_LIBRARY_PATH
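The same check can be done from Python inside the experiment's environment. This is a rough sketch, assuming torch may or may not be importable; the function name `cuda_report` is made up for the example.

```python
import os


def cuda_report():
    """Collect the library search path and, if torch is importable,
    the CUDA/cuDNN versions that torch itself reports."""
    paths = os.environ.get("LD_LIBRARY_PATH", "")
    report = {"LD_LIBRARY_PATH": [p for p in paths.split(os.pathsep) if p]}
    try:
        import torch
    except ImportError:
        # torch is not installed in this environment
        report["torch"] = None
        return report
    report["torch"] = torch.__version__
    # CUDA version the wheel was built against (None for CPU-only builds)
    report["torch_built_with_cuda"] = torch.version.cuda
    report["cudnn"] = torch.backends.cudnn.version()
    report["cuda_available"] = torch.cuda.is_available()
    return report


report = cuda_report()
for key, value in report.items():
    print(f"{key}: {value}")
```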

  
  
Posted 4 years ago

From the answers I saw on the internet, it is most likely related to the mismatch of cuda/cudnn version

  
  
Posted 4 years ago

From https://discuss.pytorch.org/t/please-help-me-understand-installation-for-cuda-on-linux/14217/4 it looks like my assumption is correct: there is no need for cudatoolkit to be installed since wheels already contain all cuda/cudnn libraries required by torch
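One way to check this on a given install is to list the CUDA-related shared libraries shipped next to the torch package. A small sketch, assuming the wheel places them under `torch/lib` (an assumption about the wheel layout, which may differ between torch versions); `bundled_cuda_libs` is a hypothetical name.

```python
import importlib.util
from pathlib import Path


def bundled_cuda_libs():
    """Return names of CUDA-related libraries found in torch/lib,
    or an empty list if torch (or that directory) is not present."""
    spec = importlib.util.find_spec("torch")
    if spec is None or spec.origin is None:
        return []
    # Assumed layout: <site-packages>/torch/lib/*.so
    lib_dir = Path(spec.origin).parent / "lib"
    if not lib_dir.is_dir():
        return []
    keywords = ("cuda", "cudnn", "cublas", "nvrtc")
    return sorted(
        p.name for p in lib_dir.iterdir()
        if any(k in p.name.lower() for k in keywords)
    )


libs = bundled_cuda_libs()
print(libs)
```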

  
  
Posted 4 years ago

i.e. change them per experiment ?

  
  
Posted 4 years ago

What probably happens is first torch is installed via "trains-agent", then it installs the other packages and they require a different version, so pip automatically replaces it.

  
  
Posted 4 years ago

That's why I suspected trains was installing a different version than the one I expected

  
  
Posted 4 years ago

I now have a different question: when installing torch from wheels files, I am guaranteed to have the corresponding cuda library and cudnn together right?

  
  
Posted 4 years ago

(obviously if you have dependencies, they will be installed before, and then the correct torch will be installed over the previous version)

  
  
Posted 4 years ago

yes, that's also what I thought

  
  
Posted 4 years ago

if you have cuda 10.2, then the torch 1.3.1 from the cu101 version should work
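The reasoning can be sketched as "pick the newest wheel CUDA tag that does not exceed the installed CUDA version". This only illustrates the idea and is not the agent's actual resolution logic; the function name and the list of available tags are hypothetical.

```python
def pick_wheel_cuda_tag(installed_cuda, available_tags):
    """Pick the highest 'cuNNN' tag that is <= the installed CUDA version.

    installed_cuda: integer such as 102 for CUDA 10.2
    available_tags: tags such as ["cu92", "cu100", "cu101"]
    Returns None when no tag is low enough.
    """
    versions = [
        int(tag[2:]) for tag in available_tags
        if tag.startswith("cu") and tag[2:].isdigit()
    ]
    usable = [v for v in versions if v <= installed_cuda]
    return f"cu{max(usable)}" if usable else None


# Hypothetical list of CUDA builds published for a given torch version.
picked = pick_wheel_cuda_tag(102, ["cu92", "cu100", "cu101"])
print(picked)
```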

  
  
Posted 4 years ago

Oh I see, I think we are now touching a very important point:
I thought that torch wheels already included cuda/cudnn libraries, so you don't need to care about the system cuda/cudnn version because eventually only the cuda/cudnn libraries extracted from the torch wheels were used. Is this correct? If not, then does that mean that one should use conda to install the correct cuda/cudnn cudatoolkit?

  
  
Posted 4 years ago

That was also my feeling! But I thought that spawning the trains-agent from a conda env would isolate me from cuda drivers on the system

  
  
Posted 4 years ago

agent.package_manager.type = pip
...
Using base prefix '/home/machine1/miniconda3/envs/py36'
New python executable in /home/machine1/.trains/venvs-builds/3.6/bin/python3.6
Also creating executable in /home/machine1/.trains/venvs-builds/3.6/bin/python
Installing setuptools, pip, wheel...

  
  
Posted 4 years ago