Answered

Hi everyone,
I tried to launch experiments using conda with different CUDA versions. I tried commenting out these fields in the trains.conf file on the remote machine:
#cuda_version: 10.1
#cudnn_version: 7.0
but it seems that when I comment them out (as above), trains sets the versions by default to
agent.cuda_version = 102
agent.cudnn_version = 0
(this is taken from the logs of the run)
and then it installed cudatoolkit 10.2, which broke the experiments.

Is there a way to make trains use the cudatoolkit that exists in the Python environment I ran the training script with?

  
  
Posted 4 years ago

Answers 17


How do you clone the tasks? With Task.clone? If so, you can use cloned_task.set_base_docker(<VALUE FOR BASE DOCKER IMAGE>).
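A minimal sketch of that flow, assuming the trains Python package and a running trains-server. The helper name clone_with_docker_image, the task ID placeholder, and the queue name "default" are hypothetical and only for illustration; the Task class is passed in as a parameter so the function can be tried without a server.

```python
def clone_with_docker_image(task_cls, source_task_id, docker_image, queue_name):
    """Clone a task, set the clone's base docker image, and enqueue it.

    task_cls is expected to be trains.Task. It is passed in (rather than
    imported here) so this sketch can be exercised without a trains-server.
    """
    # Fetch the experiment we want to re-run
    source = task_cls.get_task(task_id=source_task_id)
    # Create a new draft copy of it
    cloned = task_cls.clone(source_task=source, name=f"{source.name} ({docker_image})")
    # Equivalent to editing BASE DOCKER IMAGE in the EXECUTION section of the UI
    cloned.set_base_docker(docker_image)
    # Send the clone to a queue served by an agent running in docker mode
    task_cls.enqueue(cloned, queue_name=queue_name)
    return cloned


# Usage (hypothetical task ID and queue name):
#   from trains import Task
#   clone_with_docker_image(
#       Task, "<SOURCE TASK ID>", "nvidia/cuda:10.1-runtime-ubuntu18.04", "default"
#   )
```

Since the image is set per cloned task, several clones with different images can go to the same queue.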

  
  
Posted 4 years ago

Weird, I will try to find out why that is.

  
  
Posted 4 years ago

Actually you can. When you clone an experiment, in the EXECUTION section you can change the BASE DOCKER IMAGE to the image you would like the experiment to run with. This way you can use different docker images for different experiments.

You can use the same queue :)

  
  
Posted 4 years ago

I can give it a shot (I'm using conda now). What is the overhead of moving to docker, given that I don't have hands-on docker experience?

  
  
Posted 4 years ago

Is there a guide regarding the configuration required for docker?

Yes, we do have a guide: https://github.com/allegroai/trains-agent#starting-the-trains-agent-in-docker-mode

You can also specify the image for the docker. In the example the image is nvidia/cuda, but you can use a specific one for your needs (maybe nvidia/cuda:10.1-runtime-ubuntu18.04?).

I can give it a shot (I'm using conda now). What is the overhead of moving to docker, given that I don't have hands-on docker experience?

You don't really need hands-on docker experience.

Is the docker flow better supported than conda?

It's the same flow, but running inside a docker image.

  
  
Posted 4 years ago

When my system was "clean" I installed CUDA 10.1 (I never installed CUDA 10.2). I hope I'm not mistaken.

  
  
Posted 4 years ago

You changed the version from 10.2 to 10.1 and the nvidia-smi output is the same? Did you restart after the change?

  
  
Posted 4 years ago

The version of the cudatoolkit inside the experiment is 10.1, and trains tries to work with 10.2, probably for the same reason nvidia-smi displays 10.2.

  
  
Posted 4 years ago

Ohhh I thought you changed it from 10.2 to 10.1, my mistake.

What do you get for nvcc --version ?

  
  
Posted 4 years ago

Is it something that I can configure from the call to Task.init? (My goal is not to have to change it manually.)

  
  
Posted 4 years ago

Hi RattySeagull0 ,

If not specified, the value for cuda_version is taken from nvidia-smi. Can you share your output of nvidia-smi?
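To make the fallback concrete, here is a rough sketch of reading the "CUDA Version" field from the nvidia-smi header and encoding it the way the run log shows it (10.2 becomes 102). This is not the trains-agent implementation; the function name and the sample header line are illustrative assumptions. Note that this field reports the highest CUDA version the installed driver supports, not the toolkit installed on the machine or in the conda environment, which would explain seeing 10.2 with only CUDA 10.1 installed.

```python
import re


def cuda_version_from_smi(smi_output):
    """Return the nvidia-smi 'CUDA Version' as an int such as 102, or None if absent."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        return None
    major, minor = match.groups()
    # Encode "10.2" as 102, matching the agent.cuda_version value seen in the log
    return int(major) * 10 + int(minor)


# Illustrative header line only, not taken from the poster's machine
sample = "| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |"
print(cuda_version_from_smi(sample))  # 102
```

To inspect a real machine, the text could come from subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout.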

  
  
Posted 4 years ago

Didn't use it so far, but I will start 🙂

  
  
Posted 4 years ago

BTW, what about running trains-agent in docker mode? That can solve all your cuda issues

  
  
Posted 4 years ago

What do you mean by "change"?

  
  
Posted 4 years ago

Is the docker flow better supported than conda? Is there a guide regarding the configuration required for docker?

  
  
Posted 4 years ago

Got it, thanks!
Is it possible to use different docker images (containing different CUDA versions) in different experiments?
Or do I have to open different queues for that (or something like that)?

  
  
Posted 4 years ago

Hi TimelyPenguin76
You are right, it shows CUDA version 10.2 (even though I only installed CUDA 10.1, weird).
Do you know why it's 10.2?
And do you know why trains relies on that (instead of looking at the Python environment of the executed script)?

  
  
Posted 4 years ago