BTW, what about running trains-agent in docker mode? That can solve all your cuda issues
Hi  TimelyPenguin76
you are right, it says cuda version 10.2 (even though I only installed cuda 10.1, weird)
do you know why it's 10.2?
and do you know why trains relies on that (instead of looking at the python environment of the executed script)?
Didn't use it so far, but I will start 🙂
Hi RattySeagull0 ,
If not specified, the cuda_version value is taken from  nvidia-smi . Can you share your  nvidia-smi  output?
Actually you can: when you clone an experiment, in the  EXECUTION  section you can change the  BASE DOCKER IMAGE  to the image you'd like the experiment to run with. This way you can use different docker images for different experiments.
You can use the same queue :)
I can give it a shot (I'm using conda now). What is the overhead of moving to dockers, given that I don't have "docker hands on experience"?
when my system was "clean" I installed cuda 10.1 (never installed cuda 10.2), hope I'm not mistaken
You changed the version from 10.2 to 10.1 and the  nvidia-smi  output is the same? Did you restart after the change?
is there a guide regarding the configuration required for dockers?
Yes we do have a guide: https://github.com/allegroai/trains-agent#starting-the-trains-agent-in-docker-mode
You can also specify the image for the docker. In the example the image is  nvidia/cuda , but you can put a specific one for your needs (maybe  nvidia/cuda:10.1-runtime-ubuntu18.04 ? see the example command below)
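For reference, roughly what that looks like on the command line, per the guide above (the queue name  default  here is just an example):
trains-agent daemon --queue default --docker nvidia/cuda:10.1-runtime-ubuntu18.04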
You don’t really need “docker hands on experience”
is the flow using dockers more supported than conda?
It's the same flow, but running inside a docker image
Is it something that I can configure from the call to task.init? (my goal is that I won't be required to change it manually)
got it thanks!
Is it possible to use different dockers (containing different cuda versions) in different experiments?
or do I have to open different queues for that? (or something like that)
The version of the cudatoolkit inside the experiment is 10.1, but trains tries to work with 10.2, probably for the same reason it shows up in nvidia-smi
Ohhh I thought you changed it from 10.2 to 10.1, my mistake.
What do you get for  nvcc --version ?
How do you clone the tasks? With  Task.clone ? If so, you can use  cloned_task.set_base_docker(<VALUE FOR BASE DOCKER IMAGE>) , see the sketch below
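A minimal sketch of that flow with the trains SDK (the task ID, clone name, docker image and queue name below are placeholders, not taken from your setup):
from trains import Task

# get the experiment you want to clone (replace with your template task's ID)
template_task = Task.get_task(task_id='<template_task_id>')
# clone it and set the docker image a docker-mode agent should run it in
cloned_task = Task.clone(source_task=template_task, name='clone with cuda 10.1 image')
cloned_task.set_base_docker('nvidia/cuda:10.1-runtime-ubuntu18.04')
# enqueue it - an agent running in docker mode will pick it up and start that image
Task.enqueue(cloned_task, queue_name='default')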