Didn't use it so far, but I will start 🙂
How do you clone the tasks? With Task.clone? If so, you can use cloned_task.set_base_docker(<VALUE FOR BASE DOCKER IMAGE>)
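For reference, a minimal sketch of that clone-and-override flow. It assumes the trains Python package and an already registered source task; the helper name clone_with_docker_image and the "default" queue name are illustrative, not part of trains. The Task class is passed in as an argument so the helper can be tried out without a server.

```python
def clone_with_docker_image(task_cls, source_task_id, docker_image, queue_name):
    """Clone an existing task, override its base docker image, enqueue the clone.

    task_cls is expected to be trains.Task (passed in rather than imported
    here, so the helper can be exercised without a running server).
    """
    source = task_cls.get_task(task_id=source_task_id)
    # Name the clone after the image so the variants are easy to tell apart
    cloned = task_cls.clone(
        source_task=source,
        name="{} ({})".format(source.name, docker_image),
    )
    # Same effect as editing BASE DOCKER IMAGE in the EXECUTION section of the UI
    cloned.set_base_docker(docker_image)
    # Every clone can go to the same queue; the image is stored per task
    task_cls.enqueue(cloned, queue_name=queue_name)
    return cloned


# Usage against a real server (requires the trains package and credentials):
#   from trains import Task
#   clone_with_docker_image(
#       Task, "<SOURCE TASK ID>", "nvidia/cuda:10.1-runtime-ubuntu18.04", "default")
```

Calling it once per image gives you one experiment per CUDA version, all served from a single queue by an agent running in docker mode.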
Is it something that I can configure from the call to Task.init? (My goal is that I won't be required to change it manually.)
Actually you can: when you clone an experiment, in the EXECUTION section you can change the BASE DOCKER IMAGE to the image you'd like the experiment to run with. This way you can use different docker images for different experiments.
You can use the same queue :)
got it thanks!
Is it possible to use different dockers (containing different CUDA versions) in different experiments?
Or do I have to open different queues for that? (or something like that)
is there a guide regarding the configuration required for dockers?
Yes we do have a guide: https://github.com/allegroai/trains-agent#starting-the-trains-agent-in-docker-mode
You can also specify the image for the docker. In the example the image is nvidia/cuda, but you can put a specific one for your needs (maybe nvidia/cuda:10.1-runtime-ubuntu18.04?)
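A small sketch of what starting the agent in docker mode with a specific image looks like, following the guide linked above. The helper docker_mode_agent_command is hypothetical; it only assembles the trains-agent daemon command line (the --queue and --docker flags), and the "default" queue name is an assumption.

```python
import shlex


def docker_mode_agent_command(queue="default",
                              image="nvidia/cuda:10.1-runtime-ubuntu18.04"):
    """Build the argument list for running trains-agent in docker mode."""
    return ["trains-agent", "daemon", "--queue", queue, "--docker", image]


# Print the command so it can be pasted into a shell; to launch it from
# Python instead, pass the list to subprocess.run(...).
print(" ".join(shlex.quote(arg) for arg in docker_mode_agent_command()))
```

The image given here is only the agent's default; an experiment with its own BASE DOCKER IMAGE set will run in that image instead.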
I can give it a shot (I’m using conda now). What is the overhead of moving to dockers, given that I don't have “docker hands on experience”?
You don’t really need “docker hands on experience”
Is the flow using dockers more supported than conda?
It's the same flow, but running inside a docker image
BTW, what about running trains-agent in docker mode? That can solve all your cuda issues
The version of cudatoolkit is 10.1 inside the experiment, and trains tries to work with 10.2, probably for the same reason 10.2 is displayed in nvidia-smi
Ohhh I thought you changed it from 10.2 to 10.1, my mistake.
What do you get for nvcc --version?
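A quick way to compare the two numbers side by side. As far as I know, the CUDA version in the nvidia-smi header is the highest version the installed driver supports, while nvcc --version reports the toolkit that is actually installed, so a 10.2 / 10.1 split is not necessarily a problem. The sketch below assumes the usual "CUDA Version: X.Y" and "release X.Y" wording in the two outputs; the function names are illustrative.

```python
import re
import subprocess


def driver_cuda_version(nvidia_smi_output):
    """Return the 'CUDA Version: X.Y' value from nvidia-smi output, or None."""
    match = re.search(r"CUDA Version:\s*([0-9]+\.[0-9]+)", nvidia_smi_output)
    return match.group(1) if match else None


def toolkit_cuda_version(nvcc_output):
    """Return the 'release X.Y' value from `nvcc --version` output, or None."""
    match = re.search(r"release\s+([0-9]+\.[0-9]+)", nvcc_output)
    return match.group(1) if match else None


def run(cmd):
    """Run a command and return its stdout, or '' if the tool is not installed."""
    try:
        return subprocess.run(cmd, capture_output=True, text=True).stdout
    except FileNotFoundError:
        return ""


if __name__ == "__main__":
    print("driver (nvidia-smi):", driver_cuda_version(run(["nvidia-smi"])))
    print("toolkit (nvcc):     ", toolkit_cuda_version(run(["nvcc", "--version"])))
```

If nvcc is missing (common when CUDA only comes from conda's cudatoolkit package), the second line prints None and the conda package version is the one to look at.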
When my system was "clean" I installed CUDA 10.1 (never installed CUDA 10.2), hope I'm not mistaken
You changed the version from 10.2 to 10.1 and the nvidia-smi output is the same? Did you do a restart after the change?
Hi TimelyPenguin76
you are right, it says CUDA version 10.2 (even though I installed only CUDA 10.1, weird)
do you know why it's 10.2?
and do you know why trains relies on that? (instead of looking in the python environment of the executed script?)
Hi RattySeagull0 ,
If not specified, the value for cuda_version is taken from nvidia-smi. Can you share your output for nvidia-smi?