Hi OutrageousGrasshopper93
which framework are you using? trains-agent will pull the correct torch based on the cuda version it detects, but there is no such mechanism for TF.
In the default venv mode, trains-agent creates a new venv for the experiment (not conda) and everything is installed there. If you need conda, you need to change the package_manager to conda: https://github.com/allegroai/trains-agent/blob/de332b9e6b66a2e7c6736d12614de9870eff48bc/docs/trains.conf#L49
The safest way to control CUDA drivers / frameworks is to use dockers: you select the docker image that fits, and inside the docker the agent will clone the code and install your packages, so you get the benefit of both worlds (controlling the packages on the one hand and selecting the cuda drivers on the other). What do you think?
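For reference, the relevant part of trains.conf looks roughly like this (a sketch from memory, so double-check the key names against the sample conf linked above; the docker image is just an example, pick one matching the CUDA version you need):

    agent {
        package_manager {
            # switch from the default "pip" to conda
            type: conda
        }
        # image used when the agent runs in docker mode and the task does not specify one
        default_docker {
            image: "nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04"
        }
    }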
thanks!
wrt 1 and 3: my bad, I had too high expectations for the default Docker image 🙂, thought it was ready to run tensorflow out of the box, but apparently it isn't. I managed to run my runs with another image.
wrt 2: yes, I already changed the package_manager to conda and added tensorflow-gpu as a dependency, as I do in my local environment, but the environment that is created doesn't have access to the GPUs, as the other one does. How can I set the base python version for the newly created conda env?
OutrageousGrasshopper93, tensorflow-gpu is not needed; the agent will convert tensorflow to tensorflow-gpu based on the detected cuda version (you can see it in the summary configuration when the experiment runs inside the docker)
How can I set the base python version for the newly created conda env?
You mean inside the docker?
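If you mean the python the agent uses when it builds the env for the experiment, I think the knob is python_binary in the agent section of trains.conf (from memory, so treat this as a guess and verify against the sample conf):

    agent {
        # point the agent at the interpreter version the new env should be based on
        python_binary: "/usr/bin/python3.6"
    }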