Hi All, I Have A Broad Question On How A

Answered

Hi all, I have a broad question on how trains-agent deals with the environment, mainly the CUDA libraries. On my local machine I use conda, and I managed to have the GPUs correctly utilised just with conda install tensorflow-gpu. I installed trains-agent inside this conda env via pip, but when I run trains-agent daemon --gpus all, a new venv is created, and when pip is used for installing the dependencies the GPUs are not utilised. The same happens if I switch from pip to conda and try to install tensorflow-gpu by hardcoding it in the Installed packages.
TL;DR: which is the quickest way to have the GPUs of the worker correctly used by Tensorflow when a task is enqueued to a worker?

In addition to this, I am also experimenting with --docker, using nvidia/cuda:10.1-runtime-ubuntu18.04 as the base image. In this case too, installing tensorflow-gpu via pip doesn't expose the GPUs to the training script. Is there any best practice for exposing GPUs in a worker in docker mode?

Thanks

  				
Posted 4 years ago by OutrageousGrasshopper93


Answers 3

OutrageousGrasshopper93
tensorflow-gpu is not needed: the agent will convert tensorflow to tensorflow-gpu based on the detected CUDA version (you can see it in the summary configuration when the experiment runs inside the docker).

How can I set the base python version for the newly created conda env?

Do you mean inside the docker?

  				
Posted 4 years ago by AgitatedDove14

Thanks!
wrt 1 and 3: my bad, I had too high expectations for the default Docker image 🙂, I thought it was ready to run tensorflow out of the box, but apparently it isn't. I managed to run my experiments with another image.
wrt 2: yes, I already changed the package_manager to conda and added tensorflow-gpu as a dependency, as I do in my local environment, but the environment that is created doesn't have access to the GPUs, while my local one does. How can I set the base python version for the newly created conda env?

  				
Posted 4 years ago by OutrageousGrasshopper93

Hi OutrageousGrasshopper93
1. Which framework are you using? trains-agent will pull the correct torch build based on the CUDA version it detects, but there is no such thing for TF.
2. In the default venv mode, trains-agent creates a new venv for the experiment (not conda), then everything is installed there. If you need conda, you need to change the package_manager to conda: https://github.com/allegroai/trains-agent/blob/de332b9e6b66a2e7c6736d12614de9870eff48bc/docs/trains.conf#L49
3. The safest way to control CUDA drivers / frameworks is to use dockers: you can select the correct docker image for you, and inside the docker the agent will clone the code and install your packages, so you get the benefit of both worlds (controlling the packages on the one hand and selecting the CUDA drivers on the other).
What do you think?
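
For anyone debugging the same thing: before changing images or package managers, it can help to check what the training process actually sees. The snippet below is only a minimal sketch, not part of trains-agent. The function name gpu_report is made up for illustration, and it assumes Python 3.7+ and, for the TensorFlow part, a version that has tf.config.list_physical_devices (2.1+) or the older tf.config.experimental variant. Run it at the top of the training script, in the venv, conda env or docker the agent created.

```python
import os
import shutil
import subprocess


def gpu_report():
    """Collect basic facts about GPU visibility in the current environment.

    Hypothetical helper for illustration; not part of trains-agent.
    """
    report = {
        # None means the variable is unset, so no GPU is being masked by it
        "cuda_visible_devices": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
        "driver_gpus": [],
        "tensorflow_version": None,
        "tensorflow_gpus": None,
    }

    if report["nvidia_smi_found"]:
        try:
            # "nvidia-smi -L" prints one line per GPU the driver exposes
            result = subprocess.run(
                ["nvidia-smi", "-L"], capture_output=True, text=True, timeout=30
            )
            if result.returncode == 0:
                report["driver_gpus"] = [
                    line for line in result.stdout.splitlines() if line.strip()
                ]
        except (OSError, subprocess.SubprocessError):
            pass  # leave driver_gpus empty if the call fails

    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed here; report only the driver-level facts
        return report

    report["tensorflow_version"] = tf.__version__
    try:
        gpus = tf.config.list_physical_devices("GPU")
    except AttributeError:
        # Older releases expose the same call under tf.config.experimental
        gpus = tf.config.experimental.list_physical_devices("GPU")
    report["tensorflow_gpus"] = [gpu.name for gpu in gpus]
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

If driver_gpus lists devices but tensorflow_gpus is empty, the GPUs are reaching the environment and the problem is more likely on the library side (for example CUDA/cuDNN libraries that are missing or do not match what the installed TensorFlow build expects). If driver_gpus is empty too, the GPUs are not exposed to the venv or container at all.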

  				
Posted 4 years ago by AgitatedDove14
