Hi, I went through this Slack's history and this problem has come up a couple of times, but it doesn't look like it was ever solved. My machine currently has 4 GPUs. I have no problems allocating all 4 or just 1 using trains-agent, but I run into problems when I try to allocate 2. If I run
trains-agent daemon --gpus 0,1 [...]
I receive:
Error response from daemon: cannot set both Count and DeviceIDs on device request.
I tried some of the fixes proposed in this Slack, like ( --gpus "0,1" ), but none of them work. With plain Docker I need a weird combination of quotes to make it work:
docker run -it --gpus '"device=0,1"' tensorflow/tensorflow:latest-gpu bash
but apparently this cannot be reproduced using --gpus via trains-agent, which just appends the value to the --gpus device= argument. Has anyone managed to make trains work with multiple GPUs (but not all)? Thanks

  				
Posted 5 years ago by OutrageousGrasshopper93


Answers 15

OutrageousGrasshopper93, is "--gpus all" working?

  				
Posted 5 years ago by AgitatedDove14

yes

  				
Posted 5 years ago by OutrageousGrasshopper93

Okay, checking...

  				
Posted 5 years ago by AgitatedDove14

Ubuntu? Which version?

  				
Posted 5 years ago by AgitatedDove14

Hi OutrageousGrasshopper93,
Are you working with venv or Docker mode?
Also note that if you need all GPUs, you can pass --gpus all

  				
Posted 5 years ago by AgitatedDove14

"Are you working with venv or Docker mode?"

Sorry, important info! Docker mode.

"Also note that if you need all GPUs, you can pass --gpus all"

Yes, I know, but I need to use 2 out of 4 for a queue.

  				
Posted 5 years ago by OutrageousGrasshopper93

Also, what is the Docker version?

  				
Posted 5 years ago by AgitatedDove14

BTW:

Error response from daemon: cannot set both Count and DeviceIDs on device request.

Googling it points to a Docker issue (which makes sense, since the error response comes from the Docker daemon):
https://github.com/NVIDIA/nvidia-docker/issues/1026
What is the host OS?
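
A rough sketch of why the unquoted value might trigger this error. It assumes Docker splits the --gpus value on commas the way a CSV reader would, so that device=0,1 becomes device=0 plus a bare 1 that is then read as a count. That is an assumption about Docker's behavior, not something confirmed in this thread. Python's csv module only stands in for Docker's parser here, and split_gpus_value is a hypothetical helper, not part of Docker or trains-agent.

```python
import csv


def split_gpus_value(value):
    """Split a --gpus value into comma-separated fields, honoring double quotes.

    Hypothetical helper: it mimics CSV-style splitting for illustration only
    and is not Docker's actual parser.
    """
    return next(csv.reader([value]))


# Unquoted: the comma separates two fields, so "1" stands alone.
print(split_gpus_value('device=0,1'))    # ['device=0', '1']

# With literal double quotes, the device list stays in a single field.
print(split_gpus_value('"device=0,1"'))  # ['device=0,1']
```

If that assumption holds, the inner double quotes are what keep 0,1 together as one device list, which would explain why '"device=0,1"' works from the shell.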

  				
Posted 5 years ago by AgitatedDove14

Okay, I'll make sure we always quote with " , since it seems to work either way.
We will release an RC with this fix soon.
Sounds good?

  				
Posted 5 years ago by AgitatedDove14

No, it's SUSE on a server, and bash.

  				
Posted 5 years ago by OutrageousGrasshopper93

Hmm, let me check something

  				
Posted 5 years ago by AgitatedDove14

Indeed, I managed to get a docker run command working with the fix you mentioned ( docker run --gpus '"device=1,2"' nvidia/cuda:9.0-base nvidia-smi ), but trains-agent just appends the value to --gpus device= and there is no way to get the quoting to come out like this.
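
To make the quoting concrete, here is a minimal sketch of building such a command as an argument list, where the literal double quotes have to be part of the --gpus value itself. The function build_docker_cmd is hypothetical and is not how trains-agent builds its command; shlex.join (Python 3.8+) is used only to show the equivalent shell line.

```python
import shlex


def build_docker_cmd(gpu_ids, image, command):
    """Build a docker run argument list that selects a subset of GPUs.

    Hypothetical helper for illustration; not taken from trains-agent.
    """
    ids = ",".join(str(i) for i in gpu_ids)
    # The double quotes are embedded in the value that docker receives,
    # so the device list is passed as one quoted field.
    gpus_value = f'"device={ids}"'
    return ["docker", "run", "--gpus", gpus_value, image, *command]


cmd = build_docker_cmd([1, 2], "nvidia/cuda:9.0-base", ["nvidia-smi"])

# Show the shell-escaped form of the argument list.
print(shlex.join(cmd))
# docker run --gpus '"device=1,2"' nvidia/cuda:9.0-base nvidia-smi
```

When the list is passed to something like subprocess.run without a shell, no outer single quotes are needed, because no shell is there to strip the inner double quotes.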

  				
Posted 5 years ago by OutrageousGrasshopper93

Docker version 19.03.7, build 7141c199a2 on Linux, btw

  				
Posted 5 years ago by OutrageousGrasshopper93

Amazing, and thanks! Keep me posted.

  				
Posted 5 years ago by OutrageousGrasshopper93

Are you using zsh by any chance?

  				
Posted 5 years ago by AgitatedDove14
