Answered
Hi, I went through this Slack's history and this problem has already come up a couple of times, but it doesn't look like it was solved. My machine currently has 4 GPUs. I have no problems allocating all 4 or just 1 using trains-agent, but I run into trouble when I try to allocate 2. If I run
trains-agent daemon --gpus 0,1 [...]
I receive:
Error response from daemon: cannot set both Count and DeviceIDs on device request.
I tried some of the fixes proposed in this Slack, like --gpus "0,1", but none of them work. With plain Docker I need a weird combination of quotes to make it work:
docker run -it --gpus '"device=0,1"' tensorflow/tensorflow:latest-gpu bash
but apparently this cannot be recreated via trains-agent's --gpus, which just appends its value to the --gpus device= argument. Has anyone managed to make trains work with multiple GPUs (but not all)? Thanks
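
For anyone wondering where the error message comes from: as far as I can tell, the Docker CLI reads the --gpus value as a comma-separated list of fields and treats a bare number as a GPU count. That is an assumption about Docker's behavior, not something confirmed in this thread. The sketch below only mimics that kind of splitting with Python's csv module; split_gpus_value is a hypothetical helper, not part of Docker or trains-agent.

```python
import csv


def split_gpus_value(value):
    """Split a --gpus value the way a CSV reader would (illustration only)."""
    # Assumption: Docker parses the value as one CSV record.
    return next(csv.reader([value]))


# Unquoted: the comma splits the value, leaving a bare "1" next to device=0,
# which would plausibly be read as a count alongside a device ID.
print(split_gpus_value('device=0,1'))    # ['device=0', '1']

# With literal double quotes the whole device list stays in one field.
print(split_gpus_value('"device=0,1"'))  # ['device=0,1']
```

If that reading is right, it would explain why the inner double quotes have to survive the shell and reach Docker itself.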

  
  
Posted 3 years ago

Answers 15


Okay, I'll make sure we always quote with ", since it seems to work either way.
We will release an RC soon, with this fix.
Sounds good?

  
  
Posted 3 years ago

Hi OutrageousGrasshopper93
Are you working with venv or docker mode?
Also notice that if you need all GPUs you can pass --gpus all

  
  
Posted 3 years ago

Okay, checking...

  
  
Posted 3 years ago

BTW:

Error response from daemon: cannot set both Count and DeviceIDs on device request.

Googling it points to a docker issue (which makes sense considering):
https://github.com/NVIDIA/nvidia-docker/issues/1026
What is the host OS?

  
  
Posted 3 years ago

Also, what is the Docker version?

  
  
Posted 3 years ago

no, it's SUSE on a server, and bash

  
  
Posted 3 years ago

yes

  
  
Posted 3 years ago

Are you working with venv or docker mode?

Sorry, important info! Docker mode.

Also notice that if you need all GPUs you can pass --gpus all

Yes, I know, but I need to use 2 out of 4 for a queue.

  
  
Posted 3 years ago

amazing and thanks! keep me posted

  
  
Posted 3 years ago

Indeed, I managed to make a docker run command work with the fix you mentioned ( docker run --gpus '"device=1,2"' nvidia/cuda:9.0-base nvidia-smi ), but trains-agent just appends its value to --gpus device= and there is no way to get the quoting like this.
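
A minimal sketch of how a Python process could pass the quoted form to Docker without fighting the shell. docker_gpus_args is a hypothetical helper written for illustration; it is not trains-agent's actual code, and the real fix may look different. The idea is to make the double quotes part of the argument value itself and hand Docker an argument list, so no shell quoting is involved.

```python
import shlex


def docker_gpus_args(gpu_ids):
    """Build the docker CLI arguments selecting GPUs (hypothetical helper)."""
    if gpu_ids == "all":
        return ["--gpus", "all"]
    ids = ",".join(str(i).strip() for i in gpu_ids)
    # The double quotes are literal characters in the value, so the
    # comma-separated device list reaches docker as a single field.
    return ["--gpus", '"device={}"'.format(ids)]


cmd = ["docker", "run"] + docker_gpus_args([1, 2]) + ["nvidia/cuda:9.0-base", "nvidia-smi"]

# shlex.join (Python 3.8+) shows the equivalent shell command line.
print(shlex.join(cmd))
# docker run --gpus '"device=1,2"' nvidia/cuda:9.0-base nvidia-smi
```

Passing cmd as a list to subprocess.run(cmd) would skip the shell entirely, which is why only the inner double quotes are needed in the value.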

  
  
Posted 3 years ago

Docker version 19.03.7, build 7141c199a2 on Linux, btw

  
  
Posted 3 years ago

Are you using zsh by any chance?

  
  
Posted 3 years ago

Hmm, let me check something

  
  
Posted 3 years ago

Ubuntu? which version?

  
  
Posted 3 years ago

OutrageousGrasshopper93 is "--gpus all" working ?

  
  
Posted 3 years ago