Answered
Hi all, after solving my multiprocessing issue I've found the following issue: I have a machine with 2 GPUs. Starting an agent there specifying

Hi all,
After solving my multiprocessing issue I've found the following issue:
I have a machine with 2 GPUs. Starting an agent there with --gpus all didn't work: it used the CPU instead, and torch.cuda.is_available() returned False. It does work when specifying --gpus 0,1, so I guess it's a bug in the agent. The agent version used was 0.14.2rc2.
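A quick way to narrow this kind of problem down is to look at the GPU-related environment variables the process actually inherits. The sketch below is only a diagnostic aid, not part of the agent: show_var is a hypothetical helper, and which of these variables the agent sets in a given mode is an assumption to verify. CUDA_VISIBLE_DEVICES is the standard CUDA runtime variable, and NVIDIA_VISIBLE_DEVICES is the one read by the NVIDIA container tooling.

```shell
# Hypothetical helper: print an exported variable, or "<unset>" if it is missing.
show_var() {
    # printenv exits non-zero when the variable is not in the environment
    if value=$(printenv "$1"); then
        echo "$1=$value"
    else
        echo "$1=<unset>"
    fi
}

show_var NVIDIA_VISIBLE_DEVICES
show_var CUDA_VISIBLE_DEVICES
```

Running this from a task started with --gpus all and again with --gpus 0,1, next to a torch.cuda.is_available() check, should show whether the two modes hand the process different values.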

  
  
Posted 3 years ago

Answers 7


AgitatedDove14 I'm using this code in the meantime:
` # This script counts the GPUs, builds a list like 0,1,2,...
# and prints it prefixed with '--gpus'.

# nvidia-smi -L prints one line per GPU
NUM_GPUS=$(nvidia-smi -L | wc -l)
# Index of the last GPU (-1 when no GPU was found)
NUM_GPUS=$((NUM_GPUS-1))
OUT=()
if [ "$NUM_GPUS" -ge 0 ]
then
    for i in $(seq 0 "$NUM_GPUS"); do OUT+=( "$i" ); done
    # Join the indices with commas and prepend the flag, e.g. --gpus 0,1
    echo "${OUT[*]}" | tr ' ' ',' | awk '{print "--gpus "$1}'
else
    echo ""
fi `
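For anyone reusing this workaround, here is a rough sketch of how the generated flag might be spliced into the agent command line. It is a dry run that only prints the command. gpus_arg is a hypothetical helper that takes the GPU count as an argument (so it can be tried without nvidia-smi), and the queue name default is an assumption.

```shell
# Hypothetical helper: turn a GPU count into the --gpus argument.
gpus_arg() {
    count=$1
    if [ "$count" -gt 0 ]; then
        # seq prints 0..count-1 one per line; paste joins the lines with commas
        echo "--gpus $(seq 0 $((count - 1)) | paste -sd, -)"
    fi
}

# Dry run: print the command instead of executing it.
# "default" is an assumed queue name; $(gpus_arg 2) is left unquoted on purpose
# so that the flag and its value become two separate words.
echo trains-agent daemon --queue default $(gpus_arg 2)
```

On a real machine the count would come from nvidia-smi -L | wc -l, as in the script above. With a count of 0 the helper prints nothing, so the agent would start without a --gpus flag.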

  
  
Posted 3 years ago

P.S. Any chance you can get me the NVIDIA driver version? I can't seem to find the one for v22 on Amazon.

  
  
Posted 3 years ago

PompousBeetle71, these are CUDA versions. I'm looking for the NVIDIA driver version, for example 440.xx or 418.xx.
The reason is that we set an OS environment variable for the driver, and I remember that old drivers did not support it. Basically, they do not support NVIDIA_VISIBLE_DEVICES=all, so I'm trying to see if that's the case here; if so, we could add a fix.
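The driver version (as opposed to the CUDA version) can be read from nvidia-smi. The snippet below is a small sketch of that: driver_major is a hypothetical helper that extracts the major number (the 440 in 440.xx), and nothing here encodes which driver versions do or do not support NVIDIA_VISIBLE_DEVICES=all.

```shell
# Hypothetical helper: extract the major version from a string such as "440.33.01".
driver_major() {
    # Strip everything from the first dot onwards
    echo "${1%%.*}"
}

# nvidia-smi may be missing on a CPU-only machine, so guard the call.
if command -v nvidia-smi >/dev/null 2>&1; then
    # The query prints one line per GPU; the first line is enough here.
    version=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
    echo "driver $version (major $(driver_major "$version"))"
else
    echo "nvidia-smi not found"
fi
```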

  
  
Posted 3 years ago

AgitatedDove14 I can't try the new agent at the moment. The OS is Ubuntu 18.04, more specifically amazon/Deep Learning Base AMI (Ubuntu 18.04) Version 22.0, with no Docker: the agent runs directly on the machine.

  
  
Posted 3 years ago

PompousBeetle71, what you are saying is that, for some reason, --gpus all does not configure the NVIDIA drivers to use all the GPUs when running on bare metal (i.e. no Docker). Did I understand you correctly?

  
  
Posted 3 years ago

AgitatedDove14 yes, you're right. It was 10.2 or 10.1, if I recall.

  
  
Posted 3 years ago

PompousBeetle71, could you try trains-agent 0.15.0rc0? What OS are you using? Are you running in docker mode? If so, what's the Docker version?

  
  
Posted 3 years ago