Answered
Hi All, I Have A Broad Question On How A

Hi all, I have a broad question on how trains-agent deals with the environment, mainly the CUDA libraries. On my local machine I use conda, and I managed to get the GPUs correctly utilised just with conda install tensorflow-gpu. I installed trains-agent inside this conda env via pip, but when I run trains-agent daemon --gpus all, a new venv is created, and when pip is used to install the dependencies the GPUs are not utilised. The same happens if I switch from pip to conda and hardcode tensorflow-gpu in the Installed packages.
TL;DR: what is the quickest way to have the worker's GPUs correctly used by TensorFlow when a task is enqueued to that worker?

In addition to this, I am also experimenting with --docker, using nvidia/cuda:10.1-runtime-ubuntu18.04 as the base image. In this case too, installing tensorflow-gpu via pip doesn't expose the GPUs to the training script. Is there a best practice for exposing GPUs to a worker in docker mode?

Thanks

  
  
Posted 3 years ago

Answers 3


OutrageousGrasshopper93
tensorflow-gpu is not needed; the agent will convert tensorflow to tensorflow-gpu based on the detected CUDA version (you can see it in the summary configuration when the experiment spins up inside the docker).

"How can I set the base python version for the newly created conda env?"

You mean inside the docker?

  
  
Posted 3 years ago

Thanks!
wrt 1 and 3: my bad, I had too high expectations for the default Docker image 🙂. I thought it was ready to run TensorFlow out of the box, but apparently it isn't. I managed to run my rounds with another image.
wrt 2: yes, I already changed the package_manager to conda and added tensorflow-gpu as a dependency, as I do in my local environment, but the environment that is created doesn't have access to the GPUs, while my local one does. How can I set the base python version for the newly created conda env?
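
For anyone comparing two environments like this, a small check run inside each one can show whether TensorFlow itself sees a GPU. This is only a sketch: the helper name visible_tensorflow_gpus is made up for illustration, and it assumes a TensorFlow version that provides list_physical_devices, either directly under tf.config or under tf.config.experimental.

```python
def visible_tensorflow_gpus():
    """Return the GPU device names TensorFlow reports, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is missing from this environment, so there is nothing to query.
        return None

    # Newer releases expose list_physical_devices directly on tf.config;
    # older ones only have it under tf.config.experimental.
    list_devices = getattr(tf.config, "list_physical_devices", None)
    if list_devices is None:
        list_devices = tf.config.experimental.list_physical_devices

    return [device.name for device in list_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_tensorflow_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment")
    elif not gpus:
        print("TensorFlow is installed but reports no GPUs")
    else:
        print("TensorFlow sees:", ", ".join(gpus))
```

An empty list in the agent-created environment but not in the local one would point at the installed packages or CUDA libraries rather than at the training code.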

  
  
Posted 3 years ago

Hi OutrageousGrasshopper93
Which framework are you using? trains-agent will pull the correct torch based on the CUDA version it detects, but there is no such thing for TF. In the default venv mode, trains-agent creates a new venv for the experiment (not conda), then everything is installed there. If you need conda, you need to change the package_manager to conda: https://github.com/allegroai/trains-agent/blob/de332b9e6b66a2e7c6736d12614de9870eff48bc/docs/trains.conf#L49 The safest way to control CUDA drivers / frameworks is to use dockers. You can then select the correct docker image for you; inside the docker the agent will clone the code and install your packages, so you get the benefit of both worlds (controlling the packages on the one hand and selecting the CUDA drivers on the other). What do you think?
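
When choosing between docker images or agent-created environments, it can help to check what each one actually exposes to a Python process. The following is a minimal sketch using only the standard library. The function name gpu_environment_report and the default list of library names are assumptions chosen for illustration, not part of trains-agent. Note that ctypes.util.find_library only searches the system's standard library locations, so libraries shipped inside a conda env may show up as None even when they are usable.

```python
import ctypes.util
import os
import shutil


def gpu_environment_report(libraries=("cuda", "cudart", "cudnn")):
    """Collect basic hints about whether GPU tooling is visible to this process."""
    report = {
        # nvidia-smi is normally on PATH when the NVIDIA driver tools are available.
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        # These variables, when set, restrict which GPUs a process or container can see.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "NVIDIA_VISIBLE_DEVICES": os.environ.get("NVIDIA_VISIBLE_DEVICES"),
    }
    for name in libraries:
        # find_library returns a file name such as "libcudart.so.10.1", or None if not found.
        report["lib" + name] = ctypes.util.find_library(name)
    return report


if __name__ == "__main__":
    for key, value in gpu_environment_report().items():
        print(f"{key}: {value}")
```

Running it inside the container and inside the venv the agent created gives two reports that can be compared line by line; a missing entry for one of the libraries is a hint that the chosen image lacks something the framework needs.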

  
  
Posted 3 years ago