Eureka! Indeed, I managed to make a `docker run` command work with the fix you mentioned (`docker run --gpus '"device=1,2"' nvidia/cuda:9.0-base nvidia-smi`), but trains-agent just appends to `--gpus device=` and there is no way to apply that quoting.
Docker version 19.03.7, build 7141c199a2, on Linux, btw
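The quoting trick can be checked with the stdlib `shlex`, which follows POSIX shell splitting: the outer single quotes are stripped by the shell, so Docker itself still receives the inner double quotes around `device=1,2` (a minimal sketch, stdlib only):

```python
import shlex

# The working command from above; the outer single quotes protect
# the inner double quotes from the shell.
cmd = """docker run --gpus '"device=1,2"' nvidia/cuda:9.0-base nvidia-smi"""

args = shlex.split(cmd)
print(args[3])  # the --gpus value Docker actually sees: "device=1,2"
```

Appending the same value unquoted after `--gpus device=` never produces that inner-quoted token, which is why the agent-built command fails.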
What I can say is that when tasks run locally the task name can have spaces; when executed remotely they cannot. I tried removing the spaces in a remote execution and the artifacts are linked without problems (in both cases they are created just fine on GCS, it's just a matter of linking them in the Server UI)
the task name, in the end, if it helps
yes, looks like. Is it possible?
ok, I got the problem, it isn't really related to spaces or local vs remote. It is the presence of characters like `!`. Indeed the artifacts on GCS are created converting `!` to `%21` and are tracked successfully like this on the server. When the request is sent to actually download the artifacts or to see pictures in Debug Samples, the `%21` is converted back to `!`, and there is no such object in GCS with `!`. Hope it's clear. Not a big deal to me, can just avoid spe...
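The mismatch can be reproduced with the stdlib `urllib.parse` (a minimal sketch; the artifact name is hypothetical): `quote()` percent-encodes `!` the way the upload path does, while `unquote()` is effectively what happens on download, producing a key that doesn't exist in the bucket.

```python
from urllib.parse import quote, unquote

name = "debug!sample.png"        # hypothetical artifact name containing '!'

stored = quote(name)             # what ends up as the GCS object key
print(stored)                    # → debug%21sample.png

requested = unquote(stored)      # what the download request asks for
print(requested)                 # → debug!sample.png (no such object in GCS)

print(requested == stored)       # → False: the two keys no longer match
```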
Sounds odd...
What's the exact project/task name?
And what is the output_uri?
project_name="allegro_mnist_tree_git", task_name="Run from CD + FS"
the output_uri isn't set, but the fileserver is set to the GCS location in trains.conf
and indeed the artifacts and the metrics are correctly stored where supposed to be
thanks!
wrt 1 and 3: my bad, I had too high expectations for the default Docker image 🙂 , thought it was ready to run tensorflow out of the box, but apparently it isn't. I managed to run my rounds with another image.
wrt 2: yes, I already changed the `package_manager` to `conda` and added `tensorflow-gpu` as a dependency, as I do in my local environment, but the environment that is created doesn't have access to the GPUs, as the other one does. How can I set the base python versi...
yes, in general, I want to control the behavior of `git clone`. Is it possible?
Are you working with venv or docker mode?
sorry, important info! Docker mode
Also notice that if you need all GPUs you can pass `--gpus all`
yes, I know, but I need to use 2 out of 4 for a queue