task name, at the end, if it helps
ok, i got the problem, it isn't really related to spaces or local vs remote. It is the presence of characters like !. Indeed, the artifacts on GCS are created converting ! to %21 and are tracked successfully like this on the server. When the request is sent to actually download the artifacts or to see pictures in Debug samples, the %21 is converted back to ! and there is no such object in GCS with !. Hope it's clear. Not a big deal to me, can just avoid spe...
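The mismatch described above can be reproduced with Python's standard urllib.parse, as a minimal sketch. The object key and the dictionary standing in for the GCS bucket are hypothetical; this only illustrates the encode/decode round trip and makes no claim about how trains or GCS implement it internally.

```python
from urllib.parse import quote, unquote

# Hypothetical object key containing "!" (not taken from the actual project).
original_key = "artifacts/run!/model.pkl"

# On upload, "!" is percent-encoded, so the object ends up stored under the
# encoded key ("/" is left alone by quote's default safe set).
stored_key = quote(original_key)
bucket = {stored_key: b"payload"}  # stand-in for the GCS bucket

# On download, the tracked URL is decoded again before the lookup.
requested_key = unquote(stored_key)

print(stored_key)               # key the object was stored under
print(requested_key)            # key the download request asks for
print(requested_key in bucket)  # decoded key has no matching object
print(stored_key in bucket)     # encoded key does
```

If this is indeed what happens, any character outside the URL "unreserved" set (letters, digits, `-`, `.`, `_`, `~`) could trigger the same lookup failure, not only `!`.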
Sounds odd...
What's the exact project/task name?
And what is the output_uri?
project_name="allegro_mnist_tree_git", task_name="Run from CD + FS"
the output_uri isn't set, but the fileserver is set to the GCS location in trains.conf, and indeed the artifacts and the metrics are correctly stored where they are supposed to be
yes, looks like. Is it possible?
Are you working with venv or docker mode?
sorry, important info! Docker mode
Also notice that if you need all GPUs you can pass --gpus all
yes, i know, but i need to use 2 out of 4 for a queue
yes, in general, i want to control the behavior of git clone. Is it possible?
indeed, i managed to get a docker run command to work with the fix you mentioned (docker run --gpus '"device=1,2"' nvidia/cuda:9.0-base nvidia-smi) but trains-agent just appends to --gpus device= and there is no way to make the quoting like this
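To make the quoting issue concrete, here is a small sketch using Python's standard shlex module. It shows what the docker CLI receives from the working shell command, and how the same argument list could be built directly (for example for subprocess.run). The `devices` list and `argv_direct` are illustrative names; how trains-agent assembles its own command line is not shown here and is not assumed.

```python
import shlex

# The command that worked when typed in a shell (quoted from the message above).
shell_command = """docker run --gpus '"device=1,2"' nvidia/cuda:9.0-base nvidia-smi"""

# The shell strips the outer single quotes; docker receives the inner double quotes.
argv = shlex.split(shell_command)
gpus_value = argv[argv.index("--gpus") + 1]
print(gpus_value)

# Building the same argument list without a shell: the double quotes are part
# of the value itself, so they go inside the Python string.
devices = [1, 2]  # illustrative: 2 of the 4 GPUs
argv_direct = [
    "docker", "run",
    "--gpus", '"device={}"'.format(",".join(str(d) for d in devices)),
    "nvidia/cuda:9.0-base", "nvidia-smi",
]

# shlex.quote shows how that value has to be written when a shell is involved.
print(shlex.quote(argv_direct[3]))
```

The inner double quotes appear to be needed because the comma in the device list would otherwise be read as a field separator, which is why appending a bare device=1,2 after --gpus does not behave the same way.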
thanks!
wrt 1 and 3: my bad, i had too high expectations for the default Docker image 🙂 , thought it was ready to run tensorflow out of the box, but apparently it isn't. I managed to run my rounds with another image.
wrt 2: yes, i already changed the package_manager to conda and added tensorflow-gpu as dependency, as i do in my local environment, but the environment that is created doesn't have access to the GPUs, as the other one does. How can i set the base python versi...
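One way to narrow down why a freshly created environment cannot see the GPUs is to run a small stdlib-only check inside it. This is a rough diagnostic sketch, not part of trains; the function name `gpu_visibility_report` is made up for illustration. It only looks at the usual environment variables and at what `nvidia-smi -L` lists, if that tool is on the PATH.

```python
import os
import shutil
import subprocess


def gpu_visibility_report():
    """Collect basic hints about whether this environment can see NVIDIA GPUs."""
    report = {
        # Variables commonly used to restrict which GPUs a process may use.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "NVIDIA_VISIBLE_DEVICES": os.environ.get("NVIDIA_VISIBLE_DEVICES"),
        "nvidia_smi_path": shutil.which("nvidia-smi"),
        "gpus": [],
    }
    if report["nvidia_smi_path"]:
        try:
            # "nvidia-smi -L" prints one "GPU <index>: ..." line per visible GPU.
            result = subprocess.run(
                [report["nvidia_smi_path"], "-L"],
                capture_output=True, text=True, check=False,
            )
            report["gpus"] = [
                line for line in result.stdout.splitlines()
                if line.startswith("GPU ")
            ]
        except OSError:
            pass  # tool present but not runnable; leave the list empty
    return report


if __name__ == "__main__":
    for key, value in gpu_visibility_report().items():
        print(f"{key}: {value}")
```

Comparing the output between the working image and the conda-based environment would show whether the GPUs are hidden at the container level or only from the Python packages.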
Docker version 19.03.7, build 7141c199a2 on Linux, btw
what i can say is that when tasks are running locally the task name can have spaces, when executed remotely they cannot. I tried removing the spaces in a remote execution and the artifacts are linked without problems (in both cases they are created just fine on GCS, it's just a matter of linking them in the Server UI)
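As a workaround sketch, a task name can be normalized before it is used so that it contains only characters that URL encoding leaves untouched. The helper `url_safe_name` below is hypothetical and not part of trains; whether the server actually requires this is an assumption based on the behavior described in this thread.

```python
import re
from urllib.parse import quote


def url_safe_name(name, replacement="_"):
    """Replace every character that percent-encoding would alter."""
    # Keep only URL "unreserved" characters: letters, digits, "_", ".", "~", "-".
    return re.sub(r"[^A-Za-z0-9_.~-]", replacement, name)


name = "Run from CD + FS"
safe = url_safe_name(name)
print(safe)

# The sanitized name survives percent-encoding unchanged.
print(quote(safe, safe="") == safe)
```

The resulting string could then be passed as the task name in place of the original one.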