Makes sense
So I assume trains expects nvidia-docker to be installed on the agent machine?
Moreover, since I'm going to use Task.execute_remotely
(and not go through the UI), is there a way to specify the docker image to be used from code?
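For context, something like this is what I'm after - a rough sketch, assuming task.set_base_docker is the right call for this, and the project/task/queue names are just placeholders:
```python
from trains import Task

task = Task.init(project_name="examples", task_name="remote run")  # placeholder names

# Ask the agent to run this task inside a specific docker image
task.set_base_docker("nvidia/cuda:10.2-base-ubuntu18.04")

# Stop local execution and enqueue the task for a trains-agent to pick up
task.execute_remotely(queue_name="default", exit_process=True)
```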
Yep, the trains server is basically a docker-compose based service.
All you have to do is change the ports in the docker-compose.yml file.
If you followed the instructions in the docs you should find that file in /opt/trains/docker-compose.yml, and then you will see that there are multiple services (apiserver, elasticsearch, redis etc.) and in each there might be a section called ports which states the mapping of the ports.
The number on the left, is ...
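For reference, a ports section in that file looks roughly like this (the service and the numbers here are only an illustration, check your own docker-compose.yml for the real values):
```yaml
services:
  apiserver:
    ports:
      - "8008:8008"   # host port on the left, container port on the right
```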
Okay, looks interesting but actually there is no final task, this is the pipeline layout
And yes, it makes perfect sense, thanks for the answer
I want to collect the dataframes from the red tasks, and display them in the pipeline task
and the machine I have is on CUDA 10.2.
I also tried nvidia/cuda:10.2-base-ubuntu18.04 which is the latest
AgitatedDove14
So nope, this doesn't solve my case, I'll explain the full use case from the beginning.
I have a pipeline controller task, which launches 30 tasks. Semantically there are 10 applications, and I run 3 tasks for each (those 3 are sequential, so in the UI it looks like 10 lines of 3 tasks).
In one of those 3 tasks that run for every app, I save a dataframe under the name "my_dataframe".
What I want to achieve is, once all the tasks are over, to collect all those "my_dataframe" arti...
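A rough sketch of what I mean (the project name and the status filter here are just assumptions on my side, not something I've verified):
```python
from trains import Task
import pandas as pd

# Hypothetical lookup: find the finished child tasks of the pipeline (project name is made up)
child_tasks = Task.get_tasks(project_name="my_project", task_filter={"status": ["completed"]})

# Pull the "my_dataframe" artifact from every task that has one
dataframes = []
for t in child_tasks:
    artifact = t.artifacts.get("my_dataframe")
    if artifact is not None:
        dataframes.append(artifact.get())  # downloads and deserializes the stored dataframe

# Combine everything and attach it to the pipeline controller task
combined = pd.concat(dataframes, ignore_index=True)
pipeline_task = Task.current_task()
pipeline_task.upload_artifact("all_dataframes", combined)
```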
I guess there aren't many tensorflowers running agents around here if this wasn't brought up already
glad I managed to help back in some way
Okay, so if my Python script imports some other scripts I've written - I must use git?
What if I want it to use SSH creds?
could be 192.168.1.255?
Thanks very much
Now something else is failing, but I'm pretty sure its on my side now... So have a good day and see you in the next question 😄
Oh, I get it, I thought it was only a UI issue... but it actually doesn't send it O_O
I was sure you were on Israel time as well, sorry for the night-time thing 😄
I assume trains passes it as is, so I think the quoting I mentioned might work
Thanks a lot, that clarifies things
this is the full one TimelyPenguin76
By the way, just from inspecting it, the CUDA version in the output of nvidia-smi matches the driver installed on the host, not the container - look at the image below
This error just keeps coming back... I already set the watermarks to something like 0.5GB
Very nice, thanks, I'm going to try the SA server + agents setup this week, let's see how it goes ✌