Hey!
I would like to connect to the same task from multiple consumers and upload debug images.
Is it possible?
It seems like I can connect to the task and get the logger, but nothing is uploaded.
FranticCormorant35 As far as I understand, what you have is a multi-node setup that you manage yourself, something like Horovod, Torch distributed, or any MPI setup. Trains supports all of these standard multi-node setups, so the easiest way is to do the following:
On the master node, set the OS environment variable: OMPI_COMM_WORLD_NODE_RANK=0
Then on each client node: OMPI_COMM_WORLD_NODE_RANK=unique_client_node_number
In all processes you can call Task.init, with all the automagic kicking in. The master node will be the only one registering the execution section of the experiment (i.e. git, arg parser, etc.), while all the rest will be logged as usual (console output, TensorBoard, matplotlib, etc.).
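Something along these lines might work as a starting point. It is only a rough sketch: the project and task names, the helper functions node_env and report_debug_image, and the numpy image input are all made up for illustration, and I have not run it on a real multi-node setup.

```python
import os

# Variable name taken from the steps above; 0 marks the master node.
RANK_ENV_VAR = "OMPI_COMM_WORLD_NODE_RANK"


def node_env(node_rank: int) -> dict:
    """Return the environment entry for one node (0 = master, >0 = client)."""
    if node_rank < 0:
        raise ValueError("node_rank must be 0 (master) or a positive client number")
    return {RANK_ENV_VAR: str(node_rank)}


def report_debug_image(node_rank: int, iteration: int, image) -> None:
    """Call Task.init on this node and report one debug image.

    `image` is assumed to be a numpy array (H x W x 3, uint8).
    """
    # Set the rank before importing Trains, as a precaution.
    os.environ.update(node_env(node_rank))
    from trains import Task

    # Hypothetical project and task names, use your own.
    task = Task.init(project_name="examples", task_name="multi-node debug images")
    logger = task.get_logger()
    # One series per node, so images from different nodes stay separate.
    logger.report_image(
        title="debug",
        series="node_%d" % node_rank,
        iteration=iteration,
        image=image,
    )
    # Ask the logger to send pending reports before the process moves on.
    logger.flush()
```

On the master you would call report_debug_image(0, ...), and on each client the same function with its own unique number. Setting the variable in the shell before launching the script should work just as well as setting it from Python.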
How does that sound?