Answered

Hi, I recently started evaluating Trains. Given that TensorBoard is much more mature, and our team is used to it, I think it is likely we won’t want to stop using TensorBoard completely and just switch to Trains. But I am thinking it could be pretty useful if Trains had a feature where you could select a set of models and it would launch a TensorBoard instance to serve the logs from that set of models. Is that something that could happen?

  
  
Posted 4 years ago

Answers 4


Hi LivelyLion31, I missed your S3 question, apologies. What did you guys end up doing?
BTW, you could always upload the entire TB log folder as an artifact; it's as simple as task.upload_artifact('tensorboard', './tblogsfolder')
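
For reference, a minimal sketch of what that could look like at the end of a training script, assuming the TensorBoard event files are written to ./tblogsfolder (the project/task names and folder path are placeholders):

from trains import Task

# Register this training run with the Trains server (placeholder names)
task = Task.init(project_name='examples', task_name='training-with-tb-logs')

# ... training code that writes TensorBoard event files to ./tblogsfolder ...

# Upload the whole TensorBoard log folder as a single artifact
task.upload_artifact('tensorboard', artifact_object='./tblogsfolder')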

  
  
Posted 4 years ago

Hi LivelyLion31
Yes, the automagic integration in Trains was designed for exactly that reason: users do not need to learn another package, and with almost no effort you get most of the benefits.
Regarding the TB files: from experience, most users look at them shortly after executing the experiment, usually for debugging and in-depth capabilities (like the network debugger, profiler, etc.). Viewing metrics is much easier to do on a centralized server (like the Trains Server).
So we could not find good use cases for constantly storing the TB protobuf files on the backend (they are extremely large!).
That said, you can always upload the TB protobuf as an artifact at the end of the experiment:
Task.current_task().upload_artifact('tensorboard', '/tmp/my.tensorboard_file/pb')

If you feel that spinning up a TensorBoard instance serving all the TB logs is something you will use, you can quickly write code that does just that and launch it with trains-agent. There is a nice example of using trains-agent to spin up a Jupyter Notebook server, which can serve as a good reference:
https://github.com/allegroai/trains/blob/master/examples/execute_jupyter_notebook_server.py
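
As a rough sketch (not an official utility), a small script along these lines could be executed by trains-agent, assuming each experiment uploaded its log folder as an artifact named 'tensorboard' as above; the task IDs, project/task names, and local paths are placeholders:

import os
import shutil
import subprocess

from trains import Task

# IDs of the experiments whose TensorBoard logs we want to serve (placeholders)
TASK_IDS = ['<task_id_1>', '<task_id_2>']

# Register the serving job as its own task, so trains-agent can launch it on any machine
Task.init(project_name='DevOps', task_name='tensorboard server')

logdir = '/tmp/tb_logs'
os.makedirs(logdir, exist_ok=True)

for task_id in TASK_IDS:
    t = Task.get_task(task_id=task_id)
    # Download the uploaded 'tensorboard' artifact (the log folder) to a local path
    local_copy = t.artifacts['tensorboard'].get_local_copy()
    # Give each experiment its own sub-folder so TensorBoard shows them as separate runs
    shutil.copytree(local_copy, os.path.join(logdir, t.name))

# One TensorBoard instance serving all the selected experiments
subprocess.check_call(['tensorboard', '--logdir', logdir, '--bind_all'])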

  
  
Posted 4 years ago

We haven’t done anything about it yet, but we are planning to try out a few experiment management systems soon, including Trains.

  
  
Posted 4 years ago

OK, I will look into agents and think about this. One pain we have is that TensorBoard logs are stuck on the machine used for training, and we can’t compare models training on two different machines in one TensorBoard (unless they mount the same network filesystem). But it is also important to be able to see TB both during training and after it is finished (and even though the log files are large, storage is cheap, so maybe it would be OK to keep them around). I need to think about the best way to organize this, though. For instance, maybe we should write the logs directly to S3? We would still need some system for keeping track of where exactly they are and for launching TensorBoard instances to show a given set of logs.
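
For what it's worth, a minimal sketch of the S3 direction, assuming a TensorFlow/TensorBoard build with S3 filesystem support and AWS credentials available in the environment (the bucket and prefix are placeholders):

import tensorflow as tf

# Write event files straight to S3 instead of local disk (placeholder bucket/prefix)
writer = tf.summary.create_file_writer('s3://my-bucket/tb-logs/experiment-42')
with writer.as_default():
    for step in range(100):
        # Placeholder metric just to show the logging call
        tf.summary.scalar('loss', 1.0 / (step + 1), step=step)
writer.flush()

Any machine with access to the bucket could then serve a chosen set of runs with something like tensorboard --logdir s3://my-bucket/tb-logs, although whether S3 reads work depends on the TensorBoard/TensorFlow build, so that part would need to be verified.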

  
  
Posted 4 years ago