Hi MinuteWalrus85.
Good news about fastai: the integration is almost done and a version will be released in the coming days :)
Thanks for letting me know, I'd be very happy to update.
In the meantime, I got this error message, this time regarding Trains:
```
Traceback (most recent call last):
  File "/home/ubuntu/MultiClassLabeling/myenv/lib/python3.6/site-packages/torch/utils/tensorboard/__init__.py", line 2, in <module>
    from tensorboard.summary.writer.record_writer import RecordWriter  # noqa F401
  File "/home/ubuntu/MultiClassLabeling/myenv/lib/python3.6/site-packages/trains/binding/import_bind.py", line 59, in __patched_import3
    level=level)
ModuleNotFoundError: No module named 'tensorboard'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/MultiClassLabeling/myenv/lib/python3.6/site-packages/fastai/callbacks/tensorboard.py", line 234, in _queue_processor
    request.write()
  File "/home/ubuntu/MultiClassLabeling/myenv/lib/python3.6/site-packages/fastai/callbacks/tensorboard.py", line 424, in write
    self.tbwriter.add_graph(model=self.model, input_to_model=self.input_to_model)
  File "/home/ubuntu/MultiClassLabeling/myenv/lib/python3.6/site-packages/tensorboardX/writer.py", line 793, in add_graph
    from torch.utils.tensorboard._pytorch_graph import graph
  File "/home/ubuntu/MultiClassLabeling/myenv/lib/python3.6/site-packages/trains/binding/import_bind.py", line 59, in __patched_import3
    level=level)
  File "/home/ubuntu/MultiClassLabeling/myenv/lib/python3.6/site-packages/torch/utils/tensorboard/__init__.py", line 4, in <module>
    raise ImportError('TensorBoard logging requires TensorBoard with Python summary writer installed. '
ImportError: TensorBoard logging requires TensorBoard with Python summary writer installed. This should be available in 1.14 or above.
```
TRAINS Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start
TRAINS Monitor: Reporting detected, reverting back to iteration based reporting
Hi MinuteWalrus85,
Do you have tensorboard installed too?
I installed trains, fastai, tensorboard and tensorboardx and ran a simple example; it can be viewed at this link:
https://demoapp.trains.allegro.ai/projects/bf5c5ffa40304b2dbef7bfcf915a7496/experiments/e0b68d0fe80a4ff6be332690c0d968be/execution
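For reference, a minimal sketch of that kind of setup could look like the following; the dataset, paths, and names are placeholders, not the exact script behind the linked experiment:
```python
from functools import partial
from pathlib import Path

from fastai.vision import untar_data, URLs, ImageDataBunch, cnn_learner, models, accuracy
from fastai.callbacks.tensorboard import LearnerTensorboardWriter
from trains import Task

# Requires: pip install trains fastai tensorboard tensorboardx

# Initializing the Trains task is what patches the TensorBoard writers
task = Task.init(project_name='examples', task_name='fastai tensorboard test')

tboard_path = Path('data/tensorboard/example')  # placeholder log directory

# Small toy dataset so the example runs quickly
data = ImageDataBunch.from_folder(untar_data(URLs.MNIST_TINY), bs=64)
learn = cnn_learner(data, models.resnet18, metrics=accuracy)

# Attach fastai's TensorBoard callback; Trains picks up the TB events automatically
learn.callback_fns.append(partial(LearnerTensorboardWriter,
                                  base_dir=tboard_path, name='run1'))
learn.fit_one_cycle(3)
```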
Yes, that solved the errors. However, the two lines "could not detect iteration reporting" and, a few moments later, "reporting detected" still show up.
It isn't an error, just an observation.
Hi MinuteWalrus85,
You are right, this is an observation about the reports on your experiment.
Once trains can't detect reports being sent by the script, it falls back to reporting iterations as seconds-from-start, until reports are detected.
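If you want the script to give Trains explicit iteration numbers instead of waiting for automatic detection, scalars can also be reported directly through the task's logger. A minimal sketch, with placeholder title/series names and values:
```python
from trains import Task

task = Task.init(project_name='examples', task_name='manual reporting')
logger = task.get_logger()

for iteration in range(100):
    loss = 1.0 / (iteration + 1)  # placeholder value
    # Explicit scalar reports give the monitor a real iteration number to follow
    logger.report_scalar(title='loss', series='train', value=loss, iteration=iteration)
```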
Good morning Alon, since you helped me so much getting tensorboard to show results yesterday, I'm hoping you can help me understand why some results I'm getting are strange:
The valid_loss and Accuracy are showing on the Tboard with the same values as they show up in the terminal, but the train_loss is showing on a different scale and I can't figure out why. I did not change anything in the core files of torch, Tboard or fastai, and used the initialization in the same way you showed and as in the fastai docs, using learn.callback_fns.append(partial(LearnerTensorboardWriter, base_dir=tboard_path, name=taskName))
Here is an example of the results from terminal:
the train_loss is in the second column from the left (the far-left column is the epoch number, 30-36)
here is the result on the Tboard in Trains:
This is the valid_loss, which is correct:
Hi MinuteWalrus85, good morning 🌞
What do you get when you view the results with the TensorBoard dashboard?
(You can view them with tensorboard --logdir=<tboard_path>)
I will try and let you know in the next experiment
MinuteWalrus85 thanks for the screenshot; I'm asking about the TB dashboard to understand where the issue is coming from.
Trains patches the TB stats and shows them to you in the web app, so if the results in the TB dashboard are the same as in the web app, the values themselves are being reported incorrectly; if the TB dashboard and the web app show different results, there may be an issue with the web-app reporting.
Understood. If there is something I can tweak in the reporting, I couldn't find where to tweak it, since it is all supposed to come from the one line that activates the reporting: learn.callback_fns.append(partial(LearnerTensorboardWriter, base_dir=tboard_path, name=taskName))
Do you have any ideas about what options I have for changing how the train_loss is reported?
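For reference, fastai v1's LearnerTensorboardWriter also accepts optional frequency arguments such as loss_iters, hist_iters and stats_iters, which control how often the per-batch smoothed training loss, histograms and model stats are written, so the TB train_loss curve is sampled differently from the per-epoch value printed in the terminal. A hedged sketch, assuming learn, tboard_path and taskName are defined as in the line quoted above:
```python
from functools import partial
from fastai.callbacks.tensorboard import LearnerTensorboardWriter

# Assumes learn, tboard_path and taskName already exist as in the snippet above.
# loss_iters / hist_iters / stats_iters set how often the callback writes the
# per-batch smoothed loss, weight histograms and model stats to the event files.
# The numbers here are illustrative, not a confirmed fix for the scale difference.
learn.callback_fns.append(partial(LearnerTensorboardWriter,
                                  base_dir=tboard_path,
                                  name=taskName,
                                  loss_iters=25,
                                  hist_iters=500,
                                  stats_iters=100))
```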