in the meantime, I got this error message, this time regarding Trains:
Good morning Alon, since you helped me so much getting tensorboard to show results yesterday, I'm hoping you can help me understand why some results I'm getting are strange:
Thanks for letting me know, I'd be very happy to update.
Hi MinuteWalrus85 ,
You are right, and this is just an observation about the reports from your experiment.
When Trains can't detect reports sent by the script, it falls back to reporting iterations based on elapsed time (seconds from start), until reports are detected.
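To make the fallback easier to picture, here is a toy sketch of the behaviour described above. It is only an illustration and not Trains' actual implementation; the class name `IterationFallback` and its methods are hypothetical.

```python
import time


class IterationFallback:
    """Toy model of the fallback described above (hypothetical, not Trains code)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._start = clock()        # moment monitoring started
        self._last_reported = None   # last iteration reported by the script, if any

    def report(self, iteration):
        # The script reported an explicit iteration number.
        self._last_reported = iteration

    def current_iteration(self):
        # Use the reported iteration once one has been detected ...
        if self._last_reported is not None:
            return self._last_reported
        # ... otherwise fall back to whole seconds since start.
        return int(self._clock() - self._start)
```

Until `report()` is called, `current_iteration()` simply counts seconds; after the first report it returns the reported value, which matches the two monitor messages appearing one after the other.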
Hi MinuteWalrus85 ,
Do you have tensorboard installed too?
I installed trains, fastai, tensorboard and tensorboardx and ran a simple example, which you can view at this link:
https://demoapp.trains.allegro.ai/projects/bf5c5ffa40304b2dbef7bfcf915a7496/experiments/e0b68d0fe80a4ff6be332690c0d968be/execution
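If it helps, here is a small sketch for checking which of these packages can be found in the active environment. The import names in the list are my assumption of how the packages are imported (for example `tensorboardX` with a capital X); adjust them if your setup differs.

```python
import importlib.util


def missing_packages(names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for the packages mentioned in this thread.
print(missing_packages(["trains", "fastai", "tensorboard", "tensorboardX"]))
```

An empty list means all four can be found; any name that is printed is missing from the environment you ran it in.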
TRAINS Monitor: Could not detect iteration reporting, falling back to iterations as seconds-from-start
TRAINS Monitor: Reporting detected, reverting back to iteration based reporting
yes, that solved the errors, however the two lines "could not detect iteration reporting" and "reporting detected" a few moments later, still show up
The valid_loss and Accuracy are showing on the Tboard with the same values as they show up in the terminal, but the train_loss is showing on a different scale and I can't figure out why. I did not change anything in the core files of torch, Tboard or fastai, and used the initialization the same way that you showed, and as it appears in the fastai docs, using learn.callback_fns.append(partial(LearnerTensorboardWriter, base_dir=tboard_path, name=taskName))
it isn't an error, but just an observation
the train_loss is on the second from left column (the far left is epoch num 30-36)
Understood. If there is something I can tweak in the reporting, I couldn't find where I tweak it since it is supposed to be related to the one line of activation of the reporting learn.callback_fns.append(partial(LearnerTensorboardWriter, base_dir=tboard_path, name=taskName))
do you have any ideas what are the options I can do to change the report of the train_loss?
no, I meant to change the way it is reported. I'm still interested in the train_loss graph, naturally 🙂 but obviously it is reporting something that is the inverse of the train_loss, since in the graph it is exploding, and in reality (as reported in the terminal) it is decaying to 9e-2
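One thing that might be worth ruling out (an assumption, not something confirmed in this thread) is whether the terminal column and the graph are built from the same quantity, for example a smoothed running average in one place and the raw per-batch loss in the other. The sketch below shows a debiased exponential moving average; the function name `debiased_ema` and the value `beta=0.98` are illustrative choices.

```python
def debiased_ema(values, beta=0.98):
    """Debiased exponential moving average of a sequence of per-batch losses."""
    avg = 0.0
    smoothed = []
    for n, value in enumerate(values, start=1):
        avg = beta * avg + (1 - beta) * value
        # Divide by (1 - beta**n) to correct the bias toward the zero start value.
        smoothed.append(avg / (1 - beta ** n))
    return smoothed


raw = [1.0, 0.5, 2.0, 0.25]      # made-up per-batch losses, for illustration only
print(list(zip(raw, debiased_ema(raw))))
```

A smoothed curve stays within the range of the raw values, so a difference like this could explain a mismatch in scale or noise, but it would not by itself explain a graph that explodes while the terminal value decays.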
MinuteWalrus85 I couldn't reproduce it. Can you share your experiment with me (without any data, just how to reproduce)? We can continue in DM if you like.
change the report of the train_loss?
Are you referring to not sending the train_loss results?
I will try and let you know in the next experiment
` Traceback (most recent call last):
  File "/home/ubuntu/MultiClassLabeling/myenv/lib/python3.6/site-packages/torch/utils/tensorboard/__init__.py", line 2, in <module>
    from tensorboard.summary.writer.record_writer import RecordWriter # noqa F401
  File "/home/ubuntu/MultiClassLabeling/myenv/lib/python3.6/site-packages/trains/binding/import_bind.py", line 59, in __patched_import3
    level=level)
ModuleNotFoundError: No module named 'tensorboard'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/MultiClassLabeling/myenv/lib/python3.6/site-packages/fastai/callbacks/tensorboard.py", line 234, in _queue_processor
    request.write()
  File "/home/ubuntu/MultiClassLabeling/myenv/lib/python3.6/site-packages/fastai/callbacks/tensorboard.py", line 424, in write
    self.tbwriter.add_graph(model=self.model, input_to_model=self.input_to_model)
  File "/home/ubuntu/MultiClassLabeling/myenv/lib/python3.6/site-packages/tensorboardX/writer.py", line 793, in add_graph
    from torch.utils.tensorboard._pytorch_graph import graph
  File "/home/ubuntu/MultiClassLabeling/myenv/lib/python3.6/site-packages/trains/binding/import_bind.py", line 59, in __patched_import3
    level=level)
  File "/home/ubuntu/MultiClassLabeling/myenv/lib/python3.6/site-packages/torch/utils/tensorboard/__init__.py", line 4, in <module>
    raise ImportError('TensorBoard logging requires TensorBoard with Python summary writer installed. '
ImportError: TensorBoard logging requires TensorBoard with Python summary writer installed. This should be available in 1.14 or above. `
MinuteWalrus85 thanks for the screenshot. I'm asking about the TB dashboard to understand where the issue is coming from.
Trains patches the TB stats and shows them to you in the web-app. So if the results are the same in the TB dashboard, the values themselves may be reported wrongly; if the TB dashboard and the web-app show different results, there may be an issue with the web-app reporting.
Hi MinuteWalrus85 , Good morning 🌞
What do you get when you view the results with the Tensorboard dashboard?
(you can view those with tensorboard --logdir=<tboard_path>)
This is the valid_loss, which is correct: