no, I meant to change the way it is reported. I'm still interested in the train_loss graph, naturally 🙂 but obviously it is reporting something that is the inverse of the train_loss, since in the graph it is exploding, and in reality (as reported in the terminal) it is decaying to 9e-2
` Traceback (most recent call last):
File "/home/ubuntu/MultiClassLabeling/myenv/lib/python3.6/site-packages/torch/utils/tensorboard/init.py", line 2, in <module>
from tensorboard.summary.writer.record_writer import RecordWriter # noqa F401
File "/home/ubuntu/MultiClassLabeling/myenv/lib/python3.6/site-packages/trains/binding/import_bind.py", line 59, in __patched_import3
level=level)
ModuleNotFoundError: No module named 'tensorboard'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/home/ubuntu/MultiClassLabeling/myenv/lib/python3.6/site-packages/fastai/callbacks/tensorboard.py", line 234, in _queue_processor
request.write()
File "/home/ubuntu/MultiClassLabeling/myenv/lib/python3.6/site-packages/fastai/callbacks/tensorboard.py", line 424, in write
self.tbwriter.add_graph(model=self.model, input_to_model=self.input_to_model)
File "/home/ubuntu/MultiClassLabeling/myenv/lib/python3.6/site-packages/tensorboardX/writer.py", line 793, in add_graph
from torch.utils.tensorboard._pytorch_graph import graph
File "/home/ubuntu/MultiClassLabeling/myenv/lib/python3.6/site-packages/trains/binding/import_bind.py", line 59, in __patched_import3
level=level)
File "/home/ubuntu/MultiClassLabeling/myenv/lib/python3.6/site-packages/torch/utils/tensorboard/init.py", line 4, in <module>
raise ImportError('TensorBoard logging requires TensorBoard with Python summary writer installed. '
ImportError: TensorBoard logging requires TensorBoard with Python summary writer installed. This should be available in 1.14 or above. `
Hi MinuteWalrus85 ,
Do you have tensorboard
installed too?
I installed trains
, fastai
, tensorboard
and tensorboardx
and run a simple example, can be view in this link -
https://demoapp.trains.allegro.ai/projects/bf5c5ffa40304b2dbef7bfcf915a7496/experiments/e0b68d0fe80a4ff6be332690c0d968be/execution
Thanks for letting me know, I'd be very happy to update.
it isn't an error, but just an observation
This is the valin_loss, which is correct:
The valid_loss and Accuracy are showing on the Tboard in the same number values as they show up on the terminal, but the train_loss is showing in a different scale and I can't figure out why. I did not change anything in the core files of either torc, Tboard or fastai, and used the intialization in the same way that you showed, and was on fastai docs, using learn.callback_fns.append(partial(LearnerTensorboardWriter, base_dir=tboard_path, name=taskName))
Hi MinuteWalrus85 ,
You are right and this is an observation about the reports on your experiment.
Onvce trains can't detect reports sending by the script, it will fallback to report iterations according to const time, until reports will be detected.
Hi MinuteWalrus85 .
Good news about fastai
, the integration in almost done and a version will be release in the coming days :)
change the report of the train_loss?
Are you referring for not sending the train_loss
results?
I will try and let you know in the next experiment
MinuteWalrus85 thanks for the screenshot, asking about TB dashboard to understand where the issue is coming from.
Trains is patching TB stats and showing it to you in the web-app, so if the results are the same in TB dashboard, the reporting of the values can be wrong, if the TB dashboard and the web-app have different results, there can be an issue with web-app reporting
Good morning Alon, since you helped me so much getting tensorboard to show results yesterday, I'm hoping you can help me understand why some results I'm getting are strange:
the train_loss is on the second from left column (the far left is epoch num 30-36)
MinuteWalrus85 didn’t success to reproduce it, can you share with me your experiment (without any data, just how to reproduce)? We can continue in DM if you like
yes, that solved the errors, however the two lines "could not detect iteration reporting" and "reporting detected" a few moments later, still show up
here is the result on the Tboard in Trains:
Hi MinuteWalrus85 , Good morning 🌞
What do you get when view the results with Tensorboard dashboard?
(you can view those with tensorboard --logdir=<tboard_path>
)
in the meantime, I got this error message, this time regarding Trains: