So, according to the article (and the code, as far as I could tell), OpenNMT-tf automatically enables TensorBoard. That is, it auto-logs the relevant features through tf.summary ( https://www.tensorflow.org/api_docs/python/tf/summary ). This shows up on the command line as lines like:
INFO:tensorflow:Evaluation result for step 9000: loss = 1.190986 ; perplexity = 3.290324 ; bleu = 63.569644
INFO:tensorflow:Step = 9100 ; steps/s = 2.17, source words/s = 28293, target words/s = 39388 ; Learning rate = 0.000927 ; Loss = 1.381563
However, this data is not picked up automatically by ClearML. I am specifically looking at OpenNMT's Runner.train: https://opennmt.net/OpenNMT-tf/package/opennmt.Runner.html
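For reference, the numbers in those log lines can be pulled out with a small parser. This is only a sketch based on the two sample lines above: `parse_log_line` is a hypothetical helper, not part of OpenNMT-tf or ClearML, and the exact log format may differ between versions.

```python
import re

# Matches "key = number" pairs that follow the start of the line or a ';', ',' or ':'.
PAIR_RE = re.compile(
    r"(?:^|[;,:])\s*([^;,:=]+?)\s*=\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
)
# Matches both "Step = 9100" (training) and "step 9000" (evaluation).
STEP_RE = re.compile(r"[Ss]tep\s*=?\s*(\d+)")


def parse_log_line(line):
    """Return (step, metrics) for one INFO:tensorflow log line.

    step is None when the line carries no step number; metrics maps each
    metric name to a float and leaves out the step itself.
    """
    step_match = STEP_RE.search(line)
    step = int(step_match.group(1)) if step_match else None
    metrics = {}
    for key, value in PAIR_RE.findall(line):
        if key.lower() == "step":
            continue  # already captured as the step number
        metrics[key] = float(value)
    return step, metrics


train_line = (
    "INFO:tensorflow:Step = 9100 ; steps/s = 2.17, source words/s = 28293, "
    "target words/s = 39388 ; Learning rate = 0.000927 ; Loss = 1.381563"
)
eval_line = (
    "INFO:tensorflow:Evaluation result for step 9000: loss = 1.190986 ; "
    "perplexity = 3.290324 ; bleu = 63.569644"
)
train_step, train_metrics = parse_log_line(train_line)
eval_step, eval_metrics = parse_log_line(eval_line)
```

If the automatic capture cannot be made to work, values parsed this way could in principle be sent to ClearML by hand with `Logger.report_scalar(title, series, value, iteration)`, though that would be a workaround rather than a fix.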
No, TB (TensorBoard) is not enabled. I just googled it and found this: https://forum.opennmt.net/t/running-tensorboard/4242 . I will try enabling TB and see if that fixes it.
From the docs, I think what's going on is that Runner.train ( https://opennmt.net/OpenNMT-tf/package/opennmt.Runner.html#opennmt.Runner.train ) is spinning up a new subprocess, and the training itself happens in that subprocess.
If this is the case, it would explain the lack of automagic, as the subprocess is missing the "Task.init" call.
wdyt, could that be the case?
In TensorFlow's `__init__.py`, tensorboard appears to be initialized (including tf.summary):
```python
# Hook external TensorFlow modules.

# Import compat before trying to import summary from tensorboard, so that
# reexport_tf_summary can get compat from sys.modules. Only needed if using
# lazy loading.
_current_module.compat.v2  # pylint: disable=pointless-statement
try:
  from tensorboard.summary._tf import summary
  _current_module.__path__ = (
      [_module_util.get_parent_dir(summary)] + _current_module.__path__)
  setattr(_current_module, "summary", summary)
except ImportError:
  _logging.warning(
      "Limited tf.summary API due to missing TensorBoard installation.")
```
I call `Task.init` after I import tensorflow (and thus tensorboard?) but before I create the opennmt runner. Should this be ok? Are you referring to something else when saying "call Task.init before TB is created"?
after I import tensorflow (and thus tensorboard?)
That should have worked...
Can you manually add a TB report before calling the opennmt function?
(I want to verify that Task.init is indeed catching the TB calls; my theory is that somewhere inside opennmt we lose the TB.)
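Something along these lines should do as a manual check. It is a rough sketch assuming TensorFlow 2.x and the standard `tf.summary` API; the log directory, the tag name and the project/task names are placeholders, and the function names are made up for this example.

```python
def demo_values(num_steps=10):
    """Return (step, value) pairs for a simple decaying curve."""
    return [(step, 1.0 / (step + 1)) for step in range(num_steps)]


def report_manual_scalars(logdir="./tb_sanity_check", num_steps=10):
    """Write a few scalars through tf.summary so ClearML has something to capture."""
    # Imported here so demo_values stays usable without TensorFlow installed.
    import tensorflow as tf

    writer = tf.summary.create_file_writer(logdir)
    with writer.as_default():
        for step, value in demo_values(num_steps):
            tf.summary.scalar("sanity_check/manual_scalar", value, step=step)
    writer.flush()


def main():
    from clearml import Task

    # Placeholder names; use whatever project/task the real script uses.
    Task.init(project_name="opennmt-debug", task_name="tb-sanity-check")
    report_manual_scalars()
    # ...then create the opennmt Runner and call train() exactly as before.
```

Call `main()` at the point where the script currently calls Task.init. If `sanity_check/manual_scalar` shows up under the task's scalars but the training metrics still do not, the TB hook itself works and the reports are being lost somewhere inside opennmt.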