Answered


I am using opennmt-tf (2.18.1) and clearml (1.1.2) for training and testing my translation models. I want to register the incremental BLEU scores and final test data with ClearML (for plotting, comparison, etc.), but it is not working. I cannot figure out which of these applies:
- This is already done automagically with opennmt, and I am just doing it wrong (missing an import or a simple statement somewhere).
- This is not done and should be done automagically (needs dev work in clearml).
- This will need to be done manually in some way.
Does someone know how to resolve this?

  				
Posted 4 years ago by StrangePelican34


Answers 10

It worked! I added this call shortly after Task.init:
tf.summary.create_file_writer("C:/mypath/logs")
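
For anyone landing here later, a minimal sketch of that ordering: Task.init first, then the TensorBoard file writer, then one test scalar to confirm ClearML picks it up. The helper name `init_tracking`, the project/task names, and the scalar tag are made up for illustration; only `Task.init` and the `create_file_writer` call come from this thread, and this is not necessarily how the original script was laid out.

```python
def init_tracking(log_dir, project_name="translation", task_name="opennmt-tf-train"):
    """Start a ClearML task, then create a TensorBoard writer under log_dir.

    project_name and task_name are hypothetical placeholders.
    """
    # Imported lazily so that defining this helper needs neither package.
    import tensorflow as tf
    from clearml import Task

    # 1. Task.init comes first so ClearML can hook later TensorBoard writes.
    task = Task.init(project_name=project_name, task_name=task_name)

    # 2. The call from the answer above: create the summary file writer.
    writer = tf.summary.create_file_writer(log_dir)

    # 3. Write one scalar by hand to check that reports reach ClearML.
    with writer.as_default():
        tf.summary.scalar("sanity/manual_scalar", 1.0, step=0)
    writer.flush()
    return task, writer


# Usage (before building the opennmt Runner):
# task, writer = init_tracking("C:/mypath/logs")
```

If the test scalar shows up in the ClearML scalars tab, the hook is working and the opennmt summaries written afterwards should follow the same path.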

  				
Posted 4 years ago by StrangePelican34

"I call Task.init after I import tensorflow (and thus tensorboard?)"

That should have worked...
Can you manually add a TB report before calling the opennmt function?
(I want to verify that Task.init is indeed catching the TB calls; my theory is that somewhere inside opennmt we lose the TB.)

  				
Posted 4 years ago by AgitatedDove14

Hi StrangePelican34
What exactly is not working? Are you getting any TB reports?

  				
Posted 4 years ago by AgitatedDove14

In TensorFlow's `__init__.py`, TensorBoard appears to be initialized (including tf.summary):

```
# Hook external TensorFlow modules.

# Import compat before trying to import summary from tensorboard, so that
# reexport_tf_summary can get compat from sys.modules. Only needed if using
# lazy loading.
_current_module.compat.v2  # pylint: disable=pointless-statement
try:
  from tensorboard.summary._tf import summary
  _current_module.__path__ = (
      [_module_util.get_parent_dir(summary)] + _current_module.__path__)
  setattr(_current_module, "summary", summary)
except ImportError:
  _logging.warning(
      "Limited tf.summary API due to missing TensorBoard installation.")
```

I call `Task.init` after I import tensorflow (and thus tensorboard?) but before I create the opennmt runner. Should this be OK? Are you referring to something else when you say "call Task.init before TB is created"?

  				
Posted 4 years ago by StrangePelican34

So, according to the article (and the code, as far as I could tell), OpenNMT-tf automatically enables TensorBoard. That is, it auto-logs the relevant features through tf.summary ( https://www.tensorflow.org/api_docs/python/tf/summary ). This is output on the command line with the likes of:
INFO:tensorflow:Evaluation result for step 9000: loss = 1.190986 ; perplexity = 3.290324 ; bleu = 63.569644
INFO:tensorflow:Step = 9100 ; steps/s = 2.17, source words/s = 28293, target words/s = 39388 ; Learning rate = 0.000927 ; Loss = 1.381563
However, this data is not picked up automatically by ClearML. I am specifically looking at opennmt's Runner.train: https://opennmt.net/OpenNMT-tf/package/opennmt.Runner.html
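
If the automatic TensorBoard capture is not available, the "do it manually" option from the question could look roughly like the sketch below: parse the evaluation log line and push each metric through ClearML's `Logger.report_scalar`. The helper names `parse_eval_line` and `report_eval_line`, the regular expressions, and the "evaluation" plot title are my own assumptions based only on the log format quoted above; opennmt does not provide them.

```python
import re

# Matches lines such as:
# INFO:tensorflow:Evaluation result for step 9000: loss = 1.190986 ; bleu = 63.569644
EVAL_LINE = re.compile(r"Evaluation result for step (\d+): (.*)")
METRIC = re.compile(r"(\w+) = ([-+]?\d+(?:\.\d+)?)")


def parse_eval_line(line):
    """Return (step, {metric: value}) for an evaluation line, else None."""
    match = EVAL_LINE.search(line)
    if match is None:
        return None
    step = int(match.group(1))
    metrics = {name: float(value) for name, value in METRIC.findall(match.group(2))}
    return step, metrics


def report_eval_line(line, logger):
    """Report every metric on an evaluation line as a ClearML scalar.

    logger is expected to behave like clearml.Logger, e.g. the object
    returned by task.get_logger(). Returns True if the line was reported.
    """
    parsed = parse_eval_line(line)
    if parsed is None:
        return False
    step, metrics = parsed
    for name, value in metrics.items():
        logger.report_scalar(title="evaluation", series=name, value=value, iteration=step)
    return True
```

Usage would be something like `report_eval_line(line, task.get_logger())` for each captured log line, where `task` is the object returned by `Task.init`. Training-step lines (the `Step = 9100` kind) are ignored by this sketch.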

  				
Posted 4 years ago by StrangePelican34

"No, TB (TensorBoard) is not enabled."

That explains it 🙂 Did you manage to get it working?

  				
Posted 4 years ago by AgitatedDove14

The only place I see subprocess being called in opennmt is to determine the batch size, but not for the primary training task.

  				
Posted 4 years ago by StrangePelican34

Hmm StrangePelican34
Can you verify that you call Task.init before TB is created? (Basically, at the start of everything.)

  				
Posted 4 years ago by AgitatedDove14

No, TB (TensorBoard) is not enabled. I just googled it and found this: https://forum.opennmt.net/t/running-tensorboard/4242 . I will try enabling TB and see if that fixes it.

  				
Posted 4 years ago by StrangePelican34

From the docs, I think what's going on is that Runner.train ( https://opennmt.net/OpenNMT-tf/package/opennmt.Runner.html#opennmt.Runner.train ) spins up a new subprocess, and the training itself happens in that subprocess.
If this is the case, it would explain the lack of automagic, as the subprocess lacks the "Task.init" call.
wdyt, could that be the case?

  				
Posted 4 years ago by AgitatedDove14
