Hi Guys! Love Using Trains And Love The Great Support In This Channel. Say I Have Two Different Training Experiments Which Report Every 20 Iteration, But The Batch Size Between Them Is Different, Resulting In Different Number Of Iterations Per Epoch. I Wo

Answered

Hi guys! Love using Trains and love the great support in this channel.
Say I have two different training experiments which report every 20 iterations, but the batch size between them is different, resulting in a different number of iterations per epoch. I would like to be able to somehow specify how many iterations are considered an epoch, or mark when an epoch ends, so I'll be able to compare the two experiments over time. Is this possible?

  				
Posted 5 years ago by ShallowCat10


Answers 12

thanks! so basically for experiments that are already finished I have no way to compare ATM, right?

  				
Posted 5 years ago by ShallowCat10

yeah. something like that

  				
Posted 5 years ago by ShallowCat10

So obviously the straightforward solution is to normalize the step value when reporting to TB, i.e. int(step/batch_size), which makes sense as I suppose the batch size is known and is part of the hyper-parameters. Normalization itself can be done when comparing experiments in the UI, and the backend can do that if given the correct normalization parameter. I think this feature request should actually be posted on GitHub, as it is not as simple as one might think (the UI needs to allow you to select a parameter for comparison, then the question is whether we normalize all the scalars or just a few, etc.).
Anyhow, if we have enough people interested we can definitely add it :)
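For anyone who wants to do the normalization on the reporting side in the meantime, here is a minimal sketch of the idea. It is not part of Trains; the helper names `samples_seen` and `epochs_seen` and the dataset size are made up for illustration. Note that to put two runs with different batch sizes on a common axis, the step is multiplied by the batch size (giving "samples seen"), and dividing that by the dataset size gives fractional epochs.

```python
def samples_seen(step: int, batch_size: int) -> int:
    """Number of training samples consumed after `step` iterations."""
    return step * batch_size


def epochs_seen(step: int, batch_size: int, dataset_size: int) -> float:
    """Fractional number of epochs completed after `step` iterations."""
    return samples_seen(step, batch_size) / dataset_size


# Hypothetical setup: the same 6400-sample dataset, two different batch sizes,
# both runs reporting every 20 iterations.
dataset_size = 6400
run_a_axis = [samples_seen(step, 32) for step in range(20, 101, 20)]
run_b_axis = [samples_seen(step, 64) for step in range(20, 101, 20)]

# Iteration 100 of run A and iteration 50 of run B have seen the same data.
print(samples_seen(100, 32), samples_seen(50, 64))
print(epochs_seen(200, 32, dataset_size))
```

The normalized value would then be passed as the step when writing the scalar (for example as `global_step` in TensorBoard's `add_scalar`), so both experiments share the same x-axis.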

  				
Posted 5 years ago by AgitatedDove14

I mean, you can manually get the results and rescale them, but not through the UI.
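A rough sketch of what the manual rescaling could look like for experiments that have already finished. It assumes you have pulled each scalar series out of the system as a dict with `"x"` (iterations) and `"y"` (values) lists; that shape, the `rescale_series` helper, and the numbers below are assumptions for illustration, and the fetching step itself is left out.

```python
def rescale_series(series: dict, batch_size: int, dataset_size: int) -> dict:
    """Return a copy of one scalar series with x converted from iterations to epochs."""
    return {
        "x": [x * batch_size / dataset_size for x in series["x"]],
        "y": list(series["y"]),
    }


# Hypothetical loss curves from two finished runs on a 1280-sample dataset.
dataset_size = 1280
loss_bs32 = {"x": [20, 40, 60], "y": [0.9, 0.7, 0.6]}
loss_bs64 = {"x": [20, 40, 60], "y": [0.8, 0.6, 0.5]}

# After rescaling, both x-axes are in epochs and can be plotted together.
epochs_bs32 = rescale_series(loss_bs32, 32, dataset_size)
epochs_bs64 = rescale_series(loss_bs64, 64, dataset_size)
print(epochs_bs32["x"], epochs_bs64["x"])
```

The rescaled series can then be plotted side by side with any plotting tool, outside the UI.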

  				
Posted 5 years ago by AgitatedDove14

It doesn't really matter to me. One solution I had in mind is that this can be done by the web client on demand, meaning you can manually (or using the Task object) specify how many iterations constitute a single epoch, and instead of scaling, the plots will just be subsampled (or interpolated).
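To make the subsample/interpolate idea concrete, here is a small sketch of how one value per epoch could be derived from a series reported every 20 iterations. This only illustrates the proposal and is not an existing feature; `value_at`, `per_epoch`, and the sample data are hypothetical.

```python
from bisect import bisect_left


def value_at(xs, ys, x):
    """Linearly interpolate the series (xs, ys) at x, clamping outside the range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, x)  # first index with xs[i] >= x
    if xs[i] == x:
        return ys[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def per_epoch(xs, ys, iterations_per_epoch):
    """Sample the series once at the end of every completed epoch."""
    n_epochs = xs[-1] // iterations_per_epoch
    return [value_at(xs, ys, e * iterations_per_epoch) for e in range(1, n_epochs + 1)]


# Hypothetical run: reported every 20 iterations, 50 iterations per epoch.
iterations = [20, 40, 60, 80, 100]
loss = [4.0, 3.0, 2.0, 1.5, 1.0]
print(per_epoch(iterations, loss, 50))
```

Each experiment would use its own iterations-per-epoch value, so the resulting per-epoch lists line up index by index regardless of batch size.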

  				
Posted 5 years ago by ShallowCat10

ShallowCat10 Thank you for the kind words 🙂

so I'll be able to compare the two experiments over time. Is this possible?

You mean, like matching the loss based on "images seen"?

  				
Posted 5 years ago by AgitatedDove14

right now the situation is problematic, because as I mentioned, I can't compare the training process between different batch sizes (or effective batch size, if I use a different number of GPUs)

  				
Posted 5 years ago by ShallowCat10

How do you report it?

  				
Posted 5 years ago by AgitatedDove14

tensorboard automagic 😉

  				
Posted 5 years ago by ShallowCat10

Hmm... scaling these scalars while reporting might be a bit too much to do in the background. Don't you think you will lose transparency, as in TB you'll see graphs that are different from what you see in the system?

  				
Posted 5 years ago by AgitatedDove14

ok. thank you 🙂

  				
Posted 5 years ago by ShallowCat10

yes 😞

  				
Posted 5 years ago by AgitatedDove14
