When It Comes To Continuous Training, I Wanted To Know How You Train Or Would Train If You Have Annotated Data Incoming? Do You Train Completely Online Where You Train As Soon As You Have A Training Example Available? Do You Instead Train When You Have A

Answered

When it comes to continuous training, I wanted to know how you train or would train if you have annotated data incoming?
Do you train completely online where you train as soon as you have a training example available?
Do you instead train when you have sufficient data available?
Also, when it comes to evaluating the model, do I just split the data into train and test sets? That would mean I'm not using all available data to train?

Basically, my use case is this: I have a base model that is trained on a regular dataset. But the company will have different clients, and the data we get from them will all come from different distributions. So we were thinking of having a separate model for each client, trained on data from that client's distribution. I hope my query was clear.

How would you go about this? Or if you've faced a similar problem and maybe solved it before, how did you go about this?

  				
Posted 3 years ago by VexedCat68


Answers 15

Sorry for pinging you on this old thread.
...
And what was the learning strategy? ADAM? RMSProp?

Sorry, missed it...
I would actually use the HPO to test various setups (it uses Optuna under the hood, so you really get SOTA Hyperband and Bayesian optimization on top of them):
https://github.com/allegroai/clearml/blob/master/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py

  				
Posted 3 years ago by AgitatedDove14

My main query is do I wait for it to be a sufficient batch size or do I just send each image as soon as it comes to train

This is usually a cost-optimization issue. Generally speaking, if GPU uptime is not an issue, then the process is stochastic anyhow, so waiting for a batch or not is not the most important factor (unless you use a batchnorm layer, in which case batching is basically a must).
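
To make the "wait for a batch" option concrete, here is a minimal sketch, not something taken from this thread. `BatchBuffer`, `Sample`, and `train_fn` are hypothetical names; `train_fn` stands in for whatever fine-tuning step you already have. Setting `batch_size=1` gives the fully online behaviour.

```python
from typing import Callable, List, Tuple

# Hypothetical sample layout: (image path, binary label).
Sample = Tuple[str, int]


class BatchBuffer:
    """Collects incoming samples and triggers training once enough have arrived."""

    def __init__(self, batch_size: int, train_fn: Callable[[List[Sample]], None]):
        if batch_size < 1:
            raise ValueError("batch_size must be >= 1")
        self.batch_size = batch_size
        self.train_fn = train_fn  # assumed: your own training/fine-tuning step
        self._pending: List[Sample] = []

    def add(self, sample: Sample) -> bool:
        """Queue one sample; return True if this call triggered a training step."""
        self._pending.append(sample)
        if len(self._pending) < self.batch_size:
            return False
        # Hand over the full batch and start collecting a fresh one.
        batch, self._pending = self._pending, []
        self.train_fn(batch)
        return True

    def pending(self) -> int:
        """Number of samples still waiting for a full batch."""
        return len(self._pending)
```

With a batchnorm layer you would keep `batch_size` well above 1, as noted above; otherwise the choice is mostly about how often you want to pay for a training run.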

I would not be able to split the data into train test splits, and that it would be very expensive and inefficient to train online.
...
What would be a good evaluation strategy? Splitting the batch into train test? that would mean less data for training but we can test it asap. Another idea I had was training on the current batch, then evaluating it on incoming batches. Any other ideas?

Well, you could mark the new samples (50% for training, 50% for testing), then use the testing ones only for evaluation (for example, by renaming the files or moving them into a different folder).
That said, if this is a video stream, a sequence of frames contains very little change, so splitting it into train/test basically means the test set is very, very close to the train one.
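
One possible way to do the marking, sketched under assumptions: hash a key to decide the split, so the same sample always lands on the same side without any bookkeeping. `assign_split` and `clip_id` are hypothetical helpers, and the `<clip>_<frame>.jpg` naming scheme is an assumption for illustration. For video, hashing the clip name instead of the frame name keeps near-identical frames on one side of the split.

```python
import hashlib


def assign_split(key: str, test_fraction: float = 0.5) -> str:
    """Deterministically map a key (e.g. a file name) to 'train' or 'test'."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Turn the first 8 bytes of the hash into a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "test" if bucket < test_fraction else "train"


def clip_id(filename: str) -> str:
    """Assumed naming scheme '<clip>_<frame>.jpg': return the '<clip>' part."""
    return filename.rsplit("_", 1)[0]


# Independent images: split per file.
image_split = assign_split("client_a/img_0001.jpg")

# Video frames: split per clip so neighbouring frames stay together.
frame_split = assign_split(clip_id("camera3_000042.jpg"))
```

The returned string can double as the name of the destination folder if you go with the "move into a different folder" approach.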

  				
Posted 3 years ago by AgitatedDove14

Thank you, I'll start reading up on this once I've finished setting up the basic pipeline

  				
Posted 3 years ago by VexedCat68

My main query is do I wait for it to be a sufficient batch size or do I just send each image as soon as it comes to train

  				
Posted 3 years ago by VexedCat68

Lastly, I have asked this question multiple times, but since the MLOps process is so new, I want to learn from others' experience regarding evaluation strategies. What would be a good evaluation strategy? Splitting the batch into train and test? That would mean less data for training, but we could test it ASAP. Another idea I had was training on the current batch, then evaluating it on incoming batches. Any other ideas?
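
The second idea (score each incoming batch with the current model, then train on it) is sometimes called test-then-train or prequential evaluation. A rough sketch of the loop follows; `prequential_eval` and `ThresholdModel` are hypothetical names, and the one-feature threshold model is only a toy stand-in for an image classifier.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# Toy stand-in for a labeled image: (single feature, binary label).
Example = Tuple[float, int]


def prequential_eval(
    batches: Iterable[Sequence[Example]],
    predict: Callable[[float], int],
    update: Callable[[Sequence[Example]], None],
) -> List[float]:
    """Score every batch with the current model BEFORE training on it."""
    accuracies = []
    for batch in batches:
        correct = sum(predict(x) == y for x, y in batch)
        accuracies.append(correct / len(batch))
        update(batch)  # only now does the model see this batch
    return accuracies


class ThresholdModel:
    """Toy classifier: predicts 1 when the feature exceeds a learned threshold."""

    def __init__(self) -> None:
        self.threshold = 0.0

    def predict(self, x: float) -> int:
        return int(x > self.threshold)

    def update(self, batch: Sequence[Example]) -> None:
        # Place the threshold midway between the class means of the latest batch.
        pos = [x for x, y in batch if y == 1]
        neg = [x for x, y in batch if y == 0]
        if pos and neg:
            self.threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
```

Every sample is used for testing once and for training afterwards, so no data is held back permanently. The accuracy of each batch reflects the model as it was before that batch arrived.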

  				
Posted 3 years ago by VexedCat68

It's basically data for binary image classification, simple.

  				
Posted 3 years ago by VexedCat68

With online learning, my two main concerns are that the training would be completely stochastic in nature, I would not be able to split the data into train test splits, and that it would be very expensive and inefficient to train online.

  				
Posted 3 years ago by VexedCat68

What about the epochs though? Is there a recommended number of epochs when you train on that new batch?

I'm assuming you are also using the "old" images?
The main factor here is the ratio between the previously used data and the newly added data. You might also want to resample, i.e. train more on the new data than on the old data. Make sense?
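
As a sketch of what such a ratio could look like in code (an illustration, not a recipe from this thread): draw each training batch partly from the new pool and partly from the old one. `mixed_batch` and `new_fraction` are hypothetical names, and the right value for `new_fraction` is something to tune, not a known constant.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def mixed_batch(
    old: Sequence[T],
    new: Sequence[T],
    batch_size: int,
    new_fraction: float,
    rng: random.Random,
) -> List[T]:
    """Draw a batch with roughly `new_fraction` of its items taken from `new`."""
    if not 0.0 <= new_fraction <= 1.0:
        raise ValueError("new_fraction must be between 0 and 1")
    n_new = round(batch_size * new_fraction)
    n_old = batch_size - n_new
    # Sampling with replacement lets a small pool of new data be oversampled.
    batch = rng.choices(new, k=n_new) + rng.choices(old, k=n_old)
    rng.shuffle(batch)
    return batch


# Example: 8 items per batch, a quarter of them from the newly arrived data.
old_pool = [("old", i) for i in range(100)]
new_pool = [("new", i) for i in range(5)]
batch = mixed_batch(old_pool, new_pool, batch_size=8, new_fraction=0.25,
                    rng=random.Random(0))
```

With this setup, "epochs" over the new batch matter less than how many mixed batches you draw and how large `new_fraction` is.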

  				
Posted 3 years ago by AgitatedDove14

It'll be labeled in the folder I'm watching.

  				
Posted 3 years ago by VexedCat68

AgitatedDove14 Sorry for pinging you on this old thread. I had an additional query. If you've worked on a process similar to the one mentioned above, how do you set the learning rate? And what was the learning strategy? ADAM? RMSProp?

  				
Posted 3 years ago by VexedCat68

Hi VexedCat68
What type of data is it? And what type of annotations?
Streaming data into the training process is great, but is it post quality control?

  				
Posted 3 years ago by AgitatedDove14

Would you know what the pros of learning online would be, other than the fact that the incoming data is as close as possible in time to our current data distribution? Also, would those benefits be worth it to train online?

  				
Posted 3 years ago by VexedCat68

Understandable. I mainly have regular image data, not video sequences, so I can do the train/test splits as you mentioned. What about the epochs though? Is there a recommended number of epochs when you train on that new batch?

  				
Posted 3 years ago by VexedCat68

Should I just train for 1 epoch? Or multiple epochs, given that I'm only training on the new batch of data and not the whole dataset?

  				
Posted 3 years ago by VexedCat68

I get what you're saying. I was considering training on just the new data to see how it works. To me, it felt like the fastest way to deal with data drift. I understand that it may introduce instability, however. I was curious how other developers who have successfully set up continuous training deal with it: 100% new data, or a ratio between new and old data? And if it is the latter, which should be the majority, old data or new data?

  				
Posted 3 years ago by VexedCat68
