My main query is do I wait for it to be a sufficient batch size or do I just send each image as soon as it comes to train
This is usually a cost optimization issue, generally speaking if GPU up time is not an issue that the process is stochastic anyhow, so waiting for a batch or not is not the most important factor (unless you use batchnorm layer, in that case this is basically a must)
I would not be able to split the data into train test splits, and that it would be very expensive and inefficient to train online.
...
What would be a good evaluation strategy? Splitting the batch into train test? that would mean less data for training but we can test it asap. Another idea I had was training on the current batch, then evaluating it on incoming batches. Any other ideas?
Well you could mark the new samples (50% for training, 50% for testing), then only use the testing ones (for example by rename the files or moving into a diff folder).
That said, if this is a video stream, then a sequence of frames contains very little change so splitting it to train/test basically means the test set is very very close to the train one.