Create a "sample by sample" model in Keras - python

I want to create a model in Keras that can learn "sample by sample". This kind of training is called online learning: the model receives and fits the data one sample at a time. My question is:
How can I do that in Keras? Is it possible to do this just by setting batch_size=1 while fitting?

In Keras, batch size has nothing to do with how data is fed in. Batch size determines how many samples are fed into the network in parallel per gradient update. A clearer explanation of batch size depends on what the network is. For example, in a stateful RNN, a batch size of N means the input tensor contains N independent series. A single batch step moves forward on all N series by one sample. So, in each batch, N samples (one from each of the N independent series) are processed and the weights are updated once.
In your case it seems there is only one stream of samples. If the samples are time series data, you would use batch_size=1. If you have a data set to train on before deploying the model, you can read it all into memory and fit the model. After deployment, as new observations arrive, you can call train_on_batch or fit again and again. There is no limit on how many times you can fit the model.
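To make the "update as each observation arrives" idea concrete, here is a minimal framework-free sketch. It is not Keras code: OnlineLinearModel, train_on_sample and stream are hypothetical names made up for illustration, and the data is an assumed noiseless line. In Keras the equivalent step would be calling train_on_batch on a batch that contains a single sample.

```python
class OnlineLinearModel:
    """Hypothetical one-feature linear model y = w*x + b trained by SGD."""

    def __init__(self, lr=0.05):
        self.w = 0.0
        self.b = 0.0
        self.lr = lr

    def train_on_sample(self, x, y):
        # One gradient step on the squared error of a single sample.
        error = (self.w * x + self.b) - y
        self.w -= self.lr * 2 * error * x
        self.b -= self.lr * 2 * error
        return error ** 2

    def predict(self, x):
        return self.w * x + self.b


def stream(n):
    # Hypothetical data stream: noiseless samples of y = 2x + 1.
    for i in range(n):
        x = (i % 10) / 10.0
        yield x, 2.0 * x + 1.0


model = OnlineLinearModel()
for x, y in stream(5000):
    model.train_on_sample(x, y)   # update as each observation arrives
```

The model never sees the whole data set at once, and the loop can keep running for as long as new samples arrive.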

Related

How to deal with batch normalization for multiple datasets?

I am working on a task of generating synthetic data to help the training of my model. This means that the training is performed on synthetic + real data, and tested on real data.
I was told that batch normalization layers might be trying to find weights that are good for all while training, which is a problem since the distribution of my synthetic data is not exactly equal to the distribution of the real data. So, the idea would be to have different ‘copies’ of the weights of batch normalization layers. So that the neural network estimates different weights for synthetic and real data, and uses just the weights of real data for evaluation.
Could someone suggest good ways to actually implement that in PyTorch? My idea was the following: after each epoch of training on a dataset, I would go through all batch norm layers and save their weights. Then, at the beginning of the next epoch, I would iterate again, loading the right weights. Is this a good approach? Still, I am not sure how I should deal with the batch norm weights at test time, since batch norm behaves differently there.
It sounds like the problem you're worried about is that your neural network will learn weights that work well when the batch norm is computed for a batch of both real and synthetic data, and then later at test time it will compute a batch norm on just real data?
Rather than trying to track multiple batch norms, you probably just want to set track_running_stats to True for your batch norm layer, and then put it into eval mode when testing. This will cause it to compute a running mean and variance over multiple batches while training, and then it will use that mean and variance later at test time, rather than looking at the batch stats for the test batches.
(This is often what you want anyway, because depending on your use case, you might be sending very small batches to the deployed model, and so you want to use a pre-computed mean and variance rather than relying on stats for those small batches.)
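As a rough illustration of what running statistics do, here is a simplified pure-Python sketch. It is not PyTorch's implementation: RunningNorm is a hypothetical class, it handles a single feature, it has no learnable scale or shift, and it uses the biased batch variance everywhere. The momentum of 0.1 and eps of 1e-5 are assumptions chosen to match common defaults.

```python
class RunningNorm:
    """Hypothetical 1-D normalizer mimicking batch-norm running statistics."""

    def __init__(self, momentum=0.1, eps=1e-5):
        self.momentum = momentum
        self.eps = eps
        self.running_mean = 0.0
        self.running_var = 1.0
        self.training = True

    def eval(self):
        self.training = False

    def __call__(self, batch):
        if self.training:
            n = len(batch)
            mean = sum(batch) / n
            var = sum((v - mean) ** 2 for v in batch) / n
            # Exponential moving average of the batch statistics.
            self.running_mean += self.momentum * (mean - self.running_mean)
            self.running_var += self.momentum * (var - self.running_var)
        else:
            # Eval mode: ignore the batch and use the stored statistics.
            mean, var = self.running_mean, self.running_var
        return [(v - mean) / (var + self.eps) ** 0.5 for v in batch]


norm = RunningNorm()
for _ in range(200):
    norm([4.0, 6.0])              # training batches with mean 5, variance 1
norm.eval()
single = norm([5.0])              # a batch of one is fine in eval mode
```

In eval mode the output for a sample no longer depends on what else is in the batch, which is why small test batches stop being a problem.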
If you really want to be computing fresh means and variances at test time, what I would do is instead of passing a single batch with both real and synthetic data into your network, I'd pass in one batch of real data, then one batch of synthetic data, and average the two losses together before backprop. (Note that if you do this you should not rely on the running mean and variance later -- you'll have to either set track_running_stats to False, or reset it when you're done and run through a few dummy batches with only real data to compute reasonable values. This is because the running mean and variance stats are only useful if they're expected to be roughly the same for every batch, and you're instead polarizing the values by feeding in different types of data in different batches.)

In Keras, after you train a stateful LSTM model, do you have to re-train the model as you predict values?

I've been trying to create a stateful LSTM model with keras, and I pretty much figured out the training part, but I don't get the predicting part.
So, let's imagine that we have 10000 time-series datapoints. We use the first 9000 for training and the remaining 1000 for testing. As we start training, we set the window length to 2 and slide the window forward, using the first datapoint in the window as the input (X) and the second datapoint as the output (y).
And as we train, the model converges because of its stateful nature. Finally we finish training.
Now, we are left with a model, and some test data. The problem begins here. We test the first datapoint.
It returns a guessed value. Nice.
We test the second datapoint of the test set.
We get an output. But the problem is that, because we are using a stateful model and we only pass one value as input, the only way the model can figure out the next value is from its memory of the previous time-series.
But since we didn't train the model on the first datapoint of the test set, the time-series is broken, and the model will think that the second datapoint of the test set is the first datapoint of the test set!
So, my question is,
does Keras take care of this and automatically train the network as it's predicting?
or do I have to train the net as I am predicting,
or is there some other reason that enables me to just keep predicting without training the model further?
A stateful LSTM will retain information in its cells as you predict. If you were to take any random point in the train or test dataset and repeatedly predict on it, your answer would change each time, because the model keeps seeing this data and uses it every time it predicts. The only way to get a repeatable answer is to call reset_states().
You should be calling reset_states() after each training epoch, and when you save the model, those cells should be empty. Then if you want to start predicting on the test set, you can predict on the last n training points (without saving the values anywhere), then start saving values once you get to your first test point.
It is often good practice to seed the model before prediction. If I want to evaluate on test_set[10:20,:], I can let the model predict on test_set[:10,:] first to seed the model then start saving my predicted values once I get to the range I am interested in.
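The seeding idea can be sketched without Keras. ToyStatefulModel below is a hypothetical stand-in whose "memory" is just a running average. It only mimics the way a stateful model's output depends on earlier inputs, and its predict takes one value at a time rather than a Keras-style array.

```python
class ToyStatefulModel:
    """Hypothetical stand-in for a stateful RNN: the state is a running average."""

    def __init__(self, decay=0.5):
        self.decay = decay
        self.state = 0.0

    def reset_states(self):
        self.state = 0.0

    def predict(self, x):
        # The output depends on the current input and on everything seen before.
        self.state = self.decay * self.state + (1 - self.decay) * x
        return self.state


series = [float(i) for i in range(20)]
model = ToyStatefulModel()

# Predicting the same point twice gives different answers: the state moved.
first = model.predict(series[10])
second = model.predict(series[10])

# Seed on series[:10], then keep only the predictions for series[10:20].
model.reset_states()
for x in series[:10]:
    model.predict(x)              # output discarded, only the state matters
seeded = [model.predict(x) for x in series[10:20]]

# Without seeding, the first prediction starts from an empty state.
model.reset_states()
unseeded_first = model.predict(series[10])
```

No weights change anywhere in this loop. Only the state is updated, which is the difference between seeding and further training.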
To address the further training question, you do not need to train the model further to predict. Training will only be for tuning the model's weights. Look into this blog for more information on Stateful vs Stateless LSTM.

Difference between TensorFlow model fit and train_on_batch

I am building a vanilla DQN model to play the OpenAI gym Cartpole game.
However, in the training step where I feed in the state as input and the target Q values as the labels, if I use model.fit(x=states, y=target_q), it works fine and the agent can eventually play the game well, but if I use model.train_on_batch(x=states, y=target_q), the loss won't decrease and the model will not play the game any better than a random policy.
I wonder what is the difference between fit and train_on_batch? To my understanding, fit calls train_on_batch with a batch size of 32 under the hood which should make no difference since specifying the batch size to equal the actual data size I feed in makes no difference.
The full code is here if more contextual information is needed to answer this question: https://github.com/ultronify/cartpole-tf
model.fit will train 1 or more epochs. That means it will train multiple batches. model.train_on_batch, as the name implies, trains only one batch.
To give a concrete example, imagine you are training a model on 10 images. Let's say your batch size is 2. model.fit will train on all 10 images, so it will update the weights 5 times. (You can specify multiple epochs, so that it iterates over your dataset several times.) model.train_on_batch will perform one update of the weights, as you only give the model one batch. You would give model.train_on_batch two images if your batch size is 2.
And if we assume that model.fit calls model.train_on_batch under the hood (though I don't think it does), then model.train_on_batch would be called multiple times, likely in a loop. Here's pseudocode to explain.
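def batch(x, y, batch_size):  # yield consecutive slices of batch_size samples
    for i in range(0, len(x), batch_size):
        yield x[i:i + batch_size], y[i:i + batch_size]

def fit(model, x, y, batch_size, epochs=1):
    for epoch in range(epochs):
        for batch_x, batch_y in batch(x, y, batch_size):
            model.train_on_batch(batch_x, batch_y)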
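To check the "10 images, batch size 2, 5 updates" arithmetic, here is a small self-contained sketch. CountingModel is a hypothetical stand-in that only counts calls, and this fit function is an illustration of the loop idea, not how Keras implements fit.

```python
class CountingModel:
    """Hypothetical stand-in that only counts train_on_batch calls."""

    def __init__(self):
        self.updates = 0

    def train_on_batch(self, batch_x, batch_y):
        self.updates += 1         # one call = one gradient update


def fit(model, x, y, batch_size, epochs=1):
    for _ in range(epochs):
        for start in range(0, len(x), batch_size):
            model.train_on_batch(x[start:start + batch_size],
                                 y[start:start + batch_size])


images = list(range(10))          # stand-ins for 10 images
labels = list(range(10))

fitted = CountingModel()
fit(fitted, images, labels, batch_size=2)

single = CountingModel()
single.train_on_batch(images[:2], labels[:2])
```

Here fitted.updates ends at 5 and single.updates at 1, so one epoch of fit does five times the work of a single train_on_batch call on this data.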

Does fitting a model to a new data set overwrite previous training progress?

I'd like to implement some active learning algorithm (modAL) with Keras. But I'd like to know whether initiating multiple training instances (i.e., running .fit() more than once) will build on previous training, or if the weights are reset. In other words, is training additive or iterative?
In case training starts from scratch each time, is there a way to have the model build on previous training?
For a given iteration, the inputs are provided to the network with the network weights set to a certain value. For each applied input, backpropagation is used to calculate a gradient. For a batch size of 100, 100 inputs are applied and the resulting 100 gradients are averaged to determine a new value for the network weights. These weights are then used to process the next batch of 100 inputs. So the process is iterative: each update starts from the weights left by the previous one. Calling .fit() again does not reset the weights, so training resumes from where the previous call stopped. If the network appears not to be learning, there are many possible explanations.
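Here is a toy sketch of that behaviour. TinyModel is a hypothetical one-weight model, not a Keras model, and the learning rate and data are assumptions. It averages the per-sample gradients in each batch, and a second call to fit continues from the weight the first call left behind.

```python
class TinyModel:
    """Hypothetical one-weight model y = w*x trained with mini-batch SGD."""

    def __init__(self, lr=0.1):
        self.w = 0.0
        self.lr = lr

    def fit(self, xs, ys, batch_size=2, epochs=1):
        for _ in range(epochs):
            for i in range(0, len(xs), batch_size):
                bx, by = xs[i:i + batch_size], ys[i:i + batch_size]
                # Average the per-sample gradients of the squared error.
                grad = sum(2 * (self.w * x - y) * x
                           for x, y in zip(bx, by)) / len(bx)
                self.w -= self.lr * grad   # weights carry over to the next batch


xs = [0.5, 1.0, 1.5, 2.0]
ys = [3.0 * x for x in xs]        # target weight is 3

model = TinyModel()
model.fit(xs, ys)
w_after_first = model.w
model.fit(xs, ys)                 # starts from w_after_first, not from 0
w_after_second = model.w
```

The weight is closer to 3 after the second call than after the first, which would not happen if fit started from scratch each time. To start over, you would have to rebuild or re-initialize the model yourself.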

What does train_on_batch() do in keras model?

I saw a sample of code (too big to paste here) where the author used model.train_on_batch(in, out) instead of model.fit(in, out). The official documentation of Keras says:
Single gradient update over one batch of samples.
But I don't get it. Is it the same as fit(), but instead of doing many feed-forward and backprop steps, it does it once? Or am I wrong?
Yes, train_on_batch trains on a single batch, and only once,
while fit trains on many batches for many epochs. (Each batch causes an update of the weights.)
The idea of using train_on_batch is probably to do more things yourself between each batch.
It is used when we want to understand and do some custom changes after each batch training.
A more precise use case is GANs.
You have to update the discriminator, but while updating the GAN network you have to keep the discriminator untrainable. So you first train the discriminator and then train the GAN while keeping the discriminator untrainable.
see this for more understanding:
https://medium.com/datadriveninvestor/generative-adversarial-network-gan-using-keras-ce1c05cfdfd3
The fit method of the model trains it by passing through the data you gave it (once per epoch). However, because of memory limitations (especially GPU memory), we can't train on a large number of samples at once, so the data has to be divided into small pieces called mini-batches (or just batches). The fit method of Keras models does this splitting for you and passes through all the data you gave it.
However, sometimes we need a more complicated training procedure. For example, we may want to randomly select new samples to put in the batch buffer at each step (e.g. GAN training and Siamese CNN training). In these cases we don't use the fancy and simple fit method; instead we use the train_on_batch method. To use this method, we generate a batch of inputs and a batch of outputs (labels) in each iteration and pass them to it. It trains the model on all the samples in the batch at once and returns the loss and other metrics calculated with respect to the batch samples.
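A custom loop of that kind can be sketched as follows. This is a framework-free illustration under stated assumptions: TinyBatchModel is a hypothetical one-weight model whose train_on_batch only imitates the Keras method by doing one update and returning the batch loss. The data, batch size of 4 and step count are made up.

```python
import random


class TinyBatchModel:
    """Hypothetical one-weight model y = w*x; not a Keras model."""

    def __init__(self, lr=0.05):
        self.w = 0.0
        self.lr = lr

    def train_on_batch(self, bx, by):
        # One gradient update on the whole batch; returns the batch loss
        # computed before the update.
        errors = [self.w * x - y for x, y in zip(bx, by)]
        loss = sum(e * e for e in errors) / len(bx)
        grad = sum(2 * e * x for e, x in zip(errors, bx)) / len(bx)
        self.w -= self.lr * grad
        return loss


random.seed(0)
xs = [i / 10 for i in range(1, 21)]
ys = [3.0 * x for x in xs]        # target weight is 3

model = TinyBatchModel()
losses = []
for step in range(200):
    # We choose the batch ourselves: a fresh random draw at every step.
    idx = random.sample(range(len(xs)), 4)
    loss = model.train_on_batch([xs[i] for i in idx], [ys[i] for i in idx])
    losses.append(loss)
```

Because the loop is yours, anything can happen between two calls: logging the returned loss, swapping which model is trainable (as in the GAN case), or changing how the next batch is built.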
