Built-in standardization in Keras model - python

I wonder if it is possible to include standardization of input features directly into a keras model, in a way that it would be automatically included when loading the model with models.load_model? This would avoid the need for carrying around the normalization transformation from the training set when applying the model elsewhere.
I understand a possible solution is to include the Keras model into a scikit-learn pipeline ( see for instance How to insert Keras model into scikit-learn pipeline? ). However I would prefer not to set up a pipeline and ideally just use models.load_model. Are there possible solutions that do not involve using anything other than Keras or Tensorflow?
Another possible solution is simply using a BatchNormalization layer as the first layer in the network. This builds normalization into the network, but during training the initial normalization then depends on the statistics of the (small) batches rather than on the entire training set, and would thus vary in an unnecessary way between training batches.
( A similar question to this was asked earlier ( How to include normalization of features in Keras regression model? ), but neither the question nor answer seemed very clear. )

Related

Keras data augmentation layers in model or out of model

So this may be a silly question but how exactly do the preprocessing layers in keras work, especially in the context of as a part of the model itself. This being compared to preprocessing being applied outside the model then inputting the results for training.
I'm trying to understand running data augmentation in keras models. Lets say I have 1000 images for training. Out of model I can apply augmentation 10x and get 10000 resultant images for training.
But I don't understand what's happening when you use a preprocess layer for augmentation. Does this (or these if you use many) layers take each image and apply the transformations before training? Does this mean the total number of images used for training (and validation I assume) to be the number of epochs*the original number of images?
Is one option better than the other? Does that depend on the number of images one originally has before augmentation?
The benefit of preprocessing layers is that the model is truly end-to-end, i.e. raw data comes in and a prediction comes out. It makes your model portable since the preprocessing procedure is included in the SavedModel.
However, it will run everything on the GPU. Usually it makes sense to load the data using CPU worker(s) in the background while the GPU optimizes the model.
Alternatively, you could use a preprocessing layer outside of the model and inside a Dataset. The benefit of that is that you can easily create an inference-only model including the layers, which then gives you the portability at inference time but still the speedup during training.
For more information, see the Keras guide.

Does model get retrained entirely using .fit() in sklearn and tensorflow

I am trying to use machine learning in Python. Right now I am using sklearn and TensorFlow. I was wondering what to do if I have a model that needs updating when new data comes. For example, I have financial data. I built an LSTM model with TensorFlow and trained it. But new data comes in every day, and I don't want to retrain the model every day. Is there a way just to update the model and not retrain it from scratch?
In sklearn, the documentation for .fit() method (using DecisionTreeClassifier as an example) says that it
Build a decision tree classifier from the training set (X, y).
So it seems like it will retrain the entire model from scratch.
In tensorflow, .fit() method (using Sequential as an example) say
Trains the model for a fixed number of epochs (iterations on a
dataset).
So it seems like it does update the model instead of retraining. But I am not sure if my understanding is correct. I would be grateful for some clarification. And if sklearn indeed retrains the entire model using .fit(), is there a function that would just update the model instead of retraining from scratch?
When you say update and not train. Is it just updating the weights using the new data?
If so you can adopt two approaches with Transfer learning.
Finetune: Initialise a model with the weights from old model and retrain it on new data.
Add a new layer: Add a new layer and update the weights in this layer only while freezing the remaining weights in the network.
for more details read the tensorflow guide on tansferlearning
In tensorflow, there is a method called train_on_batch() that you can call on your model.
Say you defined your model as sequential, and you initially trained it on the existing initial_dataset using the fit method.
Now, you have new data in your hand -> call it X_new_train,y_new_train
so you can update the existing model using train_on_batch()
An example would be:
#generate some X_new_train (one batch)
X_new_train = tf.random.normal(shape=[no_of_samples_in_one_batch,100])
#generate corresponding y_new_train
y_new_train = tf.constant([[1.0]]*no_of_samples_in_one_batch)
model.train_on_batch(X_new_train,y_new_train)
Note that the idea of no_of_samples_in_one_batch (also called batch size) is not so important here. I mean whatever number of samples that you have in your data will be considered as one batch!
Now, coming to sklearn, I am not sure whether all machine learning models can incrementally learn (update weights from new examples). There is a list of models that support incremental learning:
https://scikit-learn.org/0.15/modules/scaling_strategies.html#incremental-learning
In sklearn, the .fit() method retrains on the dataset i.e as you use .fit() on any dataset, any info pertaining to previous training will all be discarded. So assuming you have new data coming in every day you will have to retrain each time in the case of most sklearn algorithms.
Although, If you like to retrain the sklearn models instead of training from scratch, some algorithms of sklearn (like SGDClassifier) provide a method called partial_fit(). These can be used to retrain and update the weights of an existing model.
As per Tensorflow, the .fit() method actually trains the model without discarding any info pertaining to previous trainings. Hence each time .fit() is used via TF it will actually retrain the model.
Tip: you can use SaveModel from TF to save the best model and reload and re-train the model as and when more data keeps flowing in.

tensorflow shuffle and batch necessary if building the model sequentially?

I am looking at the recurrent neural net walkthrough here. In the tutorial they have a line item that does:
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
However, if you're doing a sequential build, is that still necessary? Looking at the sequential documentation, a shuffle is performed automatically? If not, why is it done here? Is there a simple numerical example of the effect?
The tf.keras.models.Sequential can also batch and shuffle the data, similar to what tf.data.Dataset does. These preprocessing features are provided in Sequential because it can take up data in several types like NumPy arrays, tf.data.Dataset, dict object as well as a tf.keras.utils.Sequence.
The tf.data.Dataset API provides these features because the API is consistent with other TensorFlow APIs ( in which Keras is not involved ).
I don't think the shuffling and batching need to be done twice. You may remove the if you wish, it will not affect the model's training. I think the author wanted to use tf.data.Dataset to get the data into a Keras model. dataset.shuffle( ... ).batch( ... ) have been colloquial with the Dataset.

Tensorflow: Combining Loss Functions in LSTM Model for Domain Adaptation

Can any one please help me out?
I am working on my thesis work. Its about Predicting Parkinson disease, Since i want to build an LSTM model to adapt independent of patients. Currently i have implemented it using TensorFlow with my own loss function.
Since i am planning to introduce both labeled train and unlabeled train data in every batch of data to train the model. I want to apply my own loss function on this both labeled and unlabeled train data and also want to apply cross entropy loss only on labeled train data. Can i do this in tensorflow?
So my question is, Can i have combination of loss functions in a single model training on different set of train data?
From an implementation perspective, the short answer would be yes. However, I believe your question could be more specific, maybe what you mean is whether you could do it with tf.estimator?

Are there some pre-trained LSTM, RNN or ANN models for time-series prediction?

I am trying to solve a time series prediction problem. I tried with ANN and LSTM, played around a lot with the various parameters, but all I could get was 8% better than the persistence prediction.
So I was wondering: since you can save models in keras; are there any pre-trained model (LSTM, RNN, or any other ANN) for time series prediction? If so, how to I get them? Are there in Keras?
I mean it would be super useful if there a website containing pre trained models, so that people wouldn't have to speent too much time training them..
Similarly, another question:
Is it possible to do the following?
1. Suppose I have a dataset now and I use it to train my model. Suppose that in a month, I will have access to another dataset (corresponding to same data or similar data, in the future possibly, but not exclusively). Will it be possible to continue training the model then? It is not the same thing as training it in batches. When you do it in batches you have all the data in one moment.
Is it possible? And how?
I'll answer your last questions first.
Will it be possible to continue training the model then? It is not the same thing as training it in batches. When you do it in batches you have all the data in one moment. Is it possible? And how?
Yes, it is possible. In general, it's called transfer learning. But keep in mind that if two datasets represent very different populations, the network will soon "forget" what it learned on the first run and will optimize to the second one. To do this, you simply start training from a loaded state instead of random initialization and save the model afterwards. It is also recommended to use a smaller learning rate on the second run in order to adapt it gradually to the new data.
are there any pre-trained model (LSTM, RNN, or any other ANN) for time
series prediction? If so, how to I get them? Are there in Keras?
I haven't found exactly a pre-trained model, but a quick search gave me several active GitHub projects that you can just run and get a result for yourself: Time Series Prediction with Machine Learning (LSTM, GRU implementation in tensorflow), LSTM Neural Network for Time Series Prediction (keras and tensorflow), Time series predictions with Keras (keras and theano), Neural-Network-with-Financial-Time-Series-Data (keras and tensorflow). See also this post.
Now you can use BERT or related variants and here you can find all the pre-trained models: https://huggingface.co/transformers/pretrained_models.html
And it is possible to pre-train and fine-tune RNN, and you can refer to this paper: TimeNet: Pre-trained deep recurrent neural network for time series classification.

Categories

Resources