Predicting Past End of Dataset with RNN in Keras

Predicting Past End of Dataset with RNN in Keras - python

I have a dataset spanning hundreds of values regarding temperature. Obviously, in meteorology, it is helpful to predict what future values will be based on the past.
I have the following stateful model, built in Keras:
look_back = 1
model.add(LSTM(32, batch_input_shape=(batch_size, look_back, 1), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
for i in range(10):
model.fit(trainX, trainY, epochs=4, batch_size=batch_size, verbose=2, shuffle=False)
model.reset_states()
# make predictions
trainPredict = model.predict(trainX, batch_size=batch_size)
I have successfully been able to train and test the model on my dataset to reasonable results, however am struggling to comprehend what is required to predict the next, say, 20 points in the dataset. Obviously, these 20 points are outside of the dataset, and they have yet to "occur".
I would appreciate anything that would be of help; I feel like I am missing some simple functionality in Keras.
Thank you.

I feel like I am missing some simple functionality in Keras.
You have all you need right there. To obtain predictions on new data you have to use model.predict() again, but on the desired range. This depends on how your data looks.
Lets assume your timeseries trainX had events with x ranging from [0,100].
Then to predict the next 20 events you want to call predict() on values 101 to 120, something like:
futureData = np.array(range(101,121)) #[101,102,...,120]
futurePred = model.predict(futureData)
Again, this depends on how your "next 20" events look. If you bin size were instead 0.1 (100, 100.1, 100.2,...) you should evaluate the prediction accordingly.
You may also like to check this page where they give examples and explain more about Timeseries in Keras with RNNs, if you are interested.

Related

Keras LSTM neural network for Time Series Predictions shows nan during model fit

I am training a neural network to predict a whole day of availability (144 samples, 6 features) by passing yesterday's availability (144 samples). I'm having trouble finding good resources or explanations on how to define a neural network to predict time series in a regression problem. The training is defined as a supervised learning problem. My definition of the neural network is,
lstm_neurons = 30
model = Sequential()
model.add(LSTM(lstm_neurons * 2, input_shape=(self.train_x.shape[1], sel f.train_x.shape[2]), return_sequences=True))
model.add(LSTM(lstm_neurons * 2))
model.add(Dense(len_day, activation='softmax'))
model.compile(loss='mean_squared_error', optimizer='adam', metrics = [rm se, 'mae', 'mape'])
I am training for 20 epochs with a batch size of 200 where the used datasets have the following shapes,
Train X (9631, 144, 6)
Train Y (9631, 144)
Test X (137, 144, 6)
Test Y (137, 144)
Validation X (3990, 144, 6)
Validation Y (3990, 144)
All of this produces nan values during training for loss, rmse, mae... While this looks like it's a problem I can use the generated model to generate predictions and they look good-ish.

The first question to ask - are you trying to predict a time series based on interpreting availability as a probability measure?
The softmax activation function would work best under this scenario - but you may be misspecifying it when you are in fact attempting to forecast an interval time series - hence why you are obtaining NaN readings for your results.
This example might be of use to you - LSTM is used to this example to forecast weekly fluctuations in hotel cancellations.
Similarly to your example, X_train and X_val are reshaped as samples, time steps, features:
X_train = np.reshape(X_train, (X_train.shape[0], 1, X_train.shape[1]))
X_val = np.reshape(X_val, (X_val.shape[0], 1, X_val.shape[1]))
The LSTM network is defined as follows:
# Generate LSTM network
model = tf.keras.Sequential()
model.add(LSTM(4, input_shape=(1, previous)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X_train, Y_train, epochs=20, batch_size=1, verbose=2)
As you can see, the mean squared error is used as the loss function since the cancellation variable in question is interval (i.e. can take on a wide range of values and is not necessarily restricted by any particular scale).
I can only speculate as I have not seen your data or results, but you may be going wrong by defining softmax as your activation function when it is not appropriate - I suspect this is the case as you are also using mean squared error as the loss measurement.
In the above example, the Dense layer does not specify an activation function per se.
In terms of how you might choose to validate whether your time series forecast with LSTM is effective, a potentially good idea is to compare the findings to that of a simpler time series model; e.g. ARIMA.
Using our example, ARIMA performed better when forecasting for Hotel 1, but LSTM performed better when forecasting for Hotel 2:
H1 Results
Reading ARIMA LSTM
MDA 0.86 0.8
RMSE 57.95 63.89
MFE -12.72 -54.25
H2 Results
Reading ARIMA LSTM
MDA 0.86 0.8
RMSE 274.07 95.28
MFE 156.32 38.65
Finally, when creating your datasets using the train and validation sets, you must also ensure that you are using the correct previous parameter, i.e. the number of time periods going back with which you choose to regress against the observations at time t.
For instance, you are using yesterday's availability - but you might find that the model is improved using the previous 5 or 10 days, for instance.
# Number of previous
previous = 5
X_train, Y_train = create_dataset(train, previous)
X_val, Y_val = create_dataset(val, previous)
In your situation, the first thing I would check is the use of the softmax activation function, and work from there.

By which technique adapted to time-series can I replace cross-validation in my Keras MLP regression model in Python

I'm currently working with a time series dataset of 46 lines about meteorological measurements on approximately each 3 hours by day during one week. My explanatory variables (X) is composed of 26 variables and some variable has different units of measurement (degree, minimeters, g/m3 etc.). My variable to explain (y) is composed of only one variable temperature.
My goal is to predict temperature (y) on a slot of 12h-24h with the ensemble of variables (X)
For that I used Keras Tensorflow and Python, with MLP regressor model :
X = df_forcast_cap.loc[:, ~df_forcast_cap.columns.str.startswith('l')]
X = X.drop(['temperature_Y'],axis=1)
y = df_forcast_cap['temperature_Y']
y = pd.DataFrame(data=y)
# normalize the dataset X
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit_transform(X)
normalized = scaler.transform(X)
# normalize the dataset y
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit_transform(y)
normalized = scaler.transform(y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# define base model
def norm_model():
# create model
model = Sequential()
model.add(Dense(26, input_dim=26, kernel_initializer='normal', activation='relu'))# 30 is then number of neurons
#model.add(Dense(6, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal'))
# Compile model
model.compile(loss='mean_squared_error', optimizer='adam')
return model
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# evaluate model with standardized dataset
estimator = KerasRegressor(build_fn=norm_model, epochs=(100), batch_size=5, verbose=1)
kfold = KFold(n_splits=10, random_state=seed)
results = cross_val_score(estimator, X, y, cv=kfold)
print(results)
[-0.00454741 -0.00323181 -0.00345096 -0.00847261 -0.00390925 -0.00334816
-0.00239754 -0.00681044 -0.02098541 -0.00140129]
# invert predictions
X_train = scaler.inverse_transform(X_train)
y_train = scaler.inverse_transform(y_train)
X_test = scaler.inverse_transform(X_test)
y_test = scaler.inverse_transform(y_test)
results = scaler.inverse_transform(results)
print("Results: %.2f (%.2f) MSE" % (results.mean(), results.std()))
Results: -0.01 (0.01) MSE
(1) I read that cross-validation is not adapted for time series prediction. So, I'm wondering which others techniques exist and which one is more adapted to time-series.
(2) In a second place, I decided to normalize my data because my X dataset is composed of different metrics (degree, minimeters, g/m3 etc.) and my variable to explain y is in degree. In this way, I know that have to deal with a more complicated interpretation of the MSE because its result won't be in the same unity that my y variable. But for the next step of my study I need to save the result of the y predicted (made by the MLP model) and I need that these values be in degree. So, I tried to inverse the normalization but without success, when I print my results, the predicted values are still in normalized format (see in my code above). Does anyone see my mistake.s ?

The model that you present above is looking at a single instance of 26 measurements to make a prediction. From your description it seems that you would like to make predictions from a sequence of these measurements. I'm not sure if I fully understood the description but I'll assume that you have a sequence of 46 measurements, each with 26 values that you believe should be good predictors of the temperature. If that is the case, the input shape of your model should be (46, 26,). The 46 here is called time_steps, 26 is the number of features.
For a time series you need to select a model design. There are 2 approaches: a recurrent network or a convolutional network (or a mixture of the 2nd). A convolutional network is typically used to detect patterns in the input data which may be located somewhere in the data. For instance, suppose you want to detect a given shape in an image. Convolutional Networks are a good starting point. Recurrent networks, update their internal state after each time step. They can detect patterns as well as a convolutional network, but you can think of them as being less position independent.
Simple example of a convolutional approach.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import *
from tensorflow.keras.models import Sequential, Model
average_tmp = 0.0
model = Sequential([
InputLayer(input_shape=(46,26,)),
Conv1D(16, 4),
Conv1D(32, 4),
Conv1D(64, 2),
Conv1D(128, 4),
MaxPooling1D(),
Flatten(),
Dense(256, activation='relu'),
Dense(1, bias_initializer=keras.initializers.Constant(average_tmp)),
])
model.compile('adam', 'mse')
model.summary()
A mixed approach, would replace the ```Flatten`` layer above with an LSTM node. That would probably be a reasonable starting point to start experimenting.
(1) I read that cross-validation is not adapted for time series prediction. So, I'm wondering which others techniques exist and which one is more adapted to time-series.
cross validation is a technique that is very well suited for this problem. If you try the example model above, I can almost guarantee that it will overfit your dataset very significantly. cross-validation can help you determine the right regularisation parameters for your model in order to avoid overfitting.
Examples of regularisation techniques that you probably want to consider:
Saving the model weights at the epoch with lower validation score.
Dropout and/or BatchNormalization.
kernel regularisation.
(2) In a second place, I decided to normalize my data because my X dataset is composed of different metrics (degree, minimeters, g/m3 etc.) and my variable to explain y is in degree.
Good call. It will avoid training cycles of your model trying to discover the bias at very high values from the random initialisation.
In this way, I know that have to deal with a more complicated interpretation of the MSE because its result won't be in the same unity that my y variable.
This is orthogonal. The inputs are not assumed to be in the same unit as y. We assume in a DNN that we can create a combination of linear transformation of weights (plus non-linear activations). That has no implicit assumption of units.
But for the next step of my study I need to save the result of the y predicted (made by the MLP model) and I need that these values be in degree. So, I tried to inverse the normalization but without success, when I print my results, the predicted values are still in normalized format (see in my code above). Does anyone see my mistake.s ?
scaler.inverse_transform(results) should do the trick.
It doesn't make sense to inverse transform the inputs X_ and Y_. And it would probably help you keep your code straight to not use the same variable name for both the X and Y scalers.
It is also possible to refrain from scaling Y. If you choose to do so, I'd suggest that you initialise the output layer bias with the mean of the Ys.

CNN on small dataset is overfiting

I want to classify pattern on image. My original image shape are 200 000*200 000 i reshape it to 96*96, pattern are still recognizable with human eyes. Pixel value are 0 or 1.
i'm using the following neural network.
train_X, test_X, train_Y, test_Y = train_test_split(cnn_mat, img_bin["Classification"], test_size = 0.2, random_state = 0)
class_weights = class_weight.compute_class_weight('balanced',
np.unique(train_Y),
train_Y)
train_Y_one_hot = to_categorical(train_Y)
test_Y_one_hot = to_categorical(test_Y)
train_X,valid_X,train_label,valid_label = train_test_split(train_X, train_Y_one_hot, test_size=0.2, random_state=13)
model = Sequential()
model.add(Conv2D(24,kernel_size=3,padding='same',activation='relu',
input_shape=(96,96,1)))
model.add(MaxPool2D())
model.add(Conv2D(48,kernel_size=3,padding='same',activation='relu'))
model.add(MaxPool2D())
model.add(Conv2D(64,kernel_size=3,padding='same',activation='relu'))
model.add(MaxPool2D())
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(256, activation='relu'))
model.add(Dense(16, activation='softmax'))
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
train = model.fit(train_X, train_label, batch_size=80,epochs=20,verbose=1,validation_data=(valid_X, valid_label),class_weight=class_weights)
I have already run some experiment to find a "good" number of hidden layer and fully connected layer. it's probably not the most optimal architecture since my computer is slow, i just ran different model once and selected best one with matrix confusion, i didn't use cross validation,I didn't try more complex architecture since my number of data is small, i have read small architecture are the best, is it worth to try more complex architecture?
here the result with 5 and 12 epoch, bach size 80. This is the confusion matrix for my test set
As you can see it's look like i'm overfiting. When i only run 5 epoch, most of the class are assigned to class 0; With more epoch, class 0 is less important but classification is still bad
I added 0.8 dropout after each convolutional layer
e.g
model.add(Conv2D(48,kernel_size=3,padding='same',activation='relu'))
model.add(MaxPool2D())
model.add(Dropout(0.8))
model.add(Conv2D(64,kernel_size=3,padding='same',activation='relu'))
model.add(MaxPool2D())
model.add(Dropout(0.8))
With drop out, 95% of my image are classified in class 0.
I tryed image augmentation; i made rotation of all my training image, still used weighted activation function, result didnt improve. Should i try to augment only class with small number of image? Most of the thing i read says to augment all the dataset...
To resume my question are:
Should i try more complex model?
Is it usefull to do image augmentation only on unrepresented class? then should i still use weight class (i guess no)?
Should i have hope to find a "good" model with cnn when we see the size of my dataset?

I think according to the imbalanced data, it is better to create a custom data generator for your model so that each of it's generated data batch, contains at least one sample from each class. And also it is better to use Dropout layer after each dense layer instead of conv layer. For data augmentation it is better to at least use combination of rotate, horizontal flip and vertical flip. there are some other approaches for data augmentation like using GAN network or random pixel replacement.
For Gan you can check This SO post
For using Gan as data augmenter you can read This Article.
For combination of pixel level augmentation and GAN pixel level data augmentation

What I used - in a different setting - was to upsample my data with ADASYN. This algorithm calculates the amount of new data required to balance your classes, and then takes available data to sample novel examples.
There is an implementation for Python. Otherwise, you also have very little data. SVMs are good performing even with little data. You might want to try them or other image classification algorithms depending where the expected pattern is always at the same position, or varies. Then you could also try the Viola–Jones object detection framework.

InvalidArgumentError with RNN/LSTM in Keras

I'm throwing myself into machine learning, and wish to use Keras for a university project that's time-critical. I realise it would be best to learn individual concepts and building blocks, but it's important that this is done soon.
I'm working with someone who has some experience and interest in machine learning, but we cannot seem to get further than this. The below code was adapted from GitHub code mentioned in a guide in Machine Learning Mastery.
For context, I've got data from multiple physical sensors (where each sensor is a column), with each sample from those sensors represented by one row. I wish to use machine learning to determine who the sensors were tracking at any given time. I'm trying to allocate approximately 80% of the rows to training and 20% to testing, and am creating my own "y" set of data (with the first 521,549 rows being from one participant, and the remainder from another). My data (training and test) has a total of 1,019,802 rows, and 16 columns (all populated), but the number of columns can be reduced if need be.
I would love to know the following:
What does this error mean in the context of what I'm trying to achieve, and how can I change my code to avoid it?
Is the below code suitable for what I'm trying to achieve?
Does this code represent any specific fundamental flaw in my understanding of what machine learning (generally or specifically) is designed to achieve?
Below is the Python code I'm trying to run to make use of machine learning:
x_all = pd.read_csv("(redacted)...csv",
delim_whitespace=True, header=None, low_memory=False).values
y_all = np.append(np.full((521549,1), 0), np.full((498253,1),1))
limit = 815842
x_train = x_all[:limit]
y_train = y_all[:limit]
x_test = x_all[limit:]
y_test = y_all[limit:]
max_features = 16
maxlen = 80
batch_size = 32
model = Sequential()
model.add(Embedding(500, 32, input_length=max_features))
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=['accuracy'])
model.fit(x_train, y_train,
batch_size=batch_size,
epochs=15,
validation_data=(x_test, y_test))
score, acc = model.evaluate(x_test, y_test,
batch_size=batch_size)
This is an excerpt from the CSV referenced in the code:
6698.486328125 4.28260869565217 4.6304347826087 10.6195652173913 2.4392579293836 2.56134051466188 9.05326152004788 0.0 1.0812 924.898261191267 -1.55725190839695 -0.244274809160305 0.320610687022901 -0.122938530734633 0.490254872563718 0.382308845577211
6706.298828125 4.28260869565217 4.58695652173913 10.5978260869565 2.4655894673848 2.50867743865949 9.04368641532017 0.0 1.0812 924.898261191267 -1.64885496183206 -0.366412213740458 0.381679389312977 -0.122938530734633 0.490254872563718 0.382308845577211
6714.111328125 4.26086956521739 4.64130434782609 10.5978260869565 2.45601436265709 2.57809694793537 9.03411131059246 0.0 1.0812 924.898261191267 -0.931297709923664 -0.320610687022901 0.320610687022901 -0.125937031484258 0.493253373313343 0.371814092953523
The following error occurs when running this:
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0,0] = 972190 is not in [0, 500)
[[Node: embedding_1/embedding_lookup = GatherV2[Taxis=DT_INT32, Tindices=DT_INT32, Tparams=DT_FLOAT, _class=["loc:#training/Adam/Assign_2"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](embedding_1/embeddings/read, embedding_1/Cast, training/Adam/gradients/embedding_1/embedding_lookup_grad/concat/axis)]]
For reference, I'm on a 2017 27-inch iMac Retina 5K with 4.2 GHz i7, 32 GB RAM, with a Radeon Pro 580 8 GB.

There are some more tutorials on Machine Learning Mastery for what you want to accomplish
https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/
https://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/
And I'll give my own quick explanation of what you probably want to do.
Right now it looks like you are using the exact same data for the X and y inputs into your model. The y inputs are the labels which in your case is "who the sensors were tracking". So in the binary case of having 2 possible people it is set to 0 for the first person and 1 for the second person.
The sigmoid activation on the final layer will output a number between 0 and 1. If the number is bellow 0.5 then it is predicting that the sensor is tracking person 0 and if it above 0.5 then it is predicting person 1. This will be represented in the accuracy score.
You will probably not want to use an embedding layer, its possible that you might but I would drop it to start with. Normalize your data though before feeding it into the net to improve training. Scikit-Learn has good tools for this if you want a quick solution.
http://scikit-learn.org/stable/modules/preprocessing.html
When working with time series data you often want to feed in a window of time points rather than a single point. If you send your time series to Keras model.fit() then it will use a single point as input.
In order to have a time window as input you need to reorganize each example in the data set to be a whole window, or you can use a generator if that will take up to much memory. This is described in the Machine Learning Mastery pages that I linked.
Keras has a generator that you can use called TimeseriesGenerator
from keras.preprocessing.sequence import TimeseriesGenerator
timeseries_generator = TimeseriesGenerator(data, targets, length, sampling_rate)
where data is your time series of features and targets is your time series of labels.
If you use the timeseries generator then when fitting you will have to use fit_generator
model.fit_generator(timeseries_generator)
same with evaluating using evaluate_generator()
If you have your data set up correctly then your model should work
model = Sequential()
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))
you could also try a simpler dense model
model = Sequential()
model.add(Flatten())
model.add(Dense(64, dropout=0.2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
One more issue I see is that it appears you would be splitting off a test set that contains only one type of label which is not only bad practice but will also weight your training set towards the other label which might hurt your results.
Hopefully that gets you started. Make sure you get your data set up correctly!

Keras batch training online predicting not learning

I have been working on a side project trying to learn machine learning with Keras myself and I think I am stuck here.
My intention is to predict the bike availability of a public sharing system that has 31 stations. For now I am only training my model to predict the availability of one station only. I'd like to do online predictions with batch training. I'd like to start giving it a number of bikes at, for example, 00:00 with N given time steps plus the day of the year and weekday.
The input data is this:
Day of the year, encoded as ints, 1-JAN is 0, 2-JAN is 1...
Time in 5' intervals encoded as ints the same way as before, 00:00 is 0, 00:05 is 1...
Weekday, again encoded as int
Those 3 columns are then normalized, then i add the columns that refer to the bikes, they are one hot encode, if the station has 20 bikes the encoded array will have length 21. The data is then transformed to a supervised problem more or less following this tutorial.
Now I divide my dataset into training (65%) and test (35%) samples. And then define the neural network as this:
model = Sequential()
model.add(LSTM(lstm_neurons, batch_input_shape=(1000, 5, 24), stateful=False))
model.add(Dense(max_cases, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics = ['accuracy', 'mse', 'mae'])
Fit the model
for i in range(epochs):
model.fit(train_x, train_y, epochs=1, batch_size=new_batch_size, verbose=2, shuffle=False)
model.reset_states()
w = model.get_weights()
Accuracy plot looks good but the loss one does weird things.
Once the training finishes I predict values, I change from stateless to stateful and modify the batch size
model = Sequential()
model.add(LSTM(lstm_neurons, batch_input_shape=(1, 5, 24), stateful=True))
model.add(Dense(max_cases, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics = ['accuracy', 'mse', 'mae'])
model.set_weights(w)
I now predict using the test values I got from before
for i in range(0, len(test_x)):
auxx = test_x[i].reshape((batch_size, test_x[i].shape[0], test_x[i].shape[1])) # (...,n_in,4)
yhat = model.predict(auxx, batch_size = batch_size)
This is the result, I am zooming it a bit to get a closer look and not a crowded plot. It doesn't look bad at all, it has some errors but overall the predictions looks good enough.
After this I create my set of data to do the online prediction and predict
for i in range(0,290):
# ...
predicted_bikes = model.predict(data_to_feed, batch_size = 1)
# ...
The result is this one, a continuous line.
As I've seen in the previous plot the predicted value is moved like an interval later to the real value which makes me think that the neural network has learnt to repeat the previous values. That's why here I got a straight line.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.