Keras batch training online predicting not learning - python

I have been working on a side project, trying to teach myself machine learning with Keras, and I think I am stuck here.
My intention is to predict the bike availability of a public sharing system that has 31 stations. For now I am only training my model to predict the availability of a single station. I'd like to do online predictions with batch training: start by giving the model the number of bikes at, for example, 00:00, together with N previous time steps plus the day of the year and the weekday.
The input data is this:
Day of the year, encoded as an int: 1-JAN is 0, 2-JAN is 1, ...
Time of day in 5-minute intervals, encoded as an int the same way: 00:00 is 0, 00:05 is 1, ...
Weekday, again encoded as an int
Those 3 columns are then normalized; after that I append the columns that describe the bikes, one-hot encoded: if the station has 20 bikes, the encoded array has length 21. The data is then transformed into a supervised problem, more or less following this tutorial.
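For reference, here is a minimal sketch of the encoding just described. The file name, the 'timestamp' and 'bikes' column names, and the use of MinMaxScaler are my assumptions, not necessarily the asker's exact code:
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.utils import to_categorical

df = pd.read_csv('availability.csv', parse_dates=['timestamp'])  # hypothetical file/columns

# Integer-encode day of year (1-JAN -> 0), 5-minute slot (00:00 -> 0) and weekday.
df['day'] = df['timestamp'].dt.dayofyear - 1
df['slot'] = df['timestamp'].dt.hour * 12 + df['timestamp'].dt.minute // 5
df['weekday'] = df['timestamp'].dt.weekday

# Normalize the three time columns to [0, 1].
time_cols = MinMaxScaler().fit_transform(df[['day', 'slot', 'weekday']])

# One-hot encode the bike count; a station with 20 bikes gives vectors of length 21.
bikes_onehot = to_categorical(df['bikes'], num_classes=df['bikes'].max() + 1)

features = np.hstack([time_cols, bikes_onehot])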
Now I split my dataset into training (65%) and test (35%) samples, and then define the neural network like this:
model = Sequential()
model.add(LSTM(lstm_neurons, batch_input_shape=(1000, 5, 24), stateful=False))
model.add(Dense(max_cases, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics = ['accuracy', 'mse', 'mae'])
I fit the model like this:
for i in range(epochs):
    model.fit(train_x, train_y, epochs=1, batch_size=new_batch_size, verbose=2, shuffle=False)
    model.reset_states()
w = model.get_weights()
The accuracy plot looks good, but the loss plot does weird things.
Once training finishes I predict values. I switch from stateless to stateful and change the batch size:
model = Sequential()
model.add(LSTM(lstm_neurons, batch_input_shape=(1, 5, 24), stateful=True))
model.add(Dense(max_cases, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics = ['accuracy', 'mse', 'mae'])
model.set_weights(w)
I now predict using the test values I got from before
for i in range(0, len(test_x)):
    auxx = test_x[i].reshape((batch_size, test_x[i].shape[0], test_x[i].shape[1]))  # (..., n_in, 4)
    yhat = model.predict(auxx, batch_size=batch_size)
This is the result; I am zooming in a bit to get a closer look rather than a crowded plot. It doesn't look bad at all: there are some errors, but overall the predictions look good enough.
After this I create my set of data to do the online prediction and predict
for i in range(0, 290):
    # ...
    predicted_bikes = model.predict(data_to_feed, batch_size=1)
    # ...
The result is this one, a continuous line.
As seen in the previous plot, the predicted value lags the real value by roughly one interval, which makes me think that the neural network has simply learned to repeat the previous value. That would explain why I get a straight line here.
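One way to check that suspicion is to compare the model against a naive persistence baseline that just repeats the last observed bike count. This is a rough sketch, assuming the feature layout described above (3 normalized time columns followed by the one-hot bike columns) and a one-hot test_y:
import numpy as np

# Model predictions on the test windows, one sample at a time (stateful batch-1 model).
model_preds = np.array([
    model.predict(x.reshape(1, x.shape[0], x.shape[1]), batch_size=1).argmax()
    for x in test_x
])

# Persistence baseline: the bike count observed at the last input time step.
persistence_preds = test_x[:, -1, 3:].argmax(axis=1)  # assumes columns 0-2 are time features
true_values = test_y.argmax(axis=1)

print('model MAE      ', np.abs(model_preds - true_values).mean())
print('persistence MAE', np.abs(persistence_preds - true_values).mean())
If the two errors are close, the network has effectively only learned to repeat yesterday's value.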


Keras LSTM neural network for Time Series Predictions shows nan during model fit

I am training a neural network to predict a whole day of availability (144 samples, 6 features) from yesterday's availability (144 samples). I'm having trouble finding good resources or explanations on how to define a neural network for time-series regression. The training is set up as a supervised learning problem. My definition of the neural network is:
lstm_neurons = 30
model = Sequential()
model.add(LSTM(lstm_neurons * 2, input_shape=(self.train_x.shape[1], self.train_x.shape[2]), return_sequences=True))
model.add(LSTM(lstm_neurons * 2))
model.add(Dense(len_day, activation='softmax'))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=[rmse, 'mae', 'mape'])
I am training for 20 epochs with a batch size of 200 where the used datasets have the following shapes,
Train X (9631, 144, 6)
Train Y (9631, 144)
Test X (137, 144, 6)
Test Y (137, 144)
Validation X (3990, 144, 6)
Validation Y (3990, 144)
All of this produces nan values during training for the loss, rmse, mae, and so on. While this looks like a problem, I can still use the resulting model to generate predictions, and they look good-ish.
The first question to ask: are you trying to predict the time series by interpreting availability as a probability measure?
The softmax activation function would work best under that scenario - but you may be misspecifying it if you are in fact attempting to forecast an interval time series, which would explain why you are obtaining NaN readings in your results.
This example might be of use to you - an LSTM is used in that example to forecast weekly fluctuations in hotel cancellations.
Similarly to your example, X_train and X_val are reshaped as (samples, time steps, features):
X_train = np.reshape(X_train, (X_train.shape[0], 1, X_train.shape[1]))
X_val = np.reshape(X_val, (X_val.shape[0], 1, X_val.shape[1]))
The LSTM network is defined as follows:
# Generate LSTM network
model = tf.keras.Sequential()
model.add(LSTM(4, input_shape=(1, previous)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X_train, Y_train, epochs=20, batch_size=1, verbose=2)
As you can see, the mean squared error is used as the loss function since the cancellation variable in question is interval (i.e. can take on a wide range of values and is not necessarily restricted by any particular scale).
I can only speculate as I have not seen your data or results, but you may be going wrong by defining softmax as your activation function when it is not appropriate - I suspect this is the case as you are also using mean squared error as the loss measurement.
In the above example, the Dense layer does not specify an activation function per se.
In terms of how you might choose to validate whether your time series forecast with LSTM is effective, a potentially good idea is to compare the findings to that of a simpler time series model; e.g. ARIMA.
Using our example, ARIMA performed better when forecasting for Hotel 1, but LSTM performed better when forecasting for Hotel 2:
H1 Results

Reading   ARIMA    LSTM
MDA        0.86     0.8
RMSE      57.95   63.89
MFE      -12.72  -54.25

H2 Results

Reading   ARIMA    LSTM
MDA        0.86     0.8
RMSE     274.07   95.28
MFE      156.32   38.65
Finally, when creating your datasets using the train and validation sets, you must also ensure that you are using the correct previous parameter, i.e. the number of time periods going back with which you choose to regress against the observations at time t.
For instance, you are using yesterday's availability - but you might find that the model is improved using the previous 5 or 10 days, for instance.
# Number of previous
previous = 5
X_train, Y_train = create_dataset(train, previous)
X_val, Y_val = create_dataset(val, previous)
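create_dataset is not shown above; a typical sliding-window implementation (an assumption on my part, not necessarily the exact helper used in that example) looks like this:
import numpy as np

def create_dataset(dataset, previous=1):
    # Build (X, Y) pairs where X holds `previous` past values and Y the value at time t.
    X, Y = [], []
    for i in range(len(dataset) - previous):
        X.append(dataset[i:i + previous, 0])
        Y.append(dataset[i + previous, 0])
    return np.array(X), np.array(Y)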
In your situation, the first thing I would check is the use of the softmax activation function, and work from there.
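As a concrete sketch of that change, assuming availability is treated as a continuous quantity and keeping your layer sizes, the output layer could be left linear and paired with mean squared error (shapes follow the dataset sizes you listed: 144 time steps, 6 features, 144 outputs per day):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

lstm_neurons = 30
len_day = 144

# Regression-style head instead of softmax.
model = Sequential()
model.add(LSTM(lstm_neurons * 2, input_shape=(144, 6), return_sequences=True))
model.add(LSTM(lstm_neurons * 2))
model.add(Dense(len_day))  # linear activation, so outputs are unbounded real values
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae'])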

My Neural Network Model for predicting soccer outcomes doesn't increase to more than 50%

I am building a model to predict soccer outcomes. I've tried a lot of different models, such as a neural network in Matlab, deep networks using Keras in Python, and an LSTM, but my accuracy doesn't increase beyond 50%.
I have collected 24,000 matches from 13 different leagues that share the same format (win by points). My data consists of:
The week of the league in which the match takes place
Ratings of all the 22 titular players
Goals Average
Goals Against Average
Shots_H, Shots_A,
ShotsOnTarget_H, ShotsOnTarget_A,
PassSuccess_H, PassSuccess_A,
AerialDuelSuccess_H, AerialDuelSuccess_A,
DribblesWon_H, DribblesWon_A,
Tackles_H, Tackles_A,
Posesion_H, Posesion_A,
Rating_H, Rating_A,
DribbleSucces_H, DribbleSucces_A,
TacklesSucces_H, TacklesSucces_A,
Interceptions_H, Interceptions_A,
Dispossesse_H, Dispossesse_A,
Corners_H, Corners_A,
I take a weighted average of the last 5 matches of all that data to use as inputs to a neural network model, separately for playing at home and playing away. So, if a home team is going to play a match, the data will be the weighted average of its last 5 matches on its home field (the most recent match has more weight than the one before it, and so on). The output is one-hot encoded: (1 0 0) for a home win, (0 1 0) for a draw, and so on.
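For illustration, here is a rough sketch of how such a weighted average might be built. The column names ('team', 'date') and the linear weights are my assumptions, not the asker's exact code:
import numpy as np
import pandas as pd

weights = np.array([1, 2, 3, 4, 5])  # oldest -> newest match

def last5_weighted(df, team, feature_cols):
    # Weighted mean of a team's last 5 matches; the most recent match gets the largest weight.
    last5 = (df[df['team'] == team]
             .sort_values('date')
             .tail(5)[feature_cols]
             .to_numpy())
    w = weights[-len(last5):]  # handle teams with fewer than 5 matches
    return (last5 * w[:, None]).sum(axis=0) / w.sum()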
I've tried several models, and now I'm trying Keras Dense model.
I am scaling the input data using:
sc = StandardScaler()
entrada_training = sc.fit_transform(entrada_training)
entrada_validation = sc.transform(entrada_validation)
entrada_teste = sc.transform(entrada_teste)
and the model is:
drop = 0.3

model = Sequential()
model.add(Dense(40, input_shape=(57,), activation='relu'))
#model.add(layers.BatchNormalization())
model.add(Dropout(drop))
model.add(Dense(20, activation='relu'))
model.add(Dropout(drop))
#model.add(layers.BatchNormalization())
model.add(Dense(10, activation='relu'))
model.add(Dropout(drop))
#model.add(layers.BatchNormalization())
model.add(Dense(3, activation='softmax'))

adam = optimizers.Adam(lr=0.0001)
mcp_save = ModelCheckpoint('.mdl_wts.hdf5', save_best_only=True, monitor='val_loss', mode='min')
reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.2,
                                              patience=5, min_lr=0.0001)

model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])
history = model.fit(entrada_training, saida_training,
                    batch_size=28,
                    epochs=100,
                    verbose=2,
                    callbacks=[mcp_save, reduce_lr],
                    # class_weight=d_class_weights,
                    validation_data=(entrada_validation, saida_validation))
The maximum result I could achieve was 50% accuracy on the test set, but from the confusion matrix I could see that the model was not predicting any draws, so I added class weights:
from sklearn.utils.class_weight import compute_class_weight
y_integers = np.argmax(saida_training, axis=1)
class_weights = compute_class_weight('balanced', np.unique(y_integers), y_integers)
d_class_weights = dict(enumerate(class_weights))
That achieved 46% accuracy, but with a better confusion matrix.
I really don't know what to do. I've tried a lot of different models, activation functions, and methods, but the accuracy doesn't seem to change much.
Edit: I computed the correlation matrix and saw that the maximum correlation of any variable with the result is no higher than 0.2, and that 6 variables are correlated with each other with a coefficient of 0.87. I deleted the variables whose correlation with the result is less than 0.05. From 57 variables I now have 44, and I still get the same accuracy of 45%.
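For reference, a minimal sketch of that kind of correlation-based pruning, assuming the features and the outcome live in a single DataFrame df with a numeric 'result' column (names are assumptions on my part):
import pandas as pd

corr = df.corr()

# Drop features whose absolute correlation with the result is below 0.05.
weak_cols = [c for c in corr.columns
             if c != 'result' and abs(corr.loc[c, 'result']) < 0.05]
df_reduced = df.drop(columns=weak_cols)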

Keras RNN (GRU, LSTM) produces plateau and then improvement

I'm new to Keras (with TensorFlow backend) and am using it to do some simple sentiment analysis on user reviews. For some reason, my recurrent neural network is producing some unusual results that I do not understand.
First, my data is a straightforward sentiment-analysis training and test set from the UCI ML archive. There are 2061 training instances, which is small. The data looks like this:
text label
0 So there is no way for me to plug it in here i... 0
1 Good case, Excellent value. 1
2 Great for the jawbone. 1
3 Tied to charger for conversations lasting more... 0
4 The mic is great. 1
Second, here is an FFNN implementation that produces good results.
# FFNN model.
# Build the model.
model_ffnn = Sequential()
model_ffnn.add(layers.Embedding(input_dim=V, output_dim=32))
model_ffnn.add(layers.GlobalMaxPool1D())
model_ffnn.add(layers.Dense(10, activation='relu'))
model_ffnn.add(layers.Dense(1, activation='sigmoid'))
model_ffnn.summary()

# Compile and train.
model_ffnn.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
EPOCHS = 50
history_ffnn = model_ffnn.fit(x_train, y_train, epochs=EPOCHS,
                              batch_size=128, validation_split=0.2, verbose=3)
As you can see, the learning curves produce a smooth improvement as the number of epochs increases.
Third, here is the problem. I trained a recurrent neural network with a GRU, as shown below. I also tried an LSTM and saw the same results.
# GRU model.
# Build the model.
model_gru = Sequential()
model_gru.add(layers.Embedding(input_dim=V, output_dim=32))
model_gru.add(layers.GRU(units=32))
model_gru.add(layers.Dense(units=1, activation='sigmoid'))
model_gru.summary()

# Compile and train.
model_gru.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
EPOCHS = 50
history_gru = model_gru.fit(x_train, y_train, epochs=EPOCHS,
                            batch_size=128, validation_split=0.2, verbose=3)
However, the learning curves are quite unusual. You can see a plateau where neither the loss nor the accuracy improves until about epoch 17, and then the model starts learning and improving. I have never seen this type of plateau at the start of training before.
Can anyone explain why this plateau is occurring, why it stops and gives way to gradual learning, and how I can avoid it?
Following the comment by @Gerges Dib, I tried out different learning rates in increasing order:
lr = 0.0001
lr = 0.001 (the default learning rate for RMSprop)
lr = 0.01
lr = 0.05
lr = 0.1
This is very interesting. It looks like the plateau was caused by the optimizer's learning rate being too low: the parameters were stuck in a local optimum until they could break out. I have not seen this pattern before.
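For anyone wanting to reproduce this, here is a small sketch of how the learning rate can be raised explicitly (using tf.keras; the value 0.01 is just one of the rates tried above, not a recommendation):
from tensorflow.keras.optimizers import RMSprop

# Compile the GRU model with an explicit learning rate instead of the default 0.001.
model_gru.compile(optimizer=RMSprop(learning_rate=0.01),
                  loss='binary_crossentropy',
                  metrics=['acc'])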

InvalidArgumentError with RNN/LSTM in Keras

I'm throwing myself into machine learning and wish to use Keras for a university project that's time-critical. I realise it would be best to learn the individual concepts and building blocks first, but it's important that this is done soon.
I'm working with someone who has some experience and interest in machine learning, but we cannot seem to get any further than this. The code below was adapted from GitHub code mentioned in a guide on Machine Learning Mastery.
For context, I've got data from multiple physical sensors (where each sensor is a column), with each sample from those sensors represented by one row. I wish to use machine learning to determine who the sensors were tracking at any given time. I'm trying to allocate approximately 80% of the rows to training and 20% to testing, and am creating my own "y" set of data (with the first 521,549 rows being from one participant, and the remainder from another). My data (training and test) has a total of 1,019,802 rows, and 16 columns (all populated), but the number of columns can be reduced if need be.
I would love to know the following:
What does this error mean in the context of what I'm trying to achieve, and how can I change my code to avoid it?
Is the below code suitable for what I'm trying to achieve?
Does this code represent any specific fundamental flaw in my understanding of what machine learning (generally or specifically) is designed to achieve?
Below is the Python code I'm trying to run to make use of machine learning:
x_all = pd.read_csv("(redacted)...csv",
                    delim_whitespace=True, header=None, low_memory=False).values
y_all = np.append(np.full((521549, 1), 0), np.full((498253, 1), 1))

limit = 815842
x_train = x_all[:limit]
y_train = y_all[:limit]
x_test = x_all[limit:]
y_test = y_all[limit:]

max_features = 16
maxlen = 80
batch_size = 32

model = Sequential()
model.add(Embedding(500, 32, input_length=max_features))
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=15,
          validation_data=(x_test, y_test))
score, acc = model.evaluate(x_test, y_test,
                            batch_size=batch_size)
This is an excerpt from the CSV referenced in the code:
6698.486328125 4.28260869565217 4.6304347826087 10.6195652173913 2.4392579293836 2.56134051466188 9.05326152004788 0.0 1.0812 924.898261191267 -1.55725190839695 -0.244274809160305 0.320610687022901 -0.122938530734633 0.490254872563718 0.382308845577211
6706.298828125 4.28260869565217 4.58695652173913 10.5978260869565 2.4655894673848 2.50867743865949 9.04368641532017 0.0 1.0812 924.898261191267 -1.64885496183206 -0.366412213740458 0.381679389312977 -0.122938530734633 0.490254872563718 0.382308845577211
6714.111328125 4.26086956521739 4.64130434782609 10.5978260869565 2.45601436265709 2.57809694793537 9.03411131059246 0.0 1.0812 924.898261191267 -0.931297709923664 -0.320610687022901 0.320610687022901 -0.125937031484258 0.493253373313343 0.371814092953523
The following error occurs when running this:
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0,0] = 972190 is not in [0, 500)
[[Node: embedding_1/embedding_lookup = GatherV2[Taxis=DT_INT32, Tindices=DT_INT32, Tparams=DT_FLOAT, _class=["loc:#training/Adam/Assign_2"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](embedding_1/embeddings/read, embedding_1/Cast, training/Adam/gradients/embedding_1/embedding_lookup_grad/concat/axis)]]
For reference, I'm on a 2017 27-inch iMac Retina 5K with 4.2 GHz i7, 32 GB RAM, with a Radeon Pro 580 8 GB.
There are some more tutorials on Machine Learning Mastery for what you want to accomplish
https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/
https://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/
And I'll give my own quick explanation of what you probably want to do.
Right now it looks like you are using the exact same data for the X and y inputs into your model. The y inputs are the labels which in your case is "who the sensors were tracking". So in the binary case of having 2 possible people it is set to 0 for the first person and 1 for the second person.
The sigmoid activation on the final layer will output a number between 0 and 1. If the number is below 0.5 the model is predicting that the sensor is tracking person 0, and if it is above 0.5 it is predicting person 1. This will be represented in the accuracy score.
You will probably not want to use an embedding layer; it's possible that you might, but I would drop it to start with. Do normalize your data before feeding it into the net, though, to improve training. Scikit-Learn has good tools for this if you want a quick solution.
http://scikit-learn.org/stable/modules/preprocessing.html
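For example, a minimal sketch with StandardScaler, fitting on the training rows only so that nothing from the test split leaks into the scaler:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)  # learn mean/std from training data only
x_test = scaler.transform(x_test)        # apply the same scaling to the test data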
When working with time series data you often want to feed in a window of time points rather than a single point. If you send your time series to Keras model.fit() then it will use a single point as input.
In order to have a time window as input you need to reorganize each example in the dataset to be a whole window, or you can use a generator if that would take up too much memory. This is described in the Machine Learning Mastery pages I linked.
Keras has a generator that you can use called TimeseriesGenerator
from keras.preprocessing.sequence import TimeseriesGenerator
timeseries_generator = TimeseriesGenerator(data, targets, length, sampling_rate)
where data is your time series of features and targets is your time series of labels.
If you use the timeseries generator then when fitting you will have to use fit_generator
model.fit_generator(timeseries_generator)
same with evaluating using evaluate_generator()
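Putting those pieces together, a sketch might look like this. The window length of 80 and batch size of 32 are assumptions borrowed from your maxlen and batch_size values, and the model is assumed to be compiled with binary crossentropy and an accuracy metric as in your original code:
from keras.preprocessing.sequence import TimeseriesGenerator

# 80-step windows over the 16 sensor columns; length and batch_size are tunable.
train_gen = TimeseriesGenerator(x_train, y_train, length=80, batch_size=32)
test_gen = TimeseriesGenerator(x_test, y_test, length=80, batch_size=32)

model.fit_generator(train_gen, epochs=15, validation_data=test_gen)
loss, acc = model.evaluate_generator(test_gen)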
If you have your data set up correctly then your model should work
model = Sequential()
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))
You could also try a simpler dense model:
model = Sequential()
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1, activation='sigmoid'))
One more issue I see: it appears you would be splitting off a test set that contains only one type of label, which is not only bad practice but will also weight your training set towards the other label, and that might hurt your results.
Hopefully that gets you started. Make sure you get your data set up correctly!

Predicting Past End of Dataset with RNN in Keras

I have a dataset spanning hundreds of values regarding temperature. Obviously, in meteorology, it is helpful to predict what future values will be based on the past.
I have the following stateful model, built in Keras:
look_back = 1

model = Sequential()
model.add(LSTM(32, batch_input_shape=(batch_size, look_back, 1), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')

for i in range(10):
    model.fit(trainX, trainY, epochs=4, batch_size=batch_size, verbose=2, shuffle=False)
    model.reset_states()

# make predictions
trainPredict = model.predict(trainX, batch_size=batch_size)
I have successfully been able to train and test the model on my dataset with reasonable results, but I am struggling to understand what is required to predict the next, say, 20 points in the dataset. Obviously, these 20 points are outside of the dataset and have yet to "occur".
I would appreciate anything that would be of help; I feel like I am missing some simple functionality in Keras.
Thank you.
I feel like I am missing some simple functionality in Keras.
You have all you need right there. To obtain predictions on new data you have to use model.predict() again, but on the desired range. This depends on how your data looks.
Let's assume your time series trainX had events with x ranging over [0, 100].
Then to predict the next 20 events you want to call predict() on values 101 to 120, something like:
futureData = np.array(range(101,121)) #[101,102,...,120]
futurePred = model.predict(futureData)
Again, this depends on how your "next 20" events look. If your bin size were instead 0.1 (100, 100.1, 100.2, ...), you should build the prediction input accordingly.
You may also like to check this page where they give examples and explain more about Timeseries in Keras with RNNs, if you are interested.
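Since the future inputs do not exist yet, another common approach worth mentioning (my own sketch, not part of the answer above) is recursive forecasting: predict one step ahead, then feed that prediction back in as the next input. With the stateful look_back = 1 model from the question, and assuming batch_size = 1 (otherwise the weights would first have to be copied into a batch-1 copy of the model, as in the first question above), it might look like this:
import numpy as np

# Recursive multi-step forecast; each prediction becomes the next input.
last_value = trainX[-1].reshape(1, look_back, 1)
future = []
for _ in range(20):
    next_value = model.predict(last_value, batch_size=batch_size)  # shape (1, 1)
    future.append(float(next_value[0, 0]))
    last_value = next_value.reshape(1, look_back, 1)

print(future)  # the next 20 predicted points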
