Predict daily value with LSTM which has hourly timeseries as input - python

I am training a single layer LSTM that is coded as follows:
model = keras.Sequential()
model.add(keras.layers.LSTM(units=64,
                            input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(keras.layers.Dense(units=1, activation='sigmoid'))
model.compile(
    loss='binary_crossentropy',
    optimizer=keras.optimizers.Adam(lr=0.0001),
    metrics=['acc']
)
The input to my LSTM is an hourly time series. I want to predict values at a daily level based on the hourly series.
Currently, I generate hourly predictions and then take the first prediction of each day as that day's prediction. However, I would like to know whether there is a way to generate the prediction directly at a daily level.
Thank you!

In my opinion, you have two options.
Train the model on a daily training data set. To pick the most suitable value for each day, you can use either the value that occurs most often (the mode) or the mean.
Keep the hourly outputs, but forecast 24 outputs for the next 24 hours and take the mean or mode of those 24 as the prediction for the day.
The second option is probably the better one, as it should be more accurate.
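The second option can be sketched with NumPy alone. This is a minimal illustration, assuming the hourly sigmoid outputs are already available as a flat array whose length is a multiple of 24; the name hourly_probs and the random values are placeholders for the result of model.predict, not part of the original code.

```python
import numpy as np

# Placeholder for hourly model outputs covering 3 full days (72 hours).
rng = np.random.default_rng(0)
hourly_probs = rng.random(72)

# Group the hourly values into one row per day: shape (days, 24).
daily_blocks = hourly_probs.reshape(-1, 24)

# Option A: average the 24 hourly probabilities, then threshold.
daily_mean_prob = daily_blocks.mean(axis=1)
daily_label_from_mean = (daily_mean_prob >= 0.5).astype(int)

# Option B: mode of the hourly class labels (majority vote).
# A 12-12 tie is resolved to class 1 here; that choice is arbitrary.
hourly_labels = (daily_blocks >= 0.5).astype(int)
daily_label_from_vote = (hourly_labels.sum(axis=1) >= 12).astype(int)

print(daily_mean_prob.shape, daily_label_from_mean, daily_label_from_vote)
```

The two aggregations can disagree on days where a few very confident hours outweigh many borderline ones, so it is worth checking both against held-out daily labels.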

Another option is to pass the argument batch_input_shape instead of input_shape. The difference is that you now have to give a fixed batch size, so the shape you specify looks like (24, X_train.shape[1], X_train.shape[2]).
You can also set the argument return_sequences, which controls whether the layer returns the output at every time step instead of only at the final time step. With return_sequences=True, the LSTM output becomes a 3D array instead of a 2D array.
model = keras.Sequential()
model.add(keras.layers.LSTM(units=64,
                            batch_input_shape=(24, X_train.shape[1], X_train.shape[2])))
model.add(keras.layers.Dense(units=1, activation='sigmoid'))
model.compile(
    loss='binary_crossentropy',
    optimizer=keras.optimizers.Adam(lr=0.0001),
    metrics=['acc']
)
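Whichever shape argument is used, an LSTM without return_sequences emits one output per input sample, so one way to get a single prediction per day is to make each sample cover one day. The sketch below shows only the data preparation, under the assumption that the hourly data is stored row by row in chronological order and covers whole days; hourly_features and hourly_labels are hypothetical names filled with dummy values.

```python
import numpy as np

n_days, n_features = 5, 3
rng = np.random.default_rng(0)

# Dummy hourly data: one row per hour, in chronological order.
hourly_features = np.arange(n_days * 24 * n_features, dtype=float).reshape(n_days * 24, n_features)
hourly_labels = rng.integers(0, 2, size=n_days * 24)

# One sample per day: (samples, time steps, features) = (n_days, 24, n_features).
X_daily = hourly_features.reshape(n_days, 24, n_features)

# One label per day; here the rounded mean of the hourly labels (an assumption,
# any daily target definition could be used instead).
daily_labels = (hourly_labels.reshape(n_days, 24).mean(axis=1) >= 0.5).astype(int)

# X_daily and daily_labels could then be passed to model.fit, with
# input_shape=(24, n_features) on the LSTM layer.
print(X_daily.shape, daily_labels.shape)
```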

Related

Why do we reshape data?

I have a dataset consisting of 100,000 rows and 12 columns, where each column stands for a certain input, used to train a sequential GRU model that predicts a single output. The following is the code for the model:
model = Sequential()
model.add(GRU(units=70, return_sequences=True, input_shape=(1, 12), activity_regularizer=regularizers.l2(0.0001)))
model.add(GRU(units=50, return_sequences=True, dropout=0.1))
model.add(GRU(units=30, dropout=0.1))
model.add(Dense(units=5))
model.add(Dense(units=3))
model.add(Dense(units=1, activation='relu'))
model.compile(loss=['mae'], optimizer=Adam(lr=0.0001), metrics=['mse'])
model.summary()
history = model.fit(X_train, y_train, batch_size=1000, epochs=30, validation_split=0.1, verbose=1)
However, before that I had to transform the training dataset from 2D to 3D using x_train=x.reshape(-1,1,12) and the output from 1D to 2D using y_train=y.reshape(-1,1). That is the part I really don't understand: why not just keep them as they are?
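You would need to describe your data for a definitive answer.
But since each layer's output is the input of the next layer, their shapes must match. A recurrent layer such as GRU expects 3D input of shape (samples, time steps, features), which is why x is reshaped to (-1, 1, 12) to match input_shape=(1, 12). In the incomplete example you gave, your labels need to be a single value for each sample, and I think that is why y was reshaped.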
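The effect of the two reshape calls can be checked with NumPy alone. In this small sketch the arrays are zero-filled placeholders with the sizes given in the question; only the shapes matter.

```python
import numpy as np

# Placeholders with the sizes from the question: 100,000 rows, 12 columns.
x = np.zeros((100_000, 12))
y = np.zeros(100_000)

# (samples, features) -> (samples, time steps, features) with one time step.
x_train = x.reshape(-1, 1, 12)

# (samples,) -> (samples, 1): one target value per sample.
y_train = y.reshape(-1, 1)

print(x.shape, "->", x_train.shape)
print(y.shape, "->", y_train.shape)
```

The values themselves are untouched; reshape only adds an axis of length 1 so that each row is treated as a sequence with a single time step.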

Comprehending which inputs have the highest weight in a neuronal network

I am currently working on a Supervised Machine Learning Solution to categorize some data into two classes.
So far I have worked on a Keras/TensorFlow Python script which seems to manage that just fine:
input_dim = len(data.columns) - 1
print(input_dim)
model = Sequential()
model.add(Dense(8, input_dim=input_dim, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(2, activation='softmax'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(train_x, train_y, validation_split=0.33, epochs=1500, batch_size=1000, verbose=1)
The input data I use is a CSV file with 168 input features. When I first ran this script successfully, I was very surprised to see that I got an accuracy of over 99% after only a couple hundred epochs of training. I haven't even normalized the input data yet.
What I am trying to find out now is which of my 168 input features are responsible for such a high accuracy and which features have little effect during training.
Is there a way to check the weights of each input column to see which of them is used most, i.e. which has the most impact?
Answering your last question:
model.layers[0].get_weights()
However, unless there is an obviously dominating weight, it is unlikely that a single feature explains your accuracy. For feature selection, try replacing some features of your input with their mean and check how much the predictions fluctuate. Little to no fluctuation means that the feature is not important.
Also, please consider posting ML questions on https://datascience.stackexchange.com/
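The mean-replacement check described above could look roughly like this. It is a sketch under stated assumptions: predict is a hand-written stand-in for model.predict (it only looks at features 0 and 2), and X is random data, so the numbers say nothing about the asker's model.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))  # dummy inputs: 200 samples, 4 features

def predict(inputs):
    """Stand-in for model.predict; only features 0 and 2 influence the output."""
    return 1.0 / (1.0 + np.exp(-(3.0 * inputs[:, 0] - 2.0 * inputs[:, 2])))

baseline = predict(X)

importance = []
for col in range(X.shape[1]):
    X_mod = X.copy()
    X_mod[:, col] = X[:, col].mean()  # replace one feature by its mean
    # Average absolute change in the prediction caused by the replacement.
    importance.append(np.mean(np.abs(predict(X_mod) - baseline)))
importance = np.array(importance)

print(importance)  # features 1 and 3 show no change in this toy setup
```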
There is a connection from each 'column' to each neuron in the first layer. Apart from randomizing or dropping the column values (dropping is equivalent to replacing with the mean, as suggested in the answer above), there are two ways to find the relative importance of columns using the weights. Please keep in mind that these methods only make sense if your input dataset is standardized.
You could use the L1 or L2 norm of each column's weights in the first layer.
Say your input has 100 columns. You create a layer that multiplies the input element-wise by a trainable tensor of size (100,). You then feed the output of this layer to your sequential model. The trained (100,) tensor gives the relative importance of your columns.
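The first of the two approaches (norm of each column's first-layer weights) can be sketched as follows. The kernel here is random and merely stands in for model.layers[0].get_weights()[0]; for a Keras Dense layer that array has shape (input_dim, units), so each row holds the weights leaving one input column.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for model.layers[0].get_weights()[0] with 168 inputs and 8 units.
kernel = rng.normal(size=(168, 8))

# Norm over the units axis gives one score per input column.
l1_per_feature = np.abs(kernel).sum(axis=1)
l2_per_feature = np.sqrt((kernel ** 2).sum(axis=1))

# Column indices sorted from largest to smallest L1 norm.
ranking = np.argsort(l1_per_feature)[::-1]
print(ranking[:10])
```

As noted above, such a ranking is only meaningful when the inputs are standardized, because otherwise a small weight on a large-valued column can matter more than a large weight on a small-valued one.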

Keras LSTM neural network for Time Series Predictions shows nan during model fit

I am training a neural network to predict a whole day of availability (144 samples, 6 features) by passing yesterday's availability (144 samples). I'm having trouble finding good resources or explanations on how to define a neural network to predict time series in a regression problem. The training is defined as a supervised learning problem. My definition of the neural network is,
lstm_neurons = 30
model = Sequential()
model.add(LSTM(lstm_neurons * 2, input_shape=(self.train_x.shape[1], self.train_x.shape[2]), return_sequences=True))
model.add(LSTM(lstm_neurons * 2))
model.add(Dense(len_day, activation='softmax'))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=[rmse, 'mae', 'mape'])
I am training for 20 epochs with a batch size of 200 where the used datasets have the following shapes,
Train X (9631, 144, 6)
Train Y (9631, 144)
Test X (137, 144, 6)
Test Y (137, 144)
Validation X (3990, 144, 6)
Validation Y (3990, 144)
All of this produces nan values during training for loss, rmse, mae, and so on. Although this looks like a problem, I can still use the trained model to generate predictions, and they look good-ish.
The first question to ask - are you trying to predict a time series based on interpreting availability as a probability measure?
The softmax activation function would work best in that scenario. However, it may be misspecified if you are in fact attempting to forecast an interval time series, which could be why you are obtaining NaN readings in your results.
This example might be of use to you: here, an LSTM is used to forecast weekly fluctuations in hotel cancellations.
Similarly to your example, X_train and X_val are reshaped as samples, time steps, features:
X_train = np.reshape(X_train, (X_train.shape[0], 1, X_train.shape[1]))
X_val = np.reshape(X_val, (X_val.shape[0], 1, X_val.shape[1]))
The LSTM network is defined as follows:
# Generate LSTM network
model = tf.keras.Sequential()
model.add(LSTM(4, input_shape=(1, previous)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X_train, Y_train, epochs=20, batch_size=1, verbose=2)
As you can see, the mean squared error is used as the loss function since the cancellation variable in question is interval (i.e. can take on a wide range of values and is not necessarily restricted by any particular scale).
I can only speculate as I have not seen your data or results, but you may be going wrong by defining softmax as your activation function when it is not appropriate - I suspect this is the case as you are also using mean squared error as the loss measurement.
In the above example, the Dense layer does not specify an activation function per se.
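To see why softmax is a questionable fit for a 144-value regression target, it helps to look at what it outputs. The sketch below uses random logits as placeholders; it only illustrates that softmax outputs are forced to sum to 1 and makes no claim about the cause of the NaN values.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax along the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Placeholder logits for 2 samples with 144 outputs each (one per time slot).
logits = np.random.default_rng(0).normal(size=(2, 144))
out = softmax(logits)

print(out.sum(axis=1))  # each row sums to 1, up to floating point error
print(out.mean())       # so the average output is 1/144
```

A Dense layer with no activation (or a linear one) has no such constraint, which is why it is the usual choice when each of the outputs is an independent quantity to be regressed.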
In terms of how you might choose to validate whether your time series forecast with LSTM is effective, a potentially good idea is to compare the findings to that of a simpler time series model; e.g. ARIMA.
Using our example, ARIMA performed better when forecasting for Hotel 1, but LSTM performed better when forecasting for Hotel 2:
H1 Results
Reading    ARIMA     LSTM
MDA        0.86      0.8
RMSE       57.95     63.89
MFE        -12.72    -54.25
H2 Results
Reading    ARIMA     LSTM
MDA        0.86      0.8
RMSE       274.07    95.28
MFE        156.32    38.65
Finally, when creating your datasets using the train and validation sets, you must also ensure that you are using the correct previous parameter, i.e. the number of time periods going back with which you choose to regress against the observations at time t.
For instance, you are using yesterday's availability - but you might find that the model is improved using the previous 5 or 10 days, for instance.
# Number of previous time periods to regress against
previous = 5
X_train, Y_train = create_dataset(train, previous)
X_val, Y_val = create_dataset(val, previous)
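The answer does not show create_dataset itself. One common way to write such a helper is sketched below; this is an assumption about its behaviour (sliding windows of length previous, each paired with the value that follows), not the original implementation, and the toy series is made up.

```python
import numpy as np

def create_dataset(series, previous=1):
    """Hypothetical helper: X[i] holds `previous` consecutive values,
    Y[i] is the value that immediately follows them."""
    series = np.asarray(series, dtype=float)
    X, Y = [], []
    for i in range(len(series) - previous):
        X.append(series[i:i + previous])
        Y.append(series[i + previous])
    return np.array(X), np.array(Y)

train = np.arange(10.0)  # toy series 0, 1, ..., 9
X_train, Y_train = create_dataset(train, previous=5)

# Reshape to (samples, time steps, features) as in the answer above.
X_train = X_train.reshape(X_train.shape[0], 1, X_train.shape[1])
print(X_train.shape, Y_train)
```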
In your situation, the first thing I would check is the use of the softmax activation function, and work from there.

Keras batch training online predicting not learning

I have been working on a side project trying to learn machine learning with Keras myself and I think I am stuck here.
My intention is to predict the bike availability of a public sharing system that has 31 stations. For now I am only training my model to predict the availability of one station only. I'd like to do online predictions with batch training. I'd like to start giving it a number of bikes at, for example, 00:00 with N given time steps plus the day of the year and weekday.
The input data is this:
Day of the year, encoded as ints, 1-JAN is 0, 2-JAN is 1...
Time in 5' intervals encoded as ints the same way as before, 00:00 is 0, 00:05 is 1...
Weekday, again encoded as int
Those three columns are then normalized. Then I add the columns that refer to the bikes, which are one-hot encoded: if the station has 20 bikes, the encoded array has length 21. The data is then transformed into a supervised learning problem, more or less following this tutorial.
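The one-hot encoding of the bike counts can be sketched as follows; the counts are made-up values and max_bikes = 20 is taken from the example in the text.

```python
import numpy as np

max_bikes = 20                           # station capacity from the example
bike_counts = np.array([0, 3, 20, 7])    # made-up observations

# Row k of the identity matrix is the one-hot vector for a count of k,
# so counts 0..20 need vectors of length 21.
one_hot = np.eye(max_bikes + 1, dtype=int)[bike_counts]

# argmax recovers the original count from each one-hot row.
decoded = one_hot.argmax(axis=1)
print(one_hot.shape, decoded)
```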
Now I divide my dataset into training (65%) and test (35%) samples. And then define the neural network as this:
model = Sequential()
model.add(LSTM(lstm_neurons, batch_input_shape=(1000, 5, 24), stateful=False))
model.add(Dense(max_cases, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics = ['accuracy', 'mse', 'mae'])
Fit the model
for i in range(epochs):
    model.fit(train_x, train_y, epochs=1, batch_size=new_batch_size, verbose=2, shuffle=False)
    model.reset_states()
w = model.get_weights()
The accuracy plot looks good, but the loss plot behaves strangely.
Once the training finishes, to predict values I switch from stateless to stateful and change the batch size:
model = Sequential()
model.add(LSTM(lstm_neurons, batch_input_shape=(1, 5, 24), stateful=True))
model.add(Dense(max_cases, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics = ['accuracy', 'mse', 'mae'])
model.set_weights(w)
I now predict using the test values I got from before
for i in range(0, len(test_x)):
    auxx = test_x[i].reshape((batch_size, test_x[i].shape[0], test_x[i].shape[1])) # (...,n_in,4)
    yhat = model.predict(auxx, batch_size = batch_size)
This is the result; I have zoomed in a bit to get a closer look rather than a crowded plot. It doesn't look bad at all: it has some errors, but overall the predictions look good enough.
After this I create my set of data to do the online prediction and predict
for i in range(0, 290):
    # ...
    predicted_bikes = model.predict(data_to_feed, batch_size = 1)
    # ...
The result is this one, a continuous line.
As seen in the previous plot, the predicted value lags the real value by about one interval, which makes me think that the neural network has learnt to repeat the previous value. That would explain why I get a straight line here.
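One way to test the suspicion that the network only repeats the previous value is to compare its error with a persistence baseline, i.e. a forecast that always equals the last observed value. The numbers below are invented for illustration; predicted is deliberately constructed as a one-step-delayed copy of actual.

```python
import numpy as np

# Invented bike counts and an invented model output that echoes the last value.
actual = np.array([5, 6, 6, 8, 9, 7, 7, 4, 3, 3], dtype=float)
predicted = np.array([5, 5, 6, 6, 8, 9, 7, 7, 4, 3], dtype=float)

def mae(a, b):
    """Mean absolute error between two equally long arrays."""
    return float(np.mean(np.abs(a - b)))

# Persistence baseline: the forecast for step t is the observation at t - 1.
target = actual[1:]
persistence = actual[:-1]

mae_model = mae(predicted[1:], target)
mae_persistence = mae(persistence, target)
print(mae_model, mae_persistence)
```

If the model's error is about the same as the persistence error, as it is by construction here, the model adds little beyond copying the last observation, and feeding its own outputs back in during online prediction would be expected to produce a flat line.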

Keras: masking zero-padded input for non-RNN

A colleague of mine pointed out the very cool option to use sample_weight instead of a masking layer when you need to mask input to a non-RNN in Keras.
In my case, I have 62 columns in the input, with the 63rd being the response. Over 97% of the nonzero entries in the 62 columns are contained in the first 30 columns. I'm trying to just get this working, so I'd like to weight the last 32 columns to be 0 in training, essentially creating a 'poor-man's mask'.
This is an 8-class classification task, using an MLP. The response variable has been transformed using the to_categorical() function in Keras.
Here's the implementation:
model = Sequential()
model.add(Dense(100, input_dim=X.shape[1], init='uniform', activation='relu'))
model.add(Dense(8, init='uniform', activation='sigmoid'))
hist = model.fit(X, y,
                 validation_data=(X_test, ytest),
                 nb_epoch=epochs_,
                 batch_size=batch_size_,
                 callbacks=callbacks_list,
                 sample_weight=np.array([X.shape[1]-32, 30]))
I'm getting this error:
in standardize_weights
assert y.shape[:sample_weight.ndim] == sample_weight.shape
How can I fix my sample_weight to 'mask' the last 32 columns of the input?
sample_weight doesn't work like that:
sample_weight: optional array of the same length as x, containing weights to apply to the model's loss for each sample. In the case of temporal data, you can pass a 2D array with shape (samples, sequence_length), to apply a different weight to every timestep of every sample. In this case you should make sure to specify sample_weight_mode="temporal" in compile(). source
In other words, this setting puts different weights on the samples of the training data, not on the features of each sample. It is used only during training.
I think you should use masking if you don't want the layer to use those features. Or just remove them from your dataset? Or, if it's not too complicated, let the network learn by itself which features are useful.
Does this help?
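The difference in shapes can be made concrete with NumPy. In this sketch X is random placeholder data with the 62 columns from the question; it shows what a valid sample_weight looks like and two simple ways to ignore the last 32 columns without it.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((500, 62))  # placeholder data: 500 samples, 62 feature columns

# sample_weight has one entry per sample (row), not per feature (column).
sample_weight = np.ones(X.shape[0])

# Alternative 1: drop the last 32 columns before training.
X_first30 = X[:, :30]

# Alternative 2: keep the shape but zero out the last 32 columns.
feature_mask = np.r_[np.ones(30), np.zeros(32)]
X_zeroed = X * feature_mask

print(sample_weight.shape, X_first30.shape, X_zeroed.shape)
```

The array passed in the question has shape (2,), which matches neither the number of samples nor anything else Keras expects there, hence the assertion error.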
