I nearly found what I need in the accepted answer here. But had memory issues because the test df provided was only 11 rows.
What I'm trying to predict is using LSTM to forecast 10 days ahead of a Time Series data in a regression model (not classifier!). My dataframe X has around 1500 rows and 2000 features, being of shape (1500, 2000) while the truth values y are just 1500 rows of 1 feature (that can range any value between -1 and 1).
Since LSTM needs 3D vector as an input, I'm really struggling how to reshape the data.
Again, following the example at first paragraph, it crashes for MemoryError when padding values, more specifically at df.cumulative_input_vectors.tolist().
My test (read forecast) is a dataframe of shape (10, 2000).
Due to sensitive data I can't actually share the values/example. How can I help you help me with this?
So, to enable the LSTM to learn from the 1500 rows of y, how should I reshape my x of 1500 rows and 2000 features? Also, how should I reshape my forecast of 10 rows and 2000 features?
They'll undergo -at first because I'm learning LSTM- a simple LSTM model of:
model = Sequential()
model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(train_X, train_y , epochs=50, batch_size=2, verbose=1)
what I've tried, but when predictin got error:
# A function to make a 3d data of what I understood needed done:
def preprocess_data(stock, seq_len):
amount_of_features = len(stock.columns)
data = stock.values
sequence_length = seq_len #+ 1
result = []
for index in range(len(data) - sequence_length):
result.append(data[index : index + sequence_length])
X_train = np.array(result)
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], amount_of_features))
return X_train
# creating the train as:
# X == the DF of 1500 rows and 2000 features
window = 10
train_X = preprocess_data(X[::-1], window)
After a while I've managed to understand properly what the dimensions where. Keras expect a 3d array of .shape (totalRows, sequences, totalColumns). The sequences one was the most confusing to me.
That was because when reshaping the df df.reshape(len(df), 1, len(df.columns)) meaning keras would learn for a matrix of 1 line it gave me bad results because I didn't know it's best to scale the data for me MinMaxScaler(-1,1) worked best, but could be (0,1).
What made me understand that was to first use sequence of more than 1 row (or days, since my dataset was a Time Series). Meaning that instead of feeding 1 row of features X results in 1 value of y, I used something like 5 rows of features X results in 1 value of y. as in:
# after scaling the df, resulted in "scaled_dataset"
sequences = 5
result = []
# for loop will walk for each of the 1500 rows
for i in range(0,len(scaled_dataset)):
# every group must have the same length, so if current loop position i + number
# of sequences is higher than df length, breaks
if i+sequences <= len(scaled_dataset):
# this will add into the list as [[R1a,R1b...R1t],[R2a,R2b...R2t],...[R5a,R5b...R5t]]
result.append(scaled_dataset[i:i+sequences].values)
# Converting to array + keras takes float32 better than 64
train_x = np.array(result).astype('float32')
# making the y into same length as X
train_y = np.array(y.tail(train_x.shape[0]).values)
train_x.shape, train_y.shape
'>>> (1495, 5, 2400), (1495,)
Writen in another way the mentality on keras shapes for my problem:
Considering it's a time series, above means that 5 days (rows 0 to 4) of data results in the value y of row 5.
Then, less the first day + the next day after last - still 5 days - (rows 1 to 5) of data results in the value y of row 6.
Then, less the second day + the next day after last - still 5 days - (rows 2 to 6) of data results in the value y of row 7.
It's quite confusing for starters of keras/LSTM, but I hope I could details this for people who might land here.
Related
I would like to use a RNN for time series prediction to use 96 backwards steps to predict 96 steps into the future. For this I have the following code:
#Import modules
import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.preprocessing import StandardScaler
from tensorflow import keras
# Define the parameters of the RNN and the training
epochs = 1
batch_size = 50
steps_backwards = 96
steps_forward = 96
split_fraction_trainingData = 0.70
split_fraction_validatinData = 0.90
randomSeedNumber = 50
helpValueStrides = int(steps_backwards /steps_forward)
#Read dataset
df = pd.read_csv('C:/Users1/Desktop/TestValues.csv', sep=';', header=0, low_memory=False, infer_datetime_format=True, parse_dates={'datetime':[0]}, index_col=['datetime'])
# standardize data
data = df.values
indexWithYLabelsInData = 0
data_X = data[:, 0:3]
data_Y = data[:, indexWithYLabelsInData].reshape(-1, 1)
scaler_standardized_X = StandardScaler()
data_X = scaler_standardized_X.fit_transform(data_X)
data_X = pd.DataFrame(data_X)
scaler_standardized_Y = StandardScaler()
data_Y = scaler_standardized_Y.fit_transform(data_Y)
data_Y = pd.DataFrame(data_Y)
# Prepare the input data for the RNN
series_reshaped_X = np.array([data_X[i:i + (steps_backwards+steps_forward)].copy() for i in range(len(data) - (steps_backwards+steps_forward))])
series_reshaped_Y = np.array([data_Y[i:i + (steps_backwards+steps_forward)].copy() for i in range(len(data) - (steps_backwards+steps_forward))])
timeslot_x_train_end = int(len(series_reshaped_X)* split_fraction_trainingData)
timeslot_x_valid_end = int(len(series_reshaped_X)* split_fraction_validatinData)
X_train = series_reshaped_X[:timeslot_x_train_end, :steps_backwards]
X_valid = series_reshaped_X[timeslot_x_train_end:timeslot_x_valid_end, :steps_backwards]
X_test = series_reshaped_X[timeslot_x_valid_end:, :steps_backwards]
Y_train = series_reshaped_Y[:timeslot_x_train_end, steps_backwards:]
Y_valid = series_reshaped_Y[timeslot_x_train_end:timeslot_x_valid_end, steps_backwards:]
Y_test = series_reshaped_Y[timeslot_x_valid_end:, steps_backwards:]
# Build the model and train it
np.random.seed(randomSeedNumber)
tf.random.set_seed(randomSeedNumber)
model = keras.models.Sequential([
keras.layers.SimpleRNN(10, return_sequences=True, input_shape=[None, 3]),
keras.layers.SimpleRNN(10, return_sequences=True),
keras.layers.Conv1D(16, helpValueStrides, strides=helpValueStrides),
keras.layers.TimeDistributed(keras.layers.Dense(1))
])
model.compile(loss="mean_squared_error", optimizer="adam", metrics=['mean_absolute_percentage_error'])
history = model.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, validation_data=(X_valid, Y_valid))
#Predict the test data
Y_pred = model.predict(X_test)
prediction_lastValues_list=[]
for i in range (0, len(Y_pred)):
prediction_lastValues_list.append((Y_pred[i][0][1 - 1]))
# Create thw dataframe for the whole data
wholeDataFrameWithPrediciton = pd.DataFrame((X_test[:,1]))
wholeDataFrameWithPrediciton.rename(columns = {indexWithYLabelsInData:'actual'}, inplace = True)
wholeDataFrameWithPrediciton.rename(columns = {1:'Feature 1'}, inplace = True)
wholeDataFrameWithPrediciton.rename(columns = {2:'Feature 2'}, inplace = True)
wholeDataFrameWithPrediciton['predictions'] = prediction_lastValues_list
wholeDataFrameWithPrediciton['difference'] = (wholeDataFrameWithPrediciton['predictions'] - wholeDataFrameWithPrediciton['actual']).abs()
wholeDataFrameWithPrediciton['difference_percentage'] = ((wholeDataFrameWithPrediciton['difference'])/(wholeDataFrameWithPrediciton['actual']))*100
# Inverse the scaling (traInv: transformation inversed)
data_X_traInv = scaler_standardized_X.inverse_transform(data_X)
data_Y_traInv = scaler_standardized_Y.inverse_transform(data_Y)
series_reshaped_X_notTransformed = np.array([data_X_traInv[i:i + (steps_backwards+steps_forward)].copy() for i in range(len(data) - (steps_backwards+steps_forward))])
X_test_notTranformed = series_reshaped_X_notTransformed[timeslot_x_valid_end:, :steps_backwards]
predictions_traInv = scaler_standardized_Y.inverse_transform(wholeDataFrameWithPrediciton['predictions'].values.reshape(-1, 1))
edictions_traInv = wholeDataFrameWithPrediciton['predictions'].values.reshape(-1, 1)
# Create thw dataframe for the inversed transformed data
wholeDataFrameWithPrediciton_traInv = pd.DataFrame((X_test_notTranformed[:,0]))
wholeDataFrameWithPrediciton_traInv.rename(columns = {indexWithYLabelsInData:'actual'}, inplace = True)
wholeDataFrameWithPrediciton_traInv.rename(columns = {1:'Feature 1'}, inplace = True)
wholeDataFrameWithPrediciton_traInv['predictions'] = predictions_traInv
wholeDataFrameWithPrediciton_traInv['difference_absolute'] = (wholeDataFrameWithPrediciton_traInv['predictions'] - wholeDataFrameWithPrediciton_traInv['actual']).abs()
wholeDataFrameWithPrediciton_traInv['difference_percentage'] = ((wholeDataFrameWithPrediciton_traInv['difference_absolute'])/(wholeDataFrameWithPrediciton_traInv['actual']))*100
wholeDataFrameWithPrediciton_traInv['difference'] = (wholeDataFrameWithPrediciton_traInv['predictions'] - wholeDataFrameWithPrediciton_traInv['actual'])
Here you can have some test data (don't care about the actual values as I made them up, just the shape is important) Download test data
How can the output of the Y_pred data be interpreted? Which of those values yields me the predicted values 96 steps into the future? I have attached a screenshot of the 'Y_pred' data. One time with 5 output neurons in the last layer and one time only with 1. Can anyone tell me, how to interpret the 'Y_pred' data meaning what exactly is the RNN predicting? I can use any values in the output (last layer ) of the RNN model. The 'Y_pred' data always has the shape (Batch size of X_test, timesequence, Number of output neurons). My question is targeting at the last dimension. I thought that these might be the features, but this is not true in my case, as I only have 1 output features (you can see that in the shape of the Y_train, Y_test and Y_valid data).
**Reminder **: The bounty is expiring soon and unfortunately I still have not received any answer. So I would like to remind you on the question and the bounty. I'll highly appreciate every comment.
It may be useful to step through the model inputs/outputs in detail.
When using the keras.layers.SimpleRNN layer with return_sequences=True, the output will return a 3-D tensor where the 0th axis is the batch size, the 1st axis is the timestep, and the 2nd axis is the number of hidden units (in the case for both SimpleRNN layers in your model, 10).
The Conv1D layer will produce an output tensor where the last dimension becomes the number of hidden units (in the case for your model, 16), as it's just being convolved with the input.
keras.layers.TimeDistributed, the layer supplied (in the example provided, Dense(1)) will be applied to each timestep in the batch independently. So with 96 timesteps, we have 96 outputs for each record in the batch.
So stepping through your model:
model = keras.models.Sequential([
keras.layers.SimpleRNN(10, return_sequences=True, input_shape=[None, 3]), # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 10)
keras.layers.SimpleRNN(10, return_sequences=True), # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 10)
keras.layers.Conv1D(16, helpValueStrides, strides=helpValueStrides) # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 16),
keras.layers.TimeDistributed(keras.layers.Dense(1)) # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 1)
])
To answer your question, the output tensor from your model contains the predicted values for 96 steps into the future, for each sample. If it's easier to conceptualize, for the case of 1 output, you can apply np.squeeze to the result of model.predict, which will make the output 2-D:
Y_pred = model.predict(X_test) # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 1)
Y_pred_squeezed = np.squeeze(Y_pred) # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS)
In that way, you have a rectangular matrix where each row corresponds to a sample in the batch, and each column i corresponds to the prediction for the timestep i.
In the loop after the prediction step, all the timestep predictions are being discarded except for the first one:
for i in range(0, len(Y_pred)):
prediction_lastValues_list.append((Y_pred[i][0][1 - 1]))
which means the end result is just a list of predictions for the first timestep for each sample in the batch. If you wanted the prediction for the 96th timestep, you could do:
for i in range(0, len(Y_pred)):
prediction_lastValues_list.append((Y_pred[i][-1][1 - 1]))
Notice the -1 instead of 0 for the second bracket, to ensure we grab the last predicted timestep instead of the first.
As a side note, to replicate the results, I had to make one change to your code, specifically when creating series_reshaped_X and series_reshaped_Y. I hit an exception when using np.array to create the array from the list: ValueError: cannot copy sequence with size 192 to array axis with dimension 3 , but looking at what you were doing (joining tensors along a new axis), I changed it to np.stack, which will accomplish the same goal (https://numpy.org/doc/stable/reference/generated/numpy.stack.html):
series_reshaped_X = np.stack([data_X[i:i + (steps_backwards + steps_forward)].copy() for i in
range(len(data) - (steps_backwards + steps_forward))])
series_reshaped_Y = np.stack([data_Y[i:i + (steps_backwards + steps_forward)].copy() for i in
range(len(data) - (steps_backwards + steps_forward))])
Update
"What are those 5 values representing when I only have 1 target feature?"
That's actually just the broadcasting feature of the Tensorflow API (which is also a feature of NumPy). If you perform an arithmetic operation on two tensors with differing shapes, it will try to make them compatible. In this case, if you change the output layer size to be "5" instead of "1" (keras.layers.Dense(5)), the output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 5) instead of (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 1), which just means the output from the convolutional layer is going into 5 neurons instead of 1. When the loss (mean squared error) is computed between the two, the size of the label tensor ((BATCH_SIZE, NUMBER_OF_TIMESTEPS, 1)) is broadcast to the size of the prediction tensor ((BATCH_SIZE, NUMBER_OF_TIMESTEPS, 5)). In this case, the broadcasting is accomplished by replicating the column. For example, if Y_train had [-1.69862224] in the first row for the first timestep, and Y_pred had [-0.6132075 , -0.6621697 , -0.7712653 , -0.60011995, -0.48753992] in the first row for the first timestep, to perform the subtraction operation, the entry in Y_train is converted to [-1.69862224, -1.69862224, -1.69862224, -1.69862224, -1.69862224].
And which of those 5 values is the "correct" value to choose for the 96 time step ahead prediciton?
There is no real "correct" value when trained this way - as detailed above, this just a feature of the API. All output should converge to the single target value for the timestep, they're all being compared to that value, so you could technically train that way, but it's just adding parameters and complexity to the model (and you would just have to choose one to be the "real" prediction). The correct approach for getting the prediction for 96 timesteps ahead is detailed in the original answer, but just to reiterate, the output of the model contains future timestep predictions for each sample in the batch. The output tensor could be iterated over to retrieve the predictions for each timestep, for each sample. Furthermore, ensure the number of neurons in the final dense layer matches the number of target values you are trying to predict, otherwise you'll hit the broadcasting issue (and the "correct" output will be unclear).
Just to be exhaustive (and I am not recommending this), if you really wanted to incorporate several neurons in the output despite only having one target value, you could do something like averaging the results:
for i in range(0, len(Y_pred)):
prediction_lastValues_list.append(np.mean(Y_pred[i][0]))
But there is absolutely no benefit to this approach, so I would recommend just sticking with the previous suggestion.
Update 2
Is my model only predicting one time slot which is 96 time steps into the future or is it also predicting everything in between?
The model is predicting everything in between. So for a sample at timestep t, the output of the model are predictions [t + 1, t + 2, ..., t + NUMBER_OF_TIMESTEPS]. Per my original answer, "the output tensor from your model contains the predicted values for 96 steps into the future, for each sample". To specify that in your evaluation code, you can do something like:
Y_pred = np.squeeze(Y_pred)
predictions_for_all_samples_and_timesteps = Y_pred.tolist()
This results in a list of length BATCH_SIZE, and each element in the list is a list of length NUMBER_OF_TIMESTEPS (to be clear, predictions_for_all_samples_and_timesteps is a list of lists). The element at index i in predictions_for_all_samples_and_timesteps contains the predictions for each timestep from 1-96 for the i^th sample (row) in X_test.
As a side note, you could omit np.squeeze, but then you will have a list of lists of lists, where each element in the inner list is a list of one item (instead of [[1, 2, 3, ...], ], the output would look like [[[1], [2], [3], ...], ].
Update 3
Y_test and Y_pred are both 3-D numpy arrays of size (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 1). To compare them, you can take the absolute (or squared) difference between the two:
abs_diff = np.abs(Y_pred - Y_test)
This results in an array of the same dimensions, (BATCH_SIZE, NUMBER_OF_TIMESTEPS). You can then iterate over the rows and generate a plot of the timestep error for each row.
for diff in abs_diff:
print(diff.shape)
plt.plot(list(range(diff)), diff)
It may get a bit unwieldy with a large batch size (as you can see in the image), so maybe you plot a subset of the rows. You can also transform the absolute difference to an error percentage if you would prefer to plot that:
percentage_diff = abs_diff / Y_test
which would be the absolute difference over the actual value, as I see you were originally doing in Pandas. This numpy array will have the same dimensions, so you can iterate over it and generate plots in the same fashion.
For future inquiries, instead of posting the comments, please open a new question and just provide the link - I would be happy to continue helping, but I would like to continue gaining reputation from it.
I disagree with #danielcahall on just one point:
The output tensor from your model contains the predicted values for 96 steps into the future, for each sample
The output does contain 96 time steps, one for each input time step, and you can take an output to mean whatever you want. But this is just not a good model for what you're trying to do. The main reason is that the RNNs you're using are single direction.
x x x x x x # input
| | | | | |
x-->x-->x-->x-->x-->x # SimpleRNN
| | | | | |
x-->x-->x-->x-->x-->x # SimpleRNN
| /|\ /|\ /|\ /|\ |
| / | \ | \ | \ | \ |
x x x x x x # Conv
| | | | | |
x x x x x x # Dense -> output
So the first time index of the output only sees the first 2 input times (thanks to the Conv), it can't see the later times. The first prediction is based only on old data. It's only the last few outputs that can see all the inputs.
use 96 backwards steps to predict 96 steps into the future
Most of the outputs just can't see all the data.
This model would be appropriate if you were trying to predict 1 step into the future from each of the input times.
To predict 96 steps into the future it would be much more reasonable to drop the return_sequences=True and the Conv layer. Then expand the Dense layer to make the prediction:
model = keras.models.Sequential([
keras.layers.SimpleRNN(10, return_sequences=True, input_shape=[None, 3]), # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 10)
keras.layers.SimpleRNN(10), # output size is (BATCH_SIZE, 10)
keras.layers.Dense(96) # output size is (BATCH_SIZE, 96)
])
That way all 96 predictions see all 96 inputs.
See https://www.tensorflow.org/tutorials/structured_data/time_series for more details.
Also SimpleRNN is terrible. Never use it over more than a couple of steps.
I have some data that is of shape 10000 x 1440 x 8 where 10000 is the number of days, 1440 the number of minutes and 8 is the number of features.
For each day, ie. each submatrix of size 1440 x 8 I wish to train an autoencoder and extract the weights from the second layer, such that my output will be a matrix output = 10000 x 8
I can do this in a loop with
import numpy as np
from keras.layers import Input, Dense
from keras import regularizers, models, optimizers
data = np.random.random(size=(10000,1440,8))
def AE(y, epochs=100,learning_rate = 1e-4, regularization = 5e-4, epochs=3):
input = Input(shape=(y.shape[1],))
encoded = Dense(1, activation='relu',
kernel_regularizer=regularizers.l2(regularization))(input)
decoded = Dense(y.shape[1], activation='relu',
kernel_regularizer=regularizers.l2(regularization))(encoded)
autoencoder = models.Model(input, decoded)
autoencoder.compile(optimizer=optimizers.Adam(lr=learning_rate), loss='mean_squared_error')
autoencoder.fit(y, y, epochs=epochs, batch_size=10, shuffle=False)
(w1,b1,w2,b2)=autoencoder.get_weights()
return (w1,b1,w2,b2)
lst = []
for i in range(data.shape[0]):
y = data[i]
(_, _, w2, _) = AE(y)
lst.append(w2[0])
output = np.array(lst)
However, this feels very stupid as surely I must be able to just pass the 3D data to the autoencoder and retrieve what I want. However, if I try modify the shape of input to be input = Input(shape=(y.shape[1],y.shape[2]))
I get an error
ValueError: Dimensions must be equal, but are 1440 and 8 for '{{node
mean_squared_error/SquaredDifference}} =
SquaredDifference[T=DT_FLOAT](model_778/dense_1558/Relu,
IteratorGetNext:1)' with input shapes: [?,1440,1440], [?,1440,8].
Any pointers on how to get the shape right?
Simply reshape your your data like so and call the function.
data = data.reshape(data.shape[0]*data.shape[1], -1)
(w1, b1, w2, b2) = AE(data)
print(w2.shape)
Your first layer of the NN is a Dense layer. You can only pass two dimensional data into it. One dimension will be batch size and the other dimension will be the feature vector. When you are using the data in the way you are using it, you are considering each data point independently. Which means that you can join the first two axes together and just pass it on to the NN. However, note that you would still need to modify the code so that you are not passing the entire dataset at once to the NN. You need to split the data into batches and loop over those before passing it on. And honestly, it's the same as what you are doing now. So your looping is not as bad as you think it is for what you are trying to do.
However, also note that you have a time series data and considering each datapoint as an independent point doesn't really make sense. You need an LSTM layer or something to learn the time series encoding.
I am trying to build an LSTM and am confused about the best way to shape my data.
I have a dataframe that looks like this:
df.head(5)
data labels
0 [0.0009808844009380855, 0.0008974465127279559] 1
1 [0.0007158940267629654, 0.0008202958833774329] 3
2 [0.00040971929722210984, 0.000393972522972382] 3
3 [7.916243163372941e-05, 7.401835468434177e243] 3
4 [8.447556379936086e-05, 8.600626393842705e-05] 3
The 'data' column is my X and the labels is y. The df has 34890 rows. Each row contains 2 floats. The data represents a bunch of sequential text and each observation is a representation of a sentence. There are 5 classes.
I am trying to fit an LSTM with this data and am confused about how to use the timestep parameter.
With this code, I get the following:
data = np.array(df.class_proba.to_list())
labels = pd.get_dummies(df['speaker_spaff']).values
print('Shape of data tensor:', data.shape)
print('Shape of label tensor:', labels.shape)
Shape of data tensor: (34890, 2)
Shape of label tensor: (34890, 5)
I think my label tensor is correct, but I am confused about my data tensor.
Keras LSTM layers require the shape: samples, time steps, and features.
If I understand correctly, my number of samples is 34890, my features is 2, but what about timestamps? What should the timestamp parameter be and how can I reshape my data to fit this?
if MULTIPLE timesteps are required you have to create a sliding window function which helps you to reshape your data, for this purpose TimeSeriesGenerator from Keras is a good instrument (here a good example)
if you consider your data must have a SINGLE timestep you just have to simply expand dimension:
data[:,None,:] ==> new shape: (34890, 1, 2), the labels are ok
Sources
There are several sources out there explaining stateful / stateless LSTMs and the role of batch_size which I've read already. I'll refer to them later in my post:
[1] https://machinelearningmastery.com/understanding-stateful-lstm-recurrent-neural-networks-python-keras/
[2] https://machinelearningmastery.com/stateful-stateless-lstm-time-series-forecasting-python/
[3] http://philipperemy.github.io/keras-stateful-lstm/
[4] https://machinelearningmastery.com/use-different-batch-sizes-training-predicting-python-keras/
And also other SO threads like Understanding Keras LSTMs and Keras - stateful vs stateless LSTMs which didn't fully explain what I'm looking for however.
My Problem
I am still not sure what is the correct approach for my task regarding statefulness and determining batch_size.
I have about 1000 independent time series (samples) that have a length of about 600 days (timesteps) each (actually variable length, but I thought about trimming the data to a constant timeframe) with 8 features (or input_dim) for each timestep (some of the features are identical to every sample, some individual per sample).
Input shape = (1000, 600, 8)
One of the features is the one I want to predict, while the others are (supposed to be) supportive for the prediction of this one “master feature”. I will do that for each of the 1000 time series. What would be the best strategy to model this problem?
Output shape = (1000, 600, 1)
What is a Batch?
From [4]:
Keras uses fast symbolic mathematical libraries as a backend, such as TensorFlow and Theano.
A downside of using these libraries is that the shape and size of your data must be defined once up front and held constant regardless of whether you are training your network or making predictions.
[…]
This does become a problem when you wish to make fewer predictions than the batch size. For example, you may get the best results with a large batch size, but are required to make predictions for one observation at a time on something like a time series or sequence problem.
This sounds to me like a “batch” would be splitting the data along the timesteps-dimension.
However, [3] states that:
Said differently, whenever you train or test your LSTM, you first have to build your input matrix X of shape nb_samples, timesteps, input_dim where your batch size divides nb_samples. For instance, if nb_samples=1024 and batch_size=64, it means that your model will receive blocks of 64 samples, compute each output (whatever the number of timesteps is for every sample), average the gradients and propagate it to update the parameters vector.
When looking deeper into the examples of [1] and [4], Jason is always splitting his time series to several samples that only contain 1 timestep (the predecessor that in his example fully determines the next element in the sequence). So I think the batches are really split along the samples-axis. (However his approach of time series splitting doesn’t make sense to me for a long-term dependency problem.)
Conclusion
So let’s say I pick batch_size=10, that means during one epoch the weights are updated 1000 / 10 = 100 times with 10 randomly picked, complete time series containing 600 x 8 values, and when I later want to make predictions with the model, I’ll always have to feed it batches of 10 complete time series (or use solution 3 from [4], copying the weights to a new model with different batch_size).
Principles of batch_size understood – however still not knowing what would be a good value for batch_size. and how to determine it
Statefulness
The KERAS documentation tells us
You can set RNN layers to be 'stateful', which means that the states computed for the samples in one batch will be reused as initial states for the samples in the next batch.
If I’m splitting my time series into several samples (like in the examples of [1] and [4]) so that the dependencies I’d like to model span across several batches, or the batch-spanning samples are otherwise correlated with each other, I may need a stateful net, otherwise not. Is that a correct and complete conclusion?
So for my problem I suppose I won’t need a stateful net. I’d build my training data as a 3D array of the shape (samples, timesteps, features) and then call model.fit with a batch_size yet to determine. Sample code could look like:
model = Sequential()
model.add(LSTM(32, input_shape=(600, 8))) # (timesteps, features)
model.add(LSTM(32))
model.add(LSTM(32))
model.add(LSTM(32))
model.add(Dense(1, activation='linear'))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(X, y, epochs=500, batch_size=batch_size, verbose=2)
Let me explain it via an example:
So let's say you have the following series: 1,2,3,4,5,6,...,100. You have to decide how many timesteps your lstm will learn, and reshape your data as so. Like below:
if you decide time_steps = 5, you have to reshape your time series as a matrix of samples in this way:
1,2,3,4,5 -> sample1
2,3,4,5,6 -> sample2
3,4,5,6,7 -> sample3
etc...
By doing so, you will end with a matrix of shape (96 samples x 5 timesteps)
This matrix should be reshape as (96 x 5 x 1) indicating Keras that you have just 1 time series. If you have more time series in parallel (as in your case), you do the same operation on each time series, so you will end with n matrices (one for each time series) each of shape (96 sample x 5 timesteps).
For the sake of argument, let's say you 3 time series. You should concat all of three matrices into one single tensor of shape (96 samples x 5 timeSteps x 3 timeSeries). The first layer of your lstm for this example would be:
model = Sequential()
model.add(LSTM(32, input_shape=(5, 3)))
The 32 as first parameter is totally up to you. It means that at each point in time, your 3 time series will become 32 different variables as output space. It is easier to think each time step as a fully conected layer with 3 inputs and 32 outputs but with a different computation than FC layers.
If you are about stacking multiple lstm layers, use return_sequences=True parameter, so the layer will output the whole predicted sequence rather than just the last value.
your target shoud be the next value in the series you want to predict.
Putting all together, let say you have the following time series:
Time series 1 (master): 1,2,3,4,5,6,..., 100
Time series 2 (support): 2,4,6,8,10,12,..., 200
Time series 3 (support): 3,6,9,12,15,18,..., 300
Create the input and target tensor
x -> y
1,2,3,4,5 -> 6
2,3,4,5,6 -> 7
3,4,5,6,7 -> 8
reformat the rest of time series, but forget about the target since you don't want to predict those series
Create your model
model = Sequential()
model.add(LSTM(32, input_shape=(5, 3), return_sequences=True)) # Input is shape (5 timesteps x 3 timeseries), output is shape (5 timesteps x 32 variables) because return_sequences = True
model.add(LSTM(8)) # output is shape (1 timesteps x 8 variables) because return_sequences = False
model.add(Dense(1, activation='linear')) # output is (1 timestep x 1 output unit on dense layer). It is compare to target variable.
Compile it and train. A good batch size is 32. Batch size is the size your sample matrices are splited for faster computation. Just don't use statefull
I'm trying to learn LSTM. Have taken this web courses, read this book (https://machinelearningmastery.com/lstms-with-python/) and a lot of blogs... But, I'm completely stuck. My interest is in multivariate LSTM's and I have read all I can find but still can't get it. Don't know if I'm stupid or what it is...
If this exact question and a good answer already exists then I am sorry for double posting but I have looked and haven't found it...
As I want to really know the basics I created a dummy dataset in excel where every "y" depends on the sum of each input x1 and x2 but also over time. As I understand it this is a many-to-one scenario.
Pseudo code:
x1(t) = sin(A(t))
x2(t) = cos(A(t))
tmp(t) = x1(t) + x2(t) (dummy variable)
y(t) = tmp(t) + tmp(t-1) + tmp(t-2) (i.e. sum over the last three steps)
(Basically I want to predict y(t) given x1 and x2 over three time steps)
This is then exported to a csv file with columns x1, x2, y
I have tried to code it up below but obviously it won't work.
I read the data and split it into a 80/20 test and train set as X_train, y_train, X_test, y_test with dimensions (217,2), (217,1), (54,2), (54/1)
What I really haven't got a grip on yet is what exactly are timesteps and samples and the use in reshape and input_shape. In many examples of code I have looked at they simply use numbers rather than defined variables which makes it very difficult to understand what is happening, especially if you want to change something. As an example, in one of the courses I took the reshaping was coded like this...
X_train = np.reshape(X_train, (1257, 1, 1))
This doesn't provide much info...
Anyway, when i run the code below it says
ValueError: cannot reshape array of size 434 into shape (217,3,2)
So, I know what the causes the error, but not what I need to do to fix it. If I set look_back=1 it works but that's not what I want.
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
# Load data
data_set = pd.read_csv('../Data/LSTM_test.csv',';')
"""
data loaded have three columns:
col 0, col 1: features (x)
col 2: y
"""
# Train/test and variable split
split = 0.8 # 80% train, 20% test
split_idx = int(data_set.shape[0]*split)
# ...train
X_train = data_set.values[0:split_idx,0:2]
y_train = data_set.values[0:split_idx,2]
# ...test
X_test = data_set.values[split_idx:-1,0:2]
y_test = data_set.values[split_idx:-1,2]
# Model setup
look_back = 3 # as that is how y was generated (i.e. sum last three steps)
num_features = 2 # in this case: 2 features x1, x2
output_dim = 1 # want to predict 1 y value
nb_hidden_neurons = 32 # assume something to start with
nb_epoch = 2 # assume something to start with
# Reshaping
nb_samples = len(X_train) # in this case 217 samples in the training set
X_train_reshaped = np.reshape(X_train,(nb_samples, look_back, num_features))
# Create model
model = Sequential()
model.add(LSTM(nb_hidden_neurons, input_shape=(look_back,num_features)))
model.add(Dense(units=output_dim))
model.compile(optimizer = 'adam', loss = 'mean_squared_error')
model.fit(X_train_reshaped, y_train, batch_size = 32, epochs = nb_epoch)
print(model.summary())
Can anyone please explain what I have done wrong?
As I said, I have read a lot of blogs, questions, tutorials etc but if someone has a particularly good source of info I'd love to check that one up too.
I also had this question before. On a higher level, in (samples, time steps, features)
samples are the number of data, or say how many rows are there in your data set
time step is the number of times to feed in the model or LSTM
features is the number of columns of each sample
For me, I think a better example to understand it is that in NLP, suppose you have a sentence to process, then here sample is 1, which means 1 sentence to read, time step is the number of words in that sentence, you feed in the sentence word by word before the model read all the words and get a whole context of that sentence, features here is the dimension of each word, because in word embedding like word2vec or glove, each word is interpreted by a vector with multiple dimensions.
The input_shape parameter in Keras is only (time_steps, num_features),
more you can refer to this.
And the problem of yours is that when you reshape data, the multiplication of each dimension should equal to the multiplication of dimensions of original data set, where 434 does not equal to 217*3*2.
When you implement LSTM, you should be very clear of what are the features and what are the element you want the model to read each time step. There is a very similar case here surely can help you. For example, if you are trying to predict the value of time t using t-1 and t-2, you can either choose to feed in two values as one element to predict t, where (time_step, num_features)=(1, 2), or you can feed each value in 2 time steps, where (time_step, num_features)=(2, 1).
That's basically how I understand this, hope make it clear for you.
You seem to have a decent grasp of what LSTM expects and are just struggling with getting your data into the correct format. You start with an X_train of shape (217, 2) and you want to reshape this such that it's in the shape (nb_samples, look_back, num_features). You already have defined look_back and num_features and really all the work that's left is generating nb_samples chunks of length look_back with your original X_train. Numpy's reshape isn't really the tool for this, instead you'll have to write some code.
import numpy as np
nb_samples = X_train.shape[0] - look_back
x_train_reshaped = np.zeros((nb_samples, look_back, num_features))
y_train_reshaped = np.zeros((nb_samples))
for i in range(nb_samples):
y_position = i + look_back
x_train_reshaped[i] = X_train[i:y_position]
y_train_reshaped[i] = y_train[y_position]
model.fit(x_train_reshaped, y_train_reshaped, ...)
The shapes are now:
x_train_reshaped.shape
# (214, 3, 2)
y_train_reshaped.shape
# (214,)
You'll have to do the same thing with X_test and y_test.
This https://github.com/fchollet/keras/issues/2045 helped me.
But shortly, the answer for your question:
you want to reshape a list with 434 elements into shape (217,3,2), but it's impossible, let me show you why:
A new shape has 217*3*2 = 1302 elements, but you have 434 elements in the original list. Therefore, the solution is to change the dimensions of reshaping.