How to interpret the output of an RNN with Keras? - python

I would like to use an RNN for time series prediction, using 96 backward steps to predict 96 steps into the future. For this I have the following code:
#Import modules
import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.preprocessing import StandardScaler
from tensorflow import keras
# Define the parameters of the RNN and the training
epochs = 1
batch_size = 50
steps_backwards = 96
steps_forward = 96
split_fraction_trainingData = 0.70
split_fraction_validatinData = 0.90
randomSeedNumber = 50
helpValueStrides = int(steps_backwards /steps_forward)
#Read dataset
df = pd.read_csv('C:/Users1/Desktop/TestValues.csv', sep=';', header=0, low_memory=False, infer_datetime_format=True, parse_dates={'datetime':[0]}, index_col=['datetime'])
# standardize data
data = df.values
indexWithYLabelsInData = 0
data_X = data[:, 0:3]
data_Y = data[:, indexWithYLabelsInData].reshape(-1, 1)
scaler_standardized_X = StandardScaler()
data_X = scaler_standardized_X.fit_transform(data_X)
data_X = pd.DataFrame(data_X)
scaler_standardized_Y = StandardScaler()
data_Y = scaler_standardized_Y.fit_transform(data_Y)
data_Y = pd.DataFrame(data_Y)
# Prepare the input data for the RNN
series_reshaped_X = np.array([data_X[i:i + (steps_backwards+steps_forward)].copy() for i in range(len(data) - (steps_backwards+steps_forward))])
series_reshaped_Y = np.array([data_Y[i:i + (steps_backwards+steps_forward)].copy() for i in range(len(data) - (steps_backwards+steps_forward))])
timeslot_x_train_end = int(len(series_reshaped_X)* split_fraction_trainingData)
timeslot_x_valid_end = int(len(series_reshaped_X)* split_fraction_validatinData)
X_train = series_reshaped_X[:timeslot_x_train_end, :steps_backwards]
X_valid = series_reshaped_X[timeslot_x_train_end:timeslot_x_valid_end, :steps_backwards]
X_test = series_reshaped_X[timeslot_x_valid_end:, :steps_backwards]
Y_train = series_reshaped_Y[:timeslot_x_train_end, steps_backwards:]
Y_valid = series_reshaped_Y[timeslot_x_train_end:timeslot_x_valid_end, steps_backwards:]
Y_test = series_reshaped_Y[timeslot_x_valid_end:, steps_backwards:]
# Build the model and train it
np.random.seed(randomSeedNumber)
tf.random.set_seed(randomSeedNumber)
model = keras.models.Sequential([
keras.layers.SimpleRNN(10, return_sequences=True, input_shape=[None, 3]),
keras.layers.SimpleRNN(10, return_sequences=True),
keras.layers.Conv1D(16, helpValueStrides, strides=helpValueStrides),
keras.layers.TimeDistributed(keras.layers.Dense(1))
])
model.compile(loss="mean_squared_error", optimizer="adam", metrics=['mean_absolute_percentage_error'])
history = model.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, validation_data=(X_valid, Y_valid))
#Predict the test data
Y_pred = model.predict(X_test)
prediction_lastValues_list=[]
for i in range(0, len(Y_pred)):
    prediction_lastValues_list.append((Y_pred[i][0][1 - 1]))
# Create the dataframe for the whole data
wholeDataFrameWithPrediciton = pd.DataFrame((X_test[:,1]))
wholeDataFrameWithPrediciton.rename(columns = {indexWithYLabelsInData:'actual'}, inplace = True)
wholeDataFrameWithPrediciton.rename(columns = {1:'Feature 1'}, inplace = True)
wholeDataFrameWithPrediciton.rename(columns = {2:'Feature 2'}, inplace = True)
wholeDataFrameWithPrediciton['predictions'] = prediction_lastValues_list
wholeDataFrameWithPrediciton['difference'] = (wholeDataFrameWithPrediciton['predictions'] - wholeDataFrameWithPrediciton['actual']).abs()
wholeDataFrameWithPrediciton['difference_percentage'] = ((wholeDataFrameWithPrediciton['difference'])/(wholeDataFrameWithPrediciton['actual']))*100
# Inverse the scaling (traInv: transformation inversed)
data_X_traInv = scaler_standardized_X.inverse_transform(data_X)
data_Y_traInv = scaler_standardized_Y.inverse_transform(data_Y)
series_reshaped_X_notTransformed = np.array([data_X_traInv[i:i + (steps_backwards+steps_forward)].copy() for i in range(len(data) - (steps_backwards+steps_forward))])
X_test_notTranformed = series_reshaped_X_notTransformed[timeslot_x_valid_end:, :steps_backwards]
predictions_traInv = scaler_standardized_Y.inverse_transform(wholeDataFrameWithPrediciton['predictions'].values.reshape(-1, 1))
edictions_traInv = wholeDataFrameWithPrediciton['predictions'].values.reshape(-1, 1)
# Create the dataframe for the inversed transformed data
wholeDataFrameWithPrediciton_traInv = pd.DataFrame((X_test_notTranformed[:,0]))
wholeDataFrameWithPrediciton_traInv.rename(columns = {indexWithYLabelsInData:'actual'}, inplace = True)
wholeDataFrameWithPrediciton_traInv.rename(columns = {1:'Feature 1'}, inplace = True)
wholeDataFrameWithPrediciton_traInv['predictions'] = predictions_traInv
wholeDataFrameWithPrediciton_traInv['difference_absolute'] = (wholeDataFrameWithPrediciton_traInv['predictions'] - wholeDataFrameWithPrediciton_traInv['actual']).abs()
wholeDataFrameWithPrediciton_traInv['difference_percentage'] = ((wholeDataFrameWithPrediciton_traInv['difference_absolute'])/(wholeDataFrameWithPrediciton_traInv['actual']))*100
wholeDataFrameWithPrediciton_traInv['difference'] = (wholeDataFrameWithPrediciton_traInv['predictions'] - wholeDataFrameWithPrediciton_traInv['actual'])
Here is some test data (don't worry about the actual values, as I made them up; only the shape matters): Download test data
How can the output of the Y_pred data be interpreted? Which of those values yields me the predicted values 96 steps into the future? I have attached a screenshot of the 'Y_pred' data, one time with 5 output neurons in the last layer and one time with only 1. Can anyone tell me how to interpret the 'Y_pred' data, meaning what exactly is the RNN predicting? I can use any number of neurons in the output (last) layer of the RNN model. The 'Y_pred' data always has the shape (batch size of X_test, time sequence, number of output neurons). My question targets the last dimension. I thought these might be the features, but this is not true in my case, as I only have 1 output feature (you can see that in the shape of the Y_train, Y_test and Y_valid data).
**Reminder**: The bounty is expiring soon and unfortunately I still have not received any answer. So I would like to remind you of the question and the bounty. I'll highly appreciate every comment.

It may be useful to step through the model inputs/outputs in detail.
When using the keras.layers.SimpleRNN layer with return_sequences=True, the output is a 3-D tensor where the 0th axis is the batch size, the 1st axis is the timestep, and the 2nd axis is the number of hidden units (10 for both SimpleRNN layers in your model).
The Conv1D layer produces an output tensor where the last dimension is the number of filters (16 in your model), since the filters are convolved with the input along the time axis.
With keras.layers.TimeDistributed, the layer supplied (Dense(1) in the example provided) is applied to each timestep in the batch independently. So with 96 timesteps, we get 96 outputs for each record in the batch.
So stepping through your model:
model = keras.models.Sequential([
    keras.layers.SimpleRNN(10, return_sequences=True, input_shape=[None, 3]),  # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 10)
    keras.layers.SimpleRNN(10, return_sequences=True),  # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 10)
    keras.layers.Conv1D(16, helpValueStrides, strides=helpValueStrides),  # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 16)
    keras.layers.TimeDistributed(keras.layers.Dense(1))  # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 1)
])
To answer your question, the output tensor from your model contains the predicted values for 96 steps into the future, for each sample. If it's easier to conceptualize, for the case of 1 output, you can apply np.squeeze to the result of model.predict, which will make the output 2-D:
Y_pred = model.predict(X_test) # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 1)
Y_pred_squeezed = np.squeeze(Y_pred) # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS)
In that way, you have a rectangular matrix where each row corresponds to a sample in the batch, and each column i corresponds to the prediction for the timestep i.
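For instance (the indices here are purely illustrative), the prediction for the 96th timestep ahead of the 4th test sample would be:
value = Y_pred_squeezed[3, 95]  # row 3 = 4th sample, column 95 = 96th timestep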
In the loop after the prediction step, all the timestep predictions are being discarded except for the first one:
for i in range(0, len(Y_pred)):
    prediction_lastValues_list.append((Y_pred[i][0][1 - 1]))
which means the end result is just a list of predictions for the first timestep for each sample in the batch. If you wanted the prediction for the 96th timestep, you could do:
for i in range(0, len(Y_pred)):
    prediction_lastValues_list.append((Y_pred[i][-1][1 - 1]))
Notice the -1 instead of 0 for the second bracket, to ensure we grab the last predicted timestep instead of the first.
As a side note, to replicate the results, I had to make one change to your code, specifically when creating series_reshaped_X and series_reshaped_Y. I hit an exception when using np.array to create the array from the list (ValueError: cannot copy sequence with size 192 to array axis with dimension 3), but looking at what you were doing (joining tensors along a new axis), I changed it to np.stack, which accomplishes the same goal (https://numpy.org/doc/stable/reference/generated/numpy.stack.html):
series_reshaped_X = np.stack([data_X[i:i + (steps_backwards + steps_forward)].copy() for i in
range(len(data) - (steps_backwards + steps_forward))])
series_reshaped_Y = np.stack([data_Y[i:i + (steps_backwards + steps_forward)].copy() for i in
range(len(data) - (steps_backwards + steps_forward))])
Update
"What are those 5 values representing when I only have 1 target feature?"
That's actually just the broadcasting feature of the Tensorflow API (which is also a feature of NumPy). If you perform an arithmetic operation on two tensors with differing shapes, it will try to make them compatible. In this case, if you change the output layer size to be "5" instead of "1" (keras.layers.Dense(5)), the output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 5) instead of (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 1), which just means the output from the convolutional layer is going into 5 neurons instead of 1. When the loss (mean squared error) is computed between the two, the size of the label tensor ((BATCH_SIZE, NUMBER_OF_TIMESTEPS, 1)) is broadcast to the size of the prediction tensor ((BATCH_SIZE, NUMBER_OF_TIMESTEPS, 5)). In this case, the broadcasting is accomplished by replicating the column. For example, if Y_train had [-1.69862224] in the first row for the first timestep, and Y_pred had [-0.6132075 , -0.6621697 , -0.7712653 , -0.60011995, -0.48753992] in the first row for the first timestep, to perform the subtraction operation, the entry in Y_train is converted to [-1.69862224, -1.69862224, -1.69862224, -1.69862224, -1.69862224].
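As a minimal NumPy sketch of that broadcasting behaviour (reusing the values quoted above purely for illustration):
import numpy as np

y_true = np.array([-1.69862224])  # label for one timestep, shape (1,)
y_pred = np.array([-0.6132075, -0.6621697, -0.7712653, -0.60011995, -0.48753992])  # 5 outputs, shape (5,)

# NumPy/TensorFlow broadcast y_true to shape (5,) by replicating the single value,
# so the element-wise difference used by the MSE loss is well defined.
diff = y_pred - y_true
print(diff.shape)  # (5,)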
And which of those 5 values is the "correct" value to choose for the 96 time step ahead prediction?
There is no real "correct" value when trained this way - as detailed above, this is just a feature of the API. All outputs should converge to the single target value for the timestep, since they're all being compared to that value, so you could technically train that way, but it's just adding parameters and complexity to the model (and you would have to choose one to be the "real" prediction). The correct approach for getting the prediction for 96 timesteps ahead is detailed in the original answer, but just to reiterate, the output of the model contains future timestep predictions for each sample in the batch. The output tensor can be iterated over to retrieve the predictions for each timestep, for each sample. Furthermore, ensure the number of neurons in the final dense layer matches the number of target values you are trying to predict, otherwise you'll hit the broadcasting issue (and the "correct" output will be unclear).
Just to be exhaustive (and I am not recommending this), if you really wanted to incorporate several neurons in the output despite only having one target value, you could do something like averaging the results:
for i in range(0, len(Y_pred)):
    prediction_lastValues_list.append(np.mean(Y_pred[i][0]))
But there is absolutely no benefit to this approach, so I would recommend just sticking with the previous suggestion.
Update 2
Is my model only predicting one time slot which is 96 time steps into the future or is it also predicting everything in between?
The model is predicting everything in between. So for a sample at timestep t, the output of the model contains the predictions [t + 1, t + 2, ..., t + NUMBER_OF_TIMESTEPS]. Per my original answer, "the output tensor from your model contains the predicted values for 96 steps into the future, for each sample". To specify that in your evaluation code, you can do something like:
Y_pred = np.squeeze(Y_pred)
predictions_for_all_samples_and_timesteps = Y_pred.tolist()
This results in a list of length BATCH_SIZE, and each element in the list is a list of length NUMBER_OF_TIMESTEPS (to be clear, predictions_for_all_samples_and_timesteps is a list of lists). The element at index i in predictions_for_all_samples_and_timesteps contains the predictions for each timestep from 1-96 for the i^th sample (row) in X_test.
As a side note, you could omit np.squeeze, but then you will have a list of lists of lists, where each element in the inner list is a list of one item (instead of [[1, 2, 3, ...], ...], the output would look like [[[1], [2], [3], ...], ...]).
Update 3
Y_test and Y_pred are both 3-D numpy arrays of size (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 1). To compare them, you can take the absolute (or squared) difference between the two:
abs_diff = np.abs(Y_pred - Y_test)
This results in an array of the same dimensions; after squeezing the trailing singleton axis it has shape (BATCH_SIZE, NUMBER_OF_TIMESTEPS). You can then iterate over the rows and generate a plot of the timestep error for each row.
import matplotlib.pyplot as plt

abs_diff = np.squeeze(abs_diff)
for diff in abs_diff:
    print(diff.shape)
    plt.plot(range(len(diff)), diff)
It may get a bit unwieldy with a large batch size, so you may want to plot only a subset of the rows. You can also transform the absolute difference to an error percentage if you would prefer to plot that:
percentage_diff = abs_diff / Y_test
which would be the absolute difference over the actual value, as I see you were originally doing in Pandas. This numpy array will have the same dimensions, so you can iterate over it and generate plots in the same fashion.
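A minimal sketch of that (my own illustration; matplotlib and the choice of plotting only 5 rows are assumptions, not part of the original code):
import matplotlib.pyplot as plt

percentage_diff = np.squeeze(np.abs(Y_pred - Y_test) / np.abs(Y_test)) * 100

for diff in percentage_diff[:5]:  # plot only the first 5 test samples for readability
    plt.plot(range(len(diff)), diff, alpha=0.7)
plt.xlabel("timestep ahead")
plt.ylabel("absolute error in %")
plt.show()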
For future inquiries, instead of posting the comments, please open a new question and just provide the link - I would be happy to continue helping, but I would like to continue gaining reputation from it.

I disagree with #danielcahall on just one point:
The output tensor from your model contains the predicted values for 96 steps into the future, for each sample
The output does contain 96 time steps, one for each input time step, and you can take an output to mean whatever you want. But this is just not a good model for what you're trying to do. The main reason is that the RNNs you're using are single direction.
x x x x x x # input
| | | | | |
x-->x-->x-->x-->x-->x # SimpleRNN
| | | | | |
x-->x-->x-->x-->x-->x # SimpleRNN
| /|\ /|\ /|\ /|\ |
| / | \ | \ | \ | \ |
x x x x x x # Conv
| | | | | |
x x x x x x # Dense -> output
So the first time index of the output only sees the first 2 input times (thanks to the Conv), it can't see the later times. The first prediction is based only on old data. It's only the last few outputs that can see all the inputs.
use 96 backwards steps to predict 96 steps into the future
Most of the outputs just can't see all the data.
This model would be appropriate if you were trying to predict 1 step into the future from each of the input times.
To predict 96 steps into the future it would be much more reasonable to drop the return_sequences=True and the Conv layer. Then expand the Dense layer to make the prediction:
model = keras.models.Sequential([
keras.layers.SimpleRNN(10, return_sequences=True, input_shape=[None, 3]), # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 10)
keras.layers.SimpleRNN(10), # output size is (BATCH_SIZE, 10)
keras.layers.Dense(96) # output size is (BATCH_SIZE, 96)
])
That way all 96 predictions see all 96 inputs.
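One practical note (this is my assumption about the label arrays, not something stated in the question): with this architecture the model outputs shape (BATCH_SIZE, 96), so the Y arrays built earlier, which have shape (BATCH_SIZE, 96, 1), would need their trailing axis dropped before fitting, e.g.:
Y_train_2d = np.squeeze(Y_train, axis=-1)  # (BATCH_SIZE, 96, 1) -> (BATCH_SIZE, 96)
Y_valid_2d = np.squeeze(Y_valid, axis=-1)
model.compile(loss="mean_squared_error", optimizer="adam")
history = model.fit(X_train, Y_train_2d, epochs=epochs, batch_size=batch_size,
                    validation_data=(X_valid, Y_valid_2d))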
See https://www.tensorflow.org/tutorials/structured_data/time_series for more details.
Also SimpleRNN is terrible. Never use it over more than a couple of steps.
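If you do switch, the change is a drop-in one; a sketch of the same architecture with LSTM cells (my suggestion, not tested on the question's data):
model = keras.models.Sequential([
    keras.layers.LSTM(10, return_sequences=True, input_shape=[None, 3]),
    keras.layers.LSTM(10),   # only the last hidden state is kept
    keras.layers.Dense(96)   # one prediction per future timestep
])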

Related

Why is my Tensorflow LSTM Timeseries model returning only one value into the future

I have a tensorflow model for predicting time series values using LSTM. It trains fine, but when I ask it to predict some values in time it only gives me the T+1 value.
How can I make it give me the values from T+1 to T+n instead of just T+1?
I thought about feeding the predicted value back to it to analyse again in a loop, e.g.
We look back at 20 samples for this example
T±0 = now value
T-k = value k steps into the past (known)
T+n = value n steps into the future (unknown at the start)
--- Algorithm
T+1 = model.predict(data from T-20 to T±0)
T+2 = model.predict(data from T-19 to T+1) #using the previously found T+1 value
T+3 = model.predict(data from T-18 to T+2) #using the previously found T+1 and T+2 values
.
.
.
T+n = model.predict(data from T-(k-n) to T+(n-1)) # using the previously found T+1 .. T+(n-1) values
The thing is that T+1 has a mean absolute error of around 0.75%; doesn't the error propagate/compound through the predictions? If it does, it means that if I ask the program to predict T+10 it will have a mean absolute error of roughly (1 + 0.75%)^10 - 1 = ~7.7%, which is not very good in my case. So I'm looking for other ways to predict up to T+n values.
I've looked at a few YouTube tutorials, but each time it seems that their call of model.predict(X) already returns multiple values, and I have no idea what parameters I could have missed.
Code :
import tensorflow.keras as tfkr
import pandas as pd
import numpy as np
def model_training(dataframe, folder_name, window_size=40, epochs=100, batch_size=64):
    """Function to start training the model on given data
    Parameters
    ----------
    dataframe : `pandas.DataFrame` The dataframe to train the model on
    window_size : `int` The size of the lookback window to use
    epochs : `int` The number of epochs to train for
    batch_size : `int` The batch size to use for training
    Returns
    -------
    None
    """
    dataframe, _ = Process.pre(dataframe)  # function to standardize each column of the data
    TRAIN_SIZE = 0.7
    VAL_SIZE = 0.2
    TEST_SIZE = 0.1
    # Splitting the data into train, validation and test sets
    x, y = dataframe_to_xy(dataframe, window_size)  # converts pandas dataframe to numpy array
    x_train, y_train = x[:int(len(dataframe)*TRAIN_SIZE)], y[:int(len(dataframe)*TRAIN_SIZE)]
    x_val, y_val = x[int(len(dataframe)*TRAIN_SIZE):int(len(dataframe)*(TRAIN_SIZE+VAL_SIZE))], y[int(len(dataframe)*TRAIN_SIZE):int(len(dataframe)*(TRAIN_SIZE+VAL_SIZE))]
    x_test, y_test = x[int(len(dataframe)*(TRAIN_SIZE+VAL_SIZE)):], y[int(len(dataframe)*(TRAIN_SIZE+VAL_SIZE)):]
    # Creating the model base
    model = tfkr.models.Sequential()
    model.add(tfkr.layers.InputLayer(input_shape=(window_size, 10)))
    model.add(tfkr.layers.LSTM(64))
    model.add(tfkr.layers.Dense(8, 'relu'))
    model.add(tfkr.layers.Dense(10, 'linear'))
    model.summary()
    # Compiling and saving the model
    cp = tfkr.callbacks.ModelCheckpoint('ai\\models\\'+folder_name+'\\', save_best_only=True)
    model.compile(loss=tfkr.losses.MeanSquaredError(), optimizer=tfkr.optimizers.Adam(learning_rate=0.0001), metrics=[tfkr.metrics.RootMeanSquaredError()])
    model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size, validation_data=(x_val, y_val), callbacks=[cp])
def predict_data(model, data_pre, window_size, n_future):
    '''Function to predict data using the model
    Parameters
    ----------
    model : `tensorflow.keras.models.Sequential` The model to use for prediction
    data_pre : `pandas.DataFrame` The dataframe to predict on
    window_size : `int` The size of the lookback window to use
    n_future : `int` Number of values to predict in the future
    Returns
    -------
    data_pred : `pandas.DataFrame` The dataframe containing the predicted values
    '''
    time_interval = data_pre.index[1] - data_pre.index[0]
    # Setting up the dataframe to predict on
    data_pre, proc_params = Process.pre(data_pre)  # function to standardize each column of the data
    data_pred = data_pre.iloc[-window_size:]
    data_pred = data_pred.to_numpy().astype('float32')
    data_pred = data_pred.reshape(1, window_size, 10)
    # Predicting the data
    data_pred = model.predict(data_pred)
    # Converting the data from numpy array to pandas dataframe and doing some formatting + post-processing/reversing standardization
    # yada yada pandas dataframe
    return data_pred
If there is no way to prevent the error propagation, do you have any tips for reducing the error of the model?
Thanks in advance.
You could let your LSTM return the full sequences, not only the last output, like this:
model = tfkr.models.Sequential()
model.add(tfkr.layers.InputLayer(input_shape=(window_size, 10)))
model.add(tfkr.layers.LSTM(64, return_sequences=True))
model.add(tfkr.layers.Dense(8, 'relu'))
model.add(tfkr.layers.Dense(10, 'linear'))
Each LSTM output would go through the same dense weights (because we did not flatten). Your output would then be of shape
(None, window_size, 10)
This means that for k input time points you would get k output time points.
The downside is that the first output would be calculated using only the first input, the second output using only the first two inputs, and so on. So I would suggest using a bidirectional LSTM and combining both directions, maybe like this:
model = tfkr.models.Sequential()
model.add(tfkr.layers.InputLayer(input_shape=(window_size, 10)))
model.add(tfkr.layers.Bidirectional(tfkr.layers.LSTM(64, return_sequences=True), merge_mode='sum'))
model.add(tfkr.layers.Dense(8, 'relu'))
model.add(tfkr.layers.Dense(10, 'linear'))
One way you can do this: let's assume you initially trained your model something like this:
x_train = [
[1, 2, 3, 4, 5, 6, 7],
[2, 3, 4, 5, 6, 7, 8],
]
y_train = [
[8],
[9]
]
And the training was done using model.fit with all the appropriate reshaping.
model.fit(x_train, y_train)
Then, you would expect something like this when you try to predict
ans = model.predict(x_train[0])
ans: 8
But let's say you wanted it to predict the next value as well, with the expected output as 8 and then 9; or for n expected outputs, you would expect a result such as 8, 9, 10, 11... and so on.
Then I suggest approaching the problem the following way:
def predict_continuous(model, input_arr, num_ahead):
    # Your input arr was [1,2,3,4,5,6,7]
    # You want the output to go beyond just 8, lets say 8,9,10,11,...
    # num_ahead: How much into the future you need predictions.
    predictions = []
    model_input = input_arr
    for i in range(num_ahead):
        predictions.append(model.predict(model_input))  # Get the prediction, and subsequent predictions are appended
        # Since the LSTM model is trained for a particular input size, we need to keep dynamically changing the input
        # so we strip off the first value from the input array, and append the latest prediction as its last value.
        model_input = model_input[1:].append(predictions[-1])
    return predictions
Now this won't work directly, you will have to do some reshaping, however the core idea remains the same.
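For completeness, here's a sketch of the same idea with the reshaping handled explicitly. It assumes a model like the one in the question (input of shape (window_size, n_features), one full feature vector predicted per call); the names and details are illustrative, not part of the original code:
import numpy as np

def predict_continuous_rolling(model, input_arr, num_ahead):
    # input_arr: shape (window_size, n_features), already standardized
    window = np.asarray(input_arr, dtype='float32')
    predictions = []
    for _ in range(num_ahead):
        # The model expects a batch axis: (1, window_size, n_features)
        next_step = model.predict(window[np.newaxis, ...])[0]  # shape (n_features,)
        predictions.append(next_step)
        # Drop the oldest row and append the newest prediction as the last row.
        window = np.vstack([window[1:], next_step])
    return np.array(predictions)  # shape (num_ahead, n_features)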

Determining the Right Shape for an RNN with an Integer Sequence Target

I am trying to build an RNN based model in Tensorflow that takes a sequence of categorical values as the input, and a sequence of categorical values as the output.
For example, if I have sequence of 30 values, the first 25 would be the training data, and the last 5 would be the target. Imagine the data is something like a person pressing keys on a computer keyboard and recording their key presses over time.
I've tried to feed the training data and targets into this model in different shapes, and I always get an error that indicates the data is in the wrong shape.
I've included a code sample that should run and demonstrate what I'm trying to do and the failure I'm seeing.
In the code sample, I've used windows for batches. So if there are 90 values in the sequence, the first 25 values would be the training data for the first batch, and the next 5 values would be the target. The next batch would be the next 30 values (25 training values, 5 target values).
import numpy as np
import tensorflow as tf
from tensorflow import keras
num_categories = 20
data_sequence = np.random.choice(num_categories, 10000)
def create_target(batch):
    X = tf.cast(batch[:,:-5][:,:,None], tf.float32)
    Y = batch[:,-5:][:,:,None]
    return X, Y
def add_windows(data):
    data = tf.data.Dataset.from_tensor_slices(data)
    return data.window(20, shift=1, drop_remainder=True)
dataset = tf.data.Dataset.from_tensor_slices(data_sequence)
dataset = dataset.window(30, drop_remainder=True)
dataset = dataset.flat_map(lambda x: x.batch(30))
dataset = dataset.batch(5)
dataset = dataset.map(create_target)
model = keras.models.Sequential([
keras.layers.SimpleRNN(20, return_sequences=True),
keras.layers.SimpleRNN(20, return_sequences=True),
keras.layers.TimeDistributed(keras.layers.Dense(num_categories, activation="softmax"))
])
optimizer = keras.optimizers.Adam()
model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer)
model.fit(dataset, epochs=1)
The error I get when I run the above code is
Node: 'sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits'
logits and labels must have the same first dimension, got logits shape [125,20] and labels shape [25]
I've also tried the following model, but the errors are similar.
model = keras.models.Sequential([
keras.layers.SimpleRNN(20, return_sequences=True),
keras.layers.SimpleRNN(20),
keras.layers.Dense(num_categories, activation="softmax")
])
Does anybody have any recommendations about what I need to do to get this working?
Thanks.
I figured out the issue. The size of the time dimension needs to be the same for the training data and the target.
If you look at my original example code, the training data has these shapes
X.shape = (1, 25, 1)
Y.shape = (1, 5, 1)
To fix it, the time dimension should be the same.
X.shape = (1, 15, 1)
Y.shape = (1, 15, 1)
Here is the updated function that will let the model train. Note that all I did was update the array sizes so they are equally sized. The value of 15 is used because the original array length is 30.
def create_target(batch):
    X = tf.cast(batch[:,:-15][:,:,None], tf.float32)
    Y = batch[:,-15:][:,:,None]
    return X, Y
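As a quick sanity check (my addition, assuming the rest of the pipeline from the question is unchanged), iterating one batch of the mapped dataset should now show matching time dimensions for X and Y:
dataset = tf.data.Dataset.from_tensor_slices(data_sequence)
dataset = dataset.window(30, drop_remainder=True)
dataset = dataset.flat_map(lambda x: x.batch(30))
dataset = dataset.batch(5)
dataset = dataset.map(create_target)
for X, Y in dataset.take(1):
    print(X.shape, Y.shape)  # (5, 15, 1) and (5, 15, 1)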

Train autoencoder without looping

I have some data that is of shape 10000 x 1440 x 8 where 10000 is the number of days, 1440 the number of minutes and 8 is the number of features.
For each day, i.e. each submatrix of size 1440 x 8, I wish to train an autoencoder and extract the weights from the second layer, such that my output will be a matrix of shape 10000 x 8.
I can do this in a loop with
import numpy as np
from keras.layers import Input, Dense
from keras import regularizers, models, optimizers
data = np.random.random(size=(10000,1440,8))
def AE(y, epochs=100, learning_rate=1e-4, regularization=5e-4):
    input = Input(shape=(y.shape[1],))
    encoded = Dense(1, activation='relu',
                    kernel_regularizer=regularizers.l2(regularization))(input)
    decoded = Dense(y.shape[1], activation='relu',
                    kernel_regularizer=regularizers.l2(regularization))(encoded)
    autoencoder = models.Model(input, decoded)
    autoencoder.compile(optimizer=optimizers.Adam(lr=learning_rate), loss='mean_squared_error')
    autoencoder.fit(y, y, epochs=epochs, batch_size=10, shuffle=False)
    (w1, b1, w2, b2) = autoencoder.get_weights()
    return (w1, b1, w2, b2)
lst = []
for i in range(data.shape[0]):
    y = data[i]
    (_, _, w2, _) = AE(y)
    lst.append(w2[0])
output = np.array(lst)
However, this feels very stupid, as surely I must be able to just pass the 3D data to the autoencoder and retrieve what I want. However, if I try to modify the shape of input to be input = Input(shape=(y.shape[1],y.shape[2]))
I get an error
ValueError: Dimensions must be equal, but are 1440 and 8 for '{{node
mean_squared_error/SquaredDifference}} =
SquaredDifference[T=DT_FLOAT](model_778/dense_1558/Relu,
IteratorGetNext:1)' with input shapes: [?,1440,1440], [?,1440,8].
Any pointers on how to get the shape right?
Simply reshape your data like so and call the function.
data = data.reshape(data.shape[0]*data.shape[1], -1)
(w1, b1, w2, b2) = AE(data)
print(w2.shape)
Your first layer of the NN is a Dense layer. You can only pass two-dimensional data into it: one dimension is the batch size and the other dimension is the feature vector. When you use the data the way you are using it, you are considering each data point independently, which means you can join the first two axes together and just pass it on to the NN. However, note that you would still need to modify the code so that you are not passing the entire dataset at once to the NN. You need to split the data into batches and loop over those before passing it on. And honestly, it's the same as what you are doing now, so your looping is not as bad as you think it is for what you are trying to do.
However, also note that you have a time series data and considering each datapoint as an independent point doesn't really make sense. You need an LSTM layer or something to learn the time series encoding.
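If you wanted to go the sequence-model route, a rough sketch of an LSTM autoencoder for the (1440, 8) submatrices might look like the following. This is only an illustration of the idea, not a tuned model; the 1-unit bottleneck simply mirrors the Dense autoencoder above:
from keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense
from keras import models

timesteps, features = 1440, 8
inp = Input(shape=(timesteps, features))
encoded = LSTM(1)(inp)                        # one code value per day (the bottleneck)
decoded = RepeatVector(timesteps)(encoded)    # repeat the code for every minute
decoded = LSTM(8, return_sequences=True)(decoded)
decoded = TimeDistributed(Dense(features))(decoded)

autoencoder = models.Model(inp, decoded)
autoencoder.compile(optimizer='adam', loss='mean_squared_error')
# data has shape (10000, 1440, 8), so it can be passed directly:
# autoencoder.fit(data, data, epochs=3, batch_size=32)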

Keras multi-output data reshape for LSTM model

I have a Keras LSTM model that contains multiple outputs.
The model is defined as follows:
outputs = []
main_input = Input(shape=(seq_length, feature_cnt), name='main_input')
lstm = LSTM(32, return_sequences=True)(main_input)
for _ in range(output_branches):  # output_branches is the number of output branches of the model
    prediction = LSTM(8, return_sequences=False)(lstm)
    out = Dense(1)(prediction)
    outputs.append(out)
model = Model(inputs=main_input, outputs=outputs)
model.compile(optimizer='rmsprop', loss='mse')
I have a problem when reshaping the output data.
The code for reshaping the output data is:
y=y.reshape((len(y),output_branches,1))
I got the following error:
ValueError: Error when checking model target: the list of Numpy arrays
that you are passing to your model is not the size the model expected.
Expected to see 5 array(s), but instead got the following list of 1
arrays: [array([[[0.29670931],
[0.16652206],
[0.25114482],
[0.36952324],
[0.09429612]],
[[0.16652206],
[0.25114482],
[0.36952324],
[0.09429612],...
How can I correctly reshape the output data?
It depends on how y is structured initially. Here I assume that y is a single-valued label for each sequence in batch.
When there are multiple inputs/outputs model.fit() expects a corresponding list of inputs/outputs to be given. np.split(y, output_branches, axis=-1) in a following fully reproducible example does exactly this - for each batch splits a single list of outputs into a list of separate outputs where each output (in this case) is 1-element list:
import tensorflow as tf
import numpy as np
tf.enable_eager_execution()
batch_size = 100
seq_length = 10
feature_cnt = 5
output_branches = 3
# Say we've got:
# - 100-element batch
# - of 10-element sequences
# - where each element of a sequence is a vector describing 5 features.
X = np.random.random_sample([batch_size, seq_length, feature_cnt])
# Every sequence of a batch is labelled with `output_branches` labels.
y = np.random.random_sample([batch_size, output_branches])
# Here y.shape == (100, 3)
# Here we split the last axis of y (output_branches) into `output_branches` separate lists.
y = np.split(y, output_branches, axis=-1)
# Here y is not a numpy matrix anymore, but a list of matrices.
# E.g. y[0].shape == (100, 1); y[1].shape == (100, 1) etc...
outputs = []
main_input = tf.keras.layers.Input(shape=(seq_length, feature_cnt), name='main_input')
lstm = tf.keras.layers.LSTM(32, return_sequences=True)(main_input)
for _ in range(output_branches):
    prediction = tf.keras.layers.LSTM(8, return_sequences=False)(lstm)
    out = tf.keras.layers.Dense(1)(prediction)
    outputs.append(out)
model = tf.keras.models.Model(inputs=main_input, outputs=outputs)
model.compile(optimizer='rmsprop', loss='mse')
model.fit(X, y)
You might need to play around with axes as you didn't specify how exactly your data look like.
EDIT:
As the author is looking for an answer drawing from official sources, it's mentioned here (not explicitly, though; it only mentions what the Dataset should yield, and hence what kind of input structure model.fit() expects):
When calling fit with a Dataset object, it should yield either a tuple of lists like ([title_data, body_data, tags_data], [priority_targets, dept_targets]) or a tuple of dictionaries like ({'title': title_data, 'body': body_data, 'tags': tags_data}, {'priority': priority_targets, 'department': dept_targets}).
Since you have an amount of outputs equal to output_branches, your output data must be a list with the same amount of arrays.
Basically, if the output data is in the middle dimension as your reshape suggests:
y = [ y[:,i] for i in range(output_branches)]

How to shape large DataFrame for python's keras LSTM?

I nearly found what I need in the accepted answer here, but I had memory issues because the test df provided was only 11 rows.
What I'm trying to do is use an LSTM to forecast 10 days ahead of a time series in a regression model (not a classifier!). My dataframe X has around 1500 rows and 2000 features, being of shape (1500, 2000), while the truth values y are just 1500 rows of 1 feature (which can take any value between -1 and 1).
Since an LSTM needs a 3D array as input, I'm really struggling with how to reshape the data.
Again, following the example from the first paragraph, it crashes with a MemoryError when padding values, more specifically at df.cumulative_input_vectors.tolist().
My test (read forecast) is a dataframe of shape (10, 2000).
Due to sensitive data I can't actually share the values/example. How can I help you help me with this?
So, to enable the LSTM to learn from the 1500 rows of y, how should I reshape my x of 1500 rows and 2000 features? Also, how should I reshape my forecast of 10 rows and 2000 features?
They'll go through (at first, because I'm still learning LSTM) a simple LSTM model:
model = Sequential()
model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(train_X, train_y , epochs=50, batch_size=2, verbose=1)
What I've tried, but when predicting I got an error:
# A function to make a 3d data of what I understood needed done:
def preprocess_data(stock, seq_len):
    amount_of_features = len(stock.columns)
    data = stock.values
    sequence_length = seq_len  # + 1
    result = []
    for index in range(len(data) - sequence_length):
        result.append(data[index : index + sequence_length])
    X_train = np.array(result)
    X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], amount_of_features))
    return X_train
# creating the train as:
# X == the DF of 1500 rows and 2000 features
window = 10
train_X = preprocess_data(X[::-1], window)
After a while I managed to understand properly what the dimensions were. Keras expects a 3d array of .shape (totalRows, sequences, totalColumns). The sequences one was the most confusing to me.
That was because when reshaping the df as df.reshape(len(df), 1, len(df.columns)), Keras would learn from a matrix of 1 line, which gave me bad results. I also didn't know it's best to scale the data; for me MinMaxScaler(-1, 1) worked best, but (0, 1) could also work.
What made me understand that was to first use a sequence of more than 1 row (or days, since my dataset is a time series). Meaning that instead of feeding 1 row of features X to get 1 value of y, I used something like 5 rows of features X to get 1 value of y, as in:
# after scaling the df, resulted in "scaled_dataset"
sequences = 5
result = []
# for loop will walk for each of the 1500 rows
for i in range(0, len(scaled_dataset)):
    # every group must have the same length, so if current loop position i + number
    # of sequences is higher than df length, breaks
    if i + sequences <= len(scaled_dataset):
        # this will add into the list as [[R1a,R1b...R1t],[R2a,R2b...R2t],...[R5a,R5b...R5t]]
        result.append(scaled_dataset[i:i+sequences].values)
# Converting to array + keras takes float32 better than 64
train_x = np.array(result).astype('float32')
# making the y into same length as X
train_y = np.array(y.tail(train_x.shape[0]).values)
train_x.shape, train_y.shape
>>> (1495, 5, 2400), (1495,)
Written another way, the way to think about Keras shapes for my problem:
Considering it's a time series, above means that 5 days (rows 0 to 4) of data results in the value y of row 5.
Then, less the first day + the next day after last - still 5 days - (rows 1 to 5) of data results in the value y of row 6.
Then, less the second day + the next day after last - still 5 days - (rows 2 to 6) of data results in the value y of row 7.
It's quite confusing for people starting with keras/LSTM, but I hope I could detail this for those who might land here.
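To close the loop, a sketch of how the shaped arrays above feed the simple model from the question; the window size, feature count and variable names are taken from the snippets above, and the forecast part is my assumption about the full code:
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(50, input_shape=(train_x.shape[1], train_x.shape[2])))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(train_x, train_y, epochs=50, batch_size=2, verbose=1)

# To forecast, feed the last `sequences` rows of the scaled features
# as a single sample of shape (1, sequences, n_features).
last_window = scaled_dataset[-sequences:].values.astype('float32')
y_next = model.predict(last_window.reshape(1, sequences, -1))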
