I am trying to build an RNN based model in Tensorflow that takes a sequence of categorical values as an input, and sequence of categorical values as the output.
For example, if I have sequence of 30 values, the first 25 would be the training data, and the last 5 would be the target. Imagine the data is something like a person pressing keys on a computer keyboard and recording their key presses over time.
I've tried to feed the training data and targets into this model in different shapes, and I always get an error that indicates the data is in the wrong shape.
I've included a code sample that should run and demonstrate what I'm trying to do and the failure I'm seeing.
In the code sample, I've used windows for batches. So if there are 90 values in the sequence, the first 25 values would be the training data for the first batch, and the next 5 values would be the target. This next batch would be the next 30 values (25 training values, 5 target values).
import numpy as np
import tensorflow as tf
from tensorflow import keras
num_categories = 20
data_sequence = np.random.choice(num_categories, 10000)
def create_target(batch):
X = tf.cast(batch[:,:-5][:,:,None], tf.float32)
Y = batch[:,-5:][:,:,None]
return X,Y
def add_windows(data):
data = tf.data.Dataset.from_tensor_slices(data)
return data.window(20, shift=1, drop_remainder=True)
dataset = tf.data.Dataset.from_tensor_slices(data_sequence)
dataset = dataset.window(30, drop_remainder=True)
dataset = dataset.flat_map(lambda x: x.batch(30))
dataset = dataset.batch(5)
dataset = dataset.map(create_target)
model = keras.models.Sequential([
keras.layers.SimpleRNN(20, return_sequences=True),
keras.layers.SimpleRNN(20, return_sequences=True),
keras.layers.TimeDistributed(keras.layers.Dense(num_categories, activation="softmax"))
])
optimizer = keras.optimizers.Adam()
model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer)
model.fit(dataset, epochs=1)
The error I get when I run the above code is
Node: 'sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits'
logits and labels must have the same first dimension, got logits shape [125,20] and labels shape [25]
I've also tried the following model, but the errors are similar.
model = keras.models.Sequential([
keras.layers.SimpleRNN(20, return_sequences=True),
keras.layers.SimpleRNN(20),
keras.layers.Dense(num_categories, activation="softmax"))
])
Does anybody have any recommendations about what I need to do to get this working?
Thanks.
I figured out the issue. The size of the time dimension needs to be the same for the training data and the target.
If you look at my original example code, the training data has these shapes
X.shape = (1, 25, 1)
Y.shape = (1, 5, 1)
To fix it, the time dimension should be the same.
X.shape = (1, 15, 1)
Y.shape = (1, 15, 1)
Here is the updated function that will let the model train. Note that all I did was update the array sizes so they are equally sized. The value of 15 is used because the original array length is 30.
def create_target(batch):
X = tf.cast(batch[:,:-15][:,:,None], tf.float32)
Y = batch[:,-15:][:,:,None]
return X,Y
I would like to use a RNN for time series prediction to use 96 backwards steps to predict 96 steps into the future. For this I have the following code:
#Import modules
import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.preprocessing import StandardScaler
from tensorflow import keras
# Define the parameters of the RNN and the training
epochs = 1
batch_size = 50
steps_backwards = 96
steps_forward = 96
split_fraction_trainingData = 0.70
split_fraction_validatinData = 0.90
randomSeedNumber = 50
helpValueStrides = int(steps_backwards /steps_forward)
#Read dataset
df = pd.read_csv('C:/Users1/Desktop/TestValues.csv', sep=';', header=0, low_memory=False, infer_datetime_format=True, parse_dates={'datetime':[0]}, index_col=['datetime'])
# standardize data
data = df.values
indexWithYLabelsInData = 0
data_X = data[:, 0:3]
data_Y = data[:, indexWithYLabelsInData].reshape(-1, 1)
scaler_standardized_X = StandardScaler()
data_X = scaler_standardized_X.fit_transform(data_X)
data_X = pd.DataFrame(data_X)
scaler_standardized_Y = StandardScaler()
data_Y = scaler_standardized_Y.fit_transform(data_Y)
data_Y = pd.DataFrame(data_Y)
# Prepare the input data for the RNN
series_reshaped_X = np.array([data_X[i:i + (steps_backwards+steps_forward)].copy() for i in range(len(data) - (steps_backwards+steps_forward))])
series_reshaped_Y = np.array([data_Y[i:i + (steps_backwards+steps_forward)].copy() for i in range(len(data) - (steps_backwards+steps_forward))])
timeslot_x_train_end = int(len(series_reshaped_X)* split_fraction_trainingData)
timeslot_x_valid_end = int(len(series_reshaped_X)* split_fraction_validatinData)
X_train = series_reshaped_X[:timeslot_x_train_end, :steps_backwards]
X_valid = series_reshaped_X[timeslot_x_train_end:timeslot_x_valid_end, :steps_backwards]
X_test = series_reshaped_X[timeslot_x_valid_end:, :steps_backwards]
Y_train = series_reshaped_Y[:timeslot_x_train_end, steps_backwards:]
Y_valid = series_reshaped_Y[timeslot_x_train_end:timeslot_x_valid_end, steps_backwards:]
Y_test = series_reshaped_Y[timeslot_x_valid_end:, steps_backwards:]
# Build the model and train it
np.random.seed(randomSeedNumber)
tf.random.set_seed(randomSeedNumber)
model = keras.models.Sequential([
keras.layers.SimpleRNN(10, return_sequences=True, input_shape=[None, 3]),
keras.layers.SimpleRNN(10, return_sequences=True),
keras.layers.Conv1D(16, helpValueStrides, strides=helpValueStrides),
keras.layers.TimeDistributed(keras.layers.Dense(1))
])
model.compile(loss="mean_squared_error", optimizer="adam", metrics=['mean_absolute_percentage_error'])
history = model.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, validation_data=(X_valid, Y_valid))
#Predict the test data
Y_pred = model.predict(X_test)
prediction_lastValues_list=[]
for i in range (0, len(Y_pred)):
prediction_lastValues_list.append((Y_pred[i][0][1 - 1]))
# Create thw dataframe for the whole data
wholeDataFrameWithPrediciton = pd.DataFrame((X_test[:,1]))
wholeDataFrameWithPrediciton.rename(columns = {indexWithYLabelsInData:'actual'}, inplace = True)
wholeDataFrameWithPrediciton.rename(columns = {1:'Feature 1'}, inplace = True)
wholeDataFrameWithPrediciton.rename(columns = {2:'Feature 2'}, inplace = True)
wholeDataFrameWithPrediciton['predictions'] = prediction_lastValues_list
wholeDataFrameWithPrediciton['difference'] = (wholeDataFrameWithPrediciton['predictions'] - wholeDataFrameWithPrediciton['actual']).abs()
wholeDataFrameWithPrediciton['difference_percentage'] = ((wholeDataFrameWithPrediciton['difference'])/(wholeDataFrameWithPrediciton['actual']))*100
# Inverse the scaling (traInv: transformation inversed)
data_X_traInv = scaler_standardized_X.inverse_transform(data_X)
data_Y_traInv = scaler_standardized_Y.inverse_transform(data_Y)
series_reshaped_X_notTransformed = np.array([data_X_traInv[i:i + (steps_backwards+steps_forward)].copy() for i in range(len(data) - (steps_backwards+steps_forward))])
X_test_notTranformed = series_reshaped_X_notTransformed[timeslot_x_valid_end:, :steps_backwards]
predictions_traInv = scaler_standardized_Y.inverse_transform(wholeDataFrameWithPrediciton['predictions'].values.reshape(-1, 1))
edictions_traInv = wholeDataFrameWithPrediciton['predictions'].values.reshape(-1, 1)
# Create thw dataframe for the inversed transformed data
wholeDataFrameWithPrediciton_traInv = pd.DataFrame((X_test_notTranformed[:,0]))
wholeDataFrameWithPrediciton_traInv.rename(columns = {indexWithYLabelsInData:'actual'}, inplace = True)
wholeDataFrameWithPrediciton_traInv.rename(columns = {1:'Feature 1'}, inplace = True)
wholeDataFrameWithPrediciton_traInv['predictions'] = predictions_traInv
wholeDataFrameWithPrediciton_traInv['difference_absolute'] = (wholeDataFrameWithPrediciton_traInv['predictions'] - wholeDataFrameWithPrediciton_traInv['actual']).abs()
wholeDataFrameWithPrediciton_traInv['difference_percentage'] = ((wholeDataFrameWithPrediciton_traInv['difference_absolute'])/(wholeDataFrameWithPrediciton_traInv['actual']))*100
wholeDataFrameWithPrediciton_traInv['difference'] = (wholeDataFrameWithPrediciton_traInv['predictions'] - wholeDataFrameWithPrediciton_traInv['actual'])
Here you can have some test data (don't care about the actual values as I made them up, just the shape is important) Download test data
How can the output of the Y_pred data be interpreted? Which of those values yields me the predicted values 96 steps into the future? I have attached a screenshot of the 'Y_pred' data. One time with 5 output neurons in the last layer and one time only with 1. Can anyone tell me, how to interpret the 'Y_pred' data meaning what exactly is the RNN predicting? I can use any values in the output (last layer ) of the RNN model. The 'Y_pred' data always has the shape (Batch size of X_test, timesequence, Number of output neurons). My question is targeting at the last dimension. I thought that these might be the features, but this is not true in my case, as I only have 1 output features (you can see that in the shape of the Y_train, Y_test and Y_valid data).
**Reminder **: The bounty is expiring soon and unfortunately I still have not received any answer. So I would like to remind you on the question and the bounty. I'll highly appreciate every comment.
It may be useful to step through the model inputs/outputs in detail.
When using the keras.layers.SimpleRNN layer with return_sequences=True, the output will return a 3-D tensor where the 0th axis is the batch size, the 1st axis is the timestep, and the 2nd axis is the number of hidden units (in the case for both SimpleRNN layers in your model, 10).
The Conv1D layer will produce an output tensor where the last dimension becomes the number of hidden units (in the case for your model, 16), as it's just being convolved with the input.
keras.layers.TimeDistributed, the layer supplied (in the example provided, Dense(1)) will be applied to each timestep in the batch independently. So with 96 timesteps, we have 96 outputs for each record in the batch.
So stepping through your model:
model = keras.models.Sequential([
keras.layers.SimpleRNN(10, return_sequences=True, input_shape=[None, 3]), # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 10)
keras.layers.SimpleRNN(10, return_sequences=True), # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 10)
keras.layers.Conv1D(16, helpValueStrides, strides=helpValueStrides) # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 16),
keras.layers.TimeDistributed(keras.layers.Dense(1)) # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 1)
])
To answer your question, the output tensor from your model contains the predicted values for 96 steps into the future, for each sample. If it's easier to conceptualize, for the case of 1 output, you can apply np.squeeze to the result of model.predict, which will make the output 2-D:
Y_pred = model.predict(X_test) # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 1)
Y_pred_squeezed = np.squeeze(Y_pred) # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS)
In that way, you have a rectangular matrix where each row corresponds to a sample in the batch, and each column i corresponds to the prediction for the timestep i.
In the loop after the prediction step, all the timestep predictions are being discarded except for the first one:
for i in range(0, len(Y_pred)):
prediction_lastValues_list.append((Y_pred[i][0][1 - 1]))
which means the end result is just a list of predictions for the first timestep for each sample in the batch. If you wanted the prediction for the 96th timestep, you could do:
for i in range(0, len(Y_pred)):
prediction_lastValues_list.append((Y_pred[i][-1][1 - 1]))
Notice the -1 instead of 0 for the second bracket, to ensure we grab the last predicted timestep instead of the first.
As a side note, to replicate the results, I had to make one change to your code, specifically when creating series_reshaped_X and series_reshaped_Y. I hit an exception when using np.array to create the array from the list: ValueError: cannot copy sequence with size 192 to array axis with dimension 3 , but looking at what you were doing (joining tensors along a new axis), I changed it to np.stack, which will accomplish the same goal (https://numpy.org/doc/stable/reference/generated/numpy.stack.html):
series_reshaped_X = np.stack([data_X[i:i + (steps_backwards + steps_forward)].copy() for i in
range(len(data) - (steps_backwards + steps_forward))])
series_reshaped_Y = np.stack([data_Y[i:i + (steps_backwards + steps_forward)].copy() for i in
range(len(data) - (steps_backwards + steps_forward))])
Update
"What are those 5 values representing when I only have 1 target feature?"
That's actually just the broadcasting feature of the Tensorflow API (which is also a feature of NumPy). If you perform an arithmetic operation on two tensors with differing shapes, it will try to make them compatible. In this case, if you change the output layer size to be "5" instead of "1" (keras.layers.Dense(5)), the output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 5) instead of (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 1), which just means the output from the convolutional layer is going into 5 neurons instead of 1. When the loss (mean squared error) is computed between the two, the size of the label tensor ((BATCH_SIZE, NUMBER_OF_TIMESTEPS, 1)) is broadcast to the size of the prediction tensor ((BATCH_SIZE, NUMBER_OF_TIMESTEPS, 5)). In this case, the broadcasting is accomplished by replicating the column. For example, if Y_train had [-1.69862224] in the first row for the first timestep, and Y_pred had [-0.6132075 , -0.6621697 , -0.7712653 , -0.60011995, -0.48753992] in the first row for the first timestep, to perform the subtraction operation, the entry in Y_train is converted to [-1.69862224, -1.69862224, -1.69862224, -1.69862224, -1.69862224].
And which of those 5 values is the "correct" value to choose for the 96 time step ahead prediciton?
There is no real "correct" value when trained this way - as detailed above, this just a feature of the API. All output should converge to the single target value for the timestep, they're all being compared to that value, so you could technically train that way, but it's just adding parameters and complexity to the model (and you would just have to choose one to be the "real" prediction). The correct approach for getting the prediction for 96 timesteps ahead is detailed in the original answer, but just to reiterate, the output of the model contains future timestep predictions for each sample in the batch. The output tensor could be iterated over to retrieve the predictions for each timestep, for each sample. Furthermore, ensure the number of neurons in the final dense layer matches the number of target values you are trying to predict, otherwise you'll hit the broadcasting issue (and the "correct" output will be unclear).
Just to be exhaustive (and I am not recommending this), if you really wanted to incorporate several neurons in the output despite only having one target value, you could do something like averaging the results:
for i in range(0, len(Y_pred)):
prediction_lastValues_list.append(np.mean(Y_pred[i][0]))
But there is absolutely no benefit to this approach, so I would recommend just sticking with the previous suggestion.
Update 2
Is my model only predicting one time slot which is 96 time steps into the future or is it also predicting everything in between?
The model is predicting everything in between. So for a sample at timestep t, the output of the model are predictions [t + 1, t + 2, ..., t + NUMBER_OF_TIMESTEPS]. Per my original answer, "the output tensor from your model contains the predicted values for 96 steps into the future, for each sample". To specify that in your evaluation code, you can do something like:
Y_pred = np.squeeze(Y_pred)
predictions_for_all_samples_and_timesteps = Y_pred.tolist()
This results in a list of length BATCH_SIZE, and each element in the list is a list of length NUMBER_OF_TIMESTEPS (to be clear, predictions_for_all_samples_and_timesteps is a list of lists). The element at index i in predictions_for_all_samples_and_timesteps contains the predictions for each timestep from 1-96 for the i^th sample (row) in X_test.
As a side note, you could omit np.squeeze, but then you will have a list of lists of lists, where each element in the inner list is a list of one item (instead of [[1, 2, 3, ...], ], the output would look like [[[1], [2], [3], ...], ].
Update 3
Y_test and Y_pred are both 3-D numpy arrays of size (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 1). To compare them, you can take the absolute (or squared) difference between the two:
abs_diff = np.abs(Y_pred - Y_test)
This results in an array of the same dimensions, (BATCH_SIZE, NUMBER_OF_TIMESTEPS). You can then iterate over the rows and generate a plot of the timestep error for each row.
for diff in abs_diff:
print(diff.shape)
plt.plot(list(range(diff)), diff)
It may get a bit unwieldy with a large batch size (as you can see in the image), so maybe you plot a subset of the rows. You can also transform the absolute difference to an error percentage if you would prefer to plot that:
percentage_diff = abs_diff / Y_test
which would be the absolute difference over the actual value, as I see you were originally doing in Pandas. This numpy array will have the same dimensions, so you can iterate over it and generate plots in the same fashion.
For future inquiries, instead of posting the comments, please open a new question and just provide the link - I would be happy to continue helping, but I would like to continue gaining reputation from it.
I disagree with #danielcahall on just one point:
The output tensor from your model contains the predicted values for 96 steps into the future, for each sample
The output does contain 96 time steps, one for each input time step, and you can take an output to mean whatever you want. But this is just not a good model for what you're trying to do. The main reason is that the RNNs you're using are single direction.
x x x x x x # input
| | | | | |
x-->x-->x-->x-->x-->x # SimpleRNN
| | | | | |
x-->x-->x-->x-->x-->x # SimpleRNN
| /|\ /|\ /|\ /|\ |
| / | \ | \ | \ | \ |
x x x x x x # Conv
| | | | | |
x x x x x x # Dense -> output
So the first time index of the output only sees the first 2 input times (thanks to the Conv), it can't see the later times. The first prediction is based only on old data. It's only the last few outputs that can see all the inputs.
use 96 backwards steps to predict 96 steps into the future
Most of the outputs just can't see all the data.
This model would be appropriate if you were trying to predict 1 step into the future from each of the input times.
To predict 96 steps into the future it would be much more reasonable to drop the return_sequences=True and the Conv layer. Then expand the Dense layer to make the prediction:
model = keras.models.Sequential([
keras.layers.SimpleRNN(10, return_sequences=True, input_shape=[None, 3]), # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 10)
keras.layers.SimpleRNN(10), # output size is (BATCH_SIZE, 10)
keras.layers.Dense(96) # output size is (BATCH_SIZE, 96)
])
That way all 96 predictions see all 96 inputs.
See https://www.tensorflow.org/tutorials/structured_data/time_series for more details.
Also SimpleRNN is terrible. Never use it over more than a couple of steps.
I have some data that is of shape 10000 x 1440 x 8 where 10000 is the number of days, 1440 the number of minutes and 8 is the number of features.
For each day, ie. each submatrix of size 1440 x 8 I wish to train an autoencoder and extract the weights from the second layer, such that my output will be a matrix output = 10000 x 8
I can do this in a loop with
import numpy as np
from keras.layers import Input, Dense
from keras import regularizers, models, optimizers
data = np.random.random(size=(10000,1440,8))
def AE(y, epochs=100,learning_rate = 1e-4, regularization = 5e-4, epochs=3):
input = Input(shape=(y.shape[1],))
encoded = Dense(1, activation='relu',
kernel_regularizer=regularizers.l2(regularization))(input)
decoded = Dense(y.shape[1], activation='relu',
kernel_regularizer=regularizers.l2(regularization))(encoded)
autoencoder = models.Model(input, decoded)
autoencoder.compile(optimizer=optimizers.Adam(lr=learning_rate), loss='mean_squared_error')
autoencoder.fit(y, y, epochs=epochs, batch_size=10, shuffle=False)
(w1,b1,w2,b2)=autoencoder.get_weights()
return (w1,b1,w2,b2)
lst = []
for i in range(data.shape[0]):
y = data[i]
(_, _, w2, _) = AE(y)
lst.append(w2[0])
output = np.array(lst)
However, this feels very stupid as surely I must be able to just pass the 3D data to the autoencoder and retrieve what I want. However, if I try modify the shape of input to be input = Input(shape=(y.shape[1],y.shape[2]))
I get an error
ValueError: Dimensions must be equal, but are 1440 and 8 for '{{node
mean_squared_error/SquaredDifference}} =
SquaredDifference[T=DT_FLOAT](model_778/dense_1558/Relu,
IteratorGetNext:1)' with input shapes: [?,1440,1440], [?,1440,8].
Any pointers on how to get the shape right?
Simply reshape your your data like so and call the function.
data = data.reshape(data.shape[0]*data.shape[1], -1)
(w1, b1, w2, b2) = AE(data)
print(w2.shape)
Your first layer of the NN is a Dense layer. You can only pass two dimensional data into it. One dimension will be batch size and the other dimension will be the feature vector. When you are using the data in the way you are using it, you are considering each data point independently. Which means that you can join the first two axes together and just pass it on to the NN. However, note that you would still need to modify the code so that you are not passing the entire dataset at once to the NN. You need to split the data into batches and loop over those before passing it on. And honestly, it's the same as what you are doing now. So your looping is not as bad as you think it is for what you are trying to do.
However, also note that you have a time series data and considering each datapoint as an independent point doesn't really make sense. You need an LSTM layer or something to learn the time series encoding.
Problem
I am using the Model API to create a Keras network that takes in two inputs and one output. When training the network I get the following error:
Error when checking model input: the list of Numpy arrays that you
are passing to your model is not the size the model expected. Expected
to see 2 array(s), but instead got the following list of 1 arrays:
Despite this error, the input X array has a shape of (2,8), and the output y array has a shape of (1,4).
Things already tried
There are a number of similar questions on SO, however, their solutions largely revolves around ensuring X and y are Numpy arrays. As seen in my implementation, I have already done that. Thus, I do not believe this is a duplicate question.
Implementation
I have defined the model as follows:
opt = Adam(lr = alpha)
input = Input(shape=(input_dim_,))
delta = Input(shape=[1])
l1 = Dense(units = 1024, input_dim = input_dim_, activation = "relu")(input)
l2 = Dense(units=512, activation="relu")(l1)
def loss_function (y,y_pred):
y_pred = K.clip(y_pred,1e-8,1-1e-8)
return K.sum(-y*K.log(y_pred)*delta)
if model_type == "actor":
out = Dense(units = output_dim_, activation="softmax")(l2)
model = Model(input=[input,delta], output = [out])
model.compile(loss = loss_function,optimizer=opt)
And train the model by doing the following:
X = [s_t,delta]
X = np.array(X)
actor.fit(X,y,verbose=0)
You are not passing the data correctly in fit:
actor.fit(X,y,verbose=0)
Here X should be a list containing two numpy arrays, and each numpy array corresponds to one of your inputs (you have a model with two inputs): So it should be more like this:
X = [np.array(s_t), np.array(delta)]
actor.fit(X, y, verbose=0)
Then it should work.
I am using model.predict() on a testing tensor, which has the same size of the input used for training, (N_tr*70,1025,11,3)
The model is trained by regression, with three outputs as ground-truth, each of size (N_te*70,1025).
For information, when testing the model N_te=180.
According to the documentation, the output of model.predict() should be a numpy tensor, instead I get a list of three elements, each with shape (N_te*70,1025).
I am afraid that the output might have been somehow shuffled (which would explain my unexpected results).
Do you have any advice to get a numpy array which is compatible to the one I used as ground-truth? If not, do you know any other work-around?
EDIT: added the neural network code
input_img = Input(shape=(1025, 11, 3 ) )
x = ( Flatten())(input_img)
for i in range(0,4):
x = ( Dense(1024*3))(x)
x = ( BatchNormalization() )(x)
x = ( LeakyReLU())(x)
o0 = ( Dense(1025, activation='sigmoid'))(x)
o1 = ( Dense(1025, activation='sigmoid'))(x)
o2 = ( Dense(1025, activation='sigmoid'))(x)
Model prediction:
output = model.predict(X_in, batch_size = batch_size, verbose=1)
It is expected that in a multi-output model, predict returns a list of numpy arrays, with each element being the corresponding output. Remember that loss is computed individually between each output and the ground truth, so this format is already ideas for that purpose.