I am trying to build an RNN-based model in TensorFlow that takes a sequence of categorical values as input and produces a sequence of categorical values as output.
For example, if I have a sequence of 30 values, the first 25 would be the training data and the last 5 would be the target. Imagine the data is something like a person pressing keys on a computer keyboard, with their key presses recorded over time.
I've tried to feed the training data and targets into this model in different shapes, and I always get an error that indicates the data is in the wrong shape.
I've included a code sample that should run and demonstrate what I'm trying to do and the failure I'm seeing.
In the code sample, I've used windows for batches. So if there are 90 values in the sequence, the first 25 values would be the training data for the first batch, and the next 5 values would be the target. The next batch would be the next 30 values (25 training values, 5 target values).
import numpy as np
import tensorflow as tf
from tensorflow import keras
num_categories = 20
data_sequence = np.random.choice(num_categories, 10000)
def create_target(batch):
    X = tf.cast(batch[:, :-5][:, :, None], tf.float32)
    Y = batch[:, -5:][:, :, None]
    return X, Y
def add_windows(data):  # unused helper
    data = tf.data.Dataset.from_tensor_slices(data)
    return data.window(20, shift=1, drop_remainder=True)
dataset = tf.data.Dataset.from_tensor_slices(data_sequence)
dataset = dataset.window(30, drop_remainder=True)
dataset = dataset.flat_map(lambda x: x.batch(30))
dataset = dataset.batch(5)
dataset = dataset.map(create_target)
model = keras.models.Sequential([
    keras.layers.SimpleRNN(20, return_sequences=True),
    keras.layers.SimpleRNN(20, return_sequences=True),
    keras.layers.TimeDistributed(keras.layers.Dense(num_categories, activation="softmax"))
])
optimizer = keras.optimizers.Adam()
model.compile(loss="sparse_categorical_crossentropy", optimizer=optimizer)
model.fit(dataset, epochs=1)
The error I get when I run the above code is
Node: 'sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits'
logits and labels must have the same first dimension, got logits shape [125,20] and labels shape [25]
I've also tried the following model, but the errors are similar.
model = keras.models.Sequential([
    keras.layers.SimpleRNN(20, return_sequences=True),
    keras.layers.SimpleRNN(20),
    keras.layers.Dense(num_categories, activation="softmax")
])
Does anybody have any recommendations about what I need to do to get this working?
Thanks.
I figured out the issue. The size of the time dimension needs to be the same for the training data and the target.
If you look at my original example code, the training data and targets have these shapes (the leading dimension is 5 because of dataset.batch(5), which matches the logits shape [125, 20] = 5 x 25 in the error):
X.shape = (5, 25, 1)
Y.shape = (5, 5, 1)
To fix it, the time dimension should be the same:
X.shape = (5, 15, 1)
Y.shape = (5, 15, 1)
Here is the updated function that lets the model train. Note that all I did was change the slicing so the arrays are equally sized along the time dimension. The value 15 is used because the window length is 30, which splits evenly into 15 input steps and 15 target steps.
def create_target(batch):
    X = tf.cast(batch[:, :-15][:, :, None], tf.float32)
    Y = batch[:, -15:][:, :, None]
    return X, Y
I would like to use an RNN for time series prediction, using 96 backward steps to predict 96 steps into the future. For this I have the following code:
#Import modules
import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.preprocessing import StandardScaler
from tensorflow import keras
# Define the parameters of the RNN and the training
epochs = 1
batch_size = 50
steps_backwards = 96
steps_forward = 96
split_fraction_trainingData = 0.70
split_fraction_validatinData = 0.90
randomSeedNumber = 50
helpValueStrides = int(steps_backwards / steps_forward)
#Read dataset
df = pd.read_csv('C:/Users1/Desktop/TestValues.csv', sep=';', header=0, low_memory=False, infer_datetime_format=True, parse_dates={'datetime':[0]}, index_col=['datetime'])
# standardize data
data = df.values
indexWithYLabelsInData = 0
data_X = data[:, 0:3]
data_Y = data[:, indexWithYLabelsInData].reshape(-1, 1)
scaler_standardized_X = StandardScaler()
data_X = scaler_standardized_X.fit_transform(data_X)
data_X = pd.DataFrame(data_X)
scaler_standardized_Y = StandardScaler()
data_Y = scaler_standardized_Y.fit_transform(data_Y)
data_Y = pd.DataFrame(data_Y)
# Prepare the input data for the RNN
series_reshaped_X = np.array([data_X[i:i + (steps_backwards+steps_forward)].copy() for i in range(len(data) - (steps_backwards+steps_forward))])
series_reshaped_Y = np.array([data_Y[i:i + (steps_backwards+steps_forward)].copy() for i in range(len(data) - (steps_backwards+steps_forward))])
timeslot_x_train_end = int(len(series_reshaped_X)* split_fraction_trainingData)
timeslot_x_valid_end = int(len(series_reshaped_X)* split_fraction_validatinData)
X_train = series_reshaped_X[:timeslot_x_train_end, :steps_backwards]
X_valid = series_reshaped_X[timeslot_x_train_end:timeslot_x_valid_end, :steps_backwards]
X_test = series_reshaped_X[timeslot_x_valid_end:, :steps_backwards]
Y_train = series_reshaped_Y[:timeslot_x_train_end, steps_backwards:]
Y_valid = series_reshaped_Y[timeslot_x_train_end:timeslot_x_valid_end, steps_backwards:]
Y_test = series_reshaped_Y[timeslot_x_valid_end:, steps_backwards:]
# Build the model and train it
np.random.seed(randomSeedNumber)
tf.random.set_seed(randomSeedNumber)
model = keras.models.Sequential([
    keras.layers.SimpleRNN(10, return_sequences=True, input_shape=[None, 3]),
    keras.layers.SimpleRNN(10, return_sequences=True),
    keras.layers.Conv1D(16, helpValueStrides, strides=helpValueStrides),
    keras.layers.TimeDistributed(keras.layers.Dense(1))
])
model.compile(loss="mean_squared_error", optimizer="adam", metrics=['mean_absolute_percentage_error'])
history = model.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size, validation_data=(X_valid, Y_valid))
#Predict the test data
Y_pred = model.predict(X_test)
prediction_lastValues_list = []
for i in range(0, len(Y_pred)):
    prediction_lastValues_list.append((Y_pred[i][0][1 - 1]))
# Create the dataframe for the whole data
wholeDataFrameWithPrediciton = pd.DataFrame((X_test[:,1]))
wholeDataFrameWithPrediciton.rename(columns = {indexWithYLabelsInData:'actual'}, inplace = True)
wholeDataFrameWithPrediciton.rename(columns = {1:'Feature 1'}, inplace = True)
wholeDataFrameWithPrediciton.rename(columns = {2:'Feature 2'}, inplace = True)
wholeDataFrameWithPrediciton['predictions'] = prediction_lastValues_list
wholeDataFrameWithPrediciton['difference'] = (wholeDataFrameWithPrediciton['predictions'] - wholeDataFrameWithPrediciton['actual']).abs()
wholeDataFrameWithPrediciton['difference_percentage'] = ((wholeDataFrameWithPrediciton['difference'])/(wholeDataFrameWithPrediciton['actual']))*100
# Inverse the scaling (traInv: transformation inversed)
data_X_traInv = scaler_standardized_X.inverse_transform(data_X)
data_Y_traInv = scaler_standardized_Y.inverse_transform(data_Y)
series_reshaped_X_notTransformed = np.array([data_X_traInv[i:i + (steps_backwards+steps_forward)].copy() for i in range(len(data) - (steps_backwards+steps_forward))])
X_test_notTranformed = series_reshaped_X_notTransformed[timeslot_x_valid_end:, :steps_backwards]
predictions_traInv = scaler_standardized_Y.inverse_transform(wholeDataFrameWithPrediciton['predictions'].values.reshape(-1, 1))
# Create the dataframe for the inverse-transformed data
wholeDataFrameWithPrediciton_traInv = pd.DataFrame((X_test_notTranformed[:,0]))
wholeDataFrameWithPrediciton_traInv.rename(columns = {indexWithYLabelsInData:'actual'}, inplace = True)
wholeDataFrameWithPrediciton_traInv.rename(columns = {1:'Feature 1'}, inplace = True)
wholeDataFrameWithPrediciton_traInv['predictions'] = predictions_traInv
wholeDataFrameWithPrediciton_traInv['difference_absolute'] = (wholeDataFrameWithPrediciton_traInv['predictions'] - wholeDataFrameWithPrediciton_traInv['actual']).abs()
wholeDataFrameWithPrediciton_traInv['difference_percentage'] = ((wholeDataFrameWithPrediciton_traInv['difference_absolute'])/(wholeDataFrameWithPrediciton_traInv['actual']))*100
wholeDataFrameWithPrediciton_traInv['difference'] = (wholeDataFrameWithPrediciton_traInv['predictions'] - wholeDataFrameWithPrediciton_traInv['actual'])
Here is some test data (don't worry about the actual values, as I made them up; just the shape is important): Download test data
How can the output of the Y_pred data be interpreted? Which of those values yields the predicted values 96 steps into the future? I have attached a screenshot of the Y_pred data, once with 5 output neurons in the last layer and once with only 1. Can anyone tell me how to interpret the Y_pred data, i.e. what exactly is the RNN predicting? I can use any number of neurons in the output (last) layer of the RNN model. The Y_pred data always has the shape (batch size of X_test, time sequence, number of output neurons). My question targets the last dimension. I thought these might be the features, but this is not true in my case, as I only have 1 output feature (you can see that in the shape of the Y_train, Y_test and Y_valid data).
**Reminder**: The bounty is expiring soon and unfortunately I still have not received an answer. So I would like to remind you of the question and the bounty. I'll highly appreciate every comment.
It may be useful to step through the model inputs/outputs in detail.
When using the keras.layers.SimpleRNN layer with return_sequences=True, the output is a 3-D tensor where the 0th axis is the batch size, the 1st axis is the timestep, and the 2nd axis is the number of hidden units (10 for both SimpleRNN layers in your model).
The Conv1D layer produces an output tensor where the last dimension becomes the number of filters (16 in your model), as the filters are convolved over the input.
With keras.layers.TimeDistributed, the supplied layer (Dense(1) in the example provided) is applied to each timestep in the batch independently. So with 96 timesteps, we have 96 outputs for each record in the batch.
So stepping through your model:
model = keras.models.Sequential([
    keras.layers.SimpleRNN(10, return_sequences=True, input_shape=[None, 3]),  # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 10)
    keras.layers.SimpleRNN(10, return_sequences=True),  # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 10)
    keras.layers.Conv1D(16, helpValueStrides, strides=helpValueStrides),  # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 16)
    keras.layers.TimeDistributed(keras.layers.Dense(1))  # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 1)
])
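If it helps, these annotated shapes can be verified directly by calling the model on a dummy batch. This is just a sketch; it assumes helpValueStrides = 1 (96 / 96, per the parameters in the question), so the Conv1D does not change the number of timesteps:
import numpy as np
dummy_batch = np.zeros((2, 96, 3), dtype=np.float32)  # (BATCH_SIZE, NUMBER_OF_TIMESTEPS, features)
print(model(dummy_batch).shape)  # expected: (2, 96, 1)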
To answer your question, the output tensor from your model contains the predicted values for 96 steps into the future, for each sample. If it's easier to conceptualize, for the case of 1 output, you can apply np.squeeze to the result of model.predict, which will make the output 2-D:
Y_pred = model.predict(X_test) # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 1)
Y_pred_squeezed = np.squeeze(Y_pred) # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS)
In that way, you have a rectangular matrix where each row corresponds to a sample in the batch, and each column i corresponds to the prediction for the timestep i.
In the loop after the prediction step, all the timestep predictions are being discarded except for the first one:
for i in range(0, len(Y_pred)):
    prediction_lastValues_list.append((Y_pred[i][0][1 - 1]))
which means the end result is just a list of predictions for the first timestep for each sample in the batch. If you wanted the prediction for the 96th timestep, you could do:
for i in range(0, len(Y_pred)):
    prediction_lastValues_list.append((Y_pred[i][-1][1 - 1]))
Notice the -1 instead of 0 for the second bracket, to ensure we grab the last predicted timestep instead of the first.
As a side note, to replicate the results I had to make one change to your code, specifically when creating series_reshaped_X and series_reshaped_Y. I hit an exception when using np.array to create the array from the list (ValueError: cannot copy sequence with size 192 to array axis with dimension 3), but looking at what you were doing (joining tensors along a new axis), I changed it to np.stack, which accomplishes the same goal (https://numpy.org/doc/stable/reference/generated/numpy.stack.html):
series_reshaped_X = np.stack([data_X[i:i + (steps_backwards + steps_forward)].copy()
                              for i in range(len(data) - (steps_backwards + steps_forward))])
series_reshaped_Y = np.stack([data_Y[i:i + (steps_backwards + steps_forward)].copy()
                              for i in range(len(data) - (steps_backwards + steps_forward))])
Update
"What are those 5 values representing when I only have 1 target feature?"
That's actually just the broadcasting feature of the TensorFlow API (which is also a feature of NumPy). If you perform an arithmetic operation on two tensors with differing shapes, it will try to make them compatible. In this case, if you change the output layer size to be 5 instead of 1 (keras.layers.Dense(5)), the output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 5) instead of (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 1), which just means the output from the convolutional layer is going into 5 neurons instead of 1. When the loss (mean squared error) is computed between the two, the size of the label tensor ((BATCH_SIZE, NUMBER_OF_TIMESTEPS, 1)) is broadcast to the size of the prediction tensor ((BATCH_SIZE, NUMBER_OF_TIMESTEPS, 5)). In this case, the broadcasting is accomplished by replicating the column. For example, if Y_train had [-1.69862224] in the first row for the first timestep, and Y_pred had [-0.6132075, -0.6621697, -0.7712653, -0.60011995, -0.48753992] in the first row for the first timestep, then to perform the subtraction operation, the entry in Y_train is converted to [-1.69862224, -1.69862224, -1.69862224, -1.69862224, -1.69862224].
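To make the broadcasting concrete, here is a minimal NumPy sketch (the numbers are the illustrative values from above):
import numpy as np
y_train_entry = np.array([[[-1.69862224]]])  # shape (1, 1, 1): one sample, one timestep, one target
y_pred_entry = np.array([[[-0.6132075, -0.6621697, -0.7712653, -0.60011995, -0.48753992]]])  # shape (1, 1, 5)
diff = y_train_entry - y_pred_entry  # the label column is replicated 5 times to match
print(diff.shape)  # (1, 1, 5)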
"And which of those 5 values is the 'correct' value to choose for the 96 time step ahead prediction?"
There is no real "correct" value when trained this way; as detailed above, this is just a feature of the API. All outputs should converge to the single target value for the timestep, since they're all being compared to that value, so you could technically train that way, but it just adds parameters and complexity to the model (and you would have to choose one output to be the "real" prediction). The correct approach for getting the prediction for 96 timesteps ahead is detailed in the original answer, but just to reiterate: the output of the model contains future timestep predictions for each sample in the batch. The output tensor can be iterated over to retrieve the predictions for each timestep, for each sample. Furthermore, ensure the number of neurons in the final dense layer matches the number of target values you are trying to predict, otherwise you'll hit the broadcasting issue (and the "correct" output will be unclear).
Just to be exhaustive (and I am not recommending this), if you really wanted to incorporate several neurons in the output despite only having one target value, you could do something like averaging the results:
for i in range(0, len(Y_pred)):
    prediction_lastValues_list.append(np.mean(Y_pred[i][0]))
But there is absolutely no benefit to this approach, so I would recommend just sticking with the previous suggestion.
Update 2
Is my model only predicting one time slot which is 96 time steps into the future or is it also predicting everything in between?
The model is predicting everything in between. So for a sample at timestep t, the outputs of the model are the predictions for [t + 1, t + 2, ..., t + NUMBER_OF_TIMESTEPS]. Per my original answer, "the output tensor from your model contains the predicted values for 96 steps into the future, for each sample". To get at that in your evaluation code, you can do something like:
Y_pred = np.squeeze(Y_pred)
predictions_for_all_samples_and_timesteps = Y_pred.tolist()
This results in a list of length BATCH_SIZE, and each element in the list is a list of length NUMBER_OF_TIMESTEPS (to be clear, predictions_for_all_samples_and_timesteps is a list of lists). The element at index i in predictions_for_all_samples_and_timesteps contains the predictions for each timestep from 1-96 for the i^th sample (row) in X_test.
As a side note, you could omit np.squeeze, but then you will have a list of lists of lists, where each element in the inner list is a list of one item (instead of [[1, 2, 3, ...], ], the output would look like [[[1], [2], [3], ...], ]).
Update 3
Y_test and Y_pred are both 3-D numpy arrays of size (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 1). To compare them, you can take the absolute (or squared) difference between the two:
abs_diff = np.abs(Y_pred - Y_test)
This results in an array of the same dimensions; after squeezing out the trailing axis of length 1, its shape is (BATCH_SIZE, NUMBER_OF_TIMESTEPS). You can then iterate over the rows and generate a plot of the timestep error for each row.
import matplotlib.pyplot as plt
abs_diff = np.squeeze(abs_diff)  # drop the trailing axis of length 1
for diff in abs_diff:
    print(diff.shape)
    plt.plot(list(range(len(diff))), diff)
It may get a bit unwieldy with a large batch size (as you can see in the image), so maybe you plot a subset of the rows. You can also transform the absolute difference to an error percentage if you would prefer to plot that:
percentage_diff = abs_diff / Y_test
which would be the absolute difference over the actual value, as I see you were originally doing in Pandas. This numpy array will have the same dimensions, so you can iterate over it and generate plots in the same fashion.
For future inquiries, instead of posting the comments, please open a new question and just provide the link - I would be happy to continue helping, but I would like to continue gaining reputation from it.
I disagree with @danielcahall on just one point:
The output tensor from your model contains the predicted values for 96 steps into the future, for each sample
The output does contain 96 time steps, one for each input time step, and you can take an output to mean whatever you want. But this is just not a good model for what you're trying to do. The main reason is that the RNNs you're using are single direction.
x x x x x x # input
| | | | | |
x-->x-->x-->x-->x-->x # SimpleRNN
| | | | | |
x-->x-->x-->x-->x-->x # SimpleRNN
| /|\ /|\ /|\ /|\ |
| / | \ | \ | \ | \ |
x x x x x x # Conv
| | | | | |
x x x x x x # Dense -> output
So the first time index of the output only sees the first 2 input times (thanks to the Conv), it can't see the later times. The first prediction is based only on old data. It's only the last few outputs that can see all the inputs.
use 96 backwards steps to predict 96 steps into the future
Most of the outputs just can't see all the data.
This model would be appropriate if you were trying to predict 1 step into the future from each of the input times.
To predict 96 steps into the future, it would be much more reasonable to drop the return_sequences=True from the second RNN layer and drop the Conv layer, then expand the Dense layer to make the prediction:
model = keras.models.Sequential([
    keras.layers.SimpleRNN(10, return_sequences=True, input_shape=[None, 3]),  # output size is (BATCH_SIZE, NUMBER_OF_TIMESTEPS, 10)
    keras.layers.SimpleRNN(10),  # output size is (BATCH_SIZE, 10)
    keras.layers.Dense(96)  # output size is (BATCH_SIZE, 96)
])
That way all 96 predictions see all 96 inputs.
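One caveat, offered as a hedged sketch rather than part of the recommendation above: if you keep the targets in their original (BATCH_SIZE, 96, 1) shape from the question, the (BATCH_SIZE, 96) output of Dense(96) can silently broadcast against them in the MSE loss. Either squeeze the trailing axis out of Y_train/Y_valid, or add a Reshape layer so the output matches the targets exactly:
model = keras.models.Sequential([
    keras.layers.SimpleRNN(10, return_sequences=True, input_shape=[None, 3]),
    keras.layers.SimpleRNN(10),
    keras.layers.Dense(96),
    keras.layers.Reshape([96, 1])  # output size is (BATCH_SIZE, 96, 1), matching Y_train
])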
See https://www.tensorflow.org/tutorials/structured_data/time_series for more details.
Also, SimpleRNN is terrible. Never use it over more than a couple of steps.
Can I use the convolutional layers of Keras without GPU support? I am getting errors when I use them on Colab with the runtime set to None.
My code looks like this:
model = tf.keras.Sequential()
model.add(layers.Conv1D(1, 5, name='conv1', padding="same", activation='relu', data_format="channels_first", input_shape=(1, 2048)))
# model.add(layers.LSTM(5, activation='tanh'))
model.add(layers.Flatten())
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(num_classes, activation='softmax'))
# model.summary()
model.compile(loss=tf.keras.losses.categorical_crossentropy,
              optimizer=tf.keras.optimizers.SGD(lr=0.001, momentum=0.9),
              metrics=['accuracy'])
x_train = train_value
y_train = train_label
x_test = test_value
y_test = test_label
print(np.shape(x_train))  # shape of x_train is (4459, 1, 2048)
print(np.shape(x_test))   # shape of x_test is (1340, 1, 2048)
history = model.fit(x_train, y_train,
                    batch_size=100,
                    epochs=30,
                    verbose=1,
                    validation_data=(x_test, y_test))
It runs fine on GPU but gives the following errors on CPU:
InvalidArgumentError: Conv2DCustomBackpropFilterOp only supports NHWC.
  [[{{node training/SGD/gradients/gradients/conv1/conv1d_grad/Conv2DBackpropFilter}}]]
UnimplementedError: The Conv2D op currently only supports the NHWC tensor format on the CPU. The op was given the format: NCHW
  [[{{node conv1_1/conv1d}}]]
I have figured out that the problem is with the format of the input data. My input data are vectors of size (1, 2048). Can you please guide me on how to convert these vectors to NHWC format?
I would really appreciate it, if someone can clear this up for me.
Thanks in advance.
Per the Keras documentation:
data_format: A string, one of "channels_last" (default) or "channels_first". The ordering of the dimensions in the inputs. "channels_last" corresponds to inputs with shape (batch, steps, channels) (default format for temporal data in Keras) while "channels_first" corresponds to inputs with shape (batch, channels, steps)
Now Keras in TensorFlow appears to implement Conv1D in terms of a Conv2D operator - basically forming an "image" with 1 row, W columns, and then your C "channels". That's why you're getting error messages about image shapes when you don't have image data.
In the docs above, "channels" are the number of data items per time step (e.g., perhaps you have 5 sensor readings at each time step, so you'd have 5 channels). From the shapes printed above, I believe you're passing tensors with shape (n, 1, 2048), where n is your batch size. So, with channels_last, TensorFlow thinks you have n examples in your batch, each with a sequence length of 1 and 2048 data items per time step. That is only a single time step with 2048 data items per observation (e.g., 2048 sensor readings taken at each time step), in which case the convolution won't really be doing a convolution - it'd be equivalent to a single dense layer taking all 2048 numbers as input.
I think in reality you have only a single data item per time step and 2048 time steps. That explains why passing channels_first improves your accuracy - now TensorFlow understands that your data represents 1 data item sampled 2048 times, and it can do a convolution over that data.
To fix it, you can just tf.reshape(t, (1, 2048, 1)) and remove the channels_first argument (that code assumes you're doing batches of size 1 and your tensor is named t). Now the data is in the format (n, s, 1), where n is the batch size (1 here), s is the number of time steps (2048), and 1 indicates one data point per time step. You can now run the same model on the GPU or CPU.
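Applied to the full arrays rather than a single tensor, a minimal sketch of the same fix (the shapes (4459, 1, 2048) and (1340, 1, 2048) come from the question's print statements; the zero arrays are stand-ins for train_value/test_value):
import numpy as np
x_train = np.zeros((4459, 1, 2048))  # stand-in for train_value
x_test = np.zeros((1340, 1, 2048))   # stand-in for test_value
x_train = np.reshape(x_train, (-1, 2048, 1))  # (4459, 2048, 1): batch, time steps, channels
x_test = np.reshape(x_test, (-1, 2048, 1))    # (1340, 2048, 1)
The Conv1D layer then takes input_shape=(2048, 1) with the default channels_last, and the same model runs on CPU or GPU.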
I am training a model whose feature shape is [3751, 4], and I'd like to use the reshape and dense layer functions built into TensorFlow to make the output labels have the shape [1, 6].
The training and testing sets are very similar; the only difference is that the testing set has fewer batches than the training set.
Now I have two hidden layers in my model that do something like:
input_layer = tf.reshape(features["x"], [1,-1])
first_hidden_layer = tf.layers.dense(input_layer, 4, activation=tf.nn.relu)
second_hidden_layer = tf.layers.dense(first_hidden_layer, 5, activation=tf.nn.relu)
output_layer = tf.layers.dense(second_hidden_layer, 6, activation=tf.nn.relu)
This network structure is a function that both the training and evaluation phases use.
Partial code for training looks like:
nn = tf.estimator.Estimator(model_fn=model_fn, params=model_params, model_dir='/tmp/nmos_self_define')
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": train_features_numpy},
    y=train_labels_numpy,
    batch_size=3751,
    num_epochs=None,
    shuffle=False)
# Train
nn.train(input_fn=train_input_fn, max_steps=5000)
And testing part is like:
test_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": test_features_numpy},
    y=test_labels_numpy,
    batch_size=3751,
    num_epochs=1,
    shuffle=False)
ev = nn.evaluate(input_fn=test_input_fn)
print("Loss: %s" % ev["loss"])
print("Root Mean Squared Error: %s" % ev["rmse"])
During training there is no problem; the function can reshape the input data and do the dense part. During testing, however, the tensor shape after the reshape comes out as something like [1, ?], which is different from the training phase ([1, 15004]). This causes the tf.layers.dense functions to fail, because they cannot build the layer without knowing the actual shape of the tensor.
The only difference between training and testing, from my perspective, is num_epochs, but that shouldn't affect the input shape, right? I don't understand why TensorFlow can reshape the tensor to solid values during training while it treats the testing input as dynamic.
Please help and thanks for taking the time reading my question.
What you are doing is flattening the input of multiple batches into a single feature vector of size 15004. What you most probably want to accomplish is to reduce your features to a 2-D tensor with shape (Batches, Nr Features), where Batches is dynamic. There are two common ways to do this. The easiest is to use the flatten layer from tf.contrib, like this:
input_layer = tf.contrib.layers.flatten(features["x"])
or you can reshape in such a way that the batch dimension stays dynamic, but then you have to calculate the number of features of your input, like this:
num_dimensions = features["x"].shape.as_list()[1] * features["x"].shape.as_list()[2] ...
input_layer = tf.reshape(features["x"], [-1, num_dimensions])
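Here is a minimal self-contained sketch of that second approach (TF1-style graph code, to match the question's tf.layers/tf.estimator usage; the per-example shape (3751, 4) is an assumption based on the question's data):
import numpy as np
import tensorflow as tf
x = tf.placeholder(tf.float32, shape=[None, 3751, 4])  # batch dimension stays dynamic
num_dimensions = int(np.prod(x.shape.as_list()[1:]))   # 3751 * 4 = 15004
input_layer = tf.reshape(x, [-1, num_dimensions])      # shape (?, 15004)
print(input_layer.shape)
Either way, the dense layers then see a fixed feature size regardless of how many examples arrive at evaluation time.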