Why Bother With Recurrent Neural Networks For Structured Data?

I have been developing feedforward neural networks (FNNs) and recurrent neural networks (RNNs) in Keras with structured data of the shape [instances, time, features], and the performance of FNNs and RNNs has been the same (except that RNNs require more computation time).
I have also simulated tabular data (code below) where I expected an RNN to outperform an FNN because the next value in the series is dependent on the previous value in the series; however, both architectures predict correctly.
With NLP data, I have seen RNNs outperform FNNs, but not with tabular data. Generally, when would one expect an RNN to outperform an FNN with tabular data? Specifically, could someone post simulation code with tabular data demonstrating an RNN outperforming an FNN?
Thank you! If my simulation code is not ideal for my question, please adapt it or share a more ideal one!
from keras import models
from keras import layers
from keras.layers import Dense, LSTM
import numpy as np
import matplotlib.pyplot as plt
Two features were simulated over 10 time steps, where the value of the second feature is dependent on the value of both features in the prior time step.
## Simulate data.
np.random.seed(20180825)
X = np.random.randint(50, 70, size = (11000, 1)) / 100
X = np.concatenate((X, X), axis = 1)
for i in range(10):
    X_next = np.random.randint(50, 70, size = (11000, 1)) / 100
    X = np.concatenate((X, X_next, (0.50 * X[:, -1].reshape(len(X), 1))
                        + (0.50 * X[:, -2].reshape(len(X), 1))), axis = 1)
print(X.shape)
## Training and validation data.
split = 10000
Y_train = X[:split, -1:].reshape(split, 1)
Y_valid = X[split:, -1:].reshape(len(X) - split, 1)
X_train = X[:split, :-2]
X_valid = X[split:, :-2]
print(X_train.shape)
print(Y_train.shape)
print(X_valid.shape)
print(Y_valid.shape)
FNN:
## FNN model.
# Define model.
network_fnn = models.Sequential()
network_fnn.add(layers.Dense(64, activation = 'relu', input_shape = (X_train.shape[1],)))
network_fnn.add(Dense(1, activation = None))
# Compile model.
network_fnn.compile(optimizer = 'adam', loss = 'mean_squared_error')
# Fit model.
history_fnn = network_fnn.fit(X_train, Y_train, epochs = 10, batch_size = 32, verbose = False,
validation_data = (X_valid, Y_valid))
plt.scatter(Y_train, network_fnn.predict(X_train), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()
plt.scatter(Y_valid, network_fnn.predict(X_valid), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()
LSTM:
## LSTM model.
X_lstm_train = X_train.reshape(X_train.shape[0], X_train.shape[1] // 2, 2)
X_lstm_valid = X_valid.reshape(X_valid.shape[0], X_valid.shape[1] // 2, 2)
# Define model.
network_lstm = models.Sequential()
network_lstm.add(layers.LSTM(64, activation = 'relu', input_shape = (X_lstm_train.shape[1], 2)))
network_lstm.add(layers.Dense(1, activation = None))
# Compile model.
network_lstm.compile(optimizer = 'adam', loss = 'mean_squared_error')
# Fit model.
history_lstm = network_lstm.fit(X_lstm_train, Y_train, epochs = 10, batch_size = 32, verbose = False,
validation_data = (X_lstm_valid, Y_valid))
plt.scatter(Y_train, network_lstm.predict(X_lstm_train), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()
plt.scatter(Y_valid, network_lstm.predict(X_lstm_valid), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()

In practice, even in NLP, you see that RNNs and CNNs are often competitive. Here's a 2017 review paper that shows this in more detail. In theory RNNs might handle the full complexity and sequential nature of language better, but in practice the bigger obstacle is usually training the network properly, and RNNs are finicky.
Another problem that might show a difference is the balanced-parentheses problem (either with just parentheses in the strings, or parentheses along with other distractor characters). This requires processing the inputs sequentially and tracking some state, and might be easier to learn with an LSTM than with an FFN.
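For instance, a rough sketch of how one might generate such data (my own construction, not something I have benchmarked; in practice you would also balance the classes):
import numpy as np

def make_sample(length=20):
    seq = np.random.choice([1, -1], size=length)   # 1 = '(', -1 = ')'
    depth = np.cumsum(seq)
    balanced = int(np.all(depth >= 0) and depth[-1] == 0)
    return seq, balanced

samples = [make_sample() for _ in range(10000)]
X = np.array([s for s, _ in samples]).reshape(-1, 20, 1)   # [instances, time, features] for the LSTM
y = np.array([label for _, label in samples])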
Update:
Some data that looks sequential might not actually have to be treated sequentially. For example, even if you provide a sequence of numbers to add, an FFN will do just as well as an RNN, since addition is commutative. This could also be true of many health problems where the dominating information is not of a sequential nature. Suppose a patient's smoking habits are measured every year. From a behavioral standpoint the trajectory is important, but if you're predicting whether the patient will develop lung cancer, the prediction will be dominated by just the number of years the patient smoked (maybe restricted to the last 10 years for the FFN).
So you want to make the toy problem more complex and require taking the ordering of the data into account. Maybe some kind of simulated time series where you want to predict whether there was a spike in the data, but you don't care about absolute values, just about the relative nature of the spike.
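Something along those lines might look like this (just a sketch of the simulation, with made-up parameters; the label is whether a spike was injected relative to the series' own level, so absolute values carry no signal):
import numpy as np

n, steps = 10000, 20
baseline = np.random.uniform(1, 100, size=(n, 1))                 # per-series level
X = baseline * np.random.uniform(0.8, 1.2, size=(n, steps))       # noisy, roughly flat series
has_spike = np.random.rand(n) < 0.5                               # half the series get a spike
spike_cols = np.random.randint(1, steps, size=n)
X[has_spike, spike_cols[has_spike]] *= 5.0                        # spike relative to the local level
y = has_spike.astype(int)
X_lstm = X.reshape(n, steps, 1)                                   # [instances, time, features]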
Update 2:
I modified your code to show a case where RNNs perform better. The trick was to use more complex conditional logic that is more naturally modeled in LSTMs than in FFNs. The code is below. For 8 columns we see that the FFN trains in 1 minute and reaches a validation loss of 6.3. The LSTM takes 3x longer to train, but its final validation loss is 6x lower, at 1.06.
As we increase the number of columns the LSTM has a larger and larger advantage, especially if we add more complicated conditions. For 16 columns the FFN's validation loss is 19 (and you can more clearly see the training curve, as the model isn't able to instantly fit the data). In comparison, the LSTM takes 11 times longer to train but has a validation loss of 0.31, 30 times smaller than the FFN's! You can play around with even larger matrices to see how far this trend extends.
from keras import models
from keras import layers
from keras.layers import Dense, LSTM
import numpy as np
import matplotlib
matplotlib.use('Agg')  # select a non-interactive backend before importing pyplot
import matplotlib.pyplot as plt
import time
np.random.seed(20180908)
rows = 20500
cols = 10
# Randomly generate Z
Z = 100*np.random.uniform(0.05, 1.0, size = (rows, cols))
larger = np.max(Z[:, :cols//2], axis=1).reshape((rows, 1))
larger2 = np.max(Z[:, cols//2:], axis=1).reshape((rows, 1))
smaller = np.min((larger, larger2), axis=0)
# Z is now the max of the first half of the array.
Z = np.append(Z, larger, axis=1)
# Z is now the min of the max of each half of the array.
# Z = np.append(Z, smaller, axis=1)
# Combine and shuffle.
#Z = np.concatenate((Z_sum, Z_avg), axis = 0)
np.random.shuffle(Z)
## Training and validation data.
split = 10000
X_train = Z[:split, :-1]
X_valid = Z[split:, :-1]
Y_train = Z[:split, -1:].reshape(split, 1)
Y_valid = Z[split:, -1:].reshape(rows - split, 1)
print(X_train.shape)
print(Y_train.shape)
print(X_valid.shape)
print(Y_valid.shape)
print("Now setting up the FNN")
## FNN model.
tick = time.time()
# Define model.
network_fnn = models.Sequential()
network_fnn.add(layers.Dense(32, activation = 'relu', input_shape = (X_train.shape[1],)))
network_fnn.add(Dense(1, activation = None))
# Compile model.
network_fnn.compile(optimizer = 'adam', loss = 'mean_squared_error')
# Fit model.
history_fnn = network_fnn.fit(X_train, Y_train, epochs = 500, batch_size = 128, verbose = False,
validation_data = (X_valid, Y_valid))
tock = time.time()
print()
print(str('%.2f' % ((tock - tick) / 60)) + ' minutes.')
print("Now evaluating the FNN")
loss_fnn = history_fnn.history['loss']
val_loss_fnn = history_fnn.history['val_loss']
epochs_fnn = range(1, len(loss_fnn) + 1)
print("train loss: ", loss_fnn[-1])
print("validation loss: ", val_loss_fnn[-1])
plt.plot(epochs_fnn, loss_fnn, 'black', label = 'Training Loss')
plt.plot(epochs_fnn, val_loss_fnn, 'red', label = 'Validation Loss')
plt.title('FNN: Training and Validation Loss')
plt.legend()
plt.show()
plt.scatter(Y_train, network_fnn.predict(X_train), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title('training points')
plt.show()
plt.scatter(Y_valid, network_fnn.predict(X_valid), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title('valid points')
plt.show()
print("LSTM")
## LSTM model.
X_lstm_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_lstm_valid = X_valid.reshape(X_valid.shape[0], X_valid.shape[1], 1)
tick = time.time()
# Define model.
network_lstm = models.Sequential()
network_lstm.add(layers.LSTM(32, activation = 'relu', input_shape = (X_lstm_train.shape[1], 1)))
network_lstm.add(layers.Dense(1, activation = None))
# Compile model.
network_lstm.compile(optimizer = 'adam', loss = 'mean_squared_error')
# Fit model.
history_lstm = network_lstm.fit(X_lstm_train, Y_train, epochs = 500, batch_size = 128, verbose = False,
validation_data = (X_lstm_valid, Y_valid))
tock = time.time()
print()
print(str('%.2f' % ((tock - tick) / 60)) + ' minutes.')
print("now eval")
loss_lstm = history_lstm.history['loss']
val_loss_lstm = history_lstm.history['val_loss']
epochs_lstm = range(1, len(loss_lstm) + 1)
print("train loss: ", loss_lstm[-1])
print("validation loss: ", val_loss_lstm[-1])
plt.plot(epochs_lstm, loss_lstm, 'black', label = 'Training Loss')
plt.plot(epochs_lstm, val_loss_lstm, 'red', label = 'Validation Loss')
plt.title('LSTM: Training and Validation Loss')
plt.legend()
plt.show()
plt.scatter(Y_train, network_lstm.predict(X_lstm_train), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title('training')
plt.show()
plt.scatter(Y_valid, network_lstm.predict(X_lstm_valid), alpha = 0.1)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.title("validation")
plt.show()


How to replace loss function during training tensorflow.keras

I want to replace the loss function related to my neural network during training, this is the network:
model = tensorflow.keras.models.Sequential()
model.add(tensorflow.keras.layers.Conv2D(32, kernel_size=(3, 3), activation="relu", input_shape=input_shape))
model.add(tensorflow.keras.layers.Conv2D(64, (3, 3), activation="relu"))
model.add(tensorflow.keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(tensorflow.keras.layers.Dropout(0.25))
model.add(tensorflow.keras.layers.Flatten())
model.add(tensorflow.keras.layers.Dense(128, activation="relu"))
model.add(tensorflow.keras.layers.Dropout(0.5))
model.add(tensorflow.keras.layers.Dense(output_classes, activation="softmax"))
model.compile(loss=tensorflow.keras.losses.categorical_crossentropy, optimizer=tensorflow.keras.optimizers.Adam(0.001), metrics=['accuracy'])
history = model.fit(x_train, y_train, batch_size=128, epochs=5, validation_data=(x_test, y_test))
so now I want to replace tensorflow.keras.losses.categorical_crossentropy with a different loss, so I did this:
model.compile(loss=tensorflow.keras.losses.mse, optimizer=tensorflow.keras.optimizers.Adam(0.001), metrics=['accuracy'])
history = model.fit(x_improve, y_improve, epochs=1, validation_data=(x_test, y_test)) #FIXME bug during training
but I have this error:
ValueError: No gradients provided for any variable: ['conv2d/kernel:0', 'conv2d/bias:0', 'conv2d_1/kernel:0', 'conv2d_1/bias:0', 'dense/kernel:0', 'dense/bias:0', 'dense_1/kernel:0', 'dense_1/bias:0'].
Why? How can I fix it? Is there another way to change the loss function?
Thanks
I'm currently working on Google Colab with TensorFlow and Keras, and I was not able to recompile a model while maintaining the weights; every time I recompile a model like this:
with strategy.scope():
    model = hd_unet_model(INPUT_SIZE)
    model.compile(optimizer=Adam(lr=0.01),
                  loss=tf.keras.losses.MeanSquaredError(),
                  metrics=[tf.keras.metrics.MeanSquaredError()])
the weights get reset.
So I found another solution; all you need to do is:
Get the model with the weights you want (load it or something else).
Get the weights of the model like this:
weights = model.get_weights()
Recompile the model (to change the loss function).
Set the weights of the recompiled model again like this:
model.set_weights(weights)
Launch the training.
I tested this method and it seems to work.
So to change the loss mid-training you can (a minimal sketch follows the steps):
Compile with the first loss.
Train on the first loss.
Save the weights.
Recompile with the second loss.
Load the weights.
Train on the second loss.
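Here is a rough Keras sketch of those steps, keeping the weights in memory with get_weights/set_weights as described above (toy model and data; the particular losses are placeholders):
import numpy as np
from tensorflow import keras

X = np.random.uniform(size=(256, 4))
y = np.random.uniform(size=(256, 1))

model = keras.Sequential([
    keras.layers.Dense(8, activation='relu', input_shape=(4,)),
    keras.layers.Dense(1),
])

# 1-2. Compile with the first loss and train on it.
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(X, y, epochs=2, verbose=0)

# 3. Keep the learned weights.
weights = model.get_weights()

# 4-5. Recompile with the second loss and restore the weights.
model.compile(optimizer='adam', loss='mean_squared_logarithmic_error')
model.set_weights(weights)

# 6. Continue training on the second loss.
model.fit(X, y, epochs=2, verbose=0)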
So, a straightforward answer I would give is: switch to PyTorch if you want to play this kind of game. Since in PyTorch you write your own training and evaluation loops, switching from one loss function to another takes just an if statement.
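For what it's worth, a rough PyTorch sketch of that idea (the model, data, and the particular losses are illustrative placeholders of my own, not taken from the question):
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(13, 6), nn.ReLU(), nn.Linear(6, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
X = torch.rand(64, 13)
y = torch.rand(64, 1)

for epoch in range(4):
    # Switch the loss mid-training with a plain if statement.
    criterion = nn.MSELoss() if epoch < 2 else nn.L1Loss()
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()
    optimizer.step()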
Also, I see in your code that you want to switch from cross-entropy to mean squared error; the former is suitable for classification and the latter for regression, so this is not really something you can do. In the code that follows I switch from mean squared error to mean squared logarithmic error, which are both losses suitable for regression.
Although other answers offer solutions to your question (see change-loss-function-dynamically-during-training), it is not clear whether you can trust the results. Some people found that even with a customised function Keras sometimes keeps training with the first loss.
Solution:
My solution is based on train_on_batch, which allows us to train a model in a for loop and therefore stop training whenever we prefer, in order to recompile the model with a new loss function. Please note that recompiling the model does not reset the weights (see: Does recompiling a model re-initialize the weights?).
The dataset can be found here: Boston housing dataset.
# Regression Example With Boston Dataset: Standardized and Larger
from pandas import read_csv
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import train_test_split
from keras.losses import mean_squared_error, mean_squared_logarithmic_error
from matplotlib import pyplot
import matplotlib.pyplot as plt
# load dataset
dataframe = read_csv("housing.csv", delim_whitespace=True, header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:13]
y = dataset[:,13]
trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.33, random_state=42)
# create model
model = Sequential()
model.add(Dense(13, input_dim=13, kernel_initializer='normal', activation='relu'))
model.add(Dense(6, kernel_initializer='normal', activation='relu'))
model.add(Dense(1, kernel_initializer='normal'))
batch_size = 25
# have to define manually a dict to store all epochs scores
history = {}
history['history'] = {}
history['history']['loss'] = []
history['history']['mean_squared_error'] = []
history['history']['mean_squared_logarithmic_error'] = []
history['history']['val_loss'] = []
history['history']['val_mean_squared_error'] = []
history['history']['val_mean_squared_logarithmic_error'] = []
# first compiling with mse
model.compile(loss='mean_squared_error', optimizer='adam', metrics=[mean_squared_error, mean_squared_logarithmic_error])
# define number of iterations in training and test
train_iter = round(trainX.shape[0]/batch_size)
test_iter = round(testX.shape[0]/batch_size)
for epoch in range(2):
    # train iterations
    loss, mse, msle = 0, 0, 0
    for i in range(train_iter):
        start = i*batch_size
        end = i*batch_size + batch_size
        batchX = trainX[start:end,]
        batchy = trainy[start:end,]
        loss_, mse_, msle_ = model.train_on_batch(batchX,batchy)
        loss += loss_
        mse += mse_
        msle += msle_
    history['history']['loss'].append(loss/train_iter)
    history['history']['mean_squared_error'].append(mse/train_iter)
    history['history']['mean_squared_logarithmic_error'].append(msle/train_iter)
    # test iterations
    val_loss, val_mse, val_msle = 0, 0, 0
    for i in range(test_iter):
        start = i*batch_size
        end = i*batch_size + batch_size
        batchX = testX[start:end,]
        batchy = testy[start:end,]
        val_loss_, val_mse_, val_msle_ = model.test_on_batch(batchX,batchy)
        val_loss += val_loss_
        val_mse += val_mse_
        val_msle += val_msle_
    history['history']['val_loss'].append(val_loss/test_iter)
    history['history']['val_mean_squared_error'].append(val_mse/test_iter)
    history['history']['val_mean_squared_logarithmic_error'].append(val_msle/test_iter)
# recompiling the model with new loss
model.compile(loss='mean_squared_logarithmic_error', optimizer='adam', metrics=[mean_squared_error, mean_squared_logarithmic_error])
for epoch in range(2):
    # train iterations
    loss, mse, msle = 0, 0, 0
    for i in range(train_iter):
        start = i*batch_size
        end = i*batch_size + batch_size
        batchX = trainX[start:end,]
        batchy = trainy[start:end,]
        loss_, mse_, msle_ = model.train_on_batch(batchX,batchy)
        loss += loss_
        mse += mse_
        msle += msle_
    history['history']['loss'].append(loss/train_iter)
    history['history']['mean_squared_error'].append(mse/train_iter)
    history['history']['mean_squared_logarithmic_error'].append(msle/train_iter)
    # test iterations
    val_loss, val_mse, val_msle = 0, 0, 0
    for i in range(test_iter):
        start = i*batch_size
        end = i*batch_size + batch_size
        batchX = testX[start:end,]
        batchy = testy[start:end,]
        val_loss_, val_mse_, val_msle_ = model.test_on_batch(batchX,batchy)
        val_loss += val_loss_
        val_mse += val_mse_
        val_msle += val_msle_
    history['history']['val_loss'].append(val_loss/test_iter)
    history['history']['val_mean_squared_error'].append(val_mse/test_iter)
    history['history']['val_mean_squared_logarithmic_error'].append(val_msle/test_iter)
# Some plots to check what is going on
# loss function
pyplot.subplot(311)
pyplot.title('Loss')
pyplot.plot(history['history']['loss'], label='train')
pyplot.plot(history['history']['val_loss'], label='test')
pyplot.legend()
# Only mean squared error
pyplot.subplot(312)
pyplot.title('Mean Squared Error')
pyplot.plot(history['history']['mean_squared_error'], label='train')
pyplot.plot(history['history']['val_mean_squared_error'], label='test')
pyplot.legend()
# Only mean squared logarithmic error
pyplot.subplot(313)
pyplot.title('Mean Squared Logarithmic Error')
pyplot.plot(history['history']['mean_squared_logarithmic_error'], label='train')
pyplot.plot(history['history']['val_mean_squared_logarithmic_error'], label='test')
pyplot.legend()
plt.tight_layout()
pyplot.show()
The resulting plot confirms that the loss function changes after the second epoch:
The drop in the loss function is due to the fact that the model switches from the normal mean squared error to the logarithmic one, which has much lower values. Printing the scores also proves that the loss being used truly changed:
print(history['history']['loss'])
[599.5209197998047, 570.4041115897043, 3.8622902120862688, 2.1578191178185597]
print(history['history']['mean_squared_error'])
[599.5209197998047, 570.4041115897043, 510.29034205845426, 425.32058388846264]
print(history['history']['mean_squared_logarithmic_error'])
[8.624503476279122, 6.346359729766846, 3.8622902120862688, 2.1578191178185597]
In the first two epochs the values of loss are equal to the ones of mean_squared_error, and during the third and fourth epochs the values become equal to the ones of mean_squared_logarithmic_error, which is the new loss that was set. So it seems that using train_on_batch allows changing the loss function; nevertheless, I want to stress again that this is basically what one would do in PyTorch to achieve the same result, with the difference that the behaviour of PyTorch (in this scenario, and in my opinion) is more reliable.
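As a side note on the magnitude gap between the two losses, here is a tiny standalone illustration of how much smaller MSLE is than MSE for the same errors (the numbers are made up, not taken from the Boston run):
import numpy as np

y_true = np.array([20.0, 25.0, 30.0])
y_pred = np.array([10.0, 40.0, 50.0])

mse = np.mean((y_true - y_pred) ** 2)
msle = np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)
print(mse)   # ~241.7
print(msle)  # ~0.29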

TF 2.0 MLP accuracy always zero

I've written a minimal example of a simple neural network that fits a given function (a multilayer perceptron for regression).
During the training process the loss decreases as expected and the model works fine. However, the accuracy remains constant and equal to 0.0 at all times, and I don't understand why. What am I missing here?
I guess there is some technical detail that prevents the accuracy from updating?
The training process and the resulting model can be seen in this link
Thank you very much for any help you can provide! ;)
PS- Here is a minimal example to reproduce this result:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import gridspec
# Create TRAINING data
noise = 0.1
N=500
Xt = np.random.uniform(-np.pi, np.pi, size=(N,))
Yt = np.sin(Xt) + noise * np.random.uniform(-1,1,size=Xt.shape)
# Create VALIDATION data
Nv = int(0.1*N)
Xv = np.random.uniform(-np.pi, np.pi, size=(Nv,))
Yv = np.sin(Xv) + noise * np.random.uniform(-1,1,size=Xv.shape)
# Create model
model = Sequential()
model.add( Dense(10, activation='tanh',input_shape=(1,)) )
model.add( Dense(5, activation='tanh') )
model.add( Dense(1, activation=None) )
model.compile(optimizer='adam',
loss='mse',
metrics=['accuracy'])
# Fit & evaluate
history = model.fit(Xt, Yt, validation_data=(Xv,Yv),
epochs=100,
verbose=2)
results = model.evaluate(Xv, Yv,verbose=0)
print('\n\nEvaluating model, loss/acc:', results)
## PLOTS
fig = plt.figure()
gs = gridspec.GridSpec(2, 2)
ax1 = plt.subplot(gs[0,0]) # losses
ax2 = plt.subplot(gs[1,0], sharex=ax1) # accuracies
ax3 = plt.subplot(gs[:,1]) # data & model
# Plot learning curve
err = history.history['loss']
val_err = history.history['val_loss']
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
ax1.plot(err,label='loss')
ax1.plot(val_err,label='val_loss')
ax2.plot(acc,label='accuracy')
ax2.plot(val_acc,label='val_accuracy')
ax1.set_ylim(bottom=0)
ax2.set_ylim(bottom=-0.01)
ax1.legend()
ax2.legend()
# Plot test
# Generate "continous" data for pretty test
x = np.linspace(np.min(Xt),np.max(Xt),1000)
y = model.predict(x)
ax3.scatter(Xt, Yt, label='Training')
ax3.scatter(Xv, Yv, c='C2', label='Validation')
ax3.plot(x, y, 'C3-', lw=4, label='Model')
ax3.legend()
fig.tight_layout()
plt.show()
As Swier pointed out in the comments, accuracy is meant for classification.
Nevertheless, I thought that some points should yield the exact target value, which is why I was expecting acc > 0.
Anyway, I mapped the problem to an integer-only problem, and in that scenario the accuracy is different from zero. Obviously not a useful metric, but at least it makes (mathematical) sense.
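For reference, a rough sketch of the kind of integer-only remapping I mean (my own reconstruction, not the exact code I used): with integer targets and rounded predictions, exact matches become possible, so accuracy is no longer identically zero.
import numpy as np

Xt = np.random.uniform(-np.pi, np.pi, size=(500,))
Yt = np.round(5 * np.sin(Xt))                                   # integer targets in {-5, ..., 5}
Yp = np.round(5 * np.sin(Xt) + 0.1 * np.random.uniform(-1, 1, size=Xt.shape))
accuracy = np.mean(Yp == Yt)                                    # fraction of exact matches
print(accuracy)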
Thanks!!

Concatenating a time-series neural net with a feedforward neural net

Consider the following example problem:
# dummy data for a SO question
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
from keras.models import Model
from keras.layers import Input, Conv1D, Dense
from keras.optimizers import Adam, SGD
time = np.array(range(100))
brk = np.array((time>40) & (time < 60)).reshape(100,1)
B = np.array([5, -5]).reshape(1,2)
np.dot(brk, B)
y = np.c_[np.sin(time), np.sin(time)] + np.random.normal(scale = .2, size=(100,2))+ np.dot(brk, B)
plt.clf()
plt.plot(time, y[:,0])
plt.plot(time, y[:,1])
You've got N time series, and they've got one component that follows a common process, and another component that is idiosyncratic to the series itself. Assume for simplicity that you know a priori that the bump is between 40 and 60, and you want to model it simultaneously with the sinusoidal component.
A TCN does a good job on the common component, but it can't get the series-idiosyncratic component:
# time series model
n_filters = 10
filter_width = 3
dilation_rates = [2**i for i in range(7)]
inp = Input(shape=(None, 1))
x = inp
for dilation_rate in dilation_rates:
    x = Conv1D(filters=n_filters,
               kernel_size=filter_width,
               padding='causal',
               activation = "relu",
               dilation_rate=dilation_rate)(x)
x = Dense(1)(x)
model = Model(inputs = inp, outputs = x)
model.compile(optimizer = Adam(), loss='mean_squared_error')
model.summary()
X_train = np.transpose(np.c_[time, time]).reshape(2,100,1)
y_train = np.transpose(y).reshape(2,100,1)
history = model.fit(X_train, y_train,
batch_size=2,
epochs=1000,
verbose = 0)
yhat = model.predict(X_train)
plt.clf()
plt.plot(time, y[:,0])
plt.plot(time, y[:,1])
plt.plot(time, yhat[0,:,:])
plt.plot(time, yhat[1,:,:])
On the other hand, a basic linear regression with N outputs (here implemented in Keras) is perfect for the idiosyncratic component:
inp1 = Input((1,))
x1 = inp1
x1 = Dense(2)(x1)
model1 = Model(inputs = inp1, outputs = x1)
model1.compile(optimizer = Adam(), loss='mean_squared_error')
model1.summary()
brk_train = brk
y_train = y
history = model1.fit(brk_train, y_train,
batch_size=100,
epochs=6000, verbose = 0)
yhat1 = model1.predict(brk_train)
plt.clf()
plt.plot(time, y[:,0])
plt.plot(time, y[:,1])
plt.plot(time, yhat1[:,0])
plt.plot(time, yhat1[:,1])
I want to use keras to jointly estimate the time series component and the idiosyncratic component. The major problem is that feed-forward networks (which linear regression is a special case of) take shape batch_size x dims while time series networks take dimension batch_size x time_steps x dims.
Because I want to jointly estimate the idiosyncratic part of the model (the linear regression part) together with the time series part, I'm only ever going to batch-sample whole time series, which is why I specified batch_size = time_steps for model 1.
But in the static model, what I'm really doing is modeling my data as time_steps x dims.
I have tried to re-cast the feed-forward model as a time-series model, without success. Here's the non-working approach:
inp3 = Input(shape = (None, 1))
x3 = inp3
x3 = Dense(2)(x3)
model3 = Model(inputs = inp3, outputs = x3)
model3.compile(optimizer = Adam(), loss='mean_squared_error')
model3.summary()
brk_train = brk.reshape(1, 100, 1)
y_train = np.transpose(y).reshape(2,100,1)
history = model3.fit(brk_train, y_train,
batch_size=1,
epochs=1000, verbose = 1)
ValueError: Error when checking target: expected dense_40 to have shape (None, 2) but got array with shape (100, 1)
I am trying to fit the same model as model1, but with a different shape, so that it is compatible with the TCN model -- and importantly so that it will have the same batching structure.
The output should ultimately have the shape (2, 100, 1) in this example. Basically I want the model to do the following algorithm:
ingest X of shape (N, time_steps, dims)
Lose the first dimension, because the design matrix is going to be identical for every series, yielding X1 of shape (time_steps, dims)
Forward step: np.dot(X1, W), where W is of dimension (dims, N), yielding X2 of dimension (time_steps, N)
Reshape X2 to (N, time_steps, 1). Then I can add it to the output of the other part of the model.
Backwards step: since this is just a linear model, the gradient of W with respect to the output is just X1
How can I implement this? Do I need a custom layer?
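To make the intended shapes concrete, here is a plain NumPy sketch of that algorithm's forward step (illustrative sizes only; this is not a working Keras layer):
import numpy as np

N, time_steps, dims = 2, 100, 1
X = np.random.normal(size=(N, time_steps, dims))   # ingest X of shape (N, time_steps, dims)
X1 = X[0]                                          # design matrix shared across series: (time_steps, dims)
W = np.random.normal(size=(dims, N))               # weights to be learned: (dims, N)
X2 = X1 @ W                                        # forward step: (time_steps, N)
out = X2.T.reshape(N, time_steps, 1)               # reshape so it can be added to the TCN output
print(out.shape)                                   # (2, 100, 1)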
I'm building off of ideas in this paper, in case you're curious about the motivation behind all of this.
EDIT: After posting, I noticed that I used only the time variable, rather than the time series itself. A TCN fit with the lagged series fits the idiosyncratic part of the series just fine (in-sample anyway). But my basic question still stands -- I want to merge the two types of networks.
So, I solved my own problem. The answer is to create dummy interactions (and thus a really sparse design matrix) and then reshape the data.
###########################
# interaction model
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('seaborn-whitegrid')
from keras.models import Model
from keras.layers import Input, Conv1D, Dense
from keras.optimizers import Adam, SGD
from patsy import dmatrix
def shift5(arr, num, fill_value=np.nan):
    result = np.empty_like(arr)
    if num > 0:
        result[:num] = fill_value
        result[num:] = arr[:-num]
    elif num < 0:
        result[num:] = fill_value
        result[:num] = arr[-num:]
    else:
        result = arr
    return result
time = np.array(range(100))
brk = np.array((time>40) & (time < 60)).reshape(100,1)
B = np.array([5, -5]).reshape(1,2)
np.dot(brk, B)
y = np.c_[np.sin(time), np.sin(time)] + np.random.normal(scale = .2, size=(100,2))+ np.dot(brk, B)
plt.clf()
plt.plot(time, y[:,0])
plt.plot(time, y[:,1])
# define interaction model
inp = Input(shape=(None, 2))
x = inp
x = Dense(1)(x)
model = Model(inputs = inp, outputs = x)
model.compile(optimizer = Adam(), loss='mean_squared_error')
model.summary()
import pandas as pd
df = pd.DataFrame(data = {"fips": np.concatenate((np.zeros(100), np.ones(100))),
                          "brk": np.concatenate((brk.reshape(100), brk.squeeze()))})
df.brk = df.brk.astype(int)
tm = np.asarray(dmatrix("brk:C(fips)-1", data = df))
brkint = np.concatenate(( \
tm[:100,:].reshape(1,100,2),
tm[100:200,:].reshape(1,100,2)
), axis = 0)
y_train = np.transpose(y).reshape(2,100,1)
history = model.fit(brkint, y_train,
batch_size=2,
epochs=1000,
verbose = 1)
yhat = model.predict(brkint)
plt.clf()
plt.plot(time, y[:,0])
plt.plot(time, y[:,1])
plt.plot(time, yhat[0,:,:])
plt.plot(time, yhat[1,:,:])
The output shape is the same as for the TCN, and can simply be added element-wise.
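For completeness, one way to wire the two branches together and add them element-wise with the functional API might look like this (a sketch assuming the TCN settings above and the brkint design matrix; not the exact code I ran):
from keras.models import Model
from keras.layers import Input, Conv1D, Dense, Add

seq_inp = Input(shape=(None, 1))           # lagged series for the TCN branch
x = seq_inp
for dilation_rate in [2**i for i in range(7)]:
    x = Conv1D(filters=10, kernel_size=3, padding='causal',
               activation='relu', dilation_rate=dilation_rate)(x)
tcn_out = Dense(1)(x)                      # (batch, time, 1)

brk_inp = Input(shape=(None, 2))           # dummy-interaction design matrix
idio_out = Dense(1)(brk_inp)               # (batch, time, 1)

combined = Add()([tcn_out, idio_out])
joint_model = Model(inputs=[seq_inp, brk_inp], outputs=combined)
joint_model.compile(optimizer='adam', loss='mean_squared_error')
# joint_model.fit([X_train, brkint], y_train, batch_size=2, epochs=1000, verbose=0)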

Neural network has <0.001 validation and testing loss but 0% accuracy when doing a prediction

I've been training an MLP to predict the time remaining on an assembly sequence. The training loss, validation loss and MSE are all less than 0.001; however, when I try to do a prediction with one of the datasets I trained the network with, it can't correctly identify any of the outputs from the set of inputs. What am I doing wrong to produce this error?
I am also struggling to understand how, when the model is deployed, I should perform the scaling of the result for one prediction. scaler.inverse_transform won't work because the data for the scaler used during training has been lost, as the prediction would be done in a separate script from the training, using the model the training produced. Is this information saved in the model builder?
I have tried changing the batch size during training, rounding the time column of the dataset to the nearest second (previously 0.1 seconds), and training over 50, 100 and 200 epochs, and I always end up with no correct predictions. I am also training an LSTM to see which is more accurate, but it has the same issue. The dataset is split 70-30 training-testing, and the training portion is then split 75-25 into training and validation.
Data scaling and model training code:
def scale_data(training_data, training_data_labels, testing_data, testing_data_labels):
    # Create X and Y scalers between 0 and 1
    x_scaler = MinMaxScaler(feature_range=(0, 1))
    y_scaler = MinMaxScaler(feature_range=(0, 1))
    # Scale training data
    x_scaled_training = x_scaler.fit_transform(training_data)
    y_scaled_training = y_scaler.fit_transform(training_data_labels)
    # Scale testing data
    x_scaled_testing = x_scaler.transform(testing_data)
    y_scaled_testing = y_scaler.transform(testing_data_labels)
    return x_scaled_training, y_scaled_training, x_scaled_testing, y_scaled_testing

def train_model(training_data, training_labels, testing_data, testing_labels, number_of_epochs, number_of_columns):
    model_hidden_neuron_number_list = []
    model_repeat_list = []
    model_error_rate_list = []
    for hidden_layer_1_units in range(int(np.floor(number_of_columns / 2)), int(np.ceil(number_of_columns * 2))):
        print("Training starting, number of hidden units = %d" % hidden_layer_1_units)
        for repeat in range(1, 6):
            print("Repeat %d" % repeat)
            model = k.Sequential()
            model.add(Dense(hidden_layer_1_units, input_dim=number_of_columns,
                            activation='relu', name='hidden_layer_1'))
            model.add(Dense(1, activation='linear', name='output_layer'))
            model.compile(loss='mean_squared_error', optimizer='adam')
            # Train Model
            model.fit(
                training_data,
                training_labels,
                epochs=number_of_epochs,
                shuffle=True,
                verbose=2,
                callbacks=[logger],
                batch_size=1024,
                validation_split=0.25
            )
            # Test Model
            test_error_rate = model.evaluate(testing_data, testing_labels, verbose=0)
            print("Error on testing data is %.3f" % test_error_rate)
            model_hidden_neuron_number_list.append(hidden_layer_1_units)
            model_repeat_list.append(repeat)
            model_error_rate_list.append(test_error_rate)
            # Save Model
            model_builder = tf.saved_model.builder.SavedModelBuilder("MLP/models/{hidden_layer_1_units}/{repeat}".format(hidden_layer_1_units=hidden_layer_1_units, repeat=repeat))
            inputs = {
                'input': tf.saved_model.build_tensor_info(model.input)
            }
            outputs = {
                'time_remaining': tf.saved_model.utils.build_tensor_info(model.output)
            }
            signature_def = tf.saved_model.signature_def_utils.build_signature_def(
                inputs=inputs,
                outputs=outputs,
                method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME
            )
            model_builder.add_meta_graph_and_variables(
                K.get_session(),
                tags=[tf.saved_model.tag_constants.SERVING],
                signature_def_map={
                    tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: signature_def
                }
            )
            model_builder.save()
And then to do a prediction:
file_name = top_level_file_path + "./MLP/models/19/1/"
testing_dataset = pd.read_csv(file_path + os.listdir(file_path)[0])
number_of_rows = len(testing_dataset.index)
number_of_columns = len(testing_dataset.columns)
newcol = [number_of_rows]
max_time = testing_dataset['Time'].max()
for j in range(0, number_of_rows - 1):
    newcol.append(max_time - testing_dataset.iloc[j].iloc[number_of_columns - 1])
x_scaler = MinMaxScaler(feature_range=(0, 1))
y_scaler = MinMaxScaler(feature_range=(0, 1))
# Scale training data
data_scaled = x_scaler.fit_transform(testing_dataset)
labels = pd.read_csv("Labels.csv")
labels_scaled = y_scaler.fit_transform(labels)
signature_key = tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY
input_key = 'input'
output_key = 'time_remaining'
with tf.Session(graph=tf.Graph()) as sess:
    saved_model = tf.saved_model.loader.load(sess, [tf.saved_model.tag_constants.SERVING], file_name)
    signature = saved_model.signature_def
    x_tensor_name = signature[signature_key].inputs[input_key].name
    y_tensor_name = signature[signature_key].outputs[output_key].name
    x = sess.graph.get_tensor_by_name(x_tensor_name)
    y = sess.graph.get_tensor_by_name(y_tensor_name)
    #np.expand_dims(data_scaled[600], axis=0)
    predictions = sess.run(y, {x: data_scaled})
    predictions = y_scaler.inverse_transform(predictions)
    #print(np.round(predictions, 2))
    correct_result = 0
    for i in range(0, number_of_rows):
        correct_result = 0
        print(np.round(predictions[i]), " ", np.round(newcol[i]))
        if np.round(predictions[i]) == np.round(newcol[i]):
            correct_result += 1
    print((correct_result/number_of_rows)*100)
The output of the first row should be 96.0 but it produces 110.0; the last should be 0.1 but is -40.0, even though no negatives appear in the dataset.
You can't compute accuracy when you do regression. Compute the mean squared error on the test set as well.
Second, when it comes to the scalers, you always do scaler.fit_transform on the training data, so the scaler will compute the parameters (in this case the min and max, if you use a min-max scaler) on the training data. Then, when performing inference on the test set, you should only do scaler.transform prior to feeding the data to the model.
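A minimal sketch of that scaler discipline (the arrays and the commented model lines are illustrative placeholders):
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train = np.random.uniform(0, 100, size=(100, 3))
Y_train = np.random.uniform(0, 600, size=(100, 1))
X_test = np.random.uniform(0, 100, size=(20, 3))

x_scaler = MinMaxScaler(feature_range=(0, 1))
y_scaler = MinMaxScaler(feature_range=(0, 1))

X_train_scaled = x_scaler.fit_transform(X_train)      # fit the scalers on the training data only
Y_train_scaled = y_scaler.fit_transform(Y_train)

X_test_scaled = x_scaler.transform(X_test)            # transform (no fit) at inference time
# predictions_scaled = model.predict(X_test_scaled)   # model trained on the scaled data
# predictions = y_scaler.inverse_transform(predictions_scaled)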

Out of sample prediction/forecasting for uni-variate time series in Keras

I am using this Kaggle guide to do time series forecasting (sample data attached).
Here's the code:
def create_dataset(dataset, window_size = 1):
    data_X, data_Y = [], []
    for i in range(len(dataset) - window_size - 1):
        a = dataset[i:(i + window_size), 0]
        data_X.append(a)
        data_Y.append(dataset[i + window_size, 0])
    return(np.array(data_X), np.array(data_Y))

def fit_model(train_X, train_Y, window_size = 1):
    model = Sequential()
    model.add(LSTM(4,
                   input_shape = (1, window_size)))
    model.add(Dense(1))
    model.compile(loss = "mean_squared_error",
                  optimizer = "adam")
    model.fit(train_X,
              train_Y,
              epochs = 100,
              batch_size = 1,
              verbose = 0)
    return(model)

def predict_and_score(model, X, Y):
    # Make predictions on the original scale of the data.
    pred = MinMaxScaler(feature_range = (0,1)).inverse_transform(model.predict(X))
    # Prepare Y data to also be on the original scale for interpretability.
    orig_data = MinMaxScaler(feature_range = (0,1)).inverse_transform([Y])
    # Calculate RMSE.
    score = math.sqrt(mean_squared_error(orig_data[0], pred[:, 0]))
    return(score, pred)
This entire thing is being used in the following function:
def nnet(time_series, window_size=1):
    cmi_total_raw = vstack((time_series.values.astype('float32')))
    scaler = MinMaxScaler(feature_range = (0,1))
    cmi_total_scaled = scaler.fit_transform(cmi_total_raw)
    cmi_train_sc = (cmi_total_scaled[0:int(cmi_split*len(cmi_total_scaled))])
    cmi_test_sc = cmi_total_scaled[int(cmi_split*len(cmi_total_scaled)) : len(cmi_total_scaled)]
    # Create test and training sets for one-step-ahead regression.
    window_size = 1
    train_X, train_Y = create_dataset(cmi_train_sc, window_size)
    test_X, test_Y = create_dataset(cmi_test_sc, window_size)
    # Reshape the input data into appropriate form for Keras.
    train_X = np.reshape(train_X, (train_X.shape[0], 1, train_X.shape[1]))
    test_X = np.reshape(test_X, (test_X.shape[0], 1, test_X.shape[1]))
    model = fit_model(train_X, train_Y, window_size)
    rmse_train, train_predict = predict_and_score(model, train_X, train_Y)
    mape_test, test_predict = predict_and_score(model, test_X, test_Y)
    return (mape_test, test_predict)
As far as I understand, it creates a model based on the training data, predicts on the in-sample test set, and finally calculates the error.
The input data has 209 rows and I want to predict the next row(s).
Here's what I tried:
Since the same thing is done in auto-ARIMA using the forecast(steps=n_steps) method, I looked for something similar in Keras.
From Keras documentation:
predict(x, batch_size=None, verbose=0, steps=None)
Arguments:
x: The input data, as a Numpy array (or list of Numpy arrays if the model has multiple inputs).
steps: Total number of steps (batches of samples) before declaring the prediction round finished. Ignored with the default value of None.
I tried changing steps and it predicted absurd values on the order of 100,000. Moreover, the length of test_predict was nowhere near the number of steps I gave, so I am assuming steps means something else here.
Questions:
- Can Keras even be used to forecast time series data (out of sample)?
- If yes, is there a forecast method, just like the aforementioned predict method?
- If no, can the existing predict method be used in any way to get an out-of-sample forecast?
Sample data (cmi_total):
2014-05-25 272.459887
2014-06-01 272.446022
2014-06-08 330.301260
2014-06-15 656.838394
2014-06-22 670.575110
