LSTM Predict number from sequence of numbers - python

I am trying to train an LSTM to predict some numbers from a sequence of numbers. My X dataset has 33 features and my Y dataset has 4 variables that I have to predict for each X sample. For example:
(Xdf and Ydf previews omitted; the full files are linked below.)
After I turn X and y into NumPy arrays and reshape the data, I build the model below:
Training gives huge loss and distinct overfitting. Any ideas what I am doing wrong and why I am getting these results? Thanks.
Xdf: https://drive.google.com/file/d/1wH56E0M3ok1MGWzU6FGKgDJF7EZfL_7f/view?usp=sharing
ydf: https://drive.google.com/file/d/1RkjWl1FIiQDyjkRvl7ZQKTOXIE8FtBXA/view?usp=sharing
UPDATE
I edited my code the following way and it seems to be working a tiny bit better. Still huge loss and overfitting, but after some epochs it seemed to work fine for a while. Any suggestions on how to reduce loss and overfitting would be appreciated.
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.callbacks import EarlyStopping

X = X.reshape((X.shape[0], 33, 1))  # (5850, 33, 1)

model = tf.keras.Sequential()
model.add(layers.LSTM(50, activation='relu', input_shape=(33, 1)))
model.add(layers.Dense(4))
model.add(layers.Dense(4))
model.add(layers.Dense(4))

early_stopping = EarlyStopping(monitor='val_loss', patience=42)

model.compile(optimizer=tf.keras.optimizers.Adam(0.01),
              loss=tf.keras.losses.MeanSquaredError(),
              metrics=['accuracy'])
model.fit(X, y, epochs=200, verbose=1, validation_split=0.2)
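Two things stand out in the snippet above: early_stopping is created but never passed to fit, and the three stacked Dense(4) layers have no activation, so they collapse into a single linear layer. A minimal sketch of those two changes (same data shapes assumed; the smaller learning rate is only a suggestion, not part of the original code):
model = tf.keras.Sequential()
model.add(layers.LSTM(50, input_shape=(33, 1)))      # default tanh activation
model.add(layers.Dense(4))                           # single linear output for the 4 targets

early_stopping = EarlyStopping(monitor='val_loss', patience=42, restore_best_weights=True)

model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),   # smaller LR than 0.01 (illustrative)
              loss=tf.keras.losses.MeanSquaredError())    # 'accuracy' is not meaningful for regression
model.fit(X, y, epochs=200, verbose=1, validation_split=0.2,
          callbacks=[early_stopping])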

Related

Inverse scale of predicted data in Keras

I'm trying to use a NN model to predict with new data. However, the predicted data is not of the correct scale (values obtained are around 1e-10 when they should be around 0.3, etc.).
In my model I've used MinMaxScaler on the x and y data. The model gave me an R2 value of 0.9 when using the train/test split method, and an MSE of 0.01% using a pipeline method and also the cross-val method. So I believe the model I've created is OK.
Here is the model I've made.
import pandas as pd
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.preprocessing import MinMaxScaler
from sklearn.pipeline import Pipeline

data=pd.read_csv(r'''F:\DataforANNfromIESFebAugPowerValues.csv''')
data.dropna(axis=0,how='all')
x=data[['Dry-bulb_temperature_C','Wind_speed_m/s','Cloud_cover_oktas','External_relative_humidity_%','Starrag1250','StarragEcospeed2538','StarragS191','StarragLX051','DoosanCNC6700','MakinoG7','HermleC52MT','WFL_Millturn','Hofler1350','MoriNT4250','MoriNT5400','NMV8000','MoriNT6600','MoriNVL1350','HermleC42','CFV550','MoriDura635','DMGUltrasonic10']]
y=data[['Process_heat_output_waste_kW','Heating_plant_sensible_load_kW','Cooling_plant_sensible_load_kW','Relative_humidity_%','Air_temperature_C','Total_electricity_kW','Chillers_energy_kW','Boilers_energy_kW']]
epochs=150
learning_rate=0.001
decay_rate=learning_rate/epochs
optimiser=keras.optimizers.Nadam(lr=learning_rate, schedule_decay=decay_rate)
def create_model():
    model=Sequential()
    model.add(Dense(21, input_dim=22, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(19, activation='relu'))  # hidden layer 2
    model.add(Dropout(0.2))
    model.add(Dense(8, activation='sigmoid'))  # output layer
    model.compile(loss='mean_squared_error', optimizer=optimiser, metrics=['accuracy','mse'])
    return model
scaler=MinMaxScaler()
x=MinMaxScaler().fit_transform(x)
print(x)
y=MinMaxScaler().fit_transform(y)
model=KerasRegressor(build_fn=create_model, verbose=0,epochs=150, batch_size=70)
model.fit(x, y, epochs=150, batch_size=70)
##SET UP NEW DATA FOR PREDICTIONS
xnewdata=pd.read_csv(r'''F:\newdatapowervalues.csv''')
xnewdata.dropna(axis=0,how='all')
xnew=xnewdata[['Dry-bulb_temperature_C','Wind_speed_m/s','Cloud_cover_oktas','External_relative_humidity_%','Starrag1250','StarragEcospeed2538','StarragS191','StarragLX051','DoosanCNC6700','MakinoG7','HermleC52MT','WFL_Millturn','Hofler1350','MoriNT4250','MoriNT5400','NMV8000','MoriNT6600','MoriNVL1350','HermleC42','CFV550','MoriDura635','DMGUltrasonic10']]
xnew=MinMaxScaler().fit_transform(xnew)
ynew=model.predict(xnew)
ynewdata=pd.DataFrame(data=ynew)
ynewdata.to_csv(r'''F:\KerasIESPowerYPredict.csv''',header=['Process_heat_output_waste_kW','Heating_plant_sensible_load_kW','Cooling_plant_sensible_load_kW','Relative_humidity_%','Air_temperature_C','Total_electricity_kW','Chillers_energy_kW','Boilers_energy_kW'])
Seeing as I've used the scaler on the initial training model, I thought I would also need to do this to the new data. I've tried doing
scaler.inverse_transform(ynew)
after model.predict(xnew); however, I get the error that the MinMaxScaler instance isn't fitted to y yet.
Therefore, I tried using the pipeline method.
estimators = []
estimators.append(('standardize', MinMaxScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=create_model, epochs=150, batch_size=70, verbose=0)))
pipeline = Pipeline(estimators)
pipeline.fit(x,y)
for the initial training model, instead of
x=MinMaxScaler().fit_transform(x)
y=MinMaxScaler().fit_transform(y)
model=KerasRegressor(build_fn=create_model, verbose=0,epochs=150, batch_size=70)
model.fit(x, y, epochs=150, batch_size=70)
I then used
ynew=pipeline.predict(xnew)
However, this gave me data consisting mainly of 1s!
Any idea how I can predict correctly on this new data? I'm unsure which data to scale and which not to, as I believe pipeline.predict would include scaling for both x and y. Therefore, do I need some sort of inverse pipeline scaler after making these predictions?
Many thanks for your help.
There is one minor and one major problem with your approach.
Minor one: there's no need to scale your target variable; it does not affect your optimisation function.
Major one: you fit the scaler again on the data on which you want to run the prediction. By doing this, you completely skew the relations in the data, and hence the predicted output is on a very different scale. Also, you define scaler but then never use it. Let's fix it.
(...)
scaler=MinMaxScaler()
x=scaler.fit_transform(x)
model=KerasRegressor(build_fn=create_model, verbose=0,epochs=150, batch_size=70)
model.fit(x, y, epochs=150, batch_size=70)
##SET UP NEW DATA FOR PREDICTIONS
xnewdata=pd.read_csv(r'''F:\newdatapowervalues.csv''')
xnewdata.dropna(axis=0,how='all')
xnew=xnewdata[['Dry-bulb_temperature_C','Wind_speed_m/s','Cloud_cover_oktas','External_relative_humidity_%','Starrag1250','StarragEcospeed2538','StarragS191','StarragLX051','DoosanCNC6700','MakinoG7','HermleC52MT','WFL_Millturn','Hofler1350','MoriNT4250','MoriNT5400','NMV8000','MoriNT6600','MoriNVL1350','HermleC42','CFV550','MoriDura635','DMGUltrasonic10']]
xnew=scaler.transform(xnew)
ynew=model.predict(xnew)
ynewdata=pd.DataFrame(data=ynew)
As you can see, we used the scaler first to learn the proper normalisation factors (fit) and then applied it (transform) to the new data on which we run predict.
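If you did want to scale the targets as well (the point above argues you don't need to), a hedged sketch would be to keep a separate scaler for y, so the predictions can be mapped back to the original units:
y_scaler = MinMaxScaler()
y_scaled = y_scaler.fit_transform(y)

model.fit(x, y_scaled, epochs=150, batch_size=70)

ynew_scaled = model.predict(xnew)               # predictions in the scaled space
ynew = y_scaler.inverse_transform(ynew_scaled)  # back to kW / % / degrees C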

Using fit_generator in Keras Model

I'm trying to train a neural network using Keras and Tensorflow backend. My X is text descriptions which I have processed and transformed into sequences. Now, my y is a sparse matrix since it's a multi-label classification and I have many output classes.
>>> y
<30405x3387 sparse matrix of type '<type 'numpy.int64'>'
with 54971 stored elements in Compressed Sparse Row format>
To train the model, I tried defining a batch generator:
def batch_generator(x, y, batch_size=32):
    n_batches_per_epoch = x.shape[0]//batch_size
    for i in range(n_batches_per_epoch):
        index_batch = range(x.shape[0])[batch_size*i:batch_size*(i+1)]
        x_batch = x[index_batch,:]
        y_batch = y[index_batch,:].todense()
        yield x_batch, np.array(y_batch)
I've divided my data as:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
I define my model as:
model = Sequential()
# Create architecture, add some layers.
model.add(Dense(num_classes))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
And I'm training my model as:
model.fit_generator(generator=batch_generator(x_train, y_train), steps_per_epoch=(x_train[0]/32), epochs=200, callbacks=the_callbacks)
But my model starts with around 55% accuracy and it quickly (in 2 or 3 steps) becomes 99.95%, which makes no sense at all. Am I doing something wrong?
You'll need to switch your loss to "categorical_crossentropy" or change your metric to "crossentropy" for multiclass classification.
The "accuracy" metric is actually ambiguous behind the scenes in Keras- it picks binary or multiclass accuracy based on the loss function used.
https://github.com/keras-team/keras/blob/master/keras/engine/training.py#L375
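A sketch of one way to sidestep that ambiguity is to name the accuracy variant explicitly instead of passing the generic 'accuracy' string:
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['binary_accuracy'])   # per-label accuracy for multi-label targets
# or, when switching to softmax + categorical_crossentropy:
# model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['categorical_accuracy'])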
If you have two classes you can use sigmoid activation in the last layer and the binary cross-entropy loss function. But if you have more than two classes, then you have to replace sigmoid with softmax and binary with categorical cross-entropy.
There could be multiple other reasons for the abrupt change in accuracy, depending on your data distribution, model configuration, etc.
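To make that pairing concrete, a short sketch (num_classes as in the question; only one branch would be used):
# Multi-label / two-class case: independent sigmoid outputs + binary cross entropy
model.add(Dense(num_classes, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')

# Single-label multiclass case would instead be:
#   model.add(Dense(num_classes, activation='softmax'))
#   model.compile(loss='categorical_crossentropy', optimizer='adam')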

Keras regression prediction is not same dimension as output dimension

Hello, I'm trying to do energy disaggregation (predicting the energy use of individual appliances given the total energy consumption of a household).
Now I have an input dimension of 2 because of 2 main energy measurements.
The output dimension of the Keras Sequential model should be 18 because I have 18 appliances I would like to make a prediction for.
I have enough data using the REDD dataset (this is no problem).
I have trained the model and gained reasonable loss and accuracy.
But when I want to make a prediction for some test data, the prediction consists of values in a 1-dimensional array, while the output should be 18-dimensional.
How is this possible or am I trying something that isn't really viable?
Some code:
model = Sequential()
model.add(Dense(HIDDEN_LAYER_NEURONS,input_dim=2))
model.add(Activation('relu'))
model.add(Dense(18))
model.compile(loss=LOSS,
              optimizer=OPTIMIZER,
              metrics=['accuracy'])
model.fit(X_train, y_train, epochs=EPOCHS, batch_size=BATCH_SIZE,
          verbose=1, validation_split=VALIDATION_SPLIT)
pred = model.predict(X_test).reshape(-1)
pred.shape # prints the following 1 dimensional array: (xxxxx,) dimensional
The ALL_CAPS variables are constants.
X_train is 2-dim
y_train is 18-dim
Any help is appreciated!
Well you are reshaping the predictions and flattening them here:
pred = model.predict(X_test).reshape(-1)
The reshape(-1) effectively makes the array one-dimensional. Just take the predictions directly:
pred = model.predict(X_test)
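With the reshape removed, the prediction keeps one column per appliance; a quick sanity check (shapes assumed from the question):
pred = model.predict(X_test)
print(pred.shape)   # (num_test_samples, 18)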

Forming a Multi input LSTM in Keras

I am trying to predict neutron widths from resonance energies, using a neural network (I'm quite new to Keras/NNs in general, so apologies in advance).
There is said to be a link between resonance energies and neutron widths, and since the energies increase monotonically, this can be modelled similarly to a time series problem.
In essence, I have 2 columns of data, with the first column being resonance energy and the other column containing the respective neutron width on each row. I have decided to use an LSTM layer to help the network predict by utilising previous computations.
From various tutorials and other answers, it seems common to use a "look_back" argument to allow the network to use previous timesteps to help predict the current timestep when creating the dataset, e.g.
trainX, trainY = create_dataset(train, look_back)
I would like to ask the following regarding forming the NN:
1) Given my particular application, do I need to explicitly map each resonance energy to its corresponding neutron width on the same row?
2) look_back indicates how many previous values the NN can use to help predict the current value, but how is it incorporated with the LSTM layer? I.e. I don't quite understand how both can be used.
3) At which point do I inverse the MinMaxScaler?
Those are the main queries; for 1) I have assumed it's okay not to, and for 2) I believe it is possible but I don't really understand how. I can't quite work out what I have done wrong in the code. Ideally, once the code works, I would like to plot the relative deviation of predicted to reference values in the train and test data. Any advice would be much appreciated:
import numpy
import matplotlib.pyplot as plt
import pandas
import math
from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
# convert an array of values into a dataset matrix
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - 1):
        a = dataset[i:(i + look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 1])
    return numpy.array(dataX), numpy.array(dataY)
# fix random seed for reproducibility
numpy.random.seed(7)
# load the dataset
dataframe = pandas.read_csv('CSVDataFe56Energyneutron.csv', engine='python')
dataset = dataframe.values
print("dataset")
print(dataset.shape)
print(dataset)
# normalize the dataset
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)
print(dataset)
# split into train and test sets
train_size = int(len(dataset) * 0.67)
test_size = len(dataset) - train_size
train, test = dataset[0:train_size, :], dataset[train_size:len(dataset), :]
# reshape into X=t and Y=t+1
look_back = 3
trainX, trainY = create_dataset(train, look_back)
testX, testY = create_dataset(test, look_back)
# reshape input to be [samples, time steps, features]
trainX = numpy.reshape(trainX, (trainX.shape[0], look_back, 1))
testX = numpy.reshape(testX, (testX.shape[0],look_back, 1))
# create and fit the LSTM network
number_of_hidden_layers=16
model = Sequential()
model.add(LSTM(6, input_shape=(look_back,1)))
for x in range(0, number_of_hidden_layers):
    model.add(Dense(50, activation='relu'))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
history= model.fit(trainX, trainY, nb_epoch=200, batch_size=32)
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
trainScore = model.evaluate(trainX, trainY, verbose=0)
print('Train Score: %.2f MSE (%.2f RMSE)' % (trainScore, math.sqrt(trainScore)))
testScore = model.evaluate(testX, testY, verbose=0)
print('Test Score: %.2f MSE (%.2f RMSE)' % (testScore, math.sqrt(testScore)))
1) Given my particular application do I need to explicitly map each
resonance energy to its corresponding neutron width on the same row?
Yes, you have to do that. Basically, your data has to be in the shape of:
X=[timestep, timestep,...] y=[label, label,...]
2) look_back indicates how many previous values the NN can use to help
predict the current value, but how is it incorporated with the LSTM
layer? I.e. I don't quite understand how both can be used.
An LSTM is a sequence-aware layer. You can think of it as a hidden Markov model: it takes the first timestep, calculates something, and in the next timestep the previous calculation is taken into account. look_back, which is usually called sequence_length, is just the maximum number of timesteps.
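To make this concrete, a small sketch using the question's own create_dataset (look_back = 3, as in the posted code): each X sample holds the 3 previous energies (column 0), y is the width (column 1) of the row right after that window, and input_shape=(look_back, 1) tells the LSTM to step through those 3 values.
look_back = 3
trainX, trainY = create_dataset(train, look_back)                 # trainX: (samples, 3)
trainX = numpy.reshape(trainX, (trainX.shape[0], look_back, 1))   # (samples, 3, 1)

model = Sequential()
model.add(LSTM(6, input_shape=(look_back, 1)))   # the LSTM iterates over the 3 timesteps
model.add(Dense(1))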
3) At which point do I inverse the MinMaxScaler?
Why should you do that? Furthermore, you don't need to scale your input.
It seems like you have a general misconception in your model. If you have input_shape=(look_back, 1) you don't need LSTMs at all. If your sequence is just a sequence of single values, it might be better to avoid LSTMs. Furthermore, fitting your model should include validation after each epoch to track the loss and validation performance.
model.fit(x_train, y_train,
          batch_size=32,
          epochs=200,
          validation_data=[x_test, y_test],
          verbose=1)

Keras LSTM - Validation Loss Increasing From Epoch #1

I'm currently undertaking my first 'real' DL project of (surprise) predicting stock movements. I know that I'm 1000:1 to make anything useful, but I'm enjoying it and want to see it through; I've learnt more in my few weeks of attempting this than I have in the prior 6 months of completing MOOCs.
I'm building an LSTM using Keras to currently predict the next step forward and have attempted the task as both classification (up/down/steady) and now as a regression problem. Both result in a similar roadblock in that my validation loss never improves from epoch #1.
I can get the model to overfit such that training loss approaches zero with MSE (or 100% accuracy if classification), but at no stage does the validation loss decrease. This screams overfitting to my untrained eye, so I added varying amounts of dropout, but all that does is stifle the learning of the model/training accuracy and shows no improvement in the validation accuracy.
I have attempted to change a significant number of hyperparameters - learning rate, optimiser, batch size, lookback window, #layers, #units, dropout, #samples, etc. I've also tried with a subset of the data and a subset of the features, but I just can't get it to work, so I'm very thankful for any help.
Code below (it's not pretty, I know):
# Import saved full dataframe ~ 200 features
import feather
df = feather.read_dataframe('df_feathered')
df.set_index('time', inplace=True)
# Difference the dataset to make stationary
df = df.diff(periods=1, axis=0)
# MAKE LARGE SAMPLE FOR TESTING
df_train = df.loc['2017-3-1':'2017-6-30']
df_val = df.loc['2017-7-1':'2017-8-31']
df_test = df.loc['2017-9-1':'2017-9-30']
# Make x_train, x_val sets by dropping target variable
x_train = df_train.drop('close+1', axis=1)
x_val = df_val.drop('close+1', axis=1)
# Scale the training data first then fit the transform to the test set
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_val)
# scaler = MinMaxScaler(feature_range=(0,1))
# x_train = scaler.fit_transform(df_train1)
# x_test = scaler.transform(df_val1)
# Create y_train, y_test, simply target variable for regression
y_train = df_train['close+1']
y_test = df_val['close+1']
# Define Lookback window for LSTM input
sliding_window = 15
# Convert x_train, x_test, y_train, y_test into 3d arrays (samples, timesteps, features) for LSTM input
dataXtrain = []
for i in range(len(x_train)-sliding_window-1):
    a = x_train[i:(i+sliding_window), 0:(x_train.shape[1])]
    dataXtrain.append(a)

dataXtest = []
for i in range(len(x_test)-sliding_window-1):
    a = x_test[i:(i+sliding_window), 0:(x_test.shape[1])]
    dataXtest.append(a)

dataYtrain = []
for i in range(len(y_train)-sliding_window-1):
    dataYtrain.append(y_train[i + sliding_window])

dataYtest = []
for i in range(len(y_test)-sliding_window-1):
    dataYtest.append(y_test[i + sliding_window])
# Make data the divisible by a variety of batch_sizes for training
# Started at 1000 to not include replaced NaN values
dataXtrain = np.array(dataXtrain[1000:172008])
dataYtrain = np.array(dataYtrain[1000:172008])
dataXtest = np.array(dataXtest[1000:83944])
dataYtest = np.array(dataYtest[1000:83944])
# Checking input shapes
print('dataXtrain size is: {}'.format((dataXtrain).shape))
print('dataXtest size is: {}'.format((dataXtest).shape))
print('dataYtrain size is: {}'.format((dataYtrain).shape))
print('dataYtest size is: {}'.format((dataYtest).shape))
### ACTUAL LSTM MODEL
batch_size = 256
timesteps = dataXtrain.shape[1]
features = dataXtrain.shape[2]
# Model set-up, stacked 4 layer stateful LSTM
model = Sequential()
model.add(LSTM(512, return_sequences=True, stateful=True,
               batch_input_shape=(batch_size, timesteps, features)))
model.add(LSTM(256,stateful=True, return_sequences=True))
model.add(LSTM(256,stateful=True, return_sequences=True))
model.add(LSTM(128,stateful=True))
model.add(Dense(1, activation='linear'))
model.summary()
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.9, patience=5, min_lr=0.000001, verbose=1)
def coeff_determination(y_true, y_pred):
    from keras import backend as K
    SS_res = K.sum(K.square(y_true - y_pred))
    SS_tot = K.sum(K.square(y_true - K.mean(y_true)))
    return 1 - SS_res/(SS_tot + K.epsilon())
model.compile(loss='mse',
              optimizer='nadam',
              metrics=[coeff_determination, 'mse', 'mae', 'mape'])
history = model.fit(dataXtrain, dataYtrain, validation_data=(dataXtest, dataYtest),
                    epochs=100, batch_size=batch_size, shuffle=False, verbose=1, callbacks=[reduce_lr])
score = model.evaluate(dataXtest, dataYtest,batch_size=batch_size, verbose=1)
print(score)
predictions = model.predict(dataXtest, batch_size=batch_size)
print(predictions)
import matplotlib.pyplot as plt
%matplotlib inline
#plt.plot(history.history['mean_squared_error'])
#plt.plot(history.history['val_mean_squared_error'])
plt.plot(history.history['coeff_determination'])
plt.plot(history.history['val_coeff_determination'])
#plt.plot(history.history['mean_absolute_error'])
#plt.plot(history.history['mean_absolute_percentage_error'])
#plt.plot(history.history['val_mean_absolute_percentage_error'])
#plt.title("MSE")
plt.ylabel("R2")
plt.xlabel("epoch")
plt.legend(["train", "val"], loc="best")
plt.show()
plt.plot(history.history["loss"][5:])
plt.plot(history.history["val_loss"][5:])
plt.title("model loss")
plt.ylabel("loss")
plt.xlabel("epoch")
plt.legend(["train", "val"], loc="best")
plt.show()
plt.figure(figsize=(20,8))
plt.plot(dataYtest)
plt.plot(predictions)
plt.title("Prediction")
plt.ylabel("Price")
plt.xlabel("Time")
plt.legend(["Truth", "Prediction"], loc="best")
plt.show()
Maybe you should remember that you are predicting stock returns, which very likely cannot be predicted at all. So val_loss increasing is not overfitting at all. Instead of adding more dropout, maybe you should think about adding more layers to increase its power.
Try reducing the learning rate a lot (and remove the dropouts for now); one way to set it is sketched below.
Also, why do you use
shuffle=False
in the fit() function?
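For reference, a hedged sketch of passing a smaller learning rate by giving compile an optimizer instance instead of the 'nadam' string (1e-4 is only an illustrative value):
from keras.optimizers import Nadam

model.compile(loss='mse',
              optimizer=Nadam(lr=1e-4),
              metrics=[coeff_determination, 'mse', 'mae', 'mape'])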
