Hyperparameter tuning of a Keras neural network regression - python

We have developed an Artificial Neural Network (ANN) in Python and would like to tune its hyperparameters with GridSearchCV to find the best possible combination. The goal of our ANN is to predict temperature from other relevant features, and this is its performance so far:
Metric                                Value
Coefficient of Determination (R2)     0.9808840288506496
Root Mean Square Error (RMSE)         0.7527763482280911
Mean Squared Error (MSE)              0.5666722304516204
Mean Absolute Percent Error (MAPE)    0.09142692180578049
Mean Absolute Error (MAE)             0.588041786518511
Mean Bias Error (MBE)                 -0.07293321963266877
As of now, we have no clue on how to utilize GridSearchCV correctly, and we therefore seek help to move us towards a solution that would satisfy our goal. We have a function that might work, but are not able to apply it correctly to our code.
This is the hyperparameter tuning function (GridSearchCV):
def hyperparameterTuning():
    # Listing all the parameters to try
    Parameter_Trials = {'batch_size': [10, 20, 30],
                        'epochs': [10, 20],
                        'Optimizer_trial': ['adam', 'rmsprop']
                        }
    # Creating the regression ANN model
    RegModel = KerasRegressor(make_regression_ann, verbose=0)
    # Creating the Grid search space
    grid_search = GridSearchCV(estimator=RegModel,
                               param_grid=Parameter_Trials,
                               scoring=None,
                               cv=5)
    # Running Grid Search for different parameters
    grid_search.fit(X, y, verbose=1)
    print('### Printing Best parameters ###')
    grid_search.best_params_
Our main function:
if __name__ == '__main__':
    print('--------------')
    dataframe = pd.read_csv("/.../file.csv")
    # Splitting data into training and testing data
    X_train, X_test, y_train, y_test, PredictorScalerFit, TargetVarScalerFit = splitData(dataframe=dataframe)
    # Making the Regression Artificial Neural Network (ANN)
    ann = ANN(X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test, PredictorScalerFit=PredictorScalerFit, TargetVarScalerFit=TargetVarScalerFit)
    # Evaluation of the performance of the Artificial Neural Network (ANN)
    eval = evaluation(y_test_orig=ann['temp'], y_test_pred=ann['Predicted_temp'])
Our function to split data into training and testing data:
def splitData(dataframe):
    X = dataframe[Predictors].values
    y = dataframe[TargetVariable].values
    ### Standardization of data ###
    PredictorScaler = StandardScaler()
    TargetVarScaler = StandardScaler()
    # Storing the fit object for later reference
    PredictorScalerFit = PredictorScaler.fit(X)
    TargetVarScalerFit = TargetVarScaler.fit(y)
    # Generating the standardized values of X and y
    X = PredictorScalerFit.transform(X)
    y = TargetVarScalerFit.transform(y)
    # Split the data into training and testing set
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
    return X_train, X_test, y_train, y_test, PredictorScalerFit, TargetVarScalerFit
Our function to fit the model and to utilize the Artificial Neural Network (ANN)
def ANN(X_train, y_train, X_test, y_test, TargetVarScalerFit, PredictorScalerFit):
    model = make_regression_ann()
    # Fitting the ANN to the Training set
    model.fit(X_train, y_train, batch_size=5, epochs=100, verbose=1)
    # Generating Predictions on testing data
    Predictions = model.predict(X_test)
    # Scaling the predicted temp data back to original temp scale
    Predictions = TargetVarScalerFit.inverse_transform(Predictions)
    # Scaling the y_test temp data back to original temp scale
    y_test_orig = TargetVarScalerFit.inverse_transform(y_test)
    # Scaling the test data back to original scale
    Test_Data = PredictorScalerFit.inverse_transform(X_test)
    TestingData = pd.DataFrame(data=Test_Data, columns=Predictors)
    TestingData['temp'] = y_test_orig
    TestingData['Predicted_temp'] = Predictions
    TestingData.head()
    # Computing the absolute percent error
    APE = 100 * (abs(TestingData['temp'] - TestingData['Predicted_temp']) / TestingData['temp'])
    TestingData['APE'] = APE
    # ...
    TestingData = TestingData.round(2)
    TestingData.to_csv("TestingData.csv")
    return TestingData
Our function to make the model of the ANN
def make_regression_ann():
    # create ANN model
    model = Sequential()
    # Defining the Input layer and FIRST hidden layer, both are same!
    model.add(Dense(units=8, input_dim=7, kernel_initializer='normal', activation='sigmoid'))
    # Defining the Second layer of the model
    # after the first layer we don't have to specify input_dim as keras configures it automatically
    model.add(Dense(units=6, kernel_initializer='normal', activation='sigmoid'))
    # The output neuron is a single fully connected node
    # Since we will be predicting a single number
    model.add(Dense(1, kernel_initializer='normal'))
    # Compiling the model
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model
Our function to evaluate the performance of the ANN
def evaluation(y_test_orig, y_test_pred):
    # Computing the Mean Absolute Percent Error
    MAPE = mean_absolute_percentage_error(y_test_orig, y_test_pred)
    # Computing R2 Score
    r2 = r2_score(y_test_orig, y_test_pred)
    # Computing Mean Square Error (MSE)
    MSE = mean_squared_error(y_test_orig, y_test_pred)
    # Computing Root Mean Square Error (RMSE)
    RMSE = mean_squared_error(y_test_orig, y_test_pred, squared=False)
    # Computing Mean Absolute Error (MAE)
    MAE = mean_absolute_error(y_test_orig, y_test_pred)
    # Computing Mean Bias Error (MBE)
    MBE = np.mean(y_test_pred - y_test_orig)  # here we calculate MBE
    print('--------------')
    print('The Coefficient of Determination (R2) of ANN model is:', r2)
    print("The Root Mean Squared Error (RMSE) of ANN model is:", RMSE)
    print("The Mean Squared Error (MSE) of ANN model is:", MSE)
    print('The Mean Absolute Percent Error (MAPE) of ANN model is:', MAPE)
    print("The Mean Absolute Error (MAE) of ANN model is:", MAE)
    print("The Mean Bias Error (MBE) of ANN model is:", MBE)
    print('--------------')
    eval_list = [r2, RMSE, MSE, MAPE, MAE, MBE]
    columns = ['Coefficient of Determination (R2)', 'Root Mean Square Error (RMSE)', 'Mean Squared Error (MSE)',
               'Mean Absolute Percent Error (MAPE)', 'Mean Absolute Error (MAE)', 'Mean Bias Error (MBE)']
    dataframe = pd.DataFrame([eval_list], columns=columns)
    return dataframe
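For reference, the six scores reported above can be reproduced by hand. The following is a minimal pure-Python sketch, not the code from the question: the helper name regression_scores and the toy numbers are made up for illustration. It follows the same sign convention as the evaluation function (MBE is prediction minus truth) and returns MAPE as a fraction, as scikit-learn's mean_absolute_percentage_error does, so a value of 0.09 means roughly 9%.

```python
import math

def regression_scores(y_true, y_pred):
    """Return R2, RMSE, MSE, MAPE, MAE and MBE for two equal-length sequences."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]  # prediction minus truth
    mse = sum(e ** 2 for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mbe = sum(errors) / n  # positive means over-prediction on average
    # MAPE as a fraction (0.1 == 10%); undefined if any true value is 0
    mape = sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - (mse * n) / ss_tot  # mse * n is the residual sum of squares
    return {"R2": r2, "RMSE": math.sqrt(mse), "MSE": mse,
            "MAPE": mape, "MAE": mae, "MBE": mbe}

# Toy data (hypothetical), only to show the arithmetic
scores = regression_scores([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Note that RMSE is simply the square root of MSE, which is a quick sanity check for the reported table.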

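Your code should work if you update the make_regression_ann function so that every hyperparameter you want to optimize is an argument of that function. The exceptions are the fitting parameters (such as batch_size and epochs), which KerasRegressor passes on to fit() and which therefore do not have to be function arguments. Note that the grid in your question uses the key 'Optimizer_trial', which is not an argument of make_regression_ann, so it cannot be tuned as written.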
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_regression

def make_regression_ann(initializer='uniform', activation='relu', optimizer='adam', loss='mse'):
    model = Sequential()
    model.add(Dense(units=8, input_dim=7, kernel_initializer=initializer, activation=activation))
    model.add(Dense(units=6, kernel_initializer=initializer, activation=activation))
    model.add(Dense(1, kernel_initializer=initializer))
    model.compile(loss=loss, optimizer=optimizer)
    return model

param_grid = {
    'initializer': ['normal', 'uniform'],
    'activation': ['relu', 'sigmoid'],
    'optimizer': ['adam', 'rmsprop'],
    'loss': ['mse', 'mae'],
    'batch_size': [32, 64],
    'epochs': [5, 10],
}

grid_search = GridSearchCV(
    estimator=KerasRegressor(make_regression_ann, verbose=0),
    param_grid=param_grid,
    scoring='neg_mean_absolute_percentage_error',
    cv=3,
)

X, y = make_regression(n_features=7, n_samples=100, random_state=42)
grid_search.fit(X, y, verbose=1)
grid_search.best_params_
# {'activation': 'sigmoid',
#  'batch_size': 32,
#  'epochs': 10,
#  'initializer': 'normal',
#  'loss': 'mae',
#  'optimizer': 'adam'}
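Before running a grid like the one above, it can help to estimate how many models will be trained. The sketch below is plain Python and only mimics how an exhaustive grid is expanded (one candidate per combination of values); it does not call scikit-learn, and the variable names combinations and n_fits are chosen here for illustration.

```python
from itertools import product

# Same grid and cv value as in the answer above
param_grid = {
    'initializer': ['normal', 'uniform'],
    'activation': ['relu', 'sigmoid'],
    'optimizer': ['adam', 'rmsprop'],
    'loss': ['mse', 'mae'],
    'batch_size': [32, 64],
    'epochs': [5, 10],
}
cv = 3

# One candidate per element of the Cartesian product of all value lists
keys = list(param_grid)
combinations = [dict(zip(keys, values)) for values in product(*param_grid.values())]

# Every candidate is trained once per cross-validation fold
n_fits = len(combinations) * cv
print(len(combinations), "combinations,", n_fits, "cross-validation fits")
print(combinations[0])
```

With six two-valued hyperparameters the grid already has 64 candidates, so adding values or folds grows the training time quickly. With the default refit=True, GridSearchCV also trains one more model on the full data using the best candidate.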

This is how I recently used GridSearchCV successfully:
from sklearn import svm
from sklearn.model_selection import GridSearchCV
tuned_parameters2 = {'C': [1, 10, 100, 10000], 'max_iter': [5000, 10000, 50000]}
model2 = GridSearchCV(svm.LinearSVC(), tuned_parameters2)
model2.fit(features, y_train)
So define a separate dictionary with the hyperparameters, then pass your estimator and that dictionary to GridSearchCV, for example GridSearchCV(KerasRegressor(make_regression_ann), the_hyperparam_dict). Then fit it with the data.
In your case this approach would require more refactoring. It's up to you to decide whether it is better to feed the ANN to GridSearchCV.

Related

How to use k-fold cross-validation instead of train_test_split for Regression Neural Network

We have developed an Artificial Neural Network (ANN), where we split our data into training and testing data with train_test_split. As we want a better and more generalized estimate of our performance scores, we would like to split data with k-fold instead.
Now, we split the data into 70% training and 30% testing data with train_test_split
def splitData(dataframe):
    X = dataframe[Predictors].values
    y = dataframe[TargetVariable].values
    ### Standardization of data ###
    PredictorScaler = StandardScaler()
    TargetVarScaler = StandardScaler()
    # Storing the fit object for later reference
    PredictorScalerFit = PredictorScaler.fit(X)
    TargetVarScalerFit = TargetVarScaler.fit(y)
    # Generating the standardized values of X and y
    X = PredictorScalerFit.transform(X)
    y = TargetVarScalerFit.transform(y)
    # Split the data into training and testing set
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
    return X_train, X_test, y_train, y_test, PredictorScalerFit, TargetVarScalerFit
How do we split our data with k-fold instead?
And are we right in our assumptions that the performance scores will result in better and more generalized estimates if using k-fold cross-validation instead of train_test_split?
Our main
if __name__ == '__main__':
    print('--------------')
    # ...
    dataframe = pd.read_csv("/.../file.csv")
    # Splitting data into training and testing data
    X_train, X_test, y_train, y_test, PredictorScalerFit, TargetVarScalerFit = splitData(dataframe=dataframe)
    # Making the Regression Artificial Neural Network (ANN)
    ann = ANN(X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test, PredictorScalerFit=PredictorScalerFit, TargetVarScalerFit=TargetVarScalerFit)
    # Evaluation of the performance of the Artificial Neural Network (ANN)
    performanceEvaluation(y_test_orig=ann['temp'], y_test_pred=ann['Predicted_temp'])
Our ANN Prediction function
def ANN(X_train, y_train, X_test, y_test, TargetVarScalerFit, PredictorScalerFit):
    # Making the regression Artificial Neural Network (ANN) Model
    model = make_regression_ann()
    # Fitting the ANN to the Training set
    model.fit(X_train, y_train, batch_size=5, epochs=50, verbose=1)
    # Generating Predictions on testing data
    Predictions = model.predict(X_test)
    # Scaling the predicted temp data back to original temp scale
    Predictions = TargetVarScalerFit.inverse_transform(Predictions)
    # Scaling the y_test temp data back to original temp scale
    y_test_orig = TargetVarScalerFit.inverse_transform(y_test)
    # Scaling the test data back to original scale
    Test_Data = PredictorScalerFit.inverse_transform(X_test)
    TestingData = pd.DataFrame(data=Test_Data, columns=Predictors)
    TestingData['temp'] = y_test_orig
    TestingData['Predicted_temp'] = Predictions
    TestingData.head()
    TestingData.to_csv("TestingData.csv")
    return TestingData
Making the regression ann model
def make_regression_ann(initializer='normal', activation='relu', optimizer='adam', loss='mse'):
    # create ANN model
    model = Sequential()
    # Defining the Input layer and FIRST hidden layer, both are same!
    model.add(Dense(units=8, input_dim=7, kernel_initializer=initializer, activation=activation))
    # Defining the Second layer of the model
    # after the first layer we don't have to specify input_dim as keras configures it automatically
    model.add(Dense(units=6, kernel_initializer=initializer, activation=activation))
    # The output neuron is a single fully connected node
    # Since we will be predicting a single number
    model.add(Dense(1, kernel_initializer=initializer))
    # Compiling the model
    model.compile(loss=loss, optimizer=optimizer)
    return model
Our function to generate performance scores
def performanceEvaluation(y_test_orig, y_test_pred):
    # Computing the Mean Absolute Percent Error
    MAPE = mean_absolute_percentage_error(y_test_orig, y_test_pred)
    # Computing R2 Score
    r2 = r2_score(y_test_orig, y_test_pred)
    # Computing Mean Square Error (MSE)
    MSE = mean_squared_error(y_test_orig, y_test_pred)
    # Computing Root Mean Square Error (RMSE)
    RMSE = mean_squared_error(y_test_orig, y_test_pred, squared=False)
    # Computing Mean Absolute Error (MAE)
    MAE = mean_absolute_error(y_test_orig, y_test_pred)
    # Computing Mean Bias Error (MBE)
    MBE = np.mean(y_test_pred - y_test_orig)  # here we calculate MBE
    print('--------------')
    print('The Coefficient of Determination (R2) of ANN model is:', r2)
    print("The Root Mean Squared Error (RMSE) of ANN model is:", RMSE)
    print("The Mean Squared Error (MSE) of ANN model is:", MSE)
    print('The Mean Absolute Percent Error (MAPE) of ANN model is:', MAPE)
    print("The Mean Absolute Error (MAE) of ANN model is:", MAE)
    print("The Mean Bias Error (MBE) of ANN model is:", MBE)
    print('--------------')
    eval_list = [r2, RMSE, MSE, MAPE, MAE, MBE]
    columns = ['Coefficient of Determination (R2)', 'Root Mean Square Error (RMSE)', 'Mean Squared Error (MSE)',
               'Mean Absolute Percent Error (MAPE)', 'Mean Absolute Error (MAE)', 'Mean Bias Error (MBE)']
    dataframe = pd.DataFrame([eval_list], columns=columns)
    return dataframe
Our performance scores
Metric                                Value
Coefficient of Determination (R2)     0.982052940563799
Root Mean Square Error (RMSE)         0.7293977725380798
Mean Squared Error (MSE)              0.5320211105835124
Mean Absolute Percent Error (MAPE)    0.0894734801108027
Mean Absolute Error (MAE)             0.5711224962560332
Mean Bias Error (MBE)                 0.049541171482965995
What we tried to reach our goal
Splitting whole dataset into X (Predictors) , y (TargetVariable)
def splitData(dataframe):
    X = dataframe[Predictors].values
    y = dataframe[TargetVariable].values
    ### Standardization of data ###
    PredictorScaler = StandardScaler()
    TargetVarScaler = StandardScaler()
    # Storing the fit object for later reference
    PredictorScalerFit = PredictorScaler.fit(X)
    TargetVarScalerFit = TargetVarScaler.fit(y)
    # Generating the standardized values of X and y
    X = PredictorScalerFit.transform(X)
    y = TargetVarScalerFit.transform(y)
    return X, y
Utilize ANN
def ANN_test(X,y):
    # Fitting the ANN to the Training set
    model = KerasRegressor(build_fn=make_regression_ann(), epochs=50, batch_size=5)
    cv = KFold(n_splits=10, random_state=1, shuffle=True)
    test = cross_val_score(model, X=X, y=y, cv=cv, scoring="neg_mean_squared_error", n_jobs=1)
    print(test)
    mean = test.mean()
    print(mean)
Error received when using this function:
2021-12-24 16:16:47.909705: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
/Users/yusuf/PycharmProjects/MasterThesis-ArtificialNeuralNetwork/main.py:168: DeprecationWarning: KerasRegressor is deprecated, use Sci-Keras (https://github.com/adriangb/scikeras) instead.
model = KerasRegressor(build_fn=make_regression_ann(), epochs=50, batch_size=5)
2021-12-24 16:16:48.193312: W tensorflow/python/util/util.cc:368] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
[nan nan nan nan nan nan nan nan nan nan]
nan
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/model_selection/_validation.py:372: FitFailedWarning:
10 fits failed out of a total of 10.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.
Below are more details about the failures:
--------------------------------------------------------------------------------
10 fits failed with the following error:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/model_selection/_validation.py", line 681, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/keras/wrappers/scikit_learn.py", line 152, in fit
self.model = self.build_fn(
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/keras/engine/base_layer.py", line 3076, in _split_out_first_arg
raise ValueError(
ValueError: The first argument to `Layer.call` must always be passed.
warnings.warn(some_fits_failed_message, FitFailedWarning)
We are operating on:
Mac Monterey, Version: 12.0.1
Tensorflow Version: 2.7.0
Keras Version: 2.7.0
Libraries
import os
import time
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from pathlib import Path
from sklearn.metrics import accuracy_score, make_scorer, mean_absolute_percentage_error, mean_absolute_error, mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, GridSearchCV, KFold, cross_val_score
from keras.wrappers.scikit_learn import KerasRegressor
from keras.models import Sequential
from keras.layers import Dense
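You need to use KerasRegressor to wrap your Keras model as a scikit-learn estimator, and build_fn must be the function itself (make_regression_ann), not the result of calling it (make_regression_ann()). Passing an already built model is what triggers the ValueError above, because the wrapper calls build_fn again on every fit.
Take a look at example 1 here
from sklearn.model_selection import KFold, cross_val_score
from keras.wrappers.scikit_learn import KerasRegressor
seed = 1  # any fixed integer, for reproducibility
model = KerasRegressor(build_fn=make_regression_ann, epochs=512, batch_size=3)
# shuffle=True is needed for random_state to have an effect
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
scores = cross_val_score(model, X, y, cv=kfold)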
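To make the difference from train_test_split concrete, here is a small pure-Python sketch of what a shuffled k-fold split does. It is not scikit-learn's implementation (the helper name kfold_indices is invented here, and the shuffled order will differ from KFold), but it shows the key property: every sample lands in the validation part exactly once, so the averaged score uses all of the data rather than a single 30% hold-out.

```python
import random

def kfold_indices(n_samples, n_splits, seed=1):
    """Yield (train_idx, val_idx) pairs for a shuffled K-fold split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first (n_samples % n_splits) folds get one extra sample
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

# Hypothetical dataset of 23 rows split into 10 folds
folds = list(kfold_indices(n_samples=23, n_splits=10))
for fold_number, (train_idx, val_idx) in enumerate(folds):
    print(fold_number, "train:", len(train_idx), "validation:", len(val_idx))
```

In the real workflow, the model would be trained on train_idx and scored on val_idx inside the loop, and the per-fold scores averaged, which is what cross_val_score does for you.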

Problem with building an ANN for Iris Dataset

I am new to machine learning. I have been trying to get this code working, but the loss is stuck at 1.12 and is neither increasing nor decreasing. Any help would be appreciated.
import pandas as pd
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
dataset = pd.read_csv('Iris.csv')
# for encoding the labels
from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
dataset["Labels"] = encoder.fit_transform(dataset["Species"])
X = dataset.iloc[:,1:5]
Y = dataset['Labels']
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, random_state=42)
X_train = np.array(X_train).astype(np.float32)
X_test = np.array(X_test).astype(np.float32)
y_train = np.array(y_train).astype(np.float32)
y_test = np.array(y_test).astype(np.float32)
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(8, input_shape=(4,), activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax')
])
opt = tf.keras.optimizers.Adam(0.01)
model.compile(optimizer=opt, loss='mse')
r = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=50)
This is a classification problem in which you have to predict the class of an Iris plant. You have specified the mse loss, which stands for 'Mean Squared Error'. It measures the average squared deviation of the predicted values from the actual values, so a large deviation is penalized more heavily than a small one. This loss is used for regression problems, where you have to predict a continuous value such as price, clicks or sales.
A few suggestions that will help are:
Change the loss to a classification loss function. Cross-entropy is a good choice here: categorical_crossentropy if the labels are one-hot encoded, or sparse_categorical_crossentropy for integer labels such as the ones LabelEncoder produces in your code. Without going into too many details, in classification problems the model outputs a score for a particular sample belonging to each class. The softmax function you used converts these scores to normalized probabilities. The cross-entropy loss ensures that your model is penalized when it gives a high probability to the wrong class.
Try standardizing your data to zero mean and unit variance. This helps the model converge.
You may refer to this article for building a neural network for Iris dataset.
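To see why cross-entropy suits this task, the following pure-Python sketch computes softmax probabilities and the cross-entropy for an integer label. It is only an illustration of the arithmetic, not Keras code; the function names and the raw scores are assumptions made up for the example.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shifted = [s - max(scores) for s in scores]  # shift for numerical stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_cross_entropy(probs, label):
    """Negative log-probability assigned to the true (integer) class."""
    return -math.log(probs[label])

# Hypothetical raw outputs for one flower and the three Iris classes
probs = softmax([2.0, 1.0, 0.1])
loss_if_correct = sparse_cross_entropy(probs, 0)  # true class is the favoured one
loss_if_wrong = sparse_cross_entropy(probs, 2)    # true class got the lowest score
print([round(p, 3) for p in probs])
print(round(loss_if_correct, 3), round(loss_if_wrong, 3))
```

The loss is small when the true class receives a high probability and grows quickly when it receives a low one, which is the penalty described above.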

Tensorflow 2.0: How are metrics computed when the output is sequential?

I have been working with binary sequential inputs and outputs using Tensorflow 2.0, and I've been wondering which approach Tensorflow uses to compute metrics such as recall or accuracy during training in those scenarios.
Each sample fed to my network consists of 60 timesteps, each with 300 features, and thus my expected output is a (60, 1) array of 1s and 0s. Suppose I have 2000 validation samples. When evaluating the validation set for each epoch, does TensorFlow concatenate all of the 2000 samples into a single (2000*60=120000, 1) array and then compare it to the concatenated ground-truth labels, or does it evaluate each of the (60, 1) arrays individually and then return the mean of those values? Is there any way to modify this behavior?
Tensorflow/Keras by default computes the metrics batch-wise for the train data, while it computes the same metrics on ALL the data passed in the validation_data parameter of the fit method.
This means that the metric printed during fitting for the train data is the mean of that score calculated over all the batches. In other words, for the train set Keras evaluates each batch individually and then returns the mean of those values. For validation data it is different: Keras takes all the validation samples and compares them with the "concatenated" ground-truth labels.
To demonstrate this behavior with code I propose a dummy example. I provide a custom callback that computes the accuracy score on ALL the data passed, at the end of each epoch (for train and optionally validation). This helps us understand the behavior of TensorFlow during training.
import numpy as np
import matplotlib.pyplot as plt  # used for the plots further down
from sklearn.metrics import accuracy_score
import tensorflow as tf
from tensorflow.keras.layers import *
from tensorflow.keras.models import *
from tensorflow.keras.callbacks import *

class ACC_custom(tf.keras.callbacks.Callback):
    def __init__(self, train, validation=None):
        super(ACC_custom, self).__init__()
        self.validation = validation
        self.train = train

    def on_epoch_end(self, epoch, logs=None):
        # avoid a mutable default argument; Keras passes its own logs dict
        logs = {} if logs is None else logs
        logs['ACC_score_train'] = float('-inf')
        X_train, y_train = self.train[0], self.train[1]
        # threshold the sigmoid outputs at 0.5 to get 0/1 predictions
        y_pred = (self.model.predict(X_train).ravel()>0.5)+0
        score = accuracy_score(y_train.ravel(), y_pred)
        if (self.validation):
            logs['ACC_score_val'] = float('-inf')
            X_valid, y_valid = self.validation[0], self.validation[1]
            y_val_pred = (self.model.predict(X_valid).ravel()>0.5)+0
            val_score = accuracy_score(y_valid.ravel(), y_val_pred)
            logs['ACC_score_train'] = np.round(score, 5)
            logs['ACC_score_val'] = np.round(val_score, 5)
        else:
            logs['ACC_score_train'] = np.round(score, 5)
create dummy data
x_train = np.random.uniform(0,1, (1000,60,10))
y_train = np.random.randint(0,2, (1000,60,1))
x_val = np.random.uniform(0,1, (500,60,10))
y_val = np.random.randint(0,2, (500,60,1))
fit model
inp = Input(shape=((60,10)), dtype='float32')
x = Dense(32, activation='relu')(inp)
out = Dense(1, activation='sigmoid')(x)
model = Model(inp, out)
es = EarlyStopping(patience=10, verbose=1, min_delta=0.001,
                   monitor='ACC_score_val', mode='max', restore_best_weights=True)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(x_train,y_train, epochs=10, verbose=2,
                    callbacks=[ACC_custom(train=(x_train,y_train),validation=(x_val,y_val)),es],
                    validation_data=(x_val,y_val))
In the graphs below I compare the accuracies computed by our callback with the accuracy computed by Keras.
plt.plot(history.history['ACC_score_train'], label='accuracy_callback_train')
plt.plot(history.history['accuracy'], label='accuracy_default_train')
plt.legend(); plt.title('train accuracy'); plt.show()
plt.plot(history.history['ACC_score_val'], label='accuracy_callback_valid')
plt.plot(history.history['val_accuracy'], label='accuracy_default_valid')
plt.legend(); plt.title('validation accuracy'); plt.show()
As we can see, the accuracy on the train data (first plot) differs between the default method and our callback. This means that the accuracy on the train data is calculated batch-wise.
The validation accuracy (second plot) calculated by our callback and by the default method is the same! This means that the score on the validation data is computed in one shot.
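The two aggregation schemes raised in the question (averaging per-batch scores versus scoring the concatenated data) can give different numbers, which the toy example below illustrates in plain Python. This is only a sketch of the arithmetic with made-up labels; it does not reproduce Keras internals, and during real training the weights also change between batches, which is a further source of difference.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where prediction equals label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Two hypothetical batches of unequal size: (labels, predictions)
batches = [
    ([1, 0, 1, 1], [1, 0, 1, 1]),  # 4 of 4 correct
    ([0, 1], [1, 0]),              # 0 of 2 correct
]

# Scheme 1: score each batch separately, then average the scores
batch_mean = sum(accuracy(t, p) for t, p in batches) / len(batches)

# Scheme 2: concatenate everything, then score once
all_true = [t for labels, _ in batches for t in labels]
all_pred = [p for _, preds in batches for p in preds]
pooled = accuracy(all_true, all_pred)

print("mean of batch accuracies:", batch_mean)
print("accuracy on pooled data: ", round(pooled, 4))
```

With equally sized batches and fixed predictions the two schemes coincide for accuracy; they diverge when batch sizes differ or when the metric is not a simple average over samples.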

What is the correct way to calculate performance metrics when using KFold CV or Stratified CV?

After reading a few tutorials, this is the first time I have built a Keras Deep Learning Model as I am a beginner in machine learning and deep learning. Most of the tutorials use the train-test split to train and test the model. However, I chose to use StratifiedKFold CV. The code is as below.
X = dataset[:,0:80].astype(float)
Y = dataset[:,80]
kfold = StratifiedKFold(n_splits=10,random_state=seed)
for train, test in kfold.split(X, Y):
    # create model
    model = Sequential()
    model.add(Dense())
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='Adam',metrics=['accuracy'])
    model.fit(X[train], Y[train], epochs=100,batch_size=128, verbose=0)
    scores = model.evaluate(X[test], Y[test], verbose=1)
    print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
    cvscores.append(scores[1] * 100)
print("%.2f%% (+/- %.2f%%)" % (numpy.mean(cvscores), numpy.std(cvscores)))
Y[pred]= model.predict(X[test])
acc = accuracy_score(Y[test],Y[pred])
confusion = confusion_matrix(Y[test], Y[pred])
print(confusion)
plot_confusion_matrix(confusion, classes =['No','Yes'],title='Confusion Matrix')
TP= confusion[1,1]
TN= confusion[0,0]
FP= confusion[0,1]
FN= confusion[1,0]
print('Accuracy: ')
print((TP + TN) / float(TP + TN + FP + FN))
print(accuracy_score(Y[test],Y[pred]))
fpr, tpr, thresholds = roc_curve(Y[test], y_pred_prob)
plt.plot(fpr, tpr)
print(roc_auc_score(y_test, y_pred_prob))
y_pred_class = binarize([y_pred_prob], 0.3)[0]
confusion_new = confusion_matrix(Y[test], y_pred_class)
print(confusion_new)
I have understood the theoretical concept of Kfold CV and StratifiedKFoldCV. I have come across What does KFold in python exactly do?, KFolds Cross Validation vs train_test_split, and a few more links. But when I calculate the performance metrics it gives me the following errors.
NameError: name 'pred' is not defined
NameError: name 'y_pred_prob' is not defined
NameError: name 'roc_curve' is not defined
What am I doing wrong here? Why am I getting these errors? How do I fix this?
Thanks.
Here's a way you can try:
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score, confusion_matrix
X = dataset[:,0:80].astype(float)
Y = dataset[:,80]

# define model; build a fresh one per fold so no weights carry over between folds
def build_model():
    model = Sequential()
    model.add(Dense(10))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='Adam',metrics=['accuracy'])
    return model

# create folds
folds = list(StratifiedKFold(n_splits=10, shuffle=True, random_state=1).split(X, Y))

# train model for every fold
for j, (train_idx, val_idx) in enumerate(folds):
    print('\nFold ',j)
    X_train_cv = X[train_idx]
    y_train_cv = Y[train_idx]
    X_valid_cv = X[val_idx]
    y_valid_cv= Y[val_idx]
    model = build_model()
    model.fit(X_train_cv,
              y_train_cv,
              epochs=100,
              batch_size=128,
              validation_data = (X_valid_cv, y_valid_cv),
              verbose=0)
    print(model.evaluate(X_valid_cv, y_valid_cv))
    # check metrics for each fold
    # predict() returns probabilities, so threshold them to get class labels
    pred = (model.predict(X_valid_cv).ravel() > 0.5).astype(int)
    acc = accuracy_score(y_valid_cv, pred)
    confusion = confusion_matrix(y_valid_cv, pred)
    print(confusion)
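The per-fold metrics in the question (TP, TN, FP, FN and the 0.3 threshold) all derive from thresholded probabilities. The sketch below shows that step in plain Python with made-up labels and probabilities; the helper names are hypothetical, and the matrix layout is chosen to match the indexing used in the question (rows are true classes, columns are predicted classes, so confusion[0][1] is FP).

```python
def to_classes(probabilities, threshold=0.5):
    """Label a sample 1 when its predicted probability exceeds the threshold."""
    return [1 if p > threshold else 0 for p in probabilities]

def confusion_2x2(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels 0/1."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

# Hypothetical validation labels and sigmoid outputs for one fold
y_true = [0, 0, 1, 1, 1, 0]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.6, 0.7]

confusion_default = confusion_2x2(y_true, to_classes(y_prob, 0.5))
confusion_low = confusion_2x2(y_true, to_classes(y_prob, 0.3))

(tn, fp), (fn, tp) = confusion_default
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(confusion_default, round(accuracy, 4))
print(confusion_low)
```

Lowering the threshold from 0.5 to 0.3 removes the false negative here at the cost of an extra false positive, which is the trade-off the question's binarize step explores.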

Keras LSTM - Validation Loss Increasing From Epoch #1

I'm currently undertaking my first 'real' DL project of (surprise) predicting stock movements. I know that I'm 1000:1 to make anything useful but I'm enjoying it and want to see it through, I've learnt more in my few weeks of attempting this than I have in the prior 6 months of completing MOOC's.
I'm building an LSTM using Keras to currently predict the next 1 step forward and have attempted the task as both classification (up/down/steady) and now as a regression problem. Both result in a similar roadblock in that my validation loss never improves from epoch #1.
I can get the model to overfit such that training loss approaches zero with MSE (or 100% accuracy if classification), but at no stage does the validation loss decrease. This screams overfitting to my untrained eye so I added varying amounts of dropout but all that does is stifle the learning of the model/training accuracy and shows no improvements on the validation accuracy.
I have attempted to change a significant number of hyperparameters - learning rate, optimiser, batchsize, lookback window, #layers, #units, dropout, #samples, etc, also tried with subset of data and subset of features but I just can't get it to work so I'm very thankful for any help.
Code Below (it's not pretty I know):
# Import saved full dataframe ~ 200 features
import feather
df = feather.read_dataframe('df_feathered')
df.set_index('time', inplace=True)
# Difference the dataset to make stationary
df = df.diff(periods=1, axis=0)
# MAKE LARGE SAMPLE FOR TESTING
df_train = df.loc['2017-3-1':'2017-6-30']
df_val = df.loc['2017-7-1':'2017-8-31']
df_test = df.loc['2017-9-1':'2017-9-30']
# Make x_train, x_val sets by dropping target variable
x_train = df_train.drop('close+1', axis=1)
x_val = df_val.drop('close+1', axis=1)
# Scale the training data first then fit the transform to the test set
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_val)
# scaler = MinMaxScaler(feature_range=(0,1))
# x_train = scaler.fit_transform(df_train1)
# x_test = scaler.transform(df_val1)
# Create y_train, y_test, simply target variable for regression
y_train = df_train['close+1']
y_test = df_val['close+1']
# Define Lookback window for LSTM input
sliding_window = 15
# Convert x_train, x_test, y_train, y_test into 3d array (samples,
# timesteps, features) for LSTM input
dataXtrain = []
for i in range(len(x_train)-sliding_window-1):
    a = x_train[i:(i+sliding_window), 0:(x_train.shape[1])]
    dataXtrain.append(a)
dataXtest = []
for i in range(len(x_test)-sliding_window-1):
    a = x_test[i:(i+sliding_window), 0:(x_test.shape[1])]
    dataXtest.append(a)
dataYtrain = []
for i in range(len(y_train)-sliding_window-1):
    dataYtrain.append(y_train[i + sliding_window])
dataYtest = []
for i in range(len(y_test)-sliding_window-1):
    dataYtest.append(y_test[i + sliding_window])
# Make data the divisible by a variety of batch_sizes for training
# Started at 1000 to not include replaced NaN values
dataXtrain = np.array(dataXtrain[1000:172008])
dataYtrain = np.array(dataYtrain[1000:172008])
dataXtest = np.array(dataXtest[1000:83944])
dataYtest = np.array(dataYtest[1000:83944])
# Checking input shapes
print('dataXtrain size is: {}'.format((dataXtrain).shape))
print('dataXtest size is: {}'.format((dataXtest).shape))
print('dataYtrain size is: {}'.format((dataYtrain).shape))
print('dataYtest size is: {}'.format((dataYtest).shape))
### ACTUAL LSTM MODEL
batch_size = 256
timesteps = dataXtrain.shape[1]
features = dataXtrain.shape[2]
# Model set-up, stacked 4 layer stateful LSTM
model = Sequential()
model.add(LSTM(512, return_sequences=True, stateful=True,
               batch_input_shape=(batch_size, timesteps, features)))
model.add(LSTM(256,stateful=True, return_sequences=True))
model.add(LSTM(256,stateful=True, return_sequences=True))
model.add(LSTM(128,stateful=True))
model.add(Dense(1, activation='linear'))
model.summary()
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.9, patience=5, min_lr=0.000001, verbose=1)
def coeff_determination(y_true, y_pred):
    from keras import backend as K
    SS_res = K.sum(K.square( y_true-y_pred ))
    SS_tot = K.sum(K.square( y_true - K.mean(y_true) ) )
    return ( 1 - SS_res/(SS_tot + K.epsilon()) )
model.compile(loss='mse',
              optimizer='nadam',
              metrics=[coeff_determination,'mse','mae','mape'])
history = model.fit(dataXtrain, dataYtrain,validation_data=(dataXtest, dataYtest),
                    epochs=100,batch_size=batch_size, shuffle=False, verbose=1, callbacks=[reduce_lr])
score = model.evaluate(dataXtest, dataYtest,batch_size=batch_size, verbose=1)
print(score)
predictions = model.predict(dataXtest, batch_size=batch_size)
print(predictions)
import matplotlib.pyplot as plt
%matplotlib inline
#plt.plot(history.history['mean_squared_error'])
#plt.plot(history.history['val_mean_squared_error'])
plt.plot(history.history['coeff_determination'])
plt.plot(history.history['val_coeff_determination'])
#plt.plot(history.history['mean_absolute_error'])
#plt.plot(history.history['mean_absolute_percentage_error'])
#plt.plot(history.history['val_mean_absolute_percentage_error'])
#plt.title("MSE")
plt.ylabel("R2")
plt.xlabel("epoch")
plt.legend(["train", "val"], loc="best")
plt.show()
plt.plot(history.history["loss"][5:])
plt.plot(history.history["val_loss"][5:])
plt.title("model loss")
plt.ylabel("loss")
plt.xlabel("epoch")
plt.legend(["train", "val"], loc="best")
plt.show()
plt.figure(figsize=(20,8))
plt.plot(dataYtest)
plt.plot(predictions)
plt.title("Prediction")
plt.ylabel("Price")
plt.xlabel("Time")
plt.legend(["Truth", "Prediction"], loc="best")
plt.show()
Maybe you should remember that you are predicting stock returns, which very likely cannot be predicted at all. So an increasing val_loss is not overfitting at all. Instead of adding more dropout, maybe you should think about adding more layers to increase the model's power.
Try reducing the learning rate considerably (and remove dropout for now).
Why do you use shuffle=False in the fit() function?
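When debugging a setup like the one in the question, it is worth checking on a tiny example that each input window is paired with the intended target. The following pure-Python sketch uses the same pairing as the question's loops (rows i to i+window-1 as input, the target at row i+window); the function name make_windows and the toy data are assumptions for illustration. It iterates over range(len - window), whereas the question's loops stop one step earlier.

```python
def make_windows(features, targets, window):
    """Pair each run of `window` consecutive rows with the target that follows it."""
    xs, ys = [], []
    for i in range(len(features) - window):
        xs.append(features[i:i + window])  # rows i .. i+window-1
        ys.append(targets[i + window])     # target at row i+window
    return xs, ys

# Toy series: 6 timesteps with a single feature each
features = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6]]
targets = [10, 20, 30, 40, 50, 60]

X_win, y_win = make_windows(features, targets, window=3)
for x, y in zip(X_win, y_win):
    print(x, "->", y)
```

Printing a few pairs like this makes it easy to confirm that the target really is the value you intend to predict and that no window overlaps its own target.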
