I'm trying to use GridSearchCV to optimise the hyperparameters in a custom model built with Keras. My code so far:
https://pastebin.com/ujYJf67c#9suyZ8vM
The model definition:
def build_nn_model(n, hyperparameters, loss, metrics, opt):
    model = keras.Sequential([
        keras.layers.Dense(hyperparameters[0], activation=hyperparameters[1],  # number of outputs to next layer
                           input_shape=[n]),  # number of features
        keras.layers.Dense(hyperparameters[2], activation=hyperparameters[3]),
        keras.layers.Dense(hyperparameters[4], activation=hyperparameters[5]),
        keras.layers.Dense(1)  # 1 output (redshift)
    ])
    model.compile(loss=loss,
                  optimizer=opt,
                  metrics=metrics)
    return model
and the grid search:
optimizer = ['SGD', 'RMSprop', 'Adagrad', 'Adadelta', 'Adam', 'Adamax', 'Nadam']
epochs = [10, 50, 100]
param_grid = dict(epochs=epochs, optimizer=optimizer)
grid = GridSearchCV(estimator=model, param_grid=param_grid, scoring='accuracy', n_jobs=-1, refit='boolean')
grid_result = grid.fit(X_train, y_train)
throws an error:
TypeError: Cannot clone object '<keras.engine.sequential.Sequential object at 0x0000028B8C50C0D0>' (type <class 'keras.engine.sequential.Sequential'>): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' method.
How can I get GridSearchCV to play nicely with the model as it's defined?
I'm assuming you are training a classifier, so you have to wrap it in KerasClassifier:
from scikeras.wrappers import KerasClassifier
...
model = KerasClassifier(build_nn_model)
# Do grid search
Remember to provide, for each of build_nn_model's parameters, either a default value or an entry in the GridSearchCV grid.
For a regression model use KerasRegressor instead.
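For example, here is a minimal sketch of wiring the existing build_nn_model through scikeras's KerasRegressor (not tested; the model__ prefix routes arguments to the build function, the hyperparameters list is only a placeholder, and scoring is switched to neg_mean_squared_error because 'accuracy' is not meaningful for a regression target):

from scikeras.wrappers import KerasRegressor
from sklearn.model_selection import GridSearchCV

# Sketch only: model__ arguments are routed to build_nn_model,
# while epochs/batch_size are handled by the wrapper itself.
model = KerasRegressor(
    model=build_nn_model,
    model__n=X_train.shape[1],
    model__hyperparameters=[64, 'relu', 32, 'relu', 16, 'relu'],  # placeholder sizes/activations
    model__loss='mse',
    model__metrics=['mae'],
    model__opt='adam',
    verbose=0,
)

param_grid = {
    'model__opt': ['SGD', 'RMSprop', 'Adam'],
    'epochs': [10, 50, 100],
}

grid = GridSearchCV(model, param_grid, scoring='neg_mean_squared_error', n_jobs=-1, refit=True)
grid_result = grid.fit(X_train, y_train)
print(grid_result.best_params_)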
Related
We have developed an Artificial Neural Network in Python, and we would like to tune its hyperparameters with GridSearchCV to find the best possible configuration. The goal of our ANN is to predict temperature based on other relevant features, and so far this is the evaluation of the performance of the neural network:
Metric                                    Value
Coefficient of Determination (R2)         0.9808840288506496
Root Mean Square Error (RMSE)             0.7527763482280911
Mean Squared Error (MSE)                  0.5666722304516204
Mean Absolute Percent Error (MAPE)        0.09142692180578049
Mean Absolute Error (MAE)                 0.588041786518511
Mean Bias Error (MBE)                     -0.07293321963266877
As of now, we have no clue how to utilize GridSearchCV correctly, and we are seeking help to move towards a solution that satisfies our goal. We have a function that might work, but we are not able to apply it correctly to our code.
This is the hyperparameter tuning function (GridSearchCV):
def hyperparameterTuning():
    # Listing all the parameters to try
    Parameter_Trials = {'batch_size': [10, 20, 30],
                        'epochs': [10, 20],
                        'Optimizer_trial': ['adam', 'rmsprop']
                        }

    # Creating the regression ANN model
    RegModel = KerasRegressor(make_regression_ann, verbose=0)

    # Creating the Grid search space
    grid_search = GridSearchCV(estimator=RegModel,
                               param_grid=Parameter_Trials,
                               scoring=None,
                               cv=5)

    # Running Grid Search for different parameters
    grid_search.fit(X, y, verbose=1)

    print('### Printing Best parameters ###')
    print(grid_search.best_params_)
Our main function:
if __name__ == '__main__':
    print('--------------')

    dataframe = pd.read_csv("/.../file.csv")

    # Splitting data into training and testing data
    X_train, X_test, y_train, y_test, PredictorScalerFit, TargetVarScalerFit = splitData(dataframe=dataframe)

    # Making the Regression Artificial Neural Network (ANN)
    ann = ANN(X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test, PredictorScalerFit=PredictorScalerFit, TargetVarScalerFit=TargetVarScalerFit)

    # Evaluation of the performance of the Artificial Neural Network (ANN)
    eval = evaluation(y_test_orig=ann['temp'], y_test_pred=ann['Predicted_temp'])
Our function to split data into training and testing data:
def splitData(dataframe):
    X = dataframe[Predictors].values
    y = dataframe[TargetVariable].values

    ### Standardization of data ###
    PredictorScaler = StandardScaler()
    TargetVarScaler = StandardScaler()

    # Storing the fit object for later reference
    PredictorScalerFit = PredictorScaler.fit(X)
    TargetVarScalerFit = TargetVarScaler.fit(y)

    # Generating the standardized values of X and y
    X = PredictorScalerFit.transform(X)
    y = TargetVarScalerFit.transform(y)

    # Split the data into training and testing set
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    return X_train, X_test, y_train, y_test, PredictorScalerFit, TargetVarScalerFit
Our function to fit the model and to utilize the Artificial Neural Network (ANN)
def ANN(X_train, y_train, X_test, y_test, TargetVarScalerFit, PredictorScalerFit):
    model = make_regression_ann()

    # Fitting the ANN to the Training set
    model.fit(X_train, y_train, batch_size=5, epochs=100, verbose=1)

    # Generating Predictions on testing data
    Predictions = model.predict(X_test)

    # Scaling the predicted temp data back to original temp scale
    Predictions = TargetVarScalerFit.inverse_transform(Predictions)

    # Scaling the y_test temp data back to original temp scale
    y_test_orig = TargetVarScalerFit.inverse_transform(y_test)

    # Scaling the test data back to original scale
    Test_Data = PredictorScalerFit.inverse_transform(X_test)

    TestingData = pd.DataFrame(data=Test_Data, columns=Predictors)
    TestingData['temp'] = y_test_orig
    TestingData['Predicted_temp'] = Predictions
    TestingData.head()

    # Computing the absolute percent error
    APE = 100 * (abs(TestingData['temp'] - TestingData['Predicted_temp']) / TestingData['temp'])
    TestingData['APE'] = APE

    # ...
    TestingData = TestingData.round(2)
    TestingData.to_csv("TestingData.csv")

    return TestingData
Our function to make the model of the ANN
def make_regression_ann():
    # Create the ANN model
    model = Sequential()

    # Defining the input layer and FIRST hidden layer (both are the same)
    model.add(Dense(units=8, input_dim=7, kernel_initializer='normal', activation='sigmoid'))

    # Defining the second layer of the model;
    # after the first layer we don't have to specify input_dim, as Keras configures it automatically
    model.add(Dense(units=6, kernel_initializer='normal', activation='sigmoid'))

    # The output neuron is a single fully connected node,
    # since we will be predicting a single number
    model.add(Dense(1, kernel_initializer='normal'))

    # Compiling the model
    model.compile(loss='mean_squared_error', optimizer='adam')

    return model
Our function to evaluate the performance of the ANN
def evaluation(y_test_orig, y_test_pred):
    # Computing the Mean Absolute Percent Error
    MAPE = mean_absolute_percentage_error(y_test_orig, y_test_pred)

    # Computing the R2 Score
    r2 = r2_score(y_test_orig, y_test_pred)

    # Computing the Mean Squared Error (MSE)
    MSE = mean_squared_error(y_test_orig, y_test_pred)

    # Computing the Root Mean Squared Error (RMSE)
    RMSE = mean_squared_error(y_test_orig, y_test_pred, squared=False)

    # Computing the Mean Absolute Error (MAE)
    MAE = mean_absolute_error(y_test_orig, y_test_pred)

    # Computing the Mean Bias Error (MBE)
    MBE = np.mean(y_test_pred - y_test_orig)

    print('--------------')
    print('The Coefficient of Determination (R2) of ANN model is:', r2)
    print("The Root Mean Squared Error (RMSE) of ANN model is:", RMSE)
    print("The Mean Squared Error (MSE) of ANN model is:", MSE)
    print('The Mean Absolute Percent Error (MAPE) of ANN model is:', MAPE)
    print("The Mean Absolute Error (MAE) of ANN model is:", MAE)
    print("The Mean Bias Error (MBE) of ANN model is:", MBE)
    print('--------------')

    eval_list = [r2, RMSE, MSE, MAPE, MAE, MBE]

    columns = ['Coefficient of Determination (R2)', 'Root Mean Square Error (RMSE)', 'Mean Squared Error (MSE)',
               'Mean Absolute Percent Error (MAPE)', 'Mean Absolute Error (MAE)', 'Mean Bias Error (MBE)']

    dataframe = pd.DataFrame([eval_list], columns=columns)
    return dataframe
Your code should work if you update the make_regression_ann function to take as arguments any hyperparameters that you want to optimize, with the exception of the fitting parameters:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_regression

def make_regression_ann(initializer='uniform', activation='relu', optimizer='adam', loss='mse'):
    model = Sequential()
    model.add(Dense(units=8, input_dim=7, kernel_initializer=initializer, activation=activation))
    model.add(Dense(units=6, kernel_initializer=initializer, activation=activation))
    model.add(Dense(1, kernel_initializer=initializer))
    model.compile(loss=loss, optimizer=optimizer)
    return model

param_grid = {
    'initializer': ['normal', 'uniform'],
    'activation': ['relu', 'sigmoid'],
    'optimizer': ['adam', 'rmsprop'],
    'loss': ['mse', 'mae'],
    'batch_size': [32, 64],
    'epochs': [5, 10],
}

grid_search = GridSearchCV(
    estimator=KerasRegressor(make_regression_ann, verbose=0),
    param_grid=param_grid,
    scoring='neg_mean_absolute_percentage_error',
    cv=3,
)

X, y = make_regression(n_features=7, n_samples=100, random_state=42)
grid_search.fit(X, y, verbose=1)
grid_search.best_params_
# {'activation': 'sigmoid',
#  'batch_size': 32,
#  'epochs': 10,
#  'initializer': 'normal',
#  'loss': 'mae',
#  'optimizer': 'adam'}
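Note that batch_size and epochs never appear in make_regression_ann's signature: they are fit-time parameters consumed by the KerasRegressor wrapper itself, which is why they can still be listed in param_grid alongside the build-function arguments.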
The way I recently used GridSearchCV successfully was:
tuned_parameters2 = {'C': [1,10,100,10000], 'max_iter':[5000,10000,50000]}
model2 = GridSearchCV(svm.LinearSVC(), tuned_parameters2)
model2.fit(features, y_train)
So: a separate dictionary with the hyperparameters, then pass your model to GridSearchCV(make_regression_ann, the_hyperparam_dict), and finally fit it with the data.
In your case this approach would require more refactoring; it's up to you to decide whether it's better to feed the ANN to GridSearchCV.
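For concreteness, a rough sketch of that pattern applied here (assuming the KerasRegressor wrapper, since GridSearchCV cannot consume the bare make_regression_ann function; variable names follow the question and this is not tested):

from tensorflow.keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import GridSearchCV

# Sketch only: wrap the build function so GridSearchCV sees a scikit-learn estimator
ann_estimator = KerasRegressor(make_regression_ann, verbose=0)
tuned_parameters = {'batch_size': [10, 20, 30], 'epochs': [10, 20]}

model = GridSearchCV(ann_estimator, tuned_parameters, cv=5)
model.fit(X_train, y_train)
print(model.best_params_)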
For a regression problem I want to compare some metrics, but I am only able to get accuracy from the history, which makes no sense for regression purposes. How can I get other metrics like mean_squared_error and so on?
def create_model(...):
    input_layer = ...
    output_layer = ...
    model = Model(input_layer, output_layer)
    model.compile(loss='mean_squared_error', optimizer=optimizer, metrics=['accuracy'])
    return model
model = KerasRegressor(build_fn=create_model, verbose=0)

batch_size = [1, 2]
epochs = [1, 2]
optimizer = ['Adam', 'sgd']

hypparas = dict(batch_size=batch_size,
                optimizer=optimizer
                )

grid_obj = RandomizedSearchCV(estimator=model,
                              param_distributions=hypparas,
                              n_jobs=1,
                              cv=3,
                              scoring=['explained_variance', 'neg_mean_squared_error', 'r2'],
                              refit='neg_mean_squared_error',
                              return_train_score=True,
                              verbose=2
                              )

grid_result = grid_obj.fit(X_train1, y_train1)

X_train1, X_val1, y_train1, y_val1 = train_test_split(X_train1, y_train1, test_size=0.2, shuffle=False)

grid_best = grid_result.best_estimator_
history = grid_best.fit(X_train1, y_train1,
                        validation_data=(X_val1, y_val1)
                        )
print(history.history.keys())
> dict_keys(['val_loss', 'val_accuracy', 'loss', 'accuracy'])
I have seen https://stackoverflow.com/a/50137577/6761328 to get e.g.
history.history['accuracy']
which works but I can't access mean_squared_error or something else:
history.history['neg_mean_squared_error']
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-473-eb96973bf014> in <module>
----> 1 history.history['neg_mean_squared_error']
KeyError: 'neg_mean_squared_error'
This question is finally a follow-up on How to compare different metrics? as I think this question is the answer to the other one.
In stand-alone Keras (not sure for the scikit-learn wrapper), history.history['loss'] (or val_loss respectively for the validation set) would do the job.
Here, 'loss' and 'val_loss' are keys; give
print(history.history.keys())
to see what keys are available in your case, and you will find among them the required ones for the loss (might even be the same, i.e. 'loss' and 'val_loss').
As a side note, you should remove completely metrics=['accuracy'] from your model compilation - as you correctly point out, accuracy is meaningless in regression settings (you might want to check What function defines accuracy in Keras when the loss is mean squared error (MSE)?).
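To make this concrete, a minimal sketch (standalone Keras; variable names taken from the question, and the exact key names can vary between Keras versions, so always check history.history.keys() first):

# Sketch only: request regression metrics at compile time so they are logged into history
model.compile(loss='mean_squared_error',
              optimizer='adam',
              metrics=['mae', 'mse'])

history = model.fit(X_train1, y_train1,
                    validation_data=(X_val1, y_val1))

print(history.history.keys())
# e.g. dict_keys(['loss', 'mae', 'mse', 'val_loss', 'val_mae', 'val_mse'])
print(history.history['mse'])       # training MSE per epoch
print(history.history['val_mse'])   # validation MSE per epoch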
I am trying to do cross validation on a k-NN classifier, and I am confused about which of the two methods below conducts cross validation correctly.
training_scores = defaultdict(list)
validation_f1_scores = defaultdict(list)
validation_precision_scores = defaultdict(list)
validation_recall_scores = defaultdict(list)
validation_scores = defaultdict(list)
def model_1(seed, X, Y):
    np.random.seed(seed)
    scoring = ['accuracy', 'f1_macro', 'precision_macro', 'recall_macro']
    model = KNeighborsClassifier(n_neighbors=13)
    kfold = StratifiedKFold(n_splits=2, shuffle=True, random_state=seed)
    scores = model_selection.cross_validate(model, X, Y, cv=kfold, scoring=scoring, return_train_score=True)

    print(scores['train_accuracy'])
    training_scores['KNeighbour'].append(scores['train_accuracy'])

    print(scores['test_f1_macro'])
    validation_f1_scores['KNeighbour'].append(scores['test_f1_macro'])

    print(scores['test_precision_macro'])
    validation_precision_scores['KNeighbour'].append(scores['test_precision_macro'])

    print(scores['test_recall_macro'])
    validation_recall_scores['KNeighbour'].append(scores['test_recall_macro'])

    print(scores['test_accuracy'])
    validation_scores['KNeighbour'].append(scores['test_accuracy'])

    print(np.mean(training_scores['KNeighbour']))
    print(np.std(training_scores['KNeighbour']))
    # rest of print statements
It seems that the for loop in the second model is redundant.
def model_2(seed, X, Y):
    np.random.seed(seed)
    scoring = ['accuracy', 'f1_macro', 'precision_macro', 'recall_macro']
    model = KNeighborsClassifier(n_neighbors=13)
    kfold = StratifiedKFold(n_splits=2, shuffle=True, random_state=seed)

    for train, test in kfold.split(X, Y):
        scores = model_selection.cross_validate(model, X[train], Y[train], cv=kfold, scoring=scoring, return_train_score=True)

        print(scores['train_accuracy'])
        training_scores['KNeighbour'].append(scores['train_accuracy'])

        print(scores['test_f1_macro'])
        validation_f1_scores['KNeighbour'].append(scores['test_f1_macro'])

        print(scores['test_precision_macro'])
        validation_precision_scores['KNeighbour'].append(scores['test_precision_macro'])

        print(scores['test_recall_macro'])
        validation_recall_scores['KNeighbour'].append(scores['test_recall_macro'])

        print(scores['test_accuracy'])
        validation_scores['KNeighbour'].append(scores['test_accuracy'])

    print(np.mean(training_scores['KNeighbour']))
    print(np.std(training_scores['KNeighbour']))
    # rest of print statements
I am using StratifiedKFold, and I am not sure whether I need the for loop as in the model_2 function, or whether cross_validate already uses the splits since we pass cv=kfold as an argument.
I am not calling the fit method; is this OK? Does cross_validate call it automatically, or do I need to call fit before calling cross_validate?
Finally, how can I create a confusion matrix? Do I need to create one for each fold, and if yes, how can the final/average confusion matrix be calculated?
The documentation is arguably your best friend in such questions; from the simple example there it should be apparent that you should use neither a for loop nor a call to fit. Adapting the example to use KFold as you do:
from sklearn.model_selection import KFold, cross_validate
from sklearn.datasets import load_boston
from sklearn.tree import DecisionTreeRegressor
X, y = load_boston(return_X_y=True)
n_splits = 5
kf = KFold(n_splits=n_splits, shuffle=True)
model = DecisionTreeRegressor()
scoring=('r2', 'neg_mean_squared_error')
cv_results = cross_validate(model, X, y, cv=kf, scoring=scoring, return_train_score=False)
cv_results
Result:
{'fit_time': array([0.00901461, 0.00563478, 0.00539804, 0.00529385, 0.00638533]),
'score_time': array([0.00132656, 0.00214362, 0.00134897, 0.00134444, 0.00176597]),
'test_neg_mean_squared_error': array([-11.15872549, -30.1549505 , -25.51841584, -16.39346535,
-15.63425743]),
'test_r2': array([0.7765484 , 0.68106786, 0.73327311, 0.83008371, 0.79572363])}
how can I create confusion matrix? Do I need to create it for each fold
No one can tell you if you need to create a confusion matrix for each fold - it is your choice. If you choose to do so, it may be better to skip cross_validate and do the procedure "manually" - see my answer in How to display confusion matrix and report (recall, precision, fmeasure) for each cross validation fold.
if yes, how can the final/average confusion matrix be calculated?
There is no "final/average" confusion matrix; if you want to calculate anything further than the k ones (one for each k-fold) as described in the linked answer, you need to have available a separate validation set...
model_1 is correct.
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_validate.html
cross_validate(estimator, X, y=None, groups=None, scoring=None, cv='warn', n_jobs=None, verbose=0, fit_params=None, pre_dispatch='2*n_jobs', return_train_score='warn', return_estimator=False, error_score='raise-deprecating')
where
estimator is an object implementing 'fit'. It will be called to fit the model on the train folds.
cv is a cross-validation generator that is used to generate the train and test splits.
If you follow the example in the sklearn docs
cv_results = cross_validate(lasso, X, y, cv=3, return_train_score=False)
cv_results['test_score']
array([0.33150734, 0.08022311, 0.03531764])
You can see that the model lasso is fitted 3 times, once for each fold, on the train splits, and also validated 3 times on the test splits. You can see that the test scores on the validation data are reported.
Cross validation of Keras models
Keras provides a wrapper which makes Keras models compatible with sklearn's cross_validate method. You have to wrap the Keras model using KerasClassifier.
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import KFold, cross_validate
from keras.models import Sequential
from keras.layers import Dense
import numpy as np

def get_model():
    model = Sequential()
    model.add(Dense(2, input_dim=2, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

model = KerasClassifier(build_fn=get_model, epochs=10, batch_size=8, verbose=0)
kf = KFold(n_splits=3, shuffle=True)

X = np.random.rand(10, 2)
y = np.random.rand(10, 1)

cv_results = cross_validate(model, X, y, cv=kf, return_train_score=False)
print(cv_results)
I am trying to define a custom metric in Keras that takes into account sample weights. When fitting the model I use the sample weights as follows:
training_history = model.fit(
    train_data,
    train_labels,
    sample_weight=train_weights,
    epochs=num_epochs,
    batch_size=128,
    validation_data=(validation_data, validation_labels, validation_weights),
)
An example of a custom metric I am using is the AUC (area under the ROC curve), which I defined as follows:
from keras import backend as K
import tensorflow as tf
def auc(true_labels, predictions, weights=None):
    auc = tf.metrics.auc(true_labels, predictions, weights=weights)[1]
    K.get_session().run(tf.local_variables_initializer())
    return auc
and I use this metric when compiling the model:
model.compile(
    optimizer=optimizer,
    loss='binary_crossentropy',
    metrics=['accuracy', auc]
)
But as far as I can tell, the metric does not take the sample weights into account. In fact, I verified this by comparing the metric value reported during training with the value I compute myself from the model output and the sample weights, and the two are very different. How would I define the auc metric shown above so that it takes the sample weights into account?
You could wrap your metric with another function that takes sample_weights as an argument:
def auc(weights):
    def metric(true_labels, predictions):
        auc = tf.metrics.auc(true_labels, predictions, weights=weights)[1]
        K.get_session().run(tf.local_variables_initializer())
        return auc
    return metric
And then define an extra input placeholder that will receive the sample weights:
sample_weights = Input(shape=(1,))
Your model can then be compiled as follows:
model.compile(
    optimizer=optimizer,
    loss='binary_crossentropy',
    metrics=['accuracy', auc(sample_weights)]
)
NOTE: Not tested.
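As a side note, if you are on a recent TF2-style Keras, the compile argument weighted_metrics applies the sample_weight passed to fit to the listed metrics, which may avoid the closure/placeholder approach entirely. A minimal sketch (an alternative, not the approach above; variable names taken from the question):

import tensorflow as tf

# Sketch only: weighted_metrics are evaluated using the sample_weight given to fit()
model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    weighted_metrics=[tf.keras.metrics.AUC(name='auc')],
)
model.fit(train_data, train_labels,
          sample_weight=train_weights,
          validation_data=(validation_data, validation_labels, validation_weights))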
I am using a OneVsRestClassifier on an MLPClassifier for classifying text data, where X is a set of questions in a pd.DataFrame and Y consists of multi-label, multi-class strings. Please see the code snippets below:
text_clf = Pipeline([('scale', StandardScaler(with_mean=False)),
                     ('clf', OneVsRestClassifier(MLPClassifier(learning_rate='adaptive', solver='lbfgs', random_state=9000)))])

parameters = {'clf__alpha': [10.0 ** ~ np.arange(1, 7).any()],
              'clf__hidden_layer_sizes': [(100,), (50,)],
              'clf__max_iter': [1000, 500],
              'clf__activation': ('relu', 'tanh')}

grid = GridSearchCV(text_clf, parameters, cv=3, n_jobs=-1, scoring='accuracy')

with parallel_backend('threading'):
    grid.fit(X, Y)
I am getting the following error
ValueError: Invalid parameter activation for estimator OneVsRestClassifier(estimator=MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
beta_2=0.999, early_stopping=False, epsilon=1e-08,
hidden_layer_sizes=(100,), learning_rate='adaptive',
learning_rate_init=0.001, max_iter=200, momentum=0.9,
n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
random_state=9000, shuffle=True, solver='lbfgs', tol=0.0001,
validation_fraction=0.1, verbose=False, warm_start=False),
n_jobs=None). Check the list of available parameters with `estimator.get_params().keys()`.
As per my understanding, MLPClassifier supports multi-label classification. Is the error indicating that the parameters need to be re-examined? If so, can anybody please give a clue on where to make changes in the parameters?
Any help would really be appreciated.
Your MLPClassifier is nested inside OneVsRestClassifier as its estimator.
In other words, parameters should specify that alpha, hidden_layer_sizes, etc. are meant for the nested estimator, not for OneVsRestClassifier itself.
Changing your parameters as follows should do the job:
parameters = {'clf__estimator__alpha': [10.0 ** ~ np.arange(1, 7).any()],
              'clf__estimator__hidden_layer_sizes': [(100,), (50,)],
              'clf__estimator__max_iter': [1000, 500],
              'clf__estimator__activation': ('relu', 'tanh')}
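If in doubt about which prefixed names are valid, you can follow the hint in the error message itself and inspect the pipeline's parameters; a quick sketch:

# Sketch only: list the parameter names GridSearchCV will accept for this pipeline
for name in sorted(text_clf.get_params().keys()):
    if name.startswith('clf__estimator__'):
        print(name)
# expected entries include e.g. clf__estimator__alpha, clf__estimator__activation, ...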