I am an electrical engineer and I am looking for a solution to calculate the DC current of a permanent synchronous motor. So I decided to check the ANN solutions with Keras and so on.Long story short, I'll show you a screenshot of some measured signals.
The first 5 signals are the measured signals. The last one is the DC current, which I will estimate. Here the value was recorded with the help of a current clamp. Okay, I started building a model in Python and tried some things that I assume will increase the accuracy of the model. But after all that, I am not getting that good results from the model and my hope is that maybe I am choosing wrong parameters or not an ideal model for this purpose.
Here is my code:
import numpy as np
from keras.layers import Dense, LSTM
from keras.models import Sequential
from keras.callbacks import EarlyStopping
import pandas as pd
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from matplotlib import pyplot as plt
import seaborn as sns
# Import input (x) and output (y) data, and asign these to df1 and df1
df = pd.read_csv('train_data.csv')
df = df[['rpm','iq','uq','udc','idc']]
X = df[df.columns[:-1]]
Y = df.idc
plt.figure()
sns.heatmap(df.corr(),annot=True)
plt.show()
# Split the data into input (x) training and testing data, and ouput (y) training and testing data,
# with training data being 80% of the data, and testing data being the remaining 20% of the data
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2)#, shuffle=True)
# Scale both training and testing input data
X_train = preprocessing.maxabs_scale(X_train)
X_test = preprocessing.maxabs_scale(X_test)
model = Sequential()
model.add(Dense(4, input_shape=(4,)))
model.add(Dense(4, input_shape=(4,)))
model.add(Dense(1, input_shape=(4,)))
model.compile(optimizer="adam", loss="msle", metrics=['mean_squared_logarithmic_error','accuracy'])
# Pass several parameters to 'EarlyStopping' function and assign it to 'earlystopper'
earlystopper = EarlyStopping(monitor='val_loss', min_delta=0, patience=15, verbose=1, mode='auto')
model.summary()
history = model.fit(X_train, y_train, epochs = 2000, validation_split = 0.3, verbose = 2, callbacks = [earlystopper])
# Runs model (the one with the activation function, although this doesn't really matter as they perform the same)
# with its current weights on the training and testing data
y_train_pred = model.predict(X_train)
y_test_pred = model.predict(X_test)
# Calculates and prints r2 score of training and testing data
print("The R2 score on the Train set is:\t{:0.3f}".format(r2_score(y_train, y_train_pred)))
print("The R2 score on the Test set is:\t{:0.3f}".format(r2_score(y_test, y_test_pred)))
df = pd.read_csv('test_two_data.csv')
df = df[['rpm','iq','uq','udc','idc']]
X = df[df.columns[:-1]]
Y = df.idc
X_validate = preprocessing.maxabs_scale(X)
y_pred = model.predict(X_validate)
plt.plot(Y)
plt.plot(y_pred)
plt.show()
(weight_0,bias_0) = model.layers[0].get_weights()
(weight_1,bias_1) = model.layers[1].get_weights()
One limitation is that I can't use LSTM layers or other complex algorithms because I need to implement the trained model in a microcontroller on a motor application later.
I guess you could find some words for me to make my model a little better in accuracy.
At the end here is a figure where I show you the worse prediction performance. Orange is the prediction and blue is the measured current.
The training dataset was this one.
The correlation between the individual values can be found here. Since the values of id and ud have no correlation to idc, I decided to delete them.
The most important thing to keep in mind when trying to improve the accuracy of the model is ALWAYS Normalise the input data which basically means rescaling real-valued numeric attributes into the range 0 and 1. I am not able to understand the way you are providing the training data to the model. Could you please explain that. It would be better in understanding and identifying the scope of higher accuracy.
Now if we talk about parameters, I would suggest you the addition of a Tuning Algorithm for the parameters to get the optimized value of each parameter.
It is always a good parctice to include hidden layers which could provide better feature extract.
Related
I was training a model that contains 8 features that allows us to predict the probability of a room been sold.
Region: The region the room belongs to (an integer, taking value between 1 and 10)
Date:The date of stay (an integer between 1‐365, here we consider only one‐day
request)
Weekday: Day of week (an integer between 1‐7)
Apartment: Whether the room is a whole apartment (1) or just a room (0)
#beds:The number of beds in the room (an integer between 1‐4)
Review: Average review of the seller (a continuous variable between 1 and 5)
Pic Quality: Quality of the picture of the room (a continuous variable between 0 and 1)
Price: he historic posted price of the room (a continuous variable)
Accept:Whether this post gets accepted (someone took it, 1) or not (0) in the end
Column Accept is the "y". Hence, this is a binary classification.
We have plot the data and some of the data were skewed so we applied power transform.
We tried a neural network, ExtraTrees, XGBoost, Gradient boost, Random forest. They all gave about 0.77 AUC. However, when we tried them on the test set, the AUC dropped to 0.55 with a precision of 27%.
I am not sure where when wrong but my thinking was that the reason may due to the mixing of discrete and continuous data. Especially some of them are either 0 or 1.
Can anyone help?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import OneHotEncoder
import warnings
warnings.filterwarnings('ignore')
df_train = pd.read_csv('case2_training.csv')
X, y = df_train.iloc[:, 1:-1], df_train.iloc[:, -1]
y = y.astype(np.float32)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)
from sklearn.preprocessing import PowerTransformer
pt = PowerTransformer()
transform_list = ['Pic Quality', 'Review', 'Price']
X_train[transform_list] = pt.fit_transform(X_train[transform_list])
X_test[transform_list] = pt.transform(X_test[transform_list])
for i in transform_list:
df = X_train[i]
ax = df.plot.hist()
ax.set_title(i)
plt.show()
# Normalization
sc = MinMaxScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
X_train = X_train.astype(np.float32)
X_test = X_test.astype(np.float32)
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(random_state=123, n_estimators=50)
clf.fit(X_train,y_train)
yhat = clf.predict_proba(X_test)
# AUC metric
train_accuracy = roc_auc_score(y_test, yhat[:,-1])
print("AUC",train_accuracy)
from sklearn.ensemble import GradientBoostingClassifier
clf = GradientBoostingClassifier(random_state=123, n_estimators=50)
clf.fit(X_train,y_train)
yhat = clf.predict_proba(X_test)
# AUC metric
train_accuracy = roc_auc_score(y_test, yhat[:,-1])
print("AUC",train_accuracy)
from torch import nn
from skorch import NeuralNetBinaryClassifier
import torch
model = nn.Sequential(
nn.Linear(8,64),
nn.BatchNorm1d(64),
nn.GELU(),
nn.Linear(64,32),
nn.BatchNorm1d(32),
nn.GELU(),
nn.Linear(32,16),
nn.BatchNorm1d(16),
nn.GELU(),
nn.Linear(16,1),
# nn.Sigmoid()
)
net = NeuralNetBinaryClassifier(
model,
max_epochs=100,
lr=0.1,
# Shuffle training data on each epoch
optimizer=torch.optim.Adam,
iterator_train__shuffle=True,
)
net.fit(X_train, y_train)
from xgboost.sklearn import XGBClassifier
clf = XGBClassifier(silent=0,
learning_rate=0.01,
min_child_weight=1,
max_depth=6,
objective='binary:logistic',
n_estimators=500,
seed=1000)
clf.fit(X_train,y_train)
yhat = clf.predict_proba(X_test)
# AUC metric
train_accuracy = roc_auc_score(y_test, yhat[:,-1])
print("AUC",train_accuracy)
Here is an attachment of a screenshot of the data.
Sample data
This is the fundamental first step of Data Analytics. You need to do two things here:
Data understanding - do the data fields in their current format make sense (data types, value range etc.)
Data preparation - what should I do to update these data fields before passing them to our model? Also which inputs do you think will be useful for your model and which will provide little benefit? Are there outliers I need to consider/handle?
A good book if you're starting in the field of data analytics is Fundamentals of Machine Learning for Predictive Data Analytics (I have no affiliation with this book).
Looking at your dataset there's a couple of things you could try to see how it influences your prediction results:
Unless region order is actually ranked in importance/value I would change this to a one hot encoded feature, you can do this in sklearn. Otherwise you run the risk of your model thinking that regions with a higher number (say 10) are more important than regions with a lower value (say 1).
You could attempt to normalise certain categories if they are much larger than some of your other data fields Why Data Normalization is necessary for Machine Learning models
Consider looking at the Kaggle competition House Prices: Advanced Regression Techniques. It's doing a similar thing to what you're attempting to do, and it might have some pointers for how you should approach the problem in the Notebooks and Discussion tabs.
Without deeply exploring all the data you are using it is hard to say for certain what is causing the drop in accuracy (or AUC) when moving from your training set to the testing set. It is unlikely to be caused by the mixed discrete/continuous data.
The drop just suggests that your models are over-fitting to your training data (and therefore not transferring well). This could be caused by too many learned parameters (given the amount of data you have)--more often a problem with neural networks than with some of the other methods you mentioned. Or, the problem could be with the way the data was split into training/testing. If the distribution of the data has a significant difference (that's maybe not obvious) then you wouldn't expect the testing performance to be as good. If it were me, I'd look carefully at how the data was split into training/testing (assuming you have a reasonably large set of data). You may try repeating your experiments with a number of random training/testing splits (search k-fold cross validation if you're not familiar with it).
your model is overfit. try to make a simple model first and use a lower parameter value. for tree-based classification, scaling does not have any impact on the model.
I am new to machine learning. I have been trying to get this code working but the loss is stuck as 1.12 and is neither increasing or decreasing. Any help would be appreciated.
import pandas as pd
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
dataset = pd.read_csv('Iris.csv')
#for rncoding label
from sklearn.preprocessing import LabelEncoder
encoder = LabelEncoder()
dataset["Labels"] = encoder.fit_transform(dataset["Species"])
X = dataset.iloc[:,1:5]
Y = dataset['Labels']
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, random_state=42)
X_train = np.array(X_train).astype(np.float32)
X_test = np.array(X_test).astype(np.float32)
y_train = np.array(y_train).astype(np.float32)
y_test = np.array(y_test).astype(np.float32)
model = tf.keras.models.Sequential([
tf.keras.layers.Dense(8, input_shape=(4,), activation='relu'),
tf.keras.layers.Dense(3, activation='softmax')
])
opt = tf.keras.optimizers.Adam(0.01)
model.compile(optimizer=opt, loss='mse')
r = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=50)
This is a classification problem where you have to predict the class of Iris plant (source). You have specified mse loss which stands for 'Mean Squared Error'. It measures the average deviation of predicted values from actual values. The square ensures you penalize a large deviation higher than a small deviation. This loss is used for regression problems when you have to predict a continuous value like price, clicks, sales etc.
A few suggestions that will help are:
Change the loss to a classification loss function. categorical_cross_entropy is a good choice here. Without going into too many details, in classification problems model outputs the score of a particular sample belonging to a class. The softmax function used by you converts these scores to normalized probabilities. The cross-entropy loss ensures that your model is penalized when it gives a high probability to the wrong class
Try standardizing your data with 0 mean and unit variance. This helps the model convergence.
You may refer to this article for building a neural network for Iris dataset.
I am currently working on a model that reads structured data and determines if someone has a disease. I think the issue is the data is not being split between training and testing data. I am unaware of how I would be able to do that.
I am not sure what to try.
import pandas as pd
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
import seaborn as sns
from sklearn.tree import DecisionTreeClassifier
heart_data = pd.read_csv('cardio_train.csv')
heart_data.head()
heart_data.shape
heart_data.describe()
heart_data.isnull().sum()
heart_data_columns = heart_data.columns
predictors = heart_data[heart_data_columns[heart_data_columns != 'target']] # all columns except Breast Cancer
target = heart_data['target'] # Breast Cancer column
#This function returns the first n rows for the object based on position. It is useful for quickly testing if your object has the right type
predictors.head()
target.head()
#normalize the data by subtracting the mean and dividing by the standard deviation.
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()
n_cols = predictors_norm.shape[1] # number of predictors
def regression_model():
# create model
model = Sequential()
#inputs
model.add(Dense(50, activation='relu', input_shape=(n_cols,)))
model.add(Dense(50, activation='relu')) # activation function
model.add(Dense(1))
# compile model
model.compile(optimizer='adam', loss='mean_squared_error')
#loss measures the results and figures out how bad it did. Optimizer generates next guess.
return model
# build the model
model = regression_model()
print (model)
# fit the model
history=model.fit(predictors_norm, target, validation_split=0.3, epochs=10, verbose=2)
#Decision Tree
print ("Processing Decision Tree")
dtc = DecisionTreeClassifier()
dtc.fit(predictors_norm,target)
print("Decision Tree Test Accuracy {:.2f}%".format(dtc.score(predictors_norm, target)*100))
#Support Vector Machine
print ("Processing Support Vector Machine")
svm = SVC(random_state = 1)
svm.fit(predictors_norm, target)
print("Test Accuracy of SVM Algorithm: {:.2f}%".format(svm.score(predictors_norm,target)*100))
#Random Forest
print ("Processing Random Forest")
rf = RandomForestClassifier(n_estimators = 1000, random_state = 1)
rf.fit(predictors_norm, target)
print("Random Forest Algorithm Accuracy Score : {:.2f}%".format(rf.score(predictors_norm,target)*100))
The message i am getting is this
Decision Tree Test Accuracy 100.00%
However, support vector machine is getting 73.37%
You are evaluating your model on the same data as when you trained it : you are probably overfitting. To overcome this, you must separate the data into two parts, one for learning, one for testing :
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(predictors, target, test_size=0.2)
Then, learn your model with the train dataset and evaluate it on the test dataset :
dtc = DecisionTreeClassifier()
dtc.fit(x_train, y_train)
accuracy = dtc.score(x_test, y_test) * 100
print(f"Decision Tree test accuracy : {accuracy} %.")
I am learning to do classification for Cover Type data for 7 classes. I train my model with GradientBoostingClassifier from scikit-learn. When I try to plot my loss function this goes like this:
Is this kind of plot shows me that my model suffers from high variance? If yes, what should I do? And I don't know why in the middle of iterations 200 until 500, the plot is shaped like a rectangle.
(EDIT)
To edit this post, I'm not sure what's wrong with my code becaue I just used the regular code to fit the training data. I'm using jupyter notebook. So I'm just going to provide the code
Y = train["Cover_Type"]
X = train.drop({"Cover_Type"}, axis=1)
#split training data dan cross validation
from sklearn.model_selection import train_test_split
X_train, X_val, Y_train, Y_val = train_test_split(X,Y,test_size=0.3,random_state=42)
from sklearn.metrics import mean_squared_error
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingClassifier
params = {'n_estimators': 1000,'learning_rate': 0.3, 'max_features' : 'sqrt'}
dtree=GradientBoostingClassifier(**params)
dtree.fit(X_train,Y_train)
#mau lihat F1-Score
from sklearn.metrics import f1_score
Y_pred = dtree.predict(X_val) #prediksi data cross validation menggunakan model tadi
print Y_pred
score = f1_score(Y_val, Y_pred, average="micro")
print("Gradient Boosting Tree F1-score: "+str(score)) # I got 0.86 F1-Score
import matplotlib.pyplot as plt
# Plot training deviance
# compute test set deviance
val_score = np.zeros((params['n_estimators'],), dtype=np.float64)
for i, Y_pred in enumerate(dtree.staged_predict(X_val)):
val_score[i] = dtree.loss_(Y_val, Y_pred.reshape(-1, 1))
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.title('Deviance')
plt.plot(np.arange(params['n_estimators']) + 1, dtree.train_score_, 'b-',
label='Training Set Deviance')
plt.plot(np.arange(params['n_estimators']) + 1, val_score, 'r-',
label='Validation Set Deviance')
plt.legend(loc='upper right')
plt.xlabel('Boosting Iterations')
plt.ylabel('Deviance')
There are several issues that I will explain them one by one, also I have added the correct code for your example.
staged_predict(X) method shall NOT be used
As staged_predict(X) outputs the predicted class instead of predicted probabilities, it is not correct to use that.
One can (where the context accept) use staged_decision_function(X) method and pass the computed decisions at each stage to the model.loss_ attribute. But in this example, it does not work (the loss based on staged decision increases while the loss decreases).
You should use staged_predict_proba(X) with cross entropy loss
you should use staged_predict_proba(X)
you also need to define a function that calculate the cross entropy loss at each stage.
I have provided the code below. Note that I set the verbosity to 2, and then you can see that the sklearn training loss at each stage is the same as our loss (as a sanity check that our approach works correctly).
Why you have big jumps
I think the reason is that GBC becomes very confident and then predict a label is 1 (as an example) with probability one, while it is not correct (for example the label is 2). This creates big jumps (as cross entropy goes to infinity). In such a scenario you should change your GBC parameters.
The code and the Plot are given below
The code is:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_covtype
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
def _cross_entropy_like_loss(model, input_data, targets, num_estimators):
loss = np.zeros((num_estimators, 1))
for index, predict in enumerate(model.staged_predict_proba(input_data)):
loss[index, :] = -np.sum(np.log([predict[sample_num, class_num-1]
for sample_num, class_num in enumerate(targets)]))
print(f'ce loss {index}:{loss[index, :]}')
return loss
covtype = fetch_covtype()
X = covtype.data
Y = covtype.target
n_estimators = 10
X_train, X_val, Y_train, Y_val = train_test_split(X, Y, test_size=0.3, random_state=42)
clf = GradientBoostingClassifier(n_estimators=n_estimators, learning_rate=0.3, verbose=2 )
clf.fit(X_train, Y_train)
tr_loss_ce = _cross_entropy_like_loss(clf, X_train, Y_train, n_estimators)
test_loss_ce = _cross_entropy_like_loss(clf, X_val, Y_val, n_estimators)
plt.figure()
plt.plot(np.arange(n_estimators) + 1, tr_loss_ce, '-r', label='training_loss_ce')
plt.plot(np.arange(n_estimators) + 1, test_loss_ce, '-b', label='val_loss_ce')
plt.ylabel('Error')
plt.xlabel('num_components')
plt.legend(loc='upper right')
The output of console is like below, from which you can easily verify the approach is correct.
Iter Train Loss Remaining Time
1 482434.6631 1.04m
2 398501.7223 55.56s
3 351391.6893 48.51s
4 322290.3230 41.60s
5 301887.1735 34.65s
6 287438.7801 27.72s
7 276109.2008 20.82s
8 268089.2418 13.84s
9 261372.6689 6.93s
10 256096.1205 0.00s
ce loss 0:[ 482434.6630936]
ce loss 1:[ 398501.72228276]
ce loss 2:[ 351391.68933547]
ce loss 3:[ 322290.32300604]
ce loss 4:[ 301887.17346783]
ce loss 5:[ 287438.7801033]
ce loss 6:[ 276109.20077844]
ce loss 7:[ 268089.2418214]
ce loss 8:[ 261372.66892149]
ce loss 9:[ 256096.1205235]
The plot is here.
It seems like you have several issues. It's hard to say for sure, because you don't provide any code.
Does my model suffer from high variance?
First, your model is overfitting from the start. You can tell that this is the case since your validation loss is increasing although your training is decreasing. What's interesting is that your validation loss is increasing from the very start, which suggests that your model is not working. So to answer your question, yes, it suffers from high variance.
What should I do?
Are you sure there is a trend in your data? The fact that the validation increases from the very start hints that either this model does not apply to your data at all, that your data does not have a trend, or that you have issues with your code. Perhaps try other models, and make sure your code is correct. Again, it's hard to say without a minimal example.
The strange rectangle
This looks strange. Either there is an issue with your data in the validation set (because this impact does not occur on the validation set) or you just have an issue with your code. If you provide a sample, we could probably help you more.
I want the classifier to run faster and stop early if the patience reaches the number I set. In the following code it does 10 iterations of fitting the model.
import numpy
import pandas
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.wrappers.scikit_learn import KerasClassifier
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras.constraints import maxnorm
from keras.optimizers import SGD
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)
# load dataset
dataframe = pandas.read_csv("sonar.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:60].astype(float)
Y = dataset[:,60]
# encode class values as integers
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
calls=[EarlyStopping(monitor='acc', patience=10), ModelCheckpoint('C:/Users/Nick/Data Science/model', monitor='acc', save_best_only=True, mode='auto', period=1)]
def create_baseline():
# create model
model = Sequential()
model.add(Dropout(0.2, input_shape=(33,)))
model.add(Dense(33, init='normal', activation='relu', W_constraint=maxnorm(3)))
model.add(Dense(16, init='normal', activation='relu', W_constraint=maxnorm(3)))
model.add(Dense(122, init='normal', activation='softmax'))
# Compile model
sgd = SGD(lr=0.1, momentum=0.8, decay=0.0, nesterov=False)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
return model
numpy.random.seed(seed)
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(build_fn=create_baseline, nb_epoch=300, batch_size=16, verbose=0, callbacks=calls)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold)
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
Here is the resulting error-
RuntimeError: Cannot clone object <keras.wrappers.scikit_learn.KerasClassifier object at 0x000000001D691438>, as the constructor does not seem to set parameter callbacks
I changed the cross_val_score in the following-
numpy.random.seed(seed)
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasClassifier(build_fn=create_baseline, nb_epoch=300, batch_size=16, verbose=0, callbacks=calls)))
pipeline = Pipeline(estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold, fit_params={'callbacks':calls})
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))
and now I get this error-
ValueError: need more than 1 value to unpack
This code came from here. The code is by far the most accurate I've used so far. The problem is that there is no defined model.fit() anywhere in the code. It also takes forever to fit. The fit() operation occurs at the results = cross_val_score(...) and there's no parameters to throw a callback in there.
How do I go about doing this?
Also, how do I run the model trained on a test set?
I need to be able to save the trained model for later use...
Reading from here, which is the source code of KerasClassifier, you can pass it the arguments of fit and they should be used.
I don't have your dataset so I cannot test it, but you can tell me if this works and if not I will try and adapt the solution. Change this line :
estimators.append(('mlp', KerasClassifier(build_fn=create_baseline, nb_epoch=300, batch_size=16, verbose=0, callbacks=[...your_callbacks...])))
A small explaination of what's happening : KerasClassifier is taking all the possibles arguments for fit, predict, score and uses them accordingly when each method is called. They made a function that filters the arguments that should go to each of the above functions that can be called in the pipeline.
I guess there are several fit and predict calls inside the StratifiedKFold step to train on different splits everytime.
The reason why it takes forever to fit and it fits 10 times is because one fit is doing 300 epochs, as you asked. So the KFold is repeating this step over the different folds :
calls fit with all the parameters given to KerasClassifier (300 epochs and batch size = 16). It's training on 9/10 of your data and using 1/10 as validation.
EDIT :
Ok, so I took the time to download the dataset and try your code... First of all you need to correct a "few" things in your network :
your input have a 60 features. You clearly show it in your data prep :
X = dataset[:,:60].astype(float)
so why would you have this :
model.add(Dropout(0.2, input_shape=(33,)))
please change to :
model.add(Dropout(0.2, input_shape=(60,)))
About your targets/labels. You changed the objective from the original code (binary_crossentropy) to categorical_crossentropy. But you didn't change your Y array. So either do this in your data preparation :
from keras.utils.np_utils import to_categorical
encoded_Y = to_categorical(encoder.transform(Y))
or change your objective back to binary_crossentropy.
Now the network's output size : 122 on the last dense layer? your dataset obviously has 2 categories so why are you trying to output 122 classes? it won't match the target. Please change back your last layer to :
model.add(Dense(2, init='normal', activation='softmax'))
if you choose to use categorical_crossentropy, or
model.add(Dense(1, init='normal', activation='sigmoid'))
if you go back to binary_crossentropy.
So now that your network compiles, I could start to troubleshout.
here is your solution
So now I could get the real error message. It turns out that when you feed fit_params=whatever in the cross_val_score() function, you are feeding those parameters to a pipeline. In order to know to which part of the pipeline you want to send those parameters you have to specify it like this :
fit_params={'mlp__callbacks':calls}
Your error was saying that the process couldn't unpack 'callbacks'.split('__', 1) into 2 values. It was actually looking for the name of the pipeline's step to apply this to.
It should be working now :)
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold, fit_params={'mlp__callbacks':calls})
BUT, you should be aware of what's happening here... the cross validation actually calls the create_baseline() function to recreate the model from scratch 10 times an trains it 10 times on different parts of the dataset. So it's not doing epochs as you were saying, it's doing 300 epochs 10 times.
What is also happening as a consequence of using this tool : since the models are always differents, it means the fit() method is applied 10 times on different models, therefore, the callbacks are also applied 10 different times and the files saved by ModelCheckpoint() get overriden and you find yourself only with the best model of the last run.
This is intrinsec to the tools you use, I don't see any way around this. This comes as consequence to using different general tools that weren't especially thought to be used together with all the possible configurations.
Try:
estimators.append(('mlp',
KerasClassifier(build_fn=create_model2,
nb_epoch=300,
batch_size=16,
verbose=0,
callbacks=[list_of_callbacks])))
where list_of_callbacks is a list of callbacks you want to apply. You could find details here. It's mentioned there that parameters fed to KerasClassifier could be legal fitting parameters.
It's also worth to mention that if you are using multiple runs with GPUs there might be a problem due to several reported memory leaks especially when you are using theano. I also noticed that running multiple fits consequently may show results which seem to be not independent when using sklearn API.
Edit:
Try also:
results = cross_val_score(pipeline, X, encoded_Y, cv=kfold, fit_params = {'mlp__callbacks': calls})
Instead of putting callbacks list in a wrapper instantiation.
This is what I have done
results = cross_val_score(estimator, X, Y, cv=kfold,
fit_params = {'callbacks': [checkpointer,plateau]})
and has worked so far
Despite the TensorFlow, Keras & SciKeras documentation suggesting you can define training callbacks via the fit method, for my setup it turns out (like #NassimBen suggests) you should do it through the model constructor instead.
Rather than this:
model = KerasClassifier(..).fit(X, y, callbacks=[<HERE>])
Try this:
model = KerasClassifier(callbacks=[<HERE>]).fit(X, y)