Do I have to preprocess new data again before predicting with a saved model? - python

I have a saved model and I want to load it to make predictions on new data. I have new data and have run predictions on it, but the results are completely wrong. Do I have to preprocess the new data again before predicting?
This is my code for training and saving the model:
import numpy as np
from numpy import loadtxt
import matplotlib.pyplot as plt
import pandas as pd
dataset = pd.read_csv('Data_Sensor.csv')
dataset.head()
X = dataset.iloc[:, 1:3].values
y = dataset.iloc[:, -1].values
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
model = Sequential()
model.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 2))
model.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))
model.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
model.fit(X_train, y_train, batch_size = 10, epochs = 100)
model.save('model.h5')
y_pred = model.predict(X_test)
print(y_pred)
y_pred = (y_pred > 0.5)
print(y_pred)
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
And this is my code for loading the model and predicting on new data, but the results are wrong:
import numpy as np
import pandas as pd
import sklearn
from tensorflow.keras.models import load_model
model = load_model('model.h5')
import pandas as pd
dataset = pd.read_csv('Data_Sensor.csv')
dataset.head()
X = dataset.iloc[:, 1:3].values
print(X)
model.predict(X)

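Yes, new data must be preprocessed before prediction exactly as the training data was. In your loading script the raw X goes straight into model.predict, while the model was trained on standardized inputs, which explains the wrong predictions.
For your example, you need to retain the fitted StandardScaler and call its transform method (not fit_transform) on the new data. Note that get_params and set_params only cover the constructor settings, not the learned mean and scale, so persist the fitted object itself, e.g. with pickle or joblib.
As suggested in the comments below, another way of doing this with Keras is to add a BatchNormalization layer at the beginning of the model. It normalizes the inputs with statistics learned during training, which plays a similar role to the standard scaler, and it is saved together with the rest of the model: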
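from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization, Dense
model = Sequential()
# Normalizes the raw inputs inside the model, so no external scaler is needed
model.add(BatchNormalization())
model.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 2))
model.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))
model.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))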
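If you keep the StandardScaler instead, one way to retain it is to persist the fitted object next to the model file. The following is a minimal sketch, not the asker's actual pipeline: it assumes scikit-learn and joblib are installed, the sensor values are made up, and the file name scaler.joblib is hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up training features standing in for the two sensor columns.
X_train = np.array([[10.0, 200.0], [12.0, 220.0], [14.0, 240.0], [16.0, 260.0]])

# Training script: fit the scaler on the training data only, then save it.
scaler = StandardScaler().fit(X_train)
scaler_path = os.path.join(tempfile.mkdtemp(), "scaler.joblib")  # hypothetical file name
joblib.dump(scaler, scaler_path)

# Prediction script: load the scaler and apply transform (never fit_transform)
# to the new data before handing it to the model.
restored = joblib.load(scaler_path)
X_new = np.array([[13.0, 230.0]])
X_new_scaled = restored.transform(X_new)
print(X_new_scaled)
```

In the prediction script, the scaled array is what should be passed on, e.g. model.predict(X_new_scaled).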

Related

Error with udemy Deep learning code of ANN

I'm attending a deep learning course on Udemy. I've written the code exactly as the instructor did, but I run into a problem after classifier.fit(X_train, y_train, batch_size = 10, epochs = 100). The code and the error are as follows:
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values
# Encoding categorical data
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
from sklearn.compose import ColumnTransformer
label_encoder_x_1 = LabelEncoder()
X[: , 2] = label_encoder_x_1.fit_transform(X[:,2])
transformer = ColumnTransformer(
    transformers=[
        ("OneHot",         # Just a name
         OneHotEncoder(),  # The transformer class
         [1]               # The column(s) to be applied on.
         )
    ],
    remainder='passthrough'  # do not apply anything to the remaining columns
)
X = transformer.fit_transform(X.tolist())
X = X.astype('float64')
X = X[:, 1:]
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
#importing keras
import keras
from keras.models import Sequential
from keras.layers import Dense
# Fitting classifier to the Training set
# Create your classifier here
classifier = Sequential()
classifier.add(Dense(output_dim = 6, init = 'uniform', activation = 'relu', input_dim = 11))
classifier.add(Dense(output_dim = 6, init = 'uniform', activation = 'relu'))
classifier.add(Dense(output_dim = 1, init = 'uniform', activation = 'sigmoid'))
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
classifier.fit(X_train, y_train, batch_size = 10, epochs = 100)
# Predicting the Test set results
y_pred = classifier.predict(X_test)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
File "C:\Anaconda3\envs\py37\lib\site-packages\sklearn\metrics\_classification.py", line 268, in confusion_matrix
y_type, y_true, y_pred = _check_targets(y_true, y_pred)
File "C:\Anaconda3\envs\py37\lib\site-packages\sklearn\metrics\_classification.py", line 90, in _check_targets
"and {1} targets".format(type_true, type_pred))
ValueError: Classification metrics can't handle a mix of binary and continuous targets
How can I solve this?
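The problem is the line y_pred = classifier.predict(X_test). predict returns continuous values (the sigmoid probabilities), not class labels, while confusion_matrix expects class labels. According to the documentation, predict_classes is used for getting class predictions: https://www.tensorflow.org/api_docs/python/tf/keras/Sequential#predict_classes. Note that predict_classes was removed in TensorFlow 2.6; on newer versions, threshold the probabilities instead, e.g. y_pred = (classifier.predict(X_test) > 0.5).astype(int). I made minor adjustments to your code: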
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values
#print(X)
#print(y)
# Encoding categorical data
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
from sklearn.compose import ColumnTransformer
label_encoder_x_1 = LabelEncoder()
X[: , 2] = label_encoder_x_1.fit_transform(X[:,2])
transformer = ColumnTransformer(
    transformers=[
        ("OneHot",         # Just a name
         OneHotEncoder(),  # The transformer class
         [1]               # The column(s) to be applied on.
         )
    ],
    remainder='passthrough'  # do not apply anything to the remaining columns
)
X = transformer.fit_transform(X.tolist())
X = X.astype('float64')
X = X[:, 1:]
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
#print(sum(y_train))
#print(sum(y_test))
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
#importing keras
import keras
from keras.models import Sequential
from keras.layers import Dense
# Fitting classifier to the Training set
# Create your classifier here
classifier = Sequential()
classifier.add(Dense(output_dim = 6, init = 'uniform', activation = 'relu', input_dim = 11))
classifier.add(Dense(output_dim = 6, init = 'uniform', activation = 'relu'))
classifier.add(Dense(output_dim = 1, init = 'uniform', activation = 'sigmoid'))
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
classifier.fit(X_train, y_train, batch_size = 10, epochs = 10)
# Predicting the Test set results
y_pred = classifier.predict_classes(X_test)
#print(classifier.predict(X_test))
#print(y_pred)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
#cm = confusion_matrix(y_test, y_pred)
print(confusion_matrix(y_test, y_pred, labels=[0, 1]))
print(classification_report(y_test, y_pred, target_names=['0', '1']))
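On versions where predict_classes is not available, the same effect can be had by thresholding the output of predict. A minimal sketch with made-up probabilities standing in for classifier.predict(X_test), assuming a single sigmoid output unit:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up sigmoid outputs with shape (n_samples, 1), the shape predict returns.
y_prob = np.array([[0.91], [0.12], [0.65], [0.40], [0.08]])
y_test = np.array([1, 0, 0, 1, 0])

# Threshold at 0.5 and flatten to a 1-D array of 0/1 class labels.
y_pred = (y_prob > 0.5).astype(int).ravel()

# Both arguments are now binary labels, so the metric accepts them.
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(y_pred)
print(cm)
```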

Why is my r2_score dependent on the units of the dependent variable

I have built a regression model using ANN relating 8 input parameters and 1 output parameter.
Code:
X = data.iloc[:,:-1]
y = data.iloc[:,8:9]*100
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train_us, X_test_us, y_train_us, y_test_us = train_test_split(X, y, test_size = 0.2, random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
sc_Y = StandardScaler()
X_train = sc_X.fit_transform(X_train_us)
X_test = sc_X.transform(X_test_us)
y_train = sc_Y.fit_transform(y_train_us)
y_test = sc_Y.transform(y_test_us)
# Importing the Keras libraries and packages
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
def base_model():
    # Initialising the ANN
    regressor = Sequential()
    # Adding the input layer and the first hidden layer
    regressor.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 8))
    # Adding the second hidden layer
    regressor.add(Dense(units = 4, kernel_initializer = 'uniform', activation = 'relu'))
    # Adding the output layer
    regressor.add(Dense(units = 1, kernel_initializer = 'uniform'))
    # Compiling the ANN
    regressor.compile(optimizer = 'adam', loss = 'mse', metrics = ['mae'])
    return regressor
# Fitting the ANN to the Training set
regressor = KerasRegressor(build_fn=base_model, epochs=500, batch_size=32)
regressor.fit(X_train,y_train)
# Predicting the Test & Train set with regressor built
y_pred = regressor.predict(X_test)
y_pred = sc_Y.inverse_transform(y_pred)
y_test = sc_Y.inverse_transform(y_test)
#calculate r2_score
from sklearn.metrics import r2_score
score_test = r2_score(y_test,y_pred)
I get an r2_score of 98%. The unit of my output variable is currently metres. If I multiply it by 100 to change it to centimetres, retrain the model and calculate the r2_score again, it is 91%.
Why does my r2_score change with the unit of the dependent variable? Shouldn't scaling take care of this?
Thanks!
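A quick check may help narrow this down. The sketch below uses made-up data and assumes the only change between the two runs is multiplying y by 100. It shows that StandardScaler maps both versions to the same standardized targets, and that r2_score is unchanged when the true values and the predictions are rescaled by the same factor. If that holds for your data too, the gap between 98% and 91% would have to come from somewhere else, for example different random weight initialization between training runs; that is a guess, not something the question confirms.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Made-up targets in metres and imperfect predictions of them.
y_m = rng.uniform(1.0, 5.0, size=(50, 1))
pred_m = y_m + rng.normal(0.0, 0.2, size=(50, 1))

# The same quantities expressed in centimetres.
y_cm = y_m * 100
pred_cm = pred_m * 100

# Standardizing removes the unit: both versions give the same scaled targets,
# so the network would be trained on identical values either way.
scaled_m = StandardScaler().fit_transform(y_m)
scaled_cm = StandardScaler().fit_transform(y_cm)
print(np.allclose(scaled_m, scaled_cm))

# R^2 is a ratio of sums of squares, so a common factor cancels out.
r2_m = r2_score(y_m, pred_m)
r2_cm = r2_score(y_cm, pred_cm)
print(r2_m, r2_cm)
```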

Modeling KDD Cup 99 dataset using Neural Network

I have a task to model the KDD Cup 99 dataset using a neural network. I am using Jupyter Notebook to run each function.
Here is the code:
import pandas
#importing the dataset
dataset = pandas.read_csv('kddcup.data_10_percent_corrected')
#change Multi-class to binary-class
dataset['normal.'] = dataset['normal.'].replace(['back.', 'buffer_overflow.', 'ftp_write.', 'guess_passwd.', 'imap.', 'ipsweep.', 'land.', 'loadmodule.', 'multihop.', 'neptune.', 'nmap.', 'perl.', 'phf.', 'pod.', 'portsweep.', 'rootkit.', 'satan.', 'smurf.', 'spy.', 'teardrop.', 'warezclient.', 'warezmaster.'], 'attack')
x = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 41].values
#encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_x_1 = LabelEncoder()
labelencoder_x_2 = LabelEncoder()
labelencoder_x_3 = LabelEncoder()
x[:, 1] = labelencoder_x_1.fit_transform(x[:, 1])
x[:, 2] = labelencoder_x_2.fit_transform(x[:, 2])
x[:, 3] = labelencoder_x_3.fit_transform(x[:, 3])
onehotencoder_1 = OneHotEncoder(categorical_features = [1])
x = onehotencoder_1.fit_transform(x).toarray()
onehotencoder_2 = OneHotEncoder(categorical_features = [4])
x = onehotencoder_2.fit_transform(x).toarray()
onehotencoder_3 = OneHotEncoder(categorical_features = [70])
x = onehotencoder_3.fit_transform(x).toarray()
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)
#splitting the dataset into the training set and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.3, random_state = 0)
#feature scaling
from sklearn.preprocessing import StandardScaler
sc_x = StandardScaler()
x_train = sc_x.fit_transform(x_train)
x_test = sc_x.transform(x_test)
# Importing the Keras libraries and packages
import keras
from keras.models import Sequential
from keras.layers import Dense
# Initialising the ANN
classifier = Sequential()
# Adding the input layer and the first hidden layer
classifier.add(Dense(output_dim = 60, init = 'uniform', activation = 'relu', input_dim = 118))
#Adding a second hidden layer
classifier.add(Dense(output_dim = 60, init = 'uniform', activation = 'relu'))
#Adding a third hidden layer
classifier.add(Dense(output_dim = 60, init = 'uniform', activation = 'relu'))
# Adding the output layer
classifier.add(Dense(output_dim = 1, init = 'uniform', activation = 'sigmoid'))
# Compiling the ANN
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
# Fitting the ANN to the Training set
#keras.utils.to_categorical(y, num_classes=None, dtype='float32')
from keras.utils import to_categorical
classifier.fit(to_categorical(x_train), to_categorical(y_train), verbose=1, batch_size = 10, nb_epoch = 20)
# Predicting the Test set results
y_pred = classifier.predict(x_test)
y_pred = (y_pred > 0.5)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
#the performance of the classification model
print("the Accuracy is: "+ str((cm[0,0]+cm[1,1])/(cm[0,0]+cm[0,1]+cm[1,0]+cm[1,1])))
# confusion_matrix has true labels as rows and predicted labels as columns
recall = cm[1,1]/(cm[1,0]+cm[1,1])
print("Recall is : "+ str(recall))
print("False Positive rate: "+ str(cm[0,1]/(cm[0,0]+cm[0,1])))
precision = cm[1,1]/(cm[0,1]+cm[1,1])
print("Precision is: "+ str(precision))
print("F-measure is: "+ str(2*((precision*recall)/(precision+recall))))
from math import log
print("Entropy is: "+ str(-precision*log(precision)))
But when I run the code, I get an error.
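Try using np.argmax before training your classifier, so that the one-hot labels become a 1-D array of class indices again, and pass x_train as it is, without to_categorical. The output layer has a single sigmoid unit, so it expects one 0/1 label per sample:
import numpy as np
conv = to_categorical(y_train)
y_train1 = np.argmax(conv,axis=1)
classifier.fit(x_train, y_train1, verbose=1, batch_size = 10, nb_epoch = 20)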
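To see what the round trip does to the label shapes, here is a small sketch in plain NumPy. np.eye(2)[y_train] stands in for keras.utils.to_categorical (an assumption, made so the example runs without Keras), and the labels are made up.

```python
import numpy as np

# Made-up binary labels, as LabelEncoder would produce them.
y_train = np.array([0, 1, 1, 0, 1])

# One-hot encoding gives shape (5, 2), which does not match a single sigmoid unit.
one_hot = np.eye(2, dtype="float32")[y_train]
print(one_hot.shape)

# argmax along axis 1 recovers the class indices with shape (5,).
y_train1 = np.argmax(one_hot, axis=1)
print(y_train1)
```

Since the round trip returns the original labels, passing y_train directly to classifier.fit should have the same effect.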

Modeling using Neural Network

I am working on a task to model the KDD Cup 99 dataset using a neural network. I tried to model it with this code:
#data preprocessing
#importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
#importing the dataset
dataset = pd.read_csv('kddcupdata.gz')
#change Multi-class to binary-class
dataset['normal.'] = dataset['normal.'].replace(['back.', 'buffer_overflow.', 'ftp_write.', 'guess_passwd.', 'imap.', 'ipsweep.', 'land.', 'loadmodule.', 'multihop.', 'neptune.', 'nmap.', 'perl.', 'phf.', 'pod.', 'portsweep.', 'rootkit.', 'satan.', 'smurf.', 'spy.', 'teardrop.', 'warezclient.', 'warezmaster.'], 'attack')
x = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 41].values
#encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_x_1 = LabelEncoder()
labelencoder_x_2 = LabelEncoder()
labelencoder_x_3 = LabelEncoder()
x[:, 1] = labelencoder_x_1.fit_transform(x[:, 1])
x[:, 2] = labelencoder_x_2.fit_transform(x[:, 2])
x[:, 3] = labelencoder_x_3.fit_transform(x[:, 3])
onehotencoder_1 = OneHotEncoder(categorical_features = [1])
x = onehotencoder_1.fit_transform(x).toarray()
onehotencoder_2 = OneHotEncoder(categorical_features = [4])
x = onehotencoder_2.fit_transform(x).toarray()
onehotencoder_3 = OneHotEncoder(categorical_features = [70])
x = onehotencoder_3.fit_transform(x).toarray()
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)
#splitting the dataset into the training set and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.3, random_state = 0)
#feature scaling
from sklearn.preprocessing import StandardScaler
sc_x = StandardScaler()
x_train = sc_x.fit_transform(x_train)
x_test = sc_x.transform(x_test)
# Importing the Keras libraries and packages
import keras
from keras.models import Sequential
from keras.layers import Dense
# Initialising the ANN
classifier = Sequential()
# Adding the input layer and the first hidden layer
classifier.add(Dense(output_dim = 60, init = 'uniform', activation = 'relu', input_dim = 118))
#Adding a second hidden layer
classifier.add(Dense(output_dim = 60, init = 'uniform', activation = 'relu'))
#Adding a third hidden layer
classifier.add(Dense(output_dim = 60, init = 'uniform', activation = 'relu'))
# Adding the output layer
classifier.add(Dense(output_dim = 1, init = 'uniform', activation = 'sigmoid'))
# Compiling the ANN
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
# Fitting the ANN to the Training set
classifier.fit(x_train, y_train, batch_size = 10, nb_epoch = 20)
# Predicting the Test set results
y_pred = classifier.predict(x_test)
y_pred = (y_pred > 0.5)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
#the performance of the classification model
print("the Accuracy is: "+ str((cm[0,0]+cm[1,1])/(cm[0,0]+cm[0,1]+cm[1,0]+cm[1,1])))
# confusion_matrix has true labels as rows and predicted labels as columns
recall = cm[1,1]/(cm[1,0]+cm[1,1])
print("Recall is : "+ str(recall))
print("False Positive rate: "+ str(cm[0,1]/(cm[0,0]+cm[0,1])))
precision = cm[1,1]/(cm[0,1]+cm[1,1])
print("Precision is: "+ str(precision))
print("F-measure is: "+ str(2*((precision*recall)/(precision+recall))))
from math import log
print("Entropy is: "+ str(-precision*log(precision)))
But I got an error like the one in the picture below.
I hope someone can help me fix this code, or give me new code, so I can model the KDD Cup 99 dataset easily. I really need your help to finish my task.
Thank you.
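For the metric calculations at the end of the code above, it may help to check the index convention on a tiny example. This sketch assumes scikit-learn's documented layout, where rows of the confusion matrix are true labels and columns are predicted labels; the labels are made up and class 1 is treated as the positive class.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up true labels and predictions.
y_test = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 1, 1, 1, 0, 0])

cm = confusion_matrix(y_test, y_pred)

# For binary labels, ravel() yields the cells in the order tn, fp, fn, tp.
tn, fp, fn, tp = cm.ravel()

recall = tp / (tp + fn)                # share of actual positives that were found
precision = tp / (tp + fp)             # share of predicted positives that are right
false_positive_rate = fp / (fp + tn)   # share of actual negatives flagged as positive

print(cm)
print(recall, precision, false_positive_rate)
```

The hand-computed values can be compared against recall_score and precision_score from the same module.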

Why am i getting AttributeError: 'KerasClassifier' object has no attribute 'model'?

This is the code and I'm getting the error in the last line only which is y_pred = classifier.predict(X_test). The error I'm getting is AttributeError: 'KerasClassifier' object has no attribute 'model'
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets
from sklearn import preprocessing
from keras.utils import np_utils
# Importing the dataset
dataset = pd.read_csv('Data1.csv',encoding = "cp1252")
X = dataset.iloc[:, 1:-1].values
y = dataset.iloc[:, -1].values
# Encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X_0 = LabelEncoder()
X[:, 0] = labelencoder_X_0.fit_transform(X[:, 0])
labelencoder_X_1 = LabelEncoder()
X[:, 1] = labelencoder_X_1.fit_transform(X[:, 1])
labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])
labelencoder_X_3 = LabelEncoder()
X[:, 3] = labelencoder_X_3.fit_transform(X[:, 3])
onehotencoder = OneHotEncoder(categorical_features = [1])
X = onehotencoder.fit_transform(X).toarray()
X = X[:, 1:]
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Creating the ANN!
# Importing the Keras libraries and packages
import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
def build_classifier():
    # Initialising the ANN
    classifier = Sequential()
    # Adding the input layer and the first hidden layer
    classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 10))
    classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))
    classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
    classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
    return classifier
classifier = KerasClassifier(build_fn = build_classifier, batch_size = 10, epochs = 2)
accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 1, n_jobs=1)
mean = accuracies.mean()
variance = accuracies.std()
# Predicting the Test set results
import sklearn
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
# Predicting new observations
test = pd.read_csv('test.csv',encoding = "cp1252")
test = test.iloc[:, 1:].values
test[:, 0] = labelencoder_X_0.transform(test[:, 0])
test[:, 1] = labelencoder_X_1.transform(test[:, 1])
test[:, 2] = labelencoder_X_2.transform(test[:, 2])
test[:, 3] = labelencoder_X_3.transform(test[:, 3])
test = onehotencoder.transform(test).toarray()
test = test[:, 1:]
new_prediction = classifier.predict_classes(sc.transform(test))
new_prediction1 = (new_prediction > 0.5)
Because you haven't fitted the classifier yet. For the classifier to have the model attribute available, you need to call
classifier.fit(X_train, y_train)
You did run cross_val_score() on the classifier and obtained accuracies, but the main point to note is that cross_val_score clones the supplied estimator and fits the clones on the cross-validation folds. So your original estimator, classifier, is untouched and untrained.
You can see how cross_val_score works in my other answer here
So put the line above just before the y_pred = classifier.predict(X_test) line and you are all set. Hope this makes it clear.
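The cloning behaviour can be observed with any scikit-learn estimator. The sketch below uses LogisticRegression in place of KerasClassifier (a substitution, so that it runs without Keras) and made-up data; checking for the coef_ attribute is just one convenient way to tell whether this particular estimator has been fitted.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Made-up, roughly balanced binary classification data.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(60, 3))
y_train = (X_train[:, 0] > 0).astype(int)

classifier = LogisticRegression()
accuracies = cross_val_score(classifier, X_train, y_train, cv=3)

# The clones were fitted, the original was not: it has no learned coefficients yet.
fitted_before = hasattr(classifier, "coef_")
print(fitted_before)

# An explicit fit trains the original estimator, after which predict works.
classifier.fit(X_train, y_train)
fitted_after = hasattr(classifier, "coef_")
print(fitted_after)
```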
You get the error because you didn't actually train the model wrapped by KerasClassifier, which is a scikit-learn wrapper that lets you use scikit-learn functions with Keras models.
You could, for example, do a grid search (as you might know, since the code seems to be from the Udemy ML/DL course):
from sklearn.model_selection import GridSearchCV
def build_classifier(optimizer):
    classifier = Sequential()
    classifier.add(Dense(units = 6, kernel_initializer = 'uniform',
                         activation = 'relu', input_dim = 11))
    classifier.add(Dense(units = 6, kernel_initializer = 'uniform',
                         activation = 'relu'))
    classifier.add(Dense(units = 1, kernel_initializer = 'uniform',
                         activation = 'sigmoid'))
    classifier.compile(optimizer = optimizer, loss = 'binary_crossentropy',
                       metrics = ['accuracy'])
    return classifier
classifier = KerasClassifier(build_fn = build_classifier)
parameters = {'batch_size': [25, 32],
              'epochs': [100, 500],
              'optimizer': ['adam', 'rmsprop']}
grid_search = GridSearchCV(estimator = classifier,
                           param_grid = parameters,
                           scoring = 'accuracy',
                           cv = 10)
grid_search = grid_search.fit(X_train, y_train)
If you don't need scikit-learn functionality, I suggest avoiding the wrapper and simply building your model with:
model = Sequential()
model.add(Dense(32, input_dim=784))
model.add(Activation('relu'))
…
and then train with:
model.fit( … )
