Problem fitting my dataset into my model (Python)

I have a problem fitting my dataset into my model. I do not know what this error means, let alone how to fix it. Thank you!
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import Dense
dataset = pd.read_csv('Churn_Modelling.csv')
dataset
X=dataset.iloc[:,3:13].values
Y=dataset.iloc[:,13].values
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
lableencoder_X_2 = LabelEncoder()
X[:, 2] = lableencoder_X_2.fit_transform(X[:, 2])
ct = ColumnTransformer([('ohe', OneHotEncoder(), [1])], remainder='passthrough')
X = np.array(ct.fit_transform(X), dtype = str)
X = X[:, 1:]
classifier.add(Dense(units = 6,kernel_initializer = 'uniform',activation ='relu',input_dim = 11))
classifier.add(Dense(units = 6,kernel_initializer = 'uniform',activation ='relu'))
classifier.add(Dense(units= 1, kernel_initializer = 'uniform',activation = 'sigmoid'))
classifier.compile(optimizer = 'adam', loss = 'binary_crssentropy', metrics = ['accuracy'])
# Fit the model to our dataset.
classifier.fit(X_train, y_train, batch_size = 10, epochs = 100)
Error:

Well, the error message is pretty clear: the loss function should be binary_crossentropy, not binary_crssentropy.
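A misspelled loss name only fails once Keras tries to resolve it, so it can help to check the string up front. The following is a minimal sketch using only the standard library; KNOWN_LOSSES is a small illustrative subset of Keras loss identifiers, and check_loss_name is a hypothetical helper, not part of Keras.

```python
import difflib

# Illustrative subset of Keras loss identifiers (an assumption, not the full list).
KNOWN_LOSSES = [
    "binary_crossentropy",
    "categorical_crossentropy",
    "sparse_categorical_crossentropy",
    "mean_squared_error",
    "mean_absolute_error",
]


def check_loss_name(name):
    """Return name unchanged if it is known, else raise with the closest match."""
    if name in KNOWN_LOSSES:
        return name
    # get_close_matches returns the best candidates above a similarity cutoff.
    suggestions = difflib.get_close_matches(name, KNOWN_LOSSES, n=1)
    hint = f" Did you mean '{suggestions[0]}'?" if suggestions else ""
    raise ValueError(f"Unknown loss '{name}'.{hint}")


# The corrected spelling passes the check.
loss = check_loss_name("binary_crossentropy")
```

The returned string could then be passed to classifier.compile(loss = loss, ...).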

Related

Do I have to preprocess new data again before predicting with the model?

I have a saved model and I want to load it to make predictions on new data. I ran the model on the new data, but the predictions are completely wrong. Do I have to preprocess the new data again before predicting?
This is the code that trains and saves the model:
import numpy as np
from numpy import loadtxt
import matplotlib.pyplot as plt
import pandas as pd
dataset = pd.read_csv('Data_Sensor.csv')
dataset.head()
X = dataset.iloc[:, 1:3].values
y = dataset.iloc[:, -1].values
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
model = Sequential()
model.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 2))
model.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))
model.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
model.fit(X_train, y_train, batch_size = 10, epochs = 100)
model.save('model.h5')
y_pred = model.predict(X_test)
print(y_pred)
y_pred = (y_pred > 0.5)
print(y_pred)
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
And this is the code that loads the model to predict on new data, but the results are wrong:
import numpy as np
import pandas as pd
import sklearn
from tensorflow.keras.models import load_model
model = load_model('model.h5')
import pandas as pd
dataset = pd.read_csv('Data_Sensor.csv')
dataset.head()
X = dataset.iloc[:, 1:3].values
print(X)
model.predict(X)

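Yes, new data must be preprocessed before prediction exactly as the training data was.
For your example, you need to retain the fitted StandardScaler, for example by saving it with joblib or pickle next to the model and loading it again at prediction time. Note that get_params and set_params only cover the constructor settings, not the fitted mean and scale, so on their own they are not enough to restore it.
As suggested in the comments, another way of doing this with Keras is to add a BatchNormalization layer at the beginning of the model. It normalizes the inputs with statistics learned during training, which is similar (though not identical) to what the standard scaler does, and it is saved together with the rest of the model: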
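from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization, Dense
model = Sequential()
# Normalize the two raw input features inside the model itself
model.add(BatchNormalization(input_shape = (2,)))
model.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))
model.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))
model.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))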
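If you keep the StandardScaler approach, one way to retain the fitted scaler is to persist it with joblib. This is a minimal sketch under assumptions: the data is synthetic, the file name scaler.joblib is hypothetical, and the Keras model itself is left out so that the example stays self-contained.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the two sensor columns used for training.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=[50.0, 5.0], scale=[10.0, 2.0], size=(200, 2))

# Training side: fit the scaler once and save it next to the model file.
sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)
scaler_path = os.path.join(tempfile.mkdtemp(), "scaler.joblib")  # hypothetical file name
joblib.dump(sc, scaler_path)

# Prediction side: load the fitted scaler and only call transform (never fit again).
sc_loaded = joblib.load(scaler_path)
X_new = np.array([[55.0, 4.0], [30.0, 9.0]])
X_new_scaled = sc_loaded.transform(X_new)
# X_new_scaled is what would be passed to model.predict(...).
```

The key point is that the prediction script reuses the mean and scale learned from the training data instead of feeding raw values to the model.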

Error with udemy Deep learning code of ANN

I'm attending a deep learning course on Udemy. I've written the code exactly the way the instructor did, but I run into a problem after classifier.fit(X_train, y_train, batch_size = 10, epochs = 100). The code and the error are as follows:
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values
# Encoding categorical data
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
from sklearn.compose import ColumnTransformer
label_encoder_x_1 = LabelEncoder()
X[: , 2] = label_encoder_x_1.fit_transform(X[:,2])
transformer = ColumnTransformer(
transformers=[
("OneHot", # Just a name
OneHotEncoder(), # The transformer class
[1] # The column(s) to be applied on.
)
],
remainder='passthrough' # donot apply anything to the remaining columns
)
X = transformer.fit_transform(X.tolist())
X = X.astype('float64')
X = X[:, 1:]
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
#importing keras
import keras
from keras.models import Sequential
from keras.layers import Dense
# Fitting classifier to the Training set
# Create your classifier here
classifier = Sequential()
classifier.add(Dense(output_dim = 6, init = 'uniform', activation = 'relu', input_dim = 11))
classifier.add(Dense(output_dim = 6, init = 'uniform', activation = 'relu'))
classifier.add(Dense(output_dim = 1, init = 'uniform', activation = 'sigmoid'))
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
classifier.fit(X_train, y_train, batch_size = 10, epochs = 100)
# Predicting the Test set results
y_pred = classifier.predict(X_test)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
File "C:\Anaconda3\envs\py37\lib\site-packages\sklearn\metrics\_classification.py", line 268, in confusion_matrix
y_type, y_true, y_pred = _check_targets(y_true, y_pred)
File "C:\Anaconda3\envs\py37\lib\site-packages\sklearn\metrics\_classification.py", line 90, in _check_targets
"and {1} targets".format(type_true, type_pred))
ValueError: Classification metrics can't handle a mix of binary and continuous targets
How can I solve this?

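The problem seems to be with the line y_pred = classifier.predict(X_test). predict returns continuous values (the sigmoid outputs), not class labels, so confusion_matrix sees a mix of binary and continuous targets. According to the documentation, predict_classes is used for getting class predictions: https://www.tensorflow.org/api_docs/python/tf/keras/Sequential#predict_classes. Note that predict_classes exists only in older releases (it was removed in TensorFlow 2.6); on newer versions, threshold the output of predict at 0.5 instead. I made minor adjustments to your code: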
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values
#print(X)
#print(y)
# Encoding categorical data
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
from sklearn.compose import ColumnTransformer
label_encoder_x_1 = LabelEncoder()
X[: , 2] = label_encoder_x_1.fit_transform(X[:,2])
transformer = ColumnTransformer(
transformers=[
("OneHot", # Just a name
OneHotEncoder(), # The transformer class
[1] # The column(s) to be applied on.
)
],
remainder='passthrough' # donot apply anything to the remaining columns
)
X = transformer.fit_transform(X.tolist())
X = X.astype('float64')
X = X[:, 1:]
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
#print(sum(y_train))
#print(sum(y_test))
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
#importing keras
import keras
from keras.models import Sequential
from keras.layers import Dense
# Fitting classifier to the Training set
# Create your classifier here
classifier = Sequential()
classifier.add(Dense(output_dim = 6, init = 'uniform', activation = 'relu', input_dim = 11))
classifier.add(Dense(output_dim = 6, init = 'uniform', activation = 'relu'))
classifier.add(Dense(output_dim = 1, init = 'uniform', activation = 'sigmoid'))
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
classifier.fit(X_train, y_train, batch_size = 10, epochs = 10)
# Predicting the Test set results
y_pred = classifier.predict_classes(X_test)
#print(classifier.predict(X_test))
#print(y_pred)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
#cm = confusion_matrix(y_test, y_pred)
print(confusion_matrix(y_test, y_pred, labels=[0, 1]))
print(classification_report(y_test, y_pred, target_names=['0', '1']))
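Where predict_classes is not available, the same result can be obtained by thresholding the probabilities returned by predict. This is a minimal sketch with made-up values: y_prob stands in for the output of classifier.predict(X_test), and the 0.5 cut-off is the usual default for a sigmoid output, not something fixed by the library.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Stand-in for classifier.predict(X_test): sigmoid outputs with shape (n_samples, 1).
y_prob = np.array([[0.91], [0.12], [0.55], [0.40], [0.08], [0.73]])
y_test = np.array([1, 0, 0, 1, 0, 1])

# Threshold at 0.5 and flatten to a 1-D array of 0/1 class labels.
y_pred = (y_prob > 0.5).astype(int).ravel()

# Both arguments are now binary, so the metric can be computed.
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(cm)
```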

Modeling using Neural Network

I am working on a task to model the KDD Cup 99 dataset using a neural network. I tried to model it with this code:
#data preprocessing
#importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
#importing the dataset
dataset = pd.read_csv('kddcupdata.gz')
#change Multi-class to binary-class
dataset['normal.'] = dataset['normal.'].replace(['back.', 'buffer_overflow.', 'ftp_write.', 'guess_passwd.', 'imap.', 'ipsweep.', 'land.', 'loadmodule.', 'multihop.', 'neptune.', 'nmap.', 'perl.', 'phf.', 'pod.', 'portsweep.', 'rootkit.', 'satan.', 'smurf.', 'spy.', 'teardrop.', 'warezclient.', 'warezmaster.'], 'attack')
x = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 41].values
#encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_x_1 = LabelEncoder()
labelencoder_x_2 = LabelEncoder()
labelencoder_x_3 = LabelEncoder()
x[:, 1] = labelencoder_x_1.fit_transform(x[:, 1])
x[:, 2] = labelencoder_x_2.fit_transform(x[:, 2])
x[:, 3] = labelencoder_x_3.fit_transform(x[:, 3])
onehotencoder_1 = OneHotEncoder(categorical_features = [1])
x = onehotencoder_1.fit_transform(x).toarray()
onehotencoder_2 = OneHotEncoder(categorical_features = [4])
x = onehotencoder_2.fit_transform(x).toarray()
onehotencoder_3 = OneHotEncoder(categorical_features = [70])
x = onehotencoder_3.fit_transform(x).toarray()
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)
#splitting the dataset into the training set and test set
from sklearn.cross_validation import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.3, random_state = 0)
#feature scaling
from sklearn.preprocessing import StandardScaler
sc_x = StandardScaler()
x_train = sc_x.fit_transform(x_train)
x_test = sc_x.transform(x_test)
# Importing the Keras libraries and packages
import keras
from keras.models import Sequential
from keras.layers import Dense
# Initialising the ANN
classifier = Sequential()
# Adding the input layer and the first hidden layer
classifier.add(Dense(output_dim = 60, init = 'uniform', activation = 'relu', input_dim = 118))
#Adding a second hidden layer
classifier.add(Dense(output_dim = 60, init = 'uniform', activation = 'relu'))
#Adding a third hidden layer
classifier.add(Dense(output_dim = 60, init = 'uniform', activation = 'relu'))
# Adding the output layer
classifier.add(Dense(output_dim = 1, init = 'uniform', activation = 'sigmoid'))
# Compiling the ANN
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
# Fitting the ANN to the Training set
classifier.fit(x_train, y_train, batch_size = 10, nb_epoch = 20)
# Predicting the Test set results
y_pred = classifier.predict(x_test)
y_pred = (y_pred > 0.5)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
#the performance of the classification model
#sklearn's confusion_matrix puts true labels on rows and predictions on columns:
#cm[0,0] = TN, cm[0,1] = FP, cm[1,0] = FN, cm[1,1] = TP
print("the Accuracy is: "+ str((cm[0,0]+cm[1,1])/(cm[0,0]+cm[0,1]+cm[1,0]+cm[1,1])))
recall = cm[1,1]/(cm[1,0]+cm[1,1])
print("Recall is : "+ str(recall))
print("False Positive rate: "+ str(cm[0,1]/(cm[0,0]+cm[0,1])))
precision = cm[1,1]/(cm[0,1]+cm[1,1])
print("Precision is: "+ str(precision))
print("F-measure is: "+ str(2*((precision*recall)/(precision+recall))))
from math import log
print("Entropy is: "+ str(-precision*log(precision)))
But I got an error like the one in the picture below.
I hope someone can help me fix this code, or give me new code, so that I can model the KDD Cup 99 dataset easily. I really need your help to finish my task.
Thank you.

Why am I getting "1" as the class predicted all the time?

I have this CSV file, where I'm trying to predict the Histology based on the data in the other columns.
I have the code shown below to do that. However, all the predictions come out as 1. Why is that? The accuracy I get after training the model is 86.81%.
import numpy as np
import pandas as pd
from keras.layers import Dense, Dropout, BatchNormalization, Activation
import keras.models as md
import keras.layers.core as core
import keras.utils.np_utils as kutils
import keras.layers.convolutional as conv
from keras.layers import MaxPool2D
from subprocess import check_output
dataset = pd.read_csv('mutation-train.csv')
dataset = dataset[['CDS_Mutation',
'Primary_Tissue',
'Genomic',
'Gene_ID',
'Official_Symbol',
'Histology']]
X = dataset.iloc[:,0:5].values
y = dataset.iloc[:,5].values
# Encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X_0 = LabelEncoder()
X[:, 0] = labelencoder_X_0.fit_transform(X[:, 0])
labelencoder_X_1 = LabelEncoder()
X[:, 1] = labelencoder_X_1.fit_transform(X[:, 1])
labelencoder_X_2= LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])
labelencoder_X_4= LabelEncoder()
X[:, 4] = labelencoder_X_4.fit_transform(X[:, 4])
X = X.astype(float)
labelencoder_y= LabelEncoder()
y = labelencoder_y.fit_transform(y)
onehotencoder0 = OneHotEncoder(categorical_features = [0])
X = onehotencoder0.fit_transform(X).toarray()
X = X[:,0:]
onehotencoder1 = OneHotEncoder(categorical_features = [1])
X = onehotencoder1.fit_transform(X).toarray()
X = X[:,0:]
onehotencoder2 = OneHotEncoder(categorical_features = [2])
X = onehotencoder2.fit_transform(X).toarray()
X = X[:,0:]
onehotencoder4 = OneHotEncoder(categorical_features = [4])
X = onehotencoder4.fit_transform(X).toarray()
X = X[:,0:]
# Splitting the dataset training and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2)
# Feature scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Evaluating the ANN
from sklearn.model_selection import cross_val_score
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
model=Sequential()
model.add(Dense(32, activation = 'relu', input_shape=(X.shape[1],)))
model.add(Dense(16, activation = 'relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ["accuracy"])
# Compile model
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
# Fit the model
model.fit(X,y, epochs=3, batch_size=1)
# Evaluate the model
scores = model.evaluate(X,y)
print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
# Calculate predictions
predictions = model.predict(X)
prediction = pd.DataFrame(predictions,columns=['predictions']).to_csv('prediction.csv')
Thanks.

Since you get 86.81% accuracy while every prediction is 1, your data is most likely imbalanced: one class in the training set heavily outnumbers the other.
In that case, predicting 1 for every sample still yields a high accuracy. This is known as the accuracy paradox.
For example, if around 85% of the samples in your dataset belong to class 1 and the rest to class 0, a model that always predicts 1 scores about 85% accuracy.
How to deal with it
There are several ways to deal with it:
Upsampling: duplicate samples of class 0 so that class 0 and class 1 appear in the same proportion.
Downsampling: remove some samples of class 1 to get the same proportion.
Change the performance metric: rather than accuracy, use the F1 score, precision, or recall.
Class weights: assign different penalties to mistakes on different classes, giving a higher weight to the class with fewer samples.
There are more ways to deal with it.
Refer to this link for more details.
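The accuracy paradox and the class-weight idea can be illustrated with a small sketch. The 85/15 split below is synthetic and simply mirrors the example above; the "balanced" heuristic from scikit-learn is one possible weighting, not the only one.

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, recall_score
from sklearn.utils.class_weight import compute_class_weight

# Synthetic labels: 85 samples of class 1 and 15 of class 0 (assumed split).
y_true = np.array([1] * 85 + [0] * 15)
# A degenerate model that always predicts class 1.
y_pred = np.ones_like(y_true)

acc = accuracy_score(y_true, y_pred)                   # high despite a useless model
recall_0 = recall_score(y_true, y_pred, pos_label=0)   # share of class 0 that was found
bal_acc = balanced_accuracy_score(y_true, y_pred)      # mean of the per-class recalls

# "balanced" weights are n_samples / (n_classes * class_count).
classes = np.array([0, 1])
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_true)
class_weight = {int(c): float(w) for c, w in zip(classes, weights)}

print(acc, recall_0, bal_acc, class_weight)
```

A dictionary of this form can be passed as the class_weight argument of Keras model.fit, so that mistakes on the rare class cost more during training.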

Why am I getting AttributeError: 'KerasClassifier' object has no attribute 'model'?

This is my code. The error occurs only at the line y_pred = classifier.predict(X_test), and it reads AttributeError: 'KerasClassifier' object has no attribute 'model'
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets
from sklearn import preprocessing
from keras.utils import np_utils
# Importing the dataset
dataset = pd.read_csv('Data1.csv',encoding = "cp1252")
X = dataset.iloc[:, 1:-1].values
y = dataset.iloc[:, -1].values
# Encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X_0 = LabelEncoder()
X[:, 0] = labelencoder_X_0.fit_transform(X[:, 0])
labelencoder_X_1 = LabelEncoder()
X[:, 1] = labelencoder_X_1.fit_transform(X[:, 1])
labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])
labelencoder_X_3 = LabelEncoder()
X[:, 3] = labelencoder_X_3.fit_transform(X[:, 3])
onehotencoder = OneHotEncoder(categorical_features = [1])
X = onehotencoder.fit_transform(X).toarray()
X = X[:, 1:]
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Creating the ANN!
# Importing the Keras libraries and packages
import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
def build_classifier():
    # Initialising the ANN
    classifier = Sequential()
    # Adding the input layer and the first hidden layer
    classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 10))
    classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))
    classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
    classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
    return classifier
classifier = KerasClassifier(build_fn = build_classifier, batch_size = 10, epochs = 2)
accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 1, n_jobs=1)
mean = accuracies.mean()
variance = accuracies.std()
# Predicting the Test set results
import sklearn
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
# Predicting new observations
test = pd.read_csv('test.csv',encoding = "cp1252")
test = test.iloc[:, 1:].values
test[:, 0] = labelencoder_X_0.transform(test[:, 0])
test[:, 1] = labelencoder_X_1.transform(test[:, 1])
test[:, 2] = labelencoder_X_2.transform(test[:, 2])
test[:, 3] = labelencoder_X_3.transform(test[:, 3])
test = onehotencoder.transform(test).toarray()
test = test[:, 1:]
new_prediction = classifier.predict_classes(sc.transform(test))
new_prediction1 = (new_prediction > 0.5)

Because you haven't fitted the classifier yet. For the classifier to have the model attribute available, you need to call
classifier.fit(X_train, y_train)
You did call cross_val_score() on the classifier and obtained accuracies, but the main point is that cross_val_score clones the supplied estimator and fits the clones on the cross-validation folds. So your original estimator classifier is untouched and untrained.
You can see how cross_val_score works in my other answer here
So put the line above just before y_pred = classifier.predict(X_test) and you are all set. Hope this makes it clear.
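The cloning behaviour can be checked with a plain scikit-learn estimator. In this minimal sketch, LogisticRegression stands in for KerasClassifier and the data is synthetic; both are assumptions made to keep the example self-contained.

```python
from sklearn.datasets import make_classification
from sklearn.exceptions import NotFittedError
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
estimator = LogisticRegression()

# cross_val_score fits clones of the estimator, one per fold.
scores = cross_val_score(estimator, X, y, cv=5)

# The original object was never fitted, so predict fails.
try:
    estimator.predict(X)
    fitted_after_cv = True
except NotFittedError:
    fitted_after_cv = False

# Fitting the original explicitly makes predict available.
estimator.fit(X, y)
y_pred = estimator.predict(X)
```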

You get the error because you didn't actually train the model returned by KerasClassifier, which is a scikit-learn wrapper that lets you use scikit-learn functions with a Keras model.
You could, for example, do a grid search (as you might know, since the code seems to be from the Udemy ML/DL course):
from sklearn.model_selection import GridSearchCV
def build_classifier(optimizer):
    classifier = Sequential()
    classifier.add(Dense(units = 6, kernel_initializer = 'uniform',
                         activation = 'relu', input_dim = 11))
    classifier.add(Dense(units = 6, kernel_initializer = 'uniform',
                         activation = 'relu'))
    classifier.add(Dense(units = 1, kernel_initializer = 'uniform',
                         activation = 'sigmoid'))
    classifier.compile(optimizer = optimizer, loss =
                       'binary_crossentropy', metrics = ['accuracy'])
    return classifier
classifier = KerasClassifier(build_fn = build_classifier)
parameters = {'batch_size': [25, 32],
              'epochs': [100, 500],
              'optimizer': ['adam', 'rmsprop']}
grid_search = GridSearchCV(estimator = classifier,
                           param_grid = parameters,
                           scoring = 'accuracy',
                           cv = 10)
grid_search = grid_search.fit(X_train, y_train)
If you don't need scikit-learn functionality, I suggest avoiding the wrapper and simply building your model with:
model = Sequential()
model.add(Dense(32, input_dim=784))
model.add(Activation('relu'))
…
and then train with:
model.fit( … )
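With its default refit=True, GridSearchCV retrains the best configuration on the full training set after the search, so the fitted search object can be used for prediction directly. This is a minimal sketch under assumptions: LogisticRegression and the parameter C stand in for the Keras wrapper and its batch_size/epochs/optimizer grid, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Stand-in parameter grid; the Keras version would vary batch_size, epochs, optimizer.
parameters = {"C": [0.1, 1.0, 10.0]}
grid_search = GridSearchCV(
    estimator=LogisticRegression(max_iter=1000),
    param_grid=parameters,
    scoring="accuracy",
    cv=5,
)
grid_search.fit(X_train, y_train)

best_parameters = grid_search.best_params_
best_accuracy = grid_search.best_score_

# predict is delegated to best_estimator_, which was refitted on all of X_train.
y_pred = grid_search.predict(X_test)
```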
