Nonsensical Confusion Matrix for ANN - python

I used the following methods to attempt to create an ANN model, but my confusion matrix (at the bottom) strongly indicates something has gone amiss. However, I am not sure where the problem started or why.
The dataset I used was split as such:
X = df.drop('Recurrence', axis = 1)
y = df['Recurrence']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 0)
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)
This was the train/test method:
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
# Set seed for reproducibility
SEED = 1
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
clf = MLPClassifier(hidden_layer_sizes=(100,), activation='logistic', solver='lbfgs',
                    learning_rate='adaptive', random_state=SEED, max_iter=200).fit(X_train, y_train)
from dmba import classificationSummary  # classificationSummary comes from the dmba package
classificationSummary(y_train, clf.predict(X_train))
classificationSummary(y_test, clf.predict(X_test))
This produced the following classification summaries:
Confusion Matrix (Accuracy 1.0000)

       Prediction
Actual  0
     0  1

Confusion Matrix (Accuracy 0.0000)

       Prediction
Actual  0  1
     0  0  1
     1  0  0

Related

How to input the model into the KNN classification algorithm?

I want to do image classification using KNN. I used https://pythonprogramming.net/loading-custom-data-deep-learning-python-tensorflow-keras/ to make a model. I have 20 images: 10 in the dog category and 10 in the cat category. I'm having trouble feeding the model into the KNN algorithm; there is a problem in my code. This is my code:
knn_model=KNeighborsClassifier(n_neighbors=3) #define K=3
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.3, random_state=0)
predict_knn=knn_model.predict(X_test)
print(predict_knn)
There is an error: Found input variables with inconsistent numbers of samples: [60, 20]. I need your opinion on how to fix this code. Thank you.
The problem is most likely an inconsistent number of samples between X and y: train_test_split requires them to have the same length.
1. len(y) == 20
# Works
import numpy as np
from sklearn.model_selection import train_test_split
X, y = np.arange(20*32*32*3).reshape((20, 32, 32, 3)), list(range(20))
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.3, random_state=0)
2. len(y) == 60
# Does not work
X, y = np.arange(20*32*32*3).reshape((20, 32, 32, 3)), list(range(60))
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.3, random_state=0)
The second snippet raises the same ValueError about inconsistent numbers of samples, because X has 20 samples while y has 60.
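Separately, note that the snippet in the question never calls fit before predict, which would raise a NotFittedError even with consistent sample sizes. A minimal sketch of the corrected flow, using dummy stand-ins for your 20 images and labels (KNeighborsClassifier expects 2D input, so each image is flattened into one row):
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# dummy stand-ins for the real data: 20 images of 32x32x3 and 20 matching labels
X = np.random.rand(20, 32, 32, 3)
y = [0] * 10 + [1] * 10  # 10 dogs, 10 cats

# flatten each image into a single feature row
X = X.reshape(len(X), -1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

knn_model = KNeighborsClassifier(n_neighbors=3)  # define K=3
knn_model.fit(X_train, y_train)                  # fit before predicting
predict_knn = knn_model.predict(X_test)
print(predict_knn)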

How do I return the result of each cross validation prediction

I have a task that requires me to analyse a model, but I need the output predictions for each cross-validation step, and the data that the cross-validation used in that step.
Here is my code:
results= cross_validate(MLPClassifier, X_train, y_train, cv=5,return_estimator = True)
Which did not work. Also,
results= cross_val_predict(MLPClassifier, X_train, y_train, cv=5)
Neither worked. However, the second method did give me a set of predictions with the same shape as y_train (the labels), whereas I expected something smaller to be returned, say 10% the size of y_train.
I'm also unsure how to obtain the data used in each cross-validation step.
Both of your calls pass the MLPClassifier class rather than an instance; cross_validate and cross_val_predict expect an estimator object, e.g. cross_val_predict(MLPClassifier(), X_train, y_train, cv=5). Also, the output shape is expected: cross_val_predict returns one out-of-fold prediction for every sample, so its result is the same length as y_train. If you need the per-split data and predictions, how about using one of the cross-validation iterators?
from sklearn.datasets import make_classification
from sklearn.model_selection import ShuffleSplit
from sklearn.neural_network import MLPClassifier
X, y = make_classification(n_samples=1000, random_state=0)
datasets = {}  # holds (X_train, y_train) and (X_test, y_test) for each split
results = {}
ss = ShuffleSplit(n_splits=5, test_size=0.25, random_state=0)
for idx, (train_index, test_index) in enumerate(ss.split(X)):
    X_train, y_train = X[train_index], y[train_index]
    X_test, y_test = X[test_index], y[test_index]
    datasets[f"train_{idx}"] = X_train, y_train
    datasets[f"test_{idx}"] = X_test, y_test
    model = MLPClassifier(random_state=0).fit(X_train, y_train)
    results[f"accuracy_{idx}"] = model.score(X_test, y_test)
results
Output:
{'accuracy_0': 0.968,
'accuracy_1': 0.924,
'accuracy_2': 0.94,
'accuracy_3': 0.944,
'accuracy_4': 0.964}
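If you also want the predictions from each split rather than just the accuracy, you can store them in the same loop; a minimal addition (the predictions dict is illustrative):
predictions = {}  # initialise before the loop, alongside datasets and results

# then add inside the loop, after fitting the model:
predictions[f"pred_{idx}"] = model.predict(X_test)  # out-of-fold predictions for this split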

How to fetch R2 for each target in sklearn MultiOutputRegressor() rather than the overall R2?

For example, Xs has 5 independent variables, and Ys has 5 dependent variables:
x_train, x_test, y_train, y_test = train_test_split(Xs, Ys, test_size=0.2, random_state=2)
model = lgb.LGBMRegressor()
wrapper = MultiOutputRegressor(model)
model.fit(x_train, y_train)
model.score(x_test, y_test)
I could only get the overall R2 with the code above. What if I want to check the R2 for each Y? Is it possible?
Thanks
(As an aside, your snippet fits model rather than wrapper, so the MultiOutputRegressor is never actually used; fit and score the wrapper instead.) To get per-target scores, you can use scikit-learn's r2_score with multioutput='raw_values':
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor
from sklearn.metrics import r2_score
import lightgbm as lgb
# generate the data
X, Y = make_regression(n_targets=5, n_features=10, n_samples=1000, random_state=42)
# split the data
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)
# instantiate the model
model = MultiOutputRegressor(estimator=lgb.LGBMRegressor())
# fit the model
model.fit(X_train, Y_train)
# generate the model predictions
Y_pred = model.predict(X_test)
# calculate the individual R2's
print(r2_score(Y_test, Y_pred, multioutput='raw_values'))
# [0.907924 0.925267 0.906492 0.939653 0.881619]
print([r2_score(Y_test[:, i], Y_pred[:, i]) for i in range(Y_test.shape[1])])
# [0.907924, 0.925267, 0.906492, 0.939653, 0.881619]
# calculate the overall R2
print(model.score(X_test, Y_test))
# 0.9121908184618046
print(r2_score(Y_test, Y_pred, multioutput='uniform_average'))
# 0.9121908184618046

Build SVMs with different kernels (RBF)

I have this code in Python:
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3)
# The gamma parameter is the kernel coefficient for kernels rbf/poly/sigmoid
svm = SVC(gamma='auto', probability=True)
svm.fit(X_train,y_train.values.ravel())
prediction = svm.predict(X_test)
prediction_prob = svm.predict_proba(X_test)
print('Accuracy:', accuracy_score(y_test,prediction))
print('AUC:',roc_auc_score(y_test,prediction_prob[:,1]))
print(X_train)
print(y_train)
Now I want to build this with a different kernel (rbf) and store the values into arrays, so something like this:
def svm_grid_search(parameters, cv):
    # Store the outcome of the folds in these lists
    means = []
    stds = []
    params = []
    for parameter in parameters:
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
        # The gamma parameter is the kernel coefficient for kernels rbf/poly/sigmoid
        svm = SVC(gamma=1, kernel=parameter, probability=True)
        svm.fit(X_train, y_train.values.ravel())
        prediction = svm.predict(X_test)
        prediction_prob = svm.predict_proba(X_test)
    return means, stds, params
I know I want to loop over the parameters and then store the values in the lists, but I struggle with how to do so. What I am trying to do is loop, then store the results of each SVM in the arrays, with kernel = parameter. I would be very thankful if you could help me out here.
This is exactly what GridSearchCV is for: it loops over a parameter grid, cross-validates each candidate, and records the mean score, standard deviation, and parameters for you. See the scikit-learn documentation for sklearn.model_selection.GridSearchCV.
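A minimal sketch, assuming X and y are defined as in your first snippet (the grid values below are just illustrative):
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# candidate kernels and gamma values to search over (illustrative)
param_grid = {'kernel': ['rbf', 'poly', 'sigmoid'], 'gamma': [0.1, 1, 10]}

grid = GridSearchCV(SVC(probability=True), param_grid, cv=5)
grid.fit(X_train, y_train.values.ravel())

# per-candidate results, analogous to your means/stds/params lists
means = grid.cv_results_['mean_test_score']
stds = grid.cv_results_['std_test_score']
params = grid.cv_results_['params']
print(grid.best_params_)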

How to compute "y_train_true, y_train_prob, y_test_true, y_test_prob"?

I have computed X_train, X_test, y_train, and y_test, but I cannot compute y_train_true, y_train_prob, y_test_true, and y_test_prob.
How can I compute them from the following code?
N.B.:
y_train_true: true binary labels of 0 or 1 in the training dataset
y_train_prob: probabilities in the range [0, 1] predicted by the model for the training dataset
y_test_true: true binary labels of 0 or 1 in the testing dataset
y_test_prob: probabilities in the range [0, 1] predicted by the model for the testing dataset
Code:
# Split test and train data
import numpy as np
from sklearn.model_selection import train_test_split
X = np.array(dataset.iloc[:, 1:10])  # .ix is removed in modern pandas; use .iloc
y = np.array(dataset['benign_malignant'])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Define the classifier
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
# knn = KNeighborsClassifier(n_neighbors=11)
knn.fit(X_train, y_train)
# Predicting the test set results
y_pred = knn.predict(X_test)
Well, in your case y_train and y_test are already y_train_true and y_test_true. To get y_train_prob and y_test_prob you need a fitted model that can output probabilities. I don't know which dataset you're using, but it seems to be a binary classification problem, so any classifier with a predict_proba method works, for example logistic regression; sticking with your KNN:
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
knn.fit(X_train, y_train)
y_train_prob = knn.predict_proba(X_train)
y_test_prob = knn.predict_proba(X_test)
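Note that predict_proba returns one column per class (shape (n_samples, 2) here), so if you need a single probability per sample, take the positive-class column:
# probability of the positive class (column index 1)
y_train_prob = knn.predict_proba(X_train)[:, 1]
y_test_prob = knn.predict_proba(X_test)[:, 1]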
