How to input the model into the KNN classification algorithm? - Python

I want to do image classification using KNN. I used https://pythonprogramming.net/loading-custom-data-deep-learning-python-tensorflow-keras/ to build the model. I have 20 images: 10 in the dog category and 10 in the cat category. I'm having trouble feeding the model into the KNN algorithm; there is a problem in my code. This is my code:
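from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
knn_model=KNeighborsClassifier(n_neighbors=3) #define K=3
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.3, random_state=0)
knn_model.fit(X_train, y_train)  # the model must be fitted before predict is called
predict_knn=knn_model.predict(X_test)
print(predict_knn)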
This raises an error: ValueError: Found input variables with inconsistent numbers of samples: [60, 20]
How can I fix this code? Thank you.

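The problem is that X and y have inconsistent sample sizes: train_test_split received 60 samples in X but only 20 labels in y. Both must contain one entry per image, as these two examples show: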
1. len(y) == 20
# Works
import numpy as np
from sklearn.model_selection import train_test_split
X, y = np.arange(20*32*32*3).reshape((20, 32, 32, 3)), list(range(20))
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.3, random_state=0)
2. len(y) == 60
# Does not work
X, y = np.arange(20*32*32*3).reshape((20, 32, 32, 3)), list(range(60))
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.3, random_state=0)
The second script raises the same ValueError, this time reporting [20, 60], because X has 20 samples and y has 60.
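A minimal sketch of a working version, assuming 20 RGB images of 32×32 pixels (random stand-in data here) and labels 0 = dog, 1 = cat. X and y must both have 20 samples, and KNeighborsClassifier expects a 2D array, so each image is flattened into one row. The factor of 3 in [60, 20] suggests, though this is only a guess, that 3-channel color images were reshaped with a trailing dimension of 1 (as in reshape(-1, IMG_SIZE, IMG_SIZE, 1)), which turns every image into three "samples". The names wrong and X_flat are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumption): 20 RGB images of 32x32 pixels.
rng = np.random.default_rng(0)
X = rng.random((20, 32, 32, 3))
y = np.array([0] * 10 + [1] * 10)  # 0 = dog, 1 = cat (assumed labels)

# Color images reshaped with a trailing 1 become 3x as many "samples".
wrong = X.reshape(-1, 32, 32, 1)
print(wrong.shape[0], len(y))  # prints: 60 20

# KNN needs one row per sample, so flatten each image into a feature vector.
X_flat = X.reshape(len(X), -1)  # shape (20, 3072)
assert len(X_flat) == len(y)

X_train, X_test, y_train, y_test = train_test_split(
    X_flat, y, test_size=0.3, random_state=0, stratify=y
)

knn_model = KNeighborsClassifier(n_neighbors=3)  # K = 3
knn_model.fit(X_train, y_train)  # fit before predict
predict_knn = knn_model.predict(X_test)
print(predict_knn)
```

With real images, the same idea applies: keep the channel dimension (or load grayscale images) so that X.shape[0] equals the number of images before flattening.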

Related

Nonsensical Confusion Matrix for ANN

I have used the following methods to try to build an ANN model. However, my confusion matrix (at the bottom) strongly suggests that something has gone wrong, and I am not sure where the problem started or why.
The dataset was split like this:
X = df.drop('Recurrence', axis = 1)
y = df['Recurrence']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 0)
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)
This was the training and evaluation code:
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from dmba import classificationSummary  # classificationSummary comes from the dmba package
# Set seed for reproducibility
SEED = 1
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
clf = MLPClassifier(hidden_layer_sizes=(100,), activation='logistic', solver='lbfgs', learning_rate='adaptive',
                    random_state=SEED, max_iter=200).fit(X_train, y_train)
classificationSummary(y_train, clf.predict(X_train))
classificationSummary(y_test, clf.predict(X_test))
This produced the following classification summaries (training set first, then test set):

Confusion Matrix (Accuracy 1.0000)

       Prediction
Actual 0
     0 1

Confusion Matrix (Accuracy 0.0000)

       Prediction
Actual 0 1
     0 0 1
     1 0 0
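Read literally, these matrices appear to contain a single training sample and a single test sample, which suggests the data frame is much smaller than expected. Note also that the code above splits twice, and the second train_test_split call silently overwrites the first. A minimal sketch of some sanity checks, assuming a synthetic stand-in for df with a binary 'Recurrence' column: split once, stratify on the target, and print the shapes and class counts of each split before training. It uses sklearn.metrics.confusion_matrix in place of classificationSummary, and drops learning_rate because that parameter is only used by the 'sgd' solver.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import confusion_matrix, accuracy_score

SEED = 1
rng = np.random.default_rng(SEED)

# Synthetic stand-in for df (assumption): 200 rows, 4 numeric features,
# and a binary 'Recurrence' target that depends on the first feature.
df = pd.DataFrame(rng.normal(size=(200, 4)), columns=["f1", "f2", "f3", "f4"])
df["Recurrence"] = (df["f1"] + 0.5 * rng.normal(size=200) > 0).astype(int)

X = df.drop("Recurrence", axis=1)
y = df["Recurrence"]

# Split once; stratify keeps the class ratio the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=SEED, stratify=y
)

# Sanity checks: a one-cell confusion matrix usually means very few rows
# or only one class ended up in that split.
print(X_train.shape, X_test.shape)
print(y_train.value_counts().to_dict(), y_test.value_counts().to_dict())

clf = MLPClassifier(hidden_layer_sizes=(100,), activation="logistic",
                    solver="lbfgs", random_state=SEED, max_iter=200)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print("test accuracy:", accuracy_score(y_test, y_pred))
```

If the printed shapes show only a handful of rows, the problem is in how df was loaded or filtered, not in the network itself.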

How to solve "ValueError: y should be a 1d array, got an array of shape (3, 5) instead." for naive Bayes?

from sklearn.model_selection import train_test_split
X = data.drop('Vickers Hardness\n(HV0.5)', axis=1)
y = data['Vickers Hardness\n(HV0.5)']
X_train, y_train, X_test, y_test = train_test_split(X, y, test_size = 0.3)
from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)
ValueError: y should be a 1d array, got an array of shape (3, 5) instead.
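How can I fix this error in naive Bayes? How can I make y a 1D array?
The variables receiving the output of train_test_split are in the wrong order. The function returns X_train, X_test, y_train, y_test, so with your order y_train actually receives the test features (a 2D array of shape (3, 5)), which is what GaussianNB rejects. Use: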
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3)
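A minimal sketch that reproduces the mix-up and the fix, assuming synthetic data with 10 samples, 5 features and two classes (the real hardness data is not shown in the question). With the wrong order, y_train ends up holding X_test; with the right order, it is a 1D label vector and GaussianNB fits normally.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in (assumption): 10 samples, 5 features, 2 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 5))
y = np.array([0, 1] * 5)

# Wrong order: the second name actually receives X_test, a 2D feature array.
X_train, y_train, X_test, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
print(y_train.shape)  # (3, 5): this is X_test, not a label vector

# Right order: train_test_split returns X_train, X_test, y_train, y_test.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)
print(y_pred.shape)  # (3,)
```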

How to solve sklearn error: "Found input variables with inconsistent numbers of samples"?

I have a problem with a 70/30 train/test split in sklearn. I get an error on this line:
X_train, X_test, y_train, y_test = train_test_split(X_smote, y_smote, test_size=0.3, stratify=y)
The error is:
Found input variables with inconsistent numbers of samples
Context
import pandas as pd
from imblearn.over_sampling import SMOTE
sm = SMOTE(k_neighbors = 1)
X = data.drop('cluster',axis=1)
y = data['cluster']
X_smote, y_smote = sm.fit_resample(X, y)  # fit_sample was renamed fit_resample in imbalanced-learn
data_bal = pd.DataFrame(columns=X.columns.values, data=X_smote)
data_bal['cluster']=y_smote
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_smote, y_smote, test_size=0.3, stratify=y)
y_train.value_counts().plot(kind='bar')
Edit
I solved the error: I just had to change stratify=y to stratify=y_smote, since SMOTE adds samples and y no longer has the same length as X_smote and y_smote.
Just an observation about this line of your code:
X_train, X_test, y_train, y_test = train_test_split(X_smote, y_smote, test_size=0.3, stratify=y)
This error is raised when the arrays passed to train_test_split, including the stratify array, do not all have the same number of samples (rows).
Check the lengths of X_smote, y_smote and y to confirm they match; after oversampling, y is typically shorter than the other two.
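A minimal sketch of the same length rule without imbalanced-learn, assuming synthetic imbalanced data. Naive random oversampling with sklearn.utils.resample stands in for SMOTE here; it duplicates minority rows rather than interpolating new ones, but it changes the sample count in the same way, so stratify has to use the resampled labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Imbalanced stand-in data (assumption): 30 rows of class 0, 10 of class 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = np.array([0] * 30 + [1] * 10)

# Naive random oversampling (stand-in for SMOTE): draw 20 extra minority
# rows with replacement so both classes end up with 30 rows.
X_min, y_min = X[y == 1], y[y == 1]
X_extra, y_extra = resample(X_min, y_min, replace=True, n_samples=20, random_state=0)
X_res = np.vstack([X, X_extra])
y_res = np.concatenate([y, y_extra])

print(len(X_res), len(y_res), len(y))  # 60 60 40

# stratify needs one entry per sample being split, so use y_res, not y.
X_train, X_test, y_train, y_test = train_test_split(
    X_res, y_res, test_size=0.3, stratify=y_res, random_state=0
)
print(np.bincount(y_train), np.bincount(y_test))
```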
I had the same issue, but when I changed
x_train,y_train,x_test,y_test = train_test_split(x,y,test_size=0.25,random_state=42)
to
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.25,random_state=42)
the error went away.

How do I properly fit a scikit-learn model using a pandas DataFrame?

I am trying to create a machine learning program in scikit-learn. I am using a CSV file to store the data and have decided to use a pandas DataFrame to import and format it. I cannot figure out how to fit this DataFrame with the model.
My CSV file has one feature, age, and one target, weight. I am using a linear regression algorithm to predict the weight using the age. I do realize this isn't the best algorithm to use with this data.
When I run this code I get the error "ValueError: Found input variables with inconsistent numbers of samples: [10, 40]"
Here is my code:
# Imports
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# Load And Split Data
data = pd.read_csv("awd.csv")
feature_cols = ['Ages']
X = data.loc[:, feature_cols]
y = data.loc[:, "Weights"]
X_train, y_train, X_test, y_test = train_test_split(X, y, random_state=0, train_size=0.2)
# Train Model
lr = LinearRegression()
lr.fit(X_train, y_train)
# Scores
print(f"Test set score: {round(lr.score(X_test, y_test), 3)}")
print(f"Training set score: {round(lr.score(X_train, y_train), 3)}")
The first 5 lines of my CSV file:
Ages,Weights
1,19
1,21
2,26
2,32
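You're assigning the return values in the wrong order. train_test_split returns X_train, X_test, y_train, y_test, so your line
X_train, y_train, X_test, y_test = train_test_split(X, y, random_state=0, train_size=0.2)
puts the test features into y_train, which is why fit sees 10 samples in X_train but 40 in y_train. Correct the order of X_train, X_test, y_train and y_test like this:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
Note also that train_size=0.2 keeps only 20% of the rows for training, while test_size=0.2 holds out 20% for testing, which is usually what is intended.
See the train_test_split documentation for details.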
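A minimal end-to-end sketch, assuming a synthetic 50-row stand-in for awd.csv (ages 1 to 17 with a roughly linear weight plus noise; these values are invented for illustration). Selecting the feature with a list keeps X as a 2D DataFrame, and the target stays a 1D Series.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for awd.csv (assumption): 50 rows of Ages and Weights.
rng = np.random.default_rng(0)
ages = rng.integers(1, 18, size=50)
data = pd.DataFrame({"Ages": ages, "Weights": 15 + 4 * ages + rng.normal(0, 3, size=50)})

feature_cols = ["Ages"]
X = data.loc[:, feature_cols]   # 2D DataFrame, shape (50, 1)
y = data.loc[:, "Weights"]      # 1D Series, shape (50,)

# Return order is X_train, X_test, y_train, y_test; hold out 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

lr = LinearRegression()
lr.fit(X_train, y_train)

print(f"Test set score: {round(lr.score(X_test, y_test), 3)}")
print(f"Training set score: {round(lr.score(X_train, y_train), 3)}")
```

Replacing the synthetic DataFrame with pd.read_csv("awd.csv") should leave the rest unchanged.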

MATLAB equivalent of Python's sklearn train_test_split function?

How can I get the MATLAB equivalent of this Python code?
x_train, x_test, y_train, y_test = sk.cross_validation.train_test_split(X,y)
The train and test sets should be randomly sampled, because I will repeat this procedure many times to perform a bootstrap.
Say you have 150 samples that you want to split into 100 samples for training and 50 samples for testing. You could just do:
Python:
import numpy as np
idx = np.random.permutation(len(y))  # X and y are NumPy arrays
X_train, y_train = X[idx[:100]], y[idx[:100]]
X_test, y_test = X[idx[100:]], y[idx[100:]]
MATLAB/Octave:
idx = randperm(length(y));
X_train = X(idx(1:100), :);
y_train = y(idx(1:100));
X_test = X(idx(101:end), :);  % 1-based: rows 101-150, no overlap with training
y_test = y(idx(101:end));
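For the repeated sampling mentioned in the question, the permutation split can be wrapped in a function and called in a loop. A minimal sketch, assuming stand-in data of 150 samples with 4 features; random_split is a hypothetical helper, not an sklearn function.

```python
import numpy as np

def random_split(X, y, n_train, rng):
    """Hypothetical helper: shuffle row indices and split X, y into train/test."""
    idx = rng.permutation(len(y))
    train_idx, test_idx = idx[:n_train], idx[n_train:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Stand-in data (assumption): 150 samples with 4 features and binary labels.
rng = np.random.default_rng(42)
X = rng.normal(size=(150, 4))
y = rng.integers(0, 2, size=150)

# Repeat the split, drawing a fresh permutation each time.
for repeat in range(3):
    X_train, X_test, y_train, y_test = random_split(X, y, 100, rng)
    print(repeat, X_train.shape, X_test.shape)  # (100, 4) (50, 4) each time
```

This samples without replacement. A classical bootstrap instead draws the training indices with replacement, for example with rng.choice(len(y), size=len(y), replace=True), and uses the rows never drawn as the test set.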
