I have a feature set Xtrain with dimensions (n_obs, n_features) and responses ytrain with dimension (n_obs,). I am attempting to use KNN as a classifier.
from sklearn.neighbors import KNeighborsClassifier
neigh = KNeighborsClassifier()
clf = neigh(n_neighbors = 10)
clf.fit(Xtrain,ytrain)
I get error message:
TypeError
Traceback (most recent call last)
22 clf = neigh(n_neighbors = 10)
23 # Fit best model to data
24 clf.fit(Xtrain, ytrain)
TypeError: 'KNeighborsClassifier' object is not callable
Not sure what the problem is...any help appreciated.
Try:
clf = KNeighborsClassifier(n_neighbors = 10)
clf.fit(Xtrain,ytrain)
Classifier parameters go inside the constructor. You were trying to call an already instantiated classifier as if it were the class itself.
The following:
from sklearn.neighbors import KNeighborsClassifier
neigh = KNeighborsClassifier
clf = neigh(n_neighbors = 10)
clf.fit(Xtrain, ytrain)
would also work.
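To make the difference between the class and an instance concrete, here is a minimal sketch. It uses the iris dataset as a stand-in for Xtrain and ytrain, which is an assumption, since the original data is not shown. It also shows set_params as one way to change parameters on an instance that already exists.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumption: the original Xtrain/ytrain are not shown)
Xtrain, ytrain = load_iris(return_X_y=True)

# The class itself is callable: calling it constructs an estimator
clf = KNeighborsClassifier(n_neighbors=10)
clf.fit(Xtrain, ytrain)

# An instance is not callable, which is what triggers the TypeError
instance = KNeighborsClassifier()
print(callable(KNeighborsClassifier))  # the class
print(callable(instance))              # an instance

# To change parameters on an existing instance, use set_params
instance.set_params(n_neighbors=10)
instance.fit(Xtrain, ytrain)
```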
Related
Trying to do SVR for multiple outputs. Started by hyper-parameter tuning which worked for me. Now I want to create the model using the optimum parameters but I am getting an error. How to fix this?
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputRegressor
svr = SVR()
svr_regr = MultiOutputRegressor(svr)
from sklearn.model_selection import KFold
kfold_splitter = KFold(n_splits=6, random_state = 0,shuffle=True)
svr_gs = GridSearchCV(svr_regr,
param_grid = {'estimator__kernel': ('linear','poly','rbf','sigmoid'),
'estimator__C': [1,1.5,2,2.5,3,3.5,4,4.5,5,5.5,6,6.5,7,7.5,8,8.5,9,9.5,10],
'estimator__degree': [3,8],
'estimator__coef0': [0.01,0.1,0.5],
'estimator__gamma': ('auto','scale'),
'estimator__tol': [1e-3, 1e-4, 1e-5, 1e-6]},
cv=kfold_splitter,
n_jobs=-1,
scoring='r2')
svr_gs.fit(X_train, y_train)
print(svr_gs.best_params_)
#print(gs.best_score_)
Output:
{'estimator__C': 10, 'estimator__coef0': 0.01, 'estimator__degree': 3, 'estimator__gamma': 'auto', 'estimator__kernel': 'rbf', 'estimator__tol': 1e-06}
Trying to create a model using the output:
SVR_model = svr_regr (kernel='rbf',C=10,
coef0=0.01,degree=3,
gamma='auto',tol=1e-6,random_state=42)
SVR_model.fit(X_train, y_train)
SVR_model_y_predict = SVR_model.predict((X_test))
SVR_model_y_predict
Error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/var/folders/mm/r4gnnwl948zclfyx12w803040000gn/T/ipykernel_96269/769104914.py in <module>
----> 1 SVR_model = svr_regr (estimator__kernel='rbf',estimator__C=10,
2 estimator__coef0=0.01,estimator__degree=3,
3 estimator__gamma='auto',estimator__tol=1e-6,random_state=42)
4
5
TypeError: 'MultiOutputRegressor' object is not callable
Please consult the MultiOutputRegressor docs.
The regressor you got back is the model. It is not a method, but it does offer a bunch of fun methods that you can call, such as .fit(), .predict(), and .score().
You are trying to specify kernel and a few other parameters. It appears you wanted to offer those to SVR(), at the top of your code.
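A minimal sketch of what that could look like, using synthetic two-target data from make_regression as a stand-in for X_train and y_train (an assumption, since the original data is not shown). The names svr_model, svr_model_2, and best_params are illustrative. Note that SVR has no random_state parameter, so it is left out here.

```python
from sklearn.datasets import make_regression
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

# Synthetic two-target data (assumption: stands in for X_train, y_train)
X_train, y_train = make_regression(
    n_samples=60, n_features=4, n_targets=2, random_state=0)

# Option 1: pass the tuned values to SVR, then wrap it
svr_model = MultiOutputRegressor(
    SVR(kernel='rbf', C=10, coef0=0.01, degree=3, gamma='auto', tol=1e-6)
)
svr_model.fit(X_train, y_train)

# Option 2: reuse the estimator__ prefixed names reported by the grid search
best_params = {'estimator__C': 10, 'estimator__coef0': 0.01,
               'estimator__degree': 3, 'estimator__gamma': 'auto',
               'estimator__kernel': 'rbf', 'estimator__tol': 1e-06}
svr_model_2 = MultiOutputRegressor(SVR()).set_params(**best_params)
svr_model_2.fit(X_train, y_train)

predictions = svr_model.predict(X_train)
print(predictions.shape)
```

With the default refit=True, a fitted GridSearchCV also exposes the refitted model as best_estimator_, so svr_gs.best_estimator_ can be used for prediction without rebuilding anything.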
Here's the piece of the code:
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegressionCV
skf = StratifiedKFold(n_splits=5)
skf_1 = skf.split(titanic_dataset, surv_titanic)
ls_1 = np.logspace(-1.0, 2.0, num=500)
clf = LogisticRegressionCV(Cs=ls_1, cv = skf_1, scoring = "roc_auc", n_jobs=-1, random_state=17)
clf_model = clf.fit(x_train, y_train)
This says:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-130-b99a5912ff5a> in <module>
----> 1 clf_model = clf.fit(x_train, y_train)
H:\Anaconda_3\lib\site-packages\sklearn\linear_model\_logistic.py in fit(self, X, y, sample_weight)
2098 # (n_classes, n_folds, n_Cs . n_l1_ratios) or
2099 # (1, n_folds, n_Cs . n_l1_ratios)
-> 2100 coefs_paths, Cs, scores, n_iter_ = zip(*fold_coefs_)
2101 self.Cs_ = Cs[0]
2102 if multi_class == 'multinomial':
ValueError: not enough values to unpack (expected 4, got 0)
The train and test datasets had been prepared before, and they behave nicely with other classifiers.
Such a generic error message tells me nothing. What is the problem here?
In short, the issue was that you passed the result of skf.split(titanic_dataset, surv_titanic) to the cv argument on LogisticRegressionCV when you needed to pass StratifiedKFold(n_splits=5) directly instead.
Below I show the code that reproduced your error, and below that I show two alternative methods that accomplish what I believe you were trying to do.
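import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import StratifiedKFold
# Some example data
data = load_breast_cancer()
X = data['data']
y = data['target']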
# Set up the stratifiedKFold
skf = StratifiedKFold(n_splits=5)
# Don't do this... only here to reproduce the error
skf_indicies = skf.split(X, y)
# Some regularization
ls_1 = np.logspace(-1.0, 2.0, num=5)
# This creates your error
clf_error = LogisticRegressionCV(Cs=ls_1,
cv = skf_indicies,
scoring = "roc_auc",
n_jobs=-1,
random_state=17)
# Error created by passing result of skf.split to cv
clf_model = clf_error.fit(X, y)
# This is probably what you meant to do
clf_using_skf = LogisticRegressionCV(Cs=ls_1,
cv = skf,
scoring = "roc_auc",
n_jobs=-1,
random_state=17,
max_iter=1_000)
# This will now fit without the error
clf_model_skf = clf_using_skf.fit(X, y)
# This is the easiest method, and from the docs also does the
# same thing as StratifiedKFold
# https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegressionCV.html
clf_easiest = LogisticRegressionCV(Cs=ls_1,
cv = 5,
scoring = "roc_auc",
n_jobs=-1,
random_state=17,
max_iter=1_000)
# This will now fit without the error
clf_model_easiest = clf_easiest.fit(X, y)
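The sketch below illustrates one difference between the splitter object and the result of calling split() on it. It does not claim to reproduce scikit-learn's internals; it only shows general Python generator behavior, which may be related to the empty result in the traceback. The variable names first_pass and second_pass are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold

X, y = load_breast_cancer(return_X_y=True)
skf = StratifiedKFold(n_splits=5)

# split() returns a generator: it can be walked through only once
splits = skf.split(X, y)
first_pass = len(list(splits))
second_pass = len(list(splits))  # nothing left to iterate over
print(first_pass, second_pass)

# The splitter object can produce fresh splits each time it is asked
print(len(list(skf.split(X, y))))
```

Passing the splitter itself (or an integer) as cv leaves it to the estimator to generate the folds from the data it is actually fitted on.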
I don't know how to fix this problem. Can anyone explain it to me?
I'm trying to get the best precision_score in a loop by changing a parameter of DecisionTreeClassifier.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split
df = pd.read_csv('songs.csv')
X = df.drop(['song','artist','genre','lyrics'],axis=1)
y = df.artist
X_train,X_test,y_train,y_test = train_test_split(X,y)
scores_data = pd.DataFrame()
for depth in range(1,100):
    clf = DecisionTreeClassifier(max_depth=depth,criterion='entropy').fit(X_train,y_train)
    train_score = clf.score(X_train,y_train)
    test_score = clf.score(X_test,y_test)
    preds = clf.predict(X_test)
    precision_score = precision_score(y_test,preds,average='micro')
    temp_scores = pd.DataFrame({'depth':[depth],
                                'test_score':[test_score],
                                'train_score':[train_score],
                                'precision_score:':[precision_score]})
    scores_data = scores_data.append(temp_scores)
This is my error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-50-f4a4eaa48ce6> in <module>
17 test_score = clf.score(X_test,y_test)
18 preds = clf.predict(X_test)
---> 19 precision_score = precision_score(y_test,preds,average='micro')
20
21 temp_scores = pd.DataFrame({'depth':[depth],
**TypeError: 'numpy.float64' object is not callable**
The last lines in your loop:
    precision_score = precision_score(y_test,preds,average='micro')
    temp_scores = pd.DataFrame({'depth':[depth],
                                'test_score':[test_score],
                                'train_score':[train_score],
                                'precision_score:':[precision_score]})
    scores_data = scores_data.append(temp_scores)
should be changed to:
    precision_score_ = precision_score(y_test,preds,average='micro')
    temp_scores = pd.DataFrame({'depth':[depth],
                                'test_score':[test_score],
                                'train_score':[train_score],
                                'precision_score:':[precision_score_]})
    scores_data = scores_data.append(temp_scores)
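In the first iteration you rebind the name precision_score to the numpy.float64 value the function returns. On the next iteration the name no longer refers to the function, so calling it raises the TypeError.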
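A minimal sketch of the corrected loop, using the iris dataset as a stand-in because songs.csv is not available here (an assumption). It also swaps in a different way of collecting results: rows are gathered in a list and the DataFrame is built once after the loop, since DataFrame.append was removed in pandas 2.0. The names rows and precision are illustrative.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption: songs.csv is not available here)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rows = []
for depth in range(1, 6):
    clf = DecisionTreeClassifier(max_depth=depth, criterion='entropy',
                                 random_state=0)
    clf.fit(X_train, y_train)
    preds = clf.predict(X_test)
    # Store the result under a different name so the function is not shadowed
    precision = precision_score(y_test, preds, average='micro')
    rows.append({'depth': depth,
                 'train_score': clf.score(X_train, y_train),
                 'test_score': clf.score(X_test, y_test),
                 'precision_score': precision})

# Build the frame once, after the loop
scores_data = pd.DataFrame(rows)
print(scores_data)
```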
I am working on a Random Forest classification model using stratified k-fold cross validation. I want to plot the feature importance of each fold. My input data is in the form of numpy arrays, however I am unable to put the feature names in my code below. How can I structure this code so that I can pull feature names, so I can plot the built-in feature importance?
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, KFold, cross_validate, cross_val_score, StratifiedKFold, RandomizedSearchCV
from sklearn.metrics import classification_report, confusion_matrix, f1_score, mean_squared_error
import matplotlib.pyplot as plt
y_downsample = downsampled[['dependent_variable']].values
X_downsample = downsampled[['Feature1'
,'Feature2'
,'Feature3'
,'Feature4'
,'Feature5'
,'Feature6'
,'Feature7'
,'Feature8'
,'Feature9'
,'Feature10']].values
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
f1_results = []
accuracy_results = []
precision_results = []
recall_results = []
feature_imp = []
for train_index, test_index in skf.split(X_downsample,y_downsample):
    X_train, X_test = X_downsample[train_index], X_downsample[test_index]
    y_train, y_test = y_downsample[train_index], y_downsample[test_index]
    model = RandomForestClassifier(n_estimators = 100, random_state = 24)
    model.fit(X_train, y_train.ravel())
    y_pred = model.predict(X_test)
    f1_results.append(metrics.f1_score(y_test, y_pred))
    accuracy_results.append(metrics.accuracy_score(y_test, y_pred))
    precision_results.append(metrics.precision_score(y_test, y_pred))
    recall_results.append(metrics.recall_score(y_test, y_pred))
    # plot
    importances = pd.DataFrame({'FEATURE':pd.DataFrame(X_downsample.columns),'IMPORTANCE':np.round(model.feature_importances_,3)})
    importances = importances.sort_values('IMPORTANCE',ascending=False).set_index('FEATURE')
    importances.plot.bar()
    plt.show()
print("Accuracy: ", np.mean(accuracy_results))
print("Precision: ", np.mean(precision_results))
print("Recall: ", np.mean(recall_results))
print("F1-score: ", np.mean(f1_results))
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
in
21
22 # plot
---> 23 importances = pd.DataFrame({'FEATURE':pd.DataFrame(X_downsample.columns),'IMPORTANCE':np.round(model.feature_importances_,3)})
24 importances = importances.sort_values('IMPORTANCE',ascending=False).set_index('FEATURE')
25
AttributeError: 'numpy.ndarray' object has no attribute 'columns'
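The error arises because .values returns a NumPy array, which carries no column labels. One possible approach, sketched below, is to keep the feature names in a list before converting and use that list to label the importances. The synthetic data from make_classification and the names feature_names and fold_importances are assumptions made for illustration; they stand in for the downsampled DataFrame, which is not shown.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

# Hypothetical stand-in for the `downsampled` DataFrame
feature_names = [f'Feature{i}' for i in range(1, 11)]
X_arr, y_arr = make_classification(n_samples=200, n_features=10, random_state=1)
downsampled = pd.DataFrame(X_arr, columns=feature_names)
downsampled['dependent_variable'] = y_arr

# Keep the names in a list before converting to NumPy arrays
X_downsample = downsampled[feature_names].values
y_downsample = downsampled['dependent_variable'].values

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
fold_importances = []
for train_index, test_index in skf.split(X_downsample, y_downsample):
    model = RandomForestClassifier(n_estimators=100, random_state=24)
    model.fit(X_downsample[train_index], y_downsample[train_index])
    # Label the importances with the saved names, not X_downsample.columns
    fold_importances.append(
        pd.Series(model.feature_importances_, index=feature_names))

# One row per feature, one column per fold
importances = pd.concat(fold_importances, axis=1)
importances.columns = [f'fold_{i}' for i in range(1, len(fold_importances) + 1)]
print(importances.round(3))
```

From here, something like importances['fold_1'].sort_values(ascending=False).plot.bar() would draw the bar chart for a single fold, assuming matplotlib is installed.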
from sklearn import datasets
import numpy as np
# Assigning the petal length and petal width of the 150 flower samples to Matrix X
# Class labels of the flower to vector y
iris = datasets.load_iris()
X = iris.data[:, [2, 3]]
y = iris.target
print('Class labels:', np.unique(y))
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)
print('Labels counts in y:', np.bincount(y))
print('Labels counts in y_train:', np.bincount(y_train))
print ('Labels counts in y_test:', np.bincount(y_test))
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)
from sklearn.linear_model import Perceptron
ppn = Perceptron(n_iter=40, eta0=0.1, random_state=1)
ppn.fit(X_train_std, y_train)
y_pred = ppn.predict(X_test_std)
print('Misclassified samples: %d' % (y_test != y_pred).sum())
When I run I get this error message:
Traceback (most recent call last):
File "c:/Users/Desfios 5/Desktop/Python/Ch3.py", line 27, in <module>
ppn = Perceptron(n_iter=40, eta0=0.1, random_state=1)
File "C:\Users\Desfios 5\AppData\Roaming\Python\Python38\site-packages\sklearn\utils\validation.py", line 72, in inner_f
return f(**kwargs)
TypeError: __init__() got an unexpected keyword argument 'n_iter'
I've tried uninstalling and installing scikit-learn but that did not help. Any help?
I just changed n_iter to max_iter and it worked for me:
ppn = Perceptron(max_iter=40, eta0=0.3, random_state=0)
You receive this error
TypeError: __init__() got an unexpected keyword argument 'n_iter'
because Perceptron no longer accepts an n_iter constructor parameter. It was deprecated in favor of max_iter and later removed from scikit-learn.
Do not confuse it with the n_iter_ attribute, which is an "Estimated attribute" (you can tell by the underscore at the end) and is only stored after the fit method has been called. Reference in Documentation
n_iter_no_change is a different constructor parameter: it sets how many iterations without improvement to wait before early stopping, so it is not a replacement for n_iter.
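A minimal sketch of the difference between the max_iter parameter and the n_iter_ attribute, assuming a recent scikit-learn version in which n_iter has been removed. It reuses the iris setup from the question; the variable fitted_before is illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = iris.data[:, [2, 3]]
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

sc = StandardScaler().fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)

# max_iter is a constructor parameter, set before fitting
ppn = Perceptron(max_iter=40, eta0=0.1, random_state=1)
fitted_before = hasattr(ppn, 'n_iter_')  # fitted attribute does not exist yet
print(fitted_before)

ppn.fit(X_train_std, y_train)
# n_iter_ is only available after fit and reports the iterations actually run
print(ppn.n_iter_)
y_pred = ppn.predict(X_test_std)
print('Misclassified samples: %d' % (y_test != y_pred).sum())
```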