I don't know how to fix this problem. Can anyone explain it to me?
I'm trying to find the best precision_score in a loop by changing the max_depth parameter of DecisionTreeClassifier.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split
df = pd.read_csv('songs.csv')
X = df.drop(['song','artist','genre','lyrics'],axis=1)
y = df.artist
X_train,X_test,y_train,y_test = train_test_split(X,y)
scores_data = pd.DataFrame()
for depth in range(1,100):
    clf = DecisionTreeClassifier(max_depth=depth,criterion='entropy').fit(X_train,y_train)
    train_score = clf.score(X_train,y_train)
    test_score = clf.score(X_test,y_test)
    preds = clf.predict(X_test)
    precision_score = precision_score(y_test,preds,average='micro')
    temp_scores = pd.DataFrame({'depth':[depth],
                                'test_score':[test_score],
                                'train_score':[train_score],
                                'precision_score:':[precision_score]})
    scores_data = scores_data.append(temp_scores)
This is my error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-50-f4a4eaa48ce6> in <module>
17 test_score = clf.score(X_test,y_test)
18 preds = clf.predict(X_test)
---> 19 precision_score = precision_score(y_test,preds,average='micro')
20
21 temp_scores = pd.DataFrame({'depth':[depth],
TypeError: 'numpy.float64' object is not callable
The last lines in your loop:
precision_score = precision_score(y_test,preds,average='micro')
temp_scores = pd.DataFrame({'depth':[depth],
                            'test_score':[test_score],
                            'train_score':[train_score],
                            'precision_score:':[precision_score]})
scores_data = scores_data.append(temp_scores)
should be changed to:
precision_score_ = precision_score(y_test,preds,average='micro')
temp_scores = pd.DataFrame({'depth':[depth],
                            'test_score':[test_score],
                            'train_score':[train_score],
                            'precision_score':[precision_score_]})
scores_data = pd.concat([scores_data, temp_scores])
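On the first iteration, this assignment rebinds the name precision_score from the imported function to the numpy.float64 value it returns, so on the next iteration Python tries to call that float as if it were a function. Storing the result under a different name (here precision_score_) leaves the function intact.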
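Putting the fix together, a minimal sketch of the whole loop might look like this. It assumes synthetic data from make_classification in place of songs.csv, uses a hypothetical name prec for the per-depth score, and collects rows in a list and builds the DataFrame once, because DataFrame.append was removed in pandas 2.0.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for songs.csv (assumption: numeric features, several classes)
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rows = []  # one dict per depth; the DataFrame is built once after the loop
for depth in range(1, 21):
    clf = DecisionTreeClassifier(max_depth=depth, criterion='entropy',
                                 random_state=0).fit(X_train, y_train)
    preds = clf.predict(X_test)
    # 'prec' is a new name, so precision_score still refers to the function
    prec = precision_score(y_test, preds, average='micro')
    rows.append({'depth': depth,
                 'train_score': clf.score(X_train, y_train),
                 'test_score': clf.score(X_test, y_test),
                 'precision_score': prec})

scores_data = pd.DataFrame(rows)
best = scores_data.loc[scores_data['precision_score'].idxmax()]
print(best)
```

For single-label multiclass data, micro-averaged precision equals accuracy, so the precision_score column will match test_score here; average='macro' or average='weighted' may say more about per-class precision.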
I have a problem using KNN.
When I fit the model on the training data, I get this error:
ValueError: Found input variables with inconsistent numbers of samples: [4482, 2015]
The DataFrame has already been cleaned, so I don't think the data itself is the problem.
Here is the full sequence of code I ran; the error appears at the end:
X = wines_class.drop(['color'], axis=1)
y = wines_class['color']
from sklearn.model_selection import train_test_split
X_treino, y_treino, X_teste, y_teste = train_test_split(X, y, test_size=0.31, random_state=0)
print(X.shape,y.shape)
(6497, 12) (6497,)
from sklearn.metrics import roc_auc_score
from sklearn.metrics import confusion_matrix
def class_pontos(clf, y_predito):
    acc_treino = clf.score(X_treino, y_treino)*100
    acc_teste = clf.score(X_teste, y_teste)*100
    roc = roc_auc_score(y_test, y_predito)*100
    vn, fp, fn, vp = confusion_matrix(y_test, y_predito).ravel()
    cm = confusion_matrix(y_teste, y_predito)
    Correto = vp + vn
    Incorreto = fp + fn
    return acc_treino, acc_teste, roc, Correto, Incorreto, cm
#KNN
from sklearn.neighbors import KNeighborsClassifier
class_knn = KNeighborsClassifier()
class_knn.fit(X_treino, y_treino)
y_pred_knn = class_knn.predict(X_teste)
print(class_pontos(class_knn, y_pred_knn))
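Judging from the numbers in the error, a likely cause is the unpacking order: train_test_split returns X_train, X_test, y_train, y_test, so the code above stores the test features in y_treino. With test_size=0.31 and 6497 rows, scikit-learn puts ceil(0.31 × 6497) = 2015 rows in the test split and 4482 in the training split, which matches [4482, 2015]. class_pontos also refers to y_test, which is never defined (the variable is y_teste). The sketch below assumes a synthetic binary dataset from make_classification in place of wines_class, with 12 features like X above.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for wines_class (assumption: 6497 rows, 12 numeric features, binary target)
X, y = make_classification(n_samples=6497, n_features=12, random_state=0)

# train_test_split returns (X_train, X_test, y_train, y_test) in that order
X_treino, X_teste, y_treino, y_teste = train_test_split(
    X, y, test_size=0.31, random_state=0)
print(X_treino.shape, X_teste.shape)  # (4482, 12) (2015, 12)

def class_pontos(clf, y_predito):
    acc_treino = clf.score(X_treino, y_treino) * 100
    acc_teste = clf.score(X_teste, y_teste) * 100
    # y_teste everywhere: y_test is never defined in this script
    roc = roc_auc_score(y_teste, y_predito) * 100
    cm = confusion_matrix(y_teste, y_predito)
    vn, fp, fn, vp = cm.ravel()
    return acc_treino, acc_teste, roc, vp + vn, fp + fn, cm

class_knn = KNeighborsClassifier().fit(X_treino, y_treino)
y_pred_knn = class_knn.predict(X_teste)
print(class_pontos(class_knn, y_pred_knn))
```

roc_auc_score also accepts scores instead of hard labels; passing class_knn.predict_proba(X_teste)[:, 1] usually gives a more informative AUC than the 0/1 predictions.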
I am working on a Random Forest classification model using stratified k-fold cross-validation, and I want to plot the feature importance for each fold. My input data is in the form of NumPy arrays, so I can't get the feature names in the code below. How can I structure this code so that I can pull the feature names and plot the built-in feature importance?
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, KFold, cross_validate, cross_val_score, StratifiedKFold, RandomizedSearchCV
from sklearn.metrics import classification_report, confusion_matrix, f1_score, mean_squared_error
from sklearn import metrics
import matplotlib.pyplot as plt
y_downsample = downsampled[['dependent_variable']].values
X_downsample = downsampled[['Feature1'
,'Feature2'
,'Feature3'
,'Feature4'
,'Feature5'
,'Feature6'
,'Feature7'
,'Feature8'
,'Feature9'
,'Feature10']].values
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
f1_results = []
accuracy_results = []
precision_results = []
recall_results = []
feature_imp = []
for train_index, test_index in skf.split(X_downsample,y_downsample):
    X_train, X_test = X_downsample[train_index], X_downsample[test_index]
    y_train, y_test = y_downsample[train_index], y_downsample[test_index]
    model = RandomForestClassifier(n_estimators = 100, random_state = 24)
    model.fit(X_train, y_train.ravel())
    y_pred = model.predict(X_test)
    f1_results.append(metrics.f1_score(y_test, y_pred))
    accuracy_results.append(metrics.accuracy_score(y_test, y_pred))
    precision_results.append(metrics.precision_score(y_test, y_pred))
    recall_results.append(metrics.recall_score(y_test, y_pred))
    # plot
    importances = pd.DataFrame({'FEATURE':pd.DataFrame(X_downsample.columns),'IMPORTANCE':np.round(model.feature_importances_,3)})
    importances = importances.sort_values('IMPORTANCE',ascending=False).set_index('FEATURE')
    importances.plot.bar()
    plt.show()
print("Accuracy: ", np.mean(accuracy_results))
print("Precision: ", np.mean(precision_results))
print("Recall: ", np.mean(recall_results))
print("F1-score: ", np.mean(f1_results))
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
     21
     22 # plot
---> 23 importances = pd.DataFrame({'FEATURE':pd.DataFrame(X_downsample.columns),'IMPORTANCE':np.round(model.feature_importances_,3)})
     24 importances = importances.sort_values('IMPORTANCE',ascending=False).set_index('FEATURE')
     25

AttributeError: 'numpy.ndarray' object has no attribute 'columns'
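Because .values drops the column labels, one approach is to keep the list of column names in a variable before converting and reuse it when building the importance table; in the loop above, that means replacing pd.DataFrame(X_downsample.columns) with that list. The sketch below assumes a synthetic downsampled DataFrame built with make_classification and the placeholder names Feature1 to Feature10 from the question, and it collects one importance Series per fold instead of plotting inside the loop.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for 'downsampled' (assumption: 10 numeric features, binary target)
feature_names = [f'Feature{i}' for i in range(1, 11)]
X_arr, y_arr = make_classification(n_samples=200, n_features=10, random_state=1)
downsampled = pd.DataFrame(X_arr, columns=feature_names)
downsampled['dependent_variable'] = y_arr

# Keep the names in feature_names; .values returns a bare NumPy array
X_downsample = downsampled[feature_names].values
y_downsample = downsampled['dependent_variable'].values

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
fold_importances = []
for train_index, test_index in skf.split(X_downsample, y_downsample):
    model = RandomForestClassifier(n_estimators=100, random_state=24)
    model.fit(X_downsample[train_index], y_downsample[train_index])
    # one Series per fold, labelled with the saved feature names
    fold_importances.append(
        pd.Series(model.feature_importances_, index=feature_names))

# rows = folds, columns = features
importances = pd.DataFrame(fold_importances)
print(importances.mean().sort_values(ascending=False).round(3))
```

To plot, importances.mean().sort_values(ascending=False).plot.bar() followed by plt.show() gives the averaged importances; plotting each row of importances reproduces the per-fold charts the question asks for.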
My program throws an error:
TypeError: 'DataFrame' object is not callable
I am using NumPy and pandas with Python 3.6. The error is raised at line 15 of my script, y = np.array(data([predict])).
import pandas as pd
import numpy as np
import sklearn
from sklearn import linear_model
from sklearn.utils import shuffle
data = pd.read_csv("student-mat.csv", sep=";")
print("Starting data manipulation...")
data = data[["G1", "G2", "G3", "studytime", "failures", "absences"]]
predict = "G3"
x = np.array(data.drop([predict], 1))
y = np.array(data([predict]))
x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x, y, test_size=0.1)
linear = linear_model.LinearRegression()
linear.fit(x_train, y_train)
acc = linear.score(x_test, y_test)
print("Accuracy: " + str(acc))
print("Coefficient: " + str(linear.coef_))
print("Intercept: " + str(linear.intercept_))
Change your line
y = np.array(data([predict]))
to
y = np.array(data[predict])
Putting () after a name tells Python to call it, so data([predict]) tries to call the DataFrame as if it were a function; that is what the error message is about.
Use square brackets to select a column from a DataFrame, e.g. data[predict] (equivalent to data["G3"] here).
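A runnable version of the corrected script, as a sketch: since student-mat.csv isn't included, it builds a small synthetic DataFrame with the same column names (the values are made up). It also passes columns= to drop, because pandas 2.0 no longer accepts the axis as a positional argument.

```python
import numpy as np
import pandas as pd
from sklearn import linear_model
from sklearn.model_selection import train_test_split

# Synthetic stand-in for student-mat.csv (assumption: same column names, made-up values)
rng = np.random.default_rng(0)
n = 100
data = pd.DataFrame({
    "G1": rng.integers(0, 20, n),
    "G2": rng.integers(0, 20, n),
    "studytime": rng.integers(1, 5, n),
    "failures": rng.integers(0, 4, n),
    "absences": rng.integers(0, 30, n),
})
data["G3"] = data["G1"] // 2 + data["G2"] // 2 + rng.integers(0, 3, n)

predict = "G3"
x = np.array(data.drop(columns=[predict]))  # every column except G3
y = np.array(data[predict])                 # square brackets, not data(...)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1, random_state=0)
linear = linear_model.LinearRegression().fit(x_train, y_train)
print("R^2 on test set:", linear.score(x_test, y_test))
print("Coefficient:", linear.coef_)
print("Intercept:", linear.intercept_)
```

LinearRegression.score returns the coefficient of determination R², not a classification accuracy, so the value printed as "Accuracy" in the question is really R².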
I'm testing machine learning methods on a CSV file of Kickstarter project data. I can get the "accuracy score", but I get the following error when I try to get the "r2 score". What is the reason?
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import accuracy_score
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
veri = pd.read_csv("kick_rev.csv")
veri = veri.drop(['id'], axis=1)
veri = veri.drop(['i'], axis=1)
y = np.array(veri['state_num'])
x = np.array(veri.drop(['state_num','usd_goal_real','deadline','launched','country'], axis=1))
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.33)
DTR = DecisionTreeRegressor()
DTR.fit(X_train,y_train)
ytahmin = DTR.predict(x)
DTR.fit(veri[['goal','pledged','backers','usd_pledged','usd_pledged_real','category_num','category_main_num','currency_num','country_num']],veri.state_num)
accuracy_score = DTR.score(X_test,y_test)
a = np.array([5000,94175.0,1,57763.8,6469.73,13,6,0,0]).reshape(1, -1)
predict_DTR = DTR.predict(a)
r2 = DTR.r2_score(X_test, y_test)
print(accuracy_score)
print(r2)
Error:
AttributeError: 'DecisionTreeRegressor' object has no attribute 'r2_score'
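r2_score is a function in sklearn.metrics, not a method of the estimator, which is why DTR.r2_score raises the AttributeError. It also compares true target values with predicted ones, so it should not be given the test features (X_test). Predict on X_test first, then pass the true values first and the predictions second:
r2_score(y_true, y_pred)
e.g. r2 = r2_score(y_test, DTR.predict(X_test))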
See the scikit-learn documentation for details:
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.r2_score.html
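A minimal sketch of the corrected call, assuming synthetic regression data from make_regression in place of kick_rev.csv. It also shows that a regressor's own score method already returns R², so DTR.score(X_test, y_test) in the question is the R² value rather than an accuracy.

```python
import math

from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for kick_rev.csv (assumption: numeric features, numeric target)
X, y = make_regression(n_samples=300, n_features=9, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

DTR = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
y_pred = DTR.predict(X_test)

r2 = r2_score(y_test, y_pred)      # module-level function: (y_true, y_pred)
score = DTR.score(X_test, y_test)  # a regressor's built-in score is also R^2
print(r2, score, math.isclose(r2, score))
```

Argument order matters: R² is not symmetric, so r2_score(y_pred, y_test) generally gives a different number than r2_score(y_test, y_pred).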
I have a feature set Xtrain with shape (n_obs, n_features) and responses ytrain with shape (n_obs,). I am trying to use KNN as a classifier.
from sklearn.neighbors import KNeighborsClassifier
neigh = KNeighborsClassifier()
clf = neigh(n_neighbors = 10)
clf.fit(Xtrain,ytrain)
I get error message:
TypeError                                 Traceback (most recent call last)
22 clf = neigh(n_neighbors = 10)
23 # Fit best model to data
24 clf.fit(Xtrain, ytrain)
TypeError: 'KNeighborsClassifier' object is not callable
I'm not sure what the problem is. Any help is appreciated.
Try:
clf = KNeighborsClassifier(n_neighbors = 10)
clf.fit(Xtrain,ytrain)
Classifier parameters go inside the constructor call. neigh is already an instance of KNeighborsClassifier, and instances are not callable, so neigh(n_neighbors = 10) fails.
The following:
from sklearn.neighbors import KNeighborsClassifier
neigh = KNeighborsClassifier
clf = neigh(n_neighbors = 10)
clf.fit(Xtrain, ytrain)
would also work.
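The difference between the class and an instance can be sketched as follows, using make_classification as stand-in data for Xtrain and ytrain. set_params is the standard scikit-learn way to change parameters on an estimator that already exists.

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumption: any numeric feature matrix and label vector will do)
Xtrain, ytrain = make_classification(n_samples=100, n_features=4, random_state=0)

neigh = KNeighborsClassifier()  # an instance, not the class
print(callable(KNeighborsClassifier), callable(neigh))  # True False

# Option 1: pass parameters to the constructor
clf = KNeighborsClassifier(n_neighbors=10).fit(Xtrain, ytrain)

# Option 2: change parameters on the existing instance; set_params returns it
neigh.set_params(n_neighbors=10).fit(Xtrain, ytrain)
print(clf.n_neighbors, neigh.n_neighbors)  # 10 10
```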