'tuple' object has no attribute 'data' - python

I'm practicing about classifications. I could see iris has both attribute when I printed it. But I still getting same error.
from sklearn import datasets
from sklearn.metrics import classification_report
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
iris = datasets.load_iris(),
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
nb_classifier = GaussianNB()
nb_classifier.fit(X_train, y_train)
y_pred = nb_classifier.predict(X_test)
print(classification_report(y_test, y_pred))

Remove the , at the end of
iris = datasets.load_iris(),
Otherwise iris becomes a tuple with one element. Your line is identical to:
iris = (datasets.load_iris(),)

Related

ValueError: Unknown label type:

I am just Starting this naive Bayes Theoram and Getting stuck by this please tell me what to do,
this is my code ::
import sklearn
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import accuracy_score
df = (pd.read_csv("C:\\Users\\dhana\\Downloads\\Mechanics\\train.csv"))
X = []
Y = []
for i in df.Age:
X.append(str(i))
for j in df.Fare:
Y.append(j)
X_train, X_test, y_train, y_test = train_test_split(X,Y,test_size=0.3,random_state= 17)
model = GaussianNB()
model.fit(X_train,y_train)
print(model)
y_expect = y_test
y_pred = model.predict(X_test)
print(accuracy_score(y_expect,y_pred))```

Why Score from LinearRegression is giving different result than r2_score from sklearn.metrics

Ideally I should get same result as score is nothing but R-Square. But not sure why results are coming different.
from sklearn.datasets import california_housing
data = california_housing.fetch_california_housing()
data.data.shape
data.feature_names
data.target_names
import pandas as pd
house_data = pd.DataFrame(data.data, columns=data.feature_names)
house_data.describe()
house_data['Price'] = data.target
X = house_data.iloc[:, 0:8].values
y = house_data.iloc[:, -1].values
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.33, random_state = 0)
# Fitting Simple Linear Regression to the Training set
from sklearn.linear_model import LinearRegression
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)
#Check R-square on training data
from sklearn.metrics import mean_squared_error, r2_score
y_pred = linear_model.predict(X_test)
print(linear_model.score(X_test, y_test))
print(r2_score(y_pred, y_test))
Output
0.5957643114594776
0.34460597952465033
from the docs: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.r2_score.html
sklearn.metrics.r2_score(y_true, y_pred,...)
You are passing y_true and y_pred the wrong way around. If you switch them you get the correct result.
print(linear_model.score(X_test, y_test))
print(r2_score(y_test, y_pred))
0.5957643114594777
0.5957643114594777

Standardized data of SVM - Scikit-learn/ Python

This is for an assignment where the SVM methods has to be used for model accuracy.
There were 3 parts, wrote the below code
import sklearn.datasets as datasets
import sklearn.model_selection as ms
from sklearn.model_selection import train_test_split
digits = datasets.load_digits();
X = digits.data
y = digits.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=30, stratify=y)
print(X_train.shape)
print(X_test.shape)
from sklearn.svm import SVC
svm_clf = SVC().fit(X_train, y_train)
print(svm_clf.score(X_test,y_test))
But after this, the question is as below
Perform Standardization of digits.data and store the transformed data
in variable digits_standardized.
Hint : Use required utility from sklearn.preprocessing. Once again,
split digits_standardized into two sets names X_train and X_test.
Also, split digits.target into two sets Y_train and Y_test.
Hint: Use train_test_split method from sklearn.model_selection; set
random_state to 30; and perform stratified sampling. Build another SVM
classifier from X_train set and Y_train labels, with default
parameters. Name the model as svm_clf2.
Evaluate the model accuracy on testing data set and print it's score.
On top of the above code, tried writing this, but seems to be failing. Can anyone help on how the data can be standardized.
std_scale = preprocessing.StandardScaler().fit(X_train)
X_train_std = std_scale.transform(X_train)
X_test_std = std_scale.transform(X_test)
svm_clf2 = SVC().fit(X_train, y_train)
print(svm_clf.score(X_test,y_test))
Tried the below. Seems to be working.
import sklearn.datasets as datasets
import sklearn.model_selection as ms
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
digits = datasets.load_digits();
X = digits.data
scaler = StandardScaler()
scaler.fit(X)
digits_standardized = scaler.transform(X)
y = digits.target
X_train, X_test, y_train, y_test = train_test_split(digits_standardized, y, random_state=30, stratify=y)
#print(X_train.shape)
#print(X_test.shape)
from sklearn.svm import SVC
svm_clf2 = SVC().fit(X_train, y_train)
print("Accuracy ",svm_clf2.score(X_test,y_test))
Try this as final code includes all Tasks
import sklearn.datasets as datasets
import sklearn.model_selection as ms
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
digits = datasets.load_digits()
X = digits.data
y = digits.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=30, stratify=y)
print(X_train.shape)
print(X_test.shape)
svm_clf = SVC().fit(X_train, y_train)
print(svm_clf.score(X_test,y_test))
scaler = StandardScaler()
scaler.fit(X)
digits_standardized = scaler.transform(X)
X_train, X_test, y_train, y_test = train_test_split(digits_standardized, y, random_state=30, stratify=y)
svm_clf2 = SVC().fit(X_train, y_train)
print(svm_clf2.score(X_test,y_test))

How to change from normal machine learning technique to cross validation?

from sklearn.svm import LinearSVC
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.metrics import accuracy_score
X = data['Review']
y = data['Category']
tfidf = TfidfVectorizer(ngram_range=(1,1))
classifier = LinearSVC()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3)
clf = Pipeline([
('tfidf', tfidf),
('clf', classifier)
])
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))
accuracy_score(y_test, y_pred)
This is the code to train a model and prediction. I need to know my model performance. so where should I change to become cross_val_score?
use this:(it is an example from my previous project)
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
kfolds = KFold(n_splits=5, shuffle=True, random_state=42)
def cv_f1(model, X, y):
score = np.mean(cross_val_score(model, X, y,
scoring="f1",
cv=kfolds))
return (score)
model = ....
score_f1 = cv_f1(model, X_train, y_train)
you can have multiple scoring. you should just change scoring="f1".
if you want to see score for each fold just remove np.mean
from sklearn documentation
The simplest way to use cross-validation is to call the cross_val_score helper function on the estimator and the dataset.
In your case it will be
from sklearn.model_selection import cross_val_score
scores = cross_val_score(clf, X_train, y_train, cv=5)
print(scores)

Cross validation and standaridization in skitlearn

I would like to find the accuracy of a sklearn classifier with K-cross validation. I can estimate the accuracy normally without cross-validation. However, how can I improve this code to do cross validation and apply a StandardScaler at the same time?
from sklearn.datasets import load_iris
from sklearn.cross_validation import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics
from sklearn.cross_validation import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn import svm
from sklearn.pipeline import Pipeline
iris = load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=4)
pipe_lrSVC = Pipeline([('scaler', StandardScaler()), ('clf', svm.LinearSVC())])
pipe_lrSVC.fit(X_train, y_train)
y_pred = pipe_lrSVC.predict(X_test)
print(metrics.accuracy_score(y_test, y_pred))
Simply use the pipeline as the estimator input to cross_val_score:
cross_val_score(pipe_lrSVC, iris.data, iris.target, cv=5)

Categories

Resources