Error while implementing Gaussian Regression. How to solve it? - python

I am running a Gaussian process regression in Python. My data set has shape (10000, 5). When I try to fit the model, I get this error:
AttributeError: 'list' object has no attribute 'n_dims'
How do I resolve this?
I initially thought the error was caused by my dependent variable having a different dimension from the independent variables. But even after changing them to the same dimension, I cannot find the problem with the code. Any help will be much appreciated.
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import (RBF, Matern, RationalQuadratic,
ExpSineSquared, DotProduct,
ConstantKernel)
data_set = pd.read_excel(r'XXXXX', sheet = 'Worksheet', header = 0)
data_set.head()
test_set = data_set
y = test_set.iloc[:,4]
test_set.drop(test_set.columns[4], axis = 1, inplace = True)
X = test_set
x=StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)
y_train = np.array(y_train)
y_test = np.array(y_test)
y_train = np.reshape(y_train, (7000,1))
y_test = np.reshape(y_test, (3000,1))
kernels = [1.0 * RBF(length_scale=1.0, length_scale_bounds=(1e-1, 10.0))]
gp = GaussianProcessRegressor(kernel=kernels)
gp.fit(X_train, y_train)
File "<ipython-input-23-5a576449fdb6>", line 1, in <module>
gp.fit(X_train, y_train)
File "C:\Program Files\Anaconda\lib\site-packages\sklearn\gaussian_process\gpr.py", line 203, in fit
if self.optimizer is not None and self.kernel_.n_dims > 0:
AttributeError: 'list' object has no attribute 'n_dims'

When initializing GaussianProcessRegressor(kernel=kernels), the argument passed as kernel has to be a single kernel object. You are passing a list, which has no n_dims attribute, hence the error.
See the GaussianProcessRegressor documentation for more information.
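A minimal sketch of the fix, assuming synthetic data in place of the Excel file (the array shapes, the alpha value, and the random seeds are illustrative assumptions, not taken from the question). The kernel expression is passed directly instead of being wrapped in a list:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Synthetic stand-in for the scaled features and the target (assumption).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 4))
y_train = X_train[:, 0] + 0.1 * rng.normal(size=50)

# Pass the kernel object itself, not a list containing it.
kernel = 1.0 * RBF(length_scale=1.0, length_scale_bounds=(1e-1, 10.0))
gp = GaussianProcessRegressor(kernel=kernel, alpha=1e-2, random_state=0)
gp.fit(X_train, y_train)

# Posterior mean and standard deviation for a few points.
y_mean, y_std = gp.predict(X_train[:5], return_std=True)
print(gp.kernel_)
```

If the list was meant to hold several candidate kernels, one option is to loop over it and fit one GaussianProcessRegressor per kernel.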

Related

ValueError: bad input shape (560, 5) sklearn

I am starting to write a machine learning model. I have a Y_train dataset containing the labels, which fall into 5 classes. The X_train dataset contains the samples. I am trying to build my model with logistic regression.
X_train (shape (560, 20531)) and Y_train (shape (560, 5)) have the same number of rows.
I have seen a few posts about the same problem, but I have not been able to solve it.
I don't know how to correct this error. Can you help me, please?
X = pd.read_csv('/Users/lottie/desktop/data.csv', header=None, skiprows=[0])
Y = pd.read_csv('/Users/lottie/desktop/labels.csv', header=None)
Y_encoded = list()
for i in Y.loc[0:,1] :
    if i == 'BRCA' : Y_encoded.append(0)
    if i == 'KIRC' : Y_encoded.append(1)
    if i == 'COAD' : Y_encoded.append(2)
    if i == 'LUAD' : Y_encoded.append(3)
    if i == 'PRAD' : Y_encoded.append(4)
Y_bis = to_categorical(Y_encoded)
#separation of the data
X_train, X_test, Y_train, Y_test = train_test_split(X, Y_bis, test_size=0.30, random_state=42)
regression_log = linear_model.LogisticRegression(multi_class='multinomial', solver='newton-cg')
X_train=X_train.iloc[:,1:]
#train model
train_train = regression_log.fit(X_train, Y_train)
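You get that error because to_categorical one-hot encodes the labels into a (560, 5) array, while LogisticRegression expects a 1-D array of class labels. Use a label encoder to map the class names to 0, 1, 2, ...; see the LabelEncoder page in the scikit-learn documentation. Below is an implementation using an example dataset similar to yours: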
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import linear_model
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
Y = pd.DataFrame({'label':np.random.choice(['BRCA','KIRC','COAD','LUAD','PRAD'],560)})
X = pd.DataFrame(np.random.normal(0,1,(560,5)))
Y_encoded = le.fit_transform(Y['label'])
X_train, X_test, Y_train, Y_test = train_test_split(X, Y_encoded, test_size=0.30, random_state=42)
regression_log = linear_model.LogisticRegression(multi_class='multinomial', solver='newton-cg')
X_train=X_train.iloc[:,1:]
train_train = regression_log.fit(X_train, Y_train)
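
If the labels have already been one-hot encoded, another option is to collapse them back to a 1-D vector of class indices. The sketch below is only an illustration under assumptions: the data are random, and np.eye stands in for the one-hot output so that the example does not depend on Keras.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
class_ids = rng.integers(0, 5, size=560)   # stand-in for the encoded labels
Y_onehot = np.eye(5)[class_ids]            # shape (560, 5), the shape that is rejected
Y_1d = np.argmax(Y_onehot, axis=1)         # shape (560,), accepted by fit

X = rng.normal(size=(560, 5))              # random features (assumption)
model = LogisticRegression(max_iter=1000).fit(X, Y_1d)
print(Y_onehot.shape, Y_1d.shape)
```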

TypeError: Cannot clone object '<>' (type <class ''>): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' methods

I want to use VotingClassifier or EnsembleVoteClassifier with 3 different models, but I get this error.
I need your help to solve this problem!
import numpy as np
import matplotlib.pyplot as plt
from mlxtend.classifier import EnsembleVoteClassifier
from mlxtend.plotting import plot_decision_regions
# Initializing Classifiers
clf1 = modelvgg16
clf2 = AlexNetModel
clf3 = InceptionV3Model
for model in [clf1, clf2,clf3]:
    model._estimator_type = "classifier"
    #print(model._estimator_type)
eclf = EnsembleVoteClassifier(clfs=[clf1, clf2,clf3],weights=[2, 1, 1], voting='soft')
X, Y = training_set.next()
Y=np.zeros(X.shape[0]) # number of classes is 38
print("X.shape =",X.shape) # X.shape = (128, 224, 224, 3)
print("Y.shape =",Y.shape) # Y.shape = (38,)
######################### Split train+test #######################################
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size = 0.20,random_state=2)
# Fit the ensemble defined above and evaluate it
eclf.fit(x_train, y_train)
y_pred = eclf.predict(x_test)
from sklearn.metrics import accuracy_score
print("accuracy : ",accuracy_score(y_test,y_pred))
For more information, see my project.
I got the same error when running this code:

TypeError: __init__() got an unexpected keyword argument 'n_iter'. Is this a problem with the way I installed scikit-learn?

from sklearn import datasets
import numpy as np
# Assigning the petal length and petal width of the 150 flower samples to Matrix X
# Class labels of the flower to vector y
iris = datasets.load_iris()
X = iris.data[:, [2, 3]]
y = iris.target
print('Class labels:', np.unique(y))
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)
print('Labels counts in y:', np.bincount(y))
print('Labels counts in y_train:', np.bincount(y_train))
print ('Labels counts in y_test:', np.bincount(y_test))
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
sc.fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)
from sklearn.linear_model import Perceptron
ppn = Perceptron(n_iter=40, eta0=0.1, random_state=1)
ppn.fit(X_train_std, y_train)
y_pred = ppn.predict(X_test_std)
print('Misclassified samples: %d' % (y_test != y_pred).sum())
When I run I get this error message:
Traceback (most recent call last):
File "c:/Users/Desfios 5/Desktop/Python/Ch3.py", line 27, in <module>
ppn = Perceptron(n_iter=40, eta0=0.1, random_state=1)
File "C:\Users\Desfios 5\AppData\Roaming\Python\Python38\site-packages\sklearn\utils\validation.py", line 72, in inner_f
return f(**kwargs)
TypeError: __init__() got an unexpected keyword argument 'n_iter'
I've tried uninstalling and installing scikit-learn but that did not help. Any help?
I just changed n_iter to max_iter and it worked for me:
ppn = Perceptron(max_iter=40, eta0=0.3, random_state=0)
You receive this error
TypeError: __init__() got an unexpected keyword argument 'n_iter'
because the Perceptron in your installed version has no constructor parameter named n_iter. Older releases accepted n_iter, but it was deprecated and later removed in favor of max_iter.
Do not confuse it with the n_iter_ attribute, which is an estimated attribute (you can tell by the underscore at the end) and is only stored after the fit method has been called. See the Perceptron reference in the documentation.
The only constructor parameter with a similar name is n_iter_no_change, which controls early stopping rather than the number of passes over the data.
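The difference between the constructor parameter and the fitted attribute can be sketched as follows, reusing the iris setup from the question (a minimal illustration, assuming a scikit-learn release in which n_iter has already been removed):

```python
from sklearn import datasets
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = iris.data[:, [2, 3]]
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

sc = StandardScaler().fit(X_train)

# max_iter is the constructor parameter that replaces n_iter.
ppn = Perceptron(max_iter=40, eta0=0.1, random_state=1)
fitted_before = hasattr(ppn, "n_iter_")   # False: nothing has been fitted yet
ppn.fit(sc.transform(X_train), y_train)

# n_iter_ exists only after fit and reports the iterations actually run.
print("iterations run:", ppn.n_iter_)
```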

AttributeError: 'DecisionTreeRegressor' object has no attribute 'r2_score'

I'm testing machine learning methods on a csv file with kickstarter project data. But even though I can get "accuracy score", I get the following error when I try to get "r2 score". What would be the reason?
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import accuracy_score
from sklearn.metrics import r2_score
veri = pd.read_csv("kick_rev.csv")
veri = veri.drop(['id'], axis=1)
veri = veri.drop(['i'], axis=1)
y = np.array(veri['state_num'])
x = np.array(veri.drop(['state_num','usd_goal_real','deadline','launched','country'], axis=1))
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.33)
DTR = DecisionTreeRegressor()
DTR.fit(X_train,y_train)
ytahmin = DTR.predict(x)
DTR.fit(veri[['goal','pledged','backers','usd_pledged','usd_pledged_real','category_num','category_main_num','currency_num','country_num']],veri.state_num)
accuracy_score = DTR.score(X_test,y_test)
a = np.array([5000,94175.0,1,57763.8,6469.73,13,6,0,0]).reshape(1, -1)
predict_DTR = DTR.predict(a)
r2 = DTR.r2_score(X_test, y_test)
print(accuracy_score)
print(r2)
Error:
AttributeError: 'DecisionTreeRegressor' object has no attribute 'r2_score'
r2_score is a function in sklearn.metrics, not a method of the estimator, and it compares actual and predicted values. So you can't pass it the features; pass the true targets and the predictions instead:
r2_score(y_true, y_pred)
You can use this link for more clarification:
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.r2_score.html
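A minimal sketch of the corrected call, assuming synthetic regression data in place of the kickstarter CSV (make_regression and the seeds are illustrative assumptions). For regressors, the estimator's own score method also returns R2, so the two values should agree:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the kickstarter features and target (assumption).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0)

dtr = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
y_pred = dtr.predict(X_test)

# True values first, predictions second; the metric is not symmetric.
r2 = r2_score(y_test, y_pred)
print(r2, dtr.score(X_test, y_test))
```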

python, sklearn: 'dict' object is not callable using GridSearchCV and SVC

I'm trying to use GridSearchCV to optimize the parameters for the classifier svm.SVC (both from sklearn).
from sklearn.grid_search import GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix
import numpy as np
X_train = np.array([[1,2],[3,4],[5,6],[2,3],[9,4],[4,5],[2,7],[1,0],[4,7],[2,9]])
Y_train = np.array([0,1,0,1,0,0,1,1,0,1])
X_test = np.array([[2,4],[5,3],[7,1],[2,4],[6,4],[2,7],[9,2],[7,5],[1,6],[0,3]])
Y_test = np.array([1,0,0,0,1,0,1,1,0,0])
parameters = {'kernel':['rbf'],'C':np.linspace(10,100,10)}
clf1 = GridSearchCV(SVC(), parameters, verbose = 10)
clf1.fit(X_train, Y_train)
cm = confusion_matrix(Y_test, clf1.predict(X_test))
bp = clf1.best_params_
The output shows it completing GridSearchCV, but then it throws the error:
Traceback (most recent call last):
File "<ipython console>", line 1, in <module>
File "C:\Python27\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 479, in runfile
execfile(filename, namespace)
File "I:\setup\Desktop\Stats\FinalProject.py", line 112, in <module>
clf1 = GridSearchCV(SVC(), parameters, verbose = 10)
TypeError: 'dict' object is not callable
When I am running the code you posted:
from sklearn.grid_search import GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix
import numpy as np
X_train = np.array([[1,2],[3,4],[5,6]])
Y_train = np.array([0,1,0])
X_test = np.array([[2,4],[5,3],[7,1]])
Y_test = np.array([1,0,0])
parameters = {'kernel':['rbf'],'C':np.linspace(10,100,10)}
clf1 = GridSearchCV(SVC(), parameters, verbose = 10)
clf1.fit(X_train, Y_train)
cm = confusion_matrix(Y_test, clf1.predict(X_test))
bp = clf1.best_params_
I'm getting this error:
File "C:\Anaconda\lib\site-packages\sklearn\svm\base.py", line 447, in _validate_targets
% len(cls))
ValueError: The number of classes has to be greater than one; got 1
The training data consist of only 3 samples, and GridSearchCV splits them into 3 folds (by the way, you can control this with the cv parameter),
e.g.:
fold1 = [1,2] , label1 = 0
fold2 = [3,4] , label2 = 1
fold3 = [5,6] , label3 = 0
Now, in one of the iterations, it trains on the first and third folds and uses the second fold for validation.
Note that these training folds contain only one label (the label 0), hence the error.
If I create the data in this manner:
X, Y = datasets.make_classification(n_samples=1000, n_features=4,
n_informative=2, n_redundant=2, n_classes=2)
X_train, X_test, Y_train, Y_test = sklearn.cross_validation.train_test_split(X,Y,
test_size =0.2)
It runs just fine.
I guess you have some other problem, but regarding the code you entered - this is the error it has.
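For readers on a recent scikit-learn release, where GridSearchCV and train_test_split are imported from sklearn.model_selection, the working setup above might look like the sketch below. The random_state values and the explicit cv=3 are assumptions added for reproducibility, not part of the original answer.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Enough samples that every training fold contains both classes.
X, Y = make_classification(n_samples=1000, n_features=4, n_informative=2,
                           n_redundant=2, n_classes=2, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0)

parameters = {'kernel': ['rbf'], 'C': np.linspace(10, 100, 10)}
clf1 = GridSearchCV(SVC(), parameters, cv=3)   # cv sets the number of folds
clf1.fit(X_train, Y_train)

cm = confusion_matrix(Y_test, clf1.predict(X_test))
bp = clf1.best_params_
print(bp)
print(cm)
```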
