Object has no attribute in scikit-learn, how can I access it? - python

I would like to try different parameters of scikit-learn's SVC classifier with cross-validation, so I tried the following with the SVC algorithm:
from sklearn import svm
print('Support vector machine(SVM): {:.2f}'.format(metrics.accuracy_score(
y, stratified_cv(X, y, svm.SVC(kernel='linear')))))
But it seems I cannot access the object:
AttributeError Traceback (most recent call last)
<ipython-input-16-dacd8d429376> in <module>()
5
6 print('Support vector machine(SVM): {:.2f}'.format(metrics.accuracy_score(
----> 7 y, stratified_cv(X, y, svm.SVC(kernel='linear')))))
8
AttributeError: 'SVC' object has no attribute 'SVC'
Interestingly, when I try this:
print('Support vector machine(SVM): {:.2f}'.format(metrics.accuracy_score(
y, stratified_cv(X, y, svm.SVC))))
I get:
Support vector machine(SVM): 0.46
What could be happening? Given the cross-validation strategy above, how can I set up my own SVM configuration? Thanks in advance!

You need functools.partial from the Python standard library. Your function expects something it can call as clf_class(**kwargs), so passing an already-constructed object (obtained through clf = SVC(kernel='linear')) won't work, because the function then tries to do
SVC(kernel='linear')(**kwargs) # error!
when what you want to call is
SVC(kernel='linear', **kwargs)
So declare a partial function:
from functools import partial
linear_svm = partial(svm.SVC, kernel='linear')
and now you can call
linear_svm(**kwargs)
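To see the partial in context, here is a minimal sketch. The question does not show stratified_cv, so the version below is a hypothetical stand-in that assumes it builds the classifier with clf_class(**kwargs) inside each fold; the synthetic data from make_classification is also only for illustration.

```python
from functools import partial

import numpy as np
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold


def stratified_cv(X, y, clf_class, n_splits=5, **kwargs):
    """Hypothetical stand-in for the question's helper: out-of-fold predictions."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    y_pred = np.empty_like(y)
    for train_idx, test_idx in skf.split(X, y):
        # clf_class must be callable here, which is why a class or a
        # partial works but an already-built SVC instance does not.
        clf = clf_class(**kwargs)
        clf.fit(X[train_idx], y[train_idx])
        y_pred[test_idx] = clf.predict(X[test_idx])
    return y_pred


# Synthetic data, only for illustration.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# A callable that behaves like svm.SVC with kernel='linear' pre-filled.
linear_svm = partial(svm.SVC, kernel="linear")

y_pred = stratified_cv(X, y, linear_svm)
print("Support vector machine (SVM): {:.2f}".format(accuracy_score(y, y_pred)))
```

Extra keyword arguments still pass through, so linear_svm(C=2.0) builds an SVC with both kernel='linear' and C=2.0.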

Related

Trying to do SVR for Multi-outputs

Since SVR supports only a single output, I am trying to employ SVR on my model which has 6 inputs and 19 outputs using MultiOutputRegressor.
I am starting with hyper-parameter tuning. However, I am getting the error below. How can I modify my code to support MultiOutputRegressor?
from sklearn.svm import SVR
from sklearn.multioutput import MultiOutputRegressor
from sklearn.model_selection import RandomizedSearchCV
svr = SVR()
svr_regr = MultiOutputRegressor(svr)
from sklearn.model_selection import KFold
kfold_splitter = KFold(n_splits=6, random_state = 0,shuffle=True)
# On each iteration, the search samples a different combination of the hyper-parameters.
svr_random = RandomizedSearchCV(svr_regr,
param_distributions = {'kernel': ('linear','poly','rbf','sigmoid'),
'C': [1,1.5,2,2.5,3,3.5,4,4.5,5,5.5,6,6.5,7,7.5,8,8.5,9,9.5,10],
'degree': [3,8],
'coef0': [0.01,0.1,0.5],
'gamma': ('auto','scale'),
'tol': [1e-3, 1e-4, 1e-5, 1e-6]},
n_iter=100,
cv=kfold_splitter,
n_jobs=-1,
random_state=42,
scoring='r2')
svr_random.fit(X_train, y_train)
print(svr_random.best_params_)
Error:
ValueError: Invalid parameter kernel for estimator MultiOutputRegressor(estimator=SVR()). Check the list of available parameters with `estimator.get_params().keys()`.
After getting the optimum parameters:
SVR_model = svr_regr (kernel='rbf',C=10,
coef0=0.01,degree=3,
gamma='auto',tol=1e-6,random_state=42)
SVR_model.fit(X_train, y_train)
SVR_model_y_predict = SVR_model.predict((X_test))
SVR_model_y_predict
Error after getting the optimum parameters:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/var/folders/mm/r4gnnwl948zclfyx12w803040000gn/T/ipykernel_96269/769104914.py in <module>
----> 1 SVR_model = svr_regr (estimator__kernel='rbf',estimator__C=10,
2 estimator__coef0=0.01,estimator__degree=3,
3 estimator__gamma='auto',estimator__tol=1e-6,random_state=42)
4
5
TypeError: 'MultiOutputRegressor' object is not callable
I tried to reproduce a simple example of MultiOutputRegressor without a hyper-parameter search (i.e. just the fit and predict methods), which worked fine. The error message:
Check the list of available parameters with estimator.get_params().keys()
suggests that the parameters you are optimising in RandomizedSearchCV, i.e. through param_distributions, don't match the parameters accepted by MultiOutputRegressor. Looking at the API reference, MultiOutputRegressor itself takes only a few parameters (estimator and n_jobs); the ones you are trying to tune, e.g. C and tol, belong to the wrapped SVR estimator.
You can pass parameters through to SVR as nested parameters, the same way it's done in a pipeline: prefix each name with estimator__, e.g. estimator__C and estimator__kernel. The second error is a separate problem: svr_regr is an already-constructed object, not a class, so it cannot be called with keyword arguments.
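A minimal sketch of the nested-parameter approach follows. The data is synthetic (make_regression with 6 inputs and 3 outputs rather than the asker's 19), and the grid is deliberately tiny so it runs quickly; both are assumptions for illustration only. It also shows two ways to get a fitted model afterwards without calling the MultiOutputRegressor object. Note that SVR has no random_state parameter, so it is left out.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold, RandomizedSearchCV
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

# Synthetic multi-output data, only for illustration.
X, y = make_regression(n_samples=60, n_features=6, n_targets=3,
                       noise=0.1, random_state=0)

svr_regr = MultiOutputRegressor(SVR())

# Parameters of the wrapped SVR are addressed with the 'estimator__' prefix.
param_distributions = {
    "estimator__kernel": ["linear", "rbf"],
    "estimator__C": [1, 5, 10],
    "estimator__gamma": ["auto", "scale"],
}

search = RandomizedSearchCV(
    svr_regr,
    param_distributions=param_distributions,
    n_iter=5,
    cv=KFold(n_splits=3, shuffle=True, random_state=0),
    random_state=42,
    scoring="r2",
)
search.fit(X, y)
print(search.best_params_)

# Option 1: the search already refit the best configuration on all data.
best_model = search.best_estimator_
y_pred = best_model.predict(X)

# Option 2: build a fresh model by constructing the wrapper around a
# configured SVR, instead of calling the existing svr_regr object.
manual_model = MultiOutputRegressor(SVR(kernel="rbf", C=10, gamma="auto"))
manual_model.fit(X, y)
```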

'numpy.float64' object is not callable - hyperparameter tuning

I'm trying to do hyperparameter tuning, but every time I run this code:
from sklearn.model_selection import GridSearchCV
param_grid = {'C':[0,1,1,100,1000], 'kernel':['rbf','poly','sigmoid','linear'],'degree':[1,2,3,4,5,6]}
grid =GridSearchCV(svc.sc(),param_grid)
grid.fit(X_train,y_train)
I get this error
TypeError Traceback (most recent call last)
<ipython-input-64-74de9eeb3cae> in <module>
3
4 param_grid = {'C':[0,1,1,100,1000], 'kernel':['rbf','poly','sigmoid','linear'],'degree':[1,2,3,4,5,6]}
----> 5 grid =GridSearchCV(svc.sc(),param_grid)
6 grid.fit(X_train,y_train)
TypeError: 'numpy.float64' object is not callable
Any idea what to do? Also, svc.sc is how I defined the model.
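What is svc.sc()? The traceback says svc.sc is a numpy.float64, i.e. a number rather than a function, so calling it with parentheses raises the TypeError. Either way, you're probably not meant to call it at that point; dropping the parentheses avoids the call:
grid = GridSearchCV(svc.sc, param_grid)
Keep in mind that GridSearchCV expects an estimator object as its first argument, so this only helps if svc.sc really is your model. If it is a number (for example a stored score), pass the classifier itself instead.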
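For comparison, a minimal sketch of GridSearchCV with an estimator instance as its first argument. It assumes the model is a plain SVC and uses the iris data purely as a placeholder, since the question does not show how svc was defined. The grid here also leaves out C=0, because SVC requires C to be strictly positive.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Placeholder data; substitute your own X_train, y_train.
X_train, y_train = load_iris(return_X_y=True)

# C must be strictly positive for SVC, so 0 is not included.
param_grid = {"C": [0.1, 1, 10], "kernel": ["rbf", "linear"]}

# The first argument is an (unfitted) estimator object.
grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X_train, y_train)

print(grid.best_params_, grid.best_score_)
```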

How to solve a TypeError using LeaveOneOut

I have been trying to work through the VanderPlas book and I have been stuck on this cell for days now:
from sklearn.model_selection import cross_val_score
cross_val_score(model, X, y, cv=5)
from sklearn.model_selection import LeaveOneOut
scores = cross_val_score(model, X, y, cv=LeaveOneOut(len(X)))
scores
TypeError Traceback (most recent call last)
<ipython-input-78-029fa0c72898> in <module>
1 from sklearn.model_selection import LeaveOneOut
----> 2 scores = cross_val_score(model, X, y, cv=LeaveOneOut(len(X)))
3 scores
TypeError: LeaveOneOut() takes no arguments
import sklearn
sklearn.__version__
'0.22.1'
Thanks in advance for any help!
Cameron
Welcome to Stack Overflow!
The error says LeaveOneOut() takes no arguments, but when you instantiated LeaveOneOut you passed it len(X) as an argument (in LeaveOneOut(len(X))).
If you change your scores line to the line below it should work:
scores = cross_val_score(model, X, y, cv=LeaveOneOut())
However, note this warning from the scikit-learn documentation:
Note: LeaveOneOut() is equivalent to KFold(n_splits=n)...Due to the
high number of test sets (which is the same as the number of samples)
this cross-validation method can be very costly. For large datasets
one should favor KFold, ShuffleSplit or StratifiedKFold.
In case that's not clear, that's a suggestion to use e.g. KFold with n_splits=5, which will usually serve you better than LeaveOneOut.
Alternatively, you can pass the generated splits directly:
from sklearn.model_selection import cross_val_score
cross_val_score(model, X, y, cv=5)
from sklearn.model_selection import LeaveOneOut
scores = cross_val_score(model, X, y, cv=LeaveOneOut().split(X))
scores
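A small sketch comparing the two strategies side by side. The model (GaussianNB) and the iris data are assumptions chosen for illustration, since the question does not show how model, X and y were defined.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
model = GaussianNB()

# LeaveOneOut takes no arguments; it yields one split per sample.
loo_scores = cross_val_score(model, X, y, cv=LeaveOneOut())

# KFold with 5 splits fits the model only 5 times.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
kfold_scores = cross_val_score(model, X, y, cv=kfold)

print(len(loo_scores), loo_scores.mean())
print(len(kfold_scores), kfold_scores.mean())
```

Each leave-one-out score is either 0 or 1 because every test set holds a single sample, so only the mean is informative.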

RANSAC algorithm using scikit-learn's RANSACRegressor

I tried to use the code below for fitting a robust regression model using RANSAC
from sklearn.linear_model import RANSACRegressor
ransac = RANSACRegressor(LinearRegression(),
max_trials=100,
min_samples=50,
residual_metric=lambda x: np.sum(np.abs(x), axis=1),
residual_threshold=5.0,
random_state=0)
ransac.fit(X,y)
And I get the following error below:
TypeError Traceback (most recent call last)
<ipython-input-38-832d8b5d351b> in <module>
5 residual_metric=lambda x: np.sum(np.abs(x), axis=1),
6 residual_threshold=5.0,
----> 7 random_state=0)
8 ransac.fit(X,y)
TypeError: __init__() got an unexpected keyword argument 'residual_metric'
Can you help me know what's wrong?
Most likely you got this code from a source that used an old version of scikit-learn. The residual_metric parameter was deprecated and has since been removed, which is why the constructor rejects it. If you run without it, it works:
from sklearn.linear_model import RANSACRegressor, LinearRegression
ransac = RANSACRegressor(LinearRegression(),
max_trials=100,
min_samples=50,
residual_threshold=5.0,
random_state=0)
ransac
RANSACRegressor(base_estimator=LinearRegression(), min_samples=50,
random_state=0, residual_threshold=5.0)
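If the custom residual is still wanted, the loss parameter of RANSACRegressor accepts a callable taking the true and predicted values. The sketch below assumes a 1-D target, so an element-wise absolute difference stands in for the original lambda; the data is synthetic and the helper name absolute_residual is made up for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor

# Synthetic line y = 3x + 2 with noise and a few gross outliers.
rng = np.random.RandomState(0)
X = np.linspace(0, 10, 200).reshape(-1, 1)
y = 3.0 * X.ravel() + 2.0 + rng.normal(scale=1.0, size=200)
y[::10] += 50.0  # every 10th point becomes an outlier


def absolute_residual(y_true, y_pred):
    """Per-sample absolute error; returns a 1-D array (hypothetical helper)."""
    return np.abs(y_true - y_pred)


ransac = RANSACRegressor(LinearRegression(),
                         max_trials=100,
                         min_samples=50,
                         loss=absolute_residual,
                         residual_threshold=5.0,
                         random_state=0)
ransac.fit(X, y)

# The final model is refit on the samples flagged as inliers.
slope = ransac.estimator_.coef_[0]
print(slope, ransac.inlier_mask_.sum())
```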

TypeError: __init__() got multiple values for argument 'n_splits'

I'm using scikit-learn version 0.20.2 with the following code:
from sklearn.model_selection import StratifiedKFold
grid = GridSearchCV(
pipeline, # pipeline from above
params, # parameters to tune via cross validation
refit=True, # fit using all available data at the end, on the best found param combination
scoring='accuracy', # what score are we optimizing?
cv=StratifiedKFold(label_train, n_splits=5), # what type of cross validation to use
)
But I don't understand why I get this error:
TypeError Traceback (most recent call last)
<ipython-input-26-03a56044cb82> in <module>()
10 refit=True, # fit using all available data at the end, on the best found param combination
11 scoring='accuracy', # what score are we optimizing?
---> 12 cv=StratifiedKFold(label_train, n_splits=5), # what type of cross validation to use
13 )
TypeError: __init__() got multiple values for argument 'n_splits'
I already tried n_fold but got the same error. I also tried updating scikit-learn and conda. Any idea how to fix this? Thanks a lot!
StratifiedKFold takes up to three arguments when initialized, none of which is the training data:
StratifiedKFold(n_splits='warn', shuffle=False, random_state=None)
So when you call StratifiedKFold(label_train, n_splits=5), label_train is taken as the positional n_splits and the keyword then supplies n_splits a second time.
Instead, create the object, then use the methods as described in the example on the sklearn docs page for using the object to split your data:
get_n_splits([X, y, groups]): Returns the number of splitting iterations in the cross-validator.
split(X, y[, groups]): Generate indices to split data into training and test set.
StratifiedKFold's constructor takes n_splits, shuffle and random_state; the labels are not one of them. See the sklearn documentation for more.
Create StratifiedKFold object and pass it to GridSearchCV as below.
skf = StratifiedKFold(n_splits=5)
skf.get_n_splits(X_train, Y_train)
grid = GridSearchCV(
pipeline, # pipeline from above
params, # parameters to tune via cross validation
refit=True, # fit using all available data at the end, on the best found param combination
scoring='accuracy', # what score are we optimizing?
cv=skf, # what type of cross validation to use
)
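A self-contained sketch of the same setup. The pipeline, the parameter grid and the iris data are placeholders, since the question's pipeline and params are not shown. The labels reach StratifiedKFold only through grid.fit, which forwards X and y to the splitter.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data and pipeline, only for illustration.
X_train, y_train = load_iris(return_X_y=True)
pipeline = Pipeline([("scale", StandardScaler()), ("clf", SVC())])
params = {"clf__C": [0.1, 1, 10]}

# Configure the splitter only; no data is passed to the constructor.
skf = StratifiedKFold(n_splits=5)

grid = GridSearchCV(
    pipeline,
    params,
    refit=True,
    scoring="accuracy",
    cv=skf,
)
# The labels are given here; GridSearchCV hands them to skf.split internally.
grid.fit(X_train, y_train)

print(grid.best_params_, grid.best_score_)
```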
