Error while doing SVR for multiple outputs - python

Trying to do SVR for multiple outputs. Started by hyper-parameter tuning which worked for me. Now I want to create the model using the optimum parameters but I am getting an error. How to fix this?
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputRegressor
svr = SVR()
svr_regr = MultiOutputRegressor(svr)
from sklearn.model_selection import KFold
kfold_splitter = KFold(n_splits=6, random_state = 0,shuffle=True)
svr_gs = GridSearchCV(svr_regr,
param_grid = {'estimator__kernel': ('linear','poly','rbf','sigmoid'),
'estimator__C': [1,1.5,2,2.5,3,3.5,4,4.5,5,5.5,6,6.5,7,7.5,8,8.5,9,9.5,10],
'estimator__degree': [3,8],
'estimator__coef0': [0.01,0.1,0.5],
'estimator__gamma': ('auto','scale'),
'estimator__tol': [1e-3, 1e-4, 1e-5, 1e-6]},
cv=kfold_splitter,
n_jobs=-1,
scoring='r2')
svr_gs.fit(X_train, y_train)
print(svr_gs.best_params_)
#print(gs.best_score_)
Output:
{'estimator__C': 10, 'estimator__coef0': 0.01, 'estimator__degree': 3, 'estimator__gamma': 'auto', 'estimator__kernel': 'rbf', 'estimator__tol': 1e-06}
Trying to create a model using the output:
SVR_model = svr_regr (kernel='rbf',C=10,
coef0=0.01,degree=3,
gamma='auto',tol=1e-6,random_state=42)
SVR_model.fit(X_train, y_train)
SVR_model_y_predict = SVR_model.predict((X_test))
SVR_model_y_predict
Error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/var/folders/mm/r4gnnwl948zclfyx12w803040000gn/T/ipykernel_96269/769104914.py in <module>
----> 1 SVR_model = svr_regr (estimator__kernel='rbf',estimator__C=10,
2 estimator__coef0=0.01,estimator__degree=3,
3 estimator__gamma='auto',estimator__tol=1e-6,random_state=42)
4
5
TypeError: 'MultiOutputRegressor' object is not callable

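Please consult the MultiOutputRegressor docs.
svr_regr is already the model: an instance of MultiOutputRegressor, not a class or a function. It cannot be called like svr_regr(...), which is exactly what the TypeError says, but it does offer methods that you can call, such as .fit(), .predict(), and .score().
You are trying to specify kernel and a few other parameters. Those belong to SVR(), at the top of your code, so pass them there (or set them on the wrapper with the estimator__ prefix that your grid search already uses). Also note that SVR does not accept a random_state argument.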
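A minimal sketch of two ways to build the final model from the tuned values. The synthetic data from make_regression and the names model_a, model_b and best_params are assumptions made for illustration; the parameter values are the ones printed in the question.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

# Synthetic stand-in for the asker's data (6 inputs, 3 outputs here).
X, y = make_regression(n_samples=120, n_features=6, n_targets=3,
                       noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = X[:100], X[100:], y[:100], y[100:]

# The dictionary printed by svr_gs.best_params_ in the question.
best_params = {'estimator__C': 10, 'estimator__coef0': 0.01,
               'estimator__degree': 3, 'estimator__gamma': 'auto',
               'estimator__kernel': 'rbf', 'estimator__tol': 1e-06}

# Option 1: give the tuned values to SVR itself, then wrap it.
model_a = MultiOutputRegressor(
    SVR(kernel='rbf', C=10, coef0=0.01, degree=3, gamma='auto', tol=1e-6))
model_a.fit(X_train, y_train)

# Option 2: keep the estimator__ prefix and set the values on the wrapper.
model_b = MultiOutputRegressor(SVR()).set_params(**best_params)
model_b.fit(X_train, y_train)

pred_a = model_a.predict(X_test)
pred_b = model_b.predict(X_test)
print(pred_a.shape)
```

If the search was run with the default refit=True, svr_gs.best_estimator_ is already such a fitted model, so rebuilding it by hand is optional.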

Related

Trying to do SVR for Multi-outputs

Since SVR supports only a single output, I am trying to employ SVR on my model which has 6 inputs and 19 outputs using MultiOutputRegressor.
I am starting with hyper-parameter tuning. However, I am getting the error below. How can I modify my code to support MultiOutputRegressor?
from sklearn.svm import SVR
from sklearn.model_selection import RandomizedSearchCV
svr = SVR()
svr_regr = MultiOutputRegressor(svr)
from sklearn.model_selection import KFold
kfold_splitter = KFold(n_splits=6, random_state = 0,shuffle=True)
#On each iteration, the algorithm will choose a different combination of the parameters.
svr_random = RandomizedSearchCV(svr_regr,
param_distributions = {'kernel': ('linear','poly','rbf','sigmoid'),
'C': [1,1.5,2,2.5,3,3.5,4,4.5,5,5.5,6,6.5,7,7.5,8,8.5,9,9.5,10],
'degree': [3,8],
'coef0': [0.01,0.1,0.5],
'gamma': ('auto','scale'),
'tol': [1e-3, 1e-4, 1e-5, 1e-6]},
n_iter=100,
cv=kfold_splitter,
n_jobs=-1,
random_state=42,
scoring='r2')
svr_random.fit(X_train, y_train)
print(svr_random.best_params_)
Error:
ValueError: Invalid parameter kernel for estimator MultiOutputRegressor(estimator=SVR()). Check the list of available parameters with `estimator.get_params().keys()`.
After getting the optimum parameters:
SVR_model = svr_regr (kernel='rbf',C=10,
coef0=0.01,degree=3,
gamma='auto',tol=1e-6,random_state=42)
SVR_model.fit(X_train, y_train)
SVR_model_y_predict = SVR_model.predict((X_test))
SVR_model_y_predict
Error after getting the optimum parameters:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/var/folders/mm/r4gnnwl948zclfyx12w803040000gn/T/ipykernel_96269/769104914.py in <module>
----> 1 SVR_model = svr_regr (estimator__kernel='rbf',estimator__C=10,
2 estimator__coef0=0.01,estimator__degree=3,
3 estimator__gamma='auto',estimator__tol=1e-6,random_state=42)
4
5
TypeError: 'MultiOutputRegressor' object is not callable
I tried to reproduce a simple example of MultiOutputRegressor without any hyper-parameter search (i.e. just the fit and predict methods), which seemed to work fine. The error message:
Check the list of available parameters with estimator.get_params().keys()
suggests that the parameters you are optimising in RandomizedSearchCV, i.e. through param_distributions, don't match the parameters accepted by MultiOutputRegressor. Looking at the API reference, MultiOutputRegressor itself takes only a few parameters; the ones you are trying to pass, e.g. C and tol, belong to the wrapped SVR estimator.
You can pass parameters through to SVR as nested parameters, the same way it is done in a pipeline: prefix each name with estimator__, e.g. estimator__C and estimator__tol.
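A small sketch of the nested-parameter approach, assuming synthetic data from make_regression and a deliberately reduced search space so that it runs quickly. The variable names param_names and search are illustrative only.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold, RandomizedSearchCV
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

# Synthetic stand-in data: 6 inputs, 3 outputs.
X, y = make_regression(n_samples=90, n_features=6, n_targets=3,
                       noise=0.1, random_state=0)

svr_regr = MultiOutputRegressor(SVR())

# The names the search is allowed to tune; SVR's own parameters
# appear with the "estimator__" prefix.
param_names = sorted(svr_regr.get_params().keys())
print(param_names)

search = RandomizedSearchCV(
    svr_regr,
    param_distributions={
        'estimator__kernel': ['linear', 'rbf'],
        'estimator__C': [1, 5, 10],
        'estimator__gamma': ['auto', 'scale'],
    },
    n_iter=5,
    cv=KFold(n_splits=3, shuffle=True, random_state=0),
    random_state=42,
    scoring='r2',
)
search.fit(X, y)
print(search.best_params_)
```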

"Not enough values to unpack" in sklearn.fit

Here's the piece of the code:
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegressionCV
skf = StratifiedKFold(n_splits=5)
skf_1 = skf.split(titanic_dataset, surv_titanic)
ls_1 = np.logspace(-1.0, 2.0, num=500)
clf = LogisticRegressionCV(Cs=ls_1, cv = skf_1, scoring = "roc_auc", n_jobs=-1, random_state=17)
clf_model = clf.fit(x_train, y_train)
This says:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-130-b99a5912ff5a> in <module>
----> 1 clf_model = clf.fit(x_train, y_train)
H:\Anaconda_3\lib\site-packages\sklearn\linear_model\_logistic.py in fit(self, X, y, sample_weight)
2098 # (n_classes, n_folds, n_Cs . n_l1_ratios) or
2099 # (1, n_folds, n_Cs . n_l1_ratios)
-> 2100 coefs_paths, Cs, scores, n_iter_ = zip(*fold_coefs_)
2101 self.Cs_ = Cs[0]
2102 if multi_class == 'multinomial':
ValueError: not enough values to unpack (expected 4, got 0)
The train and test datasets had been prepared before, and they behave nicely with other classifiers.
Such a generic error message tells me nothing. What is the problem here?
In short, the issue was that you passed the result of skf.split(titanic_dataset, surv_titanic) to the cv argument on LogisticRegressionCV when you needed to pass StratifiedKFold(n_splits=5) directly instead.
Below I show the code that reproduced your error, and below that I show two alternative methods that accomplish what I believe you were trying to do.
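import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import StratifiedKFold
# Some example data
data = load_breast_cancer()
X = data['data']
y = data['target']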
# Set up the stratifiedKFold
skf = StratifiedKFold(n_splits=5)
# Don't do this... only here to reproduce the error
skf_indicies = skf.split(X, y)
# Some regularization
ls_1 = np.logspace(-1.0, 2.0, num=5)
# This creates your error
clf_error = LogisticRegressionCV(Cs=ls_1,
cv = skf_indicies,
scoring = "roc_auc",
n_jobs=-1,
random_state=17)
# Error created by passing result of skf.split to cv
clf_model = clf_error.fit(X, y)
# This is probably what you meant to do
clf_using_skf = LogisticRegressionCV(Cs=ls_1,
cv = skf,
scoring = "roc_auc",
n_jobs=-1,
random_state=17,
max_iter=1_000)
# This will now fit without the error
clf_model_skf = clf_using_skf.fit(X, y)
# This is the easiest method, and from the docs also does the
# same thing as StratifiedKFold
# https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegressionCV.html
clf_easiest = LogisticRegressionCV(Cs=ls_1,
cv = 5,
scoring = "roc_auc",
n_jobs=-1,
random_state=17,
max_iter=1_000)
# This will now fit without the error
clf_model_easiest = clf_easiest.fit(X, y)
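One plausible way the fold list can end up empty ("expected 4, got 0") is that skf.split(...) returns a generator, and a generator can only be iterated once. Whether that is exactly what happened in the question depends on the scikit-learn version and on how the notebook cells were run, so treat the sketch below as an illustration of the single-use behaviour only. The toy arrays are assumptions.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy data: 20 samples, two balanced classes.
X = np.arange(40).reshape(20, 2)
y = np.array([0, 1] * 10)

skf = StratifiedKFold(n_splits=5)

folds = skf.split(X, y)       # a generator of (train_idx, test_idx) pairs
first_pass = list(folds)      # consumes the generator
second_pass = list(folds)     # nothing is left to iterate over

print(len(first_pass), len(second_pass))

# The splitter object itself can be reused as often as needed,
# which is why passing skf (or an integer) to cv is the safer choice.
fresh_pass = list(skf.split(X, y))
print(len(fresh_pass))
```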

RANSAC algorithm using scikit-learn's RANSACRegressor

I tried to use the code below for fitting a robust regression model using RANSAC
from sklearn.linear_model import RANSACRegressor
ransac = RANSACRegressor(LinearRegression(),
max_trials=100,
min_samples=50,
residual_metric=lambda x: np.sum(np.abs(x), axis=1),
residual_threshold=5.0,
random_state=0)
ransac.fit(X,y)
And I get the following error below:
TypeError Traceback (most recent call last)
<ipython-input-38-832d8b5d351b> in <module>
5 residual_metric=lambda x: np.sum(np.abs(x), axis=1),
6 residual_threshold=5.0,
----> 7 random_state=0)
8 ransac.fit(X,y)
TypeError: __init__() got an unexpected keyword argument 'residual_metric'
Can you help me know what's wrong?
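Most likely this code was written for an older version of scikit-learn. The residual_metric argument was deprecated and has since been removed from RANSACRegressor (the loss parameter took over its role), so current versions reject it. If you drop it, the code runs fine: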
from sklearn.linear_model import RANSACRegressor, LinearRegression
ransac = RANSACRegressor(LinearRegression(),
max_trials=100,
min_samples=50,
residual_threshold=5.0,
random_state=0)
ransac
RANSACRegressor(base_estimator=LinearRegression(), min_samples=50,
random_state=0, residual_threshold=5.0)
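If a custom residual is still wanted, the loss parameter accepts a callable that receives the true and predicted targets and returns one loss value per sample. The sketch below assumes a 1-D target and synthetic data with a few injected outliers; absolute_residual and outlier_idx are illustrative names, not part of the original code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor

# Synthetic line y = 3x + 2 with mild noise and five gross outliers.
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 500).reshape(-1, 1)
y = 3.0 * X.ravel() + 2.0 + rng.normal(scale=0.5, size=500)
outlier_idx = np.arange(0, 500, 100)
y[outlier_idx] += 50.0

def absolute_residual(y_true, y_pred):
    """Per-sample absolute error (one value per sample)."""
    return np.abs(y_true - y_pred)

ransac = RANSACRegressor(LinearRegression(),
                         max_trials=100,
                         min_samples=50,
                         loss=absolute_residual,
                         residual_threshold=5.0,
                         random_state=0)
ransac.fit(X, y)

# The final estimator is refit on the samples flagged as inliers.
slope = ransac.estimator_.coef_[0]
print(slope, ransac.inlier_mask_.sum())
```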

How can I define my own scoring strategy sklearn.model_selection.GridSearchCV?

I would like to define a new scoring in GridSearchCV as it is said here http://scikit-learn.org/stable/modules/model_evaluation.html#implementing-your-own-scoring-object . This is my code:
from sklearn.model_selection import GridSearchCV
def pe_score(estimator,x,y):
    clf=estimator
    clf.fit(x,y)
    z=clf.predict(x)
    pe=prob_error(z, y)
    return pe
pe_error=pe_score(SVC(),xTrain,yTrain)
grid = GridSearchCV(SVC(), param_grid={'kernel':('linear', 'rbf'), 'C':[1, 10, 100,1000,10000]}, scoring=pe_error)
where prob_error(z,y) is the function that computes the error I would like to minimize, z being the predictions for the training set and y the true values of the training set. However, I got the following error:
---> 18 clf.fit(xTrain, yTrain)
TypeError: 'numpy.float64' object is not callable
I don't know whether pe_error is defined in the right format. How can I solve it? Thank you.
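The scoring argument expects a scorer (a callable or a metric name), but pe_error in your code is the float returned by calling pe_score(...), which is why GridSearchCV later fails with "'numpy.float64' object is not callable".
Score functions should have the format score_func(y, y_pred, **kwargs).
You can then use the make_scorer function to turn your scoring function into something that works with GridSearchCV.
So, in this case it would be: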
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer
from sklearn.svm import SVC
def pe_score(y, y_pred):
    # prob_error expects (prediction, truth), as in the question
    pe = prob_error(y_pred, y)
    return pe
# prob_error is an error to minimize, so tell make_scorer that lower is better
pe_error = make_scorer(pe_score, greater_is_better=False)
grid = GridSearchCV(SVC(), param_grid={'kernel':('linear', 'rbf'), 'C':[1, 10, 100,1000,10000]}, scoring=pe_error)
(I'm assuming you have prob_error implemented or imported somewhere else in your code)
Documentation: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html
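For a version that runs end to end, here is a sketch with a hypothetical prob_error (plain misclassification rate, since the real one is not shown in the question), synthetic data, and a smaller C grid to keep it fast. Because the scorer is built with greater_is_better=False, GridSearchCV stores the negated error, so the sign is flipped back when printing.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def prob_error(y_pred, y_true):
    """Hypothetical stand-in: fraction of misclassified samples."""
    return float(np.mean(np.asarray(y_pred) != np.asarray(y_true)))

def pe_score(y_true, y_pred):
    # make_scorer calls this as score_func(y_true, y_pred).
    return prob_error(y_pred, y_true)

# Lower error is better, so the scorer negates the value internally.
pe_scorer = make_scorer(pe_score, greater_is_better=False)

X, y = make_classification(n_samples=120, n_features=5, random_state=0)

grid = GridSearchCV(SVC(),
                    param_grid={'kernel': ('linear', 'rbf'),
                                'C': [0.1, 1, 10]},
                    scoring=pe_scorer,
                    cv=3)
grid.fit(X, y)

print(grid.best_params_)
print(-grid.best_score_)   # mean cross-validated error of the best setting
```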

'KNeighborsClassifier' object is not callable

I have a feature set Xtrain with dimensions (n_obs,n_features) and responses ytrain with dim (n_obs) . I am attempting to use KNN as a classifier.
from sklearn.neighbors import KNeighborsClassifier
neigh = KNeighborsClassifier()
clf = neigh(n_neighbors = 10)
clf.fit(Xtrain,ytrain)
I get error message:
TypeError
Traceback (most recent call last)
22 clf = neigh(n_neighbors = 10)
23 # Fit best model to data
24 clf.fit(Xtrain, ytrain)
TypeError: 'KNeighborsClassifier' object is not callable
Not sure what the problem is...any help appreciated.
Try:
clf = KNeighborsClassifier(n_neighbors = 10)
clf.fit(Xtrain,ytrain)
Classifier parameters go inside the constructor. You were trying to create a new object by calling an already instantiated classifier.
The following:
from sklearn.neighbors import KNeighborsClassifier
neigh = KNeighborsClassifier
clf = neigh(n_neighbors = 10)
clf.fit(Xtrain, ytrain)
would also work.
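The difference between the class and an instance can be checked directly. This is a small sketch on assumed synthetic data; it also shows set_params as a way to change n_neighbors on an object that already exists.

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# The class can be called (that is how instances are created);
# an instance cannot.
class_is_callable = callable(KNeighborsClassifier)
instance_is_callable = callable(KNeighborsClassifier())
print(class_is_callable, instance_is_callable)

# Synthetic stand-in for Xtrain / ytrain.
Xtrain, ytrain = make_classification(n_samples=40, n_features=4,
                                     random_state=0)

# Parameters go into the constructor ...
clf = KNeighborsClassifier(n_neighbors=10)
clf.fit(Xtrain, ytrain)

# ... or can be changed later on an existing instance.
neigh = KNeighborsClassifier()
neigh.set_params(n_neighbors=10)
neigh.fit(Xtrain, ytrain)
print(neigh.n_neighbors)
```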
