RANSAC algorithm using scikit-learn's RANSACRegressor - python

I tried to use the code below to fit a robust regression model using RANSAC:
from sklearn.linear_model import RANSACRegressor
ransac = RANSACRegressor(LinearRegression(),
max_trials=100,
min_samples=50,
residual_metric=lambda x: np.sum(np.abs(x), axis=1),
residual_threshold=5.0,
random_state=0)
ransac.fit(X,y)
And I get the following error:
TypeError Traceback (most recent call last)
<ipython-input-38-832d8b5d351b> in <module>
5 residual_metric=lambda x: np.sum(np.abs(x), axis=1),
6 residual_threshold=5.0,
----> 7 random_state=0)
8 ransac.fit(X,y)
TypeError: __init__() got an unexpected keyword argument 'residual_metric'
Can you help me know what's wrong?

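Most likely this code was written for an older version of scikit-learn. The residual_metric argument of RANSACRegressor was deprecated and has since been removed, which is why __init__() rejects it; the loss argument took over its role. Without residual_metric, the code runs fine: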
from sklearn.linear_model import RANSACRegressor, LinearRegression
ransac = RANSACRegressor(LinearRegression(),
max_trials=100,
min_samples=50,
residual_threshold=5.0,
random_state=0)
ransac
RANSACRegressor(base_estimator=LinearRegression(), min_samples=50,
random_state=0, residual_threshold=5.0)
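
If you still need a custom residual, a callable passed as loss can play the role residual_metric used to play. The following is a minimal sketch under that assumption; the synthetic data and the helper name absolute_residual are made up for illustration and are not part of the original question.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor

# Synthetic data (an assumption for illustration): y = 3x + 2 with mild noise,
# plus 20 grossly corrupted targets that RANSAC should reject.
rng = np.random.RandomState(0)
X = rng.uniform(-10, 10, size=(1000, 1))
y = 3.0 * X.ravel() + 2.0 + rng.normal(scale=0.5, size=1000)
y[:20] += 100.0


def absolute_residual(y_true, y_pred):
    """Per-sample absolute error (hypothetical helper name)."""
    return np.abs(y_true - y_pred)


ransac = RANSACRegressor(
    LinearRegression(),
    max_trials=100,
    min_samples=50,
    loss=absolute_residual,   # replaces the removed residual_metric argument
    residual_threshold=5.0,
    random_state=0,
)
ransac.fit(X, y)

# The final model is refit on the samples RANSAC classified as inliers.
slope = ransac.estimator_.coef_[0]
intercept = ransac.estimator_.intercept_
n_inliers = int(ransac.inlier_mask_.sum())
print(slope, intercept, n_inliers)
```

A callable is used here rather than a string name because the accepted string values for loss have changed between scikit-learn releases.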

Related

Error while doing SVR for multiple outputs

Trying to do SVR for multiple outputs. Started by hyper-parameter tuning which worked for me. Now I want to create the model using the optimum parameters but I am getting an error. How to fix this?
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputRegressor
svr = SVR()
svr_regr = MultiOutputRegressor(svr)
from sklearn.model_selection import KFold
kfold_splitter = KFold(n_splits=6, random_state = 0,shuffle=True)
svr_gs = GridSearchCV(svr_regr,
param_grid = {'estimator__kernel': ('linear','poly','rbf','sigmoid'),
'estimator__C': [1,1.5,2,2.5,3,3.5,4,4.5,5,5.5,6,6.5,7,7.5,8,8.5,9,9.5,10],
'estimator__degree': [3,8],
'estimator__coef0': [0.01,0.1,0.5],
'estimator__gamma': ('auto','scale'),
'estimator__tol': [1e-3, 1e-4, 1e-5, 1e-6]},
cv=kfold_splitter,
n_jobs=-1,
scoring='r2')
svr_gs.fit(X_train, y_train)
print(svr_gs.best_params_)
#print(gs.best_score_)
Output:
{'estimator__C': 10, 'estimator__coef0': 0.01, 'estimator__degree': 3, 'estimator__gamma': 'auto', 'estimator__kernel': 'rbf', 'estimator__tol': 1e-06}
Trying to create a model using the output:
SVR_model = svr_regr (kernel='rbf',C=10,
coef0=0.01,degree=3,
gamma='auto',tol=1e-6,random_state=42)
SVR_model.fit(X_train, y_train)
SVR_model_y_predict = SVR_model.predict((X_test))
SVR_model_y_predict
Error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/var/folders/mm/r4gnnwl948zclfyx12w803040000gn/T/ipykernel_96269/769104914.py in <module>
----> 1 SVR_model = svr_regr (estimator__kernel='rbf',estimator__C=10,
2 estimator__coef0=0.01,estimator__degree=3,
3 estimator__gamma='auto',estimator__tol=1e-6,random_state=42)
4
5
TypeError: 'MultiOutputRegressor' object is not callable
Please consult the MultiOutputRegressor docs.
svr_regr is already the model: it is an object, not a function, so you cannot call it as svr_regr(...).
What it does offer is a bunch of methods that you can call, such as .fit(), .predict(), and .score().
You are trying to specify kernel and a few other parameters. Those belong to SVR(), so pass them where you create the SVR at the top of your code.
Note also that SVR has no random_state parameter.
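
As a rough sketch of what that could look like, the tuned values can either go straight into the SVR constructor or be applied with set_params using the estimator__ names that GridSearchCV reported. The training data below is synthetic and only stands in for the asker's X_train, y_train and X_test.

```python
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

# Synthetic stand-ins for the asker's data (6 inputs, 2 outputs for brevity).
rng = np.random.RandomState(0)
X_train = rng.rand(60, 6)
y_train = np.column_stack([X_train.sum(axis=1), X_train[:, 0] - X_train[:, 1]])
X_test = rng.rand(5, 6)

# Option 1: give the tuned values to SVR itself, then wrap it.
svr_model = MultiOutputRegressor(
    SVR(kernel="rbf", C=10, coef0=0.01, degree=3, gamma="auto", tol=1e-6)
)
svr_model.fit(X_train, y_train)
svr_model_y_predict = svr_model.predict(X_test)

# Option 2: reuse the dictionary printed by svr_gs.best_params_ as-is.
best_params = {
    "estimator__C": 10,
    "estimator__coef0": 0.01,
    "estimator__degree": 3,
    "estimator__gamma": "auto",
    "estimator__kernel": "rbf",
    "estimator__tol": 1e-06,
}
svr_model_2 = MultiOutputRegressor(SVR()).set_params(**best_params)
svr_model_2.fit(X_train, y_train)
svr_model_2_y_predict = svr_model_2.predict(X_test)
```

With the default refit=True, svr_gs.best_estimator_ is also an already fitted model with these parameters, so rebuilding it by hand may not be necessary.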

Trying to do SVR for Multi-outputs

Since SVR supports only a single output, I am trying to employ SVR on my model which has 6 inputs and 19 outputs using MultiOutputRegressor.
I am starting with hyper-parameter tuning. However, I am getting the error below. How can I modify my code to support MultiOutputRegressor?
from sklearn.svm import SVR
from sklearn.model_selection import RandomizedSearchCV
svr = SVR()
svr_regr = MultiOutputRegressor(svr)
from sklearn.model_selection import KFold
kfold_splitter = KFold(n_splits=6, random_state = 0,shuffle=True)
# On each iteration, the algorithm will choose a different combination of the parameters.
svr_random = RandomizedSearchCV(svr_regr,
param_distributions = {'kernel': ('linear','poly','rbf','sigmoid'),
'C': [1,1.5,2,2.5,3,3.5,4,4.5,5,5.5,6,6.5,7,7.5,8,8.5,9,9.5,10],
'degree': [3,8],
'coef0': [0.01,0.1,0.5],
'gamma': ('auto','scale'),
'tol': [1e-3, 1e-4, 1e-5, 1e-6]},
n_iter=100,
cv=kfold_splitter,
n_jobs=-1,
random_state=42,
scoring='r2')
svr_random.fit(X_train, y_train)
print(svr_random.best_params_)
Error:
ValueError: Invalid parameter kernel for estimator MultiOutputRegressor(estimator=SVR()). Check the list of available parameters with `estimator.get_params().keys()`.
After getting the optimum parameters:
SVR_model = svr_regr (kernel='rbf',C=10,
coef0=0.01,degree=3,
gamma='auto',tol=1e-6,random_state=42)
SVR_model.fit(X_train, y_train)
SVR_model_y_predict = SVR_model.predict((X_test))
SVR_model_y_predict
Error after getting the optimum parameters:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/var/folders/mm/r4gnnwl948zclfyx12w803040000gn/T/ipykernel_96269/769104914.py in <module>
----> 1 SVR_model = svr_regr (estimator__kernel='rbf',estimator__C=10,
2 estimator__coef0=0.01,estimator__degree=3,
3 estimator__gamma='auto',estimator__tol=1e-6,random_state=42)
4
5
TypeError: 'MultiOutputRegressor' object is not callable
I tried to reproduce a simple example of MultiOutputRegressor without a hyper-parameter search (i.e. just the fit and predict methods), which seemed to work fine. The error message:
Check the list of available parameters with estimator.get_params().keys()
suggests that the parameters you are optimising in RandomizedSearchCV, i.e. through param_distributions, don't match the parameters accepted by MultiOutputRegressor. Looking at the API reference, MultiOutputRegressor itself takes only a few parameters, and the ones you are trying to pass through, e.g. C and tol, belong to the wrapped SVR estimator.
You can pass parameters through to SVR as nested parameters, the same way it is done in a pipeline: prefix each name with estimator__, e.g. estimator__C and estimator__tol.
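
A minimal sketch of the nested-parameter approach is shown below. The data is synthetic and the search space is deliberately trimmed so the example runs quickly; both are assumptions for illustration rather than the asker's actual setup.

```python
import numpy as np
from sklearn.model_selection import KFold, RandomizedSearchCV
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

# Synthetic stand-ins for the asker's data (6 inputs, 2 outputs for brevity).
rng = np.random.RandomState(0)
X_train = rng.rand(60, 6)
y_train = np.column_stack([X_train.sum(axis=1), X_train[:, 0] - X_train[:, 1]])

svr_regr = MultiOutputRegressor(SVR())

# The valid names are exactly what get_params() lists; SVR's own
# parameters show up with the "estimator__" prefix.
print(sorted(svr_regr.get_params().keys()))

param_distributions = {
    "estimator__kernel": ["linear", "rbf"],
    "estimator__C": [1, 5, 10],
    "estimator__gamma": ["auto", "scale"],
    "estimator__tol": [1e-3, 1e-4],
}

svr_random = RandomizedSearchCV(
    svr_regr,
    param_distributions=param_distributions,
    n_iter=5,
    cv=KFold(n_splits=3, shuffle=True, random_state=0),
    random_state=42,
    scoring="r2",
)
svr_random.fit(X_train, y_train)
print(svr_random.best_params_)

# The refit best model is ready to use; there is no need to call svr_regr(...).
best_model = svr_random.best_estimator_
```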

'numpy.float64' object is not callable - hyperparameter tuning

I'm trying to do hyperparameter tuning, and every time I run this code:
from sklearn.model_selection import GridSearchCV
param_grid = {'C':[0,1,1,100,1000], 'kernel':['rbf','poly','sigmoid','linear'],'degree':[1,2,3,4,5,6]}
grid =GridSearchCV(svc.sc(),param_grid)
grid.fit(X_train,y_train)
I get this error
TypeError Traceback (most recent call last)
<ipython-input-64-74de9eeb3cae> in <module>
3
4 param_grid = {'C':[0,1,1,100,1000], 'kernel':['rbf','poly','sigmoid','linear'],'degree':[1,2,3,4,5,6]}
----> 5 grid =GridSearchCV(svc.sc(),param_grid)
6 grid.fit(X_train,y_train)
TypeError: 'numpy.float64' object is not callable
Any idea what to do? Also, svc.sc is the way I defined the model.
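What is svc.sc? The traceback says it is a numpy.float64, so svc.sc() tries to call a number, which fails before GridSearchCV even runs. GridSearchCV expects an estimator object as its first argument, not the result of calling one. If svc.sc is the model, pass it without the parentheses:
grid = GridSearchCV(svc.sc, param_grid)
If it is a number (a stored score, for example), pass the estimator itself instead, e.g. SVC().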
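
For comparison, here is a small sketch of the usual pattern, where an unfitted SVC instance is handed to GridSearchCV directly. The data is synthetic and the grid is shortened; C=0 is left out on the assumption that SVC requires a strictly positive C.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for the asker's X_train, y_train.
X_train, y_train = make_classification(n_samples=120, n_features=5, random_state=0)

param_grid = {
    "C": [0.1, 1, 10],            # strictly positive values only
    "kernel": ["rbf", "linear"],
    "degree": [2, 3],             # only affects the 'poly' kernel
}

# Pass the estimator object itself; GridSearchCV clones and fits it internally.
grid = GridSearchCV(SVC(), param_grid, cv=3)
grid.fit(X_train, y_train)
print(grid.best_params_)
```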

AttributeError: 'KMeans' object has no attribute 'setK'

I decided to test the Spark clustering example from https://runawayhorse001.github.io/LearningApacheSpark/clustering.html
and it raised a strange error.
Example:
from sklearn.cluster import KMeans
import numpy as np
cost = np.zeros(20)
for k in range(2,20):
    kmeans = KMeans()\
        .setK(k)\
        .setSeed(1) \
        .setFeaturesCol("indexedFeatures")\
        .setPredictionCol("cluster")
    model = kmeans.fit(data)
    cost[k] = model.computeCost(data)
It raised an AttributeError on KMeans, even though fit is implemented:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-22-296a7d54514a> in <module>
2 cost = np.zeros(20)
3 for k in range(2,20):
----> 4 kmeans = KMeans()\
5 .setK(k)\
6 .setSeed(1) \
AttributeError: 'KMeans' object has no attribute 'setK'
I had similar issues in the past and .fit() solved them, but now it is not working.
You're importing the wrong KMeans. The setK, setSeed, setFeaturesCol and setPredictionCol setters belong to the KMeans in Spark ML, not the one in scikit-learn, so import it from pyspark instead:
from pyspark.ml.clustering import KMeans

Object has no attribute in scikit-learn, how can I access it?

I would like to use different parameters of scikit-learn's SVC classifier with cross-validation, so I tried the following with the SVC algorithm:
from sklearn import svm
print('Support vector machine(SVM): {:.2f}'.format(metrics.accuracy_score(
y, stratified_cv(X, y, svm.SVC(kernel='linear')))))
But it seems I cannot access the object:
AttributeError Traceback (most recent call last)
<ipython-input-16-dacd8d429376> in <module>()
5
6 print('Support vector machine(SVM): {:.2f}'.format(metrics.accuracy_score(
----> 7 y, stratified_cv(X, y, svm.SVC(kernel='linear')))))
8
AttributeError: 'SVC' object has no attribute 'SVC'
Interestingly, when I try this:
print('Support vector machine(SVM): {:.2f}'.format(metrics.accuracy_score(
y, stratified_cv(X, y, svm.SVC))))
I get:
Support vector machine(SVM): 0.46
What could be happening? Given the above cross-validation strategy, any idea how to set up my own SVM configuration? Thanks in advance, guys!
You need functools.partial from the Python standard library. Your function requires you to pass something that can be called as clf_class(**kwargs), so passing an already constructed object (obtained through clf = SVC(kernel='linear')) won't work, because you would effectively be trying to do
SVC(kernel='linear')(**kwargs) # error!
What you want to call is
SVC(kernel='linear', **kwargs)
so you can declare a partial function:
from functools import partial
linear_svm = partial(svm.SVC, kernel='linear')
and now you can call
linear_svm(**kwargs)
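
To show how this fits together end to end, here is a sketch with a hypothetical stratified_cv. The question does not show that function, so the version below is only a guess at its shape (it instantiates clf_class(**kwargs) once per fold), and the data is synthetic.

```python
from functools import partial

from sklearn import metrics, svm
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold


def stratified_cv(X, y, clf_class, n_folds=5, **kwargs):
    """Hypothetical reconstruction: out-of-fold predictions from clf_class(**kwargs)."""
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0)
    y_pred = y.copy()
    for train_idx, test_idx in skf.split(X, y):
        clf = clf_class(**kwargs)  # this call is why an SVC instance fails here
        clf.fit(X[train_idx], y[train_idx])
        y_pred[test_idx] = clf.predict(X[test_idx])
    return y_pred


# Synthetic stand-in for the asker's X and y.
X, y = make_classification(n_samples=150, n_features=6, random_state=0)

# partial returns a callable that behaves like svm.SVC with kernel preset.
linear_svm = partial(svm.SVC, kernel="linear")

y_pred = stratified_cv(X, y, linear_svm)
accuracy = metrics.accuracy_score(y, y_pred)
print("Support vector machine(SVM): {:.2f}".format(accuracy))
```

If stratified_cv already forwards **kwargs to the class, calling stratified_cv(X, y, svm.SVC, kernel='linear') should achieve the same thing without partial.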
