I am trying to run XGboost with with calibrated classifier, below is the snippet of code where I am facing the error:
from sklearn.calibration import CalibratedClassifierCV
from xgboost import XGBClassifier
import numpy as np
x_train =np.array([1,2,2,3,4,5,6,3,4,10,]).reshape(-1,1)
y_train = np.array([1,1,1,1,1,3,3,3,3,3])
x_cfl=XGBClassifier(n_estimators=1)
x_cfl.fit(x_train,y_train)
sig_clf = CalibratedClassifierCV(x_cfl, method="sigmoid")
sig_clf.fit(x_train, y_train)
Error:
TypeError: predict_proba() got an unexpected keyword argument 'X'"
Full Trace:
TypeError Traceback (most recent call last)
<ipython-input-48-08dd0b4ae8aa> in <module>
----> 1 sig_clf.fit(x_train, y_train)
~/anaconda3/lib/python3.8/site-packages/sklearn/calibration.py in fit(self, X, y, sample_weight)
309 parallel = Parallel(n_jobs=self.n_jobs)
310
--> 311 self.calibrated_classifiers_ = parallel(
312 delayed(_fit_classifier_calibrator_pair)(
313 clone(base_estimator), X, y, train=train, test=test,
~/anaconda3/lib/python3.8/site-packages/joblib/parallel.py in __call__(self, iterable)
1039 # remaining jobs.
1040 self._iterating = False
-> 1041 if self.dispatch_one_batch(iterator):
1042 self._iterating = self._original_iterator is not None
1043
~/anaconda3/lib/python3.8/site-packages/joblib/parallel.py in dispatch_one_batch(self, iterator)
857 return False
858 else:
--> 859 self._dispatch(tasks)
860 return True
861
~/anaconda3/lib/python3.8/site-packages/joblib/parallel.py in _dispatch(self, batch)
775 with self._lock:
776 job_idx = len(self._jobs)
--> 777 job = self._backend.apply_async(batch, callback=cb)
778 # A job can complete so quickly than its callback is
779 # called before we get here, causing self._jobs to
~/anaconda3/lib/python3.8/site-packages/joblib/_parallel_backends.py in apply_async(self, func, callback)
206 def apply_async(self, func, callback=None):
207 """Schedule a func to be run"""
--> 208 result = ImmediateResult(func)
209 if callback:
210 callback(result)
~/anaconda3/lib/python3.8/site-packages/joblib/_parallel_backends.py in __init__(self, batch)
570 # Don't delay the application, to avoid keeping the input
571 # arguments in memory
--> 572 self.results = batch()
573
574 def get(self):
~/anaconda3/lib/python3.8/site-packages/joblib/parallel.py in __call__(self)
260 # change the default number of processes to -1
261 with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 262 return [func(*args, **kwargs)
263 for func, args, kwargs in self.items]
264
~/anaconda3/lib/python3.8/site-packages/joblib/parallel.py in <listcomp>(.0)
260 # change the default number of processes to -1
261 with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 262 return [func(*args, **kwargs)
263 for func, args, kwargs in self.items]
264
~/anaconda3/lib/python3.8/site-packages/sklearn/utils/fixes.py in __call__(self, *args, **kwargs)
220 def __call__(self, *args, **kwargs):
221 with config_context(**self.config):
--> 222 return self.function(*args, **kwargs)
~/anaconda3/lib/python3.8/site-packages/sklearn/calibration.py in _fit_classifier_calibrator_pair(estimator, X, y, train, test, supports_sw, method, classes, sample_weight)
443 n_classes = len(classes)
444 pred_method = _get_prediction_method(estimator)
--> 445 predictions = _compute_predictions(pred_method, X[test], n_classes)
446
447 sw = None if sample_weight is None else sample_weight[test]
~/anaconda3/lib/python3.8/site-packages/sklearn/calibration.py in _compute_predictions(pred_method, X, n_classes)
499 (X.shape[0], 1).
500 """
--> 501 predictions = pred_method(X=X)
502 if hasattr(pred_method, '__name__'):
503 method_name = pred_method.__name__
TypeError: predict_proba() got an unexpected keyword argument 'X'
I am quite surprised by this, as it was running for me till yesterday, same code is running when I use some other Classifier.
from sklearn.calibration import CalibratedClassifierCV
from xgboost import XGBClassifier
import numpy as np
x_train = np.array([1,2,2,3,4,5,6,3,4,10,]).reshape(-1,1)
y_train = np.array([1,1,1,1,1,3,3,3,3,3])
x_cfl=LGBMClassifier(n_estimators=1)
x_cfl.fit(x_train,y_train)
sig_clf = CalibratedClassifierCV(x_cfl, method="sigmoid")
sig_clf.fit(x_train, y_train)
Output:
CalibratedClassifierCV(base_estimator=LGBMClassifier(n_estimators=1))
Is there a problem with my Xgboost installation?? I use conda for installation and last I remember I had uninstalled xgboost yesterday and installed it again.
my xgboost version:
1.3.0
I believe that the problem comes from XGBoost.
It's explained here: https://github.com/dmlc/xgboost/pull/6555
XGBoost defined:
predict_proba(self, data, ...
instead of:
predict_proba(self, X, ...
And since sklearn 0.24 calls clf.predict_proba(X=X), an exception is thrown.
Here is an idea to fix the problem without changing the version of your packages: Create a class that inherits XGBoostClassifier to override predict_proba with the right argument names and call super().
It's fixed now, seems like there is a bug in scikit-learn= 0.24
I downgraded to 0.22.2.post1 and it was fixed!
Th best way is to upgrade your xgboost
run the following from the jupyter notebook
pip install xgboost --upgrade
Related
I am trying to train a StackingClassifier in Sklearn, but I keep running into this error where the fit method seems to be complaining about me having passed it numpy arrays. To my knowledge, this is how all the fit methods in sklearn are supposed to work. I read and followed the example from the documentation and expanded on it to include a more complex and comprehensive pipeline that would process categorical, ordinal, scalar, and text data.
Sorry in advance for the lengthy code sample, but I felt it was necessary to provide a complete reproducible example. Simply breaking down the pipeline into its constituent estimators and test those individually did not raise any exceptions, so I figure that the error somehow comes from the gestalt estimator.
Select Features
categorical_data = [
"race",
"gender",
"admission_type_id",
"discharge_disposition_id",
"admission_source_id",
"insulin",
"diabetesMed",
"change",
"payer_code",
"A1Cresult",
"metformin",
"repaglinide",
"nateglinide",
"chlorpropamide",
"glimepiride",
"glipizide",
"glyburide",
"tolbutamide",
"pioglitazone",
"rosiglitazone",
"acarbose",
"miglitol",
"tolazamide",
"glyburide.metformin",
"glipizide.metformin",
]
ordinal_data = [
"age"
]
scalar_data = [
"num_medications",
"time_in_hospital",
"num_lab_procedures",
"num_procedures",
"number_outpatient",
"number_emergency",
"number_inpatient",
"number_diagnoses",
]
text_data = [
"diag_1_desc",
"diag_2_desc",
"diag_3_desc"
]
Create Column Transformers
impute_trans = compose.make_column_transformer(
(
impute.SimpleImputer(
strategy="constant",
fill_value="missing"
),
categorical_data
)
)
encode_trans = compose.make_column_transformer(
(
preprocessing.OneHotEncoder(
sparse=False,
handle_unknown="ignore"
),
categorical_data
),
(
preprocessing.OrdinalEncoder(),
ordinal_data
)
)
scalar_trans = compose.make_column_transformer(
(preprocessing.StandardScaler(), scalar_data),
)
text_trans = compose.make_column_transformer(
(TfidfVectorizer(ngram_range=(1,2)), "diag_1_desc"),
(TfidfVectorizer(ngram_range=(1,2)), "diag_2_desc"),
(TfidfVectorizer(ngram_range=(1,2)), "diag_3_desc"),
)
Create Estimators
cat_pre_pipe = make_pipeline(impute_trans, encode_trans)
logreg = LogisticRegression(
solver = "saga",
penalty="elasticnet",
l1_ratio=0.5,
max_iter=1000
)
text_pipe = make_pipeline(text_trans, logreg)
scalar_pipe = make_pipeline(scalar_trans, logreg)
cat_pipe = make_pipeline(cat_pre_pipe, logreg)
estimators = [
("cat", cat_pipe),
("text", text_pipe),
("scalar", scalar_pipe)
]
Create Stacking Classifier
stack_clf = StackingClassifier(
estimators=estimators,
final_estimator=logreg
)
diabetes_data = pd.read_csv("8k_diabetes.csv", delimiter=',')
x_train, x_test, y_train, y_test = train_test_split(
pd.concat([
preprocess_dataframe(diabetes_data[text_data]),
diabetes_data[categorical_data + scalar_data]
], axis=1),
diabetes_data["readmitted"].astype(int)
)
stack_clf.fit(x_train, y_train)
Complete Stack Trace
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
File ~/anaconda3/envs/assignment2/lib/python3.8/site-packages/sklearn/utils/__init__.py:409, in _get_column_indices(X, key)
408 try:
--> 409 all_columns = X.columns
410 except AttributeError:
AttributeError: 'numpy.ndarray' object has no attribute 'columns'
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
Input In [19], in <cell line: 1>()
----> 1 stack_clf.fit(x_train, y_train)
File ~/anaconda3/envs/assignment2/lib/python3.8/site-packages/sklearn/ensemble/_stacking.py:488, in StackingClassifier.fit(self, X, y, sample_weight)
486 self._le = LabelEncoder().fit(y)
487 self.classes_ = self._le.classes_
--> 488 return super().fit(X, self._le.transform(y), sample_weight)
File ~/anaconda3/envs/assignment2/lib/python3.8/site-packages/sklearn/ensemble/_stacking.py:158, in _BaseStacking.fit(self, X, y, sample_weight)
153 stack_method = [self.stack_method] * len(all_estimators)
155 # Fit the base estimators on the whole training data. Those
156 # base estimators will be used in transform, predict, and
157 # predict_proba. They are exposed publicly.
--> 158 self.estimators_ = Parallel(n_jobs=self.n_jobs)(
159 delayed(_fit_single_estimator)(clone(est), X, y, sample_weight)
160 for est in all_estimators
161 if est != "drop"
162 )
164 self.named_estimators_ = Bunch()
165 est_fitted_idx = 0
File ~/anaconda3/envs/assignment2/lib/python3.8/site-packages/joblib/parallel.py:1043, in Parallel.__call__(self, iterable)
1034 try:
1035 # Only set self._iterating to True if at least a batch
1036 # was dispatched. In particular this covers the edge
(...)
1040 # was very quick and its callback already dispatched all the
1041 # remaining jobs.
1042 self._iterating = False
-> 1043 if self.dispatch_one_batch(iterator):
1044 self._iterating = self._original_iterator is not None
1046 while self.dispatch_one_batch(iterator):
File ~/anaconda3/envs/assignment2/lib/python3.8/site-packages/joblib/parallel.py:861, in Parallel.dispatch_one_batch(self, iterator)
859 return False
860 else:
--> 861 self._dispatch(tasks)
862 return True
File ~/anaconda3/envs/assignment2/lib/python3.8/site-packages/joblib/parallel.py:779, in Parallel._dispatch(self, batch)
777 with self._lock:
778 job_idx = len(self._jobs)
--> 779 job = self._backend.apply_async(batch, callback=cb)
780 # A job can complete so quickly than its callback is
781 # called before we get here, causing self._jobs to
782 # grow. To ensure correct results ordering, .insert is
783 # used (rather than .append) in the following line
784 self._jobs.insert(job_idx, job)
File ~/anaconda3/envs/assignment2/lib/python3.8/site-packages/joblib/_parallel_backends.py:208, in SequentialBackend.apply_async(self, func, callback)
206 def apply_async(self, func, callback=None):
207 """Schedule a func to be run"""
--> 208 result = ImmediateResult(func)
209 if callback:
210 callback(result)
File ~/anaconda3/envs/assignment2/lib/python3.8/site-packages/joblib/_parallel_backends.py:572, in ImmediateResult.__init__(self, batch)
569 def __init__(self, batch):
570 # Don't delay the application, to avoid keeping the input
571 # arguments in memory
--> 572 self.results = batch()
File ~/anaconda3/envs/assignment2/lib/python3.8/site-packages/joblib/parallel.py:262, in BatchedCalls.__call__(self)
258 def __call__(self):
259 # Set the default nested backend to self._backend but do not set the
260 # change the default number of processes to -1
261 with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 262 return [func(*args, **kwargs)
263 for func, args, kwargs in self.items]
File ~/anaconda3/envs/assignment2/lib/python3.8/site-packages/joblib/parallel.py:262, in <listcomp>(.0)
258 def __call__(self):
259 # Set the default nested backend to self._backend but do not set the
260 # change the default number of processes to -1
261 with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 262 return [func(*args, **kwargs)
263 for func, args, kwargs in self.items]
File ~/anaconda3/envs/assignment2/lib/python3.8/site-packages/sklearn/utils/fixes.py:216, in _FuncWrapper.__call__(self, *args, **kwargs)
214 def __call__(self, *args, **kwargs):
215 with config_context(**self.config):
--> 216 return self.function(*args, **kwargs)
File ~/anaconda3/envs/assignment2/lib/python3.8/site-packages/sklearn/ensemble/_base.py:42, in _fit_single_estimator(estimator, X, y, sample_weight, message_clsname, message)
40 else:
41 with _print_elapsed_time(message_clsname, message):
---> 42 estimator.fit(X, y)
43 return estimator
File ~/anaconda3/envs/assignment2/lib/python3.8/site-packages/sklearn/pipeline.py:390, in Pipeline.fit(self, X, y, **fit_params)
364 """Fit the model.
365
366 Fit all the transformers one after the other and transform the
(...)
387 Pipeline with fitted steps.
388 """
389 fit_params_steps = self._check_fit_params(**fit_params)
--> 390 Xt = self._fit(X, y, **fit_params_steps)
391 with _print_elapsed_time("Pipeline", self._log_message(len(self.steps) - 1)):
392 if self._final_estimator != "passthrough":
File ~/anaconda3/envs/assignment2/lib/python3.8/site-packages/sklearn/pipeline.py:348, in Pipeline._fit(self, X, y, **fit_params_steps)
346 cloned_transformer = clone(transformer)
347 # Fit or load from cache the current transformer
--> 348 X, fitted_transformer = fit_transform_one_cached(
349 cloned_transformer,
350 X,
351 y,
352 None,
353 message_clsname="Pipeline",
354 message=self._log_message(step_idx),
355 **fit_params_steps[name],
356 )
357 # Replace the transformer of the step with the fitted
358 # transformer. This is necessary when loading the transformer
359 # from the cache.
360 self.steps[step_idx] = (name, fitted_transformer)
File ~/anaconda3/envs/assignment2/lib/python3.8/site-packages/joblib/memory.py:349, in NotMemorizedFunc.__call__(self, *args, **kwargs)
348 def __call__(self, *args, **kwargs):
--> 349 return self.func(*args, **kwargs)
File ~/anaconda3/envs/assignment2/lib/python3.8/site-packages/sklearn/pipeline.py:893, in _fit_transform_one(transformer, X, y, weight, message_clsname, message, **fit_params)
891 with _print_elapsed_time(message_clsname, message):
892 if hasattr(transformer, "fit_transform"):
--> 893 res = transformer.fit_transform(X, y, **fit_params)
894 else:
895 res = transformer.fit(X, y, **fit_params).transform(X)
File ~/anaconda3/envs/assignment2/lib/python3.8/site-packages/sklearn/pipeline.py:434, in Pipeline.fit_transform(self, X, y, **fit_params)
432 fit_params_last_step = fit_params_steps[self.steps[-1][0]]
433 if hasattr(last_step, "fit_transform"):
--> 434 return last_step.fit_transform(Xt, y, **fit_params_last_step)
435 else:
436 return last_step.fit(Xt, y, **fit_params_last_step).transform(Xt)
File ~/anaconda3/envs/assignment2/lib/python3.8/site-packages/sklearn/compose/_column_transformer.py:672, in ColumnTransformer.fit_transform(self, X, y)
670 self._check_n_features(X, reset=True)
671 self._validate_transformers()
--> 672 self._validate_column_callables(X)
673 self._validate_remainder(X)
675 result = self._fit_transform(X, y, _fit_transform_one)
File ~/anaconda3/envs/assignment2/lib/python3.8/site-packages/sklearn/compose/_column_transformer.py:352, in ColumnTransformer._validate_column_callables(self, X)
350 columns = columns(X)
351 all_columns.append(columns)
--> 352 transformer_to_input_indices[name] = _get_column_indices(X, columns)
354 self._columns = all_columns
355 self._transformer_to_input_indices = transformer_to_input_indices
File ~/anaconda3/envs/assignment2/lib/python3.8/site-packages/sklearn/utils/__init__.py:411, in _get_column_indices(X, key)
409 all_columns = X.columns
410 except AttributeError:
--> 411 raise ValueError(
412 "Specifying the columns using strings is only "
413 "supported for pandas DataFrames"
414 )
415 if isinstance(key, str):
416 columns = [key]
ValueError: Specifying the columns using strings is only supported for pandas DataFrames
Full Pipeline Diagram
Your categorical pipeline chains two column transformers together. After the first one, the output is a numpy array, but then the second one cannot select transformers by column name as you've requested. Notice the final error message is more informative here, ValueError: Specifying the columns using strings is only supported for pandas DataFrames.
I'd suggest using one column transformer with separate pipelines instead of one pipeline with multiple columntransformers for this reason.
I'm trying to utilize make_pipeline() from scikit-learn along with GridSearchCV(). The Pipeline is simple and only includes two steps, a StandardScaler() and an MLPRegressor(). The GridSearchCV()is also pretty simple with the slight wrinkle that I'm using TimeSeriesSplit() for cross-validation.
The error I'm getting is as follows:
ValueError: Invalid parameter MLPRegressor for estimator Pipeline(steps=[('standardscaler', StandardScaler()),('mlpregressor', MLPRegressor())]). Check the list of available parameters with estimator.get_params().keys().
Can someone help me understand how I can rectify this problem so I can use the make_pipeline() framework with both GridSearchCV() and MLPRegressor() .
from sklearn.neural_network import MLPRegressor
...: from sklearn.preprocessing import StandardScaler
...: from sklearn.model_selection import TimeSeriesSplit, GridSearchCV
...: from sklearn.pipeline import make_pipeline
...: import numpy as np
In [2]: tscv = TimeSeriesSplit(n_splits = 5)
In [3]: pipe = make_pipeline(StandardScaler(), MLPRegressor())
In [4]: param_grid = {'MLPRegressor__hidden_layer_sizes': [(16,16,), (64,64,), (
...: 128,128,)], 'MLPRegressor__activation': ['identity', 'logistic', 'tanh',
...: 'relu'],'MLPRegressor__solver': ['adam','sgd']}
In [5]: grid = GridSearchCV(pipe, param_grid = param_grid, cv = tscv)
In [6]: features = np.random.random([1000,10])
In [7]: target = np.random.normal(0,10,1000)
In [8]: grid.fit(features, target)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-8-7233f9f2005e> in <module>
----> 1 grid.fit(features, target)
~/opt/miniconda3/envs/practice/lib/python3.9/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
61 extra_args = len(args) - len(all_args)
62 if extra_args <= 0:
---> 63 return f(*args, **kwargs)
64
65 # extra_args > 0
~/opt/miniconda3/envs/practice/lib/python3.9/site-packages/sklearn/model_selection/_search.py in fit(self, X, y, groups, **fit_params)
839 return results
840
--> 841 self._run_search(evaluate_candidates)
842
843 # multimetric is determined here because in the case of a callable
~/opt/miniconda3/envs/practice/lib/python3.9/site-packages/sklearn/model_selection/_search.py in _run_search(self, evaluate_candidates)
1294 def _run_search(self, evaluate_candidates):
1295 """Search all candidates in param_grid"""
-> 1296 evaluate_candidates(ParameterGrid(self.param_grid))
1297
1298
~/opt/miniconda3/envs/practice/lib/python3.9/site-packages/sklearn/model_selection/_search.py in evaluate_candidates(candidate_params, cv, more_results)
793 n_splits, n_candidates, n_candidates * n_splits))
794
--> 795 out = parallel(delayed(_fit_and_score)(clone(base_estimator),
796 X, y,
797 train=train, test=test,
~/opt/miniconda3/envs/practice/lib/python3.9/site-packages/joblib/parallel.py in __call__(self, iterable)
1039 # remaining jobs.
1040 self._iterating = False
-> 1041 if self.dispatch_one_batch(iterator):
1042 self._iterating = self._original_iterator is not None
1043
~/opt/miniconda3/envs/practice/lib/python3.9/site-packages/joblib/parallel.py in dispatch_one_batch(self, iterator)
857 return False
858 else:
--> 859 self._dispatch(tasks)
860 return True
861
~/opt/miniconda3/envs/practice/lib/python3.9/site-packages/joblib/parallel.py in _dispatch(self, batch)
775 with self._lock:
776 job_idx = len(self._jobs)
--> 777 job = self._backend.apply_async(batch, callback=cb)
778 # A job can complete so quickly than its callback is
779 # called before we get here, causing self._jobs to
~/opt/miniconda3/envs/practice/lib/python3.9/site-packages/joblib/_parallel_backends.py in apply_async(self, func, callback)
206 def apply_async(self, func, callback=None):
207 """Schedule a func to be run"""
--> 208 result = ImmediateResult(func)
209 if callback:
210 callback(result)
~/opt/miniconda3/envs/practice/lib/python3.9/site-packages/joblib/_parallel_backends.py in __init__(self, batch)
570 # Don't delay the application, to avoid keeping the input
571 # arguments in memory
--> 572 self.results = batch()
573
574 def get(self):
~/opt/miniconda3/envs/practice/lib/python3.9/site-packages/joblib/parallel.py in __call__(self)
260 # change the default number of processes to -1
261 with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 262 return [func(*args, **kwargs)
263 for func, args, kwargs in self.items]
264
~/opt/miniconda3/envs/practice/lib/python3.9/site-packages/joblib/parallel.py in <listcomp>(.0)
260 # change the default number of processes to -1
261 with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 262 return [func(*args, **kwargs)
263 for func, args, kwargs in self.items]
264
~/opt/miniconda3/envs/practice/lib/python3.9/site-packages/sklearn/utils/fixes.py in __call__(self, *args, **kwargs)
220 def __call__(self, *args, **kwargs):
221 with config_context(**self.config):
--> 222 return self.function(*args, **kwargs)
~/opt/miniconda3/envs/practice/lib/python3.9/site-packages/sklearn/model_selection/_validation.py in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, return_n_test_samples, return_times, return_estimator, split_progress, candidate_progress, error_score)
584 cloned_parameters[k] = clone(v, safe=False)
585
--> 586 estimator = estimator.set_params(**cloned_parameters)
587
588 start_time = time.time()
~/opt/miniconda3/envs/practice/lib/python3.9/site-packages/sklearn/pipeline.py in set_params(self, **kwargs)
148 self
149 """
--> 150 self._set_params('steps', **kwargs)
151 return self
152
~/opt/miniconda3/envs/practice/lib/python3.9/site-packages/sklearn/utils/metaestimators.py in _set_params(self, attr, **params)
52 self._replace_estimator(attr, name, params.pop(name))
53 # 3. Step parameters and other initialisation arguments
---> 54 super().set_params(**params)
55 return self
56
~/opt/miniconda3/envs/practice/lib/python3.9/site-packages/sklearn/base.py in set_params(self, **params)
228 key, delim, sub_key = key.partition('__')
229 if key not in valid_params:
--> 230 raise ValueError('Invalid parameter %s for estimator %s. '
231 'Check the list of available parameters '
232 'with `estimator.get_params().keys()`.' %
ValueError: Invalid parameter MLPRegressor for estimator Pipeline(steps=[('standardscaler', StandardScaler()),
('mlpregressor', MLPRegressor())]). Check the list of available parameters with `estimator.get_params().keys()`.
Solution
Yes. Make the pipeline first. Then treat the pipeline as your model and pass it to GridSearchCV.
Your problem is in the following line (you had it mislabeled):
Replace MLPRegressor__ with mlpregressor__.
The Fix:
The pipeline named_step for MLPRegressor estimator was mislabeled as MLPRegressor__ in the param_grid.
Changing it to mlpregressor__ fixed the problem.
You may run and check it in this colab notebook.
# INCORRECT
param_grid = {
'MLPRegressor__hidden_layer_sizes': [(16, 16,), (64, 64,), (128, 128,)],
'MLPRegressor__activation': ['identity', 'logistic', 'tanh', 'relu'],
'MLPRegressor__solver': ['adam', 'sgd'],
}
# CORRECTED
param_grid = {
'mlpregressor__hidden_layer_sizes': [(16, 16,), (64, 64,), (128, 128,)],
'mlpregressor__activation': ['identity', 'logistic', 'tanh', 'relu'],
'mlpregressor__solver': ['adam', 'sgd'],
}
Note
The key to understand what was wrong here, was to observe the last two lines of the error stack.
ValueError: Invalid parameter MLPRegressor for estimator Pipeline(steps=[('standardscaler', StandardScaler()),
('mlpregressor', MLPRegressor())]). Check the list of available parameters with `estimator.get_params().keys()`.
My Code:
from sklearn.model_selection import GridSearchCV
from gensim.sklearn_api import W2VTransformer
from sklearn.metrics import accuracy_score, make_scorer
s_obj = W2VTransformer()
params_grid = {
'size': [100,200,300],
'window':[10,15,20],
'min_count': [1,2,3,4,5,6],
'workers': [10,20],
'sg':[0,1],
'negative': [2,3,4,6,5],
'sample':[1e-5]
}
s_model = GridSearchCV(s_obj, params_grid, cv=3,
scoring=make_scorer(accuracy_score))
s_model.fit(sentences)
print(s_model.best_params_)
Error is : " TypeError: _score() missing 1 required positional argument: 'y_true' "
PS: I reached the point that the error is showing something about
y_true i.e needed a labelled data then I am not having labelled data,
working on unsupervised learning, so if I am correct then do we have
any other library to tune the unsupervised model?
full traceback
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-230-37cafb83162e> in <module>
14
15 s_model = GridSearchCV(s_obj,params_grid,cv=3,scoring=make_scorer(accuracy_score))
---> 16 s_model.fit(train)
17
18 print(s_model.best_params_)
~/anaconda3/lib/python3.8/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
70 FutureWarning)
71 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72 return f(**kwargs)
73 return inner_f
74
~/anaconda3/lib/python3.8/site-packages/sklearn/model_selection/_search.py in fit(self, X, y, groups, **fit_params)
734 return results
735
--> 736 self._run_search(evaluate_candidates)
737
738 # For multi-metric evaluation, store the best_index_, best_params_ and
~/anaconda3/lib/python3.8/site-packages/sklearn/model_selection/_search.py in _run_search(self, evaluate_candidates)
1186 def _run_search(self, evaluate_candidates):
1187 """Search all candidates in param_grid"""
-> 1188 evaluate_candidates(ParameterGrid(self.param_grid))
1189
1190
~/anaconda3/lib/python3.8/site-packages/sklearn/model_selection/_search.py in evaluate_candidates(candidate_params)
706 n_splits, n_candidates, n_candidates * n_splits))
707
--> 708 out = parallel(delayed(_fit_and_score)(clone(base_estimator),
709 X, y,
710 train=train, test=test,
~/anaconda3/lib/python3.8/site-packages/joblib/parallel.py in __call__(self, iterable)
1039 # remaining jobs.
1040 self._iterating = False
-> 1041 if self.dispatch_one_batch(iterator):
1042 self._iterating = self._original_iterator is not None
1043
~/anaconda3/lib/python3.8/site-packages/joblib/parallel.py in dispatch_one_batch(self, iterator)
857 return False
858 else:
--> 859 self._dispatch(tasks)
860 return True
861
~/anaconda3/lib/python3.8/site-packages/joblib/parallel.py in _dispatch(self, batch)
775 with self._lock:
776 job_idx = len(self._jobs)
--> 777 job = self._backend.apply_async(batch, callback=cb)
778 # A job can complete so quickly than its callback is
779 # called before we get here, causing self._jobs to
~/anaconda3/lib/python3.8/site-packages/joblib/_parallel_backends.py in apply_async(self, func, callback)
206 def apply_async(self, func, callback=None):
207 """Schedule a func to be run"""
--> 208 result = ImmediateResult(func)
209 if callback:
210 callback(result)
~/anaconda3/lib/python3.8/site-packages/joblib/_parallel_backends.py in __init__(self, batch)
570 # Don't delay the application, to avoid keeping the input
571 # arguments in memory
--> 572 self.results = batch()
573
574 def get(self):
~/anaconda3/lib/python3.8/site-packages/joblib/parallel.py in __call__(self)
260 # change the default number of processes to -1
261 with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 262 return [func(*args, **kwargs)
263 for func, args, kwargs in self.items]
264
~/anaconda3/lib/python3.8/site-packages/joblib/parallel.py in <listcomp>(.0)
260 # change the default number of processes to -1
261 with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 262 return [func(*args, **kwargs)
263 for func, args, kwargs in self.items]
264
~/anaconda3/lib/python3.8/site-packages/sklearn/model_selection/_validation.py in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, return_n_test_samples, return_times, return_estimator, error_score)
558 else:
559 fit_time = time.time() - start_time
--> 560 test_scores = _score(estimator, X_test, y_test, scorer)
561 score_time = time.time() - start_time - fit_time
562 if return_train_score:
~/anaconda3/lib/python3.8/site-packages/sklearn/model_selection/_validation.py in _score(estimator, X_test, y_test, scorer)
603 scorer = _MultimetricScorer(**scorer)
604 if y_test is None:
--> 605 scores = scorer(estimator, X_test)
606 else:
607 scores = scorer(estimator, X_test, y_test)
~/anaconda3/lib/python3.8/site-packages/sklearn/metrics/_scorer.py in __call__(self, estimator, *args, **kwargs)
85 for name, scorer in self._scorers.items():
86 if isinstance(scorer, _BaseScorer):
---> 87 score = scorer._score(cached_call, estimator,
88 *args, **kwargs)
89 else:
TypeError: _score() missing 1 required positional argument: 'y_true'
Can anyone help me to solve this issue?
I'm trying to use TransformedTargetRegressor in a model pipeline and run a GridSearchCV on top of it.
Here is a minimal working example:
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.compose import TransformedTargetRegressor
X,y = make_regression()
model_pipe = Pipeline([
('model', TransformedTargetRegressor(RandomForestRegressor()))
])
params={'model__n_estimators': [1, 10, 50]}
model = GridSearchCV(model_pipe, param_grid= params)
model.fit(X,y)
This model results in the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-48-828bdf0e7ede> in <module>
17 model = GridSearchCV(model_pipe, param_grid= params)
18
---> 19 model.fit(X,y)
~/miniconda3/envs/gymbo/lib/python3.6/site-packages/sklearn/model_selection/_search.py in fit(self, X, y, groups, **fit_params)
686 return results
687
--> 688 self._run_search(evaluate_candidates)
689
690 # For multi-metric evaluation, store the best_index_, best_params_ and
~/miniconda3/envs/gymbo/lib/python3.6/site-packages/sklearn/model_selection/_search.py in _run_search(self, evaluate_candidates)
1147 def _run_search(self, evaluate_candidates):
1148 """Search all candidates in param_grid"""
-> 1149 evaluate_candidates(ParameterGrid(self.param_grid))
1150
1151
~/miniconda3/envs/gymbo/lib/python3.6/site-packages/sklearn/model_selection/_search.py in evaluate_candidates(candidate_params)
665 for parameters, (train, test)
666 in product(candidate_params,
--> 667 cv.split(X, y, groups)))
668
669 if len(out) < 1:
~/miniconda3/envs/gymbo/lib/python3.6/site-packages/joblib/parallel.py in __call__(self, iterable)
1001 # remaining jobs.
1002 self._iterating = False
-> 1003 if self.dispatch_one_batch(iterator):
1004 self._iterating = self._original_iterator is not None
1005
~/miniconda3/envs/gymbo/lib/python3.6/site-packages/joblib/parallel.py in dispatch_one_batch(self, iterator)
832 return False
833 else:
--> 834 self._dispatch(tasks)
835 return True
836
~/miniconda3/envs/gymbo/lib/python3.6/site-packages/joblib/parallel.py in _dispatch(self, batch)
751 with self._lock:
752 job_idx = len(self._jobs)
--> 753 job = self._backend.apply_async(batch, callback=cb)
754 # A job can complete so quickly than its callback is
755 # called before we get here, causing self._jobs to
~/miniconda3/envs/gymbo/lib/python3.6/site-packages/joblib/_parallel_backends.py in apply_async(self, func, callback)
199 def apply_async(self, func, callback=None):
200 """Schedule a func to be run"""
--> 201 result = ImmediateResult(func)
202 if callback:
203 callback(result)
~/miniconda3/envs/gymbo/lib/python3.6/site-packages/joblib/_parallel_backends.py in __init__(self, batch)
580 # Don't delay the application, to avoid keeping the input
581 # arguments in memory
--> 582 self.results = batch()
583
584 def get(self):
~/miniconda3/envs/gymbo/lib/python3.6/site-packages/joblib/parallel.py in __call__(self)
254 with parallel_backend(self._backend, n_jobs=self._n_jobs):
255 return [func(*args, **kwargs)
--> 256 for func, args, kwargs in self.items]
257
258 def __len__(self):
~/miniconda3/envs/gymbo/lib/python3.6/site-packages/joblib/parallel.py in <listcomp>(.0)
254 with parallel_backend(self._backend, n_jobs=self._n_jobs):
255 return [func(*args, **kwargs)
--> 256 for func, args, kwargs in self.items]
257
258 def __len__(self):
~/miniconda3/envs/gymbo/lib/python3.6/site-packages/sklearn/model_selection/_validation.py in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, return_n_test_samples, return_times, return_estimator, error_score)
501 train_scores = {}
502 if parameters is not None:
--> 503 estimator.set_params(**parameters)
504
505 start_time = time.time()
~/miniconda3/envs/gymbo/lib/python3.6/site-packages/sklearn/pipeline.py in set_params(self, **kwargs)
162 self
163 """
--> 164 self._set_params('steps', **kwargs)
165 return self
166
~/miniconda3/envs/gymbo/lib/python3.6/site-packages/sklearn/utils/metaestimators.py in _set_params(self, attr, **params)
48 self._replace_estimator(attr, name, params.pop(name))
49 # 3. Step parameters and other initialisation arguments
---> 50 super().set_params(**params)
51 return self
52
~/miniconda3/envs/gymbo/lib/python3.6/site-packages/sklearn/base.py in set_params(self, **params)
231
232 for key, sub_params in nested_params.items():
--> 233 valid_params[key].set_params(**sub_params)
234
235 return self
~/miniconda3/envs/gymbo/lib/python3.6/site-packages/sklearn/base.py in set_params(self, **params)
222 'Check the list of available parameters '
223 'with `estimator.get_params().keys()`.' %
--> 224 (key, self))
225
226 if delim:
ValueError: Invalid parameter n_estimators for estimator TransformedTargetRegressor(check_inverse=True, func=None, inverse_func=None,
regressor=RandomForestRegressor(bootstrap=True,
criterion='mse',
max_depth=None,
max_features='auto',
max_leaf_nodes=None,
min_impurity_decrease=0.0,
min_impurity_split=None,
min_samples_leaf=1,
min_samples_split=2,
min_weight_fraction_leaf=0.0,
n_estimators='warn',
n_jobs=None,
oob_score=False,
random_state=None,
verbose=0,
warm_start=False),
transformer=None). Check the list of available parameters with `estimator.get_params().keys()`.
This model runs when I remove TransformedTargetRegressor from the pipeline and just pass the random forest. Why is this? How can I use TransformedTargetRegressor in a pipeline as I have shown above?
The RandomForestRegressor is stored as regressor param in TransformedTargetRegressor.
Hence, the right way to define the params for GridSearchCV is
params={'model__regressor__n_estimators': [1, 10, 50]}
Seems like people are having issues with zeros in y. Consider the following using log1p and expm1. See another worked example here
X,y = make_regression()
model_pipe = Pipeline([
('model', TransformedTargetRegressor(regressor=RandomForestRegressor(),
func=np.log1p,
inverse_func=np.expm1))
])
params={'model__regressor__n_estimators': [1, 10, 50]}
model = GridSearchCV(model_pipe, param_grid= params)
model.fit(X,y)
I've found out the answer. The TransformedTargetregressor needs to be applied to the grid search estimator as so
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.compose import TransformedTargetRegressor
X,y = make_regression()
model_pipe = Pipeline([
('model', RandomForestRegressor())
])
params={'model__n_estimators': [1, 10, 50]}
model = TransformedTargetRegressor(GridSearchCV(model_pipe, param_grid= params), func=np.log, inverse_func=np.exp)
model.fit(X,y)
I am trying to find the best hyperparameters for my SVM using Grid Search. When doing it the following way:
from sklearn.model_selection import GridSearchCV
param_grid = {'coef0': [10, 5, 0.5, 0.001], 'C': [100, 50, 1, 0.001]}
poly_svm_search = SVC(kernel="poly", degree="2")
grid_search = GridSearchCV(poly_svm_search, param_grid, cv=5, scoring='f1')
grid_search.fit(train_data, train_labels)
I get this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-72-dadf5782618c> in <module>
8
----> 9 grid_search.fit(train_data, train_labels)
~/.local/lib/python3.6/site-packages/sklearn/model_selection/_search.py in fit(self, X, y, groups, **fit_params)
720 return results_container[0]
721
--> 722 self._run_search(evaluate_candidates)
723
724 results = results_container[0]
~/.local/lib/python3.6/site-packages/sklearn/model_selection/_search.py in _run_search(self, evaluate_candidates)
1189 def _run_search(self, evaluate_candidates):
1190 """Search all candidates in param_grid"""
-> 1191 evaluate_candidates(ParameterGrid(self.param_grid))
1192
1193
~/.local/lib/python3.6/site-packages/sklearn/model_selection/_search.py in evaluate_candidates(candidate_params)
709 for parameters, (train, test)
710 in product(candidate_params,
--> 711 cv.split(X, y, groups)))
712
713 all_candidate_params.extend(candidate_params)
~/.local/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in __call__(self, iterable)
981 # remaining jobs.
982 self._iterating = False
--> 983 if self.dispatch_one_batch(iterator):
984 self._iterating = self._original_iterator is not None
985
~/.local/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in dispatch_one_batch(self, iterator)
823 return False
824 else:
--> 825 self._dispatch(tasks)
826 return True
827
~/.local/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in _dispatch(self, batch)
780 with self._lock:
781 job_idx = len(self._jobs)
--> 782 job = self._backend.apply_async(batch, callback=cb)
783 # A job can complete so quickly than its callback is
784 # called before we get here, causing self._jobs to
~/.local/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py in apply_async(self, func, callback)
180 def apply_async(self, func, callback=None):
181 """Schedule a func to be run"""
--> 182 result = ImmediateResult(func)
183 if callback:
184 callback(result)
~/.local/lib/python3.6/site-packages/sklearn/externals/joblib/_parallel_backends.py in __init__(self, batch)
543 # Don't delay the application, to avoid keeping the input
544 # arguments in memory
--> 545 self.results = batch()
546
547 def get(self):
~/.local/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in __call__(self)
259 with parallel_backend(self._backend):
260 return [func(*args, **kwargs)
--> 261 for func, args, kwargs in self.items]
262
263 def __len__(self):
~/.local/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py in <listcomp>(.0)
259 with parallel_backend(self._backend):
260 return [func(*args, **kwargs)
--> 261 for func, args, kwargs in self.items]
262
263 def __len__(self):
~/.local/lib/python3.6/site-packages/sklearn/model_selection/_validation.py in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, return_n_test_samples, return_times, return_estimator, error_score)
526 estimator.fit(X_train, **fit_params)
527 else:
--> 528 estimator.fit(X_train, y_train, **fit_params)
529
530 except Exception as e:
~/.local/lib/python3.6/site-packages/sklearn/svm/base.py in fit(self, X, y, sample_weight)
210
211 seed = rnd.randint(np.iinfo('i').max)
--> 212 fit(X, y, sample_weight, solver_type, kernel, random_seed=seed)
213 # see comment on the other call to np.iinfo in this file
214
~/.local/lib/python3.6/site-packages/sklearn/svm/base.py in _sparse_fit(self, X, y, sample_weight, solver_type, kernel, random_seed)
291 sample_weight, self.nu, self.cache_size, self.epsilon,
292 int(self.shrinking), int(self.probability), self.max_iter,
--> 293 random_seed)
294
295 self._warn_from_fit_status()
sklearn/svm/libsvm_sparse.pyx in sklearn.svm.libsvm_sparse.libsvm_sparse_train()
TypeError: an integer is required
My train_labels variable contains a list of booleans, so I have a binary classification problem. train_data is a <class'scipy.sparse.csr.csr_matrix'>, basically containing all scaled and One-Hot encoded features.
What did I do wrong? It's hard for me to track down what the issue is here. I thank you for any help in advance ;).
When you initialize the SVC using this line:
poly_svm_search = SVC(kernel="poly", degree="2")
You are supplying degree param with a string, due to inverted commas around it. But according to the documentation, degree takes an integer as value.
degree : int, optional (default=3) Degree of the polynomial kernel
function (‘poly’). Ignored by all other kernels.
So you need to do this:
poly_svm_search = SVC(kernel="poly", degree=2)
Notice how I did not use inverted commas here.