I am getting the following error when I classify new data with this command in Python:
classifier.predict(new_data)
AttributeError: 'SVC' object has no attribute '_dual_coef_'
On my laptop, though, the command works fine! What's wrong?
I had this exact error
AttributeError: 'SVC' object has no attribute '_dual_coef_'
with a model trained using scikit-learn version 0.15.2 when I tried to run it under scikit-learn version 0.16.1. I solved it by re-training the model with the newer scikit-learn 0.16.1.
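To confirm a mismatch like this, check the version on both machines before loading the pickle; a quick sketch:
import sklearn
# Compare this output on the machine where the model was trained
# and on the machine where predict() fails.
print(sklearn.__version__)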
Make sure you are loading the model with the right version of the package.
Have you checked which model you are loading before you try to predict?
This looks like a version conflict; try re-training the model with the same sklearn version you use for prediction.
You can see a similar problem here: Sklearn error: 'SVR' object has no attribute '_impl'
I had the same problem. I was using scikit-learn version 0.23.2 but was trying to run an archive trained with version 0.18..., and my error said: "'SVC' object has no attribute 'break_ties'". I fixed it by retraining the model with my version: I generated another svc.pickle to run with 0.23.2 and replaced the old one.
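In short, the fix is to re-fit the model in the environment that will run predict() and overwrite the old pickle. A minimal sketch, where X_train, y_train and the file name are placeholders:
import pickle
from sklearn.svm import SVC

# Re-train with the scikit-learn version installed on this machine
clf = SVC()
clf.fit(X_train, y_train)

# Overwrite the old pickle so it matches the current version
with open("svc.pickle", "wb") as f:
    pickle.dump(clf, f)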
"""
X = X_train
y = y_train
"""
X = X_test
y = y_test
# Instantiate and train the classifier
from sklearn.neighbors import KNeighborsClassifier
clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(X, y)
# Check the results using metrics
from sklearn import metrics
y_pred = clf.predict(X)
print(metrics.confusion_matrix(y_pred, y))
I am using a Decision Tree from sklearn, which normally supports the log_loss criterion:
classifier = DecisionTreeClassifier(random_state = 42,class_weight ='balanced' ,criterion='log_loss')
classifier.fit(X_train, y_train)
The error:
KeyError: 'log_loss'
The log_loss option for the criterion parameter was only added in the latest scikit-learn version (1.1.2):
criterion : {"gini", "entropy", "log_loss"}, default="gini"
It is not there in either of the two previous ones, version 1.0.2 or version 0.24.2:
criterion : {"gini", "entropy"}, default="gini"
The error suggests that you are using an older version; you can check your scikit-learn version with
import sklearn
print(sklearn.__version__)
So, you will need to upgrade scikit-learn to v1.1.2 (e.g. with pip install -U scikit-learn).
Note that in scikit-learn the log_loss criterion is equivalent to entropy: both compute the Shannon information gain. So on versions older than 1.1 you can keep the same impurity measure by passing criterion='entropy', as in the sketch below.
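A minimal sketch of that fallback, reusing the settings from the question (X_train and y_train are assumed to exist):
from sklearn.tree import DecisionTreeClassifier

# On scikit-learn < 1.1, criterion='log_loss' is not available;
# 'entropy' computes the same Shannon information gain.
classifier = DecisionTreeClassifier(random_state=42, class_weight='balanced',
                                    criterion='entropy')
classifier.fit(X_train, y_train)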
I built an XGBClassifier model using XGBoost version 1.4.2 and saved it to S3 in pickle format.
from xgboost import XGBClassifier
xgb_model = XGBClassifier()
xgb_model.fit(x_Traintfidf, y_Train)
xgb_predictions = xgb_model.predict(x_Testtfidf)
xgb_predictions = [round(value) for value in xgb_predictions]
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_Test.to_list(), xgb_predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))
# Save model to s3 as pickle file.
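The actual save call is omitted above; it might look something like this with boto3 (the bucket and key names here are placeholders):
import pickle
import boto3

s3 = boto3.client("s3")
s3.put_object(
    Bucket="my-models-bucket",     # hypothetical bucket name
    Key="models/xgb_model.pkl",    # hypothetical key
    Body=pickle.dumps(xgb_model),
)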
Next, I read the pickled model back in from S3, and when I try to make predictions it throws the error:
AttributeError: 'XGBModel' object has no attribute 'enable_categorical'
I am passing in a tf-idf transformed matrix to get predictions.
Any idea why I get the error above when I unpickle the model and run predictions?
You might want to double check the xgboost version in your virtual env using pip list | grep xgboost to make sure it's actually 1.4.2.
As mentioned by #bill-the-lizard, enable_categorical is new in version 1.5.0 of XGBoost, so you will correctly receive the error mentioned in your question at version 1.4.x.
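To verify and fix this in code, a small sketch that reuses the variable names from the question:
import pickle
import xgboost
from xgboost import XGBClassifier

# Check the version in the environment that unpickles the model
print(xgboost.__version__)

# If it differs from the version used for training, re-fit and re-save
# the model under the current version.
xgb_model = XGBClassifier()
xgb_model.fit(x_Traintfidf, y_Train)
with open("xgb_model.pkl", "wb") as f:
    pickle.dump(xgb_model, f)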
When I'm using:
gb_explainer = shap.TreeExplainer
I get this error:
AttributeError: module 'shap' has no attribute 'TreeExplainer'
The full code:
def create_shap_tree_explainer(self):
    self.gb_explainer = shap.TreeExplainer(self.gb_model)
    self.shap_values_X_test = self.gb_explainer.shap_values(self.X_test)
    self.shap_values_X_train = self.gb_explainer.shap_values(self.X_train)
The gradient boosting classifier model is:
gbc_model = Create_Gradient_Boosting_Classifier(X_train, y_train, ps)
Which SHAP version do you use?
Please check it:
print(shap.__version__)
Also, did you install SHAP via pip or conda? Which Python installation does your script actually use when you run it? I think that after such checks you will find out what is going on.
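For example, a quick diagnostic sketch:
import sys
import shap

print(shap.__version__)   # installed SHAP version
print(shap.__file__)      # which shap installation is being imported
print(sys.executable)     # which Python interpreter runs the script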
I'm using the show_prediction function in the eli5 package to understand how my XGBoost classifier arrived at a prediction. For some reason I seem to be getting a regression score instead of a probability for my model.
Below is a fully reproducible example with a public dataset.
from sklearn.datasets import load_breast_cancer
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from eli5 import show_prediction
# Load dataset
data = load_breast_cancer()
# Organize our data
label_names = data['target_names']
labels = data['target']
feature_names = data['feature_names']
features = data['data']
# Split the data
train, test, train_labels, test_labels = train_test_split(
    features,
    labels,
    test_size=0.33,
    random_state=42
)
# Define the model
xgb_model = XGBClassifier(
    n_jobs=16,
    eval_metric='auc'
)
# Train the model
xgb_model.fit(
    train,
    train_labels
)
show_prediction(xgb_model.get_booster(), test[0], show_feature_values=True, feature_names=feature_names)
This gives me the following result. Note the score of 3.7, which is definitely not a probability.
The official eli5 documentation correctly shows a probability though.
The missing probability seems to be related to my use of xgb_model.get_booster(). Looks like the official documentation doesn't use that and passes the model as-is instead, but when I do that I get TypeError: 'str' object is not callable, so that doesn't seem to be an option.
I'm also concerned that eli5 is not explaining the prediction by traversing the xgboost trees. It appears that the "score" I'm getting is actually just a sum of all the feature contributions, like I would expect if eli5 wasn't actually traversing the tree but fitting a linear model instead. Is that true? How can I also make eli5 traverse the tree?
Fixed my own problem. According to this GitHub issue, eli5 only supports an older version of XGBoost (<=0.6). I was using XGBoost version 0.80 and eli5 version 0.8.
Posting the solution from the issue:
import eli5
from xgboost import XGBClassifier, XGBRegressor
def _check_booster_args(xgb, is_regression=None):
    # type: (Any, bool) -> Tuple[Booster, bool]
    if isinstance(xgb, eli5.xgboost.Booster):  # patch (from "xgb, Booster")
        booster = xgb
    else:
        booster = xgb.get_booster()  # patch (from "xgb.booster()" where `booster` is now a string)
        _is_regression = isinstance(xgb, XGBRegressor)
        if is_regression is not None and is_regression != _is_regression:
            raise ValueError(
                'Inconsistent is_regression={} passed. '
                'You don\'t have to pass it when using scikit-learn API'
                .format(is_regression))
        is_regression = _is_regression
    return booster, is_regression

eli5.xgboost._check_booster_args = _check_booster_args
And then replacing the last line of my question's code snippet with:
show_prediction(xgb_model, test[0], show_feature_values=True, feature_names=feature_names)
fixed my problem.
I am using python 3.5 with tensorflow 0.11 and sklearn 0.18.
I wrote a simple example to calculate the cross-validation score on the iris data using TensorFlow. I used skflow as the wrapper.
import tensorflow.contrib.learn as skflow
from sklearn import datasets
from sklearn import cross_validation
iris=datasets.load_iris()
feature_columns = skflow.infer_real_valued_columns_from_input(iris.data)
classifier = skflow.DNNClassifier(hidden_units=[10, 10, 10], n_classes=3, feature_columns=feature_columns)
print(cross_validation.cross_val_score(classifier, iris.data, iris.target, cv=2, scoring = 'accuracy'))
But I got an error like below. It seems that skflow is not compatible with cross_val_score of sklearn.
TypeError: Cannot clone object '' (type ): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' methods.
Is there any other way to deal with this problem?
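One possible workaround is to skip cross_val_score and run the folds manually with KFold, creating a fresh classifier per fold so nothing needs to be cloned. This is just a sketch, assuming the classifier's fit and predict accept NumPy arrays directly:
import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score

def manual_cv_score(make_estimator, X, y, n_splits=2):
    # Cross-validate an estimator that scikit-learn cannot clone.
    scores = []
    for train_idx, test_idx in KFold(n_splits=n_splits).split(X):
        est = make_estimator()  # fresh model per fold instead of clone()
        est.fit(X[train_idx], y[train_idx])
        preds = est.predict(X[test_idx])
        scores.append(accuracy_score(y[test_idx], preds))
    return np.mean(scores)

# Usage with the classifier from the question:
# score = manual_cv_score(
#     lambda: skflow.DNNClassifier(hidden_units=[10, 10, 10], n_classes=3,
#                                  feature_columns=feature_columns),
#     iris.data, iris.target, n_splits=2)
# print(score)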