'XGBModel' object 'enable_categorical' - Attribute Error - python

I built a XGBClassifier model using Xgboost 1.4.2 version and saved in S3 in pickle format.
from xgboost import XGBClassifier
xgb_model = XGBClassifier()
xgb_model.fit(x_Traintfidf, y_Train)
xgb_predictions = xgb_model.predict(x_Testtfidf)
xgb_predictions = [round(value) for value in xgb_predictions]
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_Test.to_list(), xgb_predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))
# Save model to s3 as pickle file.
Next, I read back in the pickled model from s3 and when I try to do predictions, it throws the error:
AttributeError: 'XGBModel' object has no attribute
'enable_categorical'
I have a tf-idf transformed matrix, I am passing in to get predictions.
Any idea why I get the error above that when I unpickle the model and do predictions?

You might want to double check the xgboost version in your virtual env using pip list | grep xgboost to make sure its actually 1.4.2.

As mentioned by #bill-the-lizard, enable_categorical is new in version 1.5.0 of XGBoost, so you will correctly receive the error mentioned in your question at version 1.4.x

Related

Why does sklearn say log_loss is not supported for SGDclassifier while it does say so in the docs? [duplicate]

I used Decision Tree from sklearn, normally there is log_loss
classifier = DecisionTreeClassifier(random_state = 42,class_weight ='balanced' ,criterion='log_loss')
classifier.fit(X_train, y_train)
error :
KeyError: 'log_loss'
The log_loss option for the parameter criterion was added only in the latest scikit-learn version 1.1.2:
criterion{“gini”, “entropy”, “log_loss”}, default=”gini”
It is not there in either of the two previous ones, version 1.0.2 or version 0.24.2:
criterion{“gini”, “entropy”}, default=”gini”
The error suggests that you are using an older version; you can check your scikit-learn version with
import sklearn
print(sklearn.__version__)
So, you will need to upgrade scikit-learn to v1.1.2.
log_loss criterion is applicable for the case when we have 2 classes in our target column.
Otherwise, if we have more than 2 classes then we can use entropy as our criterion for keeping the same impurity measure.

Unable to open pickled Sagemaker XGBoost model

I'm trying to open a pickled XGBoost model I created in AWS Sagemaker to look at feature importances in the model. I'm trying to follow the answers in this post. However, I get an the error shown below. When I try to call Booster.save_model, I get an error saying 'Estimator' object has no attribute 'save_model'. How can I resolve this?
# Build initial model
sess = sagemaker.Session()
s3_input_train = sagemaker.s3_input(s3_data='s3://{}/{}/train/'.format(bucket, prefix), content_type='csv')
xgb_cont = get_image_uri(region, 'xgboost', repo_version='0.90-1')
xgb = sagemaker.estimator.Estimator(xgb_cont, role, train_instance_count=1, train_instance_type='ml.m4.4xlarge',
output_path='s3://{}/{}'.format(bucket, prefix), sagemaker_session=sess)
xgb.set_hyperparameters(eval_metric='rmse', objective='reg:squarederror', num_round=100)
ts = strftime("%Y-%m-%d-%H-%M-%S", gmtime())
xgb_name = 'xgb-initial-' + ts
xgb.set_hyperparameters(eta=0.1, alpha=0.5, max_depth=10)
xgb.fit({'train': s3_input_train}, job_name=xgb_name)
# Load model to get feature importances
model_path = 's3://{}/{}//output/model.tar.gz'.format(bucket, prefix, xgb_name)
fs = s3fs.S3FileSystem()
with fs.open(model_path, 'rb') as f:
with tarfile.open(fileobj=f, mode='r') as tar_f:
with tar_f.extractfile('xgboost-model') as extracted_f:
model = pickle.load(extracted_f)
XGBoostError: [19:16:42] /workspace/src/learner.cc:682: Check failed: header == serialisation_header_:
If you are loading a serialized model (like pickle in Python) generated by older
XGBoost, please export the model by calling `Booster.save_model` from that version
first, then load it back in current version. There's a simple script for helping
the process. See:
https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html
for reference to the script, and more details about differences between saving model and
serializing.
Which version of XGBoost are you using in the notebook? The model format has changed in XGBoost 1.0. See https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html. Short version: if you're using 1.0 in the notebook, you can't load a pickled model.
Here's a working example using XGBoost in script mode (which is much more flexible than the built in algo):
https://gitlab.com/juliensimon/dlnotebooks/-/blob/master/sagemaker/09-XGBoost-script-mode.ipynb
https://gitlab.com/juliensimon/dlnotebooks/-/blob/master/sagemaker/xgb.py

eli5 show_prediction not showing probability

I'm using the show_prediction function in the eli5 package to understand how my XGBoost classifier arrived at a prediction. For some reason I seem to be getting a regression score instead of a probability for my model.
Below is a fully reproducible example with a public dataset.
from sklearn.datasets import load_breast_cancer
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from eli5 import show_prediction
# Load dataset
data = load_breast_cancer()
# Organize our data
label_names = data['target_names']
labels = data['target']
feature_names = data['feature_names']
features = data['data']
# Split the data
train, test, train_labels, test_labels = train_test_split(
features,
labels,
test_size=0.33,
random_state=42
)
# Define the model
xgb_model = XGBClassifier(
n_jobs=16,
eval_metric='auc'
)
# Train the model
xgb_model.fit(
train,
train_labels
)
show_prediction(xgb_model.get_booster(), test[0], show_feature_values=True, feature_names=feature_names)
This gives me the following result. Note the score of 3.7, which is definitely not a probability.
The official eli5 documentation correctly shows a probability though.
The missing probability seems to be related to my use of xgb_model.get_booster(). Looks like the official documentation doesn't use that and passes the model as-is instead, but when I do that I get TypeError: 'str' object is not callable, so that doesn't seem to be an option.
I'm also concerned that eli5 is not explaining the prediction by traversing the xgboost trees. It appears that the "score" I'm getting is actually just a sum of all the feature contributions, like I would expect if eli5 wasn't actually traversing the tree but fitting a linear model instead. Is that true? How can I also make eli5 traverse the tree?
Fixed my own problem. According to this Github Issue eli5 only supports an older version of XGBoost (<=0.6). I was using XGBoost version 0.80 and eli5 version 0.8.
Posting the solution from the issue:
import eli5
from xgboost import XGBClassifier, XGBRegressor
def _check_booster_args(xgb, is_regression=None):
# type: (Any, bool) -> Tuple[Booster, bool]
if isinstance(xgb, eli5.xgboost.Booster): # patch (from "xgb, Booster")
booster = xgb
else:
booster = xgb.get_booster() # patch (from "xgb.booster()" where `booster` is now a string)
_is_regression = isinstance(xgb, XGBRegressor)
if is_regression is not None and is_regression != _is_regression:
raise ValueError(
'Inconsistent is_regression={} passed. '
'You don\'t have to pass it when using scikit-learn API'
.format(is_regression))
is_regression = _is_regression
return booster, is_regression
eli5.xgboost._check_booster_args = _check_booster_args
And then replacing the last line of my question's code snippet with:
show_prediction(xgb_model, test[0], show_feature_values=True, feature_names=feature_names)
fixed my problem.

cross_val_score fails with tensorflow(skflow)

I am using python 3.5 with tensorflow 0.11 and sklearn 0.18.
I wrote a simple example code to calculate the cross-validation score with iris data using tensorflow. I used the skflow as the wrapper.
import tensorflow.contrib.learn as skflow
from sklearn import datasets
from sklearn import cross_validation
iris=datasets.load_iris()
feature_columns = skflow.infer_real_valued_columns_from_input(iris.data)
classifier = skflow.DNNClassifier(hidden_units=[10, 10, 10], n_classes=3, feature_columns=feature_columns)
print(cross_validation.cross_val_score(classifier, iris.data, iris.target, cv=2, scoring = 'accuracy'))
But I got an error like below. It seems that skflow is not compatible with cross_val_score of sklearn.
TypeError: Cannot clone object '' (type ): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' methods.
Is there any other way to deal with this problem?

Python error in SVM classifier.predict()

I am getting the following error when i perform classification of new data with the following command in Python:
classifier.predict(new_data)
AttributeError: python 'SVC' object has no attribute _dual_coef_
In my laptop though, the command works fine! What's wrong?
I had this exact error
AttributeError: python 'SVC' object has no attribute _dual_coef_
with a model trained using scikit-learn version 0.15.2, when I tried to run it in scikit-learn version 0.16.1. I did solve it by re-training the model in the latest scikit-learn 0.16.1.
Make sure you are loading the right version of the package.
Have you loaded the model based on which you try to predict?
In this case it can be a version conflict, try to re-learn the model using the same sklearn version.
You can see a similar problem here: Sklearn error: 'SVR' object has no attribute '_impl'
I had the same problem,I use Sklearn version 0.23.02 but I was trying to run an archive trained with a version 0.18... and my error said: "'SVC' object has no attribute 'break_ties'", I just retrained the model with my version and fix the problem I generate other svc.pickle to run with the 0.23.02 version and replace the oldie.
"""
X = X_train
y = y_train
"""
X = X_test
y = y_test
# Instantiate and train the classifier
from sklearn.neighbors import KNeighborsClassifier
clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(X, y)
# Check the results using metrics
from sklearn import metrics
y_pred = clf.predict(X)
print(metrics.confusion_matrix(y_pred, y))

Categories

Resources