How to slice a XGBClassifier/XGBRegressor model into sub-models? - python

This document shows that a XGBoost API trained model can be sliced by following code:
from sklearn.datasets import make_classification
import xgboost as xgb
booster = xgb.train({
'num_parallel_tree': 4, 'subsample': 0.5, 'num_class': 3},
num_boost_round=num_boost_round, dtrain=dtrain)
sliced: xgb.Booster = booster[3:7]
I tried it and it worked.
Since XGBoost provides Scikit-Learn Wrapper interface, I tried something like this:
from xgboost import XGBClassifier
clf_xgb = XGBClassifier().fit(X_train, y_train)
clf_xgb_sliced: clf_xgb.Booster = booster[3:7]
But got following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-18-84155815d877> in <module>
----> 1 clf_xgb_sliced: clf_xgb.Booster = booster[3:7]
AttributeError: 'XGBClassifier' object has no attribute 'Booster'
Since XGBClassifier has no attribute 'Booster', is there any way to slice a Scikit-Learn Wrapper interface trained XGBClassifier(/XGBRegressor) model?

The problem is with the type hint you are giving clf_xgb.Booster which does not match an existing argument. Try:
clf_xgb_sliced: xgb.Booster = clf_xgb.get_booster()[3:7]
instead.

Related

AttributeError: 'KMeans' object has no attribute 'setK'

Example from https://runawayhorse001.github.io/LearningApacheSpark/clustering.html
caused strange error while I decided to test the clustering example for Spark.
Example:
from sklearn.cluster import KMeans
import numpy as np
cost = np.zeros(20)
for k in range(2,20):
kmeans = KMeans()\
.setK(k)\
.setSeed(1) \
.setFeaturesCol("indexedFeatures")\
.setPredictionCol("cluster")
model = kmeans.fit(data)
cost[k] = model.computeCost(data)
And it caused Error in Kmeans attributes despite of fit already implemented.
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-22-296a7d54514a> in <module>
2 cost = np.zeros(20)
3 for k in range(2,20):
----> 4 kmeans = KMeans()\
5 .setK(k)\
6 .setSeed(1) \
AttributeError: 'KMeans' object has no attribute 'setK'
I had similar issues in the past and .fit() solved them, but now it is not working.
You're importing the wrong KMeans. I believe that KMeans refer to the one in Spark ML, not in scikit-learn.
from pyspark.ml.clustering import KMeans

PySpark logistic regression missing method

I am new to Pyspark. I am using logistic regression API. I followed some tutorials and worked this way :
from pyspark.ml.classification import LogisticRegression
train, test = df.randomSplit([0.80, 0.20], seed = some_seed)
LR = LogisticRegression(featuresCol = 'features', labelCol = 'label', maxIter=some_iter)
LR_model = LR.fit(train)
When I call
trainingSummary = LR_model.summary
trainingSummary.roc
I get
--------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-319-bf79768ab64e> in <module>()
1 trainingSummary = LR_model.summary
2
----> 3 trainingSummary.roc
AttributeError: 'LogisticRegressionTrainingSummary' object has no attribute 'roc'
Someone has an idea ?

Turi Create Error : 'module' object not callable

I am trying to implement nearest neighbor classifier in Turi Create, however I am unsure of this error I am getting. This error occurs when I create the actual model. I am using python 3.6 if that helps.
Error:
Traceback (most recent call last):
File "/Users/PycharmProjects/turi/turi.py", line 51, in <module>
iris_cross()
File "/Users/PycharmProjects/turi/turi.py", line 37, in iris_cross
clf = tc.nearest_neighbor_classifier(train_data, target='4', features=features)
TypeError: 'module' object is not callable
Code:
import turicreate as tc
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn import datasets
import time
import numpy as np
#Iris Classification Cross Validation
def iris_cross():
iris = datasets.load_iris()
features = ['0','1','2','3']
target = iris.target_names
x = iris.data
y = iris.target.astype(int)
undata = np.column_stack((x,y))
data = tc.SFrame(pd.DataFrame(undata))
print(data)
train_data, test_data = data.random_split(.8)
clf = tc.nearest_neighbor_classifier(train_data, target='4', features=features)
print('done')
iris_cross()
You have to actually call the create() method of the nearest_neighbor_classifier. See the library API.
Run the following line of code instead:
clf = tc.nearest_neighbor_classifier.create(train_data, target='4', features=features)

coremltools cannot convert XGBoost classifier into coreML model

I have a question about coremltools.
I want to convert trained xgboost classifier model into coreML Model.
import coremltools
import xgboost as xgb
X, y = get_data()
xgb_model = xgb.XGBClassifier()
xib_model.train(X, y)
coreml_model = coremltools.converters.xgboost.convert(xgb_model)
coremltools.save('my_model.mlmodel')
Error is as follows:
>>> coremltools.converters.xgboost.convert(xgb_model)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/karas/.pyenv/versions/anaconda2-4.3.0/lib/python2.7/site-packages/coremltools/converters/xgboost/_tree.py", line 51, in convert
return _MLModel(_convert_tree_ensemble(model, feature_names, target, force_32bit_float = force_32bit_float))
File "/Users/karas/.pyenv/versions/anaconda2-4.3.0/lib/python2.7/site-packages/coremltools/converters/xgboost/_tree_ensemble.py", line 143, in convert_tree_ensemble
raise TypeError("Unexpected type. Expecting XGBoost model.")
TypeError: Unexpected type. Expecting XGBoost model.
Quick solution:
coreml_model = coremltools.converters.xgboost.convert(xgb_model._Booster)
More about this converter:
I just encountered this problem so I debug into _tree_ensemble.py and here is what I found:
The first parameter 'model' should be _xgboost.core.Booster or _xgboost.XGBRegressor or the path of .json file of the previous two data. Also, if you use the .json file, the second parameter feature_names must be provided.
Plus, according to the python examples on github, there is another way you can get a model:
import numpy as np
import scipy.sparse
import pickle
import xgboost as xgb
### simple example
# load file from text file, also binary buffer generated by xgboost
dtrain = xgb.DMatrix('../data/agaricus.txt.train')
dtest = xgb.DMatrix('../data/agaricus.txt.test')
# specify parameters via map, definition are same as c++ version
param = {'max_depth':2, 'eta':1, 'silent':1, 'objective':'binary:logistic'}
# specify validations set to watch performance
watchlist = [(dtest, 'eval'), (dtrain, 'train')]
num_round = 2
booster = xgb.train(param, dtrain, num_round, watchlist)
Note the booster here is _xgboost.core.Booster.
Then you can do
import coremltools
coreml_model = coremltools.converters.xgboost.convert(booster)
coreml_model.save('my_model.mlmodel')

AttributeError: LinearRegression object has no attribute 'coef_'

I've been attempting to fit this data by a Linear Regression, following a tutorial on bigdataexaminer. Everything was working fine up until this point. I imported LinearRegression from sklearn, and printed the number of coefficients just fine. This was the code before I attempted to grab the coefficients from the console.
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import sklearn
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression
boston = load_boston()
bos = pd.DataFrame(boston.data)
bos.columns = boston.feature_names
bos['PRICE'] = boston.target
X = bos.drop('PRICE', axis = 1)
lm = LinearRegression()
After I had all this set up I ran the following command, and it returned the proper output:
In [68]: print('Number of coefficients:', len(lm.coef_)
Number of coefficients: 13
However, now if I ever try to print this same line again, or use 'lm.coef_', it tells me coef_ isn't an attribute of LinearRegression, right after I JUST used it successfully, and I didn't touch any of the code before I tried it again.
In [70]: print('Number of coefficients:', len(lm.coef_))
Traceback (most recent call last):
File "<ipython-input-70-5ad192630df3>", line 1, in <module>
print('Number of coefficients:', len(lm.coef_))
AttributeError: 'LinearRegression' object has no attribute 'coef_'
The coef_ attribute is created when the fit() method is called. Before that, it will be undefined:
>>> import numpy as np
>>> import pandas as pd
>>> from sklearn.datasets import load_boston
>>> from sklearn.linear_model import LinearRegression
>>> boston = load_boston()
>>> lm = LinearRegression()
>>> lm.coef_
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-22-975676802622> in <module>()
7
8 lm = LinearRegression()
----> 9 lm.coef_
AttributeError: 'LinearRegression' object has no attribute 'coef_'
If we call fit(), the coefficients will be defined:
>>> lm.fit(boston.data, boston.target)
>>> lm.coef_
array([ -1.07170557e-01, 4.63952195e-02, 2.08602395e-02,
2.68856140e+00, -1.77957587e+01, 3.80475246e+00,
7.51061703e-04, -1.47575880e+00, 3.05655038e-01,
-1.23293463e-02, -9.53463555e-01, 9.39251272e-03,
-5.25466633e-01])
My guess is that somehow you forgot to call fit() when you ran the problematic line.
I also got the same problem while dealing with linear regression the problem object has no attribute 'coef'.
There are just slight changes in the syntax only.
linreg = LinearRegression()
linreg.fit(X,y) # fit the linesr model to the data
print(linreg.intercept_)
print(linreg.coef_)
I Hope this will help you Thanks

Categories

Resources