How can I combine XGBoost with AdaBoost? - python

I have combined random forest with AdaBoost as
clf = AdaBoostClassifier(n_estimators=10, base_estimator=RandomForestClassifier(n_estimators=10, max_depth=20))
Now I want to combine AdaBoost with XGBoost, and I have tried this:
from sklearn.ensemble import AdaBoostClassifier
from xgboost import XGBClassifier
clf = AdaBoostClassifier(base_estimator=XGBClassifier(eval_metric='mlogloss'))
but it does not work correctly. How can I do this?

You would just use it like this:
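A minimal sketch of the combination (assumptions: the make_classification data is only a placeholder for your own X and y, and in scikit-learn >= 1.2 the base_estimator argument is renamed to estimator):
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from xgboost import XGBClassifier

# placeholder data; substitute your own X and y
X, y = make_classification(n_samples=500, n_features=20, n_informative=4,
                           n_classes=3, random_state=0)

# XGBClassifier accepts sample_weight in fit, which is what AdaBoost requires
# of its base estimator
clf = AdaBoostClassifier(
    n_estimators=10,
    base_estimator=XGBClassifier(eval_metric='mlogloss'),
)
clf.fit(X, y)
print(clf.score(X, y))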

Related

Doing a ShuffleSplit with GridSearchCV

I'm trying to do a ShuffleSplit with a GridSearchCV in scikit-learn.
Here's my MWE, which is a modification of an example from the Deep Learning with Python book by François Chollet. In the book, he doesn't use scikit-learn.
from keras import models
from keras import layers
import numpy as np
from sklearn.model_selection import GridSearchCV
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import ShuffleSplit
from keras.datasets import boston_housing
(train_data,train_targets),(test_data,test_targets)=boston_housing.load_data()
mean=np.mean(train_data)
std=np.std(train_data)
train_data_norm=(train_data-mean)/std
test_data_norm=(test_data-mean)/std
def build_model():
    model = models.Sequential()
    model.add(layers.Dense(64, activation="relu",
                           input_shape=(train_data_norm.shape[1],)))
    model.add(layers.Dense(64, activation="relu"))
    model.add(layers.Dense(1))
    model.compile(optimizer='rmsprop', loss="mse", metrics=["mae"])
    return model
model=KerasRegressor(build_fn=build_model,epochs=30,verbose=0)
param_grid = {"epochs":range(1,11)}
ss = ShuffleSplit(n_splits=4, test_size=0.1, random_state=0)
grid_model=GridSearchCV(model,param_grid,cv=ss,n_jobs=-1,scoring='neg_mean_squared_error')
grid_model.fit(train_data, train_targets)
mean_squared_error(grid_model.predict(test_data),test_targets)
One thing I find strange is that, when using ShuffleSplit, I have to define the size of my test data again, even though the split is applied only to (train_data, train_targets) when fitting the model. Also, I thought that using ShuffleSplit would stabilise the MSE prediction performance compared to a simple CV, but the opposite happens. If I use
grid_model=GridSearchCV(model,param_grid,cv=4,n_jobs=-1,scoring='neg_mean_squared_error')
instead, then I'll have a smaller MSE when predicting for the test_data.
Am I coding correctly the use of a ShuffleSplit in GridSearchCV?
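A small illustration (not part of the original question) of what the test_size argument of ShuffleSplit actually controls: inside GridSearchCV it only sets the size of the random validation fold drawn from the training data on each of the n_splits iterations, and it is unrelated to the external test_data hold-out.
from sklearn.model_selection import ShuffleSplit
import numpy as np

ss = ShuffleSplit(n_splits=4, test_size=0.1, random_state=0)
for train_idx, val_idx in ss.split(np.arange(100)):
    # each iteration draws a fresh random 90/10 split of the training data
    print(len(train_idx), len(val_idx))  # prints "90 10" four times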

How to use KNeighborsClassifier in BaggingClassifier & How to solve "KNN doesn't support sample weights issue"

I am new to sklearn, and I am trying to combine KNN, Decision Tree, SVM, and Gaussian NB in a BaggingClassifier.
Part of my code looks like this:
best_KNN = KNeighborsClassifier(n_neighbors=5, p=1)
best_KNN.fit(X_train, y_train)
majority_voting = VotingClassifier(estimators=[('KNN', best_KNN), ('DT', best_DT), ('SVM', best_SVM), ('gaussian', gaussian_NB)], voting='hard')
majority_voting.fit(X_train, y_train)
bagging = BaggingClassifier(base_estimator=majority_voting)
bagging.fit(X_train, y_train)
But this causes an error saying:
TypeError: Underlying estimator KNeighborsClassifier does not support sample weights.
The "bagging" part worked fine if I remove KNN.
Does anyone have any idea to solve this issue? Thank you for your time.
In BaggingClassifier you can only use base estimators that support sample weights here, because the bagging bootstrap is implemented by passing a sample_weight parameter to the base estimator's fit method whenever that estimator advertises support for it. Your VotingClassifier advertises support, but then forwards the weights to KNeighborsClassifier, whose fit does not accept them, hence the error.
You can list all the classifiers whose fit accepts sample_weight like this:
import inspect
from sklearn.utils import all_estimators  # sklearn.utils.testing was removed in recent versions

for name, clf in all_estimators(type_filter='classifier'):
    if 'sample_weight' in inspect.signature(clf.fit).parameters:
        print(name)
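A possible workaround, not part of the original answer: wrap KNeighborsClassifier in a thin subclass whose fit signature accepts sample_weight so that the support check passes. Note that the weights are then silently ignored for the KNN member, which slightly changes the bagging semantics for that estimator; the class name KNNIgnoringSampleWeight is just an illustrative choice.
from sklearn.neighbors import KNeighborsClassifier

class KNNIgnoringSampleWeight(KNeighborsClassifier):
    # fit advertises sample_weight so sklearn's support check passes,
    # but KNN itself has no use for the weights, so they are dropped
    def fit(self, X, y, sample_weight=None):
        return super().fit(X, y)

best_KNN = KNNIgnoringSampleWeight(n_neighbors=5, p=1)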

How to use the imbalanced library with sklearn pipeline?

I am trying to solve a text classification problem and want to create a baseline model using MultinomialNB.
My data is highly imbalanced for a few categories, so I decided to use the imbalanced-learn library with an sklearn pipeline, following the tutorial.
The model fails with an error after I introduce the two sampling stages into the pipeline as suggested in the docs.
from imblearn.pipeline import make_pipeline as make_pipeline_imb
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from imblearn.under_sampling import (EditedNearestNeighbours,
RepeatedEditedNearestNeighbours)
# Create the samplers
enn = EditedNearestNeighbours()
renn = RepeatedEditedNearestNeighbours()
pipe = make_pipeline_imb([('vect', CountVectorizer(max_features=100000,\
ngram_range= (1, 2),tokenizer=tokenize_and_stem)),\
('tfidf', TfidfTransformer(use_idf= True)),\
('enn', EditedNearestNeighbours()),\
('renn', RepeatedEditedNearestNeighbours()),\
('clf-gnb', MultinomialNB()),])
Error:
TypeError: Last step of Pipeline should implement fit. '[('vect', CountVectorizer(analyzer='word', binary=False, decode_error='strict',
Can someone please help here? I am also open to a different approach (Boosting/SMOTE) as well.
It seems that make_pipeline from imblearn, like sklearn's make_pipeline, takes the estimators as positional arguments and does not accept a list of (name, estimator) tuples; the step names are generated automatically. From the imblearn documentation:
*steps : list of estimators.
You should modify your code to:
pipe = make_pipeline_imb( CountVectorizer(max_features=100000,\
ngram_range= (1, 2),tokenizer=tokenize_and_stem),\
TfidfTransformer(use_idf= True),\
EditedNearestNeighbours(),\
RepeatedEditedNearestNeighbours(),\
MultinomialNB())
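If you prefer to keep the explicit step names, a possible alternative (not from the original answer) is imblearn's Pipeline class, which accepts (name, estimator) tuples like sklearn's and also allows samplers as intermediate steps; the vectorizer, sampler, and classifier imports are the ones already used in the question, and tokenize_and_stem is assumed to be defined elsewhere in the question's code.
from imblearn.pipeline import Pipeline as ImbPipeline

pipe = ImbPipeline([
    ('vect', CountVectorizer(max_features=100000, ngram_range=(1, 2),
                             tokenizer=tokenize_and_stem)),
    ('tfidf', TfidfTransformer(use_idf=True)),
    ('enn', EditedNearestNeighbours()),
    ('renn', RepeatedEditedNearestNeighbours()),
    ('clf-gnb', MultinomialNB()),
])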

Using Scikit-Learn's pipelines to combine transformers and an estimator

I am trying to use Scikit-Learn's Pipeline to organize our transformers and estimator, and I am having trouble building a pipeline that combines one_hot_transformer with a LinearRegression() estimator. I cannot connect the following pieces:
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LinearRegression
cat_feats = np.array([[1,10],[2,20],[3,10],[4,20],[3,10],[2,20],[1,10]])
OneHotEncoder(sparse=False).fit_transform(cat_feats)
one_hot_transformer = OneHotEncoder(sparse=False).fit_transform(X,y)
from sklearn.pipeline import Pipeline
linear_est = Pipeline([one_hot_transformer], LinearRegression())
linear_est.fit(X,y)
predicted = linear_est.predict(X)
grader.score('intro_ml__linear_model', linear_est.predict)
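A minimal sketch of how these pieces are usually wired together (an assumption about the intent, using the small cat_feats array as a stand-in for X and invented placeholder targets): Pipeline expects a list of (name, estimator) tuples and fits the encoder itself, so the unfitted OneHotEncoder goes into the pipeline rather than the output of fit_transform.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LinearRegression
import numpy as np

X = np.array([[1, 10], [2, 20], [3, 10], [4, 20], [3, 10], [2, 20], [1, 10]])
y = np.array([1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0])  # placeholder targets

linear_est = Pipeline([
    # in scikit-learn >= 1.2 use sparse_output=False instead of sparse=False
    ('one_hot', OneHotEncoder(sparse=False, handle_unknown='ignore')),
    ('regressor', LinearRegression()),
])
linear_est.fit(X, y)
predicted = linear_est.predict(X)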

Include customized feature extraction methods in sklearn Pipeline

In sklearn, it is possible to define a pipeline in the following way:
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.decomposition import PCA
estimators = [('reduce_dim', PCA()), ('clf', SVC())]
pipe = Pipeline(estimators)
Is it also possible to include custom feature extraction methods like
extract_features(image, cspace='RGB',
pix_per_cell=128, cell_per_block=32,
hog_channel=10)
and how would I do that?
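One common approach (a sketch, not from the original question) is to wrap the function in a small transformer with fit/transform methods so it can sit in front of the other pipeline steps. Here extract_features is assumed to be the user's own function operating on a single image, and HogFeatureExtractor is just an illustrative name.
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

class HogFeatureExtractor(BaseEstimator, TransformerMixin):
    def __init__(self, cspace='RGB', pix_per_cell=128, cell_per_block=32, hog_channel=10):
        self.cspace = cspace
        self.pix_per_cell = pix_per_cell
        self.cell_per_block = cell_per_block
        self.hog_channel = hog_channel

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn

    def transform(self, X):
        # apply the user's extract_features to every image
        return np.asarray([
            extract_features(image, cspace=self.cspace,
                             pix_per_cell=self.pix_per_cell,
                             cell_per_block=self.cell_per_block,
                             hog_channel=self.hog_channel)
            for image in X
        ])

pipe = Pipeline([('features', HogFeatureExtractor()), ('clf', SVC())])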
