Is there any way I can use partial_fit() in a BagginClassifier() which contains multiple MLPClassifier()?
My problem is binary classification, something like this:
clf = MLPClassifier()
model = BaggingClassifier(base_estimator=clf)
model.partial_fit(x, y, classes=[0, 1])
It keeps me giving this error:
AttributeError: 'BaggingClassifier' object has no attribute 'partial_fit'
Seems like it isn't. The documentation of sklearn gave the following list of modules that support partial_fit:
sklearn.naive_bayes.MultinomialNB
sklearn.naive_bayes.BernoulliNB
sklearn.linear_model.Perceptron
sklearn.linear_model.SGDClassifier
sklearn.linear_model.PassiveAggressiveClassifier
sklearn.linear_model.SGDRegressor
sklearn.linear_model.PassiveAggressiveRegressor
sklearn.cluster.MiniBatchKMeans
sklearn.decomposition.MiniBatchDictionaryLearning
sklearn.cluster.MiniBatchKMeans
Related
I am trying to normalize my data at each step of the cross-validation and I came across this question
As suggested, I went to the scikit-learn documentation and found this example:
from sklearn.pipeline import make_pipeline
clf = make_pipeline(preprocessing.StandardScaler(), svm.SVC(C=1))
cross_val_score(clf, X, y, cv=cv)
This looks indeed like what I am trying to achieve, however, my intention is to use a z-scorer instead of the StandardScaler, so I tried this:
clf = make_pipeline(stats.zscore(), DecisionTreeClassifier())
But I get an error saying this:
TypeError: zscore() missing 1 required positional argument: 'a'
What should be the argument of zscore()?
Welcome to Stack Overflow! There are several ways of using custom functionality in sklearn pipelines — I think FunctionTransformer could fit your case.
Create a transformer that uses zscore and pass the transformer to make_pipeline instead of calling zscore directly.
I hope this helps!
I'm using sklearn linear implementation of SVM classifier LinearSVM.
I didn't use it directly but I wrap it with CalibratedClassifierCV to get the probabilities in the prediction time, like:
model = CalibratedClassifierCV(LinearSVC(random_state=0))
After fitting the model, I tried to get the coef_ to print the Top features, following this post Visualising Top Features in Linear SVM with Scikit Learn and Matplotlib, but this I got this error:
coef = classifier.coef_.ravel()
AttributeError: 'CalibratedClassifierCV' object has no attribute 'coef_'
How can I get the coef in the case I wrap the classifier with a calibrator?, I'm not totally interested in this way, thus if there is another way to get the features importance, it will be welcomed.
coef_ is not an attribute of CalibratedClassifierCV however, it is an attribute of the base_estimator which is a LinearSVC in your case. You can access your base estimator via the calibrated_classifiers_ which is a list of the fitted models (which depends on the number of models you fit based on your cv value). I have shown a sample code which you can refer to for your need.
from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV
from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import LinearSVC
iris = datasets.load_iris()
model = CalibratedClassifierCV(LinearSVC(random_state=0))
model.fit(iris.data, iris.target)
model.calibrated_classifiers_
[<sklearn.calibration._CalibratedClassifier at 0x7f15d0c57550>,
<sklearn.calibration._CalibratedClassifier at 0x7f15d0c57c18>,
<sklearn.calibration._CalibratedClassifier at 0x7f15d0aec080>]
In this case my cv is three so I have three models built, so I would simple loop through them and taken an average.
coef_avg = 0
for i in model.calibrated_classifiers_:
coef_avg = coef_avg + i.base_estimator.coef_
coef_avg = coef_avg/len(model.calibrated_classifiers_)
array([[ 0.16464871, 0.45680981, -0.77801375, -0.4170196 ],
[ 0.1238834 , -0.89117967, 0.35451826, -0.89231957],
[-0.83826029, -0.9237139 , 1.30772955, 1.67592916]])
Note: Starting from sklearn version 0.24, CalibratedClassifierCV constructor exposes an ensemble argument, that, if set to False (assuming cv is not set to "prefit"), makes CalibratedClassifierCV expose only one calibrated classifier trained using all training data. This means we no longer need to loop over all calibrated_classifiers_ at prediction time:
model = CalibratedClassifierCV(LinearSVC(random_state=0), ensemble=False)
model.fit(iris.data, iris.target)
model.calibrated_classifiers_
# Returns a list with one element, [<sklearn.calibration._CalibratedClassifier at 0x7f15d0c57550>]
(using an example above, given by Parthasarathy)
When using a decision tree classifier from scikit-learn, the docs show you reassigning the variable storing the classifier to the output of itself calling the fit() method:
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, Y)
However, now if I call the predict method:
clf.predict([[1,1]])
Pycharm warms me:
Unresolved attribute reference 'predict' for class 'object'
You can look up the declaration for fit() in Pycharm easily, and the method merely returns self, so the reassignment is not necessary and you can remove it so that I have instead:
clf = tree.DecisionTreeClassifier()
clf.fit(X, Y)
Everything runs smoothly both ways, but Pycharm doesn't give me a warning with the latter. I'm curious, because I'm fairly new to Python and Pycharm, why does it give me this warning? Is there a way to make this IDE recognize that the method returns self and therefore is still the same type with the same method predict()? Otherwise, is there any way to remove this warning?
You would want to do something like this instead to avoid the warning
km = KMeans(n_clusters=size)
km.fit(x)
predictions = km.predict(x)
For some reason, fit returns a object type. I am unsure if it's intentional. KMeans initializer returns the correct type and fit is in-place.
Can Label Propagation be used for semi-supervised regression tasks in scikit-learn?
According to its API, the answer is YES.
http://scikit-learn.org/stable/modules/label_propagation.html
However, I got the error message when I tried to run the following code.
from sklearn import datasets
from sklearn.semi_supervised import label_propagation
import numpy as np
rng=np.random.RandomState(0)
boston = datasets.load_boston()
X=boston.data
y=boston.target
y_30=np.copy(y)
y_30[rng.rand(len(y))<0.3]=-999
label_propagation.LabelSpreading().fit(X,y_30)
It shows that "ValueError: Unknown label type: 'continuous'" in the label_propagation.LabelSpreading().fit(X,y_30) line.
How should I solve the problem? Thanks a lot.
It looks like the error in the documentation, code itself clearly is classification only (beggining of the .fit call of the BasePropagation class):
check_classification_targets(y)
# actual graph construction (implementations should override this)
graph_matrix = self._build_graph()
# label construction
# construct a categorical distribution for classification only
classes = np.unique(y)
classes = (classes[classes != -1])
In theory you could remove the "check_classification_targets" call and use "regression like method", but it will not be the true regression since you will never "propagate" any value which is not encountered in the training set, you will simply treat the regression value as the class identifier. And you will be unable to use value "-1" since it is a codename for "unlabeled"...
The main aim is to add a deep learning classification method like CNN as an individual in ensemble in python.
The following code works fine:
clf1=CNN()
eclf1=VotingClassifier(estimators=[('lr', clf1)], voting='soft')
eclf1=eclf1.fit(XTrain,YTrain)
But, the error:
'NoneType' object has no attribute 'predict_proba'
comes up once running eclf1=eclf1.predict(XTest).
Just in case, The CNN consists of _fit_ function for training, and the following function:
def predict_proba(self,XTest):
#prediction=np.mean(np.argmax(teY, axis=1) == predict(teX))
teX=XTest.reshape(len(XTest),3,112,112)
p=predict(teX)
i = np.zeros((p.shape[0],p.max()+1))
for x,y in enumerate(p):
i[x,y] = 1
return i
Can you elaborate better what you did and which error you came across?
By your question only I can assume you tried to call 'predic_proba' after the line eclf1=eclf1.predict(XTest). And of course this will turn on an error because the eclf1.predict(XTest) returns an array, which doesn't have a predict() method.
Try just changing it to:
pred_results=eclf1.predict(XTest)
pred_result_probs = eclf1.predict_proba(XTest)