I'm looking for a good implementation of logistic regression (not regularized) in Python. I'm looking for a package that also supports weights for each sample. Can anyone suggest a good implementation/package?
Thanks!
I notice that this question is quite old now, but hopefully this can help someone. With sklearn, you can use the SGDClassifier class to create a logistic regression model by simply passing in 'log' as the loss (renamed 'log_loss' in newer scikit-learn versions):
sklearn.linear_model.SGDClassifier(loss='log', ...).
This class implements weighted samples in the fit() function:
classifier.fit(X, Y, sample_weight=weights)
where weights is an array containing the sample weights; it must (obviously) have the same length as the number of data points in X.
See http://scikit-learn.org/dev/modules/generated/sklearn.linear_model.SGDClassifier.html for full documentation.
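A minimal sketch, assuming synthetic placeholder data (the names X, y, and weights are illustrative, not from the original question):

import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)
X = rng.randn(100, 5)                       # 100 samples, 5 features
y = (X[:, 0] + 0.5 * rng.randn(100) > 0).astype(int)
weights = rng.uniform(0.5, 2.0, size=100)   # one weight per sample

# loss='log_loss' on scikit-learn >= 1.1; use loss='log' on older versions
clf = SGDClassifier(loss='log_loss')
clf.fit(X, y, sample_weight=weights)
print(clf.predict_proba(X[:3]))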
The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y))
from sklearn.linear_model import LogisticRegression

# 'balanced' reweights samples inversely to class frequency
model = LogisticRegression(class_weight='balanced')
model = model.fit(X, y)
EDIT
Sample weights can be added in the fit method. You just have to pass an array of length n_samples. Check out the documentation:
http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression.fit
Hope this does it...
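A short sketch of that, reusing the placeholder X, y, and weights from the earlier snippet:

from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X, y, sample_weight=weights)  # weights has length n_samples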
I think what you want is statsmodels. It has great support for GLM and other linear methods. If you're coming from R, you'll find the syntax very familiar.
statsmodels weighted regression
getting started w/ statsmodels
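A hedged sketch of the statsmodels route, reusing the placeholder X, y, and weights from above; a GLM with a Binomial family is logistic regression, and freq_weights is one way to supply per-observation weights (statsmodels also offers var_weights):

import statsmodels.api as sm

X_const = sm.add_constant(X)  # statsmodels does not add an intercept by default
glm = sm.GLM(y, X_const, family=sm.families.Binomial(), freq_weights=weights)
result = glm.fit()
print(result.summary())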
Have a look at the scikit-learn (formerly scikits.learn) logistic regression implementation.
Do you know NumPy? If not, take a look at SciPy and matplotlib as well.
Related
I'm trying to understand the difference between RidgeClassifier and LogisticRegression in sklearn.linear_model. I couldn't find it in the documentation.
I think I understand quite well what LogisticRegression does. It computes the coefficients and intercept that minimise half the sum of squares of the coefficients plus C times the binary cross-entropy loss, where C is the regularisation parameter. I checked against a naive from-scratch implementation, and the results coincide.
Results of RidgeClassifier differ, and I couldn't figure out how the coefficients and intercept are computed there. Looking at the GitHub code, I'm not experienced enough to untangle it.
The reason I'm asking is that I like the RidgeClassifier results -- it generalises a bit better to my problem. But before I use it, I would like to at least have an idea of where it comes from.
Thanks for any help.
RidgeClassifier() works differently compared to LogisticRegression() with l2 penalty. The loss function for RidgeClassifier() is not cross entropy.
RidgeClassifier() uses Ridge() regression model in the following way to create a classifier:
Let us consider binary classification for simplicity.
Convert the target variable into +1 or -1 based on the class it belongs to.
Build a Ridge() model (which is a regression model) to predict our target variable. The loss function is MSE + l2 penalty
If the Ridge() regression's predicted value (obtained via decision_function()) is greater than 0, predict the positive class; otherwise, the negative class. (A sketch of this binary procedure follows below.)
For multi-class classification:
Use LabelBinarizer() to create a multi-output regression scenario, and then train independent Ridge() regression models, one for each class (one-vs-rest modelling).
Get a prediction from each class's Ridge() regression model (a real number per class) and then use argmax to predict the class.
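A minimal sketch of the binary procedure, assuming synthetic data and the default alpha: fitting Ridge() on targets mapped to +1/-1 and thresholding at 0 should reproduce RidgeClassifier's predictions.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import Ridge, RidgeClassifier

X, y = make_classification(n_samples=200, random_state=0)

# map {0, 1} labels to {-1.0, +1.0} and fit a plain regression
y_pm = np.where(y == 1, 1.0, -1.0)
reg = Ridge(alpha=1.0).fit(X, y_pm)
manual_pred = (reg.predict(X) > 0).astype(int)  # threshold at 0

clf = RidgeClassifier(alpha=1.0).fit(X, y)
print(np.array_equal(manual_pred, clf.predict(X)))  # expected: True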
I want to have the formula of the model in order to use it in other languages/projects. Is there a way to export the formula from the model?
I will use sklearn's linear regression model.
What I want to do eventually: given a formula f() and a data set d, I will have JavaScript code that gives me predictions on d based on f().
The formula can be described essentially by the learned coefficients. The coefficients can be obtained using the attributes coef_ and intercept_. The dot product between the coefficients and the input vector plus the intercept gives the output of the model.
The actual code that implements this "formula" in scikit-learn is something like:
return safe_sparse_dot(X, self.coef_.T, dense_output=True) + self.intercept_
which should not be too difficult for you to port over to your other project.
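As a hedged sketch with placeholder data: the exported "formula" is just the coefficient vector and the intercept, and a manual dot product reproduces predict():

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.random.randn(50, 3)
y = X @ [1.5, -2.0, 0.5] + 0.3                # placeholder data

model = LinearRegression().fit(X, y)
manual = X @ model.coef_ + model.intercept_   # the "formula"
assert np.allclose(manual, model.predict(X))

# these two arrays are all another language (e.g. JavaScript) needs
print(model.coef_, model.intercept_)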
I use a linear SVM from scikit-learn (LinearSVC) for a binary classification problem. I understand that LinearSVC can give me the predicted labels and the decision scores, but I want probability estimates (confidence in the label). I want to continue using LinearSVC because of speed (compared to sklearn.svm.SVC with a linear kernel). Is it reasonable to use a logistic function to convert the decision scores to probabilities?
import sklearn.svm as suppmach
# Fit model:
svmmodel = suppmach.LinearSVC(penalty='l1', dual=False, C=1)  # l1 penalty requires dual=False
svmmodel.fit(x_train, y_train)
predicted_test = svmmodel.predict(x_test)
predicted_test_scores = svmmodel.decision_function(x_test)
I want to check if it makes sense to obtain probability estimates simply as 1 / (1 + exp(-x)), where x is the decision score.
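In code, that proposed (naive, uncalibrated) transform would be:

import numpy as np

naive_proba = 1.0 / (1.0 + np.exp(-predicted_test_scores))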
Alternatively, are there other classifier options I can use to do this efficiently?
Thanks.
scikit-learn provides CalibratedClassifierCV, which can be used to solve this problem: it lets you add probability output to LinearSVC or any other classifier that implements the decision_function method:
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV

svm = LinearSVC()
clf = CalibratedClassifierCV(svm)
clf.fit(X_train, y_train)
y_proba = clf.predict_proba(X_test)
The user guide has a nice section on that. By default, CalibratedClassifierCV + LinearSVC will get you Platt scaling, but it also provides other options (the isotonic regression method), and it is not limited to SVM classifiers.
I took a look at the APIs in the sklearn.svm.* family. All the models below, e.g.,
sklearn.svm.SVC
sklearn.svm.NuSVC
sklearn.svm.SVR
sklearn.svm.NuSVR
have a common interface that supplies a
probability: boolean, optional (default=False)
parameter to the model. If this parameter is set to True, libsvm will train a probability transformation model on top of the SVM's outputs, based on the idea of Platt scaling. The form of the transformation is similar to a logistic function, as you pointed out, but two specific constants A and B are learned in a post-processing step. Also see this stackoverflow post for more details.
I actually don't know why this post-processing is not available for LinearSVC. Otherwise, you would just call predict_proba(X) to get the probability estimate.
Of course, if you just apply a naive logistic transform, it will not perform as well as a calibrated approach like Platt scaling. If you understand the underlying algorithm of Platt scaling, you can probably write your own or contribute to the scikit-learn SVM family. :) Also feel free to use the above four SVM variations that support predict_proba.
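For instance, a hedged sketch with the first of those variants (X_train, y_train, and X_test are placeholder names):

from sklearn.svm import SVC

model = SVC(kernel='linear', probability=True)  # also trains the Platt transform
model.fit(X_train, y_train)
proba = model.predict_proba(X_test)             # calibrated probabilities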
If you want speed, then just replace the SVM with sklearn.linear_model.LogisticRegression. That uses the exact same training algorithm as LinearSVC, but with log-loss instead of hinge loss.
Using 1 / (1 + exp(-x)) will produce probabilities in a formal sense (numbers between zero and one), but they won't adhere to any justifiable probability model.
If what you really want is a measure of confidence rather than actual probabilities, you can use the method LinearSVC.decision_function(). See the documentation.
Just as an extension for binary classification with SVMs: you could also take a look at SGDClassifier, which performs gradient descent on a linear SVM by default. When using the modified Huber loss, it estimates the binary probabilities as
(clip(decision_function(X), -1, 1) + 1) / 2
An example would look like:
from sklearn.linear_model import SGDClassifier
svm = SGDClassifier(loss="modified_huber")
svm.fit(X_train, y_train)
proba = svm.predict_proba(X_test)
I've been attempting to use weighted samples in scikit-learn while training a random forest classifier. It works well when I pass sample weights to the classifier directly, e.g. RandomForestClassifier().fit(X, y, sample_weight=weights), but when I tried a grid search to find better hyperparameters for the classifier, I hit a wall:
To pass the weights when using the grid parameter, the usage is:
grid_search = GridSearchCV(RandomForestClassifier(), params, n_jobs=-1,
                           fit_params={"sample_weight": weights})
The problem is that the cross-validator isn't aware of the sample weights and so doesn't resample them together with the actual data, so calling grid_search.fit(X, y) fails: the cross-validator creates subsets of X and y, sub_X and sub_y, and eventually the classifier is called with classifier.fit(sub_X, sub_y, sample_weight=weights), but weights hasn't been resampled, so an exception is thrown.
For now I've worked around the issue by over-sampling high-weight samples before training the classifier, but it's a temporary work-around. Any suggestions on how to proceed?
I have too little reputation to comment on #xenocyon's answer. I'm using sklearn 0.18.1 and I'm also using a pipeline in the code. The solution that worked for me was:
fit_params={'classifier__sample_weight': w}, where w is the weight vector and classifier is the step name in the pipeline.
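A hedged sketch of that pipeline variant (the step names and composition are assumptions for illustration; on recent scikit-learn versions, fit parameters are passed to fit() rather than the GridSearchCV constructor):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline([('scaler', StandardScaler()),
                 ('classifier', RandomForestClassifier())])
params = {'classifier__n_estimators': [50, 100]}

grid = GridSearchCV(pipe, params)
# the 'classifier__' prefix routes the weights to the right step
grid.fit(X, y, classifier__sample_weight=w)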
Edit: the scores I see from the below don't seem quite right. This is possibly because, as mentioned above, even when weights are used in fitting they might not be getting used in scoring.
It appears that this has been fixed now. I am running sklearn version 0.15.2. My code looks something like this:
from sklearn.linear_model import SGDRegressor
from sklearn.grid_search import GridSearchCV  # sklearn.model_selection in newer versions

model = SGDRegressor()
parameters = {'alpha': [0.01, 0.001, 0.0001]}
cv = GridSearchCV(model, parameters, fit_params={'sample_weight': weights})
cv.fit(X, y)
Hope that helps (you and others who see this post).
I would suggest writing your own cross-validated parameter selection, as it is just 10-15 lines of code in Python (especially using the KFold object from scikit-learn), whereas oversampling is possibly a big bottleneck. For instance:
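A hedged sketch of such a loop, assuming X, y, and weights are numpy arrays:

import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

best_alpha, best_score = None, np.inf
for alpha in [0.01, 0.001, 0.0001]:
    fold_scores = []
    for train_idx, test_idx in KFold(n_splits=5).split(X):
        model = SGDRegressor(alpha=alpha)
        # weights are resampled alongside the data, which is the whole point
        model.fit(X[train_idx], y[train_idx], sample_weight=weights[train_idx])
        pred = model.predict(X[test_idx])
        fold_scores.append(mean_squared_error(y[test_idx], pred,
                                              sample_weight=weights[test_idx]))
    if np.mean(fold_scores) < best_score:
        best_score, best_alpha = np.mean(fold_scores), alpha
print(best_alpha)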
I've been trying to find this information and couldn't find any help.
What I want to do is get a float number as output from sklearn's SVM, to use as input for a sub-classifier.
Is it possible to get output from the SVM like 0.89898 instead of 1, indicating that a sample is more likely to belong to class 1?
Thank you
Platt scaling can help achieve what you want. It fits a logistic sigmoid curve on top of the SVM's output in a post-hoc fashion.
To do this in sklearn, you'll need to fit your SVM with the probability parameter set to True. Then you can use the fitted model's predict_proba() method to get floating-point output. More documentation can be found here. You'll also find related discussion in this thread.
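For intuition, here is a rough sketch of the Platt-scaling idea itself, assuming placeholder train/test arrays: fit a logistic regression on the SVM's decision scores. (A proper implementation calibrates on held-out data, which is what CalibratedClassifierCV, mentioned earlier, does for you.)

from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

svm = LinearSVC().fit(X_train, y_train)
train_scores = svm.decision_function(X_train).reshape(-1, 1)

# a sigmoid fitted on top of the decision scores
platt = LogisticRegression().fit(train_scores, y_train)
float_output = platt.predict_proba(
    svm.decision_function(X_test).reshape(-1, 1))[:, 1]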