l1 regularization support for Multinomial Logistic Regression - python

The current sklearn LogisticRegression supports the multinomial setting, but only allows l2 regularization, since the l-bfgs-b and newton-cg solvers only support that penalty. Andrew Ng has a paper that discusses why l1 regularization cannot be used directly with l-bfgs-b (the l1 penalty is not differentiable at zero).
If I were to use sklearn's SGDClassifier with log loss and l1 penalty, would that be the same as multinomial logistic regression with l1 regularization minimized by stochastic gradient descent? If not, are there any open source python packages that support l1 regularized loss for multinomial logistic regression?

According to the SGD documentation:
For multi-class classification, a “one versus all” approach is used.
So SGDClassifier with log loss will not give you a true multinomial logistic regression either.
You can use statsmodels.discrete.discrete_model.MNLogit, which has a fit_regularized method that supports L1 regularization.
The example below is modified from this example:
import numpy as np
import statsmodels.api as sm
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split  # sklearn.cross_validation has been removed in newer versions
iris = load_iris()
X = iris.data
y = iris.target
X = sm.add_constant(X, prepend=False)  # an intercept is not included by default and should be added by the user
X_train, X_test, y_train, y_test = train_test_split(X, y)
mlogit_mod = sm.MNLogit(y_train, X_train)
alpha = 1 * np.ones((mlogit_mod.K, mlogit_mod.J - 1))  # alpha should be a scalar or have the same shape as results.params
alpha[-1, :] = 0 # Choose not to regularize the constant
mlogit_l1_res = mlogit_mod.fit_regularized(method='l1', alpha=alpha)
y_pred = np.argmax(mlogit_l1_res.predict(X_test), 1)
Admittedly, the interface of this library is not as easy to use as scikit-learn's, but it provides more advanced statistical functionality.
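For instance (a small follow-on to the snippet above), you can inspect which coefficients the l1 penalty drove to exactly zero:
# mlogit_l1_res.params has one column of coefficients per non-reference class
print(mlogit_l1_res.params)
print(np.sum(np.abs(np.asarray(mlogit_l1_res.params)) < 1e-8), "coefficients are (near) zero")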

The lightning package (sklearn-contrib-lightning) has support for multinomial logistic regression trained with SGD and l1 regularization.
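A minimal sketch of how that might look; note that the constructor arguments (loss='log', penalty='l1', multiclass=True) are my reading of lightning's documentation and may differ between versions:
from sklearn.datasets import load_iris
from lightning.classification import SGDClassifier  # pip install sklearn-contrib-lightning
X, y = load_iris(return_X_y=True)
# multiclass=True requests a true multinomial (softmax) model rather than one-vs-rest
clf = SGDClassifier(loss="log", penalty="l1", multiclass=True, alpha=0.01)
clf.fit(X, y)
print(clf.score(X, y))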

Related

Adjust threshold in cross_val_score sklearn for Lasso

I saw this post that trained an LGBM model, but I would like to know how to adapt it for Lasso. I know the prediction may not necessarily be between 0 and 1, but I would like to try this model. I have tried this but it doesn't work:
import numpy as np
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score
X, y = make_classification(n_features=10, random_state=0, n_classes=2, n_samples=1000, n_informative=8)
class Lasso(Lasso):
    def predict(self, X, threshold=0.5):
        result = super(Lasso, self).predict_proba(X)
        predictions = [1 if p > threshold else 0 for p in result[:, 0]]
        return predictions
clf = Lasso(alpha=0.05)
clf.fit(X, y)
precision = cross_val_score(Lasso(), X, y, cv=5, scoring='precision')
I get
UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan
The specific model class you chose (Lasso()) is actually used for regression problems, as it minimizes penalized square loss, which is not appropriate in your case. Instead, you can use LogisticRegression() with L1 penalty to optimize a logistic function with a Lasso penalty. To control the regularization strength, the C= parameter is used (see docs).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
X, y = make_classification(
n_features=10, random_state=0, n_classes=2, n_samples=1000, n_informative=8
)
class LassoLR(LogisticRegression):
    def predict(self, X, threshold=0.5):
        result = super(LassoLR, self).predict_proba(X)
        # result[:, 0] is the predicted probability of class 0
        predictions = [1 if p > threshold else 0 for p in result[:, 0]]
        return predictions
clf = LassoLR(penalty='l1', solver='liblinear', C=1.)
clf.fit(X, y)
precision = cross_val_score(LassoLR(), X, y, cv=5, scoring='precision')
print(precision)
# array([0.04166667, 0.08163265, 0.1010101 , 0.125 , 0.05940594])

Choosing best model regarding to k-fold cross validation

I want to take the iris data and choose the best logistic regression model based on the GridSearchCV function.
My work so far
import numpy as np
from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression
iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target
# Logistic regression
reg_log = LogisticRegression()
# Penalties
pen = ['l1', 'l2','none']
# Regularization strength (100 values from 1e-10 up to 1e10)
C = np.logspace(-10, 10, 100)
# Possibilities for those parameters
parameters= dict(C=C, penalty=pen)
# choosing best model based on 5-fold cross validation
Model = GridSearchCV(reg_log, parameters, cv=5)
# Fitting best model
Best_model = Model.fit(X, y)
And I get a lot of errors. Do you know what I am doing wrong?
Since you are searching over different regularization types, see the note on the help page:
The ‘newton-cg’, ‘sag’, and ‘lbfgs’ solvers support only L2
regularization with primal formulation, or no regularization. The
‘liblinear’ solver supports both L1 and L2 regularization, with a dual
formulation only for the L2 penalty. The Elastic-Net regularization is
only supported by the ‘saga’ solver.
I am not quite sure whether you really want to grid-search penalty='none' together with regularization strengths. If you use saga and increase the number of iterations:
import pandas as pd
reg_log = LogisticRegression(solver="saga", max_iter=1000)
pen = ['l1', 'l2']
C = [0.1,0.001]
parameters= dict(C=C, penalty=pen)
Model = GridSearchCV(reg_log, parameters, cv=5)
Best_model = Model.fit(X, y)
res = pd.DataFrame(Best_model.cv_results_)
res[['param_C','param_penalty','mean_test_score']]
param_C param_penalty mean_test_score
0 0.1 l1 0.753333
1 0.1 l2 0.833333
2 0.001 l1 0.333333
3 0.001 l2 0.700000
It works reasonably well. If you still get errors with particular penalization values, look at them and make sure they are not extreme.
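If you do want to include the unpenalized model in the search, one option (a sketch, not part of the original answer) is to pass a list of parameter grids so that C is only varied when a penalty is actually applied; on recent scikit-learn versions use penalty=None instead of 'none':
parameters = [
    {'penalty': ['l1', 'l2'], 'C': np.logspace(-3, 3, 7)},
    {'penalty': ['none']},
]
Model = GridSearchCV(reg_log, parameters, cv=5)
Best_model = Model.fit(X, y)
print(Best_model.best_params_)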

Scikit-Learn vs Keras (Tensorflow) for multinomial logistic regression

I am trying a simple multinomial logistic regression using Keras, but the results are quite different from the standard scikit-learn approach.
For example with iris data:
import numpy as np
import pandas as pd
df = pd.read_csv("./data/iris.data", header=None)
from sklearn.model_selection import train_test_split
df_train, df_test = train_test_split(df, test_size=0.3, random_state=52)
X_train = df_train.drop(4, axis=1)
y_train = df_train[4]
X_test = df_test.drop(4, axis=1)
y_test = df_test[4]
Using scikit-learn:
from sklearn.linear_model import LogisticRegression
scikit_model = LogisticRegression(multi_class='multinomial', solver ='saga', max_iter=500)
scikit_model.fit(X_train, y_train)
the average weighted f1-score on test set:
y_test_pred = scikit_model.predict(X_test)
from sklearn.metrics import classification_report
print(classification_report(y_test, y_test_pred, labels=scikit_model.classes_))
is 0.96.
Then with Keras:
from sklearn.preprocessing import LabelEncoder
from keras.utils import np_utils
# first we have to encode class values as integers
encoder = LabelEncoder()
encoder.fit(y_train)
y_train_encoded = encoder.transform(y_train)
Y_train = np_utils.to_categorical(y_train_encoded)
y_test_encoded = encoder.transform(y_test)
Y_test = np_utils.to_categorical(y_test_encoded)
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from keras.regularizers import l2
#model construction
input_dim = 4 # 4 variables
output_dim = 3 # 3 possible outputs
def classification_model():
    model = Sequential()
    model.add(Dense(output_dim, input_dim=input_dim, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
    return model
#training
keras_model = classification_model()
keras_model.fit(X_train, Y_train, epochs=500, verbose=0)
the average weighted f1-score on test set:
classes = np.argmax(keras_model.predict(X_test), axis = 1)
y_test_pred = encoder.inverse_transform(classes)
from sklearn.metrics import classification_report
print(classification_report(y_test, y_test_pred, labels=encoder.classes_))
is 0.89.
Is it possible to perform identical (or at least as much as possible) logistic regression with Keras as with scikit-learn?
I tried to run your examples and noticed a couple of potential sources of the difference:
The test set is incredibly small, only 45 instances. This means that to get from accuracy of .89 to .96, the model only needs to predict just three more instances correctly. Due to randomness in training, your Keras results can oscillate quite a bit.
As explained by @meowongac (https://stackoverflow.com/a/59643522/1467943), you are using a different optimizer. One point is that scikit-learn's solver sets its step size automatically. For SGD in Keras, tweaking the learning rate and/or the number of epochs could lead to improvements.
scikit-learn also applies L2 regularization by default, silently.
Using your code, I was able to get accuracy ranging from .89 to .96 by running SGD with the learning rate set to .05. When switching to Adam (also with this fairly high learning rate), I got more stable results, ranging from .92 to .96 (though this is only an impression, as I did not run many trials).
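As a sketch of those last two points (the regularization strength and learning rate below are arbitrary, untuned choices, and matching scikit-learn's penalty exactly would require scaling it by the number of samples):
import tensorflow as tf
def regularized_logreg_model(input_dim=4, output_dim=3, alpha=0.01, lr=0.05):
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(output_dim, input_shape=(input_dim,), activation='softmax',
                              kernel_regularizer=tf.keras.regularizers.l2(alpha))  # L2 penalty, as in sklearn's default
    ])
    model.compile(loss='categorical_crossentropy',
                  optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  metrics=['accuracy'])
    return model
keras_model = regularized_logreg_model()
keras_model.fit(X_train, Y_train, epochs=500, verbose=0)  # X_train, Y_train as in the question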
One obvious difference is that saga (a variant of SAG) is used in LogisticRegression, while plain SGD is used in your neural network. As far as I know, LogisticRegression does not offer an SGD solver; alternatively, you can use SGDClassifier (with log loss) instead of LogisticRegression. And here is a blog discussing the differences between them.
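For reference, a minimal sketch of the SGDClassifier route (for the three iris classes it fits one-vs-rest binary models; the loss is named 'log_loss' in recent scikit-learn versions and 'log' in older ones):
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
# SGD is sensitive to feature scaling, so standardize first
sgd_model = make_pipeline(StandardScaler(),
                          SGDClassifier(loss="log_loss", penalty="l2", alpha=1e-4, max_iter=1000, random_state=0))
sgd_model.fit(X_train, y_train)  # X_train, y_train from the question above
print(sgd_model.score(X_test, y_test))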

Finding regression equation from support vector regression

I'm currently using Python's scikit-learn to create a support vector regression model, and I was wondering how one would go about finding the explicit regression equation for the target variable in terms of the predictors. It doesn't have to be simple or pretty, but is there a method in Python to output this (for a polynomial kernel, specifically)? I am fairly new to SVR, and I am not sure what the regression equation used to predict a new observation should look like once the model is fit.
I've already fit an SVR model that predicts with a performance I'm happy with, and I've used GridSearchCV to tune hyper-parameters. However, I need an explicit form of my target variable in terms of the predictors for an independent optimization, and don't know how to find this equation.
from sklearn.svm import SVR
svr = SVR(kernel = 'poly', C = best_params['C'], epsilon = best_params['epsilon'], gamma = best_params['gamma'], coef0 = 0.1, shrinking = True, tol = 0.001, cache_size = 200, verbose = False, max_iter = -1)
svr.fit(x,y)
Where x is my matrix of observations, y is my vector of target values from the observations, and best_params is the output (optimal hyperparameters) found by GridSearchCV.
Does Python have any method for outputting the resulting equation of the SVR model used in predicting future target values from a set of predictors? Or is there a straightforward way of using values found by SVR to create an equation myself if I specify the kernel to be of polynomial type?
Thank you!
If you use a linear kernel, then you can read the coefficients off the fitted model via coef_.
For example
from sklearn.svm import SVR
import numpy as np
n_samples, n_features = 1000, 5
rng = np.random.RandomState(0)
coef = [1, 2, 3, 4, 5]  # true coefficients
X = rng.randn(n_samples, n_features)
y = (coef * X).sum(axis=1) + rng.randn(n_samples)  # linear signal plus Gaussian noise
clf = SVR(kernel = 'linear', gamma='scale', C=1.0, epsilon=0.2)
clf.fit(X, y)
clf.coef_
array([[0.97626634, 2.00013793, 2.96205576, 4.00651352, 4.95923782]])
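For a polynomial kernel there is no single coefficient vector, but the fitted decision function can still be written out from the support vectors as f(x) = sum_i dual_coef_i * (gamma * <sv_i, x> + coef0)^degree + intercept. A sketch (using explicit numeric hyperparameters so they can be substituted into the formula; not part of the original answer):
import numpy as np
from sklearn.svm import SVR
rng = np.random.RandomState(0)
X = rng.randn(200, 3)
y = X[:, 0] ** 2 + X[:, 1] * X[:, 2] + 0.1 * rng.randn(200)
gamma, coef0, degree = 0.5, 0.1, 3
svr = SVR(kernel='poly', degree=degree, gamma=gamma, coef0=coef0, C=1.0, epsilon=0.1)
svr.fit(X, y)
def manual_predict(x_new):
    # f(x) = sum_i dual_coef_i * (gamma * <sv_i, x_new> + coef0)^degree + intercept
    k = (gamma * svr.support_vectors_ @ x_new + coef0) ** degree
    return svr.dual_coef_[0] @ k + svr.intercept_[0]
x_new = rng.randn(3)
print(manual_predict(x_new), svr.predict(x_new.reshape(1, -1))[0])  # the two values should agree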

accuracy difference between svm and logistic regression in python

I have two classifiers in Python: an SVM and a logistic regression.
from sklearn import preprocessing
from sklearn.linear_model import LogisticRegression
from sklearn import svm
from sklearn.metrics import accuracy_score
scaler = preprocessing.StandardScaler()
scaler.fit(synthetic_data)
synthetic_data = scaler.transform(synthetic_data)
test_data = scaler.transform(test_data)
svc = svm.SVC(tol=0.0001, C=100.0).fit(synthetic_data, synthetic_label)
predictedSVM = svc.predict(test_data)
print(accuracy_score(test_label, predictedSVM))
LRmodel = LogisticRegression(penalty='l2', tol=0.0001, C=100.0, random_state=1,max_iter=1000, n_jobs=-1)
predictedLR = LRmodel.fit(synthetic_data, synthetic_label).predict(test_data)
print(accuracy_score(test_label, predictedLR))
I use the same input for both, but their accuracies are very different: the SVM sometimes predicts 1 for every test sample. The accuracy of the SVM is 0.45, while the accuracy of the logistic regression is 0.75. I have changed the C parameter in different ways, but I still have the same problem.
It is because SVC by default uses the radial basis function (RBF) kernel (http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html), which gives a non-linear classifier, unlike logistic regression.
If you want to use a linear kernel, add the parameter kernel='linear' to SVC.
If you want to keep using the radial kernel, I suggest also tuning the gamma parameter, as in the sketch below.
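For instance (a sketch reusing the variable names from the question; the gamma value is an arbitrary starting point that you would normally tune, e.g. with GridSearchCV):
# Option 1: linear kernel, comparable to logistic regression's linear decision boundary
svc_linear = svm.SVC(kernel='linear', C=100.0, tol=0.0001).fit(synthetic_data, synthetic_label)
print(accuracy_score(test_label, svc_linear.predict(test_data)))
# Option 2: keep the RBF kernel but set gamma explicitly (smaller values give smoother boundaries)
svc_rbf = svm.SVC(kernel='rbf', C=100.0, gamma=0.01, tol=0.0001).fit(synthetic_data, synthetic_label)
print(accuracy_score(test_label, svc_rbf.predict(test_data)))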
