Lasso Regression: The continuous heavy step function - python

From many documents I have learned that the recipe for Ridge regression is:
loss_Ridge = loss_function + lambda * L2 norm of the slope
and that the recipe for Lasso regression is:
loss_Lasso = loss_function + lambda * L1 norm of the slope
When I read the topic "Implementing Lasso and Ridge Regression" in the "TensorFlow Machine Learning Cookbook", the author explained that:
"...we will use a continuous approximation to a step function, called
the continuous heavy step function..."
and the author also provided the lines of code here.
I don't understand what is called 'the continuous heavy step function' in this context. Please help me.

From the link that you provided,
if regression_type == 'LASSO':
    # Declare Lasso loss function
    # Lasso Loss = L2_Loss + heavyside_step,
    # where heavyside_step ~ 0 if A < constant, otherwise ~ 99
    lasso_param = tf.constant(0.9)
    heavyside_step = tf.truediv(1., tf.add(1., tf.exp(tf.multiply(-50., tf.subtract(A, lasso_param)))))
    regularization_param = tf.multiply(heavyside_step, 99.)
    loss = tf.add(tf.reduce_mean(tf.square(y_target - model_output)), regularization_param)
This heavyside_step function is very close to a logistic function, which in turn can serve as a continuous approximation of a step function.
You use a continuous approximation because the loss function needs to be differentiable with respect to the parameters of your model.
To get an intuition, read about the constrained formulation in section 1.6 of https://www.cs.ubc.ca/~schmidtm/Documents/2005_Notes_Lasso.pdf
You can see in your code that if A < 0.9 then regularization_param vanishes, so the optimization will constrain A to that range.
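To see the step-like behaviour numerically, here is a small NumPy sketch (my own illustration, not from the book) that evaluates the same sigmoid-based approximation used above:
import numpy as np

def heavyside_step(a, lasso_param=0.9, steepness=50.0):
    # Steep logistic curve: ~0 for a < lasso_param, ~1 for a > lasso_param
    return 1.0 / (1.0 + np.exp(-steepness * (a - lasso_param)))

for a in (0.5, 0.85, 0.9, 0.95, 1.2):
    print(a, heavyside_step(a))  # jumps from ~0 to ~1 around a = 0.9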

If you want to do feature selection with Lasso regression, here is one example:
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

# features_vector and target are assumed to be your feature matrix and target array
estimator = Lasso()
featureSelection = SelectFromModel(estimator)
featureSelection.fit(features_vector, target)
selectedFeatures = featureSelection.transform(features_vector)
print(selectedFeatures)

Related

What is the parameter Alpha in Ridge Regression?

Can someone give me an understandable explanation of the parameter Alpha in SKlearn's Ridge Regression? How does it influence the function etc.?
Examples would be helpful :)
Ridge regression minimizes the objective function:
||y - Xw||^2_2 + alpha * ||w||^2_2
This model solves a regression problem where the loss function is the linear least squares function and the regularization is given by the l2-norm. In simple words, alpha controls how strongly ridge regression tries to prevent overfitting.
Say you have three parameters, W = [w1, w2, w3]. In an overfitting situation, the loss function might fit a model with W = [0.95, 0.001, 0.0004], which means it is heavily biased towards the first parameter. However, alpha * ||w||^2_2 increases the loss in those cases and tries to keep all parameters within some bounds to prevent overfitting. For instance, with the regularizer, W could be [0.5, 0.2, 0.33]. When you increase alpha you are pushing the ridge regression to be more robust against overfitting, but you might get a larger training error.
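As a quick illustration (my own toy example, not from the original answer), you can fit Ridge with several values of alpha and watch the coefficients shrink:
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.randn(50, 3)
y = 0.95 * X[:, 0] + 0.01 * rng.randn(50)  # target driven almost entirely by the first feature

for alpha in (0.1, 1.0, 100.0):
    model = Ridge(alpha=alpha).fit(X, y)
    print(alpha, model.coef_)  # larger alpha shrinks all coefficients towards zero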

Difficulty Running Bayesian Gamma Regression with PyMC3

PyMC3 has excellent functionality for dealing with Bayesian regressions, so I've been trying to leverage that to run a Bayesian Gamma Regression using PyMC3 where the likelihood would be Gamma.
From what I understand, running any sort of Bayesian Regression in PyMC3 requires the pymc3.glm.GLM() function, which takes in a model formula in Patsy form (e.g. y ~ x_1 + x_2 + ... + x_m), the dataframe, and a distribution.
However, the issue is that the pymc3.glm.GLM() function requires a pymc3.glm.families object (https://github.com/pymc-devs/pymc3/blob/master/pymc3/glm/families.py) for the distribution. But the Gamma distribution doesn't show up as one of the families built into the package, so I'm stuck. Or is the Gamma family hidden somewhere? Would appreciate any help in this matter!
For context:
I have a dataframe of features [x_1, x_2, ..., x_m] (call it X) and a target variable (call it y). This is the code I have prepared so far, but I just need to figure out how to get the Gamma distribution in as my likelihood.
import pymc3 as pm

# Combine X and y into a single dataframe
patsy_DF = X
patsy_DF['y'] = y

# Get Patsy formula
all_columns = "+".join(X.columns)
patsy_formula = "y~" + all_columns

# Instantiate model
model = pm.Model()

# Construct model
with model:
    # Fit Bayesian Gamma regression
    pm.glm.GLM(patsy_formula, patsy_DF, family=pm.families.Gamma())
    # !!! ... but pm.families.Gamma() doesn't exist ... !!!

    # Get MAP estimate and trace
    map_estimate = pm.find_MAP(model=model)
    trace = pm.sample(draws=2000, chains=3, start=map_estimate)

# Get regression results summary (coefficient estimates, etc.)
pm.summary(trace).round(3)
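One possible workaround (my own rough sketch, not from the original post) is to skip the glm helper entirely and declare the Gamma likelihood by hand with a log link; X and y below are assumed to be the features dataframe and the positive-valued target from the question:
import pymc3 as pm

with pm.Model() as manual_gamma_model:
    intercept = pm.Normal('intercept', mu=0.0, sd=10.0)
    coefs = pm.Normal('coefs', mu=0.0, sd=10.0, shape=X.shape[1])
    mu = pm.math.exp(intercept + pm.math.dot(X.values, coefs))  # log link keeps the mean positive
    shape_param = pm.HalfNormal('shape_param', sd=10.0)
    pm.Gamma('y_obs', alpha=shape_param, beta=shape_param / mu, observed=y)  # mean = alpha/beta = mu
    trace = pm.sample(draws=2000, chains=3)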

Why does sklearn logistic regression regularize both the weights and the intercept?

The regularization parameter C in logistic regression
(see http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) is used to allow the fitted function to be well defined and to avoid either overfitting or problems with step functions (see https://datascience.stackexchange.com/questions/10805/does-scikit-learn-use-regularization-by-default/10806).
However, regularization in logistic regression should only concern the weights for the features, not the intercept (also explained here: http://aimotion.blogspot.com/2011/11/machine-learning-with-python-logistic.html).
But it seems that sklearn.linear_model.LogisticRegression actually regularizes the intercept as well. Here is why:
1) Consider the above link carefully (https://datascience.stackexchange.com/questions/10805/does-scikit-learn-use-regularization-by-default/10806): the sigmoid is moved slightly to the left, closer to intercept 0.
2) I tried to fit data points with a logistic curve and a manual maximum likelihood function. Including the intercept in the L2 norm gives results identical to sklearn's function.
Two questions please:
1) Did I get this wrong, is this a bug, or is there a well-justified reason for regularizing the intercept?
2) Is there a way to use sklearn and specify to regularize all parameters except the intercepts?
Thanks!
import numpy as np
from sklearn.linear_model import LogisticRegression

C = 1e1
model = LogisticRegression(C=C)

x = np.arange(100, 110)
x = x[:, np.newaxis]
y = np.array([0]*5 + [1]*5)
print(x)
print(y)

model.fit(x, y)
a = model.coef_[0][0]
b = model.intercept_[0]
b_modified = -b/a  # without regularization, b_modified should be 104.5 (as for C=1e10)
print("a, b:", a, -b/a)
# OUTPUT:
# [[100]
# [101]
# [102]
# [103]
# [104]
# [105]
# [106]
# [107]
# [108]
# [109]]
# [0 0 0 0 0 1 1 1 1 1]
# a, b: 0.0116744221756 100.478968664
scikit-learn's logistic regression is regularized by default.
Changing the value of the intercept_scaling parameter in sklearn.linear_model.LogisticRegression has a similar effect on the result as changing only the C parameter.
Through the intercept_scaling parameter, regularization also affects the estimation of the bias (intercept) in logistic regression; the higher this parameter's value, the smaller the impact of regularization on the bias. Per the official documentation:
The intercept becomes intercept_scaling * synthetic_feature_weight.
Note! the synthetic feature weight is subject to l1/l2 regularization
as all other features. To lessen the effect of regularization on
synthetic feature weight (and therefore on the intercept)
intercept_scaling has to be increased.
Hope it helps!
Thanks #Prem, this is indeed the solution:
C = 1e1
intercept_scaling=1e3 # very high numbers make it unstable in practice
model = LogisticRegression(C=C, intercept_scaling=intercept_scaling)

how to use sklearn when target variable is a proportion

There are standard ways of predicting proportions such as logistic regression (without thresholding) and beta regression. There have already been discussions about this:
http://scikit-learn-general.narkive.com/4dSCktaM/using-logistic-regression-on-a-continuous-target-variable
http://scikit-learn-general.narkive.com/lLVQGzyl/beta-regression
I cannot tell if there exists a work-around within the sklearn framework.
There exists a workaround, but it is not intrinsically within the sklearn framework.
If you have a proportional target variable (value range 0-1) you run into two basic difficulties with scikit-learn:
Classifiers (such as logistic regression) deal with class labels as target variables only. As a workaround you could simply threshold your probabilities to 0/1 and interpret them as class labels, but you would lose a lot of information.
Regression models (such as linear regression) do not restrict the target variable. You can train them on proportional data, but there is no guarantee that the output on unseen data will be restricted to the 0/1 range. However, in this situation, there is a powerful work-around (below).
There are different ways to mathematically formulate logistic regression. One of them is the generalized linear model, which basically defines the logistic regression as a normal linear regression on logit-transformed probabilities. Normally, this approach requires sophisticated mathematical optimization because the probabilities are unknown and need to be estimated along with the regression coefficients.
In your case, however, the probabilities are known. This means you can simply transform them with y = log(p / (1 - p)). Now they cover the full range from -oo to oo and can serve as the target variable for a LinearRegression model [*]. Of course, the model output then needs to be transformed again to result in probabilities p = 1 / (exp(-y) + 1).
import numpy as np
from sklearn.linear_model import LinearRegression


class LogitRegression(LinearRegression):

    def fit(self, x, p):
        p = np.asarray(p)
        y = np.log(p / (1 - p))
        return super().fit(x, y)

    def predict(self, x):
        y = super().predict(x)
        return 1 / (np.exp(-y) + 1)


if __name__ == '__main__':
    # generate example data
    np.random.seed(42)
    n = 100
    x = np.random.randn(n).reshape(-1, 1)
    noise = 0.1 * np.random.randn(n).reshape(-1, 1)
    p = np.tanh(x + noise) / 2 + 0.5

    model = LogitRegression()
    model.fit(x, p)
    print(model.predict([[-10], [0.0], [1]]))
    # [[ 2.06115362e-09]
    #  [ 5.00000000e-01]
    #  [ 8.80797078e-01]]
There are also numerous other alternatives. Some non-linear regression models can work naturally in the 0-1 range. For example, a Random Forest regressor will never exceed the range of the target variables it was trained on. Simply put probabilities in and you will get probabilities out. Neural networks with appropriate output activation functions (tanh, I guess) will also work well with probabilities, but if you want to use those there are more specialized libraries than sklearn.
[*] You could in fact plug in any linear regression model, which can make the method more powerful, but then it is no longer exactly equivalent to logistic regression.
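As a small illustration of the random-forest alternative mentioned above (my own sketch, not part of the original answer):
import numpy as np
from sklearn.ensemble import RandomForestRegressor

np.random.seed(42)
x = np.random.randn(200, 1)
p = np.tanh(x[:, 0]) / 2 + 0.5  # proportions strictly inside (0, 1)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(x, p)
preds = forest.predict([[-10.0], [0.0], [10.0]])
print(preds, preds.min() >= 0.0, preds.max() <= 1.0)  # predictions never leave the training range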

Logistic regression using python

I want to implement Logisitic regression from scratch in python. Following are the functions in it:
sigmoid
cost
fminunc
Evaluating Logistic regression
I would like to know what would be a good starting point for implementing this from scratch in Python. Any guidance on how to proceed would be appreciated. I know the theory behind those functions, but I am looking for a more Pythonic answer.
I used Octave and got it all working, but I don't know how to start in Python, since Octave already has those packages set up to do the work.
You may want to try to translate your Octave code to Python and see what's going on. You can also use a Python package to do this for you. Check out scikit-learn on logistic regression. There is also an easy example in this blog.
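For the scikit-learn route, a minimal sketch on toy data (my own example, not from the linked blog):
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.5], [1.5], [2.5], [3.5], [4.5], [5.5]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
print(clf.coef_, clf.intercept_)
print(clf.predict([[1.0], [5.0]]))  # expected: [0 1]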
In order to implement logistic regression, you may consider the following two approaches:
Consider how linear regression works: apply the sigmoid function to the linear regression hypothesis and run gradient descent until convergence, or apply the exponential-based softmax function to suppress the less likely outcomes.
import numpy as np

# NOTE: accuracy(theta, m, y, x) is assumed to be defined elsewhere;
# it is not shown in the original answer.

def logistic_regression(x, y, alpha=0.05, lamda=0):
    '''
    Logistic regression via batch gradient descent,
    with optional L2 regularization (lamda > 0).
    '''
    m, n = np.shape(x)
    theta = np.ones(n)
    xTrans = x.transpose()
    oldcost = 0.0
    value = True
    while value:
        hypothesis = np.dot(x, theta)
        logistic = 1 / (np.exp(-hypothesis) + 1)  # sigmoid of the linear hypothesis
        reg = (lamda / (2 * m)) * np.sum(np.power(theta, 2))  # L2 penalty term
        loss = logistic - y
        cost = np.sum(loss ** 2)
        # print(cost)
        # avg gradient per example
        gradient = np.dot(xTrans, loss) / m
        # update
        if reg:
            cost = cost + reg
            theta = theta - alpha * (gradient + (lamda / m) * theta)  # gradient of the L2 penalty
        else:
            theta = theta - (alpha / m) * gradient
        if oldcost == cost:
            value = False
        else:
            oldcost = cost
    print(accuracy(theta, m, y, x))
    return theta, accuracy(theta, m, y, x)
