Python or SQL Logistic Regression - python

Given time-series data, I want to find the best-fitting logistic curve. What are good libraries for doing this in either Python or SQL?
Edit: Specifically, what I'm looking for is a library that can fit data resembling a sigmoid function, with upper and lower horizontal asymptotes.

If your data were categorical, then you could use a logistic regression to fit the probabilities of belonging to a class (classification).
However, I understand you are trying to fit the data to a sigmoid curve, which means you just want to minimize the mean squared error of the fit.
I would redirect you to the SciPy function called scipy.optimize.leastsq: it is used to perform least squares fits.
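As a rough illustration of that approach, here is a minimal sketch using scipy.optimize.leastsq. The four-parameter sigmoid (lower asymptote, upper asymptote, midpoint, rate), the synthetic data and the initial guess are all assumptions made for the example, not something taken from your data.

```python
import numpy as np
from scipy.optimize import leastsq


def sigmoid(params, t):
    """Four-parameter sigmoid with lower and upper horizontal asymptotes."""
    lower, upper, midpoint, rate = params
    return lower + (upper - lower) / (1.0 + np.exp(-rate * (t - midpoint)))


def residuals(params, t, y):
    """Difference between the data and the model; leastsq minimizes its squared sum."""
    return y - sigmoid(params, t)


# Synthetic time series (assumed for the example): a noisy sigmoid.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 200)
true_params = (1.0, 5.0, 4.0, 1.5)
y = sigmoid(true_params, t) + rng.normal(scale=0.05, size=t.size)

# A crude starting point read off the data itself.
initial_guess = (y.min(), y.max(), np.median(t), 1.0)

fitted_params, ier = leastsq(residuals, initial_guess, args=(t, y))
print("fitted (lower, upper, midpoint, rate):", fitted_params)
```

The fit can be sensitive to the starting point, so on real data it is worth plotting the fitted curve against the observations before trusting the parameters.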

Related

How does SciKit Learn QuantileTransformer work?

I'm looking into the QuantileTransformer object in the Scikit-Learn Python library, in an attempt to "uniformize" residuals from an ARIMA model as part of a copula model. The idea is to feed my Kendall correlation matrix of residuals into a Student's t copula, and then apply the reverse transformation of the simulated residuals, in order to get values that are on the scale of the original data.
My question is this: What underlies this mechanism? I'm struggling to understand how the QuantileTransformer uniformizes values without knowledge of the true distribution. How does this happen without a percent point function (PPF)? Or is there an assumed PPF that is simply not stated in the documentation?

regression model for skewed distribution in python

I would like to make a linear regression model.
Predictor variable: 'Sud grenoblois / Vif PM10' has a decaying exponential distribution, as you can see on the graph. As far as I know, regression assumes a normal distribution of the predictor. Should I use a transform of variables or another type of regression?
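You can apply a logarithmic transformation to reduce right skewness; square root and cube root are other common choices.
If the tail is to the left of the data, the common transformations are the opposite ones: square, cube and exponential.
I don't think you need to change your regression model. Linear regression assumes normally distributed residuals, not normally distributed predictors.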
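To see the effect of the log transform, here is a small sketch on synthetic data. The exponential sample below is only a stand-in for the PM10 column (an assumption, since the real data is not shown), and scipy.stats.skew is used to measure skewness before and after.

```python
import numpy as np
from scipy.stats import skew

# Synthetic, strictly positive, right-skewed values standing in for the
# PM10 predictor (assumption: the real column looks roughly like this).
rng = np.random.default_rng(2)
pm10 = rng.exponential(scale=20.0, size=5000)

skew_before = skew(pm10)
skew_after = skew(np.log(pm10))  # log requires strictly positive values

print(f"skewness before log: {skew_before:.2f}")
print(f"skewness after log:  {skew_after:.2f}")
```

On exponential-like data the log can overshoot into a mild left skew, so it is worth comparing it with a square root or cube root before settling on one.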

Python library to plot regression residuals against each predictor and fitted values

In R, when doing linear modeling I often use the residualPlots function from the car package to plot my model residuals against my fitted values and against each numeric/applicable regressor. This is done all at once with a single function call. For example,
Is there an equivalent library in Python that plots residuals against fitted values and against my predictors, all at once?
I'm aware I can plot small multiples in matplotlib using a loop, but I'm looking for something that does it in one go. I know statsmodels.graphics.plot_partregress does this for partial regression plots, but I haven't been able to find its equivalent just for straight residuals. Integration with statsmodels and the ability to compute other residuals (externally studentized) as part of plotting would be a big bonus.

Using Lasso for non-linear regression (Python)

I have a set of independent data points X, and a set of dependent points Y, and I would like to find a model of the form:
(a_0 + a_1*x_1 + a_2*x_2 + ... + a_m*x_m) * (a_{m+1}*x_{m+1} + a_{m+2}*x_{m+2})
I know I can use scipy's curve_fit, but to avoid overfitting, I want to use Lasso for the linear part (i.e. the part in the first set of parentheses).
Is there a simple way of doing that in Python?
You can fit a lasso regressor to the whole lot, multiplying out your brackets giving you 2m+2 coefficients. Then by performing a change of variables you can make this a linear regression problem again.
See this link for more details:
http://scikit-learn.org/stable/modules/linear_model.html#polynomial-regression
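A minimal sketch of that idea, assuming the first bracket uses the columns of an array X (m predictors) and the second bracket uses the two columns of an array Z. The names X, Z and expand_features, the synthetic data and the alpha value are all made up for the illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso


def expand_features(X, Z):
    """Multiply out the brackets: every term of (1, x_1..x_m) times every column of Z."""
    n = X.shape[0]
    X1 = np.hstack([np.ones((n, 1)), X])  # constant column for a_0
    # Shape (n, m+1, 2) flattened to (n, 2m+2); column i*2+j holds X1[:, i] * Z[:, j].
    return (X1[:, :, None] * Z[:, None, :]).reshape(n, -1)


# Synthetic data generated from the product model (assumed for the example).
rng = np.random.default_rng(0)
n, m = 500, 3
X = rng.normal(size=(n, m))
Z = rng.normal(size=(n, 2))
a = np.array([1.0, 2.0, 0.0, -1.0])  # a_0 .. a_m
b = np.array([1.5, 0.5])             # a_{m+1}, a_{m+2}
X1 = np.hstack([np.ones((n, 1)), X])
y = (X1 @ a) * (Z @ b) + rng.normal(scale=0.1, size=n)

features = expand_features(X, Z)

# The expanded model has no free intercept, so it is switched off here.
model = Lasso(alpha=0.01, fit_intercept=False)
model.fit(features, y)
print("fitted product coefficients:", model.coef_.round(2))
```

Note that the fitted coefficients are the products a_i * a_{m+j}, so the L1 penalty acts on those products rather than on the a_i of the first bracket alone. If you need the penalty strictly on the first bracket, this linearization is only an approximation of what you asked for.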

error in Python gradient measurement

I need to fit a straight line to my data to find out if there is a gradient.
I am currently doing this with scipy.stats.linregress.
I'm a little confused though, because one of the outputs of linregress is the "standard error", but I'm not sure how linregress calculated this, as the uncertainty of your data points is not given as an input.
Surely the uncertainty on the data points influences how uncertain the given gradient is?
Thank you!
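The standard error returned by linregress is the standard error of the estimated slope. It is computed from the scatter of the residuals, i.e. the series obtained by subtracting the fitted model from your data points, so it indicates how well your data points can be fitted by a linear model. The fit assumes every point has the same, unknown uncertainty and estimates that uncertainty from the residual scatter, which is why no per-point errors are needed as input.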
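As a sanity check, the slope's standard error can be reproduced by hand from the residuals. This is a sketch on synthetic data (the line, noise level and sample size are assumptions made for the example), using the usual ordinary least squares formula.

```python
import numpy as np
from scipy.stats import linregress

# Synthetic data (assumed for the example): a shallow line plus noise.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
y = 0.3 * x + 2.0 + rng.normal(scale=1.0, size=x.size)

result = linregress(x, y)

# Residuals: data minus the fitted line.
residuals = y - (result.slope * x + result.intercept)

# Residual standard deviation with n - 2 degrees of freedom
# (two parameters, slope and intercept, were estimated).
n = x.size
residual_std = np.sqrt(np.sum(residuals**2) / (n - 2))

# Standard error of the slope: residual scatter divided by the spread of x.
manual_stderr = residual_std / np.sqrt(np.sum((x - x.mean()) ** 2))

print("linregress stderr:", result.stderr)
print("manual stderr:    ", manual_stderr)
```

If you do have known per-point uncertainties, a weighted fit (for example scipy.optimize.curve_fit with its sigma argument) lets you feed them in directly.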
