Python LeastSquares plot

I have to draw a plot using the least-squares method in Python 3. I have lists of x and y values:
y = [186,273,308,484]
x = [2.25,2.34,2.47,2.56]
There are many more values for x and for y; this is only an excerpt. Now, I know that f(x) = y should be a linear function, and I can get its coefficients a and b by calculating:
delta_x = x[-1] - x[0] and delta_y = y[-1] - y[0]
and so on, using the tangent. I know how to do that part.
But there are also uncertainties in y, about 2 percent of each value, so I have a y_errors list which contains all the uncertainties of y.
What now? How can I draw the least-squares fit?
Of course I have used Google; I saw docs.scipy.org/doc/scipy/reference/tutorial/optimize.html#least-square-fitting-leastsq, but I ran into some problems.
I tried to adapt the example from scipy.org to my own purpose, replacing the x, y, and y_meas variables with my own lists. But I don't know what the p0 variable in that example is, or what else I have to edit to make the example work.
Of course I can also edit the residuals function; it should take only one variable, y_true. On top of that, I don't understand the arguments of the leastsq function.
Sorry for my English and for asking such a newbie question, but I don't understand this method. Thank you in advance.

I believe you are trying to fit a set of {x, y} values (and possibly sigma_y, the uncertainties in y) to a linear expression. This is known as linear regression. For linear regression (or indeed, for regression with any polynomial) you can use numpy's polyfit, and the uncertainties can be used for the weights:
weight = 1/sigma_y
where sigma_y is the standard deviation in y.
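For the data in the question, a minimal sketch might look like this (assuming y_errors holds the roughly 2-percent uncertainties described above):
import numpy as np
import matplotlib.pyplot as plt
y = np.array([186, 273, 308, 484])
x = np.array([2.25, 2.34, 2.47, 2.56])
y_errors = 0.02 * y  # the ~2 percent uncertainties described in the question
# weight each point by 1/sigma_y; polyfit's w expects weights, not variances
a, b = np.polyfit(x, y, 1, w=1/y_errors)
plt.errorbar(x, y, yerr=y_errors, fmt='o', label='data')
plt.plot(x, a*x + b, label=f'fit: y = {a:.0f}x + {b:.0f}')
plt.legend()
plt.show()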
The least-squares routines in scipy.optimize allow you to fit a non-linear function to data, but you have to write the function that computes the "residual" (data - model) in terms of variables that are to be adjusted in order to minimize the calculated residual.

Related

Is there any function for calculating k and b coefficients for linear regression model with only one independent variable?

I know I can just write the needed method myself, but there must be a function for this, because this problem is common as heck. If somebody doesn't understand what I am talking about, take a look at the following formula:
[image: the closed-form least-squares formulas for the slope k and the intercept b]
For example, I have a function y = kx + b, where y is the dependent variable and x is independent. I need to calculate k (the slope) and b (the intercept); I have the formulas from the picture and everything those formulas need. Is there any function in the common data-science libraries that can calculate them? I mention "only one independent variable" because sometimes there are multiple independent variables, which leads to multidimensional fits.
Googling gives nothing. I already use my own implementation, but I would prefer native functions from packages such as scipy, numpy, or sklearn.
I am not sure I fully understand the question (in particular, what do you mean by "one independent variable"?), so let me try to reformulate. If you have two variables, x and y, both represented by samples (x_1, ..., x_n) and (y_1, ..., y_n), and you suspect a linear relationship between them, y = a*x + b, then you can use numpy.polyfit to find the coefficients a and b. Here is an example:
import numpy as np
n = 100
x = np.linspace(0, 1, n)
y = 2*x + 0.3
a, b = np.polyfit(x, y, 1)
print(f"a={a}, b={b}")
Returns
a=2.0, b=0.30000000000000016
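If you want the slope and intercept directly, without going through polyfit's coefficient array, scipy.stats.linregress handles exactly this one-independent-variable case:
from scipy import stats
import numpy as np
n = 100
x = np.linspace(0, 1, n)
y = 2*x + 0.3
# linregress returns the slope, intercept, r-value, p-value and standard error
result = stats.linregress(x, y)
print(f"k={result.slope}, b={result.intercept}")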
Hope that helps!

Constraining OLS (or WLS) coefficients using statsmodels

I have a regression of the form model = sm.GLM(y, X, w = weight).
This ends up being a simple weighted OLS. (Note that specifying w as the error-weights array actually works in sm.GLM identically to sm.WLS, despite not being in the documentation.)
I'm using GLM because it allows me to fit with some additional constraints using fit_constrained(). My X consists of 6 independent variables, and for 2 of them I want to constrain the resulting coefficients to be positive. But I cannot seem to figure out the syntax to get fit_constrained() to work. The documentation is extremely bare and I cannot find any good examples anywhere. All I really need is the correct syntax for passing in these constraints. Thanks!
The function you found is meant for linear equality constraints, that is, requiring that a linear combination of your coefficients fulfill some equality; it is not meant for defining boundaries.
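For reference, a minimal sketch of what such an equality constraint looks like (this assumes your exog columns carry names such as x1 and x2, e.g. from a pandas DataFrame; the data here are made up):
import numpy as np
import pandas as pd
import statsmodels.api as sm
# hypothetical data; the column names are what fit_constrained refers to
X = pd.DataFrame(np.random.uniform(0, 1, (30, 2)), columns=["x1", "x2"])
X = sm.add_constant(X)
y = np.random.normal(0, 2, 30)
# an equality constraint: force the coefficient of x1 to be exactly 0
results = sm.GLM(y, X).fit_constrained("x1 = 0")
print(results.params)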
The closest you can get is to use scipy's least_squares and define the boundaries. For example, we set up a dataset with 6 coefficients:
from scipy.optimize import least_squares
import numpy as np
np.random.seed(100)
x = np.random.uniform(0,1,(30,6))
y = np.random.normal(0,2,30)
The function basically does the matrix multiplication and returns the error:
def fun(b, x, y):
    return b[0] + np.matmul(x, b[1:]) - y
The first entry, b[0], is the intercept. Let's say we require the 1st and the 6th coefficients (b[1] and b[6]) to always be non-negative:
res_lsq = least_squares(fun, [1, 1, 1, 1, 1, 1, 1], args=(x, y),
                        bounds=([-np.inf, 0, -np.inf, -np.inf, -np.inf, -np.inf, 0], np.inf))
And we check the result:
res_lsq.x
array([-1.74342242e-01, 2.09521327e+00, -2.02132481e-01, 2.06247855e+00,
-3.65963504e+00, 6.52264332e-01, 5.33657765e-20])
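Notice that the last entry, b[6], comes out at about 5e-20, i.e. pinned at its lower bound of zero. To see the effect of the bounds, you can compare against the unconstrained ordinary least-squares solution (a sketch; the exact numbers depend on the random data above):
# design matrix with an explicit intercept column prepended
A = np.column_stack([np.ones(len(y)), x])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta)  # unconstrained coefficients, to compare with res_lsq.x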

creating a function that changes equations at a certain slope, usable in curve_fit

I am currently working with fitting decline curves to real-world production data. I have had good luck creating a hyperbolic function and using curve_fit from scipy.optimize. The current function I use is:
def hyp_func(x, qi, b, di):
    return qi*(1.0 - b*di*x)**(-1.0/b)
What I would like to do now is transition to an exponential function at a certain rate of decline. How would I go about this and still be able to use it in curve_fit? I am trying the code below; is this the way to do it, or is there a better way?
def hyp_func2(x, qi, b, di):
    dlim = -0.003
    hy = qi*(1.0 - b*di*x)**(-1.0/b)
    hdy = di/(1.0 - b*di*x)
    ex = x[hdy > dlim]
    qlim = qi*(dlim/di)**(1/b)
    xlim = ((qi/qlim)**b - 1)/(b*-di)
    ey = qlim*np.exp(dlim*(ex - xlim))
    y = np.concatenate((hy[hdy < dlim], ey))
    return y
hy is the hyperbolic equation
hdy is the derivative of hy
ex is the part of x after the derivative hits dlim
ey is the exponential equation
I am still working out the equations; I am not getting a continuous function.
edit: data here, and updated equations
Sorry to be the bearer of bad news, but if I understand what you are trying to do, I think it is very difficult to have scipy.optimize.curve_fit, or any of the other methods from scipy.optimize do what you are hoping to do.
Most fitting algorithms are designed to work with continuous variables, and usually (and curve_fit for sure) start off by making very small changes in parameter values to find the right direction and step size to take to improve the result.
But what you're looking for is a discrete variable acting as the breakpoint from one functional form (roughly, "power law") to another ("exponential"). The algorithm won't normally make a large enough change in your di parameter to make a difference in which value is used as that breakpoint, and may decide that di does not affect the fit (your model uses di in other ways too, so you might get lucky and di might still have an effect on the fit).
Assuming that qi > 0, the slope is actually positive, so I do not understand the choice of -0.003. Moreover, I think the derivative is wrong.
You can calculate exactly the value of x where the slope reaches a critical value.
Now, from my experience you have two choices. If you define a piecewise function yourself, you usually run into trouble with function calls using numpy arrays; in that case I typically use scipy.optimize.leastsq with a self-defined residual function. A second option is a continuous transition between the two functions. You can make that as sharp as you want, since the value and the slope already match by construction.
The two solutions look as follows:
import matplotlib.pyplot as plt
import numpy as np

def hy(x, b, qi, di):
    return qi*(1.0 - b*di*x)**(-1.0/b)

def abshy(x, b, qi, di):  # same as hy, but defined for all x
    return qi*abs(1.0 - b*di*x)**(-1.0/b)

def dhy(x, b, qi, di):  # derivative of hy
    return qi*di*(1.0 - b*di*x)**(-(b + 1.0)/b)

def get_x_from_slope(s, b, qi, di):  # self-explanatory
    return (1.0 - (s/(qi*di))**(-b/(b + 1.0)))/(b*di)

def exh(x, xlim, qlim, dlim):  # exponential part (actually no free parameters)
    return qlim*np.exp(dlim*(x - xlim))

def trans(x, b, qi, di, s0):  # piecewise function
    x0 = get_x_from_slope(s0, b, qi, di)
    if x < x0:
        out = hy(x, b, qi, di)
    else:
        H0 = hy(x0, b, qi, di)
        out = exh(x, x0, H0, s0/H0)
    return out

def no_if_trans(x, b, qi, di, s0, sharpness=10):  # continuous transition between the two functions
    x0 = get_x_from_slope(s0, b, qi, di)
    H0 = hy(x0, b, qi, di)
    weight = 0.5*(1 + np.tanh(sharpness*(x - x0)))
    return weight*exh(x, x0, H0, s0/H0) + (1.0 - weight)*abshy(x, b, qi, di)

xList = np.linspace(0, 5.5, 90)
hyList = np.fromiter((hy(x, 2.2, 1.2, .1) for x in xList), float)
t1List = np.fromiter((trans(x, 2.2, 1.2, .1, 3.59) for x in xList), float)
nt1List = np.fromiter((no_if_trans(x, 2.2, 1.2, .1, 3.59) for x in xList), float)

fig1 = plt.figure(1)
ax = fig1.add_subplot(1, 1, 1)
ax.plot(xList, hyList)
ax.plot(xList, t1List, linestyle='--')
ax.plot(xList, nt1List, linestyle=':')
ax.set_ylim([1, 10])
ax.set_yscale('log')
plt.show()
There are almost no differences between the two solutions, but your options for using scipy's fitting functions are slightly different. The second solution should work easily with curve_fit.
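A minimal usage sketch (with synthetic noisy data generated from the continuous model above; p0 supplies starting guesses for b, qi, di and s0, while sharpness keeps its default):
from scipy.optimize import curve_fit
# synthetic noisy data from the continuous model, just to demonstrate the fit
rng = np.random.default_rng(1)
ydata = nt1List + rng.normal(0, 0.02, len(xList))
# fit b, qi, di and s0; since p0 has four entries, sharpness stays fixed at 10
popt, pcov = curve_fit(no_if_trans, xList, ydata, p0=[2.0, 1.0, 0.1, 3.0])
print(popt)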

finding regression to a second-order polynomial in python

I have the piece of code below and want to find the regression to the curve (how well the data points match it). I want my fit to be a second-order polynomial. How can I do this? And is there a method which takes errors into consideration?
plt.errorbar(x,y,fmt='*')
z = np.polyfit(x, y, 2)
xxx=np.linspace(0.65,2,10)
ppp = np.poly1d(z)
plt.plot(xxx,ppp(xxx))
According to the documentation of numpy.polyfit, it can also return the residuals, which are the errors you are looking for; look at the Returns section. And you can set the polynomial degree that you want with the parameter deg.
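A short sketch of both ideas (the data and the y_err uncertainties here are made up for illustration):
import numpy as np
# example data; y_err are hypothetical per-point uncertainties
x = np.array([0.7, 0.9, 1.2, 1.5, 1.8, 2.0])
y = np.array([1.1, 1.6, 2.6, 4.1, 6.0, 7.3])
y_err = 0.1 * y
# full=True additionally returns the sum of squared residuals of the fit
z, residuals, rank, singular_values, rcond = np.polyfit(x, y, 2, full=True)
print(residuals)
# to take measurement errors into account, weight each point by 1/sigma
z_weighted = np.polyfit(x, y, 2, w=1/y_err)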

Extrapolating data from a curve using Python

I am trying to extrapolate future data points from a data set that contains one continuous value per day for almost 600 days. I am currently fitting a 1st-order function to the data using numpy.polyfit and numpy.poly1d. In the graph below you can see the curve (blue) and the 1st-order function (green). The x-axis is days since the beginning. I am looking for an effective way to model this curve in Python in order to extrapolate future data points as accurately as possible. A linear regression isn't accurate enough, and I'm unaware of any methods of nonlinear regression that could work in this instance.
This solution isn't accurate enough. For example, this is what I feed it:
x = dfnew["days_since"]
y = dfnew["nonbrand"]
z = numpy.polyfit(x,y,1)
f = numpy.poly1d(z)
x_new = future_days
y_new = f(x_new)
plt.plot(x,y, '.', x_new, y_new, '-')
EDIT:
I have now tried curve_fit using a logarithmic function, since that is what the curve and data behaviour seem to conform to:
from scipy.optimize import curve_fit

def func(x, a, b):
    return a*numpy.log(x) + b
x = dfnew["days_since"]
y = dfnew["nonbrand"]
popt, pcov = curve_fit(func, x, y)
plt.plot( future_days, func(future_days, *popt), '-')
However, when I plot it, my y-values are way off.
The very general rule of thumb is that if your fitting function is not fitting your actual data well enough, then either:
You are using the function wrong, e.g. you are using 1st-order polynomials - so if you are convinced that it is a polynomial, then try higher-order polynomials.
You are using the wrong function; it is always worth taking a look at:
your data curve &
what you know about the process that is generating the data
to come up with some speculation/theory/guesses about what sort of model might fit better. Might your process be a logarithmic one, a saturating one, etc.? Try them (see the sketch after this list)!
Finally, if you are not getting a consistent long-term trend, then you might be able to justify using cubic splines.
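As an illustration of trying other functional forms, here is a sketch that fits both the logarithmic model from the question and a saturating alternative to some synthetic stand-in data, then compares the residual sum of squares (the data and parameter values are made up for demonstration):
import numpy as np
from scipy.optimize import curve_fit

def log_model(x, a, b):  # the logarithmic form from the question
    return a*np.log(x) + b

def sat_model(x, a, b):  # a saturating alternative: approaches a as x grows
    return a*(1 - np.exp(-x/b))

# synthetic stand-in for the ~600 days of data in the question
x = np.arange(1, 600)
y = 50*(1 - np.exp(-x/120)) + np.random.normal(0, 1, x.size)

# reasonable starting guesses matter, especially for the saturating model
for model, p0 in ((log_model, [1.0, 1.0]), (sat_model, [y.max(), 100.0])):
    popt, _ = curve_fit(model, x, y, p0=p0)
    rss = np.sum((y - model(x, *popt))**2)  # residual sum of squares; lower is better
    print(model.__name__, popt, rss)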
