Get the least squares straight line for a set of points - python

For a set of points, I want to get the straight line that is the closest approximation of the points using a least squares fit.
I can find a lot of overly complex solutions here on SO and elsewhere but I have not been able to find something simple. And this should be very simple.
import numpy as np
x = np.array([1, 2, 3, 4])
y = np.array([23, 31, 42, 43])
slope, intercept = leastSquares(x, y)
Is there some library function that implements the above leastSquares()?

numpy.linalg.lstsq can compute such a fit for you. There is an example in the documentation that does exactly what you need.
https://numpy.org/doc/stable/reference/generated/numpy.linalg.lstsq.html#numpy-linalg-lstsq
To summarize it here …
>>> x = np.array([0, 1, 2, 3])
>>> y = np.array([-1, 0.2, 0.9, 2.1])
>>> A = np.stack([x, np.ones(len(x))]).T
>>> m, c = np.linalg.lstsq(A, y, rcond=None)[0]
>>> m, c
(1.0, -0.95)  # may vary
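Applied to the data from the question, the documentation example can be wrapped in the leastSquares() helper the question asks for. This is a minimal sketch, not code from the NumPy documentation: the helper name comes from the question, and np.column_stack is used here only as a readable alternative to np.stack(...).T.

```python
import numpy as np

def leastSquares(x, y):
    """Return (slope, intercept) of the least squares line y = slope*x + intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: one column for the slope, one column of ones for the intercept
    A = np.column_stack([x, np.ones_like(x)])
    # lstsq returns (solution, residuals, rank, singular values); keep the solution
    (slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)
    return slope, intercept

x = np.array([1, 2, 3, 4])
y = np.array([23, 31, 42, 43])
slope, intercept = leastSquares(x, y)
print(slope, intercept)
```

Passing rcond=None selects the newer cutoff for small singular values (machine precision times max(M, N)) and avoids a FutureWarning on older NumPy versions.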

Well, for one, for an ordinary least squares fit of a single straight line you can derive a closed-form solution for the coefficients, though there are some pitfalls with numerical stability.
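For a single straight line, the closed form is slope = sum((x_i - x_mean)*(y_i - y_mean)) / sum((x_i - x_mean)**2) and intercept = y_mean - slope*x_mean. The sketch below (the function name fit_line_closed_form is made up for illustration) centers the data first. The algebraically equivalent textbook form (n*sum(x*y) - sum(x)*sum(y)) / (n*sum(x**2) - sum(x)**2) subtracts two large, nearly equal numbers when the x values are large relative to their spread, which is one of those numerical pitfalls.

```python
import numpy as np

def fit_line_closed_form(x, y):
    """Closed-form ordinary least squares for y ~ slope*x + intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Centering reduces cancellation compared with n*sum(x**2) - sum(x)**2
    dx = x - x_mean
    dy = y - y_mean
    # Division by zero here means all x values are identical (no unique line)
    slope = np.sum(dx * dy) / np.sum(dx * dx)
    intercept = y_mean - slope * x_mean
    return slope, intercept

slope, intercept = fit_line_closed_form([1, 2, 3, 4], [23, 31, 42, 43])
print(slope, intercept)  # about 7.1 and 17.0
```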
If you look for least squares in general, you'll find more general and thus more complex solutions, because least squares can be done for many more models than just the linear one.
But maybe the sklearn package, with its LinearRegression model, already does what you want? https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html
Or, for more detailed control, the scipy package: https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.lstsq.html
import numpy as np
from scipy.linalg import lstsq
x = np.array([1, 2, 3, 4])
y = np.array([23, 31, 42, 43])
# Turn x into a 2d array and raise it to the powers 0 (for the y-axis intercept)
# and 1 (for the slope part)
M = x[:, np.newaxis] ** [0, 1]
p, res, rnk, s = lstsq(M, y)
intercept, slope = p[0], p[1]

Here's one way to implement the least squares regression:
import numpy as np
x = np.array([1, 2, 3, 4])
y = np.array([23, 31, 42, 43])
def leastSquares(x, y):
    # Design matrix: a column of x (slope) and a column of ones (intercept)
    A = np.vstack([x, np.ones(len(x))]).T
    # Normal equations: [slope, intercept] = (A^T A)^-1 A^T y
    slope, intercept = np.linalg.inv(A.T @ A) @ A.T @ y
    return slope, intercept
slope, intercept = leastSquares(x, y)
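A variation of the function above, sketched under the assumption that only the two-parameter line fit is needed: solving the 2x2 normal equations with np.linalg.solve avoids forming an explicit inverse, which is generally cheaper and more accurate. The normal equations still square the condition number of A, so np.linalg.lstsq remains the safer choice for badly scaled x values. The name least_squares_normal_equations is hypothetical.

```python
import numpy as np

def least_squares_normal_equations(x, y):
    """Solve (A^T A) p = A^T y for p = [slope, intercept] without an explicit inverse."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Same design matrix as above: column of x, column of ones
    A = np.vstack([x, np.ones(len(x))]).T
    slope, intercept = np.linalg.solve(A.T @ A, A.T @ y)
    return slope, intercept

slope, intercept = least_squares_normal_equations([1, 2, 3, 4], [23, 31, 42, 43])
print(slope, intercept)
```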

You can try with the Moore-Penrose pseudo-inverse:
import numpy as np
from scipy import linalg
x = np.array([1, 2, 3, 4])
y = np.array([23, 31, 42, 43])
A = np.array([x, np.ones(len(x))])    # 2 x n: first row x, second row ones
B = linalg.pinv(A)                     # n x 2 pseudo-inverse
sol = np.reshape(y, (1, len(y))) @ B   # 1 x 2 row: [slope, intercept]
slope, intercept = sol[0, 0], sol[0, 1]

Related

Python array optimization with two constraints

I have an optimization problem where I'm trying to find an array that needs to optimize two functions simultaneously.
In the minimal example below I have two known arrays w and x and an unknown array y. I initialize the array y to contain only 1s.
I then use the distance np.sqrt(np.sum((x-y)**2)) and want to find the array y where
np.sqrt(np.sum((x-y)**2)) approaches 5
np.sqrt(np.sum((w-y)**2)) approaches 8
The code below successfully optimizes y with respect to a single array, but I would like to find the solution that optimizes y with respect to both x and w simultaneously, and I am unsure how to specify the two constraints.
y should only consist of values greater than 0.
Any ideas on how to go about this?
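import numpy as np
import scipy.optimize as opt
w = np.array([6, 3, 1, 0, 2])
x = np.array([3, 4, 5, 6, 7])
y = np.array([1, 1, 1, 1, 1])
def func(x, y):
    # x is the candidate array being solved for, y is the fixed reference array
    z = np.sqrt(np.sum((x-y)**2)) - 5
    return np.zeros(x.shape[0],) + z
# root() only varies the first argument of func; pass the reference array via args
r = opt.root(func, x0=y, args=(x,), method='hybr')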
print(r.x)
# array([1.97522498 3.47287981 5.1943792 2.10120135 4.09593969])
print(np.sqrt(np.sum((x-r.x)**2)))
# 5.0
One option is to use scipy.optimize.minimize instead of root. Here you have multiple solver options, and some of them (e.g. SLSQP) allow you to specify multiple constraints. Note that I changed the variable names so that x is the array you want to optimise, and y and z define the constraints.
from scipy.optimize import minimize
import numpy as np
x0 = np.array([1, 1, 1, 1, 1])
y = np.array([6, 3, 1, 0, 2])
z = np.array([3, 4, 5, 6, 7])
constraint_x = dict(type='ineq',
                    fun=lambda x: x)  # fulfilled if >= 0
constraint_y = dict(type='eq',
                    fun=lambda x: np.linalg.norm(x-y) - 5)  # fulfilled if == 0
constraint_z = dict(type='eq',
                    fun=lambda x: np.linalg.norm(x-z) - 8)  # fulfilled if == 0
res = minimize(fun=lambda x: np.linalg.norm(x), x0=x0,
               constraints=[constraint_x, constraint_y, constraint_z],
               method='SLSQP', options=dict(ftol=1e-8))  # default ftol is 1e-6
print(res.x)  # [1.55517124 1.44981672 1.46921122 1.61335466 2.13174483] (may vary slightly)
print(np.linalg.norm(res.x-y))  # 5.00000000137866
print(np.linalg.norm(res.x-z))  # 8.000000000930026
This is a minimizer, so besides the constraints it also needs a function to minimize. I chose just the norm of x, but setting the function to a constant (e.g. lambda x: 1) would also have worked.
Note also that the constraints are not fulfilled exactly; you can increase the accuracy by setting the optional argument ftol to a smaller value, e.g. 1e-10.
For more information see also the documentation and the corresponding sections for each solver.

I want to calculate slope and intercept of a linear fit using pykalman module

Consider the linear regression of Y on X, where (xi, yi) = (2, 7), (0, 2), (5, 14) for i = 1, 2, 3. The solution is (a, b) = (2.395, 2.079), with slope a and intercept b, obtained using the regression function on a hand-held calculator.
I want to calculate the slope and the intercept of a linear fit using the pykalman module, but I'm getting
ValueError: The shape of all parameters is not consistent. Please re-check their values.
I'd really appreciate it if someone could help me.
Here is my code :
from pykalman import KalmanFilter
import numpy as np
measurements = np.asarray([[7], [2], [14]])
initial_state_matrix = [[1], [1]]
transition_matrix = [[1, 0], [0, 1]]
observation_covariance_matrix = [[1, 0],[0, 1]]
observation_matrix = [[2, 1], [0, 1], [5, 1]]
kf1 = KalmanFilter(n_dim_state=2, n_dim_obs=6,
transition_matrices=transition_matrix,
observation_matrices=observation_matrix,
initial_state_mean=initial_state_matrix,
observation_covariance=observation_covariance_matrix)
kf1 = kf1.em(measurements, n_iter=0)
(smoothed_state_means, smoothed_state_covariances) = kf1.smooth(measurements)
print(smoothed_state_means)
Here's the code snippet:
from pykalman import KalmanFilter
import numpy as np
kf = KalmanFilter()
observation = np.asarray([[7], [2], [14]])
transition_matrix = np.asarray([[1, 0], [0, 1]])
observation_matrix = np.asarray([[2, 1], [0, 1], [5, 1]])
observation_covariance = np.asarray([[.1622, 0, 0], [0, .1622, 0], [0, 0, .1622]])
(filtered_state_means, filtered_state_covariances) = kf.filter_update(
    filtered_state_mean=[[0], [0]], filtered_state_covariance=[[90000, 0], [0, 90000]],
    observation=observation, transition_matrix=transition_matrix,
    observation_matrix=observation_matrix, observation_covariance=observation_covariance)
print(filtered_state_means)
print(filtered_state_covariances)
for _ in range(0, 1000):
    (filtered_state_means, filtered_state_covariances) = kf.filter_update(
        filtered_state_mean=filtered_state_means, filtered_state_covariance=filtered_state_covariances,
        observation=observation, transition_matrix=transition_matrix,
        observation_matrix=observation_matrix, observation_covariance=observation_covariance)
print(filtered_state_means)
print(filtered_state_covariances)
filtered_state_covariance was chosen large because we have no idea where filtered_state_mean lies initially, and the observations are just [[y1],[y2],[y3]]. The observation_matrix is [[x1,1],[x2,1],[x3,1]], so the second state element is our intercept. Think of it like this: y1 = m*x1 + c, where m and c are the slope and intercept respectively. In our case filtered_state_mean = [[m],[c]]. Notice that the new filtered_state_means is passed as filtered_state_mean to the next kf.filter_update() call in the loop, because we now know where the mean lies, with filtered_state_covariance = filtered_state_covariances. Iterating it 1000 times converges the mean to the real value. If you want to know more about the functions/methods used, see: https://pykalman.github.io/
If the system state does not change between measurements (also called vacuous movement step), then transition_matrix φ = I.
I'm not sure whether what I'm going to say now is true, so please correct me if I am wrong.
The observation_covariance matrix must be of size m x m, where m is the number of observations (in our case m = 3). I believe the diagonal elements are just the variances variance_y1, variance_y2 and variance_y3, and the off-diagonal elements are covariances. For example, element (1,2) of the matrix is the covariance of y1 and y2, cov(y1, y2), and is equal to element (2,1); similarly for the other elements. Can someone help me include uncertainty in x1, x2 and x3? I mean, how do you implement uncertainties in x in the above code?
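The idea in this answer (a state [m, c] that never changes, plus a very large initial covariance) is exactly recursive least squares. The sketch below is a hand-written NumPy filter, not pykalman's API: the function name kalman_line_fit is hypothetical, the observation variance 0.1622 and the initial covariance 90000 are taken from the code above, and the three points are processed one at a time as scalar observations with H = [x_i, 1]. Like the pykalman version, it only models noise in y, not in x. Because the prior is so weak, a single pass over the data should already land very close to the (2.395, 2.079) quoted in the question.

```python
import numpy as np

def kalman_line_fit(xs, ys, obs_var=0.1622, prior_var=9e4):
    """Sequential Kalman updates for a static state [slope, intercept].

    With an identity transition and no process noise, each update is one
    step of recursive least squares.
    """
    state = np.zeros(2)            # initial guess [m, c]
    P = prior_var * np.eye(2)      # large covariance: the initial guess is barely trusted
    for x, y in zip(xs, ys):
        H = np.array([x, 1.0])     # observation model y = m*x + c
        S = H @ P @ H + obs_var    # innovation variance (scalar)
        K = P @ H / S              # Kalman gain, shape (2,)
        state = state + K * (y - H @ state)
        P = P - np.outer(K, H @ P) # covariance update (I - K H) P
    return state, P

state, P = kalman_line_fit([2, 0, 5], [7, 2, 14])
print(state)  # close to [2.395, 2.079]
```

Keep in mind that feeding the same observations in again (as the 1000-iteration loop does) treats them as new, independent measurements, so the reported covariance keeps shrinking even though no new information arrives.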

Creating a Smooth Line based on Points

I have the following dataset:
x = [1, 6, 11, 21, 101]
y = [5, 4, 3, 2, 1]
and my goal is to create a smooth curve that looks like this:
Is there a way to do it in Python?
I have tried the method shown here, and this is the code:
from scipy.interpolate import spline
import matplotlib.pyplot as plt
import numpy as np
x = [1, 6, 11, 21, 101]
y = [5, 4, 3, 2, 1]
xnew = np.linspace(min(x), max(x), 100)
y_smooth = spline(x, y, xnew)
plt.plot(xnew, y_smooth)
plt.show()
but the output shows a weird line.
First, interpolate.spline() has been deprecated (and was removed in SciPy 1.3), so you should not use it. Instead, use interpolate.splrep() and interpolate.splev(). It's not a difficult conversion:
y_smooth = interpolate.spline(x, y, xnew)
becomes
tck = interpolate.splrep(x, y)
y_smooth = interpolate.splev(xnew, tck)
But that's not really the issue here. By default, splrep() fits a cubic spline (piecewise polynomials of degree 3), which doesn't really suit your data. Since there are so few points, it can still pass through all of them, but what it draws between them is a non-intuitive approximation. You can set the degree of the polynomial pieces with the k=... argument to splrep(). The same is true even of degree 2: it then fits parabolas, and because the slope is so steep at the beginning and there are no data points in the middle, the curve can bow out between the points (which is what the default fit does now).
In your case, your data is much more accurately represented as an exponential, so it'd be best to fit an exponential. I'd recommend using scipy.optimize.curve_fit(). It lets you specify your own fitting function which contains parameters and it'll fit the parameters for you:
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
import numpy as np
x = np.array([1, 6, 11, 21, 101])
y = np.array([5, 4, 3, 2, 1])
xnew = np.linspace(x.min(), x.max(), 100)
def expfunc(x, a, b, c):
    # Exponential decay towards the offset c
    return a * np.exp(-b * x) + c
popt, pcov = curve_fit(expfunc, x, y)
plt.plot(xnew, expfunc(xnew, *popt))
plt.show()

interpolation between arrays in python

What is the easiest and fastest way to interpolate between two arrays to get a new array?
For example, I have 3 arrays:
x = np.array([0,1,2,3,4,5])
y = np.array([5,4,3,2,1,0])
z = np.array([0,5])
x and y correspond to data points and z is an argument: at z=0 the x array is valid, and at z=5 the y array is valid. But I need to get a new array for z=1. That could easily be solved by:
a = (y-x)/(z[1]-z[0])*1+x
The problem is that the data do not change linearly with z, and there are more than 2 arrays of data. Maybe it is possible to use spline interpolation somehow?
This is essentially interpolation of a vector-valued function of one variable: a scalar argument maps to a whole array. A simple way to handle it is to iterate over the outputs and interpolate each component separately, so this is not such a big problem. Below is an example of how it can be done. I've changed the variable names a bit and added a new point:
import numpy as np
from scipy.interpolate import interp1d
X = np.array([0, 5, 10])
Y = np.array([[0, 1, 2, 3, 4, 5],
              [5, 4, 3, 2, 1, 0],
              [8, 6, 5, 1, -4, -5]])
XX = np.array([0, 1, 5])  # Find YY for these
YY = np.zeros((len(XX), Y.shape[1]))
for i in range(Y.shape[1]):
    # One 1-D interpolant per output column
    f = interp1d(X, Y[:, i])
    for j in range(len(XX)):
        YY[j, i] = f(XX[j])
So YY holds the results for XX. Hope it helps.
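The same result can also be computed without a Python loop. This sketch (hypothetical helper interp_rows, plain NumPy) does piecewise-linear interpolation between whole rows of Y, which is what interp1d does by default. It assumes X is sorted in increasing order and has at least two entries; query points outside [X[0], X[-1]] are linearly extrapolated here, whereas interp1d raises an error for them by default. For spline-type interpolation, scipy's interp1d(X, Y, kind=..., axis=0) should also handle all columns in one call.

```python
import numpy as np

def interp_rows(X, Y, XX):
    """Piecewise-linear interpolation of whole rows of Y (one row per value in X)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    XX = np.asarray(XX, dtype=float)
    # Index of the left end of the bracketing interval for every query point
    idx = np.clip(np.searchsorted(X, XX, side='right') - 1, 0, len(X) - 2)
    x0, x1 = X[idx], X[idx + 1]
    t = (XX - x0) / (x1 - x0)  # fractional position inside the interval
    return (1 - t)[:, None] * Y[idx] + t[:, None] * Y[idx + 1]

X = np.array([0, 5, 10])
Y = np.array([[0, 1, 2, 3, 4, 5],
              [5, 4, 3, 2, 1, 0],
              [8, 6, 5, 1, -4, -5]])
YY = interp_rows(X, Y, [0, 1, 5])
print(YY)
```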

Calculating "generating functions" with numpy

In mathematics, a "generating function" is defined from a sequence of numbers c0, c1, c2, ..., cn by c0 + c1*x + c2*x^2 + ... + cn*x^n. These come as "moment generating functions", "probability generating functions" and various other types, depending on the source of the coefficients.
I have an array of the coefficients and I'd like a quick way to create the corresponding generating function.
I could do
import numpy as np
myArray = np.array([1,2,3,4])
x = 0.2
sum([c*x**k for k, c in enumerate(myArray)])
or I could have an array having c[k] in the kth entry. It seems there should be a fast numpy way to do this.
Unfortunately attempts to look this up are complicated by the fact that "generate" and "function" are common words in programming, as is the combination "generating function" so I haven't had any luck with search engines.
import numpy as np
x = .2
coeffs = np.array([1,2,3,4])
Make an array of the degree of each term:
degrees = np.arange(len(coeffs))
Raise x to each degree:
terms = np.power(x, degrees)
Multiply by the coefficients and sum:
result = np.sum(coeffs*terms)
>>> coeffs
array([1, 2, 3, 4])
>>> degrees
array([0, 1, 2, 3])
>>> terms
array([ 1. , 0.2 , 0.04 , 0.008])
>>> result
1.552
>>>
As a function:
def f(coeffs, x):
    degrees = np.arange(len(coeffs))
    terms = np.power(x, degrees)
    return np.sum(coeffs*terms)
Or simply use the NumPy polynomial package:
from numpy.polynomial import Polynomial as P
p = P(coeffs)
result = p(x)
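NumPy also has plain functions for evaluating a polynomial directly from its coefficient array, which matches the "array of coefficients" framing of the question. In numpy.polynomial.polynomial.polyval the coefficients are ordered c0, c1, ..., cn, as in a generating function, while the older np.polyval expects the highest degree first, so the array has to be reversed. Both evaluate the sum with Horner's scheme and accept an array of x values.

```python
import numpy as np
from numpy.polynomial import polynomial as npoly

coeffs = np.array([1, 2, 3, 4])  # c0, c1, c2, c3 (lowest degree first)
x = 0.2

# polyval expects increasing degree, like the generating function definition
result = npoly.polyval(x, coeffs)

# np.polyval uses the opposite convention (highest degree first)
result_legacy = np.polyval(coeffs[::-1], x)

# Both also evaluate element-wise over an array of x values
xs = np.linspace(0, 1, 5)
values = npoly.polyval(xs, coeffs)
print(result, result_legacy, values)
```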
If you are looking for performance, np.einsum could be suggested too:
np.einsum('i,i->',myArray,x**np.arange(myArray.size))
>>> coeffs = np.random.random(5)
>>> coeffs
array([ 0.70632473, 0.75266724, 0.70575037, 0.49293719, 0.66905641])
>>> x = np.random.random()
>>> x
0.7252944971757169
>>> powers = np.arange(0, coeffs.shape[0], 1)
>>> powers
array([0, 1, 2, 3, 4])
>>> result = coeffs * x ** powers
>>> result
array([ 0.70632473, 0.54590541, 0.37126147, 0.18807659, 0.18514853])
>>> np.sum(result)
1.9967167252487628
Using NumPy's Polynomial class is probably the easiest way.
from numpy.polynomial import Polynomial
coefficients = [1,2,3,4]
f = Polynomial(coefficients)
You can then use the object like any other function:
import numpy as np
import matplotlib.pyplot as plt
print(f(0.2))
x = np.linspace(-5, 5, 51)
plt.plot(x, f(x))
plt.show()
