I am trying to implement least squares:
I have: $y=\theta\omega$
The least-squares solution is $\omega=(\theta^{T}\theta)^{-1}\theta^{T}y$
I tried:
import numpy as np

def least_squares1(y, tx):
    """Calculate the least squares solution."""
    w = np.dot(np.linalg.inv(np.dot(tx.T, tx)), np.dot(tx.T, y))
    return w
The problem is that this method quickly becomes unstable (for small problems it's okay).
I realized this when I compared the result to the following least-squares calculation:
import numpy as np

def least_squares2(y, tx):
    """Calculate the least squares solution."""
    a = tx.T.dot(tx)
    b = tx.T.dot(y)
    return np.linalg.solve(a, b)
Comparing both methods, I tried to fit data with a polynomial of degree 12, i.e. the basis [1, x, x^2, x^3, x^4, ..., x^12].
First method:
Second method:
Do you know why the first method diverges for large polynomials?
P.S. I only added "import numpy as np" for your convenience, in case you want to test the functions.
There are three points here:
One is that it is generally better (faster, more accurate) to solve linear equations rather than to compute inverses.
The second is that it's always a good idea to use what you know about a system of equations (e.g. that the coefficient matrix is positive definite) when computing a solution; in this case you should use numpy.linalg.lstsq.
The third is more specifically about polynomials. When using monomials as a basis, you can end up with a very poorly conditioned coefficient matrix, which means that numerical errors tend to be large. This is because, for example, the vectors x -> pow(x,11) and x -> pow(x,12) are very nearly parallel. You would get a more accurate fit, and be able to use higher degrees, if you were to use a basis of orthogonal polynomials, for example Chebyshev polynomials (https://en.wikipedia.org/wiki/Chebyshev_polynomials) or Legendre polynomials (https://en.wikipedia.org/wiki/Legendre_polynomials).
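To illustrate the first two points, here is a minimal sketch (my own example; the data and function names are made up for demonstration) that uses np.linalg.lstsq instead of forming and inverting the normal equations, and builds a Chebyshev design matrix with numpy.polynomial.chebyshev.chebvander so you can compare condition numbers:

import numpy as np

def least_squares_lstsq(y, tx):
    """Least-squares fit without forming (tx^T tx) explicitly."""
    # lstsq uses an SVD-based solver, which is far better conditioned
    # than inverting the normal equations.
    w, residuals, rank, sv = np.linalg.lstsq(tx, y, rcond=None)
    return w

# Hypothetical example data, just for demonstration.
x = np.linspace(-1, 1, 50)
y = np.cos(3 * x) + 0.01 * np.random.randn(50)

# Monomial (Vandermonde) basis of degree 12 -- poorly conditioned.
tx_mono = np.vander(x, 13, increasing=True)

# Chebyshev basis of degree 12 -- much better conditioned on [-1, 1].
tx_cheb = np.polynomial.chebyshev.chebvander(x, 12)

print(np.linalg.cond(tx_mono), np.linalg.cond(tx_cheb))
w = least_squares_lstsq(y, tx_cheb)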
I would like to add to what was said before; I answered a related question yesterday.
The problem with higher-order polynomials is something called Runge's phenomenon. The reason the previous answer resorted to orthogonal polynomials (families such as Chebyshev, Legendre, or Hermite) is that they attempt to get rid of the Gibbs phenomenon, an adverse oscillatory effect that appears when Fourier-series methods are applied to non-periodic signals.
You can sometimes improve on the conditioning by resorting to regularization methods if the matrix is low-rank, as I did in the other post. Other problems may be due to the smoothness properties of the vector.
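As a sketch of the regularization idea (this is my own illustrative example, not the code from the other post): ridge regression adds a small multiple of the identity to the normal equations, which lifts the small eigenvalues and stabilizes the solve.

import numpy as np

def ridge_least_squares(y, tx, lambda_=1e-8):
    """Regularized (ridge) least squares: solves (tx^T tx + lambda*I) w = tx^T y."""
    d = tx.shape[1]
    a = tx.T.dot(tx) + lambda_ * np.eye(d)  # lambda_ is a tuning parameter
    b = tx.T.dot(y)
    return np.linalg.solve(a, b)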
I have a relatively complicated function and I have calculated the analytical form of the Jacobian of this function. However, sometimes, I mess up this Jacobian.
MATLAB has a nice way to check for the accuracy of the Jacobian when using some optimization technique as described here.
The problem though is that it looks like MATLAB solves the optimization problem and then returns if the Jacobian was correct or not. This is extremely time consuming, especially considering that some of my optimization problems take hours or even days to compute.
Python has a somewhat similar function in scipy as described here which just compares the analytical gradient with a finite difference approximation of the gradient for some user provided input.
Is there anything I can do to check the accuracy of the Jacobian in MATLAB without having to solve the entire optimization problem?
A laborious but useful method I've used for this sort of thing is to check that the (numerical) integral of the purported derivative equals the difference of the function values at the end points. I have found this more convenient than comparing fractions like (f(x+h)-f(x))/h with f'(x), because of the difficulty of choosing h: on the one hand, h must not be so small that the fraction is dominated by rounding error, and on the other, h must be small enough that the fraction is actually close to f'(x).
In the case of a function F of a single variable, the assumption is that you have code f to evaluate F and code fd to evaluate F'. Then the test is, for various intervals [a, b], to look at the difference, which the fundamental theorem of calculus says should be 0:
Integral{ a <= x <= b | fd(x) } - (f(b) - f(a))
with the integral being computed numerically. There is no need for the intervals to be small.
Part of the error will, of course, be due to the error in the numerical approximation to the integral. For this reason I tend to use, for example, an order-40 Gauss-Legendre integrator.
For functions of several variables, you can test one variable at a time. For several functions, these can be tested one at a time.
I've found that these tests, which are of course by no means exhaustive, show up the kinds of mistakes that occur in computing derivatives quite readily.
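Here is a minimal sketch of this check in Python (the test function is made up purely for illustration), using scipy.integrate.fixed_quad for a fixed-order Gauss-Legendre rule:

import numpy as np
from scipy.integrate import fixed_quad

def f(x):
    """Function under test (illustrative)."""
    return np.sin(x) * np.exp(-0.5 * x**2)

def fd(x):
    """Purported analytical derivative of f."""
    return (np.cos(x) - x * np.sin(x)) * np.exp(-0.5 * x**2)

def check_derivative(f, fd, a, b, n=40):
    """Return Integral_a^b fd(x) dx - (f(b) - f(a)); should be ~0 if fd is correct."""
    integral, _ = fixed_quad(fd, a, b, n=n)  # order-n Gauss-Legendre rule
    return integral - (f(b) - f(a))

for a, b in [(-1.0, 2.0), (0.0, 5.0), (-3.0, 3.0)]:
    print(a, b, check_derivative(f, fd, a, b))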
Have you considered using complex-step differentiation to check your gradient? See this description.
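For reference, a minimal sketch of the complex-step idea (my own illustration, assuming your function can be evaluated with complex inputs): for a real-analytic f, f'(x) ≈ Im(f(x + ih)) / h, and because there is no subtraction, h can be taken extremely small without cancellation error.

import numpy as np

def complex_step_derivative(f, x, h=1e-20):
    """Approximate f'(x) via the complex-step method (no cancellation error)."""
    return np.imag(f(x + 1j * h)) / h

# Illustrative check against a known derivative.
f = lambda x: np.exp(x) * np.sin(x)
fprime = lambda x: np.exp(x) * (np.sin(x) + np.cos(x))
x0 = 0.7
print(complex_step_derivative(f, x0), fprime(x0))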
I'm attempting to write a simple implementation of the Newton-Raphson method, in Python. I've already done so using the SymPy library, however what I'm working on now will (ultimately) end up running in an environment where only Numpy is available.
For those unfamiliar with the algorithm, it works (in my case) as follows:
I have a system of symbolic equations, which I "stack" to form a matrix F. The unknowns are X, Y, Z, T (which I wish to determine). Some additional values are initially unknown, until passed to my solver, which substitutes these known values for variables in the symbolic expressions.
Now, the Jacobian matrix (J) of F is computed. This, too, is a matrix of symbolic expressions.
Now, I iterate over some range (max_iter). With each iteration, I form a matrix A by substituting the current estimates (starting with some initial values) for the unknowns X, Y, Z, T in F. Similarly, I form a vector b by substituting the current estimates for X, Y, Z, T.
I then obtain new estimates by solving the matrix equation Ax = b for x. This vector x holds dT, dX, dY, dZ. I then add these to the current estimates for T, X, Y, Z and iterate again.
Thus far, I've found my largest issue to be computing the Jacobian matrix. I only need to do this once; however, it will be different depending on the coefficients fed to the solver (these are not unknowns, but they are only known once fed to the solver, so I can't simply hard-code the Jacobian).
While I'm not terribly familiar with Numpy, I know that it offers numpy.gradient. I'm not sure, however, that this is the same as SymPy's .jacobian.
How can the Jacobian matrix be found, either in "pure" Python, or with Numpy?
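For concreteness, the kind of thing I have in mind (an untested sketch of my own, using a forward-difference approximation; the helper name is made up and this is not a NumPy built-in) would be:

import numpy as np

def numerical_jacobian(F, x, eps=1e-8):
    """Forward-difference approximation of the Jacobian of F at x.

    F: callable mapping an array of shape (n,) to an array of shape (m,).
    Returns an (m, n) array.
    """
    x = np.asarray(x, dtype=float)
    f0 = np.asarray(F(x))
    J = np.empty((f0.size, x.size))
    for j in range(x.size):
        x_step = x.copy()
        x_step[j] += eps          # perturb one variable at a time
        J[:, j] = (np.asarray(F(x_step)) - f0) / eps
    return J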
EDIT:
Should it be useful to you, more information on the problem can be found here. It can be formulated in a few different ways; however, as of now I'm writing it as 4 equations of the form:
$\sqrt{(X-x_i)^2+(Y-y_i)^2+(Z-z_i)^2} = c \cdot (t_i - T)$
Where X,Y,Z and T are unknown.
This describes the solution to a localization problem, where we know (a) the location of n >= 4 observers in a 3-dimensional space, (b) the time at which each observer "saw" some signal, and (c) the velocity of the signal. The goal is to determine the coordinates of the signal source X,Y,Z (and, as a side effect, the time of emission, T).
Notice that I've tried (many) other approaches to solving this problem, and all leads point toward a combination of Newton-Raphson with regression.
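For reference, the iteration I have in mind looks roughly like this (an untested sketch with made-up variable names, using the analytical Jacobian of the residuals of the equation above and a least-squares step since n >= 4):

import numpy as np

def residuals(p, obs_xyz, obs_t, c):
    """r_i = ||(X,Y,Z) - obs_i|| - c * (t_i - T), with p = (X, Y, Z, T)."""
    X, Y, Z, T = p
    d = np.sqrt(np.sum((obs_xyz - np.array([X, Y, Z]))**2, axis=1))
    return d - c * (obs_t - T)

def jacobian(p, obs_xyz, obs_t, c):
    """Analytical Jacobian of the residuals with respect to (X, Y, Z, T)."""
    X, Y, Z, T = p
    diff = np.array([X, Y, Z]) - obs_xyz           # shape (n, 3)
    d = np.sqrt(np.sum(diff**2, axis=1))           # shape (n,)
    J = np.empty((len(obs_t), 4))
    J[:, :3] = diff / d[:, None]                   # d r_i / d(X, Y, Z)
    J[:, 3] = c                                    # d r_i / dT
    return J

def solve_newton(p0, obs_xyz, obs_t, c, max_iter=50, tol=1e-10):
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        r = residuals(p, obs_xyz, obs_t, c)
        J = jacobian(p, obs_xyz, obs_t, c)
        # Gauss-Newton / least-squares step for the overdetermined system.
        delta, *_ = np.linalg.lstsq(J, -r, rcond=None)
        p += delta
        if np.linalg.norm(delta) < tol:
            break
    return p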
I am trying to estimate the density of a data set at certain points, using scipy.
from scipy.stats import gaussian_kde
import numpy as np
I have a dataset A of 3D points (this is just a minimal example. My actual data has many more dimensions and many more samples)
A = np.array([[0.078377  , 0.76737392, 0.45038174],
              [0.65990129, 0.13154658, 0.30770917],
              [0.46068406, 0.22751313, 0.28122463]])
and the points at which I want to estimate the density
B = np.array([[0.40209377, 0.21063273, 0.75885516],
              [0.91709997, 0.79303252, 0.65156937]])
But I can't seem to be able to use the gaussian_kde function, as
result = gaussian_kde(A.T)(B.T)
returns
LinAlgError: Matrix is not positive definite
How do I fix this error? How do I get the density of my sample?
TL;DR:
You have highly correlated features in your data which leads to a numerical error. There are several possible ways to address this, each with pros and cons. A drop-in replacement class for gaussian_kde is proposed below.
Diagnostic
Your dataset (i.e. the matrix that you feed when creating the gaussian_kde object, not when using it), likely contains highly dependent features. This fact (possibly combined with having low numerical resolution like float32 and "too many" datapoints) causes the covariance matrix to have near-zero eigenvalues, which due to numerical error can get below zero. This in turn will break the code that uses the Cholesky decomposition on the dataset covariance matrix (see explanation for specific details).
Assuming your dataset has shape (dims, N) you can test if this is your problem via np.linalg.eigh(np.cov(dataset))[0] <= 0. If any of the outputs comes out True, let me be the first welcoming you to the club.
Treatments
The idea is to get all eigenvalues above zero.
Increasing numerical resolution to the highest float that is practical can be an easy fix and worth trying, but may not be enough.
Given the fact that this is caused by correlated features, removing datapoints doesn't help much a priori. There is a slim hope that having fewer numbers to crunch will propagate less error and keep the eigenvalues above zero. It's easy to implement, but it discards data points.
A more involved fix is to identify the highly correlated features and merge them or ignore the "superfluous" ones. This is tricky especially if the correlations among dimensions vary from instance to instance.
Probably the cleanest way is to leave the data untouched, and lift the problematic eigenvalues to positive values. This is usually done in 2 ways:
SVD addresses the problem directly at its core: decompose the covariance matrix and replace the negative eigenvalues with a small positive epsilon. This puts your matrix back into the PD domain while introducing minimal error.
If the SVD is too expensive to compute, an alternative numerical hack is to add epsilon * np.eye(D) to the covariance matrix. This has the effect of adding epsilon to each one of the eigenvalues. Again, if epsilon is small enough, this doesn't introduce that much of an error.
If you go for the last approach you'll need to tell gaussian_kde to modify its covariance matrix. This is a relatively clean way I found to do that: simply add this class to your codebase and replace gaussian_kde with GaussianKde (tested on my end, seems to work fine).
class GaussianKde(gaussian_kde):
    """
    Drop-in replacement for gaussian_kde that adds the class attribute EPSILON
    to the covmat eigenvalues, to prevent exceptions due to numerical error.
    """

    EPSILON = 1e-10  # adjust this at will

    def _compute_covariance(self):
        """Computes the covariance matrix for each Gaussian kernel using
        covariance_factor().
        """
        self.factor = self.covariance_factor()
        # Cache covariance and inverse covariance of the data
        if not hasattr(self, '_data_inv_cov'):
            self._data_covariance = np.atleast_2d(np.cov(self.dataset, rowvar=1,
                                                          bias=False,
                                                          aweights=self.weights))
            # we're going the easy way here
            self._data_covariance += self.EPSILON * np.eye(
                len(self._data_covariance))
            self._data_inv_cov = np.linalg.inv(self._data_covariance)
        self.covariance = self._data_covariance * self.factor**2
        self.inv_cov = self._data_inv_cov / self.factor**2
        L = np.linalg.cholesky(self.covariance * 2 * np.pi)
        self._norm_factor = 2 * np.log(np.diag(L)).sum()  # needed for scipy 1.5.2
        self.log_det = 2 * np.log(np.diag(L)).sum()  # changed var name on 1.6.2
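For example, with the A and B arrays from your question, the original call simply becomes:

result = GaussianKde(A.T)(B.T)
print(result)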
Explanation
In case your error is similar but not identical, or anyone feels curious, here is the process I followed; hopefully it helps.
The exception stack specified that the error was triggered during a Cholesky decomposition. In my case, this was this line inside the _compute_covariance method.
Indeed, after the exception, checking the eigenvalues of the covariance and inv_cov attributes via np.linalg.eigh showed that covariance had a close-to-zero negative eigenvalue, and hence its inverse had a huge one. Since Cholesky expects PD matrices, this triggered an error.
At this point we can heavily suspect that the tiny, negative eigenvalue is a numerical error, since covariance matrices are PSD. Indeed, the error source comes when the covariance matrix is originally computed from the dataset that has been passed to the constructor, here. In my case, the covariance matrix yielded at least 2 almost identical columns, which led to the residual negative eigenvalue due to numerical error.
When will your dataset lead to a quasi-singular covariance matrix? That question is perfectly addressed in this other SE post. The bottom line is: "If some variable is an exact linear combination of the other variables, with constant term allowed, the correlation and covariance matrices of the variables will be singular. The dependency observed in such matrix between its columns is actually that same dependency as the dependency between the variables in the data observed after the variables have been centered (their means brought to 0) or standardized (if we mean correlation rather than covariance matrix)." (Kudos and +1 to ttnphns for the amazing work.)
I'm very new to quantitative and scientific programming and I've run into the minimizer function of scipy, scipy.optimize.fmin. Can someone explain the basic intuition of this function to a non-engineering student?
Let's say I want to minimize the following function:
def f(x): return x**2
1) What does the minimizer actually minimize? The dependent or independent variable?
2) What's the difference between scipy.optimize.fmin and scipy.optimize.minimize?
Given a function that contains some unknown parameters (so it is actually a family of functions) and data, a minimizer tries to find parameter values that minimize the distance of the function values to the data. Colloquially speaking, this is done by iteratively adjusting the parameters until further changes do not seem to improve the result.
This is equivalent to the ball rolling down a hill, mentioned by @pylang in the comment. The "hill" is the distance to the data as a function of all possible parameter values. The rolling ball is the minimizer, which "moves" over that landscape, trying out parameters until it reaches a position where every move would increase the distance to the data, or at least yield no notable decrease.
Note however, that by this method you are searching for a local minimum of the function values to the data, given a set of parameters to the function. For a simple function like you posted, the local minimum is the only one and therefore the global one, but for complex functions involving many parameters this problem quickly can get quite tricky.
People then often run the minimizer multiple times to see whether it stops at the same position. If that is not the case, people say the minimizer fails to converge, which means the function is too complex for a single minimum to be found easily. There are a lot of algorithms to counter that; simulated annealing and Monte Carlo methods come to mind.
To your function: the function f mentioned in the example in the help of fmin is the distance function. It tells you how far a certain set of parameters puts you from your target. Now you must define what distance means for you. Commonly, the sum of squared residuals (the squared Euclidean norm) is used:
sum((function values - data points)^2)
Say you have a function
def f(x, a, b): return a*x**2 + b
You want to find values for a and b such that your function comes as closely as possible to the data points given below with their respective x and y values:
datax = [ 0, 1, 2, 3, 4]
datay = [ 2, 3, 5, 9, 15]
Then if you use the euclidean norm, your distance function is (this is the function f in the fmin help)
def dist(params):
    a, b = params
    return sum((f(x, a, b) - y)**2 for x, y in zip(datax, datay))
You should be able (sorry, I have no scipy on my current machine, will test it tonight) to minimize to get fitting values of a and b using
import scipy.optimize
res = scipy.optimize.fmin(dist, x0 = (0,0))
Note that you need starting values x0 for your parameters a and b. These are the values which you choose randomly if you run the minimizer multiple times to see whether it converges.
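Putting the pieces together, a consolidated version of the snippets above would look like this (again untested on my side, but it should run as-is with a standard NumPy/SciPy installation):

import numpy as np
import scipy.optimize

def f(x, a, b):
    return a * x**2 + b

datax = np.array([0, 1, 2, 3, 4])
datay = np.array([2, 3, 5, 9, 15])

def dist(params):
    # sum of squared residuals between the model and the data
    a, b = params
    return sum((f(x, a, b) - y)**2 for x, y in zip(datax, datay))

res = scipy.optimize.fmin(dist, x0=(0, 0))
print("fitted a, b:", res)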
Imagine a ball rolling down a hill. This is an optimization problem. The idea of the "minimizer" is to find the value that reduces the gradient/slope or derivative. In other words, you're finding where the ball eventually rests. Optimization problems can get more interesting and complicated, particularly if the ball rolls into a saddle point or local minimum (not the global minimum) or along a level ridge or plane, for which a minimum is nearly impossible to find. For example, consider the famous Rosenbrock function (see image), a 2D surface where finding the valley is simple, but locating the minimum is difficult.
The fmin and minimize functions are nearly equivalent for the simplex algorithm. However, fmin is less robust for complex functions. minimize is generalized to other algorithms, e.g. "Nelder-Mead" (simplex), "Powell", "CG", etc. These algorithms are just different approaches for "jostling" or helping the ball down the hill faster towards the minimum. Moreover, supplying a Jacobian or Hessian to the minimize function (for methods that can use them) improves computational efficiency. See the SciPy docs for more on how these functions are used.
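As a quick sketch of the difference (using SciPy's built-in Rosenbrock helpers scipy.optimize.rosen and scipy.optimize.rosen_der; the starting point is just an arbitrary choice):

import numpy as np
from scipy.optimize import fmin, minimize, rosen, rosen_der

x0 = np.array([-1.2, 1.0])

# Legacy interface: Nelder-Mead simplex only.
x_fmin = fmin(rosen, x0)

# Unified interface: pick the algorithm, optionally supply derivatives.
res_nm = minimize(rosen, x0, method="Nelder-Mead")
res_bfgs = minimize(rosen, x0, method="BFGS", jac=rosen_der)

print(x_fmin, res_nm.x, res_bfgs.x)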
I am trying to take the inverse of a 365x365 matrix. Some of the values get as large as 365**365, so they are converted to long numbers. I don't know if the linalg.matrix_power() function can handle long numbers. I know the problem comes from this (because of the error message and because my program works just fine for smaller matrices), but I am not sure if there is a way around it. The code needs to work for an NxN matrix.
Here's my code:
item = 0
for i in xlist:
    xtotal.append(arrayit.arrayit(xlist[item], len(xlist)))
    item = item + 1
print xtotal
xinverted = numpy.linalg.matrix_power(xtotal, -1)
coeff = numpy.dot(xinverted, ylist)
arrayit.arrayit:
def arrayit(number, length):
    newarray = []
    import decimal
    i = 0
    while i != length:
        newarray.insert(0, decimal.Decimal(number**i))
        i = i + 1
    return newarray
The program is taking x,y coordinates from a list (list of x's and list of y's) and makes a function.
Thanks!
One thing you might try is the library mpmath, which can do simple matrix algebra and other such problems on arbitrary precision numbers.
A couple of caveats: It will almost certainly be slower than using numpy, and, as Lutzl points out in his answer to this question, the problem may well not be mathematically well defined. Also, you need to decide on the precision you want before you start.
Some brief example code,
from mpmath import mp, matrix
# set the precision - see http://mpmath.org/doc/current/basics.html#setting-the-precision
mp.prec = 5000 # set it to something big at the cost of speed.
# Ideally you'd precalculate what you need.
# a quick trial with 100*100 showed that 5000 works and 500 fails
# see the documentation at http://mpmath.org/doc/current/matrices.html
# where xtotal is the output from arrayit
my_matrix = matrix(xtotal) # I think this should work. If not you'll have to create it and copy
# do the inverse
xinverted = my_matrix**-1
coeff = xinverted*matrix(ylist)
# note that as Lutzl pointed out you really want to use solve instead of calculating the inverse.
# I think this is something like
from mpmath import lu_solve
coeff = lu_solve(my_matrix,matrix(ylist))
I suspect your real problem is with the maths rather than the software, so I doubt this will work fantastically well for you, but it's always possible!
Did you ever hear of Lagrange or Newton interpolation? This would avoid the whole construction of the Vandermonde matrix, but not the potentially large numbers in the coefficients.
As a general observation, you do not want the inverse matrix. You do not need to compute it. What you want is to solve a system of linear equations.
x = numpy.linalg.solve(A, b)
solves the system A*x=b.
You (really) might want to look up the Runge effect. Interpolation with equally spaced sample points is an increasingly ill-conditioned task. Useful results can be obtained for single-digit degrees; larger degrees tend to give wildly oscillating polynomials.
You can often use polynomial regression, i.e., approximating your data set by the best polynomial of some low degree.
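A minimal sketch of that idea with NumPy (the data here is made up for illustration; numpy.polynomial.Polynomial.fit maps the x values to a scaled domain internally, which also helps conditioning):

import numpy as np

# Hypothetical data: x coordinates and noisy y values.
x = np.arange(365, dtype=float)
y = 3.0 + 0.5 * x + 0.01 * x**2 + np.random.randn(x.size)

# Fit a low-degree polynomial (degree 2 here) instead of interpolating with degree N-1.
poly = np.polynomial.Polynomial.fit(x, y, deg=2)
print(poly.convert())   # coefficients in the ordinary power basis
print(poly(10.0))       # evaluate the fitted polynomial at a point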