I'm attempting to write a simple implementation of the Newton-Raphson method in Python. I've already done so using the SymPy library; however, what I'm working on now will (ultimately) end up running in an environment where only NumPy is available.
For those unfamiliar with the algorithm, it works (in my case) as follows:
I have a system of symbolic equations, which I "stack" to form a matrix F. The unknowns are X,Y,Z,T (which I wish to determine). Some additional values are unknown until they are passed to my solver, which substitutes these known values for variables in the symbolic expressions.
Now, the Jacobian matrix (J) of F is computed. This, too, is a matrix of symbolic expressions.
Now, I iterate in some range (max_iter). With each iteration, I form a matrix A by substituting the current estimates of the unknowns X,Y,Z,T (starting with some initial values) into J. Similarly, I form a vector b by substituting the current estimates of X,Y,Z,T into F.
I then obtain new estimates by solving the matrix equation Ax = b for x. This vector x holds dT, dX, dY, dZ. I then add these to current estimates for T,X,Y,Z, and iterate again.
Thus far, I've found my largest issue to be computing the Jacobian matrix. I only need to do this once; however, it will differ depending on the coefficients fed to the solver (these are not unknowns, but they are only known once passed to the solver, so I can't simply hard-code the Jacobian).
While I'm not terribly familiar with Numpy, I know that it offers numpy.gradient. I'm not sure, however, that this is the same as SymPy's .jacobian.
How can the Jacobian matrix be found, either in "pure" Python, or with Numpy?
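One SymPy-free option is to approximate the Jacobian numerically with central differences. The sketch below assumes F is available as a plain Python function `f(x)` that returns a NumPy array; the name `numerical_jacobian` and the step size `h` are illustrative choices, not part of NumPy. Note that `numpy.gradient` is not a drop-in replacement for SymPy's `.jacobian`: it takes finite differences of an array of sampled values rather than differentiating a function.

```python
import numpy as np

def numerical_jacobian(f, x, h=1e-6):
    """Approximate the Jacobian of f at x using central differences."""
    x = np.asarray(x, dtype=float)
    n_out = np.asarray(f(x), dtype=float).size
    J = np.empty((n_out, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h
        # Column j holds the partial derivatives with respect to x[j].
        J[:, j] = (np.asarray(f(x + step)) - np.asarray(f(x - step))) / (2.0 * h)
    return J

# Usage with a small hypothetical system F(x0, x1) = (x0^2 + x1, sin(x1)).
f = lambda v: np.array([v[0] ** 2 + v[1], np.sin(v[1])])
J = numerical_jacobian(f, [1.0, 2.0])
```

The step size trades truncation error against round-off, so `h` may need tuning for badly scaled problems.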
EDIT:
Should it be useful to you, more information on the problem can be found [here]. It can be formulated a few different ways, however (as of now) I'm writing it as 4 equations of the form:
$\sqrt{(X-x_i)^2+(Y-y_i)^2+(Z-z_i)^2} = c \cdot (t_i-T)$
Where X,Y,Z and T are unknown.
This describes the solution to a localization problem, where we know (a) the location of n >= 4 observers in a 3-dimensional space, (b) the time at which each observer "saw" some signal, and (c) the velocity of the signal. The goal is to determine the coordinates of the signal source X,Y,Z (and, as a side effect, the time of emission, T).
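For equations of this particular form the Jacobian can be written down by hand, so no symbolic library is needed. The sketch below is one possible arrangement, not the only formulation: it assumes the residual F_i = sqrt((X-x_i)^2+(Y-y_i)^2+(Z-z_i)^2) - c(t_i - T), and the function names, observer layout, and starting guess are all made up for illustration. Each update solves J dp = -F in the least-squares sense, which also covers n > 4 observers.

```python
import numpy as np

def residuals(p, obs, t_obs, c):
    """F_i = distance(source, observer_i) - c * (t_i - T), with p = (X, Y, Z, T)."""
    d = np.linalg.norm(p[:3] - obs, axis=1)
    return d - c * (t_obs - p[3])

def jacobian(p, obs, c):
    """Analytic Jacobian of the residuals with respect to (X, Y, Z, T)."""
    diff = p[:3] - obs
    d = np.linalg.norm(diff, axis=1)
    J = np.empty((obs.shape[0], 4))
    J[:, :3] = diff / d[:, None]  # dF_i/dX, dF_i/dY, dF_i/dZ
    J[:, 3] = c                   # dF_i/dT
    return J

def solve_source(obs, t_obs, c, p0, max_iter=50, tol=1e-12):
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        F = residuals(p, obs, t_obs, c)
        J = jacobian(p, obs, c)
        # Least-squares solve of J @ dp = -F, then update the estimate.
        dp, *_ = np.linalg.lstsq(J, -F, rcond=None)
        p = p + dp
        if np.linalg.norm(dp) < tol:
            break
    return p

# Synthetic check: five hypothetical observers, known source and emission time.
obs = np.array([[0, 0, 0], [10, 0, 0], [0, 10, 0], [0, 0, 10], [10, 10, 10]], dtype=float)
true_p = np.array([3.0, 4.0, 5.0, 2.0])
c = 1.0
t_obs = true_p[3] + np.linalg.norm(true_p[:3] - obs, axis=1) / c
p = solve_source(obs, t_obs, c, p0=[5.0, 5.0, 5.0, 0.0])
```

Convergence depends on the starting guess and on the observer geometry; this sketch has no damping or line search.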
Note that I've tried (many) other approaches to solving this problem, and all leads point toward a combination of Newton-Raphson with regression.
Related
If I have a system of nonlinear ordinary differential equations, M(t,y) y' = F(t,y), what is the best method of solution when my mass matrix M is sometimes singular?
I'm working with the following system of equations:
If t=0, this reduces to a differential algebraic equation. However, even if we restrict t>0, this becomes a differential algebraic equation whenever y4=0, which I cannot set a domain restriction to avoid (and is an integral part of the system I am trying to model). My only previous exposure to DAEs is when an entire row is 0 -- but in this case my mass matrix is not always singular.
What is the best way to implement this numerically?
So far, I've tried using Python where I add a small number (0.0001) to the main diagonals of M and invert it, solving the equations y' = M^{-1}(t,y) F(t,y). However, this seems prone to instabilities, and I'm unsure if this is a universally appropriate means of regularization.
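The regularized approach described above can be sketched without forming an explicit inverse, by solving the linear system inside the right-hand side function. This is only an illustration: `mass_matrix` and `forcing` are hypothetical stand-ins for the actual system (which is not reproduced here), the choice of the `Radau` method is an assumption, and the sketch does not address the genuinely singular case.

```python
import numpy as np
from scipy.integrate import solve_ivp

def mass_matrix(t, y):
    # Hypothetical state-dependent mass matrix, for illustration only.
    return np.array([[1.0 + t, 0.0],
                     [0.0, 2.0 + y[1] ** 2]])

def forcing(t, y):
    # Hypothetical F(t, y), chosen so that the exact solution satisfies y' = -y.
    return mass_matrix(t, y) @ (-y)

def rhs(t, y, eps=0.0):
    # Solve (M + eps*I) y' = F instead of computing an inverse explicitly.
    M = mass_matrix(t, y) + eps * np.eye(y.size)
    return np.linalg.solve(M, forcing(t, y))

y0 = np.array([1.0, 0.5])
sol = solve_ivp(rhs, (0.0, 1.0), y0, method="Radau", rtol=1e-8, atol=1e-10)
```

With `eps=0` this is the unregularized system; a positive `eps` reproduces the diagonal shift described in the question.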
Python doesn't have any built-in functions to deal with mass matrices, so I've also tried coding this in Julia. However, DifferentialEquations.jl states explicitly that "Non-constant mass matrices are not directly supported: users are advised to transform their problem through substitution to a DAE with constant mass matrices."
I'm at a loss on how to accomplish this. Any insights on how to do this substitution or a better way to solve this type of problem would be greatly appreciated.
The following transformation leads to a constant mass matrix:
.
You need to handle the case of y_4 = 0 separately.
In most of the below answers for complex matrix differential equations, the odeintw package has been suggested.
https://stackoverflow.com/a/45970853/7952027
https://stackoverflow.com/a/26320130/7952027
https://stackoverflow.com/a/26747232/7952027
https://stackoverflow.com/a/26582411/7952027
I want to know the theory behind the manipulations done in the code of odeintw.
For example, why does one have to build that banded Jacobian, and what is the idea behind functions such as _complex_to_real_jac and _transform_banded_jac?
The answer is in the comments.
A complex matrix space is a real vector space, so a complex matrix can be represented by an array of real numbers preserving this linear structure. All odeintw has to do is to wrap odeint or better the function given to it with this basis transformation, forward and backward.
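The wrapping idea can be sketched in a few lines. This is not odeintw's actual code, only a minimal illustration of the basis transformation; the name `odeint_complex` is made up, and the real and imaginary parts are stored interleaved via NumPy views.

```python
import numpy as np
from scipy.integrate import odeint

def odeint_complex(func, z0, t, **kwargs):
    """Integrate dz/dt = func(z, t) for a complex array z using the real-valued odeint."""
    z0 = np.asarray(z0, dtype=np.complex128)
    shape = z0.shape

    def real_func(y, t):
        # Backward transformation: real vector -> complex array.
        z = y.view(np.complex128).reshape(shape)
        dzdt = np.asarray(func(z, t), dtype=np.complex128)
        # Forward transformation: complex array -> real vector.
        return dzdt.ravel().view(np.float64)

    y0 = z0.ravel().view(np.float64)
    sol = np.ascontiguousarray(odeint(real_func, y0, t, **kwargs))
    return sol.view(np.complex128).reshape((len(t),) + shape)

# Usage: dZ/dt = i*Z for a 2x2 complex matrix Z, starting from the identity.
t = np.linspace(0.0, 1.0, 11)
Z = odeint_complex(lambda Z, t: 1j * Z, np.eye(2), t)
```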
Now if you want to speed up the computation by providing the Jacobian, it also needs to be translated into the real form. In the method of lines, for example, you get banded Jacobians, and the translation has to keep that property for efficiency reasons.
The method of lines (M-o-L) is a common method for solving PDEs of the heat or wave equation type. Essentially, it discretizes the space dimension(s) while leaving the time dimension continuous, resulting in a large ODE system in the time direction. The resulting Jacobians are non-zero only at nearest-neighbor interactions, so they are very sparse, and they have a banded structure if the discretization uses a regular grid.
lutz-lehmann
The nontrivial part arises when you want to specify the Jacobian via the Dfun argument. The complex Jacobian requires that the right-hand side of the equation be complex differentiable (i.e. holomorphic). For example, the function f(z) = z* (the complex conjugate) is not complex differentiable, so you can't specify a complex Jacobian for the equation dz/dt = z*. You would have to rewrite it as a system of real equations. (This example is in the docstring of odeintw.)
If the right-hand side is complex differentiable, then you can give a complex Jacobian via Dfun.
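For a holomorphic right-hand side, each complex Jacobian entry p + iq acts on (Re z, Im z) as the 2x2 real block [[p, -q], [q, p]]. The sketch below builds the corresponding real Jacobian for the dense case, assuming real and imaginary parts are stored interleaved. The function name `complex_jac_to_real` is hypothetical and this is not the code used inside odeintw.

```python
import numpy as np

def complex_jac_to_real(jac):
    """Expand an (n, m) complex Jacobian into the equivalent (2n, 2m) real one."""
    jac = np.asarray(jac, dtype=np.complex128)
    n, m = jac.shape
    real_jac = np.empty((2 * n, 2 * m))
    real_jac[0::2, 0::2] = jac.real   # d Re(f) / d Re(z)
    real_jac[0::2, 1::2] = -jac.imag  # d Re(f) / d Im(z)
    real_jac[1::2, 0::2] = jac.imag   # d Im(f) / d Re(z)
    real_jac[1::2, 1::2] = jac.real   # d Im(f) / d Im(z)
    return real_jac

# Check: applying the real Jacobian to the interleaved vector matches J @ z.
J = np.array([[1 + 2j, 0.5j], [3.0, -1 - 1j]])
z = np.array([0.3 - 0.7j, 1.2 + 0.4j])
lhs = complex_jac_to_real(J) @ z.view(np.float64)
rhs = (J @ z).view(np.float64)
```

Because each entry expands into a 2x2 block at the same position, a banded complex Jacobian stays banded, with roughly doubled bandwidth.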
warren-weckesser
Matlab and Julia have the backslash operator that solves linear systems. I don't really know what Matlab does, but Julia does not compute the inverse, but it computes the effect the inverse has on a given vector, which is computationally easier.
I have a numpy sparse matrix and I want to apply its pseudo-inverse to a vector. Does Python have to compute the pseudo-inverse first or is there a backslash-like operator I can use?
Edit: In a sense, I want to solve a linear system Ax=b. However, the matrix A does not have full rank and the vector b is not in A's range, so the system does not have a solution. In practice I want the vector x that minimises the norm of Ax-b. This is exactly what the pseudo-inverse gives. My question is whether there is a function that will give me that without having to compute the pseudo-inverse first.
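One candidate is an iterative least-squares solver such as `scipy.sparse.linalg.lsqr`, which only needs products with A and its transpose and never forms the pseudo-inverse. The small rank-deficient matrix below is made up for illustration, and the tolerances are arbitrary choices. For dense arrays, `numpy.linalg.lstsq` plays a similar role.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import lsqr

# Hypothetical rank-1 matrix; b is not in its range.
A = csr_matrix(np.array([[1.0, 2.0],
                         [2.0, 4.0],
                         [0.0, 0.0]]))
b = np.array([1.0, 1.0, 1.0])

# lsqr returns a tuple; the first entry is the least-squares solution.
x = lsqr(A, b, atol=1e-10, btol=1e-10)[0]

# Dense reference via the explicit pseudo-inverse, for comparison only.
x_ref = np.linalg.pinv(A.toarray()) @ b
```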
I want to find the minimum of a function in python y = f(x)
Problem: the solver tries to compute the gradient with very close x values (delta x around 1e-8), and my function f is not sensitive to such a small step (i.e. y only varies visibly when delta x is around 1e-1).
Hence the gradient appears to be 0 to the solver, which cannot find the proper solution.
I've tried the following solvers from scipy, but I can't find the option I'm looking for:
scipy.optimize.minimize
scipy.optimize.fmin
In Matlab fmincon , there is an option that does the job 'DiffMinChange' : Minimum change in variables for finite-difference gradients (a positive scalar).
You may want to try and use L-BFGS-B from scipy:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_l_bfgs_b.html
And provide the "epsilon" parameter with a value around 0.1 or 0.05 to see if that improves things. I am of course assuming that you will let the solver compute the gradient for you by numerical differentiation (i.e., you pass fprime=None and approx_grad=True to the routine).
I personally despise the “minimize” interface to various solvers so I prefer to deal with the actual solvers themselves.
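The suggestion above can be tried on a toy problem. The staircase objective below is hypothetical, built so that it only changes when x moves by about 0.01; it merely stands in for a function that is insensitive to tiny steps.

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

def f(x):
    # Hypothetical objective: piecewise constant on intervals of width 0.01,
    # so a finite-difference step of 1e-8 sees no change.
    return float((np.floor(x[0] / 0.01) * 0.01 - 2.0) ** 2)

x0 = np.array([0.005])

# Default finite-difference step (epsilon=1e-8).
x_small, f_small, info_small = fmin_l_bfgs_b(f, x0, approx_grad=True)

# Larger finite-difference step, as suggested above.
x_large, f_large, info_large = fmin_l_bfgs_b(f, x0, approx_grad=True, epsilon=0.1)
```

A large `epsilon` also biases the gradient estimate, so the reported minimum is only accurate to roughly the size of the step.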
I am trying to implement least squares:
I have: $y=\theta\omega$
The least squares solution is $\omega=(\theta^{T}\theta)^{-1}\theta^{T}y$
I tried:
import numpy as np
def least_squares1(y, tx):
    """calculate the least squares solution."""
    w = np.dot(np.linalg.inv(np.dot(tx.T,tx)), np.dot(tx.T,y))
    return w
The problem is that this method quickly becomes unstable
(for small problems it's okay).
I realized this when I compared the result to this least squares calculation:
import numpy as np
def least_squares2(y, tx):
    """calculate the least squares solution."""
    a = tx.T.dot(tx)
    b = tx.T.dot(y)
    return np.linalg.solve(a, b)
Compare both methods:
I tried to fit data with a polynomial of degree 12 [1, x,x^2,x^3,x^4...,x^12]
First method:
Second method:
Do you know why the first method diverges for high-degree polynomials?
P.S. I only added "import numpy as np" for your convenience, if you want to test the functions.
There are three points here:
One is that it is generally better (faster, more accurate) to solve linear equations rather than to compute inverses.
The second is that it's always a good idea to use what you know about a system of equations (e.g. that the coefficient matrix is positive definite) when computing a solution. In this case you should use numpy.linalg.lstsq.
The third is more specifically about polynomials. When using monomials as a basis, you can end up with a very poorly conditioned coefficient matrix, and this will mean that numerical errors tend to be large. This is because, for example, the vectors x->pow(x,11) and x->pow(x,12) are very nearly parallel. You would get a more accurate fit, and be able to use higher degrees, if you were to use a basis of orthogonal polynomials, for example https://en.wikipedia.org/wiki/Chebyshev_polynomials or https://en.wikipedia.org/wiki/Legendre_polynomials
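The three points can be illustrated together. The sketch below assumes the sample points have been scaled to [-1, 1], where Chebyshev polynomials are well behaved; the data is synthetic and the variable names are made up. It compares the conditioning of the monomial design matrix, its normal-equations matrix, and a Chebyshev design matrix, and solves both fits with `numpy.linalg.lstsq` directly on the design matrix.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)
y = np.sin(3.0 * x) + 0.01 * rng.standard_normal(x.size)
degree = 12

tx_mono = np.vander(x, degree + 1, increasing=True)  # columns 1, x, ..., x^12
tx_cheb = C.chebvander(x, degree)                    # columns T_0(x), ..., T_12(x)

cond_mono = np.linalg.cond(tx_mono)
cond_normal = np.linalg.cond(tx_mono.T @ tx_mono)    # roughly cond_mono squared
cond_cheb = np.linalg.cond(tx_cheb)

# Solve the least-squares problems without forming tx.T @ tx.
w_mono, *_ = np.linalg.lstsq(tx_mono, y, rcond=None)
w_cheb, *_ = np.linalg.lstsq(tx_cheb, y, rcond=None)
```

Both bases span the same polynomial space, so the fitted curves agree; what changes is how much round-off the solve amplifies.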
I am going to improve on what was said before. I answered this yesterday.
The problem with higher-order polynomials is something called Runge's phenomenon. The previous answer resorted to orthogonal polynomials (Chebyshev and Legendre; Hermite polynomials are another such family) because they reduce this kind of adverse oscillatory effect, which is comparable to the Gibbs phenomenon that appears when Fourier series methods are applied to non-periodic signals.
You can sometimes improve the conditioning by resorting to regularization methods if the matrix is low rank, as I did in the other post. Other parts may be due to smoothness properties of the vector.
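One common regularizing method is Tikhonov (ridge) regularization, which adds a small multiple of the identity to the normal-equations matrix. Whether this is the method used in the other post is not stated here, so the sketch below is only an assumption about what such a regularization might look like; the function name and the value of `lam` are illustrative.

```python
import numpy as np

def ridge_least_squares(y, tx, lam):
    """Solve (tx^T tx + lam * I) w = tx^T y."""
    a = tx.T @ tx + lam * np.eye(tx.shape[1])
    b = tx.T @ y
    return np.linalg.solve(a, b)

# Degree-12 monomial fit on hypothetical data.
x = np.linspace(0.0, 1.0, 50)
tx = np.vander(x, 13, increasing=True)
y = np.sin(2.0 * np.pi * x)
lam = 1e-6

w = ridge_least_squares(y, tx, lam)
cond_plain = np.linalg.cond(tx.T @ tx)
cond_ridge = np.linalg.cond(tx.T @ tx + lam * np.eye(tx.shape[1]))
```

The price of the better conditioning is a small bias in the fitted coefficients, which grows with `lam`.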