I am trying to make my own CFD solver, and one of the most computationally expensive parts is solving for the pressure term. One way to solve Poisson equations faster is by using a multigrid method. The basic recursive algorithm for this is:
function phi = V_Cycle(phi,f,h)
    % Recursive V-Cycle Multigrid for solving the Poisson equation (\nabla^2 phi = f) on a uniform grid of spacing h
    % Pre-Smoothing
    phi = smoothing(phi,f,h);
    % Compute Residual Errors
    r = residual(phi,f,h);
    % Restriction
    rhs = restriction(r);
    eps = zeros(size(rhs));
    % stop recursion at smallest grid size, otherwise continue recursion
    if smallest_grid_size_is_achieved
        eps = smoothing(eps,rhs,2*h);
    else
        eps = V_Cycle(eps,rhs,2*h);
    end
    % Prolongation and Correction
    phi = phi + prolongation(eps);
    % Post-Smoothing
    phi = smoothing(phi,f,h);
end
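For comparison, here is a minimal Python/NumPy sketch of the same V-cycle. The choices are assumptions rather than anything stated in the post: a square grid of 2**k + 1 points per side, zero Dirichlet boundaries, the 5-point Laplacian, weighted Jacobi smoothing written with array slicing (no matrix at all), full-weighting restriction and bilinear prolongation. The function names mirror the pseudocode but are illustrative.

```python
import numpy as np

# Assumptions: (2**k + 1)^2 grid, phi = 0 on the boundary, 5-point Laplacian lap(phi) = f.

def smooth(phi, f, h, sweeps=3, omega=0.8):
    """Weighted Jacobi sweeps done with array slicing; boundary values stay fixed."""
    phi = phi.copy()
    for _ in range(sweeps):
        # right-hand side uses only old values, so this is Jacobi, not Gauss-Seidel
        jacobi = 0.25 * (phi[:-2, 1:-1] + phi[2:, 1:-1] + phi[1:-1, :-2]
                         + phi[1:-1, 2:] - h * h * f[1:-1, 1:-1])
        phi[1:-1, 1:-1] += omega * (jacobi - phi[1:-1, 1:-1])
    return phi

def residual(phi, f, h):
    """r = f - lap(phi) on interior points, zero on the boundary."""
    r = np.zeros_like(phi)
    lap = (phi[:-2, 1:-1] + phi[2:, 1:-1] + phi[1:-1, :-2] + phi[1:-1, 2:]
           - 4.0 * phi[1:-1, 1:-1]) / (h * h)
    r[1:-1, 1:-1] = f[1:-1, 1:-1] - lap
    return r

def restrict(r):
    """Full weighting from a (2m+1)^2 grid to an (m+1)^2 grid."""
    rc = np.zeros(((r.shape[0] + 1) // 2, (r.shape[1] + 1) // 2))
    rc[1:-1, 1:-1] = (4 * r[2:-2:2, 2:-2:2]
                      + 2 * (r[1:-3:2, 2:-2:2] + r[3:-1:2, 2:-2:2]
                             + r[2:-2:2, 1:-3:2] + r[2:-2:2, 3:-1:2])
                      + r[1:-3:2, 1:-3:2] + r[1:-3:2, 3:-1:2]
                      + r[3:-1:2, 1:-3:2] + r[3:-1:2, 3:-1:2]) / 16.0
    return rc

def prolong(ec, fine_shape):
    """Bilinear interpolation from the coarse grid back to the fine grid."""
    ef = np.zeros(fine_shape)
    ef[::2, ::2] = ec                                        # coincident points
    ef[1::2, ::2] = 0.5 * (ec[:-1, :] + ec[1:, :])           # between coarse rows
    ef[::2, 1::2] = 0.5 * (ec[:, :-1] + ec[:, 1:])           # between coarse columns
    ef[1::2, 1::2] = 0.25 * (ec[:-1, :-1] + ec[:-1, 1:] + ec[1:, :-1] + ec[1:, 1:])
    return ef

def v_cycle(phi, f, h):
    if phi.shape[0] <= 3:
        # coarsest grid has one interior unknown: one plain Jacobi step solves it exactly
        return smooth(phi, f, h, sweeps=1, omega=1.0)
    phi = smooth(phi, f, h)                        # pre-smoothing
    rhs = restrict(residual(phi, f, h))            # restriction of the residual
    eps = v_cycle(np.zeros_like(rhs), rhs, 2 * h)  # coarse-grid error equation
    phi = phi + prolong(eps, phi.shape)            # prolongation and correction
    return smooth(phi, f, h)                       # post-smoothing

# Test problem with known solution sin(pi x) sin(pi y) on a 2^7 + 1 grid
N = 2**7 + 1
x = np.linspace(0.0, 1.0, N)
h = x[1] - x[0]
X, Y = np.meshgrid(x, x, indexing="ij")
exact = np.sin(np.pi * X) * np.sin(np.pi * Y)
f = -2 * np.pi**2 * exact
phi = np.zeros_like(f)
r0 = np.linalg.norm(residual(phi, f, h))
for cycle in range(10):
    phi = v_cycle(phi, f, h)
print(np.linalg.norm(residual(phi, f, h)) / r0, np.abs(phi - exact).max())
```

With components like these, each V-cycle should cut the residual by a roughly constant factor that does not depend on the grid size, so a solver that stalls at a fixed error as the grid grows usually points to a mismatch between components (for example the sign of the residual, or the grid spacing passed to the coarse level).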
I've attempted to implement this algorithm myself; however, it is very slow and doesn't give good results, so evidently it is doing something wrong. I've been trying to find out why for too long, and I think it's worth seeing if anyone can help me.
If I use a grid of 2^5 by 2^5 points, it solves the problem and gives reasonable results. However, as soon as I go above this, it takes much longer to solve and the error basically gets stuck at some level, no matter how many V-cycles are performed. At 2^7 by 2^7 points, the code takes far too long to be useful.
I think my main issue is that my implementation of the Jacobi iteration uses an explicit matrix to calculate the update at each step. This should, in general, be fast; however, for an n-by-m grid the update matrix A is (n*m) by (n*m), so for a 2^7 by 2^7 grid every step multiplies by a 2^14 by 2^14 matrix, which is expensive. As most of its entries are zero, should I calculate the result using a different method?
If anyone has experience with multigrid methods, I would appreciate any advice!
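On the matrix question: for a 2^7 by 2^7 grid a dense update matrix has (2^14)^2 entries, about 2 GiB in float64, yet only about five entries per row are nonzero. The sketch below is my own and assumes zero Dirichlet boundaries and the 5-point Laplacian. It shows two cheaper ways to apply the same operator: a `scipy.sparse` matrix, and a matrix-free stencil built from array slicing. The helper names are illustrative.

```python
import numpy as np
import scipy.sparse as sp

def laplacian_2d(n, h):
    """Sparse 5-point Laplacian on an n-by-n interior grid with zero Dirichlet boundaries."""
    main = -2.0 * np.ones(n)
    off = np.ones(n - 1)
    T = sp.diags([off, main, off], [-1, 0, 1])   # 1-D second difference (times h^2)
    I = sp.identity(n)
    return ((sp.kron(I, T) + sp.kron(T, I)) / (h * h)).tocsr()

def apply_stencil(u, h):
    """Same operator, matrix-free: pad with the zero boundary and combine shifted slices."""
    p = np.pad(u, 1)
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * u) / (h * h)

n = 2**7
h = 1.0 / (n + 1)
A = laplacian_2d(n, h)
u = np.random.default_rng(0).standard_normal((n, n))
Au_sparse = (A @ u.ravel()).reshape(n, n)   # sparse matrix-vector product
Au_stencil = apply_stencil(u, h)            # no matrix stored at all
print(A.shape, A.count_nonzero())
```

Both give the same result. The sparse matrix stores roughly 5*n^2 numbers instead of n^4, and the stencil form is what Jacobi smoothers in multigrid codes typically use, since it needs no extra storage.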
I am trying to use the "brute" method to minimize a function of 20 variables. It is failing with a mysterious error. Here is the complete code:
import random
import numpy as np
import lmfit
def progress_update(params, iter, resid, *args, **kws):
    pass
def score(params, data = None):
    parvals = params.valuesdict()
    M = data
    X_params = []
    Y_params = []
    for i in range(M.shape[0]):
        X_params.append(parvals['x'+str(i)])
    for j in range(M.shape[1]):
        Y_params.append(parvals['y'+str(j)])
    return diff(M, X_params, Y_params)
def diff(M, X_params, Y_params):
    total = 0
    for i in range(M.shape[0]):
        for j in range(M.shape[1]):
            total += abs(M[i,j] - (X_params[i] - Y_params[j])**2)
    return total
dim = 10
M = np.empty((dim, dim))
for i in range(M.shape[0]):
    for j in range(M.shape[1]):
        M[i,j] = i*random.random()+j**2
params = lmfit.Parameters()
for i in range(M.shape[0]):
    params.add('x'+str(i), value=random.random()*10, min=0, max=10)
for j in range(M.shape[1]):
    params.add('y'+str(j), value=random.random()*10, min=0, max=10)
result = lmfit.minimize(score, params, method='brute', kws={'data': M}, iter_cb=progress_update)
However, this fails with:
ValueError: array is too big; `arr.size * arr.dtype.itemsize` is larger than the maximum possible size.
What is causing this problem?
"What is causing this problem"
You can't brute-force a high-dimensional problem, because brute-force methods require exponential work (time, and memory if implemented naively).
More directly, lmfit uses numpy (*) under the hood, and numpy has a maximum size for any single array it can allocate. Your initial data structure isn't too big (10x10); it's the combinatorial grid required for a brute-force search that's causing problems.
If you're willing to hack the implementation, you could switch to a sparse memory structure. But this doesn't solve the math problem.
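To put numbers on this, assume the search uses 20 points per parameter (the default `Ns` of `scipy.optimize.brute`) and, like `scipy.optimize.brute`, first builds the full coordinate grid with `np.mgrid`. Under those assumptions, the grid for the question's 20 parameters needs far more bytes than numpy can even index. The variable names are illustrative.

```python
import numpy as np

n_params = 20                      # 10 'x' + 10 'y' parameters in the question
Ns = 20                            # assumed points per axis (scipy.optimize.brute default)
points = Ns ** n_params            # grid points a brute-force search must evaluate
# an mgrid-style coordinate array has shape (n_params, Ns, ..., Ns) of float64
grid_bytes = n_params * points * np.dtype(np.float64).itemsize
max_bytes = np.iinfo(np.intp).max  # largest byte count numpy can represent
print(f"{points:.3e} points, {grid_bytes:.3e} bytes, limit {max_bytes:.3e}")
```

Even at only 2 points per axis the search would need 2^20 (about a million) objective evaluations, and a grid that coarse says very little about where the minimum is.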
On High Dimensional Optimization
Try a different minimizer, but be warned: it's very difficult to minimize globally in high-dimensional space. "Local minima" methods like fixed-point iteration or gradient descent might be more productive.
I hate to be pessimistic, but high-dimensional optimization is very hard in general, and I'm afraid it is beyond the scope of an SO question. Here is a survey.
Practical Alternatives
Gradient descent is supported a little in sklearn but more for machine learning than general optimization; scipy actually has pretty good optimization coverage, and great documentation. I'd start there. It's possible to do gradient descent there too, but not necessary.
From scipy's docs on unconstrained minimization, you have many options:
Method Nelder-Mead uses the Simplex algorithm. This algorithm is robust in many applications. However, if numerical computation of derivative can be trusted, other algorithms using the first and/or second derivatives information might be preferred for their better performance in general.
Method Powell is a modification of Powell’s method which is a conjugate direction method. It performs sequential one-dimensional minimizations along each vector of the directions set (direc field in options and info), which is updated at each iteration of the main minimization loop. The function need not be differentiable, and no derivatives are taken.
and many more derivative-based methods are available. (In general, you do better when you have derivative information available.)
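As a sketch of that route, assuming SciPy 1.5 or newer (where Powell accepts `bounds`), the question's objective can be vectorised and passed to `scipy.optimize.minimize`. Because the objective contains `abs`, it is not smooth, which is one reason to pick a derivative-free method like Powell. This only finds a local minimum near the starting point; the seed, starting point and the name `score` are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
dim = 10
i = np.arange(dim)[:, None]
j = np.arange(dim)[None, :]
M = i * rng.random((dim, dim)) + j ** 2      # same construction as the question's loops

def score(v):
    """Vectorised form of the question's diff(): sum of |M[i,j] - (x_i - y_j)**2|."""
    x, y = v[:dim], v[dim:]
    return np.abs(M - (x[:, None] - y[None, :]) ** 2).sum()

x0 = rng.random(2 * dim) * 10                # 10 x-values followed by 10 y-values
bounds = [(0, 10)] * (2 * dim)               # same min/max as the lmfit Parameters
res = minimize(score, x0, method="Powell", bounds=bounds)
print(res.fun, score(x0))
```

Restarting from several random `x0` and keeping the best result is a cheap way to get some robustness against poor local minima without the exponential cost of a grid.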
Footnotes/Looking at the Source Code
(*) The actual error is thrown here; the exact location depends on your numpy version. Quoted:
if (npy_mul_with_overflow_intp(&nbytes, nbytes, dim)) {
    PyErr_SetString(PyExc_ValueError,
        "array is too big; `arr.size * arr.dtype.itemsize` "
        "is larger than the maximum possible size.");
    return NULL;
}
My question is how to set up a weighted least squares problem for a Python solver. I'm trying to implement the approaches in the paper found here (PDF warning). There is an overview of the problem at the bottom of the post.
Specifically, I want to start with the following minimization problem (equation 19 in the paper), written in LaTeX:
\min_{\Theta \in M} \sum_{j=1}^{n} \sum_{i=1}^{m} w(i,j) \left| \Psi(i,j)\,\Theta(i,j) - I(i,j) \right|^{2}
It is represented as a weighted least squares problem.
w, psi, and I are my knowns, and I am trying to solve for theta.
I first tried creating a function that takes a theta and returns the sum in this equation exactly as it's expressed above. Then I passed it to scipy.optimize.least_squares, but the theta values always remained the same after optimization. I tried implementing a Jacobian, but then the resulting sum explodes to huge negative values. It also takes ages, as I'm attempting to run this on images (I is the pixel value for a pixel j with light i).
I then realized I'm almost certainly misunderstanding how to solve this problem and could use some help approaching it. My current code is below:
def theta_solver(self, theta):
    imshape = self.images.shape
    sm = 0
    for j in j_array:
        for i in i_array:
            w = self.get_w(i, j, theta)
            psi = self.non_diff_smoothing(self.get_psi(i, j))
            diff = psi*(theta[i, j]) - self.I[i, j]
            res = w*(diff)
            sm += res
    return sm
def solve_theta(self, theta_guess):
    res = scipy.optimize.least_squares(self.theta_solver, theta_guess)
Something tells me I'm way off base for how I'm approaching this problem, and I could use a finger in the right direction. Thanks for your time.
Problem overview:
This particular vision approach is called photometric stereo. By taking several images of a scene with different light sources, we can create a 3D reconstruction of that scene.
One issue is that the light intensity decays as 1/r^2 with distance from the light source, which means this can't be solved with the usual linear methods.
The approach documented in the paper is a nonlinear approach for solving near-light photometric stereo. It alternates between two solvers, one for each unknown:
the surface Z, and
the albedos/intensities at each pixel, represented by theta.
In this question I'm only trying to solve the theta element of the equation, which can be solved via weighted least squares.
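Turns out I was heavily overthinking the problem. This can be reduced to a simple linear problem of the form Ax = b. Look at the error equation, in this case:
argmin(THETA) sum(W * ||PSI * THETA - I||^2)
Since each weight multiplies a squared term, we can move the square root of the weight inside the norm, which turns this into an ordinary least-squares problem for the system:
sqrt(W) * PSI * THETA = sqrt(W) * I
Solving this in the least-squares sense is equivalent to solving the normal equations PSI^T * W * PSI * THETA = PSI^T * W * I, which we can do with your favorite linear solver (e.g. conjugate gradient, since that matrix is symmetric positive definite when PSI has full column rank and the weights are positive).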
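A small NumPy sketch of this weighted solve. It assumes, as in photometric stereo, one unknown theta per pixel j shared by all lights i; that assumption, the array shapes and the random data are illustrative, not taken from the paper. Under it, the pixels decouple and the normal equation for each pixel is a single scalar equation, so no iterative solver is needed at all.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 5                       # hypothetical: m lights, n pixels
psi = rng.random((m, n)) + 0.1    # hypothetical known Psi(i, j)
w = rng.random((m, n))            # hypothetical known weights w(i, j) >= 0
I = rng.random((m, n))            # hypothetical observed intensities I(i, j)

# Assumption: one theta per pixel j. Setting the derivative of
#   sum_i w(i,j) * (psi(i,j) * theta_j - I(i,j))**2
# to zero gives theta_j * sum_i w psi^2 = sum_i w psi I.
theta = (w * psi * I).sum(axis=0) / (w * psi ** 2).sum(axis=0)

# Cross-check pixel 0 against ordinary least squares on the sqrt-weighted system
sw = np.sqrt(w[:, 0])
theta0, *_ = np.linalg.lstsq((sw * psi[:, 0])[:, None], sw * I[:, 0], rcond=None)
print(theta[0], theta0[0])
```

If theta were coupled across pixels (for example by a smoothness term), the same idea gives one sparse system PSI^T W PSI theta = PSI^T W I, which is where conjugate gradient becomes useful.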
Some research that I am working on requires symbolically taking the determinant of large matrices; matrices ranging from 18x18 to 318x318. The matrix entries are either numeric or polynomials of degree two in the same variable, omega.
Currently, I am trying to use the .det() method in SymPy but it is very slow; an 18x18 matrix has been running for over 45 minutes now and is still computing as I write this. I realize that determinant calculations are very intensive, but is there anything I can do to speed this up?
I've already read the post at Speeding up computation of symbolic determinant in SymPy but did not take anything away from the post as to what could actually be done to speed the process up. What can I do?
SymPy is not naive about determinants (see the MatrixDeterminant class), but it appears that juggling symbolic expressions throughout the computation is a slow process. When the determinant is known to be a polynomial of a certain degree (because the matrix entries are polynomials), it turns out to be faster to compute its numeric values at several values of the variable and interpolate.
My test case is a dense 15 by 15 matrix full of quadratic polynomials of variable omega, with integer coefficients. I still use SymPy's .det method for the numeric determinants, so the coefficients end up being exactly the same long integers either way.
import numpy as np
from sympy import *
import time
n = 15
omega = Symbol('omega')
A = Matrix(np.random.randint(low=0, high=20, size=(n, n)) + omega*np.random.randint(low=0, high=20, size=(n, n)) + omega**2 * np.random.randint(low=0, high=20, size=(n, n)))
start = time.time()
p1 = A.det() # direct computation
print('Time: ' + str(time.time() - start))
start = time.time()
xarr = range(-n, n+1) # 2*n+1 points to get a polynomial of degree 2*n
yarr = [A.subs(omega, x).det() for x in xarr] # numeric values
p2 = expand(interpolating_poly(len(xarr), omega, xarr, yarr)) # interpolation
print('Time: ' + str(time.time() - start))
Both p1 and p2 are the same polynomial. Running time (on a pretty slow machine, t2.nano from Amazon):
74.6 seconds for direct computation,
5.4 seconds for the interpolation.
If your coefficients are floating point numbers and you don't expect exact arithmetical results when dealing with them, further speedup may be achieved by evaluating the matrix as a NumPy array, and using a NumPy method for the determinant:
Anum = lambdify(omega, A)
yarr = [np.linalg.det(Anum(x)) for x in xarr]
As a follow-up for anyone else looking at this thread: since trying to solve this problem a few years ago, I've learned a lot more about numerical methods and general computation, and realized just how infeasible it is to take the symbolic determinant of a matrix that large. I ended up solving this problem numerically by converting it to an eigenvalue problem. Moral of the story: there are usually multiple ways of solving a problem, and some may be more feasible than others.
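The post doesn't say what the determinant was needed for. Assuming the goal is the values of omega where det(A0 + omega*A1 + omega^2*A2) vanishes (common for matrices quadratic in a frequency), a standard companion linearization turns this into a generalized eigenvalue problem of twice the size. The sketch below uses random matrices and illustrative names; it is not the author's code.

```python
import numpy as np
from scipy.linalg import eig

def det_roots(A0, A1, A2):
    """Values of omega where det(A0 + omega*A1 + omega**2*A2) = 0.

    With z = [x, omega*x], (A0 + omega*A1 + omega**2*A2) x = 0 becomes the
    2n-by-2n generalized eigenproblem L z = omega * M z.
    """
    n = A0.shape[0]
    Z, I = np.zeros((n, n)), np.eye(n)
    L = np.block([[Z, I], [-A0, -A1]])
    M = np.block([[I, Z], [Z, A2]])
    w = eig(L, M, right=False)       # eigenvalues only
    return w[np.isfinite(w)]         # infinite eigenvalues appear if A2 is singular

rng = np.random.default_rng(0)
n = 4
A0, A1, A2 = (rng.standard_normal((n, n)) for _ in range(3))
roots = det_roots(A0, A1, A2)
print(np.sort_complex(roots))
```

When A2 is nonsingular these 2n roots also give the whole determinant polynomial, since det(A0 + omega*A1 + omega^2*A2) = det(A2) * prod(omega - root), all without symbolic algebra. For the 318x318 case this is a dense 636x636 generalized eigenproblem, which LAPACK handles routinely.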
I'm trying to simulate a simple diffusion based on Fick's 2nd law.
from pylab import *
import numpy as np
gridpoints = 128
def profile(x):
    range = 2.
    straggle = .1576
    dose = 1
    return dose/(sqrt(2*pi)*straggle)*exp(-(x-range)**2/2/straggle**2)
x = linspace(0,4,gridpoints)
nx = profile(x)
dx = x[1] - x[0] # use np.diff(x) if x is not uniform
dxdx = dx**2
timestep = 0.5
steps = 21
diffusion_coefficient = 0.002
for i in range(steps):
    # 9-point (8th-order) second-derivative stencil; the centre weight is -205/72
    coefficients = [-1.785714e-3, 2.539683e-2, -0.2e0, 1.6e0, -2.847222e0,
                    1.6e0, -0.2e0, 2.539683e-2, -1.785714e-3]
    ccf = (np.convolve(nx, coefficients) / dxdx)[4:-4] # second order derivative
    nx = timestep*diffusion_coefficient*ccf + nx
For the first few time steps everything looks fine, but then I start to get high-frequency noise due to a build-up of numerical errors, which are amplified by the second derivative. Since it seems to be hard to increase the float precision, I'm hoping that there is something else I can do to suppress this. I already increased the number of points used to construct the second derivative.
I don't have the time to study your solution in detail, but it seems that you are solving the partial differential equation with a forward Euler scheme. This is pretty easy to implement, as you show, but it becomes numerically unstable if your timestep is too large. Your only options are to reduce the timestep or to coarsen the spatial grid (increase dx).
The easiest way to explain this is with the 1-D case: assume your concentration is a function of the spatial coordinate x and the timestep i. If you do the math (write down the equation and replace the partial derivatives with finite differences), you get something like this:
C(x, i+1) = [1 - 2 * k] * C(x, i) + k * [C(x - 1, i) + C(x + 1, i)]
So the concentration of a point at the next step depends on its previous value and on the values of its two neighbors. It is not too hard to see that when k = 0.5, every point gets replaced by the average of its two neighbors, so a concentration profile of [...,0,1,0,1,0,...] will become [...,1,0,1,0,1,...] on the next step. If k > 0.5, such a profile will blow up exponentially. You calculate your second-order derivative with a longer convolution (I effectively use [1,-2,1]), but that does not remove the instability; a wider stencil actually lowers the largest stable k.
Fick's second law has the same form as the heat equation, and with forward Euler and the [1,-2,1] stencil, k = dt * diffusion_coeff / dx^2. You therefore have to choose your timestep small enough that your simulation does not become unstable. To keep the simulation stable but still as fast as possible, choose your parameters so that k is a bit smaller than 0.5. Similar limits can be derived for 2-D and 3-D cases. The easiest way to achieve this is to increase dx, since the total calculation time scales as 1/dx^3 for a 1-D problem (1/dx grid points times 1/dx^2 timesteps), 1/dx^4 for 2-D problems, and even 1/dx^5 for 3-D problems.
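A von Neumann check puts numbers on this for the question's parameters (128 points on [0, 4], dt = 0.5, D = 0.002). The sketch computes the largest stable k for a symmetric stencil; the 9-point stencil is the question's one with its centre weight -205/72. The function name is illustrative.

```python
import numpy as np

def max_stable_k(stencil):
    """Largest k = D*dt/dx**2 for which forward Euler is stable with this symmetric stencil.

    Von Neumann analysis: a Fourier mode exp(1j*theta*m) is multiplied by
    g = 1 + k*s(theta) per step, where s(theta) = sum_m c_m cos(m*theta) is the
    stencil's symbol. |g| <= 1 for all theta requires k <= 2 / max|s(theta)|.
    """
    c = np.asarray(stencil, dtype=float)
    offsets = np.arange(len(c)) - len(c) // 2
    theta = np.linspace(0.0, np.pi, 2001)
    symbol = np.cos(np.outer(theta, offsets)) @ c
    return 2.0 / np.abs(symbol).max()

three_point = [1, -2, 1]
nine_point = [-1/560, 8/315, -1/5, 8/5, -205/72, 8/5, -1/5, 8/315, -1/560]

dx = 4 / 127                      # question: 128 points on [0, 4]
k = 0.5 * 0.002 / dx**2           # dt * diffusion_coefficient / dx**2
print(k, max_stable_k(three_point), max_stable_k(nine_point))
```

Here k is about 1.008, roughly twice the 0.5 limit of [1,-2,1] and more than three times the limit of about 0.31 for the 9-point stencil. That is why the noise appears after a few steps: the highest-frequency mode grows by a factor of about 5.5 per step.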
There are better methods for solving diffusion equations; I believe Crank-Nicolson is the standard for solving heat equations (which are also diffusion problems). The 'problem' is that it is an implicit method, which means that you have to solve a system of equations to calculate your concentration at the next timestep, which is a bit of a pain to implement. But this method is unconditionally stable, even for big timesteps.
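A minimal Crank-Nicolson sketch for the question's 1-D profile. It assumes zero concentration just outside both ends of the grid (the question doesn't specify a boundary condition) and uses the simple [1,-2,1] stencil. The left-hand matrix is factorised once with `scipy.sparse.linalg.splu` and reused every step, so each implicit step costs about as much as an explicit one. The function name is illustrative.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

def crank_nicolson(u0, D, dx, dt, steps):
    """Advance u_t = D u_xx by Crank-Nicolson, assuming u = 0 just outside both ends."""
    n = len(u0)
    k = D * dt / dx**2
    lap = sp.diags([np.ones(n - 1), -2.0 * np.ones(n), np.ones(n - 1)], [-1, 0, 1])
    I = sp.identity(n)
    lhs = splu((I - 0.5 * k * lap).tocsc())   # factorised once, reused every step
    rhs = (I + 0.5 * k * lap).tocsr()
    u = np.array(u0, dtype=float)
    for _ in range(steps):
        u = lhs.solve(rhs @ u)                # (I - k/2 L) u_new = (I + k/2 L) u_old
    return u

# Question's setup: Gaussian profile on 128 points over [0, 4]
x = np.linspace(0, 4, 128)
dx = x[1] - x[0]
straggle = 0.1576
u0 = np.exp(-(x - 2.0)**2 / (2 * straggle**2)) / (np.sqrt(2 * np.pi) * straggle)
u = crank_nicolson(u0, D=0.002, dx=dx, dt=0.5, steps=21)
print(u0.max(), u.max())
```

With the question's dt = 0.5 (k is about 1.008, twice the explicit limit) the solution stays smooth and bounded. Crank-Nicolson can still show small oscillations for very large k with sharp initial data, so a moderate timestep remains sensible for accuracy.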
I have a very large absorbing Markov chain (scales to problem size -- from 10 states to millions) that is very sparse (most states can react to only 4 or 5 other states).
I need to calculate one row of the fundamental matrix of this chain (the average frequency of each state given one starting state).
Normally, I'd do this by calculating (I - Q)^(-1), but I haven't been able to find a good library that implements a sparse matrix inverse algorithm! I've seen a few papers on it, most of them PhD-level work.
Most of my Google results point me to posts talking about how one shouldn't use a matrix inverse when solving linear (or non-linear) systems of equations... I don't find that particularly helpful. Is the calculation of the fundamental matrix similar to solving a system of equations, and I simply don't know how to express one in the form of the other?
So, I pose two specific questions:
What's the best way to calculate a row (or all the rows) of the inverse of a sparse matrix?
What's the best way to calculate a row of the fundamental matrix of a large absorbing Markov chain?
A Python solution would be wonderful (as my project is still currently a proof-of-concept), but if I have to get my hands dirty with some good ol' Fortran or C, that's not a problem.
Edit: I just realized that the inverse B of matrix A can be defined as AB=I, where I is the identity matrix. That may allow me to use some standard sparse matrix solvers to calculate the inverse... I've got to run off, so feel free to complete my train of thought, which I'm starting to think might only require a really elementary matrix property...
Assuming that what you're trying to work out is the expected number of steps before absorption, the equation from "Finite Markov Chains" (Kemeny and Snell), which is reproduced on Wikipedia, is:
t = N 1, where N = (I - Q)^(-1)
and 1 is a column vector of ones. Or, expanding the fundamental matrix:
(I - Q) t = 1
which is in the standard form Ax = b used by functions for solving systems of linear equations.
Putting this into practice demonstrates the difference in performance (even for much smaller systems than the ones you're describing).
import networkx as nx
import numpy
def example(n):
    """Generate a very simple transition matrix from a directed graph."""
    g = nx.DiGraph()
    for i in range(n-1):
        g.add_edge(i+1, i)
        g.add_edge(i, i+1)
    g.add_edge(n-1, n)
    g.add_edge(n, n)
    # order rows/columns as 0..n so that the absorbing state n comes last
    m = nx.to_numpy_array(g, nodelist=range(n+1))
    # normalize rows to ensure m is a valid right stochastic matrix
    m = m / numpy.sum(m, axis=1, keepdims=True)
    return m
Presenting the two alternative approaches for calculating the number of expected steps.
def expected_steps_fundamental(Q):
    I = numpy.identity(Q.shape[0])
    N = numpy.linalg.inv(I - Q)
    o = numpy.ones(Q.shape[0])
    return numpy.dot(N, o)
def expected_steps_fast(Q):
    I = numpy.identity(Q.shape[0])
    o = numpy.ones(Q.shape[0])
    return numpy.linalg.solve(I-Q, o)
Picking an example that's big enough to demonstrate the types of problems that occur when calculating the fundamental matrix:
P = example(2000)
# drop the absorbing state
Q = P[:-1,:-1]
Produces the following timings:
%timeit expected_steps_fundamental(Q)
1 loops, best of 3: 7.27 s per loop
%timeit expected_steps_fast(Q)
10 loops, best of 3: 83.6 ms per loop
Further experimentation is required to test the performance implications for sparse matrices, but it's clear that calculating the inverse is much, much slower than you might expect.
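For the sparse case in the question, a sketch with `scipy.sparse`: factorise (I - Q) once with `splu`, then each quantity is one triangular solve. Row i of N = (I - Q)^(-1) satisfies (I - Q)^T r = e_i, so the transposed solve (`trans="T"`) gives one row of the fundamental matrix without forming the inverse. The chain is the answer's example built directly as a sparse matrix; the helper names are illustrative.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

def example_Q(n):
    """Sparse transient block Q of the example chain (states 0..n-1; state n absorbing).

    State 0 moves to 1; states 1..n-1 move left or right with probability 1/2.
    """
    rows, cols, vals = [0], [1], [1.0]
    for i in range(1, n):
        rows += [i, i]
        cols += [i - 1, i + 1]
        vals += [0.5, 0.5]
    P = sp.csr_matrix((vals, (rows, cols)), shape=(n + 1, n + 1))
    return P[:n, :n]           # drop the absorbing state, as Q = P[:-1, :-1] does

def factor_I_minus_Q(Q):
    """Sparse LU factorization of (I - Q), computed once and reused for every solve."""
    return splu((sp.identity(Q.shape[0], format="csc") - Q).tocsc())

def expected_steps_sparse(lu):
    return lu.solve(np.ones(lu.shape[0]))          # (I - Q) t = 1

def fundamental_row(lu, i):
    """Row i of N = (I - Q)^-1 via the transposed system (I - Q)^T r = e_i."""
    e = np.zeros(lu.shape[0])
    e[i] = 1.0
    return lu.solve(e, trans="T")

n = 2000
lu = factor_I_minus_Q(example_Q(n))
t = expected_steps_sparse(lu)
row0 = fundamental_row(lu, 0)
```

For this particular chain the expected number of steps from state k works out to n^2 - k^2, which gives an easy sanity check. For chains with millions of states, the cost depends mainly on the fill-in of the LU factors; if that becomes a problem, an iterative solver from `scipy.sparse.linalg` is the usual next step.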
A similar approach to the one presented here can also be used for the variance of the number of steps
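A sketch of that for the variance, using the formula Var = (2N - I) t - t∘t from the same source (t∘t is the elementwise square). Each product with N is replaced by a solve against one LU factorization of (I - Q); the function name and test chain are illustrative.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def steps_mean_and_variance(Q):
    """Mean t and variance of the number of steps before absorption, without forming N."""
    Q = np.asarray(Q, dtype=float)
    lu = lu_factor(np.eye(Q.shape[0]) - Q)
    t = lu_solve(lu, np.ones(Q.shape[0]))   # t = N 1
    Nt = lu_solve(lu, t)                    # N t
    return t, 2 * Nt - t - t * t            # (2N - I) t - t*t

rng = np.random.default_rng(0)
Q = rng.random((6, 6))
Q = 0.9 * Q / Q.sum(axis=1, keepdims=True)  # hypothetical chain: absorbed w.p. 0.1 each step
t, var = steps_mean_and_variance(Q)
print(t, var)
```

In this test chain every state is absorbed with probability 0.1 per step, so the number of steps is geometric and every entry should come out as mean 10 and variance 0.9 / 0.1^2 = 90.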
The reason you're getting the advice not to use matrix inverses for solving equations is numerical stability. When your matrix has eigenvalues that are zero or near zero, you have problems either from the lack of an inverse (if zero) or from numerical instability (if near zero). The way to approach the problem, then, is to use an algorithm that doesn't require that an inverse exist. The solution is to use Gaussian elimination. This doesn't provide a full inverse; rather, it gets you to row-echelon form, a generalization of upper-triangular form. If you row-reduce the augmented matrix [A | I] and A is invertible, then the last row of the right-hand block, divided by the last pivot, is the last row of the inverse. So just order the unknowns (columns) so that the row you want comes last.
I'll leave it to you to understand why I-Q is always invertible.