Calculate 3D variant for summed area table using numpy cumsum - python

For a 2D array, array.cumsum(0).cumsum(1) gives the integral image of the array.
What happens if I compute array.cumsum(0).cumsum(1).cumsum(2) over a 3D array?
Do I get the 3D extension of the integral image, i.e. the integral volume over the array?
It's hard to visualize what happens in the 3D case.
I have gone through this discussion.
3D variant for summed area table (SAT)
That discussion gives a recursive way to compute the integral volume. What if I use cumsum along the three axes instead? Will it give me the same thing?
Will it be more efficient than the recursive method?

Yes, the formula you give, array.cumsum(0).cumsum(1).cumsum(2), will work.
What the formula does is accumulate partial sums along one axis at a time, so that after the third pass each entry holds the sum over the whole sub-box of elements below it. That is, every element needs to be summed exactly once: no element can be skipped and no element counted twice. Going through those two questions (is any element skipped? is any element counted twice?) is a good way to verify for yourself that this works. Also run a small test:
import numpy as np

x = np.ones((20, 20, 20)).cumsum(0).cumsum(1).cumsum(2)
print(x[2, 6, 10])  # 231.0
print(3 * 7 * 11)   # 231
Of course, with all ones there could be two errors that cancel each other out, but that wouldn't happen at every index, so it's a reasonable test.
As for efficiency, I would guess that a single-pass approach is probably faster, but not by a lot. Also, the above could be sped up by supplying an output array, e.g. cumsum(n, out=temp), since otherwise three new arrays are created for this calculation. The best way to know is to test (but only if you need to).
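A rough sketch of that optimization, assuming in-place accumulation via the out argument of ndarray.cumsum is acceptable for your dtype:
import numpy as np

a = np.random.rand(50, 50, 50)
vol = a.cumsum(0)         # the first axis allocates one new array
vol.cumsum(1, out=vol)    # reuse that array for the remaining axes
vol.cumsum(2, out=vol)    # vol now holds the integral volume of a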

Related

Difference between np.linalg.norm(a-b) and np.sqrt(np.sum(np.square(a-b)))?

I'm trying to find the Euclidean distance between two images for a K-nearest neighbor algorithm. However, upon exploring some distance functions, I'm facing this discrepancy.
norm1 = np.sqrt(np.sum(np.square(image1-image2)))
norm2 = np.linalg.norm(image1-image2)
Both of these lines seem to be giving different results. Upon trying the same thing with simple 3D Numpy arrays, I seem to get the same results, but with my images, the answers are different. I'm not sure which one is the correct one to use so, any help is welcome, thanks in advance!
Indeed, the two give different results in your case even though the approaches are mathematically equivalent. This is because image1 and image2 are likely of type uint8, and np.square does not cast the result to a bigger type. This means using np.square simply gives wrong results because of overflow. In fact, the subtraction already gives wrong results... You need to cast the input to a bigger type to avoid overflow. Here is an example:
norm1 = np.sqrt(np.sum(np.square(image1.astype(np.int32)-image2.astype(np.int32))))
norm2 = np.linalg.norm(image1.astype(np.int32)-image2.astype(np.int32))
With that, you should get almost the same result (possibly with a difference of a few ULPs, which should be negligible here).
Note that np.linalg.norm is likely significantly faster because it should not create temporary arrays as opposed to np.sqrt+np.sum+np.square.
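A tiny illustration of the overflow with made-up uint8 values:
import numpy as np

a = np.array([10], dtype=np.uint8)
b = np.array([200], dtype=np.uint8)

print(a - b)             # [66]: 10 - 200 wraps around modulo 256
print(np.square(a - b))  # [4]:  66**2 = 4356 also overflows the uint8 range
print(np.linalg.norm(a.astype(np.int32) - b.astype(np.int32)))  # 190.0, the correct value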

How to (robustly) find a local minimum on a 2D array in python?

I have a 2D array which represents the values of a certain function f(x, y), and I would like to detect the minimum of the array.
Generally it just looks like this, so it's easy to spot the minimum.
Example normal minimum
But sometimes there is a kind of drift, which means that the actual minimum is not the one I'm looking for.
Example failed minimum
On the above image the minimum I'm looking for is on the left, but the right part of the image has smaller values.
It is really important for me that I get an exact value, precise to the pixel, which is why I can't really use a maximum filter or stuff like that.
I'm looking for a computationally efficient way to detect this minimum, so I'd rather use an existing method instead of doing my own code.
To get the index of the smallest value in a 2D array I would suggest something like this:
find_smallest = lambda arr: np.unravel_index(np.argmin(arr), arr.shape)
Maybe you could try some of the solvers in scipy.optimize (for example Nelder-Mead). These will look for local minima.
Then, to get the minimum that you're looking for, you could try launching several optimizations from different random points and discarding the solutions that end up close to the border of the image. That is, supposing that your minimum is somewhere away from the border. It's an ugly method and probably not very efficient, but I have a little hope that it could work. You'll probably also have to tune the parameters.
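A rough sketch of that idea, with a hypothetical array f standing in for your data; it interpolates the grid so Nelder-Mead can evaluate non-integer coordinates and keeps the best solution that is not near the border:
import numpy as np
from scipy.interpolate import RegularGridInterpolator
from scipy.optimize import minimize

f = np.random.rand(100, 100)  # stand-in for your 2D array of f(x, y) values
interp = RegularGridInterpolator(
    (np.arange(f.shape[0]), np.arange(f.shape[1])), f)

def objective(p):
    p = np.clip(p, 0, np.array(f.shape) - 1)  # keep Nelder-Mead inside the grid
    return float(interp(p))

rng = np.random.default_rng(0)
best = None
for _ in range(20):  # several random starting points
    x0 = rng.uniform([0, 0], np.array(f.shape) - 1)
    res = minimize(objective, x0, method="Nelder-Mead")
    if np.any(res.x < 2) or np.any(res.x > np.array(f.shape) - 3):
        continue  # discard solutions that ended up near the border
    if best is None or res.fun < best.fun:
        best = res

if best is not None:
    print("candidate minimum near pixel", np.round(best.x).astype(int))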
Otherwise maybe there is some magic that can be done with the gradient function. I guess you'll have to look for some point that has positive derivatives all around it.

Eliminating the linearly dependent columns of a non-square matrix in Python

I have a matrix A = np.array([[1,1,1],[1,2,3],[4,4,4]]) and I want only the linearly independent rows in my new matrix. The answer might be A_new = np.array([[1,1,1],[1,2,3]]) or A_new = np.array([[1,2,3],[4,4,4]]).
Since I have a very large matrix, I need to decompose it into smaller, linearly independent, full-rank matrices. Can someone please help?
There are many ways to do this, and which way is best will depend on your needs. And, as you noted in your statement, there isn't even a unique output.
One way to do this would be to use Gram-Schmidt to find an orthogonal basis, where the first k vectors in this basis have the same span as the first k independent rows. If at any step you find a linear dependence, drop that row from your matrix and continue the procedure.
A simple way to do this with numpy would be
q, r = np.linalg.qr(A.T)
and then drop any columns where the diagonal entry r[i, i] is (numerically) zero.
For instance, you could do
A[np.abs(np.diag(r)) >= 1e-10]
While this will work perfectly in exact arithmetic, it may not work as well in finite precision. Almost any set of columns will be numerically independent, so you will need some kind of thresholding to decide when there is a linear dependence. If you use the built-in QR method, you will also have to make sure that a column is not dependent on columns which you previously dropped.
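Putting those two steps together for the example matrix above (a sketch; the caveat about previously dropped columns still applies, since np.linalg.qr does not pivot):
import numpy as np

A = np.array([[1., 1., 1.],
              [1., 2., 3.],
              [4., 4., 4.]])

q, r = np.linalg.qr(A.T)            # the columns of A.T are the rows of A
keep = np.abs(np.diag(r)) >= 1e-10  # threshold the diagonal of r
A_new = A[keep]
print(A_new)                        # [[1. 1. 1.]
                                    #  [1. 2. 3.]]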
If you need even more stability, you could iteratively solve the least squares problem
A.T[:, cols_kept_so_far] @ x = A.T[:, col_to_check]
using a stable direct method. If you can solve this exactly (with zero residual), then A.T[:, col_to_check] is dependent on the previously kept columns, with the combination given by x.
Which solver to use may also be dictated by your data type.
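A rough sketch of that iterative check, using np.linalg.lstsq as the solver and an arbitrary 1e-10 residual threshold:
import numpy as np

A = np.array([[1., 1., 1.],
              [1., 2., 3.],
              [4., 4., 4.]])

kept = [0]                                # indices of accepted (independent) rows
for i in range(1, A.shape[0]):
    x, *_ = np.linalg.lstsq(A.T[:, kept], A.T[:, i], rcond=None)
    residual = np.linalg.norm(A.T[:, kept] @ x - A.T[:, i])
    if residual > 1e-10:                  # not representable by the kept rows
        kept.append(i)

A_new = A[kept]
print(A_new)                              # [[1. 1. 1.]
                                          #  [1. 2. 3.]]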

Is there a way to tell what makes a particular numpy array singular?

I am trying to generate a few very large arrays, and at least one is ending up being singular, which is made obvious by this familiar error message:
File "C:\Anaconda3\lib\site-packages\numpy\linalg\linalg.py", line 90, in _raise_linalgerror_singular
raise LinAlgError("Singular matrix")
LinAlgError: Singular matrix
Of course I do not want my array to be singular, but I am more interested in determining WHY my array is singular. What I mean by this is that I would like to have a way to answer the following questions without manually checking each entry:
Is the array square? (I believe this is returned by a separate error message, which is convenient, but I'll include this as a singularity property anyway)
Are any rows populated only by zeros?
Are any columns populated only by zeros?
Are any rows not linearly independent of all other rows?
For relatively small arrays, the first two conditions are easily answered by visual inspection. However, because my arrays are substantially large, I do not want to have to go in and manually check each array element to see if any of those conditions are met.
I tried pulling up the linalg.py script to see how it determines that a matrix is singular, but I could not work it out.
I also tried searching for info online, and nothing seemed to be of help. Most topics seemed to only answer some form of the following questions: 1) "I want Python to tell me if my matrix is singular" or 2) "Why is Python giving me this error message?". Because I already know that my matrices are singular, neither of these questions is of importance to me.
Again, I am not looking for an answer along the lines of, "Oh, well this particular matrix is singular because . . .". I am looking for a method I can use immediately on ANY singular matrix to determine (especially for large arrays) what is causing the singularity.
Is there a built-in Python function that does this, or is there some other relatively simple way to do this before I try to create a function that will do this for me?
Singular matrices have at least one eigenvalue equal to zero. You can create a diagonalizable singular matrix by starting from its eigenvalue decomposition:
A = V D V^{-1}
where D is the diagonal matrix of eigenvalues. So create any invertible matrix V and a diagonal matrix D that has at least one zero on the diagonal, and then A will be singular.
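For example, a quick construction along those lines (the eigenvalues here are arbitrary illustrative values):
import numpy as np

rng = np.random.default_rng(0)
V = rng.random((3, 3))           # almost surely invertible eigenvector matrix
D = np.diag([2.0, 1.0, 0.0])     # one zero eigenvalue on the diagonal
A = V @ D @ np.linalg.inv(V)     # singular by construction

print(np.linalg.matrix_rank(A))  # 2
print(np.linalg.det(A))          # ~0, up to round-off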
The traditional way of checking is by computing an SVD. This is what the function numpy.linalg.matrix_rank uses to compute the rank, and you can then check if matrix_rank(M) == M.shape[0] (assuming a square matrix).
For more information, check out this excellent answer to a similar question for Matlab users.
The rank of the matrix tells you how many rows are linearly independent (i.e. not zero and not linear combinations of other rows), but not specifically which ones. It's a relatively fast operation, so it might be useful as a first-pass check.
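A hypothetical helper that bundles the checks listed in the question (squareness, zero rows/columns, and rank) could look like this sketch:
import numpy as np

def singularity_report(M):
    # Illustrative convenience function, not a library API.
    return {
        "square": M.shape[0] == M.shape[1],
        "zero_rows": np.flatnonzero(~M.any(axis=1)),
        "zero_cols": np.flatnonzero(~M.any(axis=0)),
        "rank": np.linalg.matrix_rank(M),
        "rank_deficiency": min(M.shape) - np.linalg.matrix_rank(M),
    }

M = np.array([[1., 2., 3.],
              [2., 4., 6.],   # a multiple of the first row
              [0., 0., 0.]])  # an all-zero row
print(singularity_report(M))
# square: True, zero_rows: [2], zero_cols: [], rank: 1, rank_deficiency: 2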

Fastest way to approximately compare values in large numpy arrays?

I have two arrays, array A with ~1M lines and array B with ~400K lines. Each contains, among other things, the coordinates of a point. For each point in array A, I need to find how many points in array B are within a certain distance of it. How do I avoid naively comparing everything to everything? The naive approach needs nested loops because the arrays are far too large to construct a distance matrix (400G entries!), and at the speed it ran at the start it would take 10+ days on my machine.
I thought the way to do it would be to check only a limited set of B coordinates against each A coordinate. However, I haven't found an easy way of doing that: what's the easiest/quickest way to make such a selection without checking all the values in B (which is exactly the task I'm trying to avoid)?
EDIT: I should've mentioned these aren't 2D (or nD) Cartesian, but spherical surface (lat/long), and distance is great-circle distance.
I cannot give a full answer right now, but here are some hints to get you started. It will be much more efficient to organise the points in B in a k-d tree. You can use the class scipy.spatial.KDTree to do this easily, and you can use its query_ball_point() method to request all the points within a given distance.
Here is one possible implementation of a cross-match between lists of points on the sphere using a k-d tree.
http://code.google.com/p/astrolibpy/source/browse/my_utils/match_lists.py
Another way is to use the healpy module and its get_neighbors method.
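A hedged sketch of the k-d tree route for lat/long data: convert the points to 3D unit vectors, then do a Euclidean ball query with the chord length that corresponds to the desired great-circle separation (the array names and the 1-degree radius below are made up):
import numpy as np
from scipy.spatial import cKDTree

def latlon_to_xyz(lat_deg, lon_deg):
    # Unit vectors on the sphere from latitude/longitude in degrees.
    lat, lon = np.radians(lat_deg), np.radians(lon_deg)
    return np.column_stack((np.cos(lat) * np.cos(lon),
                            np.cos(lat) * np.sin(lon),
                            np.sin(lat)))

rng = np.random.default_rng(0)
A_lat, A_lon = rng.uniform(-90, 90, 1000), rng.uniform(0, 360, 1000)  # stand-ins for A
B_lat, B_lon = rng.uniform(-90, 90, 400), rng.uniform(0, 360, 400)    # stand-ins for B

theta = np.radians(1.0)            # great-circle match radius of 1 degree
chord = 2.0 * np.sin(theta / 2.0)  # equivalent straight-line distance between unit vectors

tree = cKDTree(latlon_to_xyz(B_lat, B_lon))
neighbours = tree.query_ball_point(latlon_to_xyz(A_lat, A_lon), chord)
counts = np.array([len(n) for n in neighbours])  # matches in B for each point of A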
