I'm lost when iterating over an ndarray with nditer.
Background
I am trying to compute the eigenvalues of 3x3 symmetric matrices for each point in a 3D array.
My data is a 4D array of shape [6,x,y,z], where the 6 values are the entries of the symmetric matrix at point (x,y,z), over a ~500x500x500 cube of float32.
I first used numpy's eigvalsh, but it's optimized for large matrices, while I can use analytical simplification for 3x3 symmetric matrices.
I then implemented Wikipedia's simplification, both as a function that takes a single matrix and computes its eigenvalues (iterating naively with nested for loops), and as a vectorized version using numpy.
The problem is that inside the vectorized version, each operation creates a temporary array the size of my data, which ends up using too much RAM and freezing my PC.
I tried using numexpr etc., but memory usage is still around 10 GB.
What I'm trying to do
I want to iterate (using numpy's nditer) through my array so that for each matrix I compute its eigenvalues. This would remove the need to allocate huge intermediate arrays, because we'd only compute ~10 floats at a time.
Basically, I'm trying to replace the nested for loops with a single iterator.
I'm looking for something like this:
for a, b, c, d, e, f in np.nditer([symMatrix, eigenOut]):  # for each matrix in x,y,z
    # compute my output for this matrix
    eigenOut[...] = myLovelyEigenvalue(a, b, c, d, e, f)
The best I have so far is this:
for i in np.nditer([derived], [], [['readonly']], op_axes=[[1, 2, 3]]):
But this means that i takes every value of the 4D array one at a time, instead of being a tuple of length 6.
I can't seem to get the hang of the nditer documentation.
What am I doing wrong? Do you have any tips and tricks for iterating over "all but one" axis?
The point is to have an nditer that would outperform regular nested loops on iteration (once this works I'll work on function calls, buffered iteration, etc., but so far I just want it to work ^^).
You don't really need np.nditer for this. A simpler way of iterating over all but the first axis is just to reshape into a [6, 500 ** 3] array, transpose it to [500 ** 3, 6], then iterate over the rows:
for a, b, c, d, e, f in symMatrix.reshape(6, -1).T:
    # do something involving a, b, c, d, e, f...
If you really want to use np.nditer then you would do something like this:
for a, b, c, d, e, f in np.nditer(symMatrix, flags=['external_loop'], order='F'):
    # do something involving a, b, c, d, e, f...
A potentially important thing to consider is that if symMatrix is C-order (row-major) rather than Fortran-order (column-major), then iterating over the first dimension may be significantly faster than iterating over the last 3 dimensions, since you will then be accessing adjacent memory addresses. You might therefore want to consider switching to Fortran-order.
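For example, a minimal sketch of that switch (the shape here is a hypothetical, reduced stand-in for the real ~500^3 cube):

import numpy as np

# Hypothetical data: 6 unique entries of a symmetric 3x3 matrix per point.
symMatrix = np.random.rand(6, 50, 50, 50).astype(np.float32)

# One-time conversion: in Fortran order the first axis varies fastest,
# so the 6 entries of each matrix sit next to each other in memory.
symF = np.asfortranarray(symMatrix)

# symF.T is C-contiguous with shape (50, 50, 50, 6), so this reshape is
# a view and each row holds one matrix's 6 entries.
for a, b, c, d, e, f in symF.T.reshape(-1, 6):
    pass  # compute the eigenvalues for this matrix here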
I wouldn't expect a massive performance gain from either of these, since at the end of the day you're still doing all of your looping in Python and operating only on scalars rather than taking advantage of vectorization.
Related
I would like to vectorize nested sums in Python, in order to speed up the process. At the moment I have nested for loops.
for ja in np.arange(0, Na):
    for jb in np.arange(0, Nb):
        for ma in np.arange(-ja, ja+1):
            ...
The end result is the sum across 2x2 matrices, each with entries dependent on the values of ja,jb,ma,mb.
The matrices look like:
[[f11(ja,jb,ma,mb),f12(ja,jb,ma,mb)],
[f21(ja,jb,ma,mb),f22(ja,jb,ma,mb)]]
where fij are functions. The functions can be applied to arrays since they work element by element (exponentials, square roots, trig functions, etc.). I can create arrays like:
ja=[0,0,0,1,1,1,2,2,2,3,3,3]
jb=[0,1,2,3,0,1,2,3,0,1,2,3]
By using
range_a = np.arange(0, Na//2 + 1)
range_b = np.arange(0, Nb//2 + 1)
ja = np.repeat(range_a, Nb//2 + 1)
jb = np.tile(range_b, Na//2 + 1)
But my trouble is creating arrays such that for each value of j above we have the m structure (from -j to j):
ma=[0,0,0,-1,-1,-1,0,0,0,1,1,1,-2,-2,-2,...]
mb=[0,-1,0,1,-2,-1,0,1,2,...]
I am having trouble making those m arrays though! Each time a -j,..,j structure repeats it has a different length, so I cannot use functions like numpy.tile and numpy.repeat directly. So, any ideas on how to do this?
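For a single list of j values I can build the ragged -j,..,j blocks with np.concatenate over variable-length ranges, as in the small sketch below (values hypothetical), but I don't see how to do this efficiently or how to combine it with the tile/repeat structure above:

import numpy as np

# Ragged -j..j blocks for a few j values, stitched together.
j_values = np.array([0, 1, 2])
ma = np.concatenate([np.arange(-j, j + 1) for j in j_values])
print(ma)  # [ 0 -1  0  1 -2 -1  0  1  2]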
My further intentions might be relevant: I hope to be able to pad them with zeros and construct 2×length matrices, so that only one entry is populated (I need 4 matrices for each vector, for a total of 4 vectors). Then I can apply the functions and add these up. To calculate the summation across the 2×2 matrices I will take a dot product with a matrix of shape length×4. The result is 2×2. Perhaps a better strategy exists? I thought this might have occurred before, as it has a common application in physics (the trace over a density operator), but I have not found it.
I have two np.matrix objects, one of which I'm trying to normalize. I know that, in general, list comprehensions are faster than for loops, so I'm trying to convert my double for loop into a list comprehension.
# normalize the rows and columns of A by B
for i in range(1, q+1):
    for j in range(1, q+1):
        A[i-1, j-1] = A[i-1, j-1] / (B[i-1] / B[j-1])
This is what I have gotten so far:
A = np.asarray([A/(B[i-1]/B[j-1]) for i, j in zip(range(1,q+1), range(1,q+1))])
but I think I'm taking the wrong approach because I'm not seeing any significant time difference.
Any help would be appreciated.
First, if you really do mean np.matrix, stop using np.matrix. It has all sorts of nasty incompatibilities, and its role is obsolete now that @ for matrix multiplication exists. Even if you're stuck on a Python version without @, using the dot method with normal ndarrays is still better than dealing with np.matrix.
You shouldn't use any sort of Python-level iteration construct with NumPy arrays, whether for loops or list comprehensions, unless you're sure you have no better options. Assuming A is 2D and B is 1D with shapes (q, q) and (q,) respectively, what you should instead do for this case is
A *= B
A /= B[:, np.newaxis]
broadcasting the operation over A. This will allow NumPy to perform the iteration at C level directly over the arrays' underlying data buffers, without having to create wrapper objects and perform dynamic dispatch on every operation.
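As a quick check, a small self-contained example (random inputs, hypothetical q) showing that the broadcast version matches the double loop:

import numpy as np

q = 4
rng = np.random.default_rng(0)
A = rng.random((q, q))
B = rng.random(q)

# Reference: the original double loop.
expected = A.copy()
for i in range(1, q+1):
    for j in range(1, q+1):
        expected[i-1, j-1] = expected[i-1, j-1] / (B[i-1] / B[j-1])

A *= B                 # scales column j by B[j]
A /= B[:, np.newaxis]  # scales row i by 1/B[i]
print(np.allclose(A, expected))  # True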
I am very new to Python, and I am trying to get used to using NumPy's array operations rather than looping through arrays. Below is an example of the kind of looping operation I am doing, but I am unable to work out a suitable pure array operation that does not rely on loops:
import numpy as np

def f(arg1, arg2):
    # an arbitrary function
    ...

def myFunction(a1DNumpyArray):
    A = a1DNumpyArray
    # Create a square array with each dimension the size of the argument array.
    B = np.zeros((A.size, A.size))
    # Function f is a function of two elements of the 1D array. For each
    # element, i, I want to perform the function on it and every element
    # before it, and store the result in the square array, multiplied by
    # the difference between the ith and (i-1)th element.
    for i in range(A.size):
        B[i, :i] = f(A[i], A[:i])*(A[i]-A[i-1])
    # Sum through j and return full sums as 1D array.
    return np.sum(B, axis=0)
In short, I am integrating a function which takes two elements of the same array as arguments, returning an array of results of the integral.
Is there a more compact way to do this, without using loops?
The use of an arbitrary f function, and this [i, :i] business, complicates bypassing a loop.
Most of the fast compiled numpy operations work on the whole array, or whole rows and/or columns, and effectively do so in parallel. Loops that are inherently sequential (where the value from one iteration depends on the previous) don't fit well. Different-sized lists or arrays in each iteration are also a good indicator that 'vectorizing' will be difficult.
for i in range(A.size):
    B[i, :i] = f(A[i], A[:i])*(A[i]-A[i-1])
With a sample A and known f (as simple as arg1*arg2), I'd generate a B array, and look for patterns that treat B as a whole. At first glance it looks like your B is a lower triangle. There are functions to help index those. But that final sum might change the picture.
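For instance, a sketch of that experiment, assuming f(arg1, arg2) = arg1*arg2 for concreteness; np.tril keeps the strict lower triangle in one whole-array step:

import numpy as np

A = np.random.rand(10)

# Loop version, for reference.
B_loop = np.zeros((A.size, A.size))
for i in range(A.size):
    B_loop[i, :i] = A[i] * A[:i] * (A[i] - A[i-1])

# Whole-array version: outer product times per-row differences, with
# k=-1 zeroing the diagonal and everything above it.
diffs = A - np.roll(A, 1)  # A[i] - A[i-1]; row 0 is zeroed below anyway
B_vec = np.tril(A[:, None] * A[None, :] * diffs[:, None], k=-1)

print(np.allclose(B_loop, B_vec))  # True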
Sometimes I tackle these problems with a bottom up approach, trying to remove inner loops first. But in this case, I think some sort of big-picture approach is needed.
I have two boolean sparse square matrices of c. 80,000 x 80,000, generated from 12 MB of data (and I am likely to have orders of magnitude larger matrices when I use GBs of data).
I want to multiply them (which would produce a triangular matrix; however, I don't obtain one, since I don't constrain the dot product to yield a triangular result).
I am wondering what the best way of multiplying them is (memory-wise and speed-wise). I am going to do the computation on an m2.4xlarge AWS instance, which has >60 GB of RAM. I would prefer to keep the calculation in RAM for speed reasons.
I appreciate that SciPy supports sparse matrices, and that h5py can store large arrays on disk, but I have no experience with either.
What's the best option to go for?
Thanks in advance
UPDATE: sparsity of the boolean matrices is <0.6%
If your matrices are relatively empty it might be worthwhile encoding them as a data structure of the non-False values. Say a list of tuples describing the location of the non-False values. Or a dictionary with the tuples as the keys.
If you use e.g. a list of tuples you could use a list comprehension to find the items in the second list that can be multiplied with an element from the first list.
a = [(0, 0), (3, 7), (5, 2)]  # et cetera
b = ...  # idem
for r, c in a:
    # an entry (j, k) in b combines with (r, c) when j == c,
    # contributing to position (r, k) of the product
    res = [(r, k) for j, k in b if j == c]
You're asking how to multiply matrices fast and easy.
SOLUTION 1: This is a solved problem: use numpy. All these operations are easy in numpy and, since they are implemented in C, they are blazingly fast.
http://www.numpy.org/
http://www.scipy.org
also see:
Very large matrices using Python and NumPy
http://docs.scipy.org/doc/scipy/reference/sparse.html
SciPy and NumPy have sparse matrices and matrix multiplication. It doesn't use much memory, since (at least if it's written the way I would write it in C) it probably uses linked lists, and thus only uses the memory required for the actual data points, plus some overhead. And it will almost certainly be blazingly fast compared to a pure Python solution.
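As an illustration, a minimal sketch with scipy.sparse (size and density are hypothetical, scaled down from 80,000 x 80,000 at <0.6%):

import scipy.sparse as sp

# Two random sparse matrices at roughly 0.6% density; the stored values
# are set to 1 so they behave as boolean indicator matrices.
n = 8000
a = sp.random(n, n, density=0.006, format='csr')
b = sp.random(n, n, density=0.006, format='csr')
a.data[:] = 1
b.data[:] = 1

c = a @ b              # sparse product; only nonzero entries are stored
print(c.nnz, c.shape)  # stored-entry count and shape of the result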
SOLUTION 2
Another answer here suggests storing values as tuples of (x, y), presuming the value is False unless a tuple exists, in which case it's True. An alternative is a numeric matrix stored as (x, y, value) tuples.
REGARDLESS: Multiplying these would be nasty time-wise: find one element, decide which other array element to multiply it by, then search the entire dataset for that specific tuple, and if it exists, multiply and insert the result into the result matrix.
SOLUTION 3 (PREFERRED vs. Solution 2, IMHO)
I would prefer this because it's simpler / faster.
Represent your sparse matrix with a set of dictionaries. Matrix one is a dict with the element at (x, y) and value v being (with x1,y1, x2,y2, etc.):
matrixDictOne = { 'x1:y1' : v1, 'x2:y2': v2, ... }
matrixDictTwo = { 'x1:y1' : v1, 'x2:y2': v2, ... }
Since a Python dict lookup is O(1) on average, it's fast. This does not require searching the entire second matrix's data for element presence before multiplication. So it's fast. It's easy to write the multiply and easy to understand the representations.
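A sketch of that multiply, assuming (row, col) tuples as dict keys rather than 'x:y' strings (which avoids string parsing; the scheme is otherwise the same):

from collections import defaultdict

def sparse_dict_matmul(m1, m2):
    # (i, j) in m1 pairs with (j, k) in m2, contributing v1*v2 to (i, k).
    rows2 = defaultdict(list)
    for (j, k), v in m2.items():
        rows2[j].append((k, v))
    out = defaultdict(int)
    for (i, j), v1 in m1.items():
        for k, v2 in rows2.get(j, ()):
            out[(i, k)] += v1 * v2
    return dict(out)

m1 = {(0, 0): 1, (3, 7): 1, (5, 2): 1}
m2 = {(7, 4): 1, (2, 0): 1}
print(sparse_dict_matmul(m1, m2))  # {(3, 4): 1, (5, 0): 1}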
SOLUTION 4 (if you are a glutton for punishment)
Code this solution by using a memory-mapped file of the required size. Initialize the file with null values, compute the offsets yourself, and write to the appropriate locations in the file as you do the multiplication. Linux has a VMM which will page in and out for you with little overhead or work on your part. This is a solution for very, very large matrices that are NOT SPARSE and thus won't fit in memory.
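A minimal sketch of that approach with numpy's memmap (filename and size are illustrative; an 80,000 x 80,000 float32 result would be a ~25 GB file):

import numpy as np

# The OS pages blocks of this file in and out as needed, so the whole
# result never has to fit in RAM at once.
n = 4000  # illustrative size
out = np.memmap('result.dat', dtype=np.float32, mode='w+', shape=(n, n))

# Fill the result block by block at computed offsets; one block shown.
out[0:1000, 0:1000] = 0.0
out.flush()  # push dirty pages to disk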
Note this addresses the objection that the data won't fit in memory. However, the OP did say sparse, which implies very few actual data points spread out in giant arrays, and NumPy / SciPy handle this natively and thus nicely (lots of people at Fermilab use NumPy / SciPy regularly; I'm confident the sparse matrix code is well tested).
I want to create a matrix of size 1234x5678, filled with the values 1 to 5678 in each row (row-major order).
I think you will need to use numpy to hold such a big matrix efficiently, not just for computation. You have ~7e6 items of 4/8 bytes, which means roughly 28/56 MB in pure C already, and several times that in pure Python without an efficient data structure (a list of rows, each row a list).
Now, concerning your question:
import numpy as np
a = np.empty((1234, 5678), dtype=np.int32)
a[:] = np.linspace(1, 5678, 5678)
You first create an array of the requested size, with type np.int32 (assuming you want 4-byte integers). The 3rd line uses broadcasting so that each row (a[0], a[1], ..., a[1233]) is assigned the values of the np.linspace line (which gives you an array [1, ..., 5678]). If you want F storage, that is column major:
a = np.empty((1234, 5678), dtype=np.int32, order='F')
...
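As a quick sanity check of the broadcasting step, on a small hypothetical shape:

import numpy as np

# Each row receives the same 1..5 sequence via broadcasting.
small = np.empty((3, 5), dtype=np.int32)
small[:] = np.linspace(1, 5, 5)
print(small[0])   # [1 2 3 4 5]
print(small[-1])  # [1 2 3 4 5]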
The matrix a will take only a tiny amount of memory more than an array in C, and for computation at least, the indexing capabilities of NumPy arrays are much better than those of Python lists.
A nitpick: Numeric is the name of the old numerical package for Python; the recommended package is numpy.
Or just use Numerical Python if you want to do some mathematical stuff on matrices too (like multiplication, ...). Whether they use row-major order for the matrix layout in memory I can't tell you, but it's covered in their documentation.
Here's a forum post that has some code examples of what you are trying to achieve.