Subtraction equivalent of itertools.product() - python

I am a college student working on a project analysing some large datasets.
Simplifying my problem, I have two sets of points, in matrices "A" and "B",
such that:
A = [[x1, y1], [x2, y2],...] and B = [[x'1, y'1], [x'2, y'2],...]
I would like to create a function which outputs a Matrix, C, with elements:
Cij = atan((y'i - yj)/(x'i - xj))
Essentially, the angle (with respect to the x-axis) of the line connecting any two points, one from each list.
The dataset is large enough that nested for loops are not an option.
My attempts so far have led me to the itertools.product function.
If there was an equivalent which provided a subtraction between the elements (i.e y'i-yj ) then I would be able to go from there quite simply.
Is anyone aware of something which would provide this functionality?
Or perhaps any other way of achieving the angle between all of these points without a slow iterative process?
Thanks in advance,

Use NumPy broadcasting for these computations:
import numpy as np
A = np.array(A)
B = np.array(B)
C = np.arctan((B[:, None, 1] - A[:, 1]) / (B[:, None, 0] - A[:, 0]))
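For the literal "subtraction equivalent of itertools.product", NumPy ufuncs also have an outer method. Below is a small sketch with made-up sample points (the values in A and B are assumptions for illustration only). It also swaps in np.arctan2, which is not what the answer above uses: it avoids dividing by zero when two points share an x coordinate, but it returns angles in (-pi, pi] rather than the range of arctan.

```python
import numpy as np

# Hypothetical sample points, chosen only for illustration.
A = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
B = np.array([[1.0, 0.0], [0.0, 2.0]])

# All pairwise differences: dy[i, j] = B[i, 1] - A[j, 1], same for dx.
dy = np.subtract.outer(B[:, 1], A[:, 1])
dx = np.subtract.outer(B[:, 0], A[:, 0])

# Quadrant-aware angle; no division, so dx == 0 is not a problem.
C = np.arctan2(dy, dx)  # shape (len(B), len(A))
```

If the arctan convention is required, the division form from the answer can be kept and the dx == 0 entries handled separately.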


Efficient way of constructing a 3D stack of block diagonal matrix in numpy/scipy from a 3D stack of matrices

I am trying to construct a stack of block diagonal matrices of shape n x M x M in numpy/scipy from given stacks of matrices (each n x m x m), where M = k*m and k is the number of stacks. At the moment, I'm using the scipy.linalg.block_diag function in a for loop to perform this task:
import numpy as np
import scipy.linalg as linalg
a = np.ones((5,2,2))
b = np.ones((5,2,2))
c = np.ones((5,2,2))
result = np.zeros((5,6,6))
for k in range(0,5):
    result[k,:,:] = linalg.block_diag(a[k,:,:],b[k,:,:],c[k,:,:])
However, since n is getting quite large in my case, I'm looking for a more efficient way than a for loop. I found "3D numpy array into block diagonal matrix", but this does not really solve my problem. The only thing I could think of is transforming each stack of matrices into a block diagonal matrix
import numpy as np
import scipy.linalg as linalg
a = np.ones((5,2,2))
b = np.ones((5,2,2))
c = np.ones((5,2,2))
a = linalg.block_diag(*a)
b = linalg.block_diag(*b)
c = linalg.block_diag(*c)
and constructing the resulting matrix from it by reshaping
result = linalg.block_diag(a,b,c)
result = result.reshape((5,6,6))
but this cannot be reshaped that way. I don't even know if this approach would be more efficient, so I'm asking whether I'm on the right track, whether somebody knows a better way of constructing this block diagonal 3D matrix, or whether I have to stick with the for loop solution.
Since I'm new to this platform, I don't know where to leave this (Edit or Answer?), but I want to share my final solution: The highlighted solution from panadestein worked very nicely and easily, but I'm now using higher dimensional arrays, where my matrices reside in the last two dimensions. Additionally, my matrices are no longer of the same dimension (mostly a mixture of 1x1, 2x2, 3x3), so I adopted V. Ayrat's solution with minor changes:
def nd_block_diag(arrs):
    shapes = np.array([i.shape for i in arrs])
    out = np.zeros(np.append(np.amax(shapes[:,:-2],axis=0), [shapes[:,-2].sum(), shapes[:,-1].sum()]))
    r, c = 0, 0
    for i, (rr, cc) in enumerate(shapes[:,-2:]):
        out[..., r:r + rr, c:c + cc] = arrs[i]
        r += rr
        c += cc
    return out
which also works with array broadcasting, if the input arrays are shaped properly (i.e. the dimensions that are to be broadcast are not added automatically). Thanks to pandestein and V. Ayrat for your kind and fast help, I've learned a lot about the possibilities of list comprehensions and array indexing/slicing!
block_diag also just iterates through the shapes. Almost all of the time is spent copying data, so you can do it whichever way you want, for example with a small change to the source code of block_diag:
arrs = a, b, c
shapes = np.array([i.shape for i in arrs])
out = np.zeros([shapes[0, 0], shapes[:, 1].sum(), shapes[:, 2].sum()])
r, c = 0, 0
for i, (_, rr, cc) in enumerate(shapes):
    out[:, r:r + rr, c:c + cc] = arrs[i]
    r += rr
    c += cc
print(np.allclose(result, out))
# True
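The same slicing idea can be wrapped in a function and checked on blocks of different sizes. This is only a sketch: the function name stacked_block_diag and the sample block sizes and fill values are assumptions made for illustration, not part of the answer above.

```python
import numpy as np

def stacked_block_diag(*arrs):
    """Block-diagonal stack from arrays of shape (n, r_i, c_i).

    Hypothetical helper: loops over the few blocks, not over n.
    """
    n = arrs[0].shape[0]
    rows = sum(a.shape[1] for a in arrs)
    cols = sum(a.shape[2] for a in arrs)
    out = np.zeros((n, rows, cols), dtype=np.result_type(*arrs))
    r = c = 0
    for a in arrs:
        rr, cc = a.shape[1:]
        # One slice assignment fills this block for all n matrices at once.
        out[:, r:r + rr, c:c + cc] = a
        r += rr
        c += cc
    return out

# Sample stacks with 2x2, 1x1 and 3x3 blocks (values chosen for illustration).
a = np.full((5, 2, 2), 1.0)
b = np.full((5, 1, 1), 2.0)
c = np.full((5, 3, 3), 3.0)
out = stacked_block_diag(a, b, c)  # shape (5, 6, 6)
```

The Python-level loop runs once per block rather than once per stack entry, so its cost does not grow with n.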
I don't think that you can escape all possible loops to solve your problem. One way that I find convenient and perhaps more efficient than your for loop is to use a list comprehension:
import numpy as np
from scipy.linalg import block_diag
# Define input matrices
a = np.ones((5, 2, 2))
b = np.ones((5, 2, 2))
c = np.ones((5, 2, 2))
# Generate block diagonal matrices
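# Stack along axis 1 so that mats[i] holds a[i], b[i], c[i]; a plain reshape of
# np.array([a, b, c]) to (5, 3, 2, 2) would mix entries from different stack positions
mats = np.stack([a, b, c], axis=1)  # shape (5, 3, 2, 2)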
result = [block_diag(*bmats) for bmats in mats]
Maybe this can give you some ideas to improve your implementation.

Rolling/Increasing dimensionality of a NumPy array

I'm currently trying to find an easy way to do the following operation to an N dimensional array in Python. For simplicity let's start with a 1 dimensional array of size 4.
X = np.array([1,2,3,4])
What I want to do is create a new array, call it Y, such that:
Y = np.array([[1,2,3,4],[2,3,4,1],[3,4,1,2],[4,1,2,3]])
So what I'm trying to do is create an array Y such that:
Y[:,i] = np.roll(X[:],-i, axis = 0)
I know how to do this using for loops, but I'm looking for a faster method of doing so. The actual array I'm trying to do this to is a 3 dimensional array, call it X. What I'm looking for is a way to find an array Y, such that:
Y[:,:,:,i,j,k] = np.roll(X[:,:,:],(-i,-j,-k),axis = (0,1,2))
I can do this using the itertools.product class using for loops, but this is quite slow. If anyone has a better way of doing this, please let me know. I also have CUPY installed with a GTX-970, so if there's a way of using CUDA to do this faster please let me know. If anyone wants some more context please let me know.
Here is my original code for computing the position space two point correlation function. The array x0 is an n by n by n real valued array representing a real scalar field. The function iterate(j,s) runs j iterations. Each iteration consists of generating a random float between -s and s for each lattice site. It then computes the change in the action dS and accepts the change with a probability of min(1,exp^(-dS))
def momentum(k,j,s):
    global Gxa
    Gx = numpy.zeros((n,n,t))
    for i1 in range(0,k):
        for i2,i3,i4 in itertools.product(range(0,n),range(0,n),range(0,n)):
            x1 = numpy.roll(numpy.roll(numpy.roll(x0, -i2, axis = 0),-i3, axis = 1),-i4,axis = 2)
            x2 = numpy.mean(numpy.multiply(x0,x1))
            Gx[i2,i3,i4] = x2
        Gxa = Gxa + Gx
    Gxa = Gxa/k
Approach #1
We can extend the idea of concatenating the array with a sliced copy of itself and then taking sliding windows to our 3D array case here. So, simply concatenate with sliced versions along the three dims and then use scikit-image's view_as_windows (which is based on np.lib.stride_tricks.as_strided) to efficiently get the final output as a strided view of the concatenated version, like so -
from skimage.util.shape import view_as_windows
X1 = np.concatenate((X,X[:,:,:-1]),axis=2)
X2 = np.concatenate((X1,X1[:,:-1,:]),axis=1)
X3 = np.concatenate((X2,X2[:-1,:,:]),axis=0)
out = view_as_windows(X3,X.shape)
Approach #2
For really large arrays, we might want to initialize the output array and then re-use X3 from the earlier approach, assigning slices of it into the output. This slicing would be faster than the original rolling. The implementation would be -
m,n,r = X.shape
Yout = np.empty((m,n,r,m,n,r),dtype=X.dtype)
for i in range(m):
    for j in range(n):
        for k in range(r):
            Yout[:,:,:,i,j,k] = X3[i:i+m,j:j+n,k:k+r]
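A self-contained way to check the windowing idea without scikit-image is sketched below. It swaps in NumPy's own sliding_window_view (assuming NumPy 1.20 or newer) for view_as_windows; the helper name rolled_views and the small test array are assumptions for illustration. Note the axis layout: with windows, the shift indices come first, so W[i, j, k] is X rolled by (-i, -j, -k), and a transpose gives the Y[:, :, :, i, j, k] layout from the question.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolled_views(X):
    """All cyclic shifts of a 3D array as a strided view (hypothetical helper)."""
    # Pad each axis with all but its last slice so every window wraps around.
    X1 = np.concatenate((X, X[:, :, :-1]), axis=2)
    X2 = np.concatenate((X1, X1[:, :-1, :]), axis=1)
    X3 = np.concatenate((X2, X2[:-1, :, :]), axis=0)
    # Result has shape X.shape + X.shape; W[i, j, k] is one shifted copy.
    return sliding_window_view(X3, X.shape)

X = np.arange(2 * 3 * 4).reshape(2, 3, 4)
W = rolled_views(X)

# Reorder axes so that Y[:, :, :, i, j, k] is the shifted array (still a view).
Y = W.transpose(3, 4, 5, 0, 1, 2)
```

Both W and Y are read-only views on the padded array, so no 6D copy is made until one is requested explicitly.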

How to vectorize 3D Numpy arrays

I have a 3D numpy array like a = np.zeros((100,100, 20)). I want to perform an operation over every x,y position that involves all the elements over the z axis and the result is stored in an array like b = np.zeros((100,100)) on the same corresponding x,y position.
Right now I'm doing it using a for loop:
d_n = np.array([...]) # a parameter with the same shape as b
for (x,y), v in np.ndenumerate(b):
    C = a[x,y,:]
    ### calculate some_value using C
    minv = sys.maxint
    depth = -1
    for d in range(len(C)):
        e = 2.5 * float(math.pow(d_n[x,y] - d, 2)) + C[d] * 0.05
        if e < minv:
            minv = e
            depth = d
    some_value = depth
    if depth == -1:
        some_value = len(C) - 1
    b[x,y] = some_value
The problem now is that this operation is much slower than others done the pythonic way, e.g. c = b * b (I actually profiled this function and it's around 2 orders of magnitude slower than others using numpy built in functions and vectorized functions, over a similar number of elements)
How can I improve the performance of such kind of functions mapping a 3D array to a 2D one?
What is usually done in 3D images is to swap the Z axis to the first index:
>>> a = a.transpose((2,0,1))
>>> a.shape
(20, 100, 100)
And now you can easily iterate over the Z axis:
>>> for slice in a:
...     do something
Each slice here is one of the 100x100 layers of your 3D array. Additionally, transposing lets you access each 2D slice directly by indexing the first axis. For example, a[10] gives you the 11th 100x100 slice.
Bonus: If you store the data contiguously, either by creating it in this layout without transposing or by converting it with a = np.ascontiguousarray(a.transpose((2,0,1))), access to your 2D slices will be faster since they are laid out contiguously in memory.
Obviously you want to get rid of the explicit for loop, but I think whether this is possible depends on what calculation you are doing with C. As a simple example,
a = np.zeros((100,100, 20))
a[:,:] = np.linspace(1,20,20) # example data: 1,2,3,.., 20 as "z" for every "x","y"
b = np.sum(a[:,:]**2, axis=2)
will fill the 100 by 100 array b with the sum of the squared "z" values of a, that is 1+4+9+...+400 = 2870.
If your inner calculation is sufficiently complex, and not amenable to vectorization, then your iteration structure is good, and does not contribute significantly to the calculation time:
for (x,y), v in np.ndenumerate(b):
    C = a[x,y,:]
    for d in range(len(C)):
        ... # complex, not vectorizable calc
    b[x,y] = some_value
There doesn't appear to be a special structure in the 1st 2 dimensions, so you could just as well think of it as 2D mapping on to 1D, e.g. mapping a (N,20) array onto a (N,) array. That doesn't speed up anything, but may help highlight the essential structure of the problem.
One step is to focus on speeding up that C to some_value calculation. There are functions like cumsum and cumprod that help you do sequential calculations on a vector. cython is also a good tool.
A different approach is to see if you can perform that internal calculation over the N values all at once. In other words, if you must iterate, it is better to do so over the smallest dimension.
In a sense this is a non-answer. But without full knowledge of how you get some_value from C and d_n, I don't think we can do more.
It looks like e can be calculated for all points at once:
# original, one element at a time:
#   e = 2.5 * float(math.pow(d_n[x,y] - d, 2)) + C[d] * 0.05
# vectorized over all x, y and d:
E = 2.5 * (d_n[...,None] - np.arange(a.shape[-1]))**2 + a * 0.05 # (100,100,20)
E.min(axis=-1) # smallest value along the last dimension
E.argmin(axis=-1) # index of where that min occurs
On first glance it looks like this E.argmin is the b value that you want (tweaked for some boundary conditions if needed).
I don't have realistic a and d_n arrays, but with simple test ones, this E.argmin(-1) matches your b, with a 66x speedup.
How can I improve the performance of such kind of functions mapping a 3D array to a 2D one?
Many functions in Numpy are "reduction" functions*, for example sum, any, std, etc. If you supply an axis argument other than None to such a function it will reduce the dimension of the array over that axis. For your code you can use the argmin function, if you first calculate e in a vectorized way:
d = np.arange(a.shape[2])
e = 2.5 * (d_n[...,None] - d)**2 + a*0.05
b = np.argmin(e, axis=2)
The indexing with [...,None] is used to engage broadcasting. The values in e are floating point values, so it's a bit strange to compare to sys.maxint but there you go:
I, J = np.indices(b.shape)
b[e[I,J,b] >= sys.maxint] = a.shape[2] - 1
* Strictly speaking, a reduction function is of the form reduce(operator, sequence), so technically std and argmin are not reductions.
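To see that the vectorized argmin reproduces the loop, here is a small self-contained comparison. It is a sketch under assumptions: the random test data, the array sizes, and the function names depth_loop and depth_vectorized are made up for illustration, and np.inf replaces sys.maxint as the initial minimum so that the code runs on Python 3.

```python
import numpy as np

def depth_loop(a, d_n):
    """Per-pixel loop, following the structure of the question's code."""
    b = np.zeros(a.shape[:2], dtype=int)
    for (x, y), _ in np.ndenumerate(b):
        C = a[x, y, :]
        minv, depth = np.inf, -1  # np.inf stands in for sys.maxint (assumption)
        for d in range(len(C)):
            e = 2.5 * (d_n[x, y] - d) ** 2 + C[d] * 0.05
            if e < minv:
                minv, depth = e, d
        b[x, y] = depth if depth != -1 else len(C) - 1
    return b

def depth_vectorized(a, d_n):
    """Same cost for every (x, y, d) at once, then argmin over the last axis."""
    d = np.arange(a.shape[2])
    e = 2.5 * (d_n[..., None] - d) ** 2 + a * 0.05  # broadcasts to a.shape
    return np.argmin(e, axis=2)

# Small random test arrays (sizes and value ranges are arbitrary).
rng = np.random.default_rng(0)
a = rng.random((6, 5, 20))
d_n = rng.uniform(0, 19, size=(6, 5))

b_loop = depth_loop(a, d_n)
b_vec = depth_vectorized(a, d_n)
```

Both versions pick the first index at which the minimum occurs, so ties are resolved the same way.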

Calculating long expressions using Numpy (coordinate transform)?

In Python's NumPy module, is there a function that can calculate long/advanced math expressions on an array? I heard of the numexpr module but want to steer clear of further dependencies.
Better yet, can I limit these expressions to only say the first or second element of the sub arrays within my array, without having to unpack them as separate arrays?
Here is my specific problem. I have an array of arrays containing geographic point coordinates looking like this: [[x1,y1],[x2,y2],[x3,y3],etc...]. What I want is to transform these geocoords to pixel coordinates so they can be drawn on an image. I therefore want to run the following expression/calculation on the first element of each subarray, ie the xs:
And on the second element, ie the ys:
These expressions would work in a Python for loop, but that is too slow, which is why I'm turning to NumPy. I know I can chain NumPy's individual math operator functions one after another, and I have tried that, but it is still too slow. Besides, to do that I first had to unpack all the xs and ys into separate arrays and repack them after the calculation, making it even slower.
So I guess I'm looking for a more direct Numpy way using less steps to transform my coordinate array using the expressions above. Any ideas?
import numpy as np
points = np.random.rand(10,2)
translation = np.array([180,-90])
scaling = np.array([1024, -768]) / np.array([360,180])
transformed_points = (points + translation) * scaling
This will do what you are looking for. It relies on numpy broadcasting rules to achieve expressiveness and performance.
But rather than explaining exactly how that works, I think you are better off finding yourself a good numpy primer and starting at the top. numpy is one of the best things about python, and you can't go wrong learning a little more about it. Suffice it to say, numpy is certainly up to the kind of task you are facing.
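As a quick sanity check of the broadcasting transform, the sketch below wraps it in a function and applies it to three known points. The function name geo_to_pixel and its width/height parameters are assumptions for illustration; the defaults are simply the 1024 and 768 used in the snippet above.

```python
import numpy as np

def geo_to_pixel(points, width=1024, height=768):
    """Map (lon, lat) rows to pixel (x, y) rows via broadcasting (sketch)."""
    points = np.asarray(points, dtype=float)
    translation = np.array([180.0, -90.0])
    # Negative y scale flips the axis so that latitude +90 ends up at row 0.
    scaling = np.array([width, -height]) / np.array([360.0, 180.0])
    # The (2,) vectors broadcast against every (x, y) row of the (N, 2) array.
    return (points + translation) * scaling

# Top-left corner, bottom-right corner and centre of the map.
corners = np.array([[-180.0, 90.0], [180.0, -90.0], [0.0, 0.0]])
pixels = geo_to_pixel(corners)
expected = np.array([[0.0, 0.0], [1024.0, 768.0], [512.0, 384.0]])
```

No unpacking into separate x and y arrays is needed, because the per-column constants live in length-2 arrays.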
I'm a little confused because I'm not sure exactly what you're saying you already tried, or what the speed condition for success is.
Are you saying you already tried something like the following, but it is too slow?
arr = whatever
arr[:,0] = (arr[:,0] + 180) / (360 * screenwidthpixels)
arr[:,1] = 180 - (arr[:,1] - 90) / (180 * screenheightpixels)
I'm not sure what you mean by "having to unpack" to X and Y. Here's how you avoid unpacking (if I understand correctly):
arr = np.array([ [x1,y1], [x2,y2], [x3,y3] ])
arr.shape # => (3, 2)
X = arr[:,0] # fast, creates a view
Y = arr[:,1] # fast too
Further speed up can be achieved by rewriting/simplifying your expressions.
((X+180)/360)/s => (X+180)/(360*s)
(180-((Y+90)/180))/s => (180/s-1/(2*s)) - Y/(180*s)
In the first rewrite, the array is traversed 2 times instead of 3, and in the second, only twice instead of 4 times.
In [235]: xs=arange(1000)
In [236]: ys=arange(1, 1001)
In [237]: a=array([xs, ys]).T
In [238]: a
array([[   0,    1],
       [   1,    2],
       [   2,    3],
       ...,
       [ 997,  998],
       [ 998,  999],
       [ 999, 1000]])
In [240]: a[:, 0]=(a[:, 0]+180)/360/1024
a[:, 0] gives a view of the first column of a; it is fast and saves memory.

Speed up python code for computing matrix cofactors

As part of a complex task, I need to compute matrix cofactors. I did this in a straightforward way using this nice code for computing matrix minors. Here is my code:
def matrix_cofactor(matrix):
    C = np.zeros(matrix.shape)
    nrows, ncols = C.shape
    for row in xrange(nrows):
        for col in xrange(ncols):
            minor = matrix[np.array(range(row)+range(row+1,nrows))[:,np.newaxis],
                           np.array(range(col)+range(col+1,ncols))]
            C[row, col] = (-1)**(row+col) * np.linalg.det(minor)
    return C
It turns out that this matrix cofactor code is the bottleneck, and I would like to optimize the code snippet above. Any ideas as to how to do this?
If your matrix is invertible, the cofactor is related to the inverse:
def matrix_cofactor(matrix):
return np.linalg.inv(matrix).T * np.linalg.det(matrix)
This gives large speedups (~ 1000x for 50x50 matrices). The main reason is fundamental: this is an O(n^3) algorithm, whereas the minor-det-based one is O(n^5).
This probably means that for non-invertible matrices, too, there is some clever way to calculate the cofactor (i.e., not using the mathematical formula that you use above, but some other equivalent definition).
If you stick with the det-based approach, what you can do is the following:
The majority of the time seems to be spent inside det. (Check out line_profiler to find this out yourself.) You can try to speed that part up by linking Numpy with the Intel MKL, but other than that, there is not much that can be done.
You can speed up the other part of the code like this:
minor = np.zeros([nrows-1, ncols-1])
for row in xrange(nrows):
    for col in xrange(ncols):
        minor[:row,:col] = matrix[:row,:col]
        minor[row:,:col] = matrix[row+1:,:col]
        minor[:row,col:] = matrix[:row,col+1:]
        minor[row:,col:] = matrix[row+1:,col+1:]
This gains some 10-50% of total runtime depending on the size of your matrices. The original code has Python range and list manipulations, which are slower than direct slice indexing. You could also try to be more clever and copy only the parts of the minor that actually change. However, already after the above change, close to 100% of the time is spent inside numpy.linalg.det, so further optimization of the other parts does not make much sense.
The calculation of np.array(range(row)+range(row+1,nrows))[:,np.newaxis] does not depend on col, so you could move it outside the inner loop and cache the value. Depending on the number of columns you have, this might give a small speedup.
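For comparison, here is a Python 3 sketch of both routes on a small matrix. The function names cofactor_minors and cofactor_inverse and the 3x3 test matrix are assumptions for illustration, and np.delete is swapped in for the index-array construction used in the question; the row deletion is hoisted out of the inner loop, as suggested above.

```python
import numpy as np

def cofactor_minors(matrix):
    """Cofactor matrix from the definition: signed determinants of minors."""
    matrix = np.asarray(matrix, dtype=float)
    nrows, ncols = matrix.shape
    C = np.zeros_like(matrix)
    for row in range(nrows):
        # Deleting the row does not depend on col, so do it once per row.
        without_row = np.delete(matrix, row, axis=0)
        for col in range(ncols):
            minor = np.delete(without_row, col, axis=1)
            C[row, col] = (-1) ** (row + col) * np.linalg.det(minor)
    return C

def cofactor_inverse(matrix):
    """Cofactor matrix via inverse and determinant; invertible input only."""
    matrix = np.asarray(matrix, dtype=float)
    return np.linalg.inv(matrix).T * np.linalg.det(matrix)

# Small invertible example matrix (determinant 3), chosen for illustration.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 3.0, 0.0],
              [0.0, 7.0, 1.0]])
C_minors = cofactor_minors(A)
C_inverse = cofactor_inverse(A)
```

The two results should agree up to floating-point rounding for well-conditioned invertible matrices; for singular input only the minor-based version applies.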
Instead of using the inverse and determinant, I'd suggest using the SVD:
def cofactors(A):
    U,sigma,Vt = np.linalg.svd(A)
    N = len(sigma)
    g = np.tile(sigma,N)
    g[::(N+1)] = 1
    # overall sign is det(U)*det(Vt) = +1 or -1; it depends on the matrix, not only on N
    sign = np.linalg.det(U)*np.linalg.det(Vt)
    G = np.diag(sign*np.prod(np.reshape(g,(N,N)),1))
    return U @ G @ Vt
from sympy import *
A = Matrix([[1,2,0],[0,3,0],[0,7,1]])
And the output (which is the cofactor matrix) is:
Matrix([
[ 3, 0, 0],
[-2, 1, -7],
[ 0, 0, 3]])

