Fast iteration over vectors in a multidimensional numpy array - python

I'm writing some python + numpy + cython code, and am trying to find the most elegant and efficient way of doing the following kind of iteration over an array:
Let's say I have a function f(x, y) that takes a vector x of shape (3,) and a vector y of shape (10,) and returns a vector of shape (10,). Now I have two arrays X and Y of shape sx + (3,) and sy + (10,), where the sx and sy are two shapes that can be broadcast together (i.e. either sx == sy, or when an axis differs, one of the two has length 1, in which case it will be repeated). I want to produce an array Z that has the shape zs + (10,), where zs is the shape of the broadcasting of sx with sy. Each 10 dimensional vector in Z is equal to f(x, y) of the vectors x and y at the corresponding locations in X and Y.
I looked into np.nditer and while it plays nice with cython (see bottom of linked page), it doesn't seem to allow iterating over vectors from a multidimensional array, instead of elements. I also looked at index grids, but the problem there is that cython indexing is only fast when the number of indices equals the dimensionality of the array and the indices are typed as C integers rather than Python tuples.
Any help is greatly appreciated!

You are describing what NumPy calls a generalized universal function, or gufunc. As its name suggests, it is an extension of ufuncs. You probably want to start by reading these two pages:
Writing your own ufunc
Building a ufunc from scratch
The second example uses Cython and has some material on gufuncs. To fully go down the gufunc road, you will need to read the corresponding section in the numpy C API documentation:
Generalized Universal Function API
I do not know of any example of gufuncs being coded in Cython, although it shouldn't be too hard to do following the examples above. If you want to look at gufuncs coded in C, you can take a look at the source code for np.linalg here, although that can be a daunting experience. A while back I bored my local Python User Group to death with a talk on extending numpy with C, which was mostly about writing gufuncs in C; the slides of that talk and a sample Python module providing a new gufunc can be found here.
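If you only need the gufunc broadcasting semantics rather than compiled speed, note that np.vectorize also accepts a core signature and does the loop-dimension broadcasting for you in pure Python (so it illustrates the semantics, not the performance). A minimal sketch with a made-up core function:

import numpy as np

def f(x, y):
    # toy core function: x has shape (3,), y has shape (10,); returns shape (10,)
    return y * x.mean()

# gufunc-style wrapper: '(m),(n)->(n)' declares the core dimensions;
# all leading (loop) dimensions of X and Y are broadcast against each other
F = np.vectorize(f, signature='(m),(n)->(n)')

X = np.ones((2, 1, 3))    # sx = (2, 1), core dimension 3
Y = np.ones((1, 4, 10))   # sy = (1, 4), core dimension 10
Z = F(X, Y)
print(Z.shape)            # (2, 4, 10)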

If you want to stick with nditer, here's a way using your example dimensions. It's pure Python here, but shouldn't be hard to implement with cython (though it still has the tuple iterator). I'm borrowing ideas from ndindex as described in shallow iteration with nditer.
The idea is to find the common broadcasting shape, sz, and construct a multi_index iterator over it.
I'm using as_strided to expand X and Y to usable views, and passing the appropriate vectors (actually (1,n) arrays) to the f(x,y) function.
import numpy as np
from numpy.lib.stride_tricks import as_strided

def f(x, y):
    # sample f: takes (1,10) and (1,3) arrays (standing in for (10,) and (3,) vectors),
    # and returns a (1,10) array
    assert x.shape == (1, 10), x.shape
    assert y.shape == (1, 3), y.shape
    z = x*10 + y.mean()
    return z

def brdcast(X, X1):
    # broadcast X to the shape of X1 (keeping the last dim of X)
    # modeled on np.broadcast_arrays
    shape = X1.shape + (X.shape[-1],)
    strides = X1.strides + (X.strides[-1],)
    X1 = as_strided(X, shape=shape, strides=strides)
    return X1

def F(X, Y):
    X1, Y1 = np.broadcast_arrays(X[..., 0], Y[..., 0])
    Z = np.zeros(X1.shape + (10,))
    it = np.nditer(X1, flags=['multi_index'])
    X1 = brdcast(X, X1)
    Y1 = brdcast(Y, Y1)
    while not it.finished:
        I = it.multi_index + (None,)
        Z[I] = f(X1[I], Y1[I])
        it.iternext()
    return Z

sx = (2, 3)   # works with (2,1) as well
sy = (1, 3)
# X, Y = np.ones(sx+(10,)), np.ones(sy+(3,))
X = np.repeat(np.arange(np.prod(sx)).reshape(sx)[..., None], 10, axis=-1)
Y = np.repeat(np.arange(np.prod(sy)).reshape(sy)[..., None], 3, axis=-1)
Z = F(X, Y)
print(Z.shape)
print(Z[..., 0])

Related

How to use scipy.integrate.fixed_quad for computing many integrals at once?

Given a function func(x,y,z), I want to provide a function
def integral_over_z(func, x, y, zmin=0, zmax=1, n=16):
    lambda_func = lambda z, x, y: ???
    return scipy.integrate.fixed_quad(lambda_func, a=zmin, b=zmax, args=(x, y), n=n)
that computes its integral over z for user-provided (x, y) inputs using scipy.integrate.fixed_quad. The inputs x and y can each be a single float or an array of floats (when both are arrays, their shapes are identical).
scipy.integrate.fixed_quad supports integrating vector-valued functions. To this end, the function func must return a corresponding array of higher dimension: "If integrating a vector-valued function, the returned array must have shape (..., len(x))" (from the docs).
My question therefore is how to generate the corresponding output array of the lambda_func (which may be implemented using a special-purpose class).
EDIT: to help understand my question, here is an implementation that works, but is not vectorized over z (and hence doesn't use scipy.integrate.fixed_quad).
def integral_over_z(func, x, y, zmin, zmax, n=16):
    z, w = scipy.special.roots_legendre(n)
    dz = 0.5*(zmax - zmin)
    z = zmin + (np.real(z) + 1) * dz
    w = np.real(w) * dz
    result = w[0] * func(x, y, z[0])
    for i in range(1, len(z)):
        result += w[i] * func(x, y, z[i])
    return result
The problem is: how to vectorize it, such that it works for any valid input (x and/or y floats or arrays).
ANOTHER EDIT:
For the implementation via scipy.integrate.fixed_quad, the integrand function must take a 1D array z of shape (nz,). The inputs x and y must broadcast together, and their broadcast shape could be anything, say (n0, n1, ..., nk). The return value of func must then have shape (n0, n1, ..., nk, nz) -- how do I generate that?
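To make that shape requirement concrete, here is a small sketch with made-up shapes (not part of the original question): appending a trailing length-1 axis to x and y lets them broadcast against the 1D array of quadrature nodes z.

import numpy as np

x = np.zeros((4, 1))            # x and y broadcast together to shape (4, 3)
y = np.zeros((1, 3))
z = np.linspace(0.0, 1.0, 16)   # 1D array of quadrature nodes, nz = 16

out = x[..., None] + y[..., None] + z   # shape (4, 3, 16) == broadcast shape + (nz,)
print(out.shape)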
It seems that for a vector-valued function the vector values must occupy the leading dimensions, and the integration argument (in your case z) must come last (that is what they mean by (..., len(x)); their x is your z). I think this follows from the broadcasting rules. The following example worked fine for me; the key is that x and y must have the right shape for the broadcasting to work.
import numpy as np
import scipy.integrate
def integral_over_z(func, x, y, n=16):
    # the last dimension of x and y needs to be size 1, but you can have
    # as many leading dimensions as you want
    lambda_func = lambda z, x, y: func(x[..., None], y[..., None], z)
    return scipy.integrate.fixed_quad(lambda_func, a=0, b=1, args=(x, y), n=n)
func = lambda x,y,z: 1 + 0*x + 0*y + 0*z # make sure that the output has the right (broadcast) shape
x = np.zeros((5,))
y = np.arange(5)
print(integral_over_z(func, x, y, 2))
After the (incomplete) answer by flawr and reading about numpy broadcasting, I found a solution. I'd be happy to learn whether this can still be improved and/or whether it is really correct, i.e. works for any valid input (it does for my tests so far).
The important point is to adapt the shapes of x and y such that
func(x, y, z) works just fine, i.e. x, y, and z are jointly broadcastable;
after summing the output of func over the last (z) dimension, the result has the joint broadcast shape of x and y.
Here is my solution:
def integral_over_z(func, x, y, zmin=0, zmax=1, n=16):
    xe = x
    ye = y
    if type(xe) is np.ndarray or type(ye) is np.ndarray:
        xe, ye = np.broadcast_arrays(x, y)   # replace x, y by their joint broadcast
        xe = np.expand_dims(xe, xe.ndim)     # append an extra dimension for z
        ye = np.expand_dims(ye, ye.ndim)     # append an extra dimension for z
    return scipy.integrate.fixed_quad(lambda z: func(xe, ye, z), a=zmin, b=zmax, n=n)
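A quick sanity check of this solution (a minimal sketch, assuming a simple integrand whose integral over z in [0, 1] is known analytically to be x + y + 0.5):

import numpy as np
import scipy.integrate

func = lambda x, y, z: x + y + z   # integral over z in [0, 1] is x + y + 0.5

x = np.linspace(0.0, 1.0, 4).reshape(4, 1)   # shape (4, 1)
y = np.linspace(0.0, 1.0, 3)                 # shape (3,), broadcasts with x to (4, 3)

val, _ = integral_over_z(func, x, y)   # fixed_quad returns (value, None)
print(val.shape)                       # (4, 3)
print(np.allclose(val, x + y + 0.5))   # True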

Reshaping array of matrices in Python

I have a Numpy array X of n 2x2 matrices, arranged so that X.shape = (2,2,n), that is, to get the first matrix I call X[:,:,0]. I would like to reshape X into an array Y such that I can get the first matrix by calling Y[0] etc., but performing X.reshape(n,2,2) messes up the matrices. How can I get it to preserve the matrices while reshaping the array?
I am essentially trying to do this:
import numpy as np
Y = np.zeros([n,2,2])
for i in range(n):
    Y[i] = X[:, :, i]
but without using the for loop. How can I do this with reshape or a similar function?
(To get an example array X, try X = np.concatenate([np.identity(2)[:,:,None]] * n, axis=2) for some n.)
numpy.moveaxis can be used to take a view of an array with one axis moved to a different position in the shape:
numpy.moveaxis(X, 2, 0)
numpy.moveaxis(a, source, destination) takes a view of array a where the axis originally at position source ends up at position destination, so numpy.moveaxis(X, 2, 0) makes the original axis 2 the new axis 0 in the view.
There's also numpy.transpose, which can be used to perform arbitrary rearrangements of an array's axes in one go if you pass it the optional second argument, and numpy.rollaxis, an older version of moveaxis with a more confusing calling convention.
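A short demonstration of this (a minimal sketch using a small non-symmetric example, so that preservation of each matrix is actually visible):

import numpy as np

n = 4
X = np.arange(2 * 2 * n).reshape(2, 2, n)   # n stacked 2x2 matrices along the last axis

Y = np.moveaxis(X, 2, 0)                    # view with shape (n, 2, 2)
print(Y.shape)                              # (4, 2, 2)
print(np.array_equal(Y[0], X[:, :, 0]))     # True: each matrix is preserved, not transposed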
Use swapaxes:
Y = X.swapaxes(0, 2)
Note, though, that swapping axes 0 and 2 also swaps the two matrix axes, so Y[i] equals X[:,:,i].T; for symmetric matrices such as the identity example this makes no difference, but in general moveaxis or transpose is what preserves each matrix as-is.

Vector dot product along one dimension for multidimensional arrays

I want to compute the sum product along one dimension of two multidimensional arrays, using Theano.
I'll describe precisely what I want to do using numpy first. numpy.tensordot and numpy.dot seem to always do a matrix product, whereas I'm in essence looking for a batched equivalent of a vector product. Given x and y, I want to compute z like so:
x = np.random.normal(size=(200, 2, 2, 1000))
y = np.random.normal(size=(200, 2, 2))
# this is how I now approach it:
z = np.sum(y[:,:,:,np.newaxis] * x, axis=1)
# z is of shape (200, 2, 1000)
Now I know that numpy.einsum would probably be able to help me here, but again, I want to do this particular computation in Theano, which does not have an einsum equivalent. I will need to use dot, tensordot, or Theano's specialized einsum subset functions batched_dot or batched_tensordot.
The reason I'm looking to change my approach to this is performance; I suspect that using builtin (CUDA) dot products will be faster than relying on broadcasting, element-wise product, and sum.
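For reference, the broadcasting-and-sum expression above can also be written with numpy.einsum; this only restates the numpy side of the question and does not address the Theano constraint:

import numpy as np

x = np.random.normal(size=(200, 2, 2, 1000))
y = np.random.normal(size=(200, 2, 2))

z_broadcast = np.sum(y[:, :, :, np.newaxis] * x, axis=1)
z_einsum = np.einsum('aij,aijk->ajk', y, x)   # sum-product over axis 1
print(np.allclose(z_broadcast, z_einsum))     # True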
In Theano, none of the dimensions of three- and four-dimensional tensors are broadcastable. You have to set them explicitly. Then the NumPy principles will work just fine. One way to do this is to use T.patternbroadcast. To read more about broadcasting, refer to this.
You have three dimensions in one of the tensors, so first you need to append a singleton dimension at the end and then make that dimension broadcastable. Both can be achieved with a single command, T.shape_padaxis. The entire code is as follows:
import theano
from theano import tensor as T
import numpy as np
X = T.ftensor4('X')
Y = T.ftensor3('Y')
Y_broadcast = T.shape_padaxis(Y, axis=-1)  # append an extra dimension and make it broadcastable
Z = T.sum((X*Y_broadcast), axis=1) # element-wise multiplication
f = theano.function([X, Y], Z, allow_input_downcast=True)
# Making sure that it works and gives correct results
x = np.random.normal(size=(3, 2, 2, 4))
y = np.random.normal(size=(3, 2, 2))
theano_result = f(x,y)
numpy_result = np.sum(y[:,:,:,np.newaxis] * x, axis=1)
print(np.amax(theano_result - numpy_result))  # prints 2.7e-7 on my system, close enough!
I hope this helps.

Python Numpy error : setting an array element with a sequence

I'm quite new to Python and Numpy, so I apologize if I'm missing something obvious here.
I have a function that solves a system of 2 differential equations :
import numpy as np
import numpy.linalg as la

def solve_ode(x0, a0, beta, t):
    At = np.array([[0.23*t, (-10**5)*t], [0, -beta*t]], dtype=np.float32)
    # get eigenvalues and eigenvectors
    evals, V = la.eig(At)
    Vi = la.inv(V)
    # get e^At coeff
    eAt = V @ np.exp(evals) @ Vi
    xt = eAt*x0
    return xt
However, running it with this code:
import matplotlib.pyplot as plt
# initial values
x0 = 10**6
a0 = 2.5
beta = 0.05
t = np.linspace(0, 3600, 360)
plt.semilogy(t, solve_ode(x0, a0, beta, t))
... throws this error:
ValueError: setting an array element with a sequence.
At this line:
At = np.array([[0.23*t, (-10**5)*t], [0, -beta*t]], dtype=np.float32)
Note that t and beta are supposed to be floats. I think Python might not be able to infer this, but I don't know how I could enforce it...
Thanks in advance for your help.
You are supplying t as a numpy array of shape (360,) from linspace, not simply a float. The resulting At array you are trying to create is then ill-formed, as all columns must be the same length. In Python there is an important difference between lists and numpy arrays. For example, you could do what you have here as a list of lists, e.g.
At = [[0.23*t, (-10**5)*t], [0, -beta*t]]
where the first row holds two length-360 arrays and the second row holds a scalar and a length-360 array.
Alternatively, if all elements of At have the length of t, the array works:
At = np.array([[0.23*t, (-10**5)*t], [t, -beta*t]], dtype=np.float32)
with shape (2, 2, 360).
When you give a list of lists (or, in this case, a list of lists of lists), all of them should have the same length, so that numpy can automatically infer the dimensions (shape) of the resulting array.
In your example everything is correctly put together, except the part where you put a bare 0 as one of the entries; since the expected output is a cube (3D array), every entry needs to be a full-length array.
You can fix it by giving the correct number of zeros, as below:
At = np.array([[0.23*t, (-10**5)*t], [np.zeros(len(t)), -beta*t]], dtype=np.float32)
But check the .shape of the resulting array, and make sure it's what you want.
As others note, the problem is the 0 in the inner list. It doesn't match the length-360 arrays generated by the other expressions. np.array can make a (2,2) object-dtype array from that, but it can't make a float one.
At = np.array([[0.23*t, (-10**5)*t], [0*t, -beta*t]])
produces a (2,2,360) array. But I suspect the rest of that function is built around the assumption that At is (2,2) - a 2d square array to be used with eig, inv, etc.
What is the returned xt supposed to be?
Does this work?
S = np.array([solve_ode(x0, a0, beta, i) for i in t])
giving a 1d array with the same number of values as in t?
I'm not suggesting this is the fastest way of solving the problem, but it's the simplest, especially if you are only generating 360 values.

numpy broadcast from first dimension

In NumPy, is there an easy way to broadcast two arrays of dimensions e.g. (x,y) and (x,y,z)? NumPy broadcasting typically matches dimensions from the last dimension, so usual broadcasting will not work (it would require the first array to have dimension (y,z)).
Background: I'm working with images, some of which are RGB (shape (h,w,3)) and some of which are grayscale (shape (h,w)). I generate alpha masks of shape (h,w), and I want to apply the mask to the image via mask * im. This doesn't work because of the above-mentioned problem, so I end up having to do e.g.
mask = mask.reshape(mask.shape + (1,) * (len(im.shape) - len(mask.shape)))
which is ugly. Other parts of the code do operations with vectors and matrices, which also run into the same issue: it fails trying to execute m + v where m has shape (x,y) and v has shape (x,). It's possible to use e.g. atleast_3d, but then I have to remember how many dimensions I actually wanted.
How about using transpose:
(a.T + c.T).T
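A quick sketch of how this works for the image/mask case from the question (example shapes of my own choosing): transposing reverses the axis order, so ordinary right-aligned broadcasting lines up the (h, w) axes, and transposing back restores the original layout.

import numpy as np

im = np.random.random((6, 8, 3))   # RGB image, shape (h, w, 3)
mask = np.random.random((6, 8))    # alpha mask, shape (h, w)

# im.T has shape (3, w, h) and mask.T has shape (w, h), so they broadcast;
# transposing the result back gives shape (h, w, 3) again
out = (im.T * mask.T).T
print(out.shape)                                # (6, 8, 3)
print(np.allclose(out, im * mask[:, :, None]))  # True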
numpy functions often have blocks of code that check dimensions and reshape arrays into compatible shapes, all before getting down to the core business of adding or multiplying. They may then reshape the output to match the inputs. So there is nothing wrong with rolling your own functions that do similar manipulations.
Don't dismiss offhand the idea of rotating the variable third dimension to the start of the dimensions. Doing so takes advantage of the fact that numpy automatically adds dimensions at the start.
For element-by-element multiplication, einsum is quite powerful.
np.einsum('ij...,ij...->ij...',im,mask)
will handle cases where im and mask are any mix of 2 or 3 dimensions (assuming the first two dimensions are always compatible). Unfortunately this does not generalize to addition or other operations.
A while back I simulated einsum with a pure Python version. For that I used np.lib.stride_tricks.as_strided and np.nditer. Look into those functions if you want more power in mixing and matching dimensions.
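A quick check of that einsum line on a mixed 2d/3d pair (example shapes of my own choosing), relying on the fact that the '...' dimensions broadcast against each other:

import numpy as np

im = np.random.random((6, 7, 3))   # RGB image
mask = np.random.random((6, 7))    # grayscale alpha mask

out = np.einsum('ij...,ij...->ij...', im, mask)
print(out.shape)                                # (6, 7, 3)
print(np.allclose(out, im * mask[:, :, None]))  # True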
As another angle: if you encounter this pattern frequently, it may be useful to create a utility function to enforce right-broadcasting:
def right_broadcasting(arr, target):
    return arr.reshape(arr.shape + (1,) * (target.ndim - arr.ndim))
Although if there are only two types of input (already having 3 dims or having only 2), I'd say the single if statement is preferable.
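A minimal usage sketch of the helper above (example shapes of my own choosing):

import numpy as np

im = np.ones((6, 8, 3))    # RGB image, shape (h, w, 3)
mask = np.ones((6, 8))     # alpha mask, shape (h, w)

out = right_broadcasting(mask, im) * im   # mask is reshaped to (6, 8, 1)
print(out.shape)                          # (6, 8, 3)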
Indexing with np.newaxis creates a new axis in that place. I.e.
xyz = np.ones((4, 5, 6))   # some 3d array (arbitrary example shape)
xy = np.ones((4, 5))       # some 2d array (arbitrary example shape)
xyz_sum = xyz + xy[:, :, np.newaxis]
or
xyz_sum = xyz + xy[:, :, None]
Indexing in this way creates an axis of length 1 at that location, which broadcasting then effectively treats as having stride 0.
Why not just decorate-process-undecorate:
import numpy as np

def flipflop(func):
    def wrapper(a, mask):
        if len(a.shape) == 3:
            mask = mask[..., None]
        b = func(a, mask)
        return np.squeeze(b)
    return wrapper

@flipflop
def f(x, mask):
    return x * mask
Then
>>> N = 12
>>> gs = np.random.random((N, N))
>>> rgb = np.random.random((N, N, 3))
>>>
>>> mask = np.ones((N, N))
>>>
>>> f(gs, mask).shape
(12, 12)
>>> f(rgb, mask).shape
(12, 12, 3)
Easy, you just add a singleton dimension at the end of the smaller array. For example, if xyz_array has shape (x,y,z) and xy_array has shape (x,y), you can do
xyz_array + np.expand_dims(xy_array, xy_array.ndim)
