Using the Python library numpy, it is possible to use the function cumprod to evaluate cumulative products, e.g.
a = np.array([1,2,3,4,2])
np.cumprod(a)
gives
array([ 1, 2, 6, 24, 48])
This function can also be applied along just one axis.
I would like to do the same with matrices (represented as numpy arrays), e.g. if I have
S0 = np.array([[1, 0], [0, 1]])
Sx = np.array([[0, 1], [1, 0]])
Sy = np.array([[0, -1j], [1j, 0]])
Sz = np.array([[1, 0], [0, -1]])
and
b = np.array([S0, Sx, Sy, Sz])
then I would like to have a cumprod-like function which gives
np.array([S0, S0.dot(Sx), S0.dot(Sx).dot(Sy), S0.dot(Sx).dot(Sy).dot(Sz)])
(This is a simple example; in reality I have potentially large matrices evaluated over n-dimensional meshgrids, so I am looking for the simplest and most efficient way to evaluate this.)
In e.g. Mathematica I would use
FoldList[Dot, IdentityMatrix[2], {S0, Sx, Sy, Sz}]
so I searched for a fold function, and all I found was an accumulate method on numpy.ufuncs. To be honest, I know that I am probably doomed, because an attempt at
np.core.umath_tests.matrix_multiply.accumulate(np.array([S0, Sx, Sy, Sz]))
(as mentioned on a numpy mailing list) yields the error
Reduction not defined on ufunc with signature
Do you have an idea how to (efficiently) do this kind of calculation?
Thanks in advance.
As food for thought, here are 3 ways of evaluating the 3 sequential dot products:
With the normal Python reduce (functools.reduce in Python 3), which could also be written as a loop:
In [118]: reduce(np.dot,[S0,Sx,Sy,Sz])
array([[ 0.+1.j,  0.+0.j],
       [ 0.+0.j,  0.+1.j]])
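For the cumulative list the question asks for (the FoldList analogue), the same fold can be done stepwise with itertools.accumulate, which keeps every intermediate product; a minimal sketch, reusing the b stack from the question:
from itertools import accumulate
b = np.array([S0, Sx, Sy, Sz])
# fold np.dot over the leading axis: [S0, S0@Sx, S0@Sx@Sy, S0@Sx@Sy@Sz]
cum = np.array(list(accumulate(b, np.dot)))
This is still a Python-level loop, so it pays off mainly when the matrices are large compared to their number.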
The einsum equivalent
In [119]: np.einsum('ij,jk,kl,lm',S0,Sx,Sy,Sz)
The einsum index expression looks like a sequence of operations, but it is actually evaluated as a 5d product with summation on 3 axes. In the C code this is done with an nditer and strides, but the effect is as follows:
In [120]: np.sum(S0[:,:,None,None,None] * Sx[None,:,:,None,None] *
     ...:        Sy[None,None,:,:,None] * Sz[None,None,None,:,:], (1,2,3))
In [127]: np.prod(np.broadcast_arrays(S0[:,:,None,None,None], Sx[None,:,:,None,None],
     ...:     Sy[None,None,:,:,None], Sz[None,None,None,:,:]), axis=0).sum((1,2,3))
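A quick sanity check (sketch) that these formulations agree:
from functools import reduce
r1 = reduce(np.dot, [S0, Sx, Sy, Sz])
r2 = np.einsum('ij,jk,kl,lm', S0, Sx, Sy, Sz)
print(np.allclose(r1, r2))   # True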
A while back, while writing a patch for np.einsum, I translated that C code to Python, and also wrote a Cython sum-of-products function. This code is on github at
https://github.com/hpaulj/numpy-einsum
einsum_py.py is the Python einsum, with some useful debugging output
sop.pyx is the Cython code, which is compiled to sop.so.
Here's how it could be used for part of your problem. I'm skipping the Sy array since my sop is not coded for complex numbers (but that could be changed).
import numpy as np
import sop
import einsum_py
S0 = np.array([[1., 0], [0, 1]])
Sx = np.array([[0., 1], [1, 0]])
Sz = np.array([[1., 0], [0, -1]])
print(np.einsum('ij,jk,kl', S0, Sx, Sz))
# [[ 0. -1.] [ 1. 0.]]
# same thing, but with parsing information
einsum_py.myeinsum('ij,jk,kl', S0, Sx, Sz, debug=True)
"""
{'max_label': 108, 'min_label': 105, 'nop': 3,
'shapes': [(2, 2), (2, 2), (2, 2)],
'strides': [(16, 8), (16, 8), (16, 8)],
'ndim_broadcast': 0, 'ndims': [2, 2, 2], 'num_labels': 4,
....
op_axes [[0, -1, 1, -1], [-1, -1, 0, 1], [-1, 1, -1, 0], [0, 1, -1, -1]]
"""
# take op_axes (for np.nditer) from this debug output
op_axes = [[0, -1, 1, -1], [-1, -1, 0, 1], [-1, 1, -1, 0], [0, 1, -1, -1]]
w = sop.sum_product_cy3([S0,Sx,Sz], op_axes)
print(w)
As written, sum_product_cy3 cannot take an arbitrary number of ops, and the iteration space grows with each additional op and index. But I can imagine calling it repeatedly, either at the Cython level or from Python. I think it has potential to be faster than repeated dot calls for lots of small arrays.
A condensed version of the Cython code is:
def sum_product_cy3(ops, op_axes, order='K'):
    # (arr, axis=None, out=None):
    cdef np.ndarray[double] x, y, z, w
    cdef int size, nop
    nop = len(ops)
    ops.append(None)
    flags = ['reduce_ok', 'buffered', 'external_loop'...]
    op_flags = [['readonly']]*nop + [['allocate', 'readwrite']]
    it = np.nditer(ops, flags, op_flags, op_axes=op_axes, order=order)
    it.operands[nop][...] = 0
    it.reset()
    for x, y, z, w in it:
        for i in range(x.shape[0]):
            w[i] = w[i] + x[i] * y[i] * z[i]
    return it.operands[nop]
Related
What is the "best" way to generate an array from performing an operation between each element of a vector and the whole vector?
The example below uses a loop and subtraction as the operation, but in the general case the operation could be any function.
Criteria for "best" could be: execution speed, amount of code needed, readability
a = np.array([1, 2, 3])
dim = len(a)
b = np.empty([dim, dim])

def operation(x1, x2):
    return x1 - x2

for i in range(dim):
    b[i, :] = operation(a, a[i])
print(b)
I think numpy broadcasting will meet all of your criteria ;)
>>> a - a[:, None]
array([[ 0, 1, 2],
[-1, 0, 1],
[-2, -1, 0]])
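The same pattern covers the general case: any function built from elementwise operations produces the whole grid when handed the vector and a column view of it. A sketch with the operation from the question:
a = np.array([1, 2, 3])

def operation(x1, x2):
    return x1 - x2

# b[i, j] == operation(a[j], a[i]), same as the explicit loop
b = operation(a, a[:, None])
If the function can't be vectorized, np.frompyfunc(operation, 2, 1)(a, a[:, None]) applies a plain Python function with the same broadcasting, though without the speed advantage (and it returns an object-dtype array).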
I am trying to create a big array for high dim with y_shift = np.fromfunction(lambda i,j: (i)>>j, ((2**dim), dim), dtype=np.uint32), for example dim=32. I have two questions:
1. How can I improve the speed in terms of time?
2. How can I avoid the message zsh: killed python3 for dim=32?
EDIT:
Alternatively, you can consider using uint8 instead of uint32:
y_shift = np.fromfunction(lambda i,j: (1&(i)>>j), ((2**dim), dim), dtype=np.uint8)
To answer your question:
You get the error zsh: killed python3 because you run out of memory.
If you want to run the code you initially proposed:
dim = 32
y_shift = np.fromfunction(lambda i,j: (i)>>j, ((2**dim), dim), dtype=np.uint32)
You would need more than 500GB of memory, see here.
I would recommend thinking of alternatives and avoid trying to save the entire array to memory.
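If the downstream processing allows it, one way out is to generate and consume the table in blocks instead of materializing all 2**dim rows at once; a minimal sketch (the block size here is an arbitrary choice):
def bit_blocks(dim, block=2**20):
    # yields (block, dim) uint8 chunks of the full (2**dim, dim) bit table
    shifts = np.arange(dim, dtype=np.uint64)
    for start in range(0, 2**dim, block):
        i = np.arange(start, min(start + block, 2**dim), dtype=np.uint64)
        yield ((i[:, None] >> shifts) & 1).astype(np.uint8)

for chunk in bit_blocks(4, block=8):   # small demo values
    print(chunk)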
fromfunction just does 2 things:
args = indices(shape, dtype=dtype)
return function(*args, **kwargs)
It makes the indices array
In [247]: args = np.indices(((2**4),4))
In [248]: args.shape
Out[248]: (2, 16, 4)
and it passes that array to your function
In [249]: args[0]>>args[1]
Out[249]:
array([[ 0, 0, 0, 0],
[ 1, 0, 0, 0],
[ 2, 1, 0, 0],
[ 3, 1, 0, 0],
...
[14, 7, 3, 1],
[15, 7, 3, 1]])
With dim=32:
In [250]: ((2**32),32)
Out[250]: (4294967296, 32)
the resulting args array will be (2, 4294967296, 32). There's no way around that in terms of speed or memory use.
The numpy array res in this code is built by collecting tuples in a for loop. (Variable names and contents are simplified from the actual code.)
Looking closely, the loop runs once per element of arr_2 and calls extend() on each pass, so the processing becomes extremely slow when arr_2 is long.
Is it possible to speed this up by constructing the array in a better way?
# -*- coding: utf-8 -*-
import numpy as np
arr_1 = np.array([[0, 0, 1], [0, 0.5, -1], [-1, 0, -1], [0, -0.5, -1], [1, 0, -1]])
arr_2 = np.array([[0, 1, 2], [0, 1, 2]])
all_arr = []
for p in arr_2:
    arr = [
        (arr_1[0], p), (arr_1[1], p), (arr_1[2], p),
        (arr_1[0], p), (arr_1[1], p), (arr_1[4], p),
        (arr_1[0], p), (arr_1[2], p), (arr_1[3], p),
        (arr_1[0], p), (arr_1[3], p), (arr_1[4], p),
        (arr_1[1], p), (arr_1[2], p), (arr_1[4], p),
        (arr_1[2], p), (arr_1[3], p), (arr_1[4], p)]
    all_arr.extend(arr)
vtype = [('type_a', np.float32, 3), ('type_b', np.float32, 3)]
res = np.array(all_arr, dtype=vtype)
print(res)
I couldn't figure out why you used this indexing for arr_1, so I just copied it:
import numpy as np
arr_1 = np.array([[0, 0, 1], [0, 0.5, -1], [-1, 0, -1], [0, -0.5, -1], [1, 0, -1]])
arr_2 = np.array([[0, 1, 2], [0, 1, 2]])
weird_idx = np.array([0,1,2,0,1,4,0,2,3,0,3,4,1,2,4,2,3,4])
weird_arr1 = arr_1[weird_idx]
all_arr = [(weird_arr1[i], arr_2[j]) for j in range(len(arr_2)) for i in range(len(weird_arr1))]
vtype = [('type_a', np.float32, 3), ('type_b', np.float32, 3)]
res = np.array(all_arr, dtype=vtype)
you can also repeat the arrays
arr1_rep = np.tile(weird_arr1.T,2).T
arr2_rep = np.repeat(arr_2,weird_arr1.shape[0],0)
res = np.empty(arr1_rep.shape[0],dtype=vtype)
res['type_a']=arr1_rep
res['type_b']=arr2_rep
Often with structured arrays it is faster to assign by field instead of the list of tuples approach:
In [388]: idx = [0,1,2,0,1,4,0,2,3,0,3,4,1,2,4,2,3,4]
In [400]: res1 = np.zeros(36, dtype=vtype)
In [401]: res1['type_a'][:18] = arr_1[idx]
In [402]: res1['type_a'][18:] = arr_1[idx]
In [403]: res1['type_b'][:18] = arr_2[0]
In [404]: res1['type_b'][18:] = arr_2[1]
In [405]: np.allclose(res['type_a'], res1['type_a'])
Out[405]: True
In [406]: np.allclose(res['type_b'], res1['type_b'])
Out[406]: True
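For an arr_2 of arbitrary length, the per-half assignments can be collapsed with tile/repeat; a sketch reusing the names above:
n = len(idx)                                   # 18 rows per block
res2 = np.zeros(n * len(arr_2), dtype=vtype)
res2['type_a'] = np.tile(arr_1[idx], (len(arr_2), 1))
res2['type_b'] = np.repeat(arr_2, n, axis=0)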
I am brand new to Python, but is there any way to multiply matrices containing both 0's and symbols? For example, see below:
import sympy as sym
import numpy as np
teams=np.matrix([[1,2],[3,4]])
teams=teams-1
n=4
x,a,b=sym.symbols('x a b')
X=np.empty((n,n), dtype=object)
Y=np.empty((n,n), dtype=object)
Z=np.empty((n,n), dtype=object)
for i in range(n):
    for j in range(n):
        if j == i:
            X[i, j] = x
        elif [i, j] in teams.tolist():
            Y[i, j] = a
        elif [j, i] in teams.tolist():
            Y[i, j] = a
        else:
            Z[i, j] = b

for i in range(n):
    for j in range(n):
        if X[i, j] is None:
            X[i, j] = 0
        if Y[i, j] is None:
            Y[i, j] = 0
        if Z[i, j] is None:
            Z[i, j] = 0
print(np.matmul(X,Y))
TypeError Traceback (most recent call last)
<ipython-input-189-00b753462a2d> in <module>
2 print(Y)
3 print(Z)
----> 4 print(np.matmul(X,Y))
TypeError: ufunc 'matmul' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
I know why it is messing up; I am trying to multiply a symbol by a number. But I was wondering if there was any way to make this recognize that a symbol times 0 is just zero and can be disregarded when added to another symbol.
The problem isn't specifically with the symbols, but with the object dtype. matmul doesn't (or didn't) work with object dtype arrays. The fast version uses BLAS library functions, which only work with C numeric types - floats and integers. np.dot does have a slower branch that works with non-numeric dtypes.
In an isympy session:
In [4]: X
Out[4]:
array([[x, 0, 0, 0],
[0, x, 0, 0],
[0, 0, x, 0],
[0, 0, 0, x]], dtype=object)
In [5]: Y
Out[5]:
array([[0, a, 0, 0],
[a, 0, 0, 0],
[0, 0, 0, a],
[0, 0, a, 0]], dtype=object)
In [6]: np.dot(X,Y)
Out[6]:
array([[0, a*x, 0, 0],
[a*x, 0, 0, 0],
[0, 0, 0, a*x],
[0, 0, a*x, 0]], dtype=object)
BUT, matmul does work for me. I wonder if that's because of my numpy version?
In [7]: np.matmul(X,Y)
Out[7]:
array([[0, a*x, 0, 0],
[a*x, 0, 0, 0],
[0, 0, 0, a*x],
[0, 0, a*x, 0]], dtype=object)
In [8]: np.__version__
Out[8]: '1.17.4'
As a general rule mixing sympy and numpy is not a good idea. numpy arrays containing symbols are necessarily object dtype. Math on object dtype depends on delegating the action to methods. The result is hit-or-miss. Multiplication and addition may work (x+x), but np.sin does not, because x.sin() fails. It's best to use sympy.lambdify if you want to use sympy expressions in numpy. Otherwise, try to use pure sympy.
In [12]: X*X
Out[12]:
array([[x**2, 0, 0, 0],
[0, x**2, 0, 0],
[0, 0, x**2, 0],
[0, 0, 0, x**2]], dtype=object)
In [13]: np.sin(X)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
AttributeError: 'Symbol' object has no attribute 'sin'
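To close the loop on that advice, a minimal lambdify sketch (the expression is just an illustration; in isympy, lambdify and sin are already imported):
In [14]: f = lambdify(x, sin(x) + x**2, 'numpy')   # compiled to numpy ufunc calls
In [15]: f(np.linspace(0, 1, 5))                   # evaluates on the whole array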
===
From the numpy 1.17.0 release notes:
Support of object arrays in matmul
It is now possible to use matmul (or the @ operator) with object arrays. For instance, it is now possible to do:
from fractions import Fraction
a = np.array([[Fraction(1, 2), Fraction(1, 3)], [Fraction(1, 3), Fraction(1, 2)]])
b = a @ a
Whenever you are working with symbolic math, you should leave numpy out and keep everything inside sympy. Numpy doesn't know about sympy's symbols. You may get lucky a few times multiplying by zero, but it doesn't make much sense in general. Numpy works with arrays of numbers, preferably all of the same type.
However, you can use lambdify to bridge the gap and convert sympy expressions to be used by numpy.
Here is your code with sympy's matrices:
import sympy as sym
teams = sym.Matrix([[1, 2], [3, 4]])
teams = teams - sym.ones(2, 2)
n = 4
x, a, b = sym.symbols('x a b')
X = sym.zeros(n, n)
Y = sym.zeros(n, n)
Z = sym.zeros(n, n)
for i in range(n):
    for j in range(n):
        if j == i:
            X[i, j] = x
        elif [i, j] in teams.tolist() or [j, i] in teams.tolist():
            Y[i, j] = a
        else:
            Z[i, j] = b

for i in range(n):
    for j in range(n):
        if X[i, j] is None:
            X[i, j] = 0
        if Y[i, j] is None:
            Y[i, j] = 0
        if Z[i, j] is None:
            Z[i, j] = 0
print(X * Y)
Result:
Matrix([[0, a*x, 0, 0],
[a*x, 0, 0, 0],
[0, 0, 0, a*x],
[0, 0, a*x, 0]])
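If numeric values are needed afterwards, lambdify (mentioned above) can turn the symbolic matrix into a function returning a plain numpy array; a sketch:
f = sym.lambdify((x, a), X * Y, 'numpy')
print(f(2.0, 3.0))   # each a*x entry evaluates to 6.0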
I tested your code with print(np.dot(X, Y)) instead of print(np.matmul(X, Y)) and it worked. According to the documentation, np.matmul is preferred over np.dot for matrix multiplication, but I wasn't able to figure out how to do it using np.matmul. I tried np.matmul(X, Y, casting='unsafe'), but the same error resulted. I don't think the error is caused by adding 0 or multiplying by 0; sympy is able to do such simplifications.
E.g.
x = sym.symbols('x')
print(x + 0)
print(x*0)
print(3*x + 5*x)
returns, just as expected, x, 0 and 8*x.
Hopefully this helps you out.
I would like to know if there is an efficient method to get sub-arrays from a larger numpy array.
What I have is an application of np.where: I iterate 'manually' over x and y as offsets and apply np.where with a kernel to each properly sized rectangle extracted from the larger array.
But is there a more direct approach in numpy's collection of methods?
import numpy as np
example = np.arange(20).reshape((5, 4))
# e.g. a cross kernel
a_kernel = np.asarray([[0, 1, 0], [1, 1, 1], [0, 1, 0]])
np.where(a_kernel, example[1:4, 1:4], 0)
# returns
# array([[ 0, 6, 0],
# [ 9, 10, 11],
# [ 0, 14, 0]])
def arrays_from_kernel(a, a_kernel):
    height, width = a_kernel.shape
    y_max, x_max = a.shape
    return [np.where(a_kernel, a[y:(y + height), x:(x + width)], 0)
            for y in range(y_max - height + 1)
            for x in range(x_max - width + 1)]

sub_arrays = arrays_from_kernel(example, a_kernel)
This returns the arrays I need for further processing.
# [array([[0, 1, 0],
# [4, 5, 6],
# [0, 9, 0]]),
# array([[ 0, 2, 0],
# [ 5, 6, 7],
# [ 0, 10, 0]]),
# ...
# array([[ 0, 9, 0],
# [12, 13, 14],
# [ 0, 17, 0]]),
# array([[ 0, 10, 0],
# [13, 14, 15],
# [ 0, 18, 0]])]
The context: similar to 2D convolution I would like to apply a custom function on each of the subarrays (e.g. product of squared numbers).
At the moment, you're manually advancing a sliding window over the data - stride tricks to the rescue! (And no, I didn't just make that up - there's actually a submodule called stride_tricks in numpy!) Instead of manually building windows into the data and calling np.where() on each of them, if you had the windows in a single array you could call np.where() just once. Stride tricks allow you to create such an array without even having to copy the data.
Let me explain. Normal slices in numpy create views into the original data instead of copies. This is done by referring to the original data, but changing the strides used to access the data (i.e. how much to jump between two elements or two rows, and so on). Stride tricks allow you to modify those strides more freely than just slicing and reshaping does, so you can e.g. iterate over the same data more than once, which is useful here.
Let me demonstrate:
import numpy as np
example = np.arange(20).reshape((5, 4))
a_kernel = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]])
def sliding_window(data, win_shape, **kwargs):
    assert data.ndim == len(win_shape)
    shape = tuple(dn - wn + 1 for dn, wn in zip(data.shape, win_shape)) + win_shape
    strides = data.strides * 2
    return np.lib.stride_tricks.as_strided(data, shape=shape, strides=strides, **kwargs)

def arrays_from_kernel(a, a_kernel):
    windows = sliding_window(a, a_kernel.shape)
    return np.where(a_kernel, windows, 0)

sub_arrays = arrays_from_kernel(example, a_kernel)
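As an aside, on numpy >= 1.20 the same windows are available without writing the strides by hand, via sliding_window_view:
from numpy.lib.stride_tricks import sliding_window_view

windows = sliding_window_view(example, a_kernel.shape)   # shape (3, 2, 3, 3)
sub_arrays = np.where(a_kernel, windows, 0)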
The scipy.ndimage module offers a number of filters -- one of which might meet your needs. If none of those filters do what you want, you could use ndimage.generic_filter
to call a custom function on each subarray. ndimage.generic_filter is not as fast as the other ndimage filters, however.
For example,
import numpy as np
example = np.arange(20).reshape((5, 4))
a_kernel = np.asarray([[0, 1, 0], [1, 1, 1], [0, 1, 0]])
# def arrays_from_kernel(a, a_kernel):
#     height, width = a_kernel.shape
#     y_max, x_max = a.shape
#     return [np.where(a_kernel, a[y:(y + height), x:(x + width)], 0)
#             for y in range(y_max - height + 1)
#             for x in range(x_max - width + 1)]
# sub_arrays = arrays_from_kernel(example, a_kernel)
# for arr in sub_arrays:
#     print(arr)
#     print('-'*80)
import scipy.ndimage as ndimage
def func(x):
    # reject subarrays that extend beyond the border of the `example` array
    if not np.isnan(x).any():
        y = np.zeros_like(a_kernel, dtype=example.dtype)
        np.put(y, np.flatnonzero(a_kernel), x)
        print(y)
    # Instead of returning 0, you can perform your desired computation on the subarray here.
    # Note that you may not need the 2D array y; often you only need the values in the 1D array x.
    return 0
result = ndimage.generic_filter(example, func, footprint=a_kernel, mode='constant', cval=np.nan)
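For instance, the product-of-squares computation mentioned in the question context could be returned directly; a sketch (slow, but it shows the hook; example is cast to float so the NaN border padding is representable):
def prod_sq(x):
    # skip windows that extend past the border (marked by the NaN padding)
    if np.isnan(x).any():
        return 0.0
    return np.prod(x) ** 2

print(ndimage.generic_filter(example.astype(float), prod_sq,
                             footprint=a_kernel, mode='constant', cval=np.nan))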
For the particular problem of computing the product of squares for each subarray, you could convert the product into a sum by taking advantage of the fact that A * B = exp(log(A) + log(B)). This allows you to express the computation as a normal convolution, and using ndimage.convolve can improve performance a lot. The amount of improvement depends on the size of example:
import numpy as np
import scipy.ndimage as ndimage
import perfplot
a_kernel = np.asarray([[0, 1, 0], [1, 1, 1], [0, 1, 0]])
def orig(example, a_kernel=a_kernel):
    def arrays_from_kernel(a, a_kernel):
        height, width = a_kernel.shape
        y_max, x_max = a.shape
        return [
            np.where(a_kernel, a[y : (y + height), x : (x + width)], 1)
            for y in range(y_max - height + 1)
            for x in range(x_max - width + 1)
        ]
    return [np.prod(x) ** 2 for x in arrays_from_kernel(example, a_kernel)]

def alt(example, a_kernel=a_kernel):
    # note: the log/exp trick assumes the entries of `example` are strictly positive
    logged = np.log(example)
    result = ndimage.convolve(logged, a_kernel, mode="constant", cval=0)[1:-1, 1:-1]
    return (np.exp(result) ** 2).ravel()

def make_example(N):
    return np.random.random(size=(N, N))

def check(A, B):
    return np.allclose(A, B)

perfplot.show(
    setup=make_example,
    kernels=[orig, alt],
    n_range=[2 ** k for k in range(2, 11)],
    logx=True,
    logy=True,
    xlabel="len(example)",
    equality_check=check,
)