Python vectorization of matrix-vector operation

I have a matrix A with shape (2,2,N) and a matrix V with shape (2,N).
I want to vectorize the following:

F = np.zeros(N)
for k in range(N):
    F[k] = np.dot(A[:, :, k], V[:, k]).sum()

Is there any way this can be done with tensordot or any other NumPy function, without explicit looping?

With np.einsum:

F = np.einsum('ijk,jk->k', A, V)

We can optimize it further by setting the optimize flag to True (see the docs).
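
As a quick sanity check (a minimal sketch; the value of N and the random data are made up for illustration), the einsum result matches the original loop:

import numpy as np

N = 5
A = np.random.rand(2, 2, N)
V = np.random.rand(2, N)

F_loop = np.zeros(N)
for k in range(N):
    F_loop[k] = np.dot(A[:, :, k], V[:, k]).sum()   # original loop

F_vec = np.einsum('ijk,jk->k', A, V, optimize=True)  # vectorized version

print(np.allclose(F_loop, F_vec))  # True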

scipy sparse `LinearOperator` preserves sparseness under what conditions?

I have a scipy sparse csc_matrix J with J.shape = (n, k).
Suppose d is some k-length array with no zeros.
I want to construct a LinearOperator, call it linop, where
from scipy.sparse.linalg import LinearOperator

J = ...  # (n, k) csc_matrix
d = ...  # some k-length array
D = ...  # assume I make a sparse diagonal matrix here of 1/d

linop = LinearOperator((n, k),
                       matvec=lambda v: J.dot(v / d),
                       rmatvec=lambda v: D.dot(J.T.dot(v)))
My question is, under what conditions does this preserve "sparsity"? Not of the result, but of the intermediate steps. (I am unsure in general what happens "under the hood" when you multiply sparse times dense.)
For example, if (v/d) is dense, is J converted to dense before the multiplication? This would be very bad for my use case. Do I need to explicitly convert the input arguments in the lambda methods to sparse before the multiplication?
Thank you in advance.
Edit: pre-computing "J / d" is not an option as I need J later, and don't have the memory to store J and J / d.
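
One way to get a feel for this is to inspect the types of the intermediates directly (a minimal sketch; the sizes and density here are made up). As far as I know, multiplying a scipy sparse matrix by a dense vector returns a dense 1-D ndarray and does not convert J itself to dense:

import numpy as np
from scipy import sparse
from scipy.sparse.linalg import LinearOperator

n, k = 1000, 500
J = sparse.random(n, k, density=0.01, format='csc')  # stand-in for the real J
d = np.random.rand(k) + 1.0                          # k-length array, no zeros
D = sparse.diags(1.0 / d)                            # sparse diagonal of 1/d

linop = LinearOperator((n, k),
                       matvec=lambda v: J.dot(v / d),
                       rmatvec=lambda v: D.dot(J.T.dot(v)))

v = np.random.rand(k)
print(type(J.dot(v / d)))  # numpy.ndarray: the result is dense, but J stays sparse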

Efficient way to "broadcast" the sum of elements of two 1D arrays to a 2D array

Is there a more efficient way (without loops) to do this with NumPy?

for i, x in enumerate(array1):
    for j, y in enumerate(array2):
        result[i, j] = x + y

I have been trying to use einsum, without success so far.
Thank you!
Simply use broadcasting with an extra dimension:
result = array1[:, None] + array2
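
Equivalently (just an alternative spelling using NumPy's ufunc machinery), np.add.outer gives the same 2D result:

result = np.add.outer(array1, array2)  # same as array1[:, None] + array2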

Matrix-vector-multiplication with tensors in numpy

I have a numpy.array A with shape (l,l) and another numpy.array B with shape (l,m,n). Usually, the second and third dimensions in B correspond to spatial cells and the first to something else.
I want to compute
l, m, n = 2, 3, 4              # dummy dimensions
A = np.random.rand(l, l)       # dummy data
B = np.random.rand(l, m, n)    # dummy data
C = np.zeros((l, m, n))
for i in range(m):
    for j in range(n):
        C[:, i, j] = A @ B[:, i, j]
i.e., in every spatial cell, I want to perform a matrix-vector multiplication.
Since I have to do this frequently, I would like to know if there's a more compact way to write this with numpy (especially because there are several situations in which the tensor has shape (l,m,n,o,p)).
Thank you in advance!
I found the answer using np.einsum:

np.einsum('ij,jkl->ikl', A, B)

Explanation:
Einstein notation implies that we sum over matching subscripts.

np.einsum('ij,jkl->ikl', A, B)
  = A_{ij} B_{jkl}            (rewritten in index notation)
  = sum_j A_{ij} B_{jkl}      (the repeated index j implies summation)
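
For the higher-dimensional case mentioned in the question (e.g. a tensor of shape (l,m,n,o,p)), the same contraction can be written with an ellipsis so it works for any number of trailing spatial axes. A minimal sketch with made-up shapes:

import numpy as np

l, m, n, o, p = 2, 3, 4, 5, 6
A = np.random.rand(l, l)
B = np.random.rand(l, m, n, o, p)

C = np.einsum('ij,j...->i...', A, B)             # matrix-vector product in every cell
# equivalent alternative: np.tensordot(A, B, axes=([1], [0]))
print(C.shape)  # (2, 3, 4, 5, 6)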

Raise array to the power of another array - i.e. expanding the dimension of the array

Is it possible to use numpy to raise an array to the power of another array, in a way that yields a result with a larger dimension than the inputs, i.e. not just simple element-wise exponentiation?
As a simple example, I'm looking to compute the following. Below is the "longhand" form; in practice this is implemented as a loop over a large x array, so it's slow.
x = np.arange(4)
t = np.random.rand(3,3)
y = np.empty_like(x)
y[0] = np.sum(x[0]**t)
y[1] = np.sum(x[1]**t)
y[2] = np.sum(x[2]**t)
y[3] = np.sum(x[3]**t)
I'd like a vectorised solution to replace computing each y[i] separately. However, since x has shape (4,) and t has shape (3,3), when I try to compute x**t I get an error.
Is there a fast, optimized solution?
A straightforward vectorized way would be with broadcasting:

y = (x[:, None, None]**t).sum((1, 2)).astype(x.dtype)

Or with the built-in np.power.outer:

y = np.power.outer(x, t).sum((1, 2)).astype(x.dtype)

For large arrays, leverage multiple cores with the numexpr module:

import numexpr as ne
y = ne.evaluate('sum(x3D**t1D,1)', {'x3D': x[:, None], 't1D': t.ravel()}).astype(x.dtype)
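
A quick check (a minimal sketch reusing the question's dummy data, compared as floats so the integer cast doesn't hide differences) that the broadcasting version reproduces the loop:

import numpy as np

x = np.arange(4)
t = np.random.rand(3, 3)

y_loop = np.array([np.sum(xi**t) for xi in x])   # the original per-element computation
y_vec = (x[:, None, None]**t).sum((1, 2))        # broadcasting version

print(np.allclose(y_loop, y_vec))  # True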

GEMM using Numpy einsum

Can a single numpy einsum statement replicate gemm functionality? Scalar and matrix multiplication seem straightforward, but I haven't found how to get the "+" working. In case it's simpler, D = alpha * A * B + beta * C would be acceptable (preferable, actually).
alpha = 2
beta = 3
A = np.arange(9).reshape(3, 3)
B = A + 1
C = B + 1
left_part = alpha*np.dot(A, B)
print(left_part)
left_part = np.einsum(',ij,jk->ik', alpha, A, B)
print(left_part)
There seems to be some confusion here: np.einsum handles operations that can be cast in the form broadcast-multiply-reduce. Element-wise summation is not part of its scope.
The reason why you need this sort of thing for the multiplication is that writing these operations out "naively" may exceed memory or computing resources quickly. Consider, for example, matrix multiplication:
import numpy as np
x, y = np.ones((2, 2000, 2000))
# explicit loop - ridiculously slow
a = sum(x[:,j,np.newaxis] * y[j,:] for j in range(2000))
# explicit broadcast-multiply-reduce: throws MemoryError
a = (x[:,:,np.newaxis] * y[:,np.newaxis,:]).sum(1)
# einsum or dot: fast and memory-saving
a = np.einsum('ij,jk->ik', x, y)
The Einstein summation convention, however, distributes over addition, so you can write your BLAS-like problem simply as:

D = np.einsum(',ij,jk->ik', alpha, A, B) + np.einsum(',ik->ik', beta, C)
with minimal memory overhead (you can rewrite most of it as in-place operations if you are really concerned about memory) and constant runtime overhead (the cost of two python-to-C calls).
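
As a quick check with the question's own data (a minimal sketch), the two-einsum expression matches the direct computation:

import numpy as np

alpha, beta = 2, 3
A = np.arange(9).reshape(3, 3)
B = A + 1
C = B + 1

D_ref = alpha * np.dot(A, B) + beta * C
D_ein = np.einsum(',ij,jk->ik', alpha, A, B) + np.einsum(',ik->ik', beta, C)

print(np.allclose(D_ref, D_ein))  # True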
So regarding performance, this seems, respectfully, like a case of premature optimization to me: have you actually verified that the split of GEMM-like operations into two separate numpy calls is a bottleneck in your code? If it indeed is, then I suggest the following (in order of increasing effort):

1. Try (carefully!) scipy.linalg.blas.dgemm. I would be surprised if you get significantly better performance, since dgemm is usually only a building block itself.
2. Try an expression compiler (essentially you are proposing such a thing) like Theano.
3. Write your own generalised ufunc using Cython or C.
