Vectorizing three nested loops with Numpy - python

I have a complex matrix C with dimensions (r, r) as well as a complex vector v of size r. I need to compute a new matrix K from C and v following this equation:
K_{m,n} = Im( \sum_{i=1}^{r} C_{i,m} * C_{i,n}^* * sgn(Im(v_i)) )
where K is also a square matrix of dimensions (r, r). Here is the code to compute K with three loops:
import numpy as np
import matplotlib.pyplot as plt
r = 9
# Create random matrix
C = np.random.rand(r,r) + np.random.rand(r,r) * 1j
v = np.random.rand(r) + np.random.rand(r) * 1j
# Original loops
K = np.zeros((r, r))
for m in range(r):
    for n in range(r):
        for i in range(r):
            K[m,n] += np.imag( C[i,m] * np.conj(C[i,n]) * np.sign(np.imag(v[i])) )
plt.figure()
plt.imshow(K)
plt.show()
Removing the loop with i is relatively easy:
# First optimization
K = np.zeros((r, r))
for m in range(r):
    for n in range(r):
        K[m,n] = np.imag(np.sum(C[:,m] * np.conj(C[:,n]) * np.sign(np.imag(v)) ))
but I am not sure how to proceed to vectorize the two remaining loops. Is it actually possible in this case?

I have run into a lot of problems of this kind, and here is how I usually proceed to find a vectorized solution.
Here is what I noticed about your summation. The nice conclusion is that you probably do not need any special vectorization tricks at all, as you can express your whole calculation as a single product of 2D matrices. Here it comes...
Let's first define the following matrices (sorry for the lack of LaTeX notation, Stack Overflow does not support MathJax):
A_{i,j} = c_{i,j}.
B_{i,j} = c_{i,j} * sgn(Im(v_i))
Then you can write your summation as:
k_{m,n} = Im( \sum_{i=1}^{r} c_{i,m} * sgn(Im(v_i)) * c_{i,n}^* ) = Im ( \sum_{i=1}^{r} B_{i,m} * A_{i,n}^* ) = Im( \sum_{i=1}^{r} B_{m,i}^T * A_{i,n}^* )
The expression inside Im(.) is, by the definition of matrix multiplication, equivalent to the following:
k_{m,n} = Im( (B^T * A^*)_{m,n} )
This means that your matrix k can be expressed as the imaginary part of the product of the transpose of matrix B and the complex conjugate of matrix A. In your code the matrix A is already assigned to the variable C, so the vectorization could be done as follows:
C = np.random.rand(r,r) + np.random.rand(r,r) * 1j
v = np.random.rand(r) + np.random.rand(r) * 1j
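# B scales row i of C by sgn(Im(v_i)); [:, None] makes the signs broadcast along rows
B = C * np.sign(np.imag(v))[:, None]
k = np.imag(B.T @ np.conj(C))
And you have avoided both the nasty loops and convoluted expressions.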

This looks like matrix multiplication:
out = np.imag((C*np.sign(np.imag(v))[:,None]).T @ np.conj(C))
Or you can use np.einsum:
out = np.imag(np.einsum('im,in,i', C, np.conj(C), np.sign(np.imag(v))))
Verification with your approach:
np.all(np.abs(out-K) < 1e-6)
# True
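For a self-contained check, the sketch below compares the original triple loop against the matrix-product and einsum forms. The fixed seed and the choice of a v whose imaginary parts take both signs are assumptions made here so that the sign factor is actually exercised; they are not part of the original question.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed: an assumption, for reproducibility
r = 9
C = rng.random((r, r)) + 1j * rng.random((r, r))
# Assumption: shift v so that Im(v) has both signs (rand + 1j*rand is always positive)
v = (rng.random(r) - 0.5) + 1j * (rng.random(r) - 0.5)
s = np.sign(np.imag(v))

# Reference: the original triple loop
K_loop = np.zeros((r, r))
for m in range(r):
    for n in range(r):
        for i in range(r):
            K_loop[m, n] += np.imag(C[i, m] * np.conj(C[i, n]) * s[i])

# Matrix product: scale row i of C by s[i], transpose, multiply by conj(C)
K_matmul = np.imag((C * s[:, None]).T @ np.conj(C))

# einsum with the output indices written out explicitly
K_einsum = np.imag(np.einsum('im,in,i->mn', C, np.conj(C), s))

print(np.allclose(K_loop, K_matmul), np.allclose(K_loop, K_einsum))
```

Writing `->mn` in the einsum string is optional here, but it makes the output layout explicit rather than relying on alphabetical ordering of the free indices.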

I found something that works for now. However, one loop remains, and since the resulting matrix is antisymmetric (K[n, m] = -K[m, n]), there is still some optimization to be made.
Instead of removing the i loop, I removed the two other ones:
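K = np.zeros((r, r), dtype=np.complex128)
for i in range(r):
    # Outer product of row i of C with its complex conjugate, weighted by the sign
    row = C[i:i+1, :].T  # row i of C as a column vector of shape (r, 1)
    K += np.sign(np.imag(v[i])) * (row @ adjointMatrix(row))
K = np.imag(K)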
with:
def adjointMatrix(X):
    return np.conjugate( np.transpose(X) )

Related

Computing derivatives using numpy

I'm trying to implement a differential in python via numpy that can accept a scalar, a vector, or a matrix.
import numpy as np
def foo_scalar(x):
    f = x * x
    df = 2 * x
    return f, df
def foo_vector(x):
    f = x * x
    n = x.size
    df = np.zeros((n, n))
    for mu in range(n):
        for i in range(n):
            if mu == i:
                df[mu, i] = 2 * x[i]
    return f, df
def foo_matrix(x):
    f = x * x
    m, n = x.shape
    df = np.zeros((m, n, m, n))
    for mu in range(m):
        for nu in range(n):
            for i in range(m):
                for j in range(n):
                    if (mu == i) and (nu == j):
                        df[mu, nu, i, j] = 2 * x[i, j]
    return f, df
This works fine, but it seems like there should be a way to do this in a single function, and let numpy "figure out" the correct dimensions. I could force everything into a 2-D array form with something like
x = np.array(x)
if len(x.shape) == 0:
    x = x.reshape(1, 1)
elif len(x.shape) == 1:
    x = x.reshape(-1, 1)
if len(f.shape) == 0:
    f = f.reshape(1, 1)
elif len(f.shape) == 1:
    f = f.reshape(-1, 1)
and always have 4 nested for loops, but this doesn't scale if I need to generalize to higher-order tensors.
Is what I'm trying to do possible, and if so, how?
I highly doubt NumPy has a function that generates the second value returned by these functions. That said, you can use features of NumPy and Python to vectorize this and make the function faster. You first generate the indices, then create the target array and fill it. Note that operating on generic N-dimensional arrays tends to be slow and tricky in non-trivial cases. The * unpacking operator is used to pass the N per-axis index ranges as separate arguments.
def foo_generic(x):
    f = x ** 2
    idx = np.stack(np.meshgrid(*[np.arange(e) for e in x.shape], indexing='ij'))
    idx = tuple(np.concatenate((idx, idx)).reshape(2*x.ndim, -1))
    df = np.zeros([*x.shape, *x.shape])
    df[idx] = 2 * x.ravel()
    return f, df
Note that foo_generic does not support scalars, and it would be very inefficient to use it for them anyway, but you can add a condition to handle this special case separately.
The df array quickly becomes huge for higher orders, so I strongly advise you not to use dense arrays for this: even in the matrix case, the number of zeros is huge compared to the number of nonzero values. Sparse matrices fix this. In fact, for a 5x5 matrix, more than 95% of the entries are zeros, and filling a huge array with zeros is not efficient.
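As a rough illustration of both points, here is a minimal alternative sketch that also accepts scalars. It relies on the fact that the derivative of an element-wise function is diagonal once x is flattened. The name foo_any and the restriction to real floating-point input are assumptions of this sketch, not part of the answer above.

```python
import numpy as np

def foo_any(x):
    """Return f = x**2 and its derivative for a scalar or an array of any rank."""
    x = np.asarray(x, dtype=float)  # assumption: real-valued input
    f = x ** 2
    # With x flattened, d f_a / d x_b is 2 * x_a when a == b and 0 otherwise
    df_flat = np.zeros((x.size, x.size))
    np.fill_diagonal(df_flat, 2 * x.ravel())
    # Unflatten the "output" and the "input" index back to x.shape
    df = df_flat.reshape(x.shape + x.shape)
    return f, df

f, df = foo_any(np.arange(1.0, 26.0).reshape(5, 5))
print(df.shape)          # (5, 5, 5, 5)
print(np.mean(df == 0))  # fraction of zero entries: 600 / 625
```

The dense result still stores every zero, so this only removes the index bookkeeping. Keeping just the diagonal values `2 * x` (or a sparse diagonal matrix) would avoid the memory growth entirely.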

How can I enforce PBC when simulating a 2D Reaction-Diffusion system?

I'm having some problems attempting to implement periodic boundary conditions (PBC) on a reaction-diffusion system simulated in Python using 2D numpy arrays. I'll try to explain using pseudocode and attach code showing how I'm currently handling the boundaries.
import numpy as np
from numba import jit
N = 100
# I define a 2D array for each of my species in the reaction-diffusion system
# (np.zeros allocates an N x N array; np.array((N, N)) would create a 2-element array)
a = np.zeros((N, N), dtype=np.float64)
b = np.zeros((N, N), dtype=np.float64)
.
.
.
n = np.zeros((N, N), dtype=np.float64)
# And also copies to update at each time step
a_copy = np.zeros((N, N), dtype=np.float64)
b_copy = np.zeros((N, N), dtype=np.float64)
.
.
.
n_copy = np.zeros((N, N), dtype=np.float64)
# I calculate the laplacian using the following function
@jit(nopython=True, fastmath=True)
def laplacian_numba(field, dh2_inv, out):
    """
    Compute the laplacian of an array using a 5 point stencil.
    """
    for i in range(1, N - 1):
        for j in range(1, N - 1):
            out[i, j] = (
                field[i + 1, j]
                + field[i - 1, j]
                + field[i, j + 1]
                + field[i, j - 1]
                - 4 * field[i, j]
            ) * dh2_inv
    return out
# I have a main loop to update the functions using an explicit method
@jit(nopython=True, fastmath=True)
def update(a, b, ..., n):
    # Compute the laplacian of only the diffusing variables
    laplacian_numba(a, dh2_inv, out=lap_a)
    laplacian_numba(b, dh2_inv, out=lap_b)
    # Update the copy arrays
    a_copy = a[1:-1, 1:-1] + dt * (ODE stuff + lap_a * a[1:-1, 1:-1])
    ...
    # Finally I enforce the boundary conditions on the system
    # If I'm not mistaken, these are reflecting boundary conditions, not periodic
    # And this is where I'm lost as to how to implement the periodicity
    a_copy[0,:] = a_copy[1,:]
    a_copy[-1,:] = a_copy[-2,:]
    a_copy[:,0] = a_copy[:,1]
    a_copy[:,-1] = a_copy[:,-2]
    # Update the previous and next timestep arrays
    a, a_copy = a_copy, a
    b, b_copy = b_copy, b
    return a, b, ..., n
The above is a very crude pseudocode of how I implemented the system, and how I update at each timestep and enforce the boundary conditions, which if I'm not mistaken are just reflecting the edges back onto the main grid. Here is my main question, what would I need to change in order to make the edges periodic and not reflecting? As I come from a biochemistry background, PDEs are not my forte, but I'm making an effort since this is a key objective in my thesis and would appreciate any help or guidance.
Thanks in advance to anyone who takes the time to read this! And I apologize for any formatting mistakes I could've made.
With periodic boundary conditions, there isn't really a boundary; your coordinate space just wraps around modulo N. So there is no need to add special first/last rows/columns at all, and no need to explicitly enforce the boundary condition either.
But you do have to make sure that all reads outside the matrix bounds are also wrapped around properly. For example, for your Laplacian you could do something like this:
@jit(nopython=True, fastmath=True)
def laplacian_numba(field, dh2_inv, out):
    """
    Compute the laplacian of an array using a 5 point stencil.
    """
    # Note the new range: 0, ..., N-1
    for i in range(N):
        for j in range(N):
            out[i, j] = (
                field[(i + 1) % N, j]
                + field[(i - 1) % N, j]
                + field[i, (j + 1) % N]
                + field[i, (j - 1) % N]
                - 4 * field[i, j]
            ) * dh2_inv
    return out
Incidentally, the same could be written in a oneliner using four np.roll calls, but I don't know if it'd be faster than your numba approach.
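A minimal sketch of that np.roll variant, checked against a plain-Python version of the wrapped loop, might look like this. The function names laplacian_roll and laplacian_loop, the grid size, and the value of dh2_inv are illustrative assumptions; no claim is made about its speed relative to numba.

```python
import numpy as np

def laplacian_roll(field, dh2_inv):
    """5-point Laplacian with periodic wrap-around, built from four np.roll calls."""
    return (
        np.roll(field, -1, axis=0)     # field[(i + 1) % N, j]
        + np.roll(field, 1, axis=0)    # field[(i - 1) % N, j]
        + np.roll(field, -1, axis=1)   # field[i, (j + 1) % N]
        + np.roll(field, 1, axis=1)    # field[i, (j - 1) % N]
        - 4 * field
    ) * dh2_inv

def laplacian_loop(field, dh2_inv):
    """Reference implementation: explicit loops with modulo indexing."""
    n_rows, n_cols = field.shape
    out = np.empty_like(field)
    for i in range(n_rows):
        for j in range(n_cols):
            out[i, j] = (
                field[(i + 1) % n_rows, j]
                + field[(i - 1) % n_rows, j]
                + field[i, (j + 1) % n_cols]
                + field[i, (j - 1) % n_cols]
                - 4 * field[i, j]
            ) * dh2_inv
    return out

rng = np.random.default_rng(1)      # fixed seed, for reproducibility only
field = rng.random((16, 16))        # small illustrative grid
dh2_inv = 4.0                       # hypothetical 1 / dh**2
print(np.allclose(laplacian_roll(field, dh2_inv), laplacian_loop(field, dh2_inv)))
```

Because the Laplacian is now defined on the whole grid, the update step can use the full arrays instead of the `[1:-1, 1:-1]` interior slices.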

Sympy: Drop terms without a specific variable

I'm trying to compute some (a lot) of multivariate conditional densities (i.e., the multiplication of several multivariate probability density functions). I'm able to set up and expand the matrices properly but now would like to drop terms that, for example in the equation (and code) below, don't contain wg. With help from the posted answer, I was able to develop a hacky solution; improvements are welcome.
UPDATE: MWE
import sympy as sym
from IPython.display import display as disp
N = 211
wg = sym.MatrixSymbol('w_g', N, 1)
wg_n = sym.MatrixSymbol('w_gn', N, 1)
Z_wg = sym.MatrixSymbol('Z_wg', N, N)
# pdf wg
pdf_wg = ((wg - wg_n).T * Z_wg.I * (wg - wg_n))
pdf_full = sym.expand(pdf_wg)
# pdf_full.collect(wg) # NotImplementedError: noncommutative scalars in MatMul are not supported.
# print (wg in pdf_full.atoms()) # False
# this gives what I want
terms = pdf_full.as_terms()[0]
for term in terms:
    if 'w_g,' in str(term[0].atoms()):
        disp (term[0])
UPDATE 2: More Complex MWE
Here I'm trying to grab just the terms with b in them.
import sympy as sym
from IPython.display import display as disp, Math
mu = sym.symbols('mu') # mean non GIA SSH trend
N = 211
vec1 = sym.MatrixSymbol('1', N, 1)
u = sym.MatrixSymbol('u', N, 1)
Pi = sym.MatrixSymbol('Pi', N, N)
b = sym.MatrixSymbol('b', N, 1)
wg = sym.MatrixSymbol('w_g', N, 1)
wm = sym.MatrixSymbol('w_m', N, 1)
bhat = mu*vec1 + wg + wm + u # convenience
pdf = sym.expand((b - bhat).T * Pi.I * (b-bhat))
terms = pdf.as_terms()[0]
good_terms = []
for term in terms:
    if b.args[0] in term[0].atoms():
        good_terms.append(term[0])
print ('Good terms:'); disp(sym.Add(*good_terms))
UPDATE 4: Solved
For more complex expressions adding doit() to the expand will prevent a bunch of extra loops (e.g.):
pdf = sym.expand((b - bhat).T * Pi.I * (b-bhat)).doit()
More information can be found in the comments to the various answers.
Thanks!
You could extract the atoms of the expression and test whether the variable is among them:
from sympy import symbols
a, b, mug = symbols('a b mu_g')
expr1 = a * b + a * mug
expr2 = a * b
for expr in [expr1, expr2]:
    if mug in expr.atoms():
        print(expr, 'contains', mug)
    else:
        print(expr, 'does not contain', mug)
PS: An update for your new question. For a MatrixSymbol the symbol is stored as wg.args[0] (args[1] and args[2] are the dimensions):
import sympy as sym
N = 211
wg = sym.MatrixSymbol('w_g', N, 1)
wg_n = sym.MatrixSymbol('w_gn', N, 1)
Z_wg = sym.MatrixSymbol('Z_wg', N, N)
pdf_wg = ((wg - wg_n).T * Z_wg.I * (wg - wg_n))
pdf_full = sym.expand(pdf_wg)
print (wg.args[0] in pdf_full.atoms()) # True
Note that the hacky solution in the question could go wrong if w_g were the last item, or if another name ended in the same string.
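For ordinary (commutative) symbols, the same filtering can be sketched without string matching by splitting the sum into its top-level terms and testing each one with has(). The symbols and the expression below are made up for illustration; whether has() is equally convenient for the MatrixSymbol expressions in the question would need to be checked separately.

```python
import sympy as sym

a, b, c, mug = sym.symbols('a b c mu_g')
expr = sym.expand(a * (b + mug) + c * mug + b * c)

# Split the sum into its top-level terms and keep those that contain mu_g
kept = sym.Add(*[term for term in sym.Add.make_args(expr) if term.has(mug)])
dropped = expr - kept

print(kept)
print(dropped)
```

Unlike the substring test, this does not depend on how the atoms happen to be printed or ordered.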
You can get the terms not containing wg like this (ZeroMatrix is imported from sympy):
In [53]: pdf_full.subs(wg, ZeroMatrix(N, 1)).doit()
Out[53]:
    T     -1
w_gn ⋅Z_wg  ⋅w_gn
Then you can subtract those from pdf_full:
In [54]: pdf_full - pdf_full.subs(wg, ZeroMatrix(N, 1)).doit()
Out[54]:
   T     -1         T     -1           T     -1
w_g ⋅Z_wg  ⋅w_g -w_g ⋅Z_wg  ⋅w_gn -w_gn ⋅Z_wg  ⋅w_g

Efficient computation of a bilinear norm over sparse vectors in Python

Given two column vectors x, y : scipy.sparse.csc_matrix, where len(x) == len(y) == N and max(x.nnz, y.nnz) == M, and a symmetric N × N matrix A : scipy.sparse.csc_matrix, where for all columns j, A[j].nnz = C, I need to compute x.T * A * y = ∑∑ᵢ,ⱼ x[i] * a[j][i] * y[j] efficiently in at most M * max(M, C) steps, which can be achieved as follows:
in the outer loop, we iterate over the y.nnz columns j of A,
in the inner loop, we iterate over either:
the x.nnz rows i of A, if x.nnz < C, or
the C rows i of A, otherwise.
My question is whether this can be achieved using high-level Python and existing libraries (and if so, then how), or whether this requires custom C / C++ code.
The following naive Python code using the scipy library:
(x.T).dot(A).dot(y)[0, 0]
computes separately:
x.T * A using up to x.nnz * N steps, and
∑ⱼ (x.T * A)[j] * y[j] using up to M steps.
This takes O(M * N) steps in total, which is a major slow-down for large N.
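The two-level loop described in the question can be sketched in plain Python to make the intended iteration order concrete. The dictionary layout used here (x and y as {index: value}, A as {column: {row: value}}) and the name sparse_bilinear are assumptions for illustration only; this is not a scipy-based answer to the question.

```python
def sparse_bilinear(x, A_cols, y):
    """Compute x^T A y for dictionary-based sparse data.

    x, y   : {index: value} holding only the nonzero entries (hypothetical layout)
    A_cols : {j: {i: A[i, j]}} holding the nonzero entries of each column j
    """
    total = 0.0
    for j, y_j in y.items():                 # outer loop: nonzeros of y
        col = A_cols.get(j, {})
        # inner loop: iterate over whichever support is smaller
        if len(x) < len(col):
            partial = sum(x_i * col[i] for i, x_i in x.items() if i in col)
        else:
            partial = sum(a_ij * x[i] for i, a_ij in col.items() if i in x)
        total += y_j * partial
    return total

x = {0: 1.0, 2: 2.0}
y = {1: 3.0}
A_cols = {0: {1: 4.0}, 1: {0: 4.0, 2: 5.0, 3: 6.0}}
print(sparse_bilinear(x, A_cols, y))  # 3 * (1*4 + 2*5) = 42.0
```

Each membership test is a constant-time dictionary lookup on average, so the inner loop costs roughly min(x.nnz, C) steps per nonzero of y.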

Fast way for matrix multiplication in Python

Does anybody know a fast way to compute matrices such as:
Z_{i,j} = \sum_{p,k,l,q} \frac{A_{ip} B_{pk} C_{kl} D_{lq} E_{qj}}{a_p - b_q - c}
For normal matrix multiplication I would use numpy.dot(a, b), but now I have to divide the elements by a term that depends on $a_p$ and $b_q$.
Any suggestions?
Any suggestions on how to compute
$$ C_{i,j} = \sum_p \frac{E_{i,p} B_{p,j}}{m_p} $$
will be of great help as well.
Note that (E[i, p] * B[p, j]) / m[p] is equal to E[i, p] * (B[p, j] / m[p]), so you can simply divide m into B before calling np.dot.
def f(E, B, m):
    B = np.asarray(B)  # matrix with one row per index p
    m = np.asarray(m).reshape((B.shape[0], 1))  # column vector, one entry per row of B
    return np.dot(E, B / m)  # m is broadcast across the columns of B
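The same idea seems to extend to the first formula: the denominator depends only on p and q, so the sums over k and l can be done first as an ordinary product B C D, the result divided element-wise by a_p - b_q - c, and the outer products with A and E applied last. The sketch below assumes a and b are 1-D arrays and c is a scalar; the function name z_matrix and the test sizes are made up for illustration.

```python
import numpy as np

def z_matrix(A, B, C, D, E, a, b, c):
    """Z[i, j] = sum_{p,k,l,q} A[i,p] B[p,k] C[k,l] D[l,q] E[q,j] / (a[p] - b[q] - c)."""
    M = B @ C @ D                          # sums over k and l, shape (P, Q)
    denom = a[:, None] - b[None, :] - c    # denom[p, q] = a_p - b_q - c
    return A @ (M / denom) @ E             # sums over p and q

rng = np.random.default_rng(2)             # fixed seed, for reproducibility only
A = rng.random((3, 4))
B = rng.random((4, 5))
C = rng.random((5, 2))
D = rng.random((2, 6))
E = rng.random((6, 3))
a = 2.0 + rng.random(4)                    # chosen so the denominator stays away from 0
b = rng.random(6)
c = 0.5

# Brute-force reference: one einsum over all four summed indices
Z_ref = np.einsum('ip,pk,kl,lq,qj,pq->ij', A, B, C, D, E,
                  1.0 / (a[:, None] - b[None, :] - c))
print(np.allclose(z_matrix(A, B, C, D, E, a, b, c), Z_ref))
```

This keeps every step a plain matrix product or a broadcast division, so no intermediate with more than two indices is ever formed.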
