I need to calculate a double sum of the form:
wignersum[ell] = sum_{ell1} sum_{ell2} (2*ell1 + 1) * (2*ell2 + 1) * W_{ell,ell1,ell2}^2 * C1[ell1] * C2[ell2]
where wignersum is an array indexed by ell, and ell, ell1, and ell2 all run from 0 to ellmax. The W_{ell,ell1,ell2}^2 are a set of known coefficients that I've already calculated (called w3j), stored in an array of shape (ellmax, ellmax, ellmax) as a global variable to be read by this function. (These coefficients are time-intensive to calculate and I've found it faster to load them from a numpy file.) The C1 and C2 are arrays of coefficients of shape (ellmax,).
I have successfully calculated this sum by making use of a double for loop, grabbing the appropriate elements from each preexisting array, and updating the wignersum array in each iteration. I assume there is a better way to vectorize this problem to speed up the calculation. I thought about broadcasting the C1 and C2 arrays to the same shape as the w3j array, multiplying these arrays elementwise, and then using np.sum on the ell1 and ell2 axes. I'm unsure whether this is in fact a good method of vectorizing, and if it is, how to actually do this.
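Concretely, here is an untested sketch of that broadcasting idea (with placeholder arrays matching the shapes above; note that it materializes the full (ell_max, ell_max, ell_max) product, about 0.5 GB of float64 temporaries at ell_max = 400):
import numpy as np
ell_max = 400
w3j = np.ones((ell_max, ell_max, ell_max))  # placeholder for the precomputed coefficients
C1 = np.arange(ell_max, dtype=float)        # placeholder spectra
C2 = np.arange(ell_max, dtype=float)
factor = 2 * np.arange(ell_max) + 1         # the (2*ell+1) weights
# broadcast the weighted C1 along axis 1 and the weighted C2 along axis 2 of w3j
big = w3j * (factor * C1)[None, :, None] * (factor * C2)[None, None, :]
wignersum = big.sum(axis=(1, 2))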
The code as it stands is something like
import numpy as np
ell_max = 400
w3j = np.ones((ell_max, ell_max, ell_max))
C1 = np.arange(ell_max)
C2 = np.arange(ell_max)
def function(ell_max):
    ells = np.arange(ell_max)
    wignersum = np.zeros(ell_max)
    factor = np.array([2*i+1 for i in range(ell_max)])
    for ell1 in ells:
        A = factor[ell1]
        B = C1[ell1]
        for ell2 in ells:
            D = factor[ell2] * C2[ell2] * w3j[:,ell1,ell2]
            wignersum += A * B * D
    return wignersum
(Note that in actuality C1 and C2 are not global variables but local variables that must be calculated from a set of parameters fed to function. That is not the limiting factor in the code's speed, however.)
With the double for loop, this takes ~1.5 seconds to run for ell_max ~ 400, which is too long for my purposes. I'd like to vectorize this as much as possible to improve speed.
You can use either einsum or matrix multiplication for a ~20x speedup:
import numpy as np
ell_max = 400
w3j = np.random.randint(1,10,(ell_max, ell_max, ell_max))
C1 = np.random.randint(1,10,ell_max)
C2 = np.random.randint(1,10,ell_max)
def function(ell_max):
    ells = np.arange(ell_max)
    wignersum = np.zeros(ell_max)
    factor = np.array([2*i+1 for i in range(ell_max)])
    for ell1 in ells:
        A = factor[ell1]
        B = C1[ell1]
        for ell2 in ells:
            D = factor[ell2] * C2[ell2] * w3j[:,ell1,ell2]
            wignersum += A * B * D
    return wignersum
def pp_es(l_mx):
    # einsum contracts the ell1 (i) and ell2 (j) axes in a single call
    l = np.arange(l_mx)
    f = 2*l+1
    return np.einsum("i,i,j,j,kij", f, C1, f, C2, w3j, optimize=True)

def pp_mm(l_mx):
    # flatten the (ell1, ell2) axes of w3j and contract them with the flattened
    # outer product of the two weight vectors: a plain matrix-vector product
    l = np.arange(l_mx)
    f = 2*l+1
    return w3j.reshape(l_mx, -1) @ np.outer(f*C1, f*C2).ravel()
from timeit import timeit
print(timeit(lambda:pp_es(400),number=10))
print(timeit(lambda:pp_mm(400),number=10))
print(timeit(lambda:function(400),number=10))
print((pp_mm(400)==pp_es(400)).all())
print((function(400)==pp_mm(400)).all())
Sample run:
0.6061844169162214 # einsum
0.6111843499820679 # matrix x vector
12.233918005018495 # OP
True # einsum == matrix x vector
True # OP == matrix x vector
I'm trying to implement a differential in Python via NumPy that can accept a scalar, a vector, or a matrix.
import numpy as np
def foo_scalar(x):
    f = x * x
    df = 2 * x
    return f, df

def foo_vector(x):
    f = x * x
    n = x.size
    df = np.zeros((n, n))
    for mu in range(n):
        for i in range(n):
            if mu == i:
                df[mu, i] = 2 * x[i]
    return f, df

def foo_matrix(x):
    f = x * x
    m, n = x.shape
    df = np.zeros((m, n, m, n))
    for mu in range(m):
        for nu in range(n):
            for i in range(m):
                for j in range(n):
                    if (mu == i) and (nu == j):
                        df[mu, nu, i, j] = 2 * x[i, j]
    return f, df
This works fine, but it seems like there should be a way to do this in a single function, and let numpy "figure out" the correct dimensions. I could force everything into a 2-D array form with something like
x = np.array(x)
if len(x.shape) == 0:
    x = x.reshape(1, 1)
elif len(x.shape) == 1:
    x = x.reshape(-1, 1)

if len(f.shape) == 0:
    f = f.reshape(1, 1)
elif len(f.shape) == 1:
    f = f.reshape(-1, 1)
and always have 4 nested for loops, but this doesn't scale if I need to generalize to higher-order tensors.
Is what I'm trying to do possible, and if so, how?
I highly doubt there is a built-in NumPy function that generates the second value returned by your functions. That being said, you can play with NumPy and Python features to vectorize this and make the function faster. You first need to generate the indices, then allocate the target array and set it. Note that operating with generic N-dimensional arrays tends to be slow and tricky in non-trivial cases. The magic * unpacking operator is used to pass N parameters.
def foo_generic(x):
    f = x ** 2
    # one index array per axis, stacked: shape (ndim, *x.shape)
    idx = np.stack(np.meshgrid(*[np.arange(e) for e in x.shape], indexing='ij'))
    # duplicate the indices so they address the "diagonal" of the 2*ndim-dimensional array
    idx = tuple(np.concatenate((idx, idx)).reshape(2*x.ndim, -1))
    df = np.zeros([*x.shape, *x.shape])
    df[idx] = 2 * x.ravel()
    return f, df
Note that foo_generic does not support scalars, and it would be very inefficient to use it for that anyway, but you can add a condition to handle that special case separately.
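As a quick sanity check, here is a hypothetical usage example comparing foo_generic against foo_matrix from the question:
x = np.random.rand(3, 4)
f, df = foo_generic(x)
f_ref, df_ref = foo_matrix(x)  # loop-based reference from the question
print(df.shape)                                         # (3, 4, 3, 4)
print(np.allclose(f, f_ref), np.allclose(df, df_ref))   # True True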
The df array will very quickly become huge for higher orders, so I strongly advise against dense storage here: already in the matrix case the number of zeros is huge compared to the number of useful values. In fact, for a 5x5 input, more than 95% of the entries of df are zeros, and filling a huge array with zeros is not efficient. Sparse matrices fix this.
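For instance, a minimal sketch of the sparse idea using scipy.sparse (my assumption; the question itself does not use SciPy): since the flattened Jacobian of f = x * x is diagonal, a sparse diagonal matrix of shape (x.size, x.size) stores it without the zeros.
import numpy as np
from scipy import sparse

def foo_sparse(x):
    # flattened Jacobian of f = x*x: a diagonal matrix with entries 2*x
    x = np.asarray(x, dtype=float)
    f = x * x
    df = sparse.diags(2 * x.ravel())  # shape (x.size, x.size), only x.size stored values
    return f, df

f, df = foo_sparse(np.random.rand(5, 5))
print(df.shape)  # (25, 25)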
I have two dense matrices A and B, each of size 3e5 x 100, and a sparse binary matrix C of size 3e5 x 3e5. I want to compute the following quantity: C ∘ (AB'), where ∘ is the Hadamard (i.e., elementwise) product and B' is the transpose of B. Explicitly calculating AB' would require a crazy amount of memory (~500 GB). Since the end result doesn't need the whole of AB', it is sufficient to calculate only A_i B_j' where C_ij != 0, where A_i is row i of matrix A and C_ij is the element at location (i,j) of matrix C. A suggested approach would be like the algorithm below:
result = numpy.initialize_sparse_matrix(shape=C.shape)
while True:
    (i, j) = C_ij.pop_nonzero_index()  # prototype function: returns the nonzero index and then points to the next one
    if (i, j) is empty:
        break
    result(i, j) = A_i B_j'
This algorithm, however, takes too much time. Is there any way to improve it using LAPACK/BLAS routines? I am coding in Python, so I think NumPy can be a more human-friendly wrapper for LAPACK/BLAS.
You can do this computation using the following, assuming C is stored as a scipy.sparse matrix:
C = C.tocoo()
result_data = C.data * (A[C.row] * B[C.col]).sum(1)
result = sparse.coo_matrix((result_data, (C.row, C.col)), shape=C.shape)
The trick is that A[C.row] and B[C.col] gather only the row pairs that correspond to nonzero entries of C, so (A[C.row] * B[C.col]).sum(1) computes exactly the required dot products and nothing else. Here we show that the result matches the naive algorithm for some smaller inputs:
import numpy as np
from scipy import sparse
N = 300
M = 10
def make_C(N, nnz=1000):
    data = np.random.rand(nnz)
    row = np.random.randint(0, N, nnz)
    col = np.random.randint(0, N, nnz)
    return sparse.coo_matrix((data, (row, col)), shape=(N, N))

A = np.random.rand(N, M)
B = np.random.rand(N, M)
C = make_C(N)

def f_naive(C, A, B):
    return C.multiply(np.dot(A, B.T))

def f_efficient(C, A, B):
    C = C.tocoo()
    result_data = C.data * (A[C.row] * B[C.col]).sum(1)
    return sparse.coo_matrix((result_data, (C.row, C.col)), shape=C.shape)

np.allclose(
    f_naive(C, A, B).toarray(),
    f_efficient(C, A, B).toarray()
)
# True
And here we see that it works for the full input size:
N = 300000
M = 100
A = np.random.rand(N, M)
B = np.random.rand(N, M)
C = make_C(N)
out = f_efficient(C, A, B)
print(out.shape)
# (300000, 300000)
print(out.nnz)
# 1000
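(Note that only C.nnz = 1000 length-M dot products are computed here, rather than the N² = 9×10^10 entries of the full AB'.)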
I have a complex matrix C with dimensions (r, r) as well as a complex vector v of size r. I need to compute a new matrix K from C and v following this equation:
K_{m,n} = \sum_{i=1}^{r} Im( C_{i,m} * C_{i,n}^* * sgn(Im(v_i)) )
where K is also a square matrix of dimensions (r, r). Here is the code to compute K with three loops:
import numpy as np
import matplotlib.pyplot as plt
r = 9
# Create random matrix
C = np.random.rand(r,r) + np.random.rand(r,r) * 1j
v = np.random.rand(r) + np.random.rand(r) * 1j
# Original loops
K = np.zeros((r, r))
for m in range(r):
    for n in range(r):
        for i in range(r):
            K[m,n] += np.imag( C[i,m] * np.conj(C[i,n]) * np.sign(np.imag(v[i])) )
plt.figure()
plt.imshow(K)
plt.show()
Removing the loop over i is relatively easy:
# First optimization
K = np.zeros((r, r))
for m in range(r):
    for n in range(r):
        K[m,n] = np.imag(np.sum( C[:,m] * np.conj(C[:,n]) * np.sign(np.imag(v)) ))
but I am not sure how to proceed to vectorize the two remaining loops. Is it actually possible in this case?
I have had a lot of problems like this one, and here is how I usually proceed to find a way of writing vectorized code.
Here is what I have noticed about your summation. The cool conclusion is that you probably do not need vectorization at all, as you can express your whole calculation as a single product of 2D matrices. Here goes...
Let's first define the following matrices (sorry for the lack of LaTeX notation; Stack Overflow does not support MathJax):
A_{i,j} = c_{i,j}.
B_{i,j} = c_{i,j} * sgn(Im(v_i))
Then you can write your summation as:
k_{m,n} = Im( \sum_{i=1}^{r} c_{i,m} * sgn(Im(v_i)) * c_{i,n}^* ) = Im( \sum_{i=1}^{r} B_{i,m} * A_{i,n}^* ) = Im( \sum_{i=1}^{r} (B^T)_{m,i} * A_{i,n}^* )
The expression inside Im(.) is, by the definition of matrix multiplication, equivalent to:
k_{m,n} = Im( (B^T * A^*)_{m,n} )
This means that your matrix k can be expressed as the product of the transpose of B and the conjugate of A. In your code, the matrix A is already assigned to the variable C. So the vectorization can be done as follows:
C = np.random.rand(r,r) + np.random.rand(r,r) * 1j
v = np.random.rand(r) + np.random.rand(r) * 1j
k = np.imag((C * np.sign(np.imag(v))[:, None]).T @ np.conj(C))
And you have avoided both nasty loops and convoluted expressions.
This looks like matrix multiplication:
out = np.imag((C*np.sign(np.imag(v))[:,None]).T @ np.conj(C))
Or you can use np.einsum:
out = np.imag(np.einsum('im,in,i', C, np.conj(C), np.sign(np.imag(v))))
Verification with your approach:
np.all(np.abs(out-K) < 1e-6)
# True
I found something that can work for now. However, one loop remains, and since the resulting matrix is symmetric, there is still some optimization to be made.
Instead of removing the i loop, I removed the two other ones:
K = np.zeros((r, r), dtype=np.complex128)
Cdag = adjointMatrix(C)
for i in range(r):
    # accumulate the rank-1 contribution of index i
    K += np.sign(np.imag(v[i])) * np.outer(C[i, :], Cdag[:, i])
K = np.imag(K)
with:
def adjointMatrix(X):
    return np.conjugate( np.transpose(X) )
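(Equivalently, NumPy spells this as X.conj().T.)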
I am a newbie in Python, writing code for my physics project, which requires generating a matrix with a variable E, for which the first element of the matrix has to be solved. Please help me. Thanks in advance.
Here is the relevant part of the code:
import numpy as np
import pylab as pl
import math
import cmath
import sympy as sy
from scipy.optimize import fsolve
#Constants(Values at temp 10K)
hbar = 1.055E-34
m0=9.1095E-31 #free mass of electron
q= 1.602E-19
v = [0.510,0,0.510] # conduction band offset in eV
m1= 0.043 #effective mass in In_0.53Ga_0.47As
m2 = 0.072 #effective mass in Al_0.48In_0.52As
d = [-math.inf,100,math.inf] # dimension of structure in nanometers
'''Scaling factor, with units of E in eV, mass in terms of the free electron mass,
and length in nanometers.'''
s = (2*q*m0*1E-18)/(hbar)**2
#print('scaling factor is ',s)
E = sy.symbols('E') #Suppose energy of incoming particle is 0.3eV
m = [0.043,0.072,0.043] #effective mass of electrons in layers
for i in range(3):
    print('Effective mass of e in layer', i, 'is', m[i])
k = []  # Defining an array for wavevectors in different layers
for i in range(3):
    k.append(sy.sqrt(s*m[i]*(E-v[i])))
    print('Wave vector in layer', i, 'is', k[i])
x = []
for i in range(2):
    x.append((k[i+1]*m[i])/(k[i]*m[i+1]))
    # print(x[i])
#Define Boundary condition matrix for two interfaces.
D0 = (1/2)*sy.Matrix([[1+x[0],1-x[0]], [1-x[0], 1+x[0]]], dtype = complex)
#print(D0)
#A = sy.matrix2numpy(D0,dtype=complex)
D1 = (1/2)*sy.Matrix([[1+x[1],1-x[1]], [1-x[1], 1+x[1]]], dtype = complex)
#print(D1)
#a=eye(3,3)
#print(a)
#Define Propagation matrix for 2nd layer or quantum well
#print(d[1])
#print(k[1])
P1 = 1*sy.Matrix([[sy.exp(-1j*k[1]*d[1]), 0],[0, sy.exp(1j*k[1]*d[1])]], dtype = complex)
#print(P1)
print("abs")
T= D0*P1*D1
#print('Transfer Matrix is given by:',T)
#print('Dimension of tranfer matrix T is' ,T.shape)
#print(T[0,0]
# I want to solve T{0,0} = 0 equation for E
def f(x):
    return T[0,0]

x0 = 0.5  # initial guess
x = fsolve(f, x0)
print("E is",x)
'''
y=sy.Eq(T[0,0],0)
z=sy.solve(y,E)
print('z',z)
'''
The main part, I guess, is the part of the code where I am trying to solve the equation. Steps I am following:
1. Defining a symbol E using SymPy.
2. Generating three matrices which involve sum formulae and the variable E.
3. Generating a matrix T by multiplying those 3 matrices; note that the elements are complex and involve square roots of negative numbers.
4. I need to solve the first element of this matrix, T[0,0] = 0, for the variable E and find the value of E. I used fsolve for solving T[0,0] = 0.
Just a note for future questions: please leave out unused imports such as numpy, and leave out zombie code like # a = eye(3,3). This helps keep the code as clean and short as possible. Also, the sample code would not run because of indentation problems, so when you copy and paste code, make sure it works before you post it. Always try to make your questions as short and modular as possible.
The expression of T[0,0] is too complex to solve analytically by SymPy so numerical approximation is needed. This leaves 2 options:
using SciPy's solvers which are advanced but require type casting to float values since SciPy does not deal with SymPy objects in any way.
using SymPy's root solvers which are less advanced but are probably simpler to use.
Both of these will only ever produce a single number as output, since you can't expect numeric solvers to find every root. If you want to find more than one, I advise using a list of points as initial values, feeding each of them to the solver, and keeping track of the distinct outputs; one way to organize that search is sketched below. This will, however, never guarantee that you have obtained every root.
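For instance, a minimal sketch of that multi-start idea (assuming f is a plain numeric wrapper around T[0,0], like the one defined in the code further down; the grid of guesses is arbitrary):
import numpy as np
from scipy.optimize import newton

roots = set()
for x0 in np.linspace(0.1, 2.0, 20):  # grid of initial guesses
    try:
        z = complex(newton(f, x0))    # f as defined in the code below
    except RuntimeError:              # newton raises this when it fails to converge
        continue
    # round so that numerically identical roots collapse to one entry
    roots.add(complex(round(z.real, 6), round(z.imag, 6)))
print(roots)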
Only mix SciPy and SymPy if you are comfortable using both with no problems. SciPy doesn't play at all with SymPy and you should only have list, float, and complex instances when working with SciPy.
import math
import sympy as sy
from scipy.optimize import newton
# Constants(Values at temp 10K)
hbar = 1.055E-34
m0 = 9.1095E-31 # free mass of electron
q = 1.602E-19
v = [0.510, 0, 0.510] # conduction band offset in eV
m1 = 0.043 # effective mass in In_0.53Ga_0.47As
m2 = 0.072 # effective mass in Al_0.48In_0.52As
d = [-math.inf, 100, math.inf] # dimension of structure in nanometers
'''Scaling factor, with units of E in eV, mass in terms of the free electron mass,
and length in nanometers.'''
s = (2 * q * m0 * 1E-18) / hbar ** 2
E = sy.symbols('E') # Suppose energy of incoming particle is 0.3eV
m = [0.043, 0.072, 0.043] # effective mass of electrons in layers
for i in range(3):
    print('Effective mass of e in layer', i, 'is', m[i])

k = []  # Defining an array for wavevectors in different layers
for i in range(3):
    k.append(sy.sqrt(s * m[i] * (E - v[i])))
    print('Wave vector in layer', i, 'is', k[i])

x = []
for i in range(2):
    x.append((k[i + 1] * m[i]) / (k[i] * m[i + 1]))
# Define Boundary condition matrix for two interfaces.
D0 = (1 / 2) * sy.Matrix([[1 + x[0], 1 - x[0]], [1 - x[0], 1 + x[0]]], dtype=complex)
D1 = (1 / 2) * sy.Matrix([[1 + x[1], 1 - x[1]], [1 - x[1], 1 + x[1]]], dtype=complex)
# Define Propagation matrix for 2nd layer or quantum well
P1 = 1 * sy.Matrix([[sy.exp(-1j * k[1] * d[1]), 0], [0, sy.exp(1j * k[1] * d[1])]], dtype=complex)
print("abs")
T = D0 * P1 * D1
# did not converge for 0.5
x0 = 0.75
# method 1:
def f(e):
    # evaluate T[0,0] at e and remove all sympy related things
    result = complex(T[0, 0].replace(E, e))
    return result
solution1 = newton(f, x0)
print(solution1)
# method 2:
solution2 = sy.nsolve(T[0,0], E, x0)
print(solution2)
This prints:
(0.7533104353644469-0.023775286117722193j)
1.00808496181754 - 0.0444042144405285*I
Note that the first line is a native Python complex instance while the second is an instance of SymPy's complex number. One can convert the second simply with print(complex(solution2)).
Now, you'll notice that they produce different numbers, but both are correct. This function seems to have a lot of zeros, as can be seen from a GeoGebra plot of |T[0,0]| over the complex E plane (red axis: Re(E), green axis: Im(E), blue axis: |T[0,0]|); each of those "spikes" is probably a zero.
I can first obtain the DFT matrix of a given size, say n, by
import numpy as np
n = 64
D = np.fft.fft(np.eye(n))
The FFT is of course just a quick algorithm for applying D to a vector:
x = np.random.randn(n)
ft1 = np.dot(D,x)
print( np.abs(ft1 - np.fft.fft(x)).max() )
# prints near double precision roundoff
The 2D FFT can be obtained by applying D to both the rows and columns of a matrix:
x = np.random.randn(n,n)
ft2 = np.dot(x, D.T) # Apply D to rows.
ft2 = np.dot(D, ft2) # Apply D to cols.
print( np.abs(ft2 - np.fft.fft2(x)).max() )
# near machine round off again
How do I compute this analogously for the 3 dimensional Discrete Fourier Transform?
I.e.,
x = np.random.randn(n,n,n)
ft3 = # dot operations using D and x
print( np.abs(ft3 - np.fft.fftn(x)).max() )
# prints near zero
Essentially, I think I need to apply D to each column vector in the volume, then each row vector in the volume, and finally each "depth vector". But I'm not sure how to do this using dot.
You can use einsum to perform the transformation on each axis in turn:
x = np.random.randn(n, n, n)
ft3 = np.einsum('ijk,im->mjk', x, D)
ft3 = np.einsum('ijk,jm->imk', ft3, D)
ft3 = np.einsum('ijk,km->ijm', ft3, D)
print(np.abs(ft3 - np.fft.fftn(x)).max())
1.25571216554e-12
This can also be written as a single NumPy step:
ft3 = np.einsum('ijk,im,jn,kl->mnl', x, D, D, D, optimize=True)
Without the optimize argument (available in NumPy 1.12+) it will be very slow, however. You can also do each of the steps using dot, but that requires a bit of reshaping and transposing; a sketch of this route follows below. In NumPy 1.14+, einsum will automatically detect BLAS-compatible contractions and dispatch them to BLAS for you.
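For completeness, here is a sketch of that dot-style route using np.tensordot and np.moveaxis, equivalent to the three einsum steps above:
import numpy as np
n = 8
D = np.fft.fft(np.eye(n))
x = np.random.randn(n, n, n)
# Contract D with one axis of x at a time, then move the transformed axis back into place.
ft3 = np.tensordot(D, x, axes=(1, 0))                       # transforms axis 0
ft3 = np.moveaxis(np.tensordot(D, ft3, axes=(1, 1)), 0, 1)  # transforms axis 1
ft3 = np.moveaxis(np.tensordot(D, ft3, axes=(1, 2)), 0, 2)  # transforms axis 2
print(np.abs(ft3 - np.fft.fftn(x)).max())                   # near machine roundoff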