I need to speed up a for loop that does something like the code below:
import numpy as np

x = np.random.normal(size=(206, 11, 11))
y = np.random.normal(size=(206, 11, 11))
complx = x + 1j*y

a, b, c = complx.shape
for n in range(a):
    # do something
    z = np.zeros(b)
    for i in range(b):
        z[i] = (complx[n, :, :].real[i][i]*complx[n, :, :].real[i][i] +
                complx[n, :, :].imag[i][i]*complx[n, :, :].imag[i][i]) ** (-0.25)
I'm vaguely aware these things can sometimes be done with numpy.einsum, but I am not really sure how to use it. Does anyone have any other suggestions?
In case you want to speed up the inner for loop, you can do something like this:
import numpy as np

x = np.random.normal(size=(206, 11, 11))
y = np.random.normal(size=(206, 11, 11))
complx = x + 1j*y
a, b, c = complx.shape

# take only the diagonal part of each of the 11x11 matrices
complx_diag = np.diagonal(complx, 0, 1, 2)

# do the calculation for all n at once; abs(z)**(-0.5) equals
# (z.real**2 + z.imag**2)**(-0.25)
zn = np.abs(complx_diag) ** (-0.5)

for n in range(a):
    z = zn[n]
    # do your stuff
If your stuff is not too complicated, it can very likely be vectorized as well. The more you calculate outside the for loop, the faster your code will be.
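On the numpy.einsum part of the question: one place it can help here is extracting all the diagonals in a single call. A minimal sketch (the repeated index i in 'nii->ni' selects the diagonal of each 11x11 matrix; for each n, zn[n] matches the z from the question's inner loop):

import numpy as np

x = np.random.normal(size=(206, 11, 11))
y = np.random.normal(size=(206, 11, 11))
complx = x + 1j*y

# the repeated index i extracts each matrix diagonal; result has shape (206, 11)
d = np.einsum('nii->ni', complx)

# (re**2 + im**2)**(-0.25) is the same as abs(z)**(-0.5), computed for all n at once
zn = np.abs(d) ** (-0.5)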
If I am not mistaken, this is more or less what you want. The print statements are only there to convince oneself that the calculation is correct.
import numpy as np

def optimize_01():
    x = np.random.normal(size=(6, 11, 11))
    y = np.random.normal(size=(6, 11, 11))
    complx = x + 1j * y
    a, b, _ = complx.shape
    for n in range(a):
        # do something
        A = complx[n, :, :]
        d = np.diagonal(A)
        z = np.power(np.abs(d * d), -0.25)
        print(d[0])
        print(z[0])
        print((d[0].real * d[0].real + d[0].imag * d[0].imag) ** -0.25)
EDIT: If I compare this implementation with your implementation, I get the following.
import timeit
import numpy as np

def optimize_02():
    x = np.random.normal(size=(206, 11, 11))
    y = np.random.normal(size=(206, 11, 11))
    complx = x + 1j * y
    a, b, _ = complx.shape
    for n in range(a):
        # do something
        A = complx[n, :, :]
        d = np.diagonal(A)
        z = np.power(np.abs(d * d), -0.25)

def optimize_03():
    x = np.random.normal(size=(206, 11, 11))
    y = np.random.normal(size=(206, 11, 11))
    complx = x + 1j * y
    a, b, _ = complx.shape
    for n in range(a):
        # do something
        z = np.zeros(b)
        for i in range(b):
            z[i] = (complx[n, :, :].real[i][i] * complx[n, :, :].real[i][i] +
                    complx[n, :, :].imag[i][i] * complx[n, :, :].imag[i][i]) ** (-0.25)

if __name__ == '__main__':
    print(timeit.timeit(optimize_02, number=10))
    print(timeit.timeit(optimize_03, number=10))
Result:
0.03474012700007734
0.09025639800074714
With 6 arrays of 1100 elements each, instead of 206 arrays of 11 elements each, the result is:
5.762741210999593
5.771216576999905
It looks like my solution is not that much faster after all.
I am trying to plot the error of this algorithm against h, and I have run into a problem: the error calculation can't use the first value, as it divides 0/0. How do I go about ignoring the first value, where x = 0? I basically need to start the summation at i = 2 in the absolute error line. Any help is much appreciated.
import numpy
import matplotlib.pyplot as pyplot
from scipy.optimize import fsolve
from matplotlib import rcParams

rcParams['font.family'] = 'serif'
rcParams['font.size'] = 16
rcParams['figure.figsize'] = (12, 6)
printing = False

def rk3(A, bvector, y0, interval, N):
    h = (interval[1] - interval[0]) / N
    x = numpy.linspace(interval[0], interval[1], N+1)
    y = numpy.zeros((len(y0), N+1))
    y[:, 0] = y0
    b = bvector
    for i in range(N):
        y_1 = y[:, i] + h*(numpy.dot(A, y[:, i]) + b(x[i]))
        y_2 = (3/4)*y[:, i] + 0.25*y_1 + 0.25*h*(numpy.dot(A, y_1) + b(x[i] + h))
        y[:, i+1] = (1/3)*y[:, i] + (2/3)*y_2 + (2/3)*h*(numpy.dot(A, y_2) + b(x[i] + h))
    return x, y

def exact(interval, N):
    w = numpy.linspace(interval[0], interval[1], N+1)
    z = numpy.array([numpy.exp(-1000*w), (1000/999)*(numpy.exp(-w) - numpy.exp(-1000*w))])
    return w, z

A = numpy.array([[-1000, 0], [1000, -1]])

def bvector(x):
    return numpy.zeros(2)

y0 = numpy.array([1, 0])
interval = numpy.array([0, 0.1])
N = numpy.arange(40, 401, 40)
h = numpy.zeros(len(N))
abs_err = numpy.zeros(len(N))

for i in range(len(N)):
    interval = numpy.array([0, 0.1])
    h[i] = (interval[1] - interval[0]) / N[i]
    x, y = rk3(A, bvector, y0, interval, N[i])
    w, z = exact(interval, N[i])
    abs_err[i] = h[i]*numpy.sum(numpy.abs((y[1, :] - z[1, :])/z[1, :]))

p = numpy.polyfit(numpy.log(h), numpy.log(abs_err), 1)

fig = pyplot.figure(figsize=(12, 8), dpi=50)
pyplot.loglog(h, abs_err, 'kx')
pyplot.loglog(h, numpy.exp(p[1]) * h**(p[0]), 'b-')
pyplot.xlabel('$h$', size=16)
pyplot.ylabel('$|$Error$|$', size=16)
pyplot.show()
Simply add an if for the value that is zero. So, for example, if the dividing variable is x:

if x > 0:
    # code here for the calculation

The above will use all positive, non-zero values. To only skip zero, use this:

if x != 0:
You can also use the three arguments of range() in a for loop:

for a in range(start_value, end_value, increment):

so this means

for a in range(2, 10, 2):
    print(a)

will give you the result below:

2
4
6
8
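Equivalently, in the question's code the same effect can be achieved with NumPy slicing instead of an if. A minimal sketch, assuming only the first grid point (where x = 0 and z[1, 0] == 0) should be skipped:

# skip index 0 of the N+1 grid points, where z[1, 0] == 0 causes 0/0
abs_err[i] = h[i]*numpy.sum(numpy.abs((y[1, 1:] - z[1, 1:])/z[1, 1:]))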
I'm a beginner in using MPI, and I'm still going through the documentation. However, there's very little to work from when it comes to mpi4py. I have written a code that currently uses the multiprocessing module to run on many cores, but I need to replace this with mpi4py so that I can use more than one node to run my code. My code is below, with the multiprocessing module, and also without.
With multiprocessing,
import numpy as np
import multiprocessing
import random
import time

start_time = time.time()

E = 0.1
M = 5
n = 1000
G = 1
c = 1
stretch = [10, 1]

# Point-distribution generator function
def CDF_inv(x, e, m):
    A = 1/(1 + np.log(m/e))
    if x == 1:
        return m
    elif 0 <= x <= A:
        return e * x / A
    elif A < x < 1:
        return e * np.exp((x / A) - 1)

# Elliptical point-distribution generator function
def get_coor_ellip(dist=CDF_inv, params=[E, M], stretch=stretch):
    R = dist(random.random(), *params)
    theta = random.random() * 2 * np.pi
    return (R * np.cos(theta) * stretch[0], R * np.sin(theta) * stretch[1])

def get_dist_sq(x_array, y_array):
    return x_array**2 + y_array**2

# Function to obtain alpha
def get_alpha(args):
    zeta_list_part, M_list_part, X, Y = args
    alpha_x = 0
    alpha_y = 0
    for key in range(len(M_list_part)):
        z_m_z_x = X - zeta_list_part[key][0]
        z_m_z_y = Y - zeta_list_part[key][1]
        dist_z_m_z = get_dist_sq(z_m_z_x, z_m_z_y)
        alpha_x += M_list_part[key] * z_m_z_x / dist_z_m_z
        alpha_y += M_list_part[key] * z_m_z_y / dist_z_m_z
    return (alpha_x, alpha_y)

# The part of the process containing the loop that needs to be
# parallelised, where I use pool.map()
if __name__ == '__main__':
    # n processes, scale accordingly
    num_processes = 10
    pool = multiprocessing.Pool(processes=num_processes)
    random_sample = [CDF_inv(x, E, M)
                     for x in [random.random() for e in range(n)]]
    zeta_list = [get_coor_ellip() for e in range(n)]
    x1, y1 = zip(*zeta_list)
    zeta_list = np.column_stack((np.array(x1), np.array(y1)))
    x = np.linspace(-3, 3, 100)
    y = np.linspace(-3, 3, 100)
    X, Y = np.meshgrid(x, y)
    print(len(x)*len(y)*n, 'calculations to be carried out.')
    M_list = np.array([.001 for i in range(n)])
    # split zeta_list, M_list, X, and Y
    zeta_list_split = np.array_split(zeta_list, num_processes, axis=0)
    M_list_split = np.array_split(M_list, num_processes)
    X_list = [X for e in range(num_processes)]
    Y_list = [Y for e in range(num_processes)]
    alpha_list = pool.map(
        get_alpha, zip(zeta_list_split, M_list_split, X_list, Y_list))
    alpha_x = 0
    alpha_y = 0
    for e in alpha_list:
        alpha_x += e[0] * 4 * G / (c**2)
        alpha_y += e[1] * 4 * G / (c**2)
    print("%f seconds" % (time.time() - start_time))
Without multiprocessing,
import numpy as np
import random

E = 0.1
M = 5
G = 1
c = 1
M_list = [.1 for i in range(n)]

# Point-distribution generator function
def CDF_inv(x, e, m):
    A = 1/(1 + np.log(m/e))
    if x == 1:
        return m
    elif 0 <= x <= A:
        return e * x / A
    elif A < x < 1:
        return e * np.exp((x / A) - 1)

n = 1000
random_sample = [CDF_inv(x, E, M)
                 for x in [random.random() for e in range(n)]]
stretch = [5, 2]

# Elliptical point-distribution generator function
def get_coor_ellip(dist=CDF_inv, params=[E, M], stretch=stretch):
    R = dist(random.random(), *params)
    theta = random.random() * 2 * np.pi
    return (R * np.cos(theta) * stretch[0], R * np.sin(theta) * stretch[1])

# zeta_list is the list of coordinates of a distribution of points
zeta_list = [get_coor_ellip() for e in range(n)]
x1, y1 = zip(*zeta_list)
zeta_list = np.column_stack((np.array(x1), np.array(y1)))

# Creation of an X-Y grid
x = np.linspace(-3, 3, 100)
y = np.linspace(-3, 3, 100)
X, Y = np.meshgrid(x, y)

def get_dist_sq(x_array, y_array):
    return x_array**2 + y_array**2

# Calculation of alpha, containing the loop that needs to be parallelised
alpha_x = 0
alpha_y = 0
for key in range(len(M_list)):
    z_m_z_x = X - zeta_list[key][0]
    z_m_z_y = Y - zeta_list[key][1]
    dist_z_m_z = get_dist_sq(z_m_z_x, z_m_z_y)
    alpha_x += M_list[key] * z_m_z_x / dist_z_m_z
    alpha_y += M_list[key] * z_m_z_y / dist_z_m_z
alpha_x *= 4 * G / (c**2)
alpha_y *= 4 * G / (c**2)
Basically, what my code does is first generate a list of points that follow a certain distribution, then apply an equation to obtain the quantity 'alpha' using different relations between the distances of the points. The part that requires parallelisation is the single for loop involved in the calculation of alpha. What I want to do is use mpi4py instead of multiprocessing for this, and I am not sure how to get it going.
Transforming the multiprocessing.map version to MPI can be done using scatter / gather. In your case it is useful that you already prepare the input list into one chunk for each rank. The main difference is that all the code gets executed by all ranks in the first place, so you must make everything that should be done only by the master rank 0 conditional.
from mpi4py import MPI

if __name__ == '__main__':
    comm = MPI.COMM_WORLD
    if comm.rank == 0:
        random_sample = [CDF_inv(x, E, M)
                         for x in [random.random() for e in range(n)]]
        zeta_list = [get_coor_ellip() for e in range(n)]
        x1, y1 = zip(*zeta_list)
        zeta_list = np.column_stack((np.array(x1), np.array(y1)))
        x = np.linspace(-3, 3, 100)
        y = np.linspace(-3, 3, 100)
        X, Y = np.meshgrid(x, y)
        print(len(x)*len(y)*n, 'calculations to be carried out.')
        M_list = np.array([.001 for i in range(n)])
        # split zeta_list, M_list, X, and Y
        zeta_list_split = np.array_split(zeta_list, comm.size, axis=0)
        M_list_split = np.array_split(M_list, comm.size)
        X_list = [X for e in range(comm.size)]
        Y_list = [Y for e in range(comm.size)]
        work_list = list(zip(zeta_list_split, M_list_split, X_list, Y_list))
    else:
        work_list = None

    # each rank receives one chunk of the work list from rank 0
    my_work = comm.scatter(work_list)
    my_alpha = get_alpha(my_work)
    # rank 0 collects the partial results from all ranks
    alpha_list = comm.gather(my_alpha)

    if comm.rank == 0:
        alpha_x = 0
        alpha_y = 0
        for e in alpha_list:
            alpha_x += e[0] * 4 * G / (c**2)
            alpha_y += e[1] * 4 * G / (c**2)
This works fine as long as each processor gets a similar amount of work. If communication becomes an issue, you might want to split up the data generation among processors instead of doing it all on the master rank 0.
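A minimal sketch of that variant, under two stated assumptions: each rank may seed its RNG independently, and comm.size divides n evenly.

comm = MPI.COMM_WORLD
random.seed(comm.rank)  # assumption: independent per-rank seeding is acceptable

# the grid is cheap, so every rank builds it locally instead of receiving it
x = np.linspace(-3, 3, 100)
y = np.linspace(-3, 3, 100)
X, Y = np.meshgrid(x, y)

n_local = n // comm.size  # assumption: comm.size divides n evenly
zeta_local = np.array([get_coor_ellip() for _ in range(n_local)])
M_local = np.array([.001 for _ in range(n_local)])

my_alpha = get_alpha((zeta_local, M_local, X, Y))
alpha_list = comm.gather(my_alpha)  # only rank 0 receives the full list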
Note: some things about the code are bogus, e.g. alpha_[xy] ends up as an np.ndarray, and the serial version as posted runs into an error (n is used before it is defined).
For people who are still interested in similar subjects, I highly recommend having a look at the MPIPoolExecutor class in mpi4py.futures and its documentation.
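MPIPoolExecutor offers a pool.map-like interface over MPI, so the multiprocessing version translates almost directly. A minimal sketch, assuming get_alpha and the split work lists are defined as in the question; such scripts are typically launched as mpiexec -n <ranks> python -m mpi4py.futures script.py:

from mpi4py.futures import MPIPoolExecutor

if __name__ == '__main__':
    # the executor hands each work item to an MPI worker rank
    with MPIPoolExecutor() as executor:
        alpha_list = list(executor.map(
            get_alpha, zip(zeta_list_split, M_list_split, X_list, Y_list)))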
I am trying to use a Lagrange multiplier to optimize a function, and I am looping through the function to get a list of numbers. However, I get the error
ValueError: setting an array element with a sequence.
Here is my code; where do I go wrong? If n is not an array, I can get the result correctly.
import numpy as np
from scipy.optimize import fsolve

n = np.arange(10000, 100000, 10000)

def func(X):
    x = X[0]
    y = X[1]
    L = X[2]
    return (x + y + L * (x**2 + y**2 - n))

def dfunc(X):
    dLambda = np.zeros(len(X))
    h = 1e-3
    for i in range(len(X)):
        dX = np.zeros(len(X))
        dX[i] = h
        dLambda[i] = (func(X+dX) - func(X-dX))/(2*h)
    return dLambda

X1 = fsolve(dfunc, [1, 1, 0])
print(X1)
Help would be appreciated, thank you very much.
First, check what fsolve() expects: the function you pass in has to return an array with the same shape as its input. Second, print(func([1, 1, 0])) — the result is not a single number but [2 2 2 2 2 2 2 2 2], because n is an array, so func() returns one value per element of n. If you want to iterate over n, try:
import numpy as np
from scipy.optimize import fsolve

n = np.arange(10000, 100000, 10000)

def func(X, n):
    x = X[0]
    y = X[1]
    L = X[2]
    return (x + y + L * (x**2 + y**2 - n))

def dfunc(X, n):
    dLambda = np.zeros(len(X))
    h = 1e-3
    for i in range(len(X)):
        dX = np.zeros(len(X))
        dX[i] = h
        dLambda[i] = (func(X+dX, n) - func(X-dX, n))/(2*h)
    return dLambda

for iter_n in n:
    print("for n = {0} dfunc = {1}".format(iter_n, dfunc([0.8, 0.4, 0.3], iter_n)))
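If the goal is to actually solve for each value of n rather than just print the gradient, fsolve's args parameter can forward iter_n to dfunc. A short sketch along those lines:

# solve the stationarity conditions separately for each n
for iter_n in n:
    # args forwards iter_n as the second argument of dfunc
    X1 = fsolve(dfunc, [1, 1, 0], args=(iter_n,))
    print("for n = {0} solution = {1}".format(iter_n, X1))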
I did some speed tests for operations on vectors/lists. Surprisingly, map and filter seem to beat numpy by a factor of 5-10x. See the following short code sample with times given (full code below):
n = 10000000
a = np.random.rand(n)
b = np.random.rand(n)
c = a + b # time = 0.07 s
d = a[a < 0.3] # time = 0.09 s
a = [random.random() for x in range(0, n, 1)]
b = [random.random() for x in range(0, n, 1)]
c = map(lambda x, y: x + y, a, b) # time = 0.006s
d = filter(lambda e: e < 0.3, a) # time = 0.001s
Is it really possible that map and filter are that much faster than the numpy operations? Or are my measurements flawed? You can see the full code below:
import numpy as np
import time
import random

class StopWatch:
    def __init__(self, str):
        self.str = str
        self.t = time.time()

    def stop(self):
        t = time.time()
        print("time = " + str(t - self.t) + " s for " + self.str)

n = 10000000
a = np.random.rand(n)
b = np.random.rand(n)

sw = StopWatch('numpy')
c = a + b
sw.stop()

sw = StopWatch('numpy')
d = a[a < 0.3]
sw.stop()

a = [random.random() for x in range(0, n, 1)]
b = [random.random() for x in range(0, n, 1)]

sw = StopWatch('list')
c = map(lambda x, y: x + y, a, b)
sw.stop()

sw = StopWatch('list')
d = filter(lambda e: e < 0.3, a)
sw.stop()
If my measurements are correct, WHY is it that much faster?
My guess is that c = map(lambda x, y: x + y, a, b) is actually never calculated. In Python 3, map and filter are evaluated lazily, and therefore not before they have to be. You can verify this by adding a list(c) before you stop the timer, though this might add a little extra time for the list creation.
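A short sketch of that check — consuming the iterators inside the timed region should bring the list timings much closer to the numpy ones:

sw = StopWatch('list (forced)')
c = list(map(lambda x, y: x + y, a, b))  # list() forces the map to be evaluated
sw.stop()

sw = StopWatch('list (forced)')
d = list(filter(lambda e: e < 0.3, a))   # likewise for the filter
sw.stop()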
I have a fairly simple math operation I'd like to perform on an array. Let me write out the example:
A = numpy.ndarray((255, 255, 3), dtype=numpy.single)
# ..
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        x = simple_func1(i)
        y = simple_func2(j)
        A[i, j] = (alpha * x * y + beta * x**2 + gamma * y**2, 1, 0)
So basically, there's a mapping between (i, j) and the 3 values at that position (this is for visualization). I'd like to roll this up and vectorize it somehow, but I'm not sure how, or whether I can. Thanks.
Here is the vectorized version:
import numpy

i = numpy.arange(255)
j = numpy.arange(255)
x = simple_func1(i)
y = simple_func2(j)
y = y.reshape(-1, 1)
A = alpha * x * y + beta * x**2 + gamma * y**2  # broadcasting is your friend here
If you want to fill the last coordinates with 1 and 0:
B = numpy.empty(A.shape + (3,))
B[:, :, 0] = A
B[:, :, 1] = 1  # broadcasting again
B[:, :, 2] = 0
You have to change simple_funcN so that they take arrays as input and produce arrays as output. After that, you could look into numpy.meshgrid() or a cartesian() helper function to build coordinate arrays; then you should be able to use the coordinate array(s) to fill A with a one-liner.
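For illustration, a sketch of the meshgrid route. Note that simple_func1/simple_func2, alpha, beta, and gamma are not defined in the question, so the stand-ins below (numpy.sin, numpy.cos, and arbitrary coefficients) are assumptions:

import numpy

# hypothetical stand-ins for the names left undefined in the question
simple_func1, simple_func2 = numpy.sin, numpy.cos
alpha, beta, gamma = 1.0, 0.5, 0.25

# coordinate arrays: I[i, j] == i and J[i, j] == j, both of shape (255, 255)
I, J = numpy.meshgrid(numpy.arange(255), numpy.arange(255), indexing='ij')
x = simple_func1(I)
y = simple_func2(J)

A = numpy.empty((255, 255, 3), dtype=numpy.single)
A[..., 0] = alpha * x * y + beta * x**2 + gamma * y**2
A[..., 1] = 1
A[..., 2] = 0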