Speed Up python code (multiple integral calculations) - python

Is there a way to speed up this code:
import mpmath as mp
import numpy as np
from time import time as epochTime
def func(E):
    f = lambda theta: mp.sin(theta) * mp.exp(E * (mp.cos(theta**2) + \
                                                  mp.cos(theta)**2))
    return f
start = epochTime()
mp.mp.dps = 15
mp.mp.pretty = True
E = np.linspace(0, 10, 200)
ints = [mp.quadgl(func(e), [0, mp.pi]) for e in E] # Main Job
print ('Took:{:.3}s'.format(epochTime() - start))

Running your code, I timed it at 5.84 s.
Using Memoize and simplifying the expressions:
cos = Memoize(mp.cos)
sin = Memoize(mp.sin)

def func(E):
    def f(t):
        cost = cos(t)
        return sin(t) * mp.exp(E * (cos(t*t) + cost*cost))
    return f
I got it down to 3.25s on the first run, and ~2.8s on subsequent runs.
(An even better approach might be using lru_cache from the standard library, but I did not try to time it.)
If you are running similar code many times, it may be sensible to Memoize() both func and f, so the computations become trivial (~0.364s).
Replacing mp with math for cos/sin/exp, I got down to ~1.3s, and now memoizing makes the performance worse, for some reason (~1.5s; I guess the lookup time became dominant).
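The Memoize helper is not shown above; a minimal dict-based sketch (my assumption of what it looks like, not necessarily the exact class used for the timings) would be:
import mpmath as mp

class Memoize:
    """Cache results of a single-argument function (mpmath's mpf values are hashable)."""
    def __init__(self, fn):
        self.fn = fn
        self.cache = {}
    def __call__(self, arg):
        if arg not in self.cache:
            self.cache[arg] = self.fn(arg)
        return self.cache[arg]
The lru_cache variant mentioned above would then be, e.g., cos = functools.lru_cache(maxsize=None)(mp.cos).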

In general, you want to avoid calls to transcendental functions like sin, cos, exp, ln as much as possible, especially in a "hot" function like an integrand.
Replace x**2 with x*x (x**2 often calls a generic, slow exponentiation function).
Use variables for "expensive" intermediate terms which are used more than once.
Transform your equation to reduce or eliminate transcendental functions.
Special-case typical parameter values; integer exponents are a frequent candidate.
Precompute everything that is constant, especially in parameterized functions.
For this particular example you can substitute z = cos(theta), with dz = -sin(theta) dtheta. Your integrand becomes
-exp(E*(z^2 + cos(arccos(z)^2)))
which saves some of the transcendental function calls. The boundaries [0, pi] become [1, -1]. Also avoid x**2; better use x*x.
Complete code:
import mpmath as mp
import numpy as np
from time import time as epochTime
def func(E):
    def f(z):
        acz = mp.acos(z)
        return -mp.exp(E * (mp.cos(acz*acz) + z*z))
    return f
start = epochTime()
mp.mp.dps = 15
mp.mp.pretty = True
E = np.linspace(0, 10, 200)
ints = [mp.quadgl(func(e), [1.0, -1.0]) for e in E] # Main Job
print ('Took:{:.3}s'.format(epochTime() - start))
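A quick sanity check of the substitution (not part of the original answer) is to compare the two forms for a single value of E; they should agree to working precision:
import mpmath as mp

E_test = 5.0
orig = mp.quadgl(lambda t: mp.sin(t) * mp.exp(E_test * (mp.cos(t*t) + mp.cos(t)**2)),
                 [0, mp.pi])

def g(z):
    acz = mp.acos(z)
    return -mp.exp(E_test * (mp.cos(acz*acz) + z*z))

subst = mp.quadgl(g, [1.0, -1.0])
print(orig, subst)  # the two numbers should match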

Related

Best way to take a function (lambda) as input, and also have the variable within the computation itself?

I have a function that should compute an integral, taking some function as input. I'd like the code to compute a definite integral of: <some function in terms of x, e.g. 3*x or 3*x*(1-x), etc.> * np.sin(np.pi * x). I'm using scipy for this:
import scipy.integrate as integrate
def calculate(a):
    test = integrate.quad(a*np.sin(np.pi * x), 0, 1)
    return test
a = lambda x: 3*x
calculate(a)
Now this implementation fails because of the discrepancy between a and x. I tried defining x as x = lambda x: x, but that won't work because I get an error about multiplying a float by a function.
Any suggestions?
Since you are trying to combine two symbolic expressions before computing the definite integral numerically, I think this might be a good application for sympy's symbolic manipulation tools.
from sympy import symbols, Integral, sin, pi
def calculate(a_exp):
    test = Integral(a_exp * sin(pi * x), (x, 0, 1)).evalf()
    return test
x = symbols('x')
a_exp = 3*x
print(calculate(a_exp))
# 0.954929658551372
Note: I changed the name of a to a_exp to make it clear that this is an expression rather than a function.
If you decide to use sympy then note that you might also be able to compute the expression for the integral symbolically as well.
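For example, for the same a_exp the closed form comes out directly (a short sketch using sympy's integrate; not from the original answer):
from sympy import symbols, integrate, sin, pi

x = symbols('x')
a_exp = 3*x
expr = integrate(a_exp * sin(pi * x), (x, 0, 1))
print(expr, expr.evalf())
# 3/pi 0.954929658551372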
Update: Importing Sympy might be overkill for this
If computation speed is more important than precision, you can easily calculate the integral approximately using some simple discretized method.
For example, the functions below calculate the integral approximately with increasingly sophisticated methods. The accuracy of the first two will improve as n is increased and also depends on the nature of a_func etc.
import numpy as np
from scipy.integrate import trapz, quad
def calculate2(a_func, n=100):
    dx = 1/n
    x = np.linspace(0, 1-dx, n)
    y = a_func(x) * np.sin(np.pi*x)
    return np.sum(y) * dx

def calculate3(a_func, n=100):
    x = np.linspace(0, 1, n+1)
    y = a_func(x) * np.sin(np.pi*x)
    return trapz(y, x)

def calculate4(a_func):
    f = lambda x: a_func(x) * np.sin(np.pi*x)
    return quad(f, 0, 1)
a_func = lambda x: 3*x
print(calculate2(a_func))
# 0.9548511174430737
print(calculate3(a_func))
# 0.9548511174430737
print(calculate4(a_func)[0])
# 0.954929658551372
I'm not an expert on numerical integration so there may be better ways to do this than these.

Using Scipy routines with Numba

I was writing a program in which SciPy's CubicSpline routine is used at certain points;
because of that SciPy routine I cannot use Numba's @jit on my whole program.
I recently came across the @overload feature and I was wondering if it could be used in this way:
from numba.extending import overload
from numba import jit
from scipy.interpolate import CubicSpline
import numpy as np

x = np.arange(10)
y = np.sin(x)
xs = np.arange(-0.5, 9.6, 0.1)

def Spline_interp(xs, x, y):
    cs = CubicSpline(x, y)
    ds = cs(xs)
    return ds

@overload(Spline_interp)
def jit_Spline_interp(xs, x, y):
    ds = Spline_interp(xs, x, y)
    def jit_Spline_interp_impl(xs, x, y):
        return ds
    return jit_Spline_interp_impl

@jit(nopython=True)
def main():
    # other codes compatible with @njit
    ds = Spline_interp(xs, x, y)
    # other codes compatible with @njit
    return ds

print(main())
Kindly correct me if my understanding of the @overload feature is wrong, and what would be a possible solution for using such SciPy libraries with Numba?
Wrapping compiled functions using ctypes (Numba)
Especially for more complex functions, reimplementing everything in numba-compilable Python code can be quite a lot of work, and sometimes slower. The following answer is about calling C-like functions directly from a shared object or dynamic library.
Compiling the fortran routines
This example shows a way to do this on Windows, but it should be straightforward on other operating systems. For a portable interface, defining an ISO_C_BINDING is highly recommended. In this answer I will try it without an interface.
dll.def
EXPORTS
SPLEV #1
Compilation
ifort /dll dll.def splev.f fpbspl.f /O3 /fast
Calling this function directly from Numba
Have a look at what the Fortran routine expects.
Check every input in the wrapper (datatype, contiguousness). You just provide some pointers to the Fortran function; there is no additional safety check.
Wrapper
The following code shows two ways to call this function. In Numba it isn't directly possible to pass a scalar by reference. You can either allocate an array on the heap (slow for small functions), or use an intrinsic to use stack arrays.
import numba as nb
import numpy as np
import ctypes

lib = ctypes.cdll.LoadLibrary("splev.dll")

dble_p = ctypes.POINTER(ctypes.c_double)
int_p  = ctypes.POINTER(ctypes.c_longlong)

SPLEV = lib.SPLEV
SPLEV.restype = ctypes.c_void_p
SPLEV.argtypes = (dble_p, int_p, dble_p, int_p, dble_p, dble_p, int_p, int_p, int_p)

from numba import types
from numba.extending import intrinsic
from numba.core import cgutils

@intrinsic
def val_to_ptr(typingctx, data):
    def impl(context, builder, signature, args):
        ptr = cgutils.alloca_once_value(builder, args[0])
        return ptr
    sig = types.CPointer(nb.typeof(data).instance_type)(nb.typeof(data).instance_type)
    return sig, impl

@intrinsic
def ptr_to_val(typingctx, data):
    def impl(context, builder, signature, args):
        val = builder.load(args[0])
        return val
    sig = data.dtype(types.CPointer(data.dtype))
    return sig, impl

# with intrinsics, temporary arrays are allocated on stack
# faster but much more relevant for functions with very low runtime
@nb.njit()
def splev_wrapped(x, coeff, e):
    # There are just pointers passed to the fortran function.
    # The arrays have to be contiguous!
    t = np.ascontiguousarray(coeff[0])
    x = np.ascontiguousarray(x)
    c = coeff[1]
    k = coeff[2]

    y = np.empty(x.shape[0], dtype=np.float64)

    n_arr = val_to_ptr(nb.int64(t.shape[0]))
    k_arr = val_to_ptr(nb.int64(k))
    m_arr = val_to_ptr(nb.int64(x.shape[0]))
    e_arr = val_to_ptr(nb.int64(e))
    ier_arr = val_to_ptr(nb.int64(0))

    SPLEV(t.ctypes, n_arr, c.ctypes, k_arr, x.ctypes,
          y.ctypes, m_arr, e_arr, ier_arr)
    return y, ptr_to_val(ier_arr)

# without using intrinsics
@nb.njit()
def splev_wrapped_2(x, coeff, e):
    # There are just pointers passed to the fortran function.
    # The arrays have to be contiguous!
    t = np.ascontiguousarray(coeff[0])
    x = np.ascontiguousarray(x)
    c = coeff[1]
    k = coeff[2]

    y = np.empty(x.shape[0], dtype=np.float64)

    n_arr = np.empty(1, dtype=np.int64)
    k_arr = np.empty(1, dtype=np.int64)
    m_arr = np.empty(1, dtype=np.int64)
    e_arr = np.empty(1, dtype=np.int64)
    ier_arr = np.zeros(1, dtype=np.int64)

    n_arr[0] = t.shape[0]
    k_arr[0] = k
    m_arr[0] = x.shape[0]
    e_arr[0] = e

    SPLEV(t.ctypes, n_arr.ctypes, c.ctypes, k_arr.ctypes, x.ctypes,
          y.ctypes, m_arr.ctypes, e_arr.ctypes, ier_arr.ctypes)
    return y, ier_arr[0]
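Assuming the coeff tuple comes straight from scipy.interpolate.splrep (my reading of the code above, which indexes it as t, c, k) and e is the FITPACK extrapolation flag, a call would look roughly like this:
from scipy.interpolate import splrep

x_data = np.linspace(0.0, 10.0, 50)
y_data = np.sin(x_data)
coeff = splrep(x_data, y_data, k=3)          # (t, c, k) tuple

x_new = np.linspace(0.0, 10.0, 1000)
y_new, ier = splev_wrapped(x_new, coeff, 0)  # e=0: extrapolate outside the knot range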
You would either need to fall back to object mode (locally, like @max9111 suggested), or implement the CubicSpline function yourself in Numba.
As far as I understand, the @overload decorator "only" makes the compiler aware that it can use a Numba-compatible implementation if it encounters the overloaded function. It doesn't magically convert the function to be Numba compatible.
There is a package which exposes some SciPy functionality to Numba, but that seems to be in its early days and only contains some scipy.special functions so far.
https://github.com/numba/numba-scipy
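A sketch of the local object-mode fallback mentioned above (numba.objmode), applied to the question's example: only the SciPy call runs in object mode, the rest of the function stays nopython-compiled.
import numpy as np
from numba import njit, objmode
from scipy.interpolate import CubicSpline

x = np.arange(10.0)
y = np.sin(x)
xs = np.arange(-0.5, 9.6, 0.1)

@njit
def main():
    # ... nopython-compatible code ...
    with objmode(ds='float64[:]'):   # leave nopython mode just for the SciPy call
        ds = CubicSpline(x, y)(xs)
    # ... nopython-compatible code ...
    return ds

print(main())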
This is a repost of my solution, posted on numba discourse https://numba.discourse.group/t/call-scipy-splev-routine-in-numba-jitted-function/1122/7.
I had originally gone ahead with @max9111's suggestion of using objmode. It gave a temporary fix. But since the code was performance critical, I finally ended up writing a Numba version of scipy's interpolate.splev subroutine for the spline interpolation.
import numpy as np
import numba
from scipy import interpolate
import matplotlib.pyplot as plt
import time

# Custom wrap of scipy's splrep
def custom_splrep(x, y, k=3):
    """
    Custom wrap of scipy's splrep for calculating spline coefficients,
    which also check if the data is equispaced.
    """
    # Check if x is equispaced
    x_diff = np.diff(x)
    equi_spaced = all(np.round(x_diff, 5) == np.round(x_diff[0], 5))
    dx = x_diff[0]

    # Calculate knots & coefficients (cubic spline by default)
    t, c, k = interpolate.splrep(x, y, k=k)

    return (t, c, k, equi_spaced, dx)

# Numba accelerated implementation of scipy's splev
@numba.njit(cache=True)
def numba_splev(x, coeff):
    """
    Custom implementation of scipy's splev for spline interpolation,
    with additional section for faster search of knot interval, if knots are equispaced.
    Spline is extrapolated from the end spans for points not in the support.
    """
    t, c, k, equi_spaced, dx = coeff

    t0 = t[0]
    n = t.size
    m = x.size

    k1 = k + 1
    k2 = k1 + 1
    nk1 = n - k1

    l = k1
    l1 = l + 1

    y = np.zeros(m)

    h = np.zeros(20)
    hh = np.zeros(19)

    for i in range(m):
        # fetch a new x-value arg
        arg = x[i]

        # search for knot interval t[l] <= arg <= t[l+1]
        if equi_spaced:
            l = int((arg - t0) / dx) + k
            l = min(max(l, k1), nk1)
        else:
            while not ((arg >= t[l - 1]) or (l1 == k2)):
                l1 = l
                l = l - 1
            while not ((arg < t[l1 - 1]) or (l == nk1)):
                l = l1
                l1 = l + 1

        # evaluate the non-zero b-splines at arg.
        h[:] = 0.0
        hh[:] = 0.0

        h[0] = 1.0

        for j in range(k):
            for ll in range(j + 1):
                hh[ll] = h[ll]
            h[0] = 0.0

            for ll in range(j + 1):
                li = l + ll
                lj = li - j - 1

                if t[li] != t[lj]:
                    f = hh[ll] / (t[li] - t[lj])
                    h[ll] += f * (t[li] - arg)
                    h[ll + 1] = f * (arg - t[lj])
                else:
                    h[ll + 1] = 0.0
                    break

        sp = 0.0
        ll = l - 1 - k1
        for j in range(k1):
            ll += 1
            sp += c[ll] * h[j]
        y[i] = sp
    return y
######################### Testing and comparison #############################
# Generate a data set for interpolation
x, dx = np.linspace(10,100,200, retstep=True)
y = np.sin(x)
# Calculate the cubic spline coefficients
coeff_1 = interpolate.splrep(x,y, k=3) # scipy's splrep
coeff_2 = custom_splrep(x,y, k=3) # Custom wrap of scipy's splrep
# Generate data for interpolation and randomize
x2 = np.linspace(0,110,10000)
np.random.shuffle(x2)
# Interpolate
y2 = interpolate.splev(x2, coeff_1) # scipy's splev
y3 = numba_splev(x2, coeff_2) # Numba accelerated implementation of scipy's splev
# Plot data
plt.plot(x,y,'--', linewidth=1.0,color='green', label='data')
plt.plot(x2,y2,'o',color='blue', markersize=2.0, label='scipy splev')
plt.plot(x2,y3,'.',color='red', markersize=1.0, label='numba splev')
plt.legend()
plt.show()
print("\nTime for random interpolations")
# Calculation time evaluation for scipy splev
t1 = time.time()
for n in range(0, 10000):
    y2 = interpolate.splev(x2, coeff_1)
print("scipy splev", time.time() - t1)

# Calculation time evaluation for numba splev
t1 = time.time()
for n in range(0, 10000):
    y2 = numba_splev(x2, coeff_2)
print("numba splev", time.time() - t1)
print("\nTime for non random interpolations")
# Generate data for interpolation without randomize
x2 = np.linspace(0,110,10000)
# Calculation time evaluation for scipy splev
t1 = time.time()
for n in range(0, 10000):
    y2 = interpolate.splev(x2, coeff_1)
print("scipy splev", time.time() - t1)

# Calculation time evaluation for numba splev
t1 = time.time()
for n in range(0, 10000):
    y2 = numba_splev(x2, coeff_2)
print("numba splev", time.time() - t1)
The above code is optimised for faster knot search if the knots are equispaced.
On my Core i7 machine, if the interpolation is done at random values, the Numba version is faster:
Scipy’s splev = 0.896s
Numba splev = 0.375s
If the interpolation is not done at random values, SciPy's version is faster:
Scipy’s splev = 0.281s
Numba splev = 0.375s
Ref : https://github.com/scipy/scipy/tree/v1.7.1/scipy/interpolate/fitpack ,
https://github.com/dbstein/fast_splines

how to optimize the minimization of a vector function in python?

I have an issue: I'm trying to find the minimum of a function which depends on several parameters that I'd like to change as well. Let's take a simplified example:
import numpy as np
import scipy.optimize as opt
def f(x, a, b, c):
    f = a * x**2 + b * x + c
    return f
I'd like to find the x which minimizes the function for different sets of values of a, b, c, let's say for
a = [-1, 0, 1]
b = [0, 1, 2]
c = [0, 1]
At the moment I have three nested loops and a minimization:
for p1 in a:
    for p2 in b:
        for p3 in c:
            y = opt.minimize(f, x0=[0, ], args=(p1, p2, p3, ))
            print(y)
which is really slow for the calculation I'm doing, but I haven't found anything better so far. Does anyone know a way, or a package, that would allow me to improve the efficiency?
You could use a combination of different techniques to improve the efficiency of your script:
Use itertools.product to generate every possible combination of the lists a, b, c.
Use multiprocessing to execute the minimizations in parallel.
Other than this, I can't think of a way to optimize the efficiency of the code. As was pointed out in the comments, the constant value c has no influence on the minimization, but I'm sure the quadratic function is just an example.
I took the code of the multiprocessing part from here.
Here's the working code.
import numpy as np
import scipy.optimize as opt
import itertools
from multiprocessing import Pool
def f(x, a, b, c):
    f = a * x**2 + b * x + c
    return f

def mini(args):
    res = opt.minimize(f, x0=np.array([0]), args=args)
    return res.x

if __name__ == "__main__":
    a = np.linspace(-1, 2, 100)
    b = np.linspace(0, 2, 100)
    c = [0, 1]

    args = list(itertools.product(a, b, c))
    print("Number of combos:" + str(len(args)))

    p = Pool(4)
    import time
    t0 = time.time()
    res = p.map(mini, args)
    print(time.time() - t0)
Even these 20,000 combinations need only 5.28 seconds on my average laptop.
scipy.optimize.newton can do this.
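That is, for a smooth scalar function you can solve f'(x) = 0 instead of calling minimize. A sketch for the quadratic example (restricted to a > 0, since for a <= 0 the stationary point is not a minimum; the analytic answer is x = -b/(2a)):
import itertools
from scipy.optimize import newton

def fprime(x, a, b, c):
    return 2*a*x + b                 # derivative of a*x**2 + b*x + c

a_vals, b_vals, c_vals = [1, 2], [0, 1, 2], [0, 1]
for p1, p2, p3 in itertools.product(a_vals, b_vals, c_vals):
    x_min = newton(fprime, x0=0.1, args=(p1, p2, p3))   # root of the derivative
    print(p1, p2, p3, x_min)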

Multiprocessing python function for numerical calculations

I'm hoping to get some help here with parallelising my Python code. I've been struggling with it for a while and run into several errors whichever way I try; currently the code takes about 2-3 hours to complete. The code is given below:
import numpy as np
from scipy.constants import Boltzmann, elementary_charge as kb, e
import multiprocessing
from functools import partial

Tc = 9.2
x = []
g = []

def Delta(T):
    '''
    Delta(T) takes a temperature as an input and calculates a
    temperature dependent variable based on Tc which is defined as a
    global parameter
    '''
    d0 = (np.pi/1.78)*kb*Tc
    D0 = d0*(np.sqrt(1-(T**2/Tc**2)))
    return D0

def element_in_sum(T, n, phi):
    D = Delta(T)
    matsubara_frequency = (np.pi * kb * T) * (2*n + 1)
    factor_d = np.sqrt((D**2 * np.cos(phi/2)**2) + matsubara_frequency**2)
    element = ((2 * D * np.cos(phi/2))/ factor_d) * np.arctan((D * np.sin(phi/2))/factor_d)
    return element

def sum_elements(T, M, phi):
    '''
    sum_elements(T,M,phi) is the most computationally heavy part
    of the calculations; the larger the M value the more accurate the
    results are.
    T: temperature
    M: number of steps for matrix calculation, the larger the more accurate the calculation
    phi: the phase of the system, can be between 0 and pi
    '''
    X = list(np.arange(0, M, 1))
    Y = [element_in_sum(T, n, phi) for n in X]
    return sum(Y)

def KO_1(M, T, phi):
    Iko1Rn = (2 * np.pi * kb * T / e) * sum_elements(T, M, phi)
    return Iko1Rn

def main():
    for j in range(1, 92):
        T = 0.1*j
        for i in range(1, 314):
            phi = 0.01*i
            pool = multiprocessing.Pool()
            result = pool.apply_async(KO_1, args=(26000, T, phi,))
            g.append(result)
        pool.close()
        pool.join()
        A = max(g)
        x.append(A)
        del g[:]
My approach was to try to send the KO_1 function into a multiprocessing pool, but I either get a pickling error or a "too many files open" error. Any help is greatly appreciated, and if multiprocessing is the wrong approach I would love any guidance.
I haven't tested your code, but you can do several things to improve it.
First of all, don't create arrays unnecessarily. sum_elements creates three array-like objects when it can use just one generator. First, np.arange creates a numpy array, then the list function creates a list object, and then the list comprehension creates another list. The function does 4 times the work it should.
The correct way to implement it (in python3) would be:
def sum_elements(T, M, phi):
    return sum(element_in_sum(T, n, phi) for n in range(0, M, 1))
If you use python2, replace range with xrange.
This tip will probably help you in any python script you'll write.
Also, try to utilize multiprocessing better. It seems what you need to do is to create a multiprocessing.Pool object once, and use the pool.map function.
The main function should look like this:
def job(args):
    i, j = args
    T = 0.1*j
    phi = 0.01*i
    return KO_1(26000, T, phi)

def main():
    pool = multiprocessing.Pool(processes=4)  # You can change this number
    x = [max(pool.imap(job, ((i, j) for i in range(1, 314)))) for j in range(1, 92)]
Notice that I used a tuple in order to pass multiple arguments to job.
This is not an answer to the question, but if I may, I would propose how to speed up the code using simple numpy array operations. Have a look at the following code:
import numpy as np
from scipy.constants import Boltzmann, elementary_charge as kb, e
import time
Tc = 9.2
RAM = 4*1024**2 # 4GB
def Delta(T):
    '''
    Delta(T) takes a temperature as an input and calculates a
    temperature dependent variable based on Tc which is defined as a
    global parameter
    '''
    d0 = (np.pi/1.78)*kb*Tc
    D0 = d0*(np.sqrt(1-(T**2/Tc**2)))
    return D0

def element_in_sum(T, n, phi):
    D = Delta(T)
    matsubara_frequency = (np.pi * kb * T) * (2*n + 1)
    factor_d = np.sqrt((D**2 * np.cos(phi/2)**2) + matsubara_frequency**2)
    element = ((2 * D * np.cos(phi/2))/ factor_d) * np.arctan((D * np.sin(phi/2))/factor_d)
    return element

def KO_1(M, T, phi):
    X = np.arange(M)[:, np.newaxis, np.newaxis]
    sizeX = int((float(RAM) / sum(T.shape))/sum(phi.shape)/8)  # 8 byte
    i0 = 0
    Iko1Rn = 0. * T * phi
    while (i0+sizeX) <= M:
        print "X = %i" % i0
        indices = slice(i0, i0+sizeX)
        Iko1Rn += (2 * np.pi * kb * T /e) * element_in_sum(T, X[indices], phi).sum(0)
        i0 += sizeX
    return Iko1Rn

def main():
    T = np.arange(0.1, 9.2, 0.1)[:, np.newaxis]
    phi = np.linspace(0, np.pi, 361)
    M = 26000
    result = KO_1(M, T, phi)
    return result, result.max()

T0 = time.time()
r, rmax = main()
print time.time() - T0
It runs in a bit more than 20 s on my PC. One has to be careful not to use too much memory; that is why there is still a loop with a somewhat complicated construction to use only pieces of X at a time. If enough memory is present, it is not necessary.
One should also note that this is just the first step of speeding up. Much further improvement could be reached, e.g. using just-in-time compilation or Cython.
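As a rough sketch of the JIT route (my own example, not benchmarked here, and restructured over the full (T, phi) grid like the NumPy version above), the Matsubara sum compiles well with Numba:
import numpy as np
from numba import njit, prange
from scipy.constants import Boltzmann, elementary_charge as kb, e  # same imports as the question

Tc = 9.2

@njit(parallel=True)
def ko1_grid(M, T, phi):
    # KO_1 evaluated on the full (T, phi) grid, summing M Matsubara terms per point
    out = np.zeros((T.size, phi.size))
    for i in prange(T.size):
        D = (np.pi / 1.78) * kb * Tc * np.sqrt(1.0 - T[i]**2 / Tc**2)
        for j in range(phi.size):
            cosp = np.cos(phi[j] / 2.0)
            sinp = np.sin(phi[j] / 2.0)
            s = 0.0
            for n in range(M):
                w = (np.pi * kb * T[i]) * (2 * n + 1)
                fd = np.sqrt(D * D * cosp * cosp + w * w)
                s += (2.0 * D * cosp / fd) * np.arctan(D * sinp / fd)
            out[i, j] = (2.0 * np.pi * kb * T[i] / e) * s
    return out

T = np.arange(0.1, 9.2, 0.1)
phi = np.linspace(0.0, np.pi, 361)
print(ko1_grid(26000, T, phi).max())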

How to use JIT in python with mpmath / gmpy effectively?

This is my first attempt at using a JIT for Python, and this is the use case I want to speed up. I read a bit about numba and it seemed simple enough, but the following code didn't provide any speedup. Please excuse any obvious mistakes I may be making.
I also tried to do what the basic Cython tutorial suggests, but again saw no difference in time.
http://docs.cython.org/src/tutorial/cython_tutorial.html
I'm guessing I have to do something like declare variables? Use other libraries? Use for loops exclusively for everything? I'd appreciate any guidance or examples I can refer to.
For example, I know from a previous question (Elementwise operations in mpmath slow compared to numpy) and its solution that using gmpy instead of mpmath was significantly faster.
import numpy as np
from scipy.special import eval_genlaguerre
from sympy import mpmath as mp
from sympy.mpmath import laguerre as genlag2
import collections
from numba import jit
import time

def len2(x):
    return len(x) if isinstance(x, collections.Sized) else 1

@jit # <-- removing this doesn't change the output time if anything it's slower with this
def laguerre(a, b, x):
    fun = np.vectorize(genlag2)
    return fun(a, b, x)

def f1( a, b, c ):
    t = time.time()

    M = np.ones( [ len2(a), len2(b), len2(c) ] )
    A, B, C = np.meshgrid( a, b, c, indexing = 'ij' )
    temp = laguerre(A, B, C)
    M *= temp

    print 'part1: ', time.time() - t
    t = time.time()

    A, B = np.meshgrid( a, b, indexing= 'ij' )
    temp = np.array( [[ mp.fac(x1)/mp.fac(y1) for x1,y1 in zip(x2,y2)] for x2,y2 in zip(A, B)] )
    temp = np.reshape( temp, [ len(a), len(b), 1 ] )
    temp = np.repeat( temp, len(c), axis = 2 )

    print 'part2 so far:', time.time() - t
    M *= temp
    print 'part2 finally', time.time() - t
    t = time.time()

a = mp.arange( 30 )
b = mp.arange( 10 )
c = mp.linspace( 0, 100, 100 )
M = f1( a, b, c)
It's better to use numba's vectorize with an explicitly defined signature ("eager" compilation); if no signature is given, lazy compilation happens at call time, which can slow the process down.
Plain jit is slow compared to vectorize here, in my opinion.
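To make that concrete (my own sketch, not from the question): numba cannot compile mpmath's arbitrary-precision types, so this only helps where double precision is acceptable. For example, the fac(x)/fac(y) factor from part 2 can be rewritten via log-gamma and compiled eagerly by giving @vectorize an explicit signature:
import math
import numpy as np
from numba import vectorize, float64

# Eager compilation: the signature list is given up front instead of being inferred lazily.
@vectorize([float64(float64, float64)], target='parallel')
def fac_ratio(x, y):
    # fac(x)/fac(y) in double precision via log-gamma
    return math.exp(math.lgamma(x + 1.0) - math.lgamma(y + 1.0))

A, B = np.meshgrid(np.arange(30.0), np.arange(10.0), indexing='ij')
ratios = fac_ratio(A, B)   # broadcasts like a NumPy ufunc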
