I was writing a program in which SciPy's CubicSpline routine is used at certain points; because of this SciPy routine I cannot use Numba's @jit on my whole program.
I recently came across the @overload feature and was wondering whether it could be used in this way:
from numba.extending import overload
from numba import jit
from scipy.interpolate import CubicSpline
import numpy as np
x = np.arange(10)
y = np.sin(x)
xs = np.arange(-0.5, 9.6, 0.1)
def Spline_interp(xs, x, y):
    cs = CubicSpline(x, y)
    ds = cs(xs)
    return ds

@overload(Spline_interp)
def jit_Spline_interp(xs, x, y):
    ds = Spline_interp(xs, x, y)
    def jit_Spline_interp_impl(xs, x, y):
        return ds
    return jit_Spline_interp_impl

@jit(nopython=True)
def main():
    # other code compatible with @njit
    ds = Spline_interp(xs, x, y)
    # other code compatible with @njit
    return ds
print(main())
Kindly correct me if my understanding of the @overload feature is wrong, and tell me what a possible solution is for using such SciPy routines with Numba.
Wrapping compiled functions using ctypes (Numba)
Especially for more complex functions, reimplementing everything in Numba-compileable Python code can be quite a lot of work, and the result is sometimes slower. The following answer is about calling C-like functions directly from a shared object or dynamic library.
Compiling the Fortran routines
This example shows a way to do this on Windows, but it should be straightforward on other operating systems. For a portable interface, defining an ISO_C_BINDING is highly recommended. In this answer I will try it without an interface.
dll.def
EXPORTS
SPLEV @1
Compilation
ifort /dll dll.def splev.f fpbspl.f /O3 /fast
Calling this function directly from Numba
Have a look at what is expected by the Fortran routine. In the SciPy FITPACK sources the routine takes the arguments t, n, c, k, x, y, m, e, ier, which matches the nine pointers in the ctypes signature below.
Check every input in the wrapper (datatype, contiguousness). You just provide some pointers to the Fortran function; there is no additional safety check.
Wrapper
The following code shows two ways to call this function. In Numba it isn't directly possible to pass a scalar by reference. You can either allocate an array on the heap (slow for small functions), or use an intrinsic to use stack arrays.
import numba as nb
import numpy as np
import ctypes
lib = ctypes.cdll.LoadLibrary("splev.dll")
dble_p=ctypes.POINTER(ctypes.c_double)
int_p =ctypes.POINTER(ctypes.c_longlong)
SPLEV=lib.SPLEV
SPLEV.restype = ctypes.c_void_p
SPLEV.argtypes = (dble_p,int_p,dble_p,int_p,dble_p,dble_p,int_p,int_p,int_p)
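# argument order mirrors the Fortran call below: t, n, c, k, x, y, m, e, ier (all passed as pointers)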
from numba import types
from numba.extending import intrinsic
from numba.core import cgutils
@intrinsic
def val_to_ptr(typingctx, data):
    def impl(context, builder, signature, args):
        ptr = cgutils.alloca_once_value(builder, args[0])
        return ptr
    sig = types.CPointer(nb.typeof(data).instance_type)(nb.typeof(data).instance_type)
    return sig, impl

@intrinsic
def ptr_to_val(typingctx, data):
    def impl(context, builder, signature, args):
        val = builder.load(args[0])
        return val
    sig = data.dtype(types.CPointer(data.dtype))
    return sig, impl
#with intrinsics, temporary arrays are allocated on the stack
#faster, but mainly relevant for functions with very low runtime
@nb.njit()
def splev_wrapped(x, coeff, e):
    #There are just pointers passed to the Fortran function.
    #The arrays have to be contiguous!
    t = np.ascontiguousarray(coeff[0])
    x = np.ascontiguousarray(x)
    c = coeff[1]
    k = coeff[2]

    y = np.empty(x.shape[0], dtype=np.float64)

    n_arr = val_to_ptr(nb.int64(t.shape[0]))
    k_arr = val_to_ptr(nb.int64(k))
    m_arr = val_to_ptr(nb.int64(x.shape[0]))
    e_arr = val_to_ptr(nb.int64(e))
    ier_arr = val_to_ptr(nb.int64(0))

    SPLEV(t.ctypes, n_arr, c.ctypes, k_arr, x.ctypes,
          y.ctypes, m_arr, e_arr, ier_arr)
    return y, ptr_to_val(ier_arr)
#without using intrinsics
@nb.njit()
def splev_wrapped_2(x, coeff, e):
    #There are just pointers passed to the Fortran function.
    #The arrays have to be contiguous!
    t = np.ascontiguousarray(coeff[0])
    x = np.ascontiguousarray(x)
    c = coeff[1]
    k = coeff[2]

    y = np.empty(x.shape[0], dtype=np.float64)

    n_arr = np.empty(1, dtype=np.int64)
    k_arr = np.empty(1, dtype=np.int64)
    m_arr = np.empty(1, dtype=np.int64)
    e_arr = np.empty(1, dtype=np.int64)
    ier_arr = np.zeros(1, dtype=np.int64)

    n_arr[0] = t.shape[0]
    k_arr[0] = k
    m_arr[0] = x.shape[0]
    e_arr[0] = e

    SPLEV(t.ctypes, n_arr.ctypes, c.ctypes, k_arr.ctypes, x.ctypes,
          y.ctypes, m_arr.ctypes, e_arr.ctypes, ier_arr.ctypes)
    return y, ier_arr[0]
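A short usage sketch of the wrapper (my addition; it assumes splev.dll was built as above and the functions defined earlier are in scope). The knots and coefficients come from scipy.interpolate.splrep on the Python side:

import numpy as np
from scipy.interpolate import splrep

# build the spline on the Python side (this part does not need Numba)
x = np.linspace(0.0, 10.0, 50)
y = np.sin(x)
t, c, k = splrep(x, y, k=3)

# evaluate inside nopython code via the ctypes-wrapped Fortran SPLEV
xs = np.linspace(0.5, 9.5, 1000)
ys, ier = splev_wrapped(xs, (t, c, k), 0)  # e=0: extrapolate outside the knots (FITPACK convention)
print(ier)  # ier == 0 indicates a successful call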
You would either need to fall back to object mode (locally, as @max9111 suggested), or implement the CubicSpline function yourself in Numba.
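A minimal sketch of the local object-mode fallback (the function name here is mine, not from the question; the type string for ds has to match what CubicSpline returns):

from numba import njit, objmode
from scipy.interpolate import CubicSpline
import numpy as np

@njit
def interp_with_objmode(xs, x, y):
    # ... other nopython-compatible code ...
    with objmode(ds='float64[:]'):  # declare the type of everything leaving the block
        ds = CubicSpline(x, y)(xs)  # runs in the interpreter (slow, but works)
    # ... other nopython-compatible code ...
    return ds

x = np.arange(10.0)
y = np.sin(x)
xs = np.arange(-0.5, 9.6, 0.1)
print(interp_with_objmode(xs, x, y))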
As far as I understand it, the overload decorator "only" makes the compiler aware that it can use a Numba-compatible implementation when it encounters the overloaded function. It doesn't magically convert the function to be Numba compatible.
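To illustrate the intended use: the implementation returned from an @overload function must itself be compileable in nopython mode. A toy sketch (it overloads a trivial pure-Python helper of my own, not CubicSpline):

import numpy as np
from numba import njit
from numba.extending import overload

def clip0(a):  # plain Python version, used outside jitted code
    return np.where(a < 0, 0.0, a)

@overload(clip0)
def clip0_overload(a):
    def impl(a):  # this body is what Numba actually compiles
        out = np.empty_like(a)
        for i in range(a.size):
            out[i] = 0.0 if a[i] < 0 else a[i]
        return out
    return impl

@njit
def demo(a):
    return clip0(a)  # dispatches to the overloaded implementation

print(demo(np.array([-1.0, 2.0, -3.0])))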
There is a package which exposes some SciPy functionality to Numba, but it seems to be in its early days and only covers some scipy.special functions so far.
https://github.com/numba/numba-scipy
This is a repost of my solution, posted on the Numba discourse: https://numba.discourse.group/t/call-scipy-splev-routine-in-numba-jitted-function/1122/7.
I had originally gone ahead with @max9111's suggestion of using objmode. It gave a temporary fix. But since the code was performance critical, I finally ended up writing a Numba version of SciPy's interpolate.splev subroutine for the spline interpolation.
import numpy as np
import numba
from scipy import interpolate
import matplotlib.pyplot as plt
import time
# Custom wrap of scipy's splrep
def custom_splrep(x, y, k=3):
    """
    Custom wrap of scipy's splrep for calculating spline coefficients,
    which also checks whether the data are equispaced.
    """
    # Check if x is equispaced
    x_diff = np.diff(x)
    equi_spaced = all(np.round(x_diff, 5) == np.round(x_diff[0], 5))
    dx = x_diff[0]

    # Calculate knots & coefficients (cubic spline by default)
    t, c, k = interpolate.splrep(x, y, k=k)

    return (t, c, k, equi_spaced, dx)
# Numba accelerated implementation of scipy's splev
@numba.njit(cache=True)
def numba_splev(x, coeff):
    """
    Custom implementation of scipy's splev for spline interpolation,
    with an additional section for a faster search of the knot interval if the knots are equispaced.
    The spline is extrapolated from the end spans for points not in the support.
    """
    t, c, k, equi_spaced, dx = coeff

    t0 = t[0]
    n = t.size
    m = x.size

    k1 = k + 1
    k2 = k1 + 1
    nk1 = n - k1

    l = k1
    l1 = l + 1

    y = np.zeros(m)

    h = np.zeros(20)
    hh = np.zeros(19)

    for i in range(m):
        # fetch a new x-value arg
        arg = x[i]

        # search for knot interval t[l] <= arg <= t[l+1]
        if equi_spaced:
            l = int((arg - t0)/dx) + k
            l = min(max(l, k1), nk1)
        else:
            while not ((arg >= t[l-1]) or (l1 == k2)):
                l1 = l
                l = l - 1
            while not ((arg < t[l1-1]) or (l == nk1)):
                l = l1
                l1 = l + 1

        # evaluate the non-zero b-splines at arg.
        h[:] = 0.0
        hh[:] = 0.0
        h[0] = 1.0

        for j in range(k):
            for ll in range(j+1):
                hh[ll] = h[ll]
            h[0] = 0.0
            for ll in range(j+1):
                li = l + ll
                lj = li - j - 1
                if t[li] != t[lj]:
                    f = hh[ll]/(t[li] - t[lj])
                    h[ll] += f*(t[li] - arg)
                    h[ll+1] = f*(arg - t[lj])
                else:
                    h[ll+1] = 0.0
                    break

        sp = 0.0
        ll = l - 1 - k1
        for j in range(k1):
            ll += 1
            sp += c[ll]*h[j]
        y[i] = sp
    return y
######################### Testing and comparison #############################
# Generate a data set for interpolation
x, dx = np.linspace(10,100,200, retstep=True)
y = np.sin(x)
# Calculate the cubic spline coefficients
coeff_1 = interpolate.splrep(x,y, k=3) # scipy's splrep
coeff_2 = custom_splrep(x,y, k=3) # Custom wrap of scipy's splrep
# Generate data for interpolation and randomize
x2 = np.linspace(0,110,10000)
np.random.shuffle(x2)
# Interpolate
y2 = interpolate.splev(x2, coeff_1) # scipy's splev
y3 = numba_splev(x2, coeff_2) # Numba accelerated implementation of scipy's splev
# Plot data
plt.plot(x,y,'--', linewidth=1.0,color='green', label='data')
plt.plot(x2,y2,'o',color='blue', markersize=2.0, label='scipy splev')
plt.plot(x2,y3,'.',color='red', markersize=1.0, label='numba splev')
plt.legend()
plt.show()
print("\nTime for random interpolations")
# Calculation time evaluation for scipy splev
t1 = time.time()
for n in range(0,10000):
y2 = interpolate.splev(x2, coeff_1)
print("scipy splev", time.time() - t1)
# Calculation time evaluation for numba splev
t1 = time.time()
for n in range(0,10000):
y2 = numba_splev(x2, coeff_2)
print("numba splev",time.time() - t1)
print("\nTime for non random interpolations")
# Generate data for interpolation without randomize
x2 = np.linspace(0,110,10000)
# Calculation time evaluation for scipy splev
t1 = time.time()
for n in range(0,10000):
y2 = interpolate.splev(x2, coeff_1)
print("scipy splev", time.time() - t1)
# Calculation time evaluation for numba splev
t1 = time.time()
for n in range(0,10000):
y2 = numba_splev(x2, coeff_2)
print("numba splev",time.time() - t1)
The above code is optimised for a faster knot search if the knots are equispaced.
On my Core i7 machine, if the interpolation is done at random values, the Numba version is faster:
SciPy's splev = 0.896 s
Numba splev = 0.375 s
If the interpolation is not done at random values, SciPy's version is faster:
SciPy's splev = 0.281 s
Numba splev = 0.375 s
Ref: https://github.com/scipy/scipy/tree/v1.7.1/scipy/interpolate/fitpack,
https://github.com/dbstein/fast_splines
Related
I thought I would check the time difference for a matrix multiplication using NumPy matrix multiplication versus a plain for-loop method. I understand NumPy should be faster because of vectorization, but I couldn't show it with simple code like the one below.
In all of my tests the plain Python for loop is faster than NumPy. What did I do wrong here?
My code:
# testing numpy vs normal method
import math
import random
import time
import numpy as np

def compare_np_vs_normal(datapoints):
    x = []
    for i in range(datapoints):
        x.append(math.ceil(random.random() * 10))  # random data

    # linear activation function
    m = math.ceil(random.random() * 10)  # Random value for slope
    c = math.ceil(random.random() * 10)  # Random value for intercept

    # linear activation result for all datapoints using normal method and np method
    def normal_method():
        y = []
        for x_ in x:
            y_ = x_ * m + c
            y.append(y_)
        return y

    def np_method():
        x_ = np.c_[np.array(x), np.ones(len(x))]
        a_ = np.array([[m], [c]])
        return np.matmul(x_, a_)

    print("Comparing for {} datapoints".format(datapoints))

    print("Normal method:")
    t1 = time.perf_counter()
    y_result_normal = normal_method()
    t2 = time.perf_counter()
    print("Time taken {}".format(t2 - t1))

    print("Numpy method:")
    t1 = time.perf_counter()
    y_result_np = np_method()
    t2 = time.perf_counter()
    print("Time taken {}".format(t2 - t1))

    return y_result_normal, y_result_np
The result I got was:
Comparing for 1000 datapoints
Normal method:
Time taken 7.759999971312936e-05
Numpy method:
Time taken 0.0007053999997879146
You are making an overly complicated calculation in the np_method function: building the np.c_ matrix and calling matmul adds fixed overhead that dominates for only 1000 data points.
Replace it with:
def np_method():
    x_ = np.array(x)
    return x_ * m + c
to see the improvement.
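For a quick check of the improvement (my addition, not from the original answer; timings will of course vary by machine), both variants can be timed with timeit on vectorized data:

import timeit
import numpy as np

x = np.ceil(np.random.rand(1000) * 10)  # vectorized version of the random data generation
m, c = 3.0, 5.0                         # arbitrary slope and intercept for the demo

loop_time = timeit.timeit(lambda: [xi * m + c for xi in x], number=1000)
numpy_time = timeit.timeit(lambda: x * m + c, number=1000)
print(loop_time, numpy_time)            # the vectorized expression should be clearly faster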
I'm working on an algorithm (a variant of PCA) and I test its performance according to a metric. It is evaluated by Monte Carlo, i.e. averaging over several samples for each choice of parameter.
Initially, my code consisted of simple loops over parameters and samples, with NumPy functions for the algorithm.
I heard about the Numba package for speeding up code. I found that I get a 5x improvement in speed just by rewriting the NumPy part to fit Numba's requirements and adding the @njit decorator.
However, I tested the same improved version on another PC (both are laptops with Intel i7s) and found that I get no speed improvement on this second PC.
I am puzzled about why this could be the case.
edit:
More details about the two PCs.
PC 1:
Python 3.7.6
Numba 0.48.0
Intel Core i7-8565U CPU @ 1.80GHz (4 cores)
8 GB RAM
PC 2:
Python 3.7.4
Numba 0.49.0
Intel Core i7-8750H CPU @ 2.20GHz (6 cores)
16 GB RAM
And here is a minimal reproducible example.
import numpy as np
from numba import jit
import matplotlib.pyplot as plt
import time
@jit(nopython=True)
def multivariate_normal(m, C):
    L = np.linalg.cholesky(C)
    X = np.random.randn(m.shape[0])
    return L @ X + m

@jit(nopython=True)
def gendata(p, d, k, n, sig_s):
    H = np.random.randn(d, k)
    u, s, vh = np.linalg.svd(H, full_matrices=False)
    H = u @ vh
    rest = np.zeros((p-d, k))
    U0 = np.concatenate((H, rest), axis=0)
    X = np.empty((p, n))
    for i in range(n):
        b = multivariate_normal(np.zeros(p), np.identity(p))
        s = multivariate_normal(np.zeros(k), (sig_s**2) * np.identity(k))
        x = U0 @ s + b
        X[:, i] = x
    return (X, U0)

@jit(nopython=True)
def geomspace(a, b, n):
    seq = np.linspace(np.log(a), np.log(b), n)
    return np.exp(seq)

@jit(nopython=True)
def Afe(U, U0):
    '''
    Average Fraction of Energy of subspace U based on original subspace U0.
    '''
    Oorth, r = np.linalg.qr(U)
    afe = np.trace(Oorth.T @ U0 @ U0.T @ Oorth) / np.trace(U0 @ U0.T)
    return afe

@jit(nopython=True)
def sim2a():
    p = 50
    d = int(0.99 * p)
    k = 5
    sig_s = np.sqrt(10)
    ns = geomspace(3*k, 3*p, 10).astype(np.int32)
    echs = 50
    mc_afe_pca = np.empty(ns.shape[0])
    mc_afe_sphpca = np.empty(ns.shape[0])
    for n in range(ns.shape[0]):
        res_pca = np.empty(echs)
        res_sphpca = np.empty(echs)
        for e in range(echs):
            #print(n,e)
            X, U0 = gendata(p, d, k, ns[n], sig_s)
            sX = np.diag(X.T @ X)
            #PCA
            u, sig, vh = np.linalg.svd(X.T, full_matrices=False)
            Upca = vh.T[:, :k]
            res_pca[e] = Afe(Upca, U0)
            #Spherical PCA
            Xn = X / np.sqrt(sX)
            u, sig, vh = np.linalg.svd(Xn.T, full_matrices=False)
            Usph = vh.T[:, :k]
            res_sphpca[e] = Afe(Usph, U0)
        mc_afe_pca[n] = np.mean(res_pca)
        mc_afe_sphpca[n] = np.mean(res_sphpca)
    return (mc_afe_pca, mc_afe_sphpca)
#first run for compilation before measurements
mc_afe_pca,mc_afe_sphpca = sim2a()
#Graph
p = 50
k = 5
ns = geomspace(3*k,3*p,10).astype(np.int32)
plt.figure(figsize=(8,4))
plt.plot(ns,mc_afe_pca,'+-',label="PCA")
plt.plot(ns,mc_afe_sphpca,'x-',label="Spherical PCA")
plt.xlabel("n")
plt.ylabel("AFE")
plt.legend()
plt.plot()
#Measure
def sim2atimer():
    start = time.time()
    a, b = sim2a()
    end = time.time()
    print(end - start)

sim2atimer()
#Now, replace @jit by #@jit and re-execute the code to compare.
# The option parallel=True was erased from @jit()
In fact, while working on this example, I had the idea to erase the parallel=True option from the @jit decorator, which had been prompting a warning that it was not used, and that fixed the discrepancy.
Now, I still don't know why only one PC was affected.
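For context, a minimal sketch of the behaviour (my own example, not from the question; the exact wording of the warning depends on the Numba version): when parallel=True is requested but the function contains nothing Numba can parallelize, a performance warning is emitted and the parallel machinery only adds overhead, whereas a prange loop gives it something to do.

import numpy as np
from numba import njit, prange

@njit(parallel=True)  # nothing here can be parallelized -> Numba warns that parallel=True was not used
def serial_sum(n):
    s = 0.0
    for i in range(n):
        s += i * 0.5
    return s

@njit(parallel=True)  # prange gives the parallel backend actual work
def row_sums(a):
    out = np.empty(a.shape[0])
    for i in prange(a.shape[0]):
        out[i] = a[i].sum()
    return out

print(serial_sum(1000), row_sums(np.ones((100, 100)))[0])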
I am trying to fit the two equations below using Python's leastsq method, but I am not sure whether this is the right approach. The first equation contains an incomplete gamma function, while the second one is slightly more complex and, along with an exponential function, contains a term which is obtained using a separate fitting formula.
J_mg = T_incomplete(hw/T_mag)
J_nmg = e^(-hw/T)*g(w,T)
Here g is a function of w and T and is calculated using a given fitting formula.
I am following the steps outlined in this question.
Here is what I have done
import numpy as np
from scipy.optimize import leastsq
from scipy.special import gammaincc
from scipy.special import gamma
from matplotlib.pyplot import plot
# generating data
NPTS = 10
hw = np.linspace(0.5, 10, NPTS)
j1 = np.linspace(0.001,10,NPTS)
j2 = np.linspace(0.003,10,NPTS)
T_mag = np.linspace(0.3,0.5,NPTS)
#defining functions
def calc_gaunt_factor(hw, T):
    fitting_coeff = np.loadtxt('fitting_coeff.txt', skiprows=1)
    #T is in keV
    #K_b = 8.6173303(50)e-5 eV/K
    g = 0
    gamma = 0.0136/T
    theta = hw/T
    A = (np.log10(gamma**2) + 0.5)*0.4
    B = (np.log10(theta) + 1.5)*0.4
    for i in range(11):
        for j in range(11):
            g_ij = fitting_coeff[i][j]*(A**i)*(B**j)
            g = g_ij + g
    return g

def j_w_mag(hw, T_mag):
    order = 0.001
    return np.sqrt(1/T_mag)*gamma(order)*gammaincc(order, hw/T_mag)

def j_w_nonmag(hw, T):
    gamma = 0.0136/T
    theta = hw/T
    return np.sqrt(1/T)*np.exp((-hw)/T)*calc_gaunt_factor(hw, T)

def residual_func(T, T_mag, hw, j1, j2):
    err_unmag = np.nan_to_num(j1 - j_w_nonmag(hw, T))
    err_mag = np.nan_to_num(j2 - j_w_mag(hw, T_mag))
    err = np.concatenate((err_unmag, err_mag))
    return err
par_init = np.array([.35])
best, cov, info, message, ler = leastsq(residual_func,par_init,args=(T_mag,hw,j1,j2),full_output=True)
print("Best-Fit Parameters:")
print("T=%s" %(best[0]))
I am getting a weird value for my fitting parameter T. Is this the right approach? Thanks.
Is there a way to speed up this code:
import mpmath as mp
import numpy as np
from time import time as epochTime
def func(E):
    f = lambda theta: mp.sin(theta) * mp.exp(E * (mp.cos(theta**2) +
                                                  mp.cos(theta)**2))
    return f
start = epochTime()
mp.mp.dps = 15
mp.mp.pretty = True
E = np.linspace(0, 10, 200)
ints = [mp.quadgl(func(e), [0, mp.pi]) for e in E] # Main Job
print ('Took:{:.3}s'.format(epochTime() - start))
Running your code, I timed it at 5.84 s.
Using Memoize and simplifying expressions:
cos = Memoize(mp.cos)
sin = Memoize(mp.sin)

def func(E):
    def f(t):
        cost = cos(t)
        return sin(t) * mp.exp(E * (cos(t*t) + cost*cost))
    return f
I got it down to 3.25 s the first time, and ~2.8 s on subsequent runs.
(An even better approach might be to use lru_cache from the standard library, but I did not try to time it.)
If you are running similar code many times, it may be sensible to Memoize() both func and f, so the computations become trivial (~0.364 s).
Replacing mp with math for cos/sin/exp, I got down to ~1.3 s, and now memoizing makes the performance worse, for some reason (~1.5 s; I guess the lookup time became dominant).
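A minimal sketch of the lru_cache variant mentioned above (my addition, untimed here; it relies on the mpf arguments being hashable, which they are):

from functools import lru_cache
import mpmath as mp

cos = lru_cache(maxsize=None)(mp.cos)
sin = lru_cache(maxsize=None)(mp.sin)

def func(E):
    def f(t):
        cost = cos(t)
        return sin(t) * mp.exp(E * (cos(t*t) + cost*cost))
    return f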
In general, you want to avoid calls to transcendental functions like sin, cos, exp, ln as much as possible, especially in a "hot" function like an integrand.
Replace x**2 by x*x (x**2 often calls a generic, slow exponentiation routine)
use variables for "expensive" intermediate terms which are used more than once
transform your equation to reduce or eliminate transcendental functions
special-case typical parameter values; integer exponents are a frequent candidate
precompute everything that is constant, especially in parameterized functions
For this particular example you can substitute z = cos(theta), so that dz = -sin(theta) dtheta. Your integrand becomes
-exp(E*(z*z + cos(arccos(z)**2)))
saving you some of the transcendental function calls. The boundaries [0, pi] become [1, -1]. Also avoid x**2; better use x*x.
Complete code:
import mpmath as mp
import numpy as np
from time import time as epochTime
def func(E):
    def f(z):
        acz = mp.acos(z)
        return -mp.exp(E * (mp.cos(acz*acz) + z*z))
    return f
start = epochTime()
mp.mp.dps = 15
mp.mp.pretty = True
E = np.linspace(0, 10, 200)
ints = [mp.quadgl(func(e), [1.0, -1.0]) for e in E] # Main Job
print ('Took:{:.3}s'.format(epochTime() - start))
I am a Python beginner, currently using SciPy's odeint to compute a coupled ODE system; however, when I run it, the Python shell always tells me:
>>>
Excess work done on this call (perhaps wrong Dfun type).
Run with full_output = 1 to get quantitative information.
>>>
So I have to change my time step and final time in order to make the system integrable. To do this, I need to try different combinations, which is quite a pain. Could anyone tell me how I can ask odeint to automatically vary the time step and final time to successfully integrate this ODE system?
Here is the part of the code which calls odeint:
def main(t, init_pop_a, init_pop_b, *args, **kwargs):
    """
    solve the obe for a given set of parameters
    """
    # construct initial condition
    # initially, rho_ee = 0
    rho_init = zeros((16,16))*1j ########
    rho_init[1,1] = init_pop_a
    rho_init[2,2] = init_pop_b
    rho_init[0,0] = 1 - (init_pop_a + init_pop_b) ########
    rho_init_ravel, params = to_1d(rho_init)
    # perform the integration
    result = odeint(wrapped_bloch3, rho_init_ravel, t, args=args)
    # BUG: need to pass kwargs
    # rewrap the result
    return from_1d(result, params, prepend=(len(t),))

things = [2*pi, 20*pi, 0,0, 0,0, 0.1,100]
Omega_a, Omega_b, Delta_a, Delta_b, \
    init_pop_a, init_pop_b, tstep, tfinal = things
args = (Delta_a, Delta_b, Omega_a, Omega_b)
t = arange(0, tfinal + tstep, tstep)
data = main(t, init_pop_a, init_pop_b, *args)
plt.plot(t, abs(data[:,4,4]))
where wrapped_bloch3 is the function that computes dy/dt.
EDIT: I note you already got an answer here: complex ODE systems in scipy
odeint does not work with complex-valued equations. I get
from scipy.integrate import odeint
import numpy as np
def func(y, t):
    return 1 + 1j
t = np.linspace(0, 1, 200)
y = odeint(func, 0, t)
# -> This outputs:
#
# TypeError: can't convert complex to float
# odepack.error: Result from function call is not a proper array of floats.
You can solve your equation with the other ODE solver, scipy.integrate.ode:
from scipy.integrate import ode
import numpy as np
def myodeint(func, y0, t):
    y0 = np.array(y0, complex)
    func2 = lambda t, y: func(y, t)  # odeint has these the other way :/
    sol = ode(func2).set_integrator('zvode').set_initial_value(y0, t=t[0])
    y = [sol.integrate(tp) for tp in t[1:]]
    y.insert(0, y0)
    return np.array(y)

def func(y, t, alpha):
    return 1j*alpha*y
alpha = 3.3
t = np.linspace(0, 1, 200)
y = myodeint(lambda y, t: func(y, t, alpha), [1, 0, 0], t)
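As a quick sanity check (my addition, not part of the original answer): the example solves dy/dt = 1j*alpha*y with y(0) = [1, 0, 0], so the first component should follow exp(1j*alpha*t):

# compare the numerical solution with the analytic one, y0 * exp(1j*alpha*t)
expected = np.exp(1j * alpha * t)
print(np.allclose(y[:, 0], expected))  # True up to solver tolerance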