Matrix multiplication in numpy vs normal for loop in python - python

I thought I'd check the time difference between a matrix multiplication done with NumPy and the same calculation done with a plain Python for loop. I understand NumPy should be faster because of vectorization, but I couldn't demonstrate it with simple code like the example below.
The plain Python for loop comes out faster than NumPy in all of my tests. What did I do wrong here?
My code:
# testing numpy vs normal method
import math
import random
import time
import numpy as np

def compare_np_vs_normal(datapoints):
    x = []
    for i in range(datapoints):
        x.append(math.ceil(random.random() * 10))  # random data
    # linear activation function
    m = math.ceil(random.random() * 10)  # random value for slope
    c = math.ceil(random.random() * 10)  # random value for intercept
    # linear activation result for all datapoints using the normal method and the np method
    def normal_method():
        y = []
        for x_ in x:
            y_ = x_ * m + c
            y.append(y_)
        return y
    def np_method():
        x_ = np.c_[np.array(x), np.ones(len(x))]
        a_ = np.array([[m], [c]])
        return np.matmul(x_, a_)
    print("Comparing for {} datapoints".format(datapoints))
    print("Normal method:")
    t1 = time.perf_counter()
    y_result_normal = normal_method()
    t2 = time.perf_counter()
    print("Time taken {}".format(t2 - t1))
    print("Numpy method:")
    t1 = time.perf_counter()
    y_result_np = np_method()
    t2 = time.perf_counter()
    print("Time taken {}".format(t2 - t1))
    return y_result_normal, y_result_np
The result I got was:
Comparing for 1000 datapoints
Normal method:
Time taken 7.759999971312936e-05
Numpy method:
Time taken 0.0007053999997879146

You are doing an overly complicated calculation in the np_method function. Replace it with:
def np_method():
    x_ = np.array(x)
    return x_ * m + c
to see the improvement.
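Part of the gap also comes from where the conversion and setup cost is counted: np_method builds a new array, a column of ones, and a 2x1 matrix inside the timed region, while the loop works on data that is already a Python list. A minimal sketch of a more even comparison (my addition, assuming the data is pre-converted to a NumPy array and using timeit for more stable numbers):
import timeit
import numpy as np

n = 1_000_000
x_list = [float(v) for v in np.random.randint(1, 11, n)]  # plain Python list
x_arr = np.asarray(x_list)                                # pre-converted NumPy array
m, c = 3.0, 7.0

loop_t = timeit.timeit(lambda: [xi * m + c for xi in x_list], number=10)
np_t = timeit.timeit(lambda: x_arr * m + c, number=10)
print(f"loop: {loop_t:.3f}s  numpy: {np_t:.3f}s")
With only 1000 points the fixed overhead of creating the intermediate arrays dominates the measurement; at larger sizes the vectorized version should win comfortably.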

Related

Converting a matlab ODE solver to python

I am attempting to convert MATLAB code to Python, but I am getting answers that are completely different. I've tried scipy.ode, solve_ivp, and odeint. When running the code I get values that range from 1 to 0.2, but in MATLAB they range from 30 to 70.
MATLAB code:
function dydt = onegroup(t,y,tsi,rho)
%Point Reactor Kinetics equation parameters
Lambda = 10^-4;
beta = 0.0065;
lambda = 0.08;
%Reactivity
rho = interp1(tsi,rho,t);
dydt = zeros(2,1);
%One Group-Delayed Precursor
dydt(1) = -lambda*y(1)+beta*y(2);
%Power
dydt(2) = ((rho-beta)/Lambda)*y(2)+(lambda*y(1))/Lambda;
end
The input file is as follows:
%%
clear;
clc;
tsi=linspace(0,20,21);
rho=ones(1,21)*0.0025;
y0= [1; 0];
ts=[0 20];
ode23(@(t,y) onegroup(t,y,tsi,rho),ts,y0)
My python code is as follows:
from scipy.integrate import ode
from numpy import arange,vstack,array,sqrt,ones
from pylab import plot,close,show,xlabel,ylabel,title,grid,pi,clf
from scipy.interpolate import interp1d
# Function defining the derivatives of the state variables.
def dydt(t,y,tsi,rho):
    Lambda = 10^-4
    beta = 0.0065
    lambda2 = 0.08
    rho = interp1d(tsi,rho, fill_value = 'extrapolate')
    #one group delayed precursor
    dydt1 = (-lambda2*y[0] + beta*y[1])
    #power
    dydt2 = (((rho(t)-beta)/Lambda)*y[1]+(lambda2*y[0])/Lambda)
    return array([dydt1, dydt2])
'''
#Final time
x = 21
#Time steps
dt = 1
tsi=np.arange(0,x,dt)
j0 = 0
times = np.arange(0,x,dt)
dt = 1
rho=np.ones(x)*0.0025
y0= [1,0]
t0 = [0,x-1]
'''
# Initial Conditions
y0, t0 = [1.,0.], 0.
# Model parameters
k = arange(0,21)
m = ones(21)*0.0025
# CREATE ODE OBJECT
i = ode(dydt) # Create an ode object and bind the rhs function.
i.set_integrator('dopri5') # Which integrator to use.
i.set_initial_value(y0,t0) # The initial values
# Define parameters for the derivatives function.
# These will be passed to the function at each time.
i.set_f_params(k,m)
tf = 21 # Final time
dt = 1 # Output interval
yf=[y0] # List for storing the output
times = arange(t0,tf,dt) # Times to evaluate a solution.
# Main loop for the integration
for t in times[1:]:
    i.integrate(i.t+dt)
    yf.append(i.y)
yf = array(yf)
^ in Python is bitwise exclusive or (XOR), not exponentiation.
Use
Lambda = 1e-4
for that parameter.
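Beyond that fix, here is a minimal sketch of the same model using solve_ivp (my own translation, not the poster's script; the interpolator is built once outside the right-hand side instead of being recreated on every call):
import numpy as np
from scipy.integrate import solve_ivp
from scipy.interpolate import interp1d

# Point reactor kinetics parameters
Lambda = 1e-4      # note: 1e-4, not 10^-4
beta = 0.0065
lam = 0.08

# Reactivity, interpolated from the tabulated values
tsi = np.linspace(0, 20, 21)
rho = interp1d(tsi, np.full(21, 0.0025), fill_value='extrapolate')

def onegroup(t, y):
    dy1 = -lam * y[0] + beta * y[1]                                  # delayed precursor
    dy2 = ((rho(t) - beta) / Lambda) * y[1] + lam * y[0] / Lambda    # power
    return [dy1, dy2]

sol = solve_ivp(onegroup, (0, 20), [1.0, 0.0], rtol=1e-6)
print(sol.y[1, -1])   # power at t = 20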

Using Scipy routines with Numba

I was writing a program in which SciPy's CubicSpline routine is used at certain points. Because of this SciPy routine I cannot use Numba's @jit on my whole program.
I recently came across the @overload feature and was wondering if it could be used in this way:
from numba.extending import overload
from numba import jit
from scipy.interpolate import CubicSpline
import numpy as np

x = np.arange(10)
y = np.sin(x)
xs = np.arange(-0.5, 9.6, 0.1)

def Spline_interp(xs, x, y):
    cs = CubicSpline(x, y)
    ds = cs(xs)
    return ds

@overload(Spline_interp)
def jit_Spline_interp(xs, x, y):
    ds = Spline_interp(xs, x, y)
    def jit_Spline_interp_impl(xs, x, y):
        return ds
    return jit_Spline_interp_impl

@jit(nopython=True)
def main():
    # other code compatible with @njit
    ds = Spline_interp(xs, x, y)
    # other code compatible with @njit
    return ds

print(main())
Kindly correct me if my understanding of the @overload feature is wrong, and what would be a possible solution for using such SciPy libraries with Numba?
Wrapping compiled functions using ctypes (Numba)
Especially for more complex functions, reimplementing everything in Numba-compilable Python code can be quite a lot of work, and sometimes slower. The following answer is about calling C-like functions directly from a shared object or dynamic library.
Compiling the Fortran routines
This example shows a way to do this on Windows, but it should be straightforward on other operating systems. For a portable interface, defining an ISO_C_BINDING is highly recommended. In this answer I will try it without an interface.
dll.def
EXPORTS
SPLEV @1
Compilation
ifort /dll dll.def splev.f fpbspl.f /O3 /fast
Calling this function directly from Numba
Have a look at what the Fortran routine expects, and check every input in the wrapper (datatype, contiguity). You are only passing raw pointers to the Fortran function; there is no additional safety check.
Wrapper
The following code shows two ways to call this function. In Numba it isn't directly possible to pass a scalar by reference: you can either allocate an array on the heap (slow for small functions) or use an intrinsic to allocate a stack array.
import numba as nb
import numpy as np
import ctypes

lib = ctypes.cdll.LoadLibrary("splev.dll")

dble_p = ctypes.POINTER(ctypes.c_double)
int_p  = ctypes.POINTER(ctypes.c_longlong)

SPLEV = lib.SPLEV
SPLEV.restype = ctypes.c_void_p
SPLEV.argtypes = (dble_p, int_p, dble_p, int_p, dble_p, dble_p, int_p, int_p, int_p)

from numba import types
from numba.extending import intrinsic
from numba.core import cgutils

@intrinsic
def val_to_ptr(typingctx, data):
    def impl(context, builder, signature, args):
        ptr = cgutils.alloca_once_value(builder, args[0])
        return ptr
    sig = types.CPointer(nb.typeof(data).instance_type)(nb.typeof(data).instance_type)
    return sig, impl

@intrinsic
def ptr_to_val(typingctx, data):
    def impl(context, builder, signature, args):
        val = builder.load(args[0])
        return val
    sig = data.dtype(types.CPointer(data.dtype))
    return sig, impl

# with intrinsics, temporary arrays are allocated on the stack
# (faster, but mostly relevant for functions with very low runtime)
@nb.njit()
def splev_wrapped(x, coeff, e):
    # Only pointers are passed to the Fortran function.
    # The arrays have to be contiguous!
    t = np.ascontiguousarray(coeff[0])
    x = np.ascontiguousarray(x)
    c = coeff[1]
    k = coeff[2]
    y = np.empty(x.shape[0], dtype=np.float64)
    n_arr = val_to_ptr(nb.int64(t.shape[0]))
    k_arr = val_to_ptr(nb.int64(k))
    m_arr = val_to_ptr(nb.int64(x.shape[0]))
    e_arr = val_to_ptr(nb.int64(e))
    ier_arr = val_to_ptr(nb.int64(0))
    SPLEV(t.ctypes, n_arr, c.ctypes, k_arr, x.ctypes,
          y.ctypes, m_arr, e_arr, ier_arr)
    return y, ptr_to_val(ier_arr)

# without using intrinsics
@nb.njit()
def splev_wrapped_2(x, coeff, e):
    # Only pointers are passed to the Fortran function.
    # The arrays have to be contiguous!
    t = np.ascontiguousarray(coeff[0])
    x = np.ascontiguousarray(x)
    c = coeff[1]
    k = coeff[2]
    y = np.empty(x.shape[0], dtype=np.float64)
    n_arr = np.empty(1, dtype=np.int64)
    k_arr = np.empty(1, dtype=np.int64)
    m_arr = np.empty(1, dtype=np.int64)
    e_arr = np.empty(1, dtype=np.int64)
    ier_arr = np.zeros(1, dtype=np.int64)
    n_arr[0] = t.shape[0]
    k_arr[0] = k
    m_arr[0] = x.shape[0]
    e_arr[0] = e
    SPLEV(t.ctypes, n_arr.ctypes, c.ctypes, k_arr.ctypes, x.ctypes,
          y.ctypes, m_arr.ctypes, e_arr.ctypes, ier_arr.ctypes)
    return y, ier_arr[0]
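A hypothetical usage sketch (my addition, not part of the original answer), assuming the DLL compiled above and using SciPy's splrep to produce the (t, c, k) tuple that the wrapper unpacks; treating e=0 as "extrapolate outside the support" is an assumption about FITPACK's SPLEV here:
import numpy as np
from scipy.interpolate import splrep

x = np.linspace(0.0, 10.0, 50)
y = np.sin(x)
tck = splrep(x, y, k=3)                 # (t, c, k), the layout splev_wrapped expects

xs = np.linspace(0.0, 10.0, 500)
ys, ier = splev_wrapped(xs, tck, 0)     # e=0: assumed extrapolation outside the support
assert ier == 0                         # nonzero ier signals an input error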
You would either need to fall back to object mode (locally, like @max9111 suggested), or implement the CubicSpline function yourself in Numba.
As far as I understand, the @overload decorator "only" makes the compiler aware that it can use a Numba-compatible implementation when it encounters the overloaded function. It doesn't magically convert the function to be Numba compatible.
There is a package which exposes some SciPy functionality to Numba, but it seems to be in its early days and only contains some scipy.special functions so far.
https://github.com/numba/numba-scipy
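For completeness, a minimal sketch of the local object-mode fallback mentioned above (the spline evaluation itself runs at normal Python speed, only the surrounding code stays in nopython mode; the dtype/shape annotation in objmode is an assumption about CubicSpline's output here):
import numpy as np
import numba
from scipy.interpolate import CubicSpline

x = np.arange(10.0)
y = np.sin(x)
xs = np.arange(-0.5, 9.6, 0.1)

@numba.njit
def main(xs, x, y):
    # ... other @njit-compatible code ...
    with numba.objmode(ds='float64[:]'):
        ds = CubicSpline(x, y)(xs)    # runs in object mode, no speedup here
    # ... other @njit-compatible code ...
    return ds

print(main(xs, x, y))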
This is a repost of my solution, posted on the Numba discourse: https://numba.discourse.group/t/call-scipy-splev-routine-in-numba-jitted-function/1122/7.
I had originally gone ahead with @max9111's suggestion of using objmode. It gave a temporary fix, but since the code was performance critical, I finally ended up writing a Numba version of SciPy's interpolate.splev routine for the spline interpolation.
import numpy as np
import numba
from scipy import interpolate
import matplotlib.pyplot as plt
import time
# Custom wrap of scipy's splrep
def custom_splrep(x, y, k=3):
    """
    Custom wrap of scipy's splrep for calculating spline coefficients,
    which also checks if the data is equispaced.
    """
    # Check if x is equispaced
    x_diff = np.diff(x)
    equi_spaced = all(np.round(x_diff,5) == np.round(x_diff[0],5))
    dx = x_diff[0]
    # Calculate knots & coefficients (cubic spline by default)
    t,c,k = interpolate.splrep(x,y, k=k)
    return (t,c,k,equi_spaced,dx)
# Numba accelerated implementation of scipy's splev
@numba.njit(cache=True)
def numba_splev(x, coeff):
    """
    Custom implementation of scipy's splev for spline interpolation,
    with additional section for faster search of knot interval, if knots are equispaced.
    Spline is extrapolated from the end spans for points not in the support.
    """
    t,c,k, equi_spaced, dx = coeff
    t0 = t[0]
    n = t.size
    m = x.size
    k1 = k+1
    k2 = k1+1
    nk1 = n - k1
    l = k1
    l1 = l+1
    y = np.zeros(m)
    h = np.zeros(20)
    hh = np.zeros(19)
    for i in range(m):
        # fetch a new x-value arg
        arg = x[i]
        # search for knot interval t[l] <= arg <= t[l+1]
        if(equi_spaced):
            l = int((arg-t0)/dx) + k
            l = min(max(l, k1), nk1)
        else:
            while not ((arg >= t[l-1]) or (l1 == k2)):
                l1 = l
                l = l-1
            while not ((arg < t[l1-1]) or (l == nk1)):
                l = l1
                l1 = l+1
        # evaluate the non-zero b-splines at arg.
        h[:] = 0.0
        hh[:] = 0.0
        h[0] = 1.0
        for j in range(k):
            for ll in range(j+1):
                hh[ll] = h[ll]
            h[0] = 0.0
            for ll in range(j+1):
                li = l + ll
                lj = li - j - 1
                if(t[li] != t[lj]):
                    f = hh[ll]/(t[li]-t[lj])
                    h[ll] += f*(t[li]-arg)
                    h[ll+1] = f*(arg-t[lj])
                else:
                    h[ll+1] = 0.0
                    break
        sp = 0.0
        ll = l - 1 - k1
        for j in range(k1):
            ll += 1
            sp += c[ll]*h[j]
        y[i] = sp
    return y
######################### Testing and comparison #############################
# Generate a data set for interpolation
x, dx = np.linspace(10,100,200, retstep=True)
y = np.sin(x)
# Calculate the cubic spline coefficients
coeff_1 = interpolate.splrep(x,y, k=3) # scipy's splrep
coeff_2 = custom_splrep(x,y, k=3) # Custom wrap of scipy's splrep
# Generate data for interpolation and randomize
x2 = np.linspace(0,110,10000)
np.random.shuffle(x2)
# Interpolate
y2 = interpolate.splev(x2, coeff_1) # scipy's splev
y3 = numba_splev(x2, coeff_2) # Numba accelerated implementation of scipy's splev
# Plot data
plt.plot(x,y,'--', linewidth=1.0,color='green', label='data')
plt.plot(x2,y2,'o',color='blue', markersize=2.0, label='scipy splev')
plt.plot(x2,y3,'.',color='red', markersize=1.0, label='numba splev')
plt.legend()
plt.show()
print("\nTime for random interpolations")
# Calculation time evaluation for scipy splev
t1 = time.time()
for n in range(0,10000):
    y2 = interpolate.splev(x2, coeff_1)
print("scipy splev", time.time() - t1)
# Calculation time evaluation for numba splev
t1 = time.time()
for n in range(0,10000):
    y2 = numba_splev(x2, coeff_2)
print("numba splev",time.time() - t1)
print("\nTime for non random interpolations")
# Generate data for interpolation without randomize
x2 = np.linspace(0,110,10000)
# Calculation time evaluation for scipy splev
t1 = time.time()
for n in range(0,10000):
    y2 = interpolate.splev(x2, coeff_1)
print("scipy splev", time.time() - t1)
# Calculation time evaluation for numba splev
t1 = time.time()
for n in range(0,10000):
    y2 = numba_splev(x2, coeff_2)
print("numba splev",time.time() - t1)
The above code is optimised for a faster knot search if the knots are equispaced.
On my Core i7 machine, if the interpolation is done at random values, the Numba version is faster:
SciPy's splev = 0.896 s
Numba splev = 0.375 s
If the interpolation is not done at random values, SciPy's version is faster:
SciPy's splev = 0.281 s
Numba splev = 0.375 s
Ref : https://github.com/scipy/scipy/tree/v1.7.1/scipy/interpolate/fitpack ,
https://github.com/dbstein/fast_splines

What is the efficient way to perform 2D matrix filtering in python?

I am trying to implement matrix filtering in Python, and so far the implementation appears to be very slow and inefficient. I wonder if there is an efficient way of performing such filtering.
Given a large matrix A and a filtering matrix M, the function should return a "remixed" matrix R, obtained by multiplying each element (i,j) of A by M and superposing/adding the result into R centred at position (i,j). Please find below the code that is expected to do this.
The example below takes about 68 seconds (!) on my computer, which seems very inefficient.
I would be very grateful if you could recommend the way to speed-up this function. Many thanks in advance!
import numpy as np
import time

nx = ny = 1500
n_mix = 50
# matrix to be filtered
A = np.random.random_sample((nx, ny))
# filter to be applied to each point:
M = np.random.random_sample((2*n_mix+1, 2*n_mix+1))
# the result is stored in "remix":
remix = np.zeros_like(A)
start = time.time()
for i in range(n_mix, nx-n_mix):
    for j in range(n_mix, ny-n_mix):
        remix[i - n_mix:i + n_mix + 1, j - n_mix:j + n_mix + 1] += M * A[i, j]
print(remix)
duration = time.time() - start
print(round(duration))
UPDATE
In fact the ndimage package in SciPy has a general convolution function that does the job. I post below the three variants of the filtering, with the respective times. The fastest is ndimage.convolve (24 seconds vs. 56 and 68 for the other methods). However, it still seems rather slow...
import numpy as np
from scipy import ndimage
import time
import sys

def remix_function(A, M):
    n = (np.shape(M)[0]-1)//2
    R = np.zeros_like(A)
    for k in range(-n, n+1):
        for l in range(-n, n+1):
            # Ak = np.roll(A, -k, axis = 0)
            # Akl = np.roll(Ak, -l, axis = 1)
            R += np.roll(A, (-k, -l), axis=(0, 1)) * M[n-k, n-l]
    return R

if __name__ == '__main__':
    np.set_printoptions(precision=2)
    nx = ny = 1500
    n_mix = 50
    nb = 2*n_mix+1
    # matrix to be filtered
    A = np.random.random_sample((nx, ny))
    # filter to be applied to each point:
    M = np.random.random_sample((nb, nb))
    # the result is stored in "remix":
    remix1 = np.zeros_like(A)
    remix2 = np.zeros_like(A)
    remix3 = np.zeros_like(A)
    #--------------------------------------------------------------------------
    # var 1
    #--------------------------------------------------------------------------
    start = time.time()
    remix1 = remix_function(A, M)
    duration = time.time() - start
    print('time for var1 =', round(duration))
    #--------------------------------------------------------------------------
    # var 2
    #--------------------------------------------------------------------------
    start = time.time()
    for i in range(n_mix, nx-n_mix):
        for j in range(n_mix, ny-n_mix):
            remix2[i - n_mix:i + n_mix + 1, j - n_mix:j + n_mix + 1] += M * A[i, j]
    duration = time.time() - start
    print('time for var2 =', round(duration))
    #--------------------------------------------------------------------------
    # var 3
    #--------------------------------------------------------------------------
    start = time.time()
    remix3 = ndimage.convolve(A, M)
    duration = time.time() - start
    print('time for var3 (convolution) =', round(duration))
I can't comment on posts yet, but your double for loop is the problem. Have you tried defining a function and then using np.vectorize?
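With a kernel this large (101 x 101), an FFT-based convolution is usually much faster than both the explicit loops and the direct ndimage.convolve. A sketch (my addition; depending on whether you want correlation or convolution semantics you may need to flip M, and the boundary handling differs from the loop version, which only fills the interior):
import numpy as np
from scipy import signal

nx = ny = 1500
n_mix = 50
A = np.random.random_sample((nx, ny))
M = np.random.random_sample((2*n_mix + 1, 2*n_mix + 1))

# FFT-based convolution; 'same' keeps the output the same shape as A
remix_fft = signal.fftconvolve(A, M, mode='same')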

How to use scipy.optimize minimize_scalar when objective function has multiple arguments?

I have a function of multiple arguments. I want to optimize it with respect to a single variable while holding the others constant. For that I want to use minimize_scalar from scipy.optimize. I read the documentation, but I am still confused about how to tell minimize_scalar that I want to minimize with respect to the variable w1. Below is a minimal working example.
import numpy as np
from scipy.optimize import minimize_scalar
def error(w0,w1,x,y_actual):
    y_pred = w0+w1*x
    mse = ((y_actual-y_pred)**2).mean()
    return mse
w0=50
x = np.array([1,2,3])
y = np.array([52,54,56])
minimize_scalar(error,args=(w0,x,y),bounds=(-5,5))
You can use a lambda function
minimize_scalar(lambda w1: error(w0,w1,x,y),bounds=(-5,5))
You can also use a partial function.
from functools import partial
error_partial = partial(error, w0=w0, x=x, y_actual=y)
minimize_scalar(error_partial, bounds=(-5, 5))
In case you are wondering about the performance ... it is the same as with lambdas.
import time
from functools import partial
import numpy as np
from scipy.optimize import minimize_scalar
def error(w1, w0, x, y_actual):
    y_pred = w0 + w1 * x
    mse = ((y_actual - y_pred) ** 2).mean()
    return mse
w0 = 50
x = np.arange(int(1e5))
y = np.arange(int(1e5)) + 52
error_partial = partial(error, w0=w0, x=x, y_actual=y)
p_time = []
for _ in range(100):
    p_time_ = time.time()
    p = minimize_scalar(error_partial, bounds=(-5, 5))
    p_time_ = time.time() - p_time_
    p_time.append(p_time_ / p.nfev)
l_time = []
for _ in range(100):
    l_time_ = time.time()
    l = minimize_scalar(lambda w1: error(w1, w0, x, y), bounds=(-5, 5))
    l_time_ = time.time() - l_time_
    l_time.append(l_time_ / l.nfev)
print(f'Same performance? {np.median(p_time) == np.median(l_time)}')
# Same performance? True
Watch the argument order: whether the lambda minimizes with respect to w1 depends on the signature of error. With the question's original signature error(w0, w1, x, y_actual), the accepted lambda w1: error(w0, w1, x, y) is correct; with the reordered signature error(w1, w0, x, y_actual) used in the timing example above it has to be:
minimize_scalar(lambda w1: error(w1,w0,x,y),bounds=(-5,5))
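Another option, shown here as a sketch based on that reordered signature (the optimization variable first), is to skip the lambda and pass the fixed values through args, requesting the bounded method so the bounds are honoured:
import numpy as np
from scipy.optimize import minimize_scalar

def error(w1, w0, x, y_actual):          # optimization variable comes first
    y_pred = w0 + w1 * x
    return ((y_actual - y_pred) ** 2).mean()

w0 = 50
x = np.array([1, 2, 3])
y = np.array([52, 54, 56])

res = minimize_scalar(error, args=(w0, x, y), bounds=(-5, 5), method='bounded')
print(res.x)
This is essentially what the question attempted; it only works once the argument order matches what minimize_scalar passes (the scalar first, then the contents of args).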

How to do IIR filter when most of the coefficients are zero

I want to do some audio effects in Python. For example, the simplest echo effect formula:
y[n] = x[n] + k*y[n-1000]
This is an IIR filter, and it can be computed with lfilter() from scipy.signal:
import numpy as np
import time
import scipy.signal as signal

pulse = np.zeros(10000)
pulse[0] = 1.0

a = np.zeros(1000)
a[[0, 999]] = 1, -0.7

start = time.perf_counter()
out = signal.lfilter([1], a, pulse)
print(time.perf_counter() - start)

import pylab as pl
pl.plot(out)
pl.show()
The problem is that most of the coefficients of a are zero, so the filter could be computed very quickly, but lfilter() can't take advantage of this and processes all the zero coefficients as well.
I know I could hand-code the calculation for this simple example, but I am looking for a general solution.
Try this:
import numpy as np
import scipy.signal as sig
import time

# Input signal.
x = np.random.randn(50000)

# Filter coefficients.
a = np.zeros(1001)
a[[0, -1]] = [1, -0.7]

# Method using lfilter.
start = time.perf_counter()
y0 = sig.lfilter([1], a, x)
print(time.perf_counter() - start)

# Method using a for loop.
start = time.perf_counter()
y1 = x.copy()  # copy so the input signal is not modified in place
for i in range(1000, y1.size):
    y1[i] += 0.7 * y1[i - 1000]
print(time.perf_counter() - start)

# Check that both outputs are equal.
print(np.square(y0 - y1).sum())
On my laptop: 0.38 seconds for method 1, 0.13 seconds for method 2.
Note: For delay of N samples, you must set a[N], not a[N-1].
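If you need a general solution for feedback filters with only a few nonzero taps, one option (my sketch, not from the original answer) is to write the recursion over just the nonzero delays and JIT-compile it, e.g. with Numba as used in the questions above:
import numpy as np
import numba

@numba.njit(cache=True)
def sparse_iir(x, delays, gains):
    """y[n] = x[n] + sum_k gains[k] * y[n - delays[k]], skipping all zero taps."""
    y = x.copy()
    for n in range(y.size):
        for k in range(delays.size):
            d = delays[k]
            if n >= d:
                y[n] += gains[k] * y[n - d]
    return y

x = np.random.randn(50000)
y = sparse_iir(x, np.array([1000]), np.array([0.7]))   # the echo example above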
