Improve performance of function without parallelization

Improve performance of function without parallelization - python

Some weeks ago I posted a question (Speed up nested for loop with elements exponentiation) which got a very good answer by abarnert. This question is related to that one since it makes use of the performance improvements suggested by said user.
I need to improve the performance of a function that involves calculating three factors and then applying an exponential on them.
Here's a MWE of my code:
import numpy as np
import timeit
def random_data(N):
# Generate some random data.
return np.random.uniform(0., 10., N)
# Data lists.
array1 = np.array([random_data(4) for _ in range(1000)])
array2 = np.array([random_data(3) for _ in range(2000)])
# Function.
def func():
# Empty list that holds all values obtained in for loop.
lst = []
for elem in array1:
# Avoid numeric errors if one of these values is 0.
e_1, e_2 = max(elem[0], 1e-10), max(elem[1], 1e-10)
# Obtain three parameters.
A = 1./(e_1*e_2)
B = -0.5*((elem[2]-array2[:,0])/e_1)**2
C = -0.5*((elem[3]-array2[:,1])/e_2)**2
# Apply exponential.
value = A*np.exp(B+C)
# Store value in list.
lst.append(value)
return lst
# time function.
func_time = timeit.timeit(func, number=100)
print func_time
Is it possible to speed up func without having to recurr to parallelization?

Here's what I have so far. My approach is to do as much of the math as possible across numpy arrays.
Optimizations:
Calculate As within numpy
Re-factor calculation of B and C by splitting them into factors, some of which can be computed within numpy
Code:
def optfunc():
e0 = array1[:, 0]
e1 = array1[:, 1]
e2 = array1[:, 2]
e3 = array1[:, 3]
ar0 = array2[:, 0]
ar1 = array2[:, 1]
As = 1./(e0 * e1)
Bfactors = -0.5 * (1 / e0**2)
Cfactors = -0.5 * (1 / e1**2)
lst = []
for i, elem in enumerate(array1):
B = ((elem[2] - ar0) ** 2) * Bfactors[i]
C = ((elem[3] - ar1) ** 2) * Cfactors[i]
value = As[i]*np.exp(B+C)
lst.append(value)
return lst
print np.allclose(optfunc(), func())
# time function.
func_time = timeit.timeit(func, number=10)
opt_func_time = timeit.timeit(optfunc, number=10)
print "%.3fs --> %.3fs" % (func_time, opt_func_time)
Result:
True
0.759s --> 0.485s
At this point I'm stuck. I managed to do it entirely without python for loops, but it is slower than the above version for a reason I do not yet understand:
def optfunc():
x = array1
y = array2
x0 = x[:, 0]
x1 = x[:, 1]
x2 = x[:, 2]
x3 = x[:, 3]
y0 = y[:, 0]
y1 = y[:, 1]
A = 1./(x0 * x1)
Bfactors = -0.5 * (1 / x0**2)
Cfactors = -0.5 * (1 / x1**2)
B = (np.transpose([x2]) - y0)**2 * np.transpose([Bfactors])
C = (np.transpose([x3]) - y1)**2 * np.transpose([Cfactors])
return np.transpose([A]) * np.exp(B + C)
Result:
True
0.780s --> 0.558s
However note that the latter gets you an np.array whereas the former only gets you a Python list... this might account for the difference but I'm not sure.

Related

Diagonal of a numpy matrix without compute the entire matrix

I have a simple algebric problem and I would like to solve it with numpy (of course that I could solve it easily with numba, but that is not the point).
Let us consider a first random matrix A with size (m x n), with n a big value, and a second random matrix B with size (n x n).
A = np.random.random((1E6, 1E2))
B = np.random.random((1E2, 1E2))
We want to compute the following expression:
np.diag(np.dot(np.dot(A,B),B.T))
The problem is that the entire matrix is loaded to the memory and only then is extracted the diagonal. Is it possible to do this operation in a more efficient way?

This is how I would approach it from your starting expression
np.diag(np.dot(np.dot(A,B),B.T))
You can start by grouping terms:
np.diag(np.dot(A, np.dot(B,B.T)))
then only use the first relevant (square) part of A:
np.diag(np.dot(A[:B.shape[0], :], np.dot(B,B.T)))
and then avoid the extra multiplications (that will fall out of the diagonal), by doing the element-wise multiplications yourself:
np.sum( np.multiply(A[:B.shape[0], :].T, np.dot(B,B.T)), 0)

Changed (A*B)*B.T to A*(B*B.T)
Multiplied only this part of A (A[:B.shape[0]]) that would result in the diagonal part of the matrix
import numpy as np
import time
A = np.random.random((1000_000, 100))
B = np.random.random((100, 100))
start_time = time.time()
result = np.diag(np.dot(np.dot(A, B), B.T))
print('Baseline: ', time.time() - start_time)
start_time = time.time()
for i in range(100):
result2 = np.diag(np.dot(A[:B.shape[0]], np.dot(B, B.T)))
print('Optimized: ', (time.time() - start_time) / 100)
stop = 1
assert np.allclose(result, result2)
Baseline: 1.7957241535186768
Optimized: 0.00016015291213989258

Yes.
N = 1E6
A = np.random.random((N, 1E2))
B = np.random.random((1E2, 1E2))
result = 0;
for i in range(N):
result += np.dot(np.dot(A[i,:], B[i,:])[i, :], B.T[i, :])
# Replacing B.T[i, :] with B[:, i].T might be a little more efficient
Explanation:
Say we have: K = np.dot(np.dot(A,B),B.T).
Then, K[0,0] = (A[0, :] * B[:,0])[0, :] * B.T[:])
Let X = (A[0, :] * B[:,0]), which is the [0, 0] element of np.dot(A,B)
Then X[0, :] * B.T[:, 0] is the [0, 0] element of np.dot(np.dot(A,B),B.T)
Then X[0, :] * B.T[:, 0] = (A[0, :] * B[:,0])[0, :] * B.T[:])
We can also generalize this result to: K[i,i] = (A[i, :] * B[:,i])[i, :] * B.T[:, i])

Simple Gradient Descent Implementation Error

I have tried to use a toy problem of linear regression for implanting the optimisation on the MSE function using the algorithm of gradient decent.
import numpy as np
# Data points
x = np.array([1, 2, 3, 4])
y = np.array([1, 1, 2, 2])
# MSE function
f = lambda a, b: 1 / len(x) * np.sum(np.power(y - (a * x + b), 2))
# Gradient
def grad_f(v_coefficients):
a = v_coefficients[0, 0]
b = v_coefficients[1, 0]
return np.array([1 / len(x) * np.sum(2 * (y - (a * x + b)) * x),
1 / len(x) * np.sum(2 * (y - (a * x + b)))]).reshape(2, 1)
# Gradient Decent with epsilon as tol vector and alpha as the step/learning rate
def gradient_decent(v_prev):
tol = 10 ** -3
epsilon = np.array([tol * np.ones([2, 1], int)])
alpha = 0.2
v_next = v_prev - alpha * grad_f(v_prev)
if (np.abs(v_next - v_prev) <= epsilon).all():
return v_next
else:
gradient_decent(v_next)
# v_0 is the initial guess
v_0 = np.array([[1], [1]])
gradient_decent(v_0)
I have tried different alpha values but the code never converges (infinite recursion) it seems that the issue is with the stop condition of the recursion, but after few runs the v_next and v_prev bounces between -infinte to infinite

It's great that you are learning machine learning (^_^) by implementing some base algorithms by yourself. Regarding your question, there are two problems in your code, first one is mathematical, the sign in:
def grad_f(v_coefficients):
a = v_coefficients[0, 0]
b = v_coefficients[1, 0]
return np.array([1 / len(x) * np.sum(2 * (y - (a * x + b)) * x),
1 / len(x) * np.sum(2 * (y - (a * x + b)))]).reshape(2, 1)
should be
return -np.array(...)
since
the second one is programming, this kind of code will not return you a result in Python:
def add(x):
new_x = x + 1
if new_x > 10:
return new_x
else:
add(new_x)
you must use return in both clauses of the if statement, so it should be
def add(x):
new_x = x + 1
if new_x > 10:
return new_x
else:
return add(new_x)
There is also a minor issue with the alpha coefficient for these particular data points alpha=0.2 is too big for algorithm to converge, you need to use smaller alpha. I also slightly refactor your initial code using numpy broadcasting convention (https://numpy.org/doc/stable/user/basics.broadcasting.html) to get the following result:
import numpy as np
# Data points
x = np.array([1, 2, 3, 4])
y = np.array([1, 1, 2, 2])
# MSE function
f = lambda a, b: np.mean(np.power(y - (a * x + b), 2))
# Gradient
def grad_f(v_coefficients):
a = v_coefficients[0, 0]
b = v_coefficients[1, 0]
return -np.array([np.mean(2 * (y - (a * x + b)) * x),
np.mean(2 * (y - (a * x + b)))]).reshape(2, 1)
# Gradient Decent with epsilon as tol vector and alpha as the step/learning rate
def gradient_decent(v_prev):
tol = 1e-3
# epsilon = np.array([tol * np.ones([2, 1], int)]) do not need this, due to numpy broadcasting rules
alpha = 0.1
v_next = v_prev - alpha * grad_f(v_prev)
if (np.abs(v_next - v_prev) <= alpha).all():
return v_next
else:
return gradient_decent(v_next)
# v_0 is the initial guess
v_0 = np.array([[1], [1]])
gradient_decent(v_0)

How can these 2 loops be vectorized in Python?

I'm retrieving close to 400k values in values, which is pretty slow by itself (that code is not being shown), and then I try to do a prediction of those values through a Kalmann filter, the first loop is taking a little over a minute to run, and the second aroun 2 and half minutes, I think the first can be vectorized, but I'm not sure how, specially the window_sma. The second loop I'm not sure how I could deal with the i increasing the x array (x = np.append(x, new_x_col, axis=1)).
This is the first one, which tries to do a prediction based on the values from SMA, using polyfit and polyval:
window_sma = 200
sma_index = 500
offset = 50
SMA = talib.SMA(values, timeperiod = window_sma)
vector_X = [1, 2, 3, 15]
sma_predicted = []
start_time = time.time()
for i in range (sma_index, len(SMA)):
j = int(i - offset)
k = int(i - offset / 2)
window_sma = [SMA[j], SMA[k], SMA[i]]
polyfit = np.polyfit([1, 2, 3], window_sma, 2)
y_hat = np.polyval(polyfit, vector_X)
sma_predicted.append(y_hat[-1])
And the second one, which attemps to filter the output of the first for loop to have a better prediction of the values I got from SMA:
# Kalman Filter
km = KalmanFilter(dim_x = 2, dim_z = 1)
# state transition matrix
km.F = np.array([[1.,1.],
[0.,1.]])
# Measurement function
km.H = np.array([[1.,0.]])
# Change in time
dt = 0.0001
a = 1.5
# Covariance Matrix
km.Q = np.power(a, 2) * \
np.array([[np.power(dt,4)/4, np.power(dt,3)/2],
[np.power(dt,3)/2, np.power(dt,2)]])
# Variance
km.R = 1000
# Identity Matrix
I = np.array([[1, 0], [0, 1]])
# Measurement Matrix
km.Z = np.array(sma_predicted)
# Initial state
x = np.zeros((2,1))
x = np.array([[sma_predicted[0]], [0]])
# Initial distribution state's covariance matrix
km.P = np.array([[1000, 0], [0, 1000]])
for i in range (0, len(sma_predicted) - 1):
# Prediction
new_x_col = np.dot(km.F, x[:, i]).reshape(2, 1)
x = np.append(x, new_x_col, axis=1)
km.P = km.F * km.P * km.F.T + km.Q
# Correction
K = np.dot(km.P, km.H.T) / (np.dot(np.dot(km.H, km.P), km.H.T) + km.R)
x[:, -1] = x[:, -1] + np.dot(K, (km.Z[i + 1] - np.dot(km.H, x[:, -1])))
#x[:, -1] = (x[:, -1] + K * (km.Z[i + 1] - km.H * x[:, -1])).reshape(2, i + 2)
km.P = (I - K * km.H) * km.P
Thanks!

The second one is worth attacking first, so I'll just do that.
You have this:
x = np.array([[sma_predicted[0]], [0]])
for i in range (0, len(sma_predicted) - 1):
new_x_col = np.dot(km.F, x[:, i]).reshape(2, 1)
x = np.append(x, new_x_col, axis=1)
# ...
Repeatedly appending to the same array is always bad practice in NumPy, so start with something like this:
x = np.zeros((2, len(sma_predicted)))
x[0, 0] = sma_predicted[0]
for i in range(len(sma_predicted) - 1):
x[:, i+1] = np.dot(km.F, x[:, i])
# ...
Note the reshape(2, 1) is not needed, thanks to NumPy broadcasting.
I realize this does not answer all of your implicit questions, but perhaps it gets you started.
It would be nice if dot were a ufunc so we could do something like np.dot.outer(km.F, x.T), but it isn't (see this from 2009), so we can't. You could implement more speedups using Numba (with the append() removed as I showed, your code is a good candidate for Numba).

Is there a faster way of repeating a chunk of code x times and taking an average?

Starting with:
a,b=np.ogrid[0:n+1:1,0:n+1:1]
B=np.exp(1j*(np.pi/3)*np.abs(a-b))
B[z,b] = np.exp(1j * (np.pi/3) * np.abs(z - b +x))
B[a,z] = np.exp(1j * (np.pi/3) * np.abs(a - z +x))
B[diag,diag]=1-1j/np.sqrt(3)
this produces an n*n grid that acts as a matrix.
n is just a number chosen to represent the indices, i.e. an a*b matrix where a and b both go up to n.
Where z is a constant I choose to replace a row and column with the B[z,b] and B[a,z] formulas. (Essentially the same formula but with a small number added to the np.abs(a-b))
The diagonal of the matrix is given by the bottom line:
B[diag,diag]=1-1j/np.sqrt(3)
where,
diag=np.arange(n+1)
I would like to repeat this code 50 times where the only thing that changes is x so I will end up with 50 versions of the B np.ogrid. x is a randomly generated number between -0.8 and 0.8 each time.
x=np.random.uniform(-0.8,0.8)
I want to generate 50 versions of B with random values of x each time and take a geometric average of the 50 versions of B using the definition:
def geo_mean(y):
y = np.asarray(y)
return np.prod(y ** (1.0 / y.shape[0]), axis=-1)
I have tried to set B as a function of some index and then use a for _ in range(): loop, this doesn't work. Aside from copy and pasting the block 50 times and denoting each one as B1, B2, B3 etc; I can't think of another way of working this out.
EDIT:
I'm now using part of a given solution in order to show clearly what I am looking for:
#A matrix with 50 random values between -0.8 and 0.8 to be used in the loop
X=np.random.uniform(-0.8,0.8, (50,1))
#constructing the base array before modification by random x values in position z
a,b = np.ogrid[0:n+1:1,0:n+1:1]
B = np.exp(1j * ( np.pi / 3) * np.abs( a - b ))
B[diag,diag] = 1 - 1j / np.sqrt(3)
#list to store all modified arrays
randomarrays = []
for i in range( 0,50 ):
#copy array and modify it
Bnew = np.copy( B )
Bnew[z, b] = np.exp( 1j * ( np.pi / 3 ) * np.abs(z - b + X[i]))
Bnew[a, z] = np.exp( 1j * ( np.pi / 3 ) * np.abs(a - z + X[i]))
randomarrays.append(Bnew)
Bstack = np.dstack(randomarrays)
#calculate the geometric mean value along the axis that was the row in 2D arrays
B0 = geo_mean(Bstack)
From this example, every iteration of i uses the same value of X, I can't seem to get a way to get each new loop of i to use the next value in the matrix X. I am unsure of the ++ action in python, I know it does not work in python, I just don't know how to use the python equivalent. I want a loop to use a value of X, then the next loop to use the next value and so on and so forth so I can dstack all the matrices at the end and find a geo_mean for each element in the stacked matrices.

One pedestrian way would be to use a list comprehension or generator expression:
>>> def f(n, z, x):
... diag = np.arange(n+1)
... a,b=np.ogrid[0:n+1:1,0:n+1:1]
... B=np.exp(1j*(np.pi/3)*np.abs(a-b))
... B[z,b] = np.exp(1j * (np.pi/3) * np.abs(z - b +x))
... B[a,z] = np.exp(1j * (np.pi/3) * np.abs(a - z +x))
... B[diag,diag]=1-1j/np.sqrt(3)
... return B
...
>>> X = np.random.uniform(-0.8, 0.8, (10,))
>>> np.prod((*map(np.power, map(f, 10*(4,), 10*(2,), X), 10 * (1/10,)),), axis=0)
But in your concrete example we can do much better than that;
using the identity exp(a) x exp(b) = exp(a + b) we can convert the geometric mean after exponentiation to an arithmetic mean before exponentition. A bit of care is required because of the multivaluedness of the complex n-th root which occurs in the geometric mean. In the code below we normalize the angles occurring to range -pi, pi so as to always hit the same branch as the n-th root.
Please also note that the geo_mean function you provide is definitely wrong. It fails the basic sanity check that taking the average of copies of the same thing should return the same thing. I've provided a better version. It is still not perfect, but I think there actually is no perfect solution, because of the nonuniqueness of the complex root.
Because of this I recommend taking the average before exponentiating. As long as your random spread is less than pi this allows a well-defined averaging procedure with an average that is actually close to the samples
import numpy as np
def f(n, z, X, do_it_pps_way=True):
X = np.asanyarray(X)
diag = np.arange(n+1)
a,b=np.ogrid[0:n+1:1,0:n+1:1]
B=np.exp(1j*(np.pi/3)*np.abs(a-b))
X = X.reshape(-1,1,1)
if do_it_pps_way:
zbx = np.mean(np.abs(z-b+X), axis=0)
azx = np.mean(np.abs(a-z+X), axis=0)
else:
zbx = np.mean((np.abs(z-b+X)+3) % 6 - 3, axis=0)
azx = np.mean((np.abs(a-z+X)+3) % 6 - 3, axis=0)
B[z,b] = np.exp(1j * (np.pi/3) * zbx)
B[a,z] = np.exp(1j * (np.pi/3) * azx)
B[diag,diag]=1-1j/np.sqrt(3)
return B
def geo_mean(y):
y = np.asarray(y)
dim = len(y.shape)
y = np.atleast_2d(y)
v = np.prod(y, axis=0) ** (1.0 / y.shape[0])
return v[0] if dim == 1 else v
def geo_mean_correct(y):
y = np.asarray(y)
return np.prod(y ** (1.0 / y.shape[0]), axis=0)
# demo that orig geo_mean is wrong
B = np.exp(1j * np.random.random((5, 5)))
# the mean of four times the same thing should be the same thing:
if not np.allclose(B, geo_mean([B, B, B, B])):
print('geo_mean failed')
if np.allclose(B, geo_mean_correct([B, B, B, B])):
print('but geo_mean_correct works')
n, z, m = 10, 3, 50
X = np.random.uniform(-0.8, 0.8, (m,))
B0 = f(n, z, X, do_it_pps_way=False)
B1 = np.prod((*map(np.power, map(f, m*(n,), m*(z,), X), m * (1/m,)),), axis=0)
B2 = geo_mean_correct([f(n, z, x) for x in X])
# This is the recommended way:
B_recommended = f(n, z, X, do_it_pps_way=True)
print()
print(np.allclose(B1, B0))
print(np.allclose(B2, B1))

I think you should rely more on numpy functionality, when approaching your problem. Not a numpy expert myself, so there is surely room for improvement:
from scipy.stats import gmean
n = 2
z = 1
a = np.arange(n + 1).reshape(1, n + 1)
#constructing the base array before modification by random x values in position z
B = np.exp(1j * (np.pi / 3) * np.abs(a - a.T))
B[a, a] = 1 - 1j / np.sqrt(3)
#list to store all modified arrays
random_arrays = []
for _ in range(50):
#generate random x value
x=np.random.uniform(-0.8, 0.8)
#copy array and modify it
B_new = np.copy(B)
B_new[z, a] = np.exp(1j * (np.pi / 3) * np.abs(z - a + x))
B_new[a, z] = np.exp(1j * (np.pi / 3) * np.abs(a - z + x))
random_arrays.append(B_new)
#store all B arrays as a 3D array
B_stack = np.stack(random_arrays)
#calculate the geometric mean value along the axis that was the row in 2D arrays
geom_mean_for_rows = gmean(B_stack, axis = 2)
It uses the geometric mean function from scipy.stats module to have a vectorised approach for this calculation.

Apply 3-argument function to 3D numpy array

I have a 3D numpy array A of shape (2133, 3, 3). Basically this is a list of 2133 lists with three 3D points. Furthermore I have a function which takes three 3D points and returns one 3D point, x = f(a, b, c), with a, b, c, x numpy arrays of length 3. Now I want to apply f to A, so that the output is an array of shape (2133, 3). So something like numpy.array([f(*A[0]),...,f(*A[2132])).
I tried numpy.apply_along_axis and numpy.vectorize without success.
To be more precise the function f I consider is given by:
def f(a, b, c, r1, r2=None, r3=None):
a = np.asarray(a)
b = np.asarray(b)
c = np.asarray(c)
if np.linalg.matrix_rank(np.matrix([a, b, c])) != 3:
# raise ValueError('The points are not collinear.')
return None
a, b, c, = sort_triple(a, b, c)
if any(r is None for r in (r2, r3)):
r2, r3 = (r1, r1)
ex = (b - a) / (np.linalg.norm(b - a))
i = np.dot(ex, c - a)
ey = (c - a - i*ex) / (np.linalg.norm(c - a - i*ex))
ez = np.cross(ex, ey)
d = np.linalg.norm(b - a)
j = np.dot(ey, c - a)
x = (pow(r1, 2) - pow(r2, 2) + pow(d, 2)) / (2 * d)
y = ((pow(r1, 2) - pow(r3, 2) + pow(i, 2) + pow(j, 2)) / (2*j)) - ((i/j)*x)
z_square = pow(r1, 2) - pow(x, 2) - pow(y, 2)
if z_square >= 0:
z = np.sqrt(z_square)
intersection = a + x * ex + y*ey + z*ez
return intersection
A = np.array([[[131.83, 25.2, 0.52], [131.51, 22.54, 0.52],[133.65, 23.65, 0.52]], [[13.02, 86.98, 0.52], [61.02, 87.12, 0.52],[129.05, 87.32, 0.52]]])
r1 = 1.7115

Thanks to the great help of #jdehesa I was able to produce an alternative solution to the one given by #hpaulj. I am not sure if this solution is the most elegant one but it worked so far. Comments are appreciated.
def sort_triple(a, b, c):
pts = np.stack((a, b, c), axis=1)
xSorted = pts[np.arange(pts.shape[0])[:, None], np.argsort(pts[:, :, 0])]
orientation = np.cross(xSorted[:, 1] - xSorted[:, 0], xSorted[:, 2] -
xSorted[:, 0])[:, 2] >= 0
xSorted_flipped = np.stack((xSorted[:, 0], xSorted[:, 2], xSorted[:, 1]),
axis=1)
xSorted = np.where(orientation[:, np.newaxis, np.newaxis], xSorted,
xSorted_flipped)
return map(np.squeeze, np.split(xSorted, 3, axis=1))
def f(A, r1, r2=None, r3=None):
a, b, c = map(np.squeeze, np.split(A, 3, axis=1))
a, b, c = sort_triple(a, b, c)
if any(r is None for r in (r2, r3)):
r2, r3 = (r1, r1)
ex = (b - a) / (np.linalg.norm(b - a, axis=1))[:, np.newaxis]
i = inner1d(ex, (c - a))
ey = ((c - a - i[:, np.newaxis]*ex) /
(np.linalg.norm(c - a - i[:, np.newaxis]*ex, axis=1))[:, np.newaxis])
ez = np.cross(ex, ey)
d = np.linalg.norm(b - a, axis=1)
j = inner1d(ey, c - a)
x = (np.square(r1) - np.square(r2) + np.square(d)) / (2 * d)
y = ((np.square(r1) - np.square(r3) + np.square(i) + np.square(j)) / (2*j) -
i/j*x)
z_square = np.square(r1) - np.square(x) - np.square(y)
mask = z_square < 0
z_square[mask] *= 0
z = np.sqrt(z_square)
z[mask] = np.nan
intersection = (a + x[:, np.newaxis] * ex + y[:, np.newaxis] * ey +
z[:, np.newaxis] * ez)
return intersection
Probably the map parts in each function could be done better. Maybe also the excessive use of np.newaxis.

This works fine (after commenting out sort_triple):
res = [f(*row,r1) for row in A]
print(res)
producing:
[array([ 132.21182324, 23.80481826, 1.43482849]), None]
That looks like one row produced a (3,) array, the other had some sort of problem and produced None. I don't know if that None was due to removing the sort or not. But in any case, turning a mix of arrays and None back into an array would be a problem. If all items of res were matching arrays, we could stack them back into a 2d array.
There are ways of getting modest speed improvements (compared to this list comprehension). But with a complex function like this, the time spent in the function (called 2000 times) dominates the time spent by the iteration mechanism.
And since you are iterating on the 1st dimension, and passing the other 2 (as 3 arrays), this explicit loop is a lot easier to use than vectorize, frompyfunc or apply_along/over...
To get significant time savings you have to write f() to work with the 3d array directly.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Improve performance of function without parallelization - python

Related

Diagonal of a numpy matrix without compute the entire matrix

Simple Gradient Descent Implementation Error

How can these 2 loops be vectorized in Python?

Is there a faster way of repeating a chunk of code x times and taking an average?

Apply 3-argument function to 3D numpy array

Categories

Resources