This seems more of a direct question; I will generalize it a bit at the end.
I am trying to implement this function in numpy. I have been successful using nested for loops, but I can't think of a numpy way to do it.
My implementation:
import numpy as np
from scipy.special import softmax

bs = 10  # batch size
nb = 8   # number of bounding boxes
nc = 15  # number of classes

bbox = np.random.random(size=(bs, nb, 4))  # model output bounding boxes
p = np.random.random(size=(bs, nb, nc))    # model output probability
p = softmax(p, axis=-1)
s_rand = np.random.random(size=(nc, nc))
s = (s_rand + s_rand.T) / 2                # similarity matrix
pp = np.random.random(size=(bs, nb, nc))   # proposed probability
pp = softmax(pp, axis=-1)

first_term = 0
for b in range(nb):
    for b_1 in range(nb):
        if b_1 == b:
            continue
        for l in range(nc):
            for l_1 in range(nc):
                first_term += s[l, l_1] * (pp[:, b, l] - pp[:, b_1, l_1])**2

second_term = 0
for b in range(nb):
    for l in range(nc):
        second_term += np.linalg.norm(s[l, :], ord=1) * (pp[:, b, l] - p[:, b, l])**2
second_term *= nb

epsilon = 0.5
output = ((1 - epsilon) * first_term) + (epsilon * second_term)
I have tried hard to remove the loops and use np.tile and np.repeat instead, but can't think of a possible way. I have also tried searching for exercises that teach this kind of loop-to-numpy conversion, but wasn't successful.
Here P_hat.shape is (B, L), S.shape is (L, L), and P.shape is (B, L) (B boxes, L classes; no batch axis).
array_before_sum = S[None, :, None, :] * (P_hat[:, :, None, None] - P_hat[None, None, :, :])**2
array_after_sum = array_before_sum.sum(axis=(1, 3))
array_sum_again = (array_after_sum * (1 - np.eye(B))).sum()  # 1 - np.eye(B) masks out the b == b_1 terms
first_term = (1 - epsilon) * array_sum_again
second_term = epsilon * (B * np.abs(S).sum(axis=1)[None, :] * (P_hat - P)**2).sum()
I think you can do both with einsum. (Note that these contractions still include the b_1 == b terms, which the loop in the question skips; those would have to be subtracted separately.)
first_term = np.einsum('km, ijklm -> i', s, (pp[..., None, None] - pp[:, None, None, ...])**2 )
second_term = nb * np.einsum('k, ijk -> i', np.linalg.norm(s, ord=1, axis=1), (pp - p)**2)  # ord=1 and the nb factor match the question's second_term
Now there's a problem: that ijklm tensor in first_term is going to get huge if nb and nc get large. You should probably distribute it so that you get 3 smaller tensors:
first_term = nb * np.einsum('km, ijk, ijk -> i', s, pp, pp) +\
             nb * np.einsum('km, ilm, ilm -> i', s, pp, pp) -\
             2 * np.einsum('km, ijk, ilm -> i', s, pp, pp)
# the nb factors restore the count of the box index that each squared term no longer sums over
This takes advantage of the fact that (a-b)**2 = a**2 + b**2 - 2ab, which breaks the problem into three parts that can each be done in a single contraction.
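A quick self-check on small sizes (my own sketch, not from the answer) that the distributed form matches the big-tensor contraction:
import numpy as np
from scipy.special import softmax

rng = np.random.default_rng(0)
bs, nb, nc = 2, 3, 4
pp = softmax(rng.random((bs, nb, nc)), axis=-1)
s_rand = rng.random((nc, nc))
s = (s_rand + s_rand.T) / 2

big = np.einsum('km, ijklm -> i', s, (pp[..., None, None] - pp[:, None, None, ...])**2)
small = (nb * np.einsum('km, ijk, ijk -> i', s, pp, pp)
         + nb * np.einsum('km, ilm, ilm -> i', s, pp, pp)
         - 2 * np.einsum('km, ijk, ilm -> i', s, pp, pp))
print(np.allclose(big, small))  # True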
Maximally optimized code (the removal of the first two loops is inspired by L.Iridium's answer):
squared_diff = (pp[:, :, None, :, None] - pp[:, None, :, None, :]) ** 2
weighted_diff = s * squared_diff
b_eq_b_1_removed = weighted_diff.sum(axis=(3, 4)) * (1 - np.eye(nb))  # weighted_diff, not b; the mask removes the b == b_1 terms
first_term = b_eq_b_1_removed.sum(axis=(1,2))
normalized_s = np.linalg.norm(s, ord=1, axis=1)
squared_diff = (pp - p)**2
second_term = nb * (normalized_s * squared_diff).sum(axis=(1,2))
loss = ((1 - epsilon) * first_term) + (epsilon * second_term)
Timeit result:
512 µs ± 13 µs per loop
Timeit result for the code posted in the question:
62.5 ms ± 197 µs per loop
That's a huge improvement.
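A small self-contained check (my own sketch, small sizes) that the vectorized loss matches the nested-loop version from the question:
import numpy as np
from scipy.special import softmax

rng = np.random.default_rng(1)
bs, nb, nc, epsilon = 2, 4, 5, 0.5
p = softmax(rng.random((bs, nb, nc)), axis=-1)
pp = softmax(rng.random((bs, nb, nc)), axis=-1)
s_rand = rng.random((nc, nc))
s = (s_rand + s_rand.T) / 2

# nested-loop version from the question
first = 0
for b in range(nb):
    for b_1 in range(nb):
        if b_1 == b:
            continue
        for l in range(nc):
            for l_1 in range(nc):
                first += s[l, l_1] * (pp[:, b, l] - pp[:, b_1, l_1])**2
second = 0
for b in range(nb):
    for l in range(nc):
        second += np.linalg.norm(s[l, :], ord=1) * (pp[:, b, l] - p[:, b, l])**2
loop_loss = (1 - epsilon) * first + epsilon * nb * second

# vectorized version from this answer
squared_diff = (pp[:, :, None, :, None] - pp[:, None, :, None, :]) ** 2
weighted_diff = s * squared_diff
first_v = (weighted_diff.sum(axis=(3, 4)) * (1 - np.eye(nb))).sum(axis=(1, 2))
second_v = nb * (np.linalg.norm(s, ord=1, axis=1) * (pp - p)**2).sum(axis=(1, 2))
vec_loss = (1 - epsilon) * first_v + epsilon * second_v
print(np.allclose(loop_loss, vec_loss))  # True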
I'm having trouble finding a way to optimize a triple loop in Python. I will give the code directly, for a better and simpler representation of what I have to compute. Given two 2-D arrays named samples (M x N) and D (N x N), along with the output array results (N x N):
for sigma in range(M):
    for i in range(N):
        for j in range(N):
            results[i, j] += (1/N) * (samples[sigma, i]*samples[sigma, j]
                                      - samples[sigma, i]*D[j, i]
                                      - samples[sigma, j]*D[i, j])
return results
It does the job, but is not efficient at all in Python. I tried to unroll the for i.. for j.. loops, but I cannot compute it correctly with sigma in the way. Does someone have an idea on how to optimize those few lines? Any suggestions are welcome, such as numpy, numexpr, etc.
One way I found to improve your code (i.e. reduce the number of loops) is by using np.meshgrid. Here is the improvement I found. It took some fiddling, but it gives the same output as your triple-loop code. I kept the same code structure so you can see which parts correspond to what. I hope this is of use to you!
for sigma in range(M):
    xx, yy = np.meshgrid(samples[sigma], samples[sigma])
    results += (1/N) * (xx * yy
                        - yy * D.T
                        - xx * D)
print(results)  # or return results
Edit: Here's a small script to verify that the results are as expected:
import numpy as np

M, N = 3, 4
rng = np.random.default_rng(seed=42)
samples = rng.random((M, N))
D = rng.random((N, N))
results = rng.random((N, N))
results_old = results.copy()
results_new = results.copy()

for sigma in range(M):
    for i in range(N):
        for j in range(N):
            results_old[i, j] += (1/N) * (samples[sigma, i]*samples[sigma, j]
                                          - samples[sigma, i]*D[j, i]
                                          - samples[sigma, j]*D[i, j])
print('\n\nresults_old', results_old, sep='\n')

for sigma in range(M):
    xx, yy = np.meshgrid(samples[sigma], samples[sigma])
    results_new += (1/N) * (xx * yy
                            - yy * D.T
                            - xx * D)
print('\n\nresults_new', results_new, sep='\n')
Edit 2: entirely getting rid of loops. It is a bit convoluted, but it essentially does the same thing.
M, N = samples.shape
xxx, yyy = np.meshgrid(samples, samples)
split_x = np.array(np.hsplit(np.vsplit(xxx, M)[0], M))
split_y = np.array(np.vsplit(np.hsplit(yyy, M)[0], M))
results += np.sum(
    (1/N) * (split_x*split_y
             - split_y*D.T
             - split_x*D), axis=0)
print(results)  # or return results
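(A quick check of my own that this loop-free version matches the per-sigma loop on random inputs:)
import numpy as np

M, N = 3, 4
rng = np.random.default_rng(0)
samples = rng.random((M, N))
D = rng.random((N, N))

loop = np.zeros((N, N))
for sigma in range(M):
    xx, yy = np.meshgrid(samples[sigma], samples[sigma])
    loop += (1/N) * (xx * yy - yy * D.T - xx * D)

# np.meshgrid flattens its 2D inputs, which is what the split trick relies on
xxx, yyy = np.meshgrid(samples, samples)
split_x = np.array(np.hsplit(np.vsplit(xxx, M)[0], M))
split_y = np.array(np.vsplit(np.hsplit(yyy, M)[0], M))
free = np.sum((1/N) * (split_x*split_y - split_y*D.T - split_x*D), axis=0)
print(np.allclose(loop, free))  # True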
In order to vectorize for loops, we can make use of broadcasting and then reduce along any axes that are not reflected in the output array. To do so, we can "assign" one axis to each of the for loop indices (as a convention). For your example this means that all input arrays can be reshaped to have dimension 3 (i.e. len(a.shape) == 3); the axes then correspond to sigma, i, j respectively. We can then perform all operations with the broadcast arrays and finally reduce (sum) the result along the sigma axis (since only i, j are reflected in the result):
# Ordering of axes: (sigma, i, j)
samples_i = samples[:, :, np.newaxis]
samples_j = samples[:, np.newaxis, :]
D_ij = D[np.newaxis, :, :]
D_ji = D.T[np.newaxis, :, :]
return (samples_i*samples_j - samples_i*D_ji - samples_j*D_ij).sum(axis=0) / N
The following is a complete example that compares the reference code (using for loops) with the above version; note that I've removed the 1/N part in order to keep computations in the domain of integers and thus make the array equality test exact.
import time
import numpy as np

def timeit(func):
    def wrapper(*args):
        t_start = time.process_time()
        res = func(*args)
        t_total = time.process_time() - t_start
        print(f'{func.__name__}: {t_total:.3f} seconds')
        return res
    return wrapper

rng = np.random.default_rng()
M, N = 100, 200
samples = rng.integers(0, 100, size=(M, N))
D = rng.integers(0, 100, size=(N, N))

@timeit
def reference(samples, D):
    results = np.zeros(shape=(N, N))
    for sigma in range(M):
        for i in range(N):
            for j in range(N):
                results[i, j] += (samples[sigma, i]*samples[sigma, j]
                                  - samples[sigma, i]*D[j, i]
                                  - samples[sigma, j]*D[i, j])
    return results

@timeit
def new(samples, D):
    # Ordering of axes: (sigma, i, j)
    samples_i = samples[:, :, np.newaxis]
    samples_j = samples[:, np.newaxis, :]
    D_ij = D[np.newaxis, :, :]
    D_ji = D.T[np.newaxis, :, :]
    return (samples_i*samples_j - samples_i*D_ji - samples_j*D_ij).sum(axis=0)

assert np.array_equal(reference(samples, D), new(samples, D))
This gives me the following benchmark results:
reference: 6.465 seconds
new: 0.133 seconds
I found it easier to break the problem into smaller steps and work on it, until we have a single expression.
Going from your original formulation:
for sigma in range(M):
    for i in range(N):
        for j in range(N):
            results[i, j] += (1/N) * (samples[sigma, i]*samples[sigma, j]
                                      - samples[sigma, i]*D[j, i]
                                      - samples[sigma, j]*D[i, j])
The first thing is to eliminate the j index in the innermost loop. For this, we start working with vectors instead of single elements:
for sigma in range(M):
    for i in range(N):
        results[i, :] += (1/N) * (samples[sigma, i]*samples[sigma, :]
                                  - samples[sigma, i]*D[:, i]
                                  - samples[sigma, :]*D[i, :])
Then we eliminate the second loop, the one with the i index. In this step we start to think in matrices; each loop iteration therefore adds one "sigma matrix" directly to the sum.
for sigma in range(M):
    results += (1/N) * (samples[sigma, :, np.newaxis] * samples[sigma]
                        - samples[sigma, :, np.newaxis] * D.T
                        - samples[sigma, :] * D)
I strongly recommend using this step as the solution, since vectorizing even further would require too much memory for a big value of M. But, just for knowledge... think of the matrices as 3-dimensional objects. We do the calculations and sum at the end along axis zero:
results = (1/N) * (samples[:, :, np.newaxis] * samples[:, np.newaxis]
                   - samples[:, :, np.newaxis] * D.T
                   - samples[:, np.newaxis, :] * D).sum(axis=0)
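A quick check on small inputs (my own sketch) that this fully vectorized form matches the original triple loop:
import numpy as np

rng = np.random.default_rng(0)
M, N = 5, 6
samples = rng.random((M, N))
D = rng.random((N, N))

loop = np.zeros((N, N))
for sigma in range(M):
    for i in range(N):
        for j in range(N):
            loop[i, j] += (1/N) * (samples[sigma, i]*samples[sigma, j]
                                   - samples[sigma, i]*D[j, i]
                                   - samples[sigma, j]*D[i, j])

vec = (1/N) * (samples[:, :, np.newaxis] * samples[:, np.newaxis]
               - samples[:, :, np.newaxis] * D.T
               - samples[:, np.newaxis, :] * D).sum(axis=0)
print(np.allclose(loop, vec))  # True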
I am currently new to sympy and I am trying to reproduce in Python the Mathematica example in the attached image. My attempt is written below, but it returns an empty list.
import sympy
m, n, D_star, a, j = sympy.symbols('m, n, D_star, a, j')
s1 = sympy.Sum(a**(j-1),(j, 1, m-1))
rhs = 6 * sympy.sqrt((D_star * (1 + a)*(n - 1))/2)
expand_expr = sympy.solve(s1 - rhs, m)
temp = sympy.lambdify((a, n, D_star), expand_expr, 'numpy')
n = 100
a = 1.2
D_star = 2.0
ms = temp(1.2, 100, 2.0)
ms
# what I get is an empty list []
# expected answer using Mma FindRoot function is 17.0652
Adding .doit() to expand the sum seems to help. It gives Piecewise((m - 1, Eq(a, 1)), ((a - a**m)/(1 - a), True))/a for the sum in s1.
from sympy import symbols, Eq, Sum, sqrt, solve, lambdify
m, n, j, a, D_star = symbols('m n j a D_star')
s1 = Sum(a**(j - 1), (j, 1, m - 1)).doit()
rhs = 6 * sqrt((D_star * (1 + a) * (n - 1)) / 2)
expand_expr = solve(Eq(s1, rhs), m)
temp = lambdify((a, n, D_star), expand_expr, 'numpy')
n = 100
a = 1.2
D_star = 2.0
ms = temp(1.2, 100, 2.0)
This gives for expand_expr:
[Piecewise((log(a*(3*sqrt(2)*a*sqrt(D_star*(a*n - a + n - 1)) - 3*sqrt(2)*sqrt(D_star*(a*n - a + n - 1)) + 1))/log(a), Ne(a, 1)), (nan, True)),
Piecewise((3*sqrt(2)*a*sqrt(D_star*(a*n - a + n - 1)) + 1, Eq(a, 1)), (nan, True))]
which separates into a != 1 and a == 1.
The result of ms gives [array(17.06524172), array(nan)], again split in a somewhat awkward way to cover a hypothetical a == 1.
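As an aside, if only the number is needed, a FindRoot-style numeric solve also works (a minimal sketch of my own, using sympy.nsolve on the a != 1 branch of the summed series):
import sympy

m = sympy.symbols('m')
a_val, n_val, D_val = 1.2, 100, 2.0
lhs = (a_val - a_val**m) / (1 - a_val) / a_val  # closed form of the sum for a != 1
rhs = 6 * sympy.sqrt((D_val * (1 + a_val) * (n_val - 1)) / 2)
print(sympy.nsolve(lhs - rhs, m, 10))  # ~ 17.065, matching the FindRoot value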
I have a 1d ndarray A of shape (n,) and a 2d ndarray E of shape (n, m). I am trying to perform the following calculation (the circle-dot denotes elementwise multiplication):
I have written it with a for loop, but this block of code is called thousands of times, and I was hoping there is a way to accomplish this with broadcasting or numpy functions. The following is my for-loop solution that I'm trying to rewrite:
def fun(E, A):
    X = E * A[:, np.newaxis]
    R = np.zeros(E.shape[-1])
    for ii in range(len(E) - 1):
        for jj in range(ii + 1, len(E)):
            R += X[ii] * X[jj]
    return R
Any help would be appreciated.
Current approach, but still not working:
def fun1(E, A):
    X = E * A[:, np.newaxis]
    R = np.zeros(E.shape[-1])
    for ii in range(len(E) - 1):
        for jj in range(ii + 1, len(E)):
            R += X[ii] * X[jj]
    return R

def fun2(E, A):
    n = E.shape[0]
    m = E.shape[1]
    A_ = np.triu(A[1:] * A[:-1].reshape(-1, 1))
    E_ = E[1:] * E[:-1]
    R = np.sum((A_.reshape(n-1, 1, n-1) * E_.T).transpose(0, 2, 1).reshape((n-1)*(n-1), m), axis=0)
    return R

A = np.arange(4, 9)
E = np.arange(20).reshape((5, 4))
print(fun1(E, A))
print(fun2(E, A))
Now, this should work:
def fun3(E, A):
    n, m = E.shape
    n_ = n - 1
    X = E * A[:, np.newaxis]
    a = X[:-1].reshape(n_, 1, m) * X[1:]
    b = np.tril(np.ones((m, n_, n_))).T
    R = np.sum((a*b).reshape(n_*n_, m), axis=0)
    return R
The last function was based only on the given formula. This one is instead based on fun and tested with your added test case.
Hope this works for you!
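As a footnote (my own addition, not from the answer above): for this particular pairwise sum there is also a closed form, because the sum of X[ii]*X[jj] over all pairs ii < jj equals ((sum X)**2 - sum X**2) / 2 elementwise:
import numpy as np

def fun_identity(E, A):
    X = E * A[:, np.newaxis]
    # sum_{ii<jj} X[ii]*X[jj] == ((sum_i X[i])**2 - sum_i X[i]**2) / 2, elementwise
    return 0.5 * (X.sum(axis=0)**2 - (X**2).sum(axis=0))

A = np.arange(4, 9)
E = np.arange(20).reshape((5, 4))
print(fun_identity(E, A))  # matches fun1(E, A) above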
Starting with:
a, b = np.ogrid[0:n+1:1, 0:n+1:1]
B = np.exp(1j * (np.pi/3) * np.abs(a - b))
B[z, b] = np.exp(1j * (np.pi/3) * np.abs(z - b + x))
B[a, z] = np.exp(1j * (np.pi/3) * np.abs(a - z + x))
B[diag, diag] = 1 - 1j/np.sqrt(3)
this produces an (n+1)*(n+1) grid that acts as a matrix; n is just the chosen upper index, i.e. an a*b matrix where a and b both go up to n.
z is a constant index whose row and column I replace using the B[z,b] and B[a,z] formulas (essentially the same formula, but with a small number x added to np.abs(a-b)).
The diagonal of the matrix is given by the bottom line:
B[diag, diag] = 1 - 1j/np.sqrt(3)
where
diag = np.arange(n+1)
I would like to repeat this code 50 times, where the only thing that changes is x, so I end up with 50 versions of B. x is a randomly generated number between -0.8 and 0.8 each time:
x = np.random.uniform(-0.8, 0.8)
I want to generate 50 versions of B with random values of x each time, and take a geometric average of the 50 versions using the definition:
def geo_mean(y):
    y = np.asarray(y)
    return np.prod(y ** (1.0 / y.shape[0]), axis=-1)
I have tried to set B as a function of some index and then use a for _ in range(): loop, but this doesn't work. Aside from copying and pasting the block 50 times and denoting each one B1, B2, B3, etc., I can't think of another way of working this out.
EDIT:
I'm now using part of a given solution in order to show clearly what I am looking for:
#A matrix with 50 random values between -0.8 and 0.8 to be used in the loop
X = np.random.uniform(-0.8, 0.8, (50, 1))

#constructing the base array before modification by random x values in position z
a, b = np.ogrid[0:n+1:1, 0:n+1:1]
B = np.exp(1j * (np.pi/3) * np.abs(a - b))
B[diag, diag] = 1 - 1j/np.sqrt(3)

#list to store all modified arrays
randomarrays = []
for i in range(0, 50):
    #copy array and modify it
    Bnew = np.copy(B)
    Bnew[z, b] = np.exp(1j * (np.pi/3) * np.abs(z - b + X[i]))
    Bnew[a, z] = np.exp(1j * (np.pi/3) * np.abs(a - z + X[i]))
    randomarrays.append(Bnew)
Bstack = np.dstack(randomarrays)

#calculate the geometric mean value along the axis that was the row in 2D arrays
B0 = geo_mean(Bstack)
From this example, every iteration of i seems to use the same value of X; I can't find a way to make each new iteration use the next value in the matrix X. I know the ++ operator from other languages does not exist in Python, and I don't know the equivalent here. I want one loop pass to use a value of X, the next pass to use the following value, and so on, so I can dstack all the matrices at the end and find a geo_mean for each element of the stacked matrices.
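(A minimal sketch of the iteration idiom in question; np.full is just a stand-in for building Bnew from x:)
import numpy as np

X = np.random.uniform(-0.8, 0.8, 50)
arrays = []
for x in X:                            # each pass automatically gets the next value of X
    arrays.append(np.full((3, 3), x))  # stand-in for constructing Bnew from this x
stacked = np.stack(arrays)             # shape (50, 3, 3)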
One pedestrian way would be to use a list comprehension or generator expression:
>>> def f(n, z, x):
...     diag = np.arange(n+1)
...     a, b = np.ogrid[0:n+1:1, 0:n+1:1]
...     B = np.exp(1j*(np.pi/3)*np.abs(a-b))
...     B[z, b] = np.exp(1j * (np.pi/3) * np.abs(z - b + x))
...     B[a, z] = np.exp(1j * (np.pi/3) * np.abs(a - z + x))
...     B[diag, diag] = 1 - 1j/np.sqrt(3)
...     return B
...
>>> X = np.random.uniform(-0.8, 0.8, (10,))
>>> np.prod((*map(np.power, map(f, 10*(4,), 10*(2,), X), 10 * (1/10,)),), axis=0)
But in your concrete example we can do much better than that:
using the identity exp(a) * exp(b) = exp(a + b), we can convert the geometric mean after exponentiation into an arithmetic mean before exponentiation. A bit of care is required because of the multivaluedness of the complex n-th root, which occurs in the geometric mean. In the code below we normalize the occurring angles to the range (-pi, pi] so as to always hit the same branch as the n-th root.
Please also note that the geo_mean function you provide is definitely wrong: it fails the basic sanity check that taking the average of copies of the same thing should return that thing. I've provided a better version. It is still not perfect, but I think there actually is no perfect solution, because of the nonuniqueness of the complex root.
Because of this I recommend taking the average before exponentiating. As long as your random spread is less than pi, this allows a well-defined averaging procedure with an average that is actually close to the samples.
import numpy as np

def f(n, z, X, do_it_pps_way=True):
    X = np.asanyarray(X)
    diag = np.arange(n+1)
    a, b = np.ogrid[0:n+1:1, 0:n+1:1]
    B = np.exp(1j*(np.pi/3)*np.abs(a-b))
    X = X.reshape(-1, 1, 1)
    if do_it_pps_way:
        zbx = np.mean(np.abs(z-b+X), axis=0)
        azx = np.mean(np.abs(a-z+X), axis=0)
    else:
        zbx = np.mean((np.abs(z-b+X)+3) % 6 - 3, axis=0)
        azx = np.mean((np.abs(a-z+X)+3) % 6 - 3, axis=0)
    B[z, b] = np.exp(1j * (np.pi/3) * zbx)
    B[a, z] = np.exp(1j * (np.pi/3) * azx)
    B[diag, diag] = 1 - 1j/np.sqrt(3)
    return B

def geo_mean(y):
    y = np.asarray(y)
    dim = len(y.shape)
    y = np.atleast_2d(y)
    v = np.prod(y, axis=0) ** (1.0 / y.shape[0])
    return v[0] if dim == 1 else v

def geo_mean_correct(y):
    y = np.asarray(y)
    return np.prod(y ** (1.0 / y.shape[0]), axis=0)

# demo that orig geo_mean is wrong
B = np.exp(1j * np.random.random((5, 5)))
# the mean of four times the same thing should be the same thing:
if not np.allclose(B, geo_mean([B, B, B, B])):
    print('geo_mean failed')
if np.allclose(B, geo_mean_correct([B, B, B, B])):
    print('but geo_mean_correct works')

n, z, m = 10, 3, 50
X = np.random.uniform(-0.8, 0.8, (m,))

B0 = f(n, z, X, do_it_pps_way=False)
B1 = np.prod((*map(np.power, map(f, m*(n,), m*(z,), X), m * (1/m,)),), axis=0)
B2 = geo_mean_correct([f(n, z, x) for x in X])

# This is the recommended way:
B_recommended = f(n, z, X, do_it_pps_way=True)

print()
print(np.allclose(B1, B0))
print(np.allclose(B2, B1))
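To illustrate the branch caveat in isolation (my own sketch, not part of the answer): the geometric mean of unit-modulus numbers exp(1j*theta) equals exp(1j*mean(theta)) as long as every angle stays on the principal branch:
import numpy as np

theta = np.random.uniform(-0.5, 0.5, 50)     # spread well inside (-pi, pi]
lhs = np.prod(np.exp(1j * theta) ** (1/50))  # geometric mean via principal 50th roots
rhs = np.exp(1j * theta.mean())              # arithmetic mean taken before exponentiating
print(np.allclose(lhs, rhs))                 # True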
I think you should rely more on numpy functionality when approaching your problem. I am not a numpy expert myself, so there is surely room for improvement:
import numpy as np
from scipy.stats import gmean

n = 2
z = 1
a = np.arange(n + 1).reshape(1, n + 1)

#constructing the base array before modification by random x values in position z
B = np.exp(1j * (np.pi/3) * np.abs(a - a.T))
B[a, a] = 1 - 1j/np.sqrt(3)

#list to store all modified arrays
random_arrays = []
for _ in range(50):
    #generate random x value
    x = np.random.uniform(-0.8, 0.8)
    #copy array and modify it
    B_new = np.copy(B)
    B_new[z, a] = np.exp(1j * (np.pi/3) * np.abs(z - a + x))
    B_new[a, z] = np.exp(1j * (np.pi/3) * np.abs(a - z + x))
    random_arrays.append(B_new)

#store all B arrays as a 3D array, stacked along axis 0
B_stack = np.stack(random_arrays)

#calculate the geometric mean across the 50 sampled arrays
geom_mean_B = gmean(B_stack, axis=0)
It uses the geometric mean function from the scipy.stats module to get a vectorised approach for this calculation.
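(A quick sanity check of my own that gmean along axis 0 matches the elementwise product-of-roots definition from the question, at least for positive real entries:)
import numpy as np
from scipy.stats import gmean

stack = np.random.random((50, 3, 3)) + 0.1        # positive entries
manual = np.prod(stack ** (1.0 / stack.shape[0]), axis=0)
print(np.allclose(gmean(stack, axis=0), manual))  # True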
I have a 3D numpy array A of shape (2133, 3, 3). Basically this is a list of 2133 lists, each holding three 3D points. Furthermore I have a function which takes three 3D points and returns one 3D point, x = f(a, b, c), with a, b, c, x numpy arrays of length 3. Now I want to apply f to A, so that the output is an array of shape (2133, 3), i.e. something like numpy.array([f(*A[0]), ..., f(*A[2132])]).
I tried numpy.apply_along_axis and numpy.vectorize without success.
To be more precise the function f I consider is given by:
def f(a, b, c, r1, r2=None, r3=None):
    a = np.asarray(a)
    b = np.asarray(b)
    c = np.asarray(c)
    if np.linalg.matrix_rank(np.array([a, b, c])) != 3:
        # raise ValueError('The points are collinear.')
        return None
    a, b, c = sort_triple(a, b, c)  # sort_triple not shown here
    if any(r is None for r in (r2, r3)):
        r2, r3 = (r1, r1)
    ex = (b - a) / np.linalg.norm(b - a)
    i = np.dot(ex, c - a)
    ey = (c - a - i*ex) / np.linalg.norm(c - a - i*ex)
    ez = np.cross(ex, ey)
    d = np.linalg.norm(b - a)
    j = np.dot(ey, c - a)
    x = (pow(r1, 2) - pow(r2, 2) + pow(d, 2)) / (2 * d)
    y = ((pow(r1, 2) - pow(r3, 2) + pow(i, 2) + pow(j, 2)) / (2*j)) - ((i/j)*x)
    z_square = pow(r1, 2) - pow(x, 2) - pow(y, 2)
    if z_square >= 0:
        z = np.sqrt(z_square)
        intersection = a + x*ex + y*ey + z*ez
        return intersection
A = np.array([[[131.83, 25.2, 0.52], [131.51, 22.54, 0.52], [133.65, 23.65, 0.52]],
              [[13.02, 86.98, 0.52], [61.02, 87.12, 0.52], [129.05, 87.32, 0.52]]])
r1 = 1.7115
Thanks to the great help of @jdehesa I was able to produce an alternative solution to the one given by @hpaulj. I am not sure if this solution is the most elegant one, but it has worked so far. Comments are appreciated.
import numpy as np
from numpy.core.umath_tests import inner1d  # row-wise dot products; in newer NumPy use np.einsum('ij,ij->i', u, v)

def sort_triple(a, b, c):
    pts = np.stack((a, b, c), axis=1)
    xSorted = pts[np.arange(pts.shape[0])[:, None], np.argsort(pts[:, :, 0])]
    orientation = np.cross(xSorted[:, 1] - xSorted[:, 0],
                           xSorted[:, 2] - xSorted[:, 0])[:, 2] >= 0
    xSorted_flipped = np.stack((xSorted[:, 0], xSorted[:, 2], xSorted[:, 1]),
                               axis=1)
    xSorted = np.where(orientation[:, np.newaxis, np.newaxis], xSorted,
                       xSorted_flipped)
    return map(np.squeeze, np.split(xSorted, 3, axis=1))

def f(A, r1, r2=None, r3=None):
    a, b, c = map(np.squeeze, np.split(A, 3, axis=1))
    a, b, c = sort_triple(a, b, c)
    if any(r is None for r in (r2, r3)):
        r2, r3 = (r1, r1)
    ex = (b - a) / np.linalg.norm(b - a, axis=1)[:, np.newaxis]
    i = inner1d(ex, c - a)
    ey = ((c - a - i[:, np.newaxis]*ex) /
          np.linalg.norm(c - a - i[:, np.newaxis]*ex, axis=1)[:, np.newaxis])
    ez = np.cross(ex, ey)
    d = np.linalg.norm(b - a, axis=1)
    j = inner1d(ey, c - a)
    x = (np.square(r1) - np.square(r2) + np.square(d)) / (2 * d)
    y = ((np.square(r1) - np.square(r3) + np.square(i) + np.square(j)) / (2*j) -
         i/j*x)
    z_square = np.square(r1) - np.square(x) - np.square(y)
    mask = z_square < 0
    z_square[mask] = 0
    z = np.sqrt(z_square)
    z[mask] = np.nan
    intersection = (a + x[:, np.newaxis]*ex + y[:, np.newaxis]*ey +
                    z[:, np.newaxis]*ez)
    return intersection
Probably the map parts in each function could be done better, and maybe also the excessive use of np.newaxis.
This works fine (after commenting out sort_triple):
res = [f(*row,r1) for row in A]
print(res)
producing:
[array([ 132.21182324, 23.80481826, 1.43482849]), None]
That looks like one row produced a (3,) array, the other had some sort of problem and produced None. I don't know if that None was due to removing the sort or not. But in any case, turning a mix of arrays and None back into an array would be a problem. If all items of res were matching arrays, we could stack them back into a 2d array.
There are ways of getting modest speed improvements (compared to this list comprehension). But with a complex function like this, the time spent in the function (called 2000 times) dominates the time spent by the iteration mechanism.
And since you are iterating on the 1st dimension, and passing the other 2 (as 3 arrays), this explicit loop is a lot easier to use than vectorize, frompyfunc or apply_along/over...
To get significant time savings you have to write f() to work with the 3d array directly.
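To make that concrete, a toy sketch of the pattern (my own illustration; f_scalar and f_batched are stand-ins, not the real f from the question):
import numpy as np

def f_scalar(a, b, c):
    # toy stand-in for the real f: works on three (3,) points
    return np.cross(b - a, c - a)

def f_batched(A):
    # same computation written against the whole (n, 3, 3) array at once
    a, b, c = A[:, 0], A[:, 1], A[:, 2]   # each has shape (n, 3)
    return np.cross(b - a, c - a)         # np.cross broadcasts over rows

A = np.random.random((2133, 3, 3))
loop = np.array([f_scalar(*row) for row in A])
print(np.allclose(loop, f_batched(A)))    # True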