Numpy distance calculations of different shaped arrays - python

Not sure I titled this well, but basically I have a reference coordinate, in the format of (x,y,z), and a large list/array of coordinates also in that format. I need to get the euclidean distance between each, so with numpy and scipy in theory I should be able to do an operation such as:
import numpy, scipy.spatial.distance
a = numpy.array([1,1,1])
b = numpy.random.rand(20,3)
distances = scipy.spatial.distance.euclidean(b, a)
But instead of getting an array back I get an error: ValueError: Input vector should be 1-D.
Not sure how to resolve this error and get what I want without having to resort to loops and such, which sort of defeats the purpose of using Numpy.
Long term I want to use these distances to calculate truth masks for counting distance values in bins.
I'm not sure if I'm just using the function wrong or using the wrong function, I haven't been able to find anything in the documentation that would work better.

The documentation of scipy.spatial.distance.euclidean states that only 1-D vectors are allowed as inputs. Thus you must loop over your arrays like:
distances = np.empty(b.shape[0])
for i in range(b.shape[0]):
    distances[i] = scipy.spatial.distance.euclidean(a, b[i])
If you want to have a vectorized implementation, you need to write your own function. Perhaps using np.vectorize with a correct signature will also work, but this is in fact also just a short-hand for a for-loop and will thus have the same performance as a simple for-loop.
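As a rough sketch of the np.vectorize idea mentioned above: the helper dist_1d below is a hypothetical stand-in for any function that only accepts two 1-D vectors (it is not scipy's implementation), and the signature string tells np.vectorize to feed it one row at a time. Internally this is still a Python-level loop, so it is a convenience rather than a speed-up.

```python
import numpy as np

def dist_1d(u, v):
    """Euclidean distance between two 1-D vectors (hypothetical helper)."""
    return np.sqrt(np.sum((u - v) ** 2))

# Signature: take two length-n vectors, return one scalar.
dist_rows = np.vectorize(dist_1d, signature="(n),(n)->()")

a = np.array([1.0, 1.0, 1.0])
b = np.random.rand(20, 3)

# a is broadcast against every row of b, giving one distance per row.
distances = dist_rows(b, a)
print(distances.shape)
```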
As stated in my comment on hannes wittingham's solution, I'll post a one-liner that focuses on performance:
distances = ((b - a)**2).sum(axis=1)**0.5
Writing out all the calculations reduces the number of separate function calls and thus the assignments of intermediate results to new arrays. It is therefore about 22% faster than hannes wittingham's solution for b.shape == (20, 3) and about 5% faster for b.shape == (20000, 3):
a = np.array([1, 1, 1,])
b = np.random.rand(20, 3)
%timeit ((b - a)**2).sum(axis=1)**0.5
# 5.37 µs ± 140 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit euclidean_distances(a, b)
# 6.89 µs ± 345 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
b = np.random.rand(20000, 3)
%timeit ((b - a)**2).sum(axis=1)**0.5
# 588 µs ± 43.2 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit euclidean_distances(a, b)
# 616 µs ± 36.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
But you are losing the flexibility of being able to easily change the distance calculation routine. When using the scipy.spatial.distance module, you can change the calculation routine by simply calling another function.
To improve the calculation performance even further, you can use a jit (just in time) compiler like numba for your functions:
import numba as nb
@nb.njit
def euc(a, b):
    return ((b - a)**2).sum(axis=1)**0.5
This reduces the time needed to do the calculations by about 70% for small arrays and by about 60% for large arrays. Unfortunately, the axis keyword for np.linalg.norm is not yet supported by numba.

It's not actually too hard to write your own function to do this - here's mine, which you're welcome to use.
If you are carrying out this operation over a large number of points and speed matters, I would guess this function will beat a for-loop based solution for speed by a long way - numpy is designed to be efficient when carrying out operations on a whole matrix.
import numpy
a = numpy.array([1,1,1])
b = numpy.random.rand(20,3)
def euclidean_distances(ref_point, co_ords_array):
    diffs = co_ords_array - ref_point
    sqrd_diffs = numpy.square(diffs)
    sum_sqrd_diffs = numpy.sum(sqrd_diffs, axis = 1)
    euc_dists = numpy.sqrt(sum_sqrd_diffs)
    return euc_dists

This code will get the euclidean norm which should work in many cases, and is fairly quick, and one line. Other methods are more efficient or flexible depending on the needs, and I would favour some of the other solutions posted depending on the work being done.
import numpy
a = numpy.array([1,1,1])
b = numpy.random.rand(20,3)
distances = numpy.linalg.norm(a - b, axis = 1)
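The question mentions eventually counting distances in bins with truth masks. A minimal sketch of that follow-up step, assuming illustrative bin edges (the edges and the seeded random generator are my own choices, not from the question), could look like this. It builds one boolean mask per bin and also shows that np.histogram gives the same counts in one call.

```python
import numpy as np

rng = np.random.default_rng(0)
a = np.array([1.0, 1.0, 1.0])
b = rng.random((20, 3))

distances = np.linalg.norm(b - a, axis=1)

# Assumed bin edges, purely for illustration: 4 bins of width 0.5.
edges = np.linspace(0.0, 2.0, 5)

# One boolean mask per bin, then count the True entries in each.
masks = [(distances >= lo) & (distances < hi)
         for lo, hi in zip(edges[:-1], edges[1:])]
counts_from_masks = np.array([m.sum() for m in masks])

# np.histogram does the same counting in a single call.
counts, _ = np.histogram(distances, bins=edges)
print(counts_from_masks, counts)
```

Note that np.histogram treats the last bin as closed on the right, which makes no difference here because no distance can reach 2.0.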

Note the extra set of [] in the definition of a
import numpy, scipy.spatial.distance
a = numpy.array([[1,1,1]])
b = numpy.random.rand(20,3)
distances = scipy.spatial.distance.cdist(b, a, metric='euclidean')

Related

Discrepancy in performance between log division and log subtraction using numba

I am trying to optimize some code that uses logs (the mathematical kind, not the timestamp record kind :)) and I found something strange that I haven't been able to find any answers for online. We have log(a/b) = log(a) - log(b), so I have written some code to compare the performance of the two methods.
import numpy as np
import numba as nb
# create some large random walk data
x = np.random.normal(0, 0.1, int(1e7))
x = abs(x.min()) + 100 + x # make all values >= 100
@nb.njit
def subtract_log(arr, tau):
    """arr is a numpy array, tau is an int"""
    for t in range(tau, arr.shape[0]):
        a = np.log(arr[t]) - np.log(arr[t - tau])
    return None
@nb.njit
def divide_log(arr, tau):
    """arr is a numpy array, tau is an int"""
    for t in range(tau, arr.shape[0]):
        a = np.log(arr[t] / arr[t - tau])
    return None
%timeit subtract_log(x, 100)
>>> 252 ns ± 0.319 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit divide_log(x, 100)
>>> 5.57 ms ± 48.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
So we see that subtracting logs is ~20,000 times faster than taking the log of the quotient. I find this strange because I would have thought that in subtracting logs, the log series approximation would have to be calculated twice. But perhaps it's something to do with how numpy broadcasts operations?
The above example is trivial as we don't do anything with the result of the calculation. Below is a more realistic example where we return the result of the calculation.
@nb.njit
def subtract_log(arr, tau):
    """arr is a numpy array, tau is an int"""
    out = np.empty(arr.shape[0] - tau)
    for t in range(tau, arr.shape[0]):
        f = t - tau
        out[f] = np.log(arr[t]) - np.log(arr[f])
    return out
@nb.njit
def divide_log(arr, tau):
    """arr is a numpy array, tau is an int"""
    out = np.empty(arr.shape[0] - tau)
    for t in range(tau, arr.shape[0]):
        f = t - tau
        out[f] = np.log(arr[t] / arr[f])
    return out
out1 = subtract_log(x, 100)
out2 = divide_log(x, 100)
np.testing.assert_allclose(out1, out2, atol=1e-8) # True
%timeit subtract_log(x, 100)
>>> 129 ms ± 783 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit divide_log(x, 100)
>>> 93.4 ms ± 257 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Now we see the times are the same order of magnitude, but subtracting logs is some 40% slower than dividing.
Can anyone explain these discrepancies?
Why is subtracting logs so much faster than dividing logs for the trivial case?
Why is subtracting logs 40% slower than dividing logs when we store the value in an array? I know there is significant setup cost in initializing an array np.empty() - initializing an array in subtract_log() in the trivial case, but without storing values in it brings the time up from 252ns to 311us.
Don't measure "useless" things; a compiler may optimize them away completely
If you turn off the division-by-zero check (error_model="numpy"), both functions take about 280 ns. Not because the calculation is fast, but because they are actually doing nothing.
Optimizing away useless calculations is expected, but sometimes LLVM can't detect all of it.
In the second case you are comparing the runtime of two logarithms with that of one logarithm and one division (the subtractions/additions as well as multiplications are a lot faster). There can be differences in calculation time, depending on the log implementation and the processor. But also have a look at the results: they are not exactly the same.
At least for a float64 division (FDIV), you can have a look at the instruction tables from Agner Fog.
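To see that the two formulas agree only up to rounding, here is a small plain-NumPy sketch (no numba, and with a smaller array than in the question; the data range and seed are assumptions for illustration). It computes both variants with slicing and reports whether they are bit-identical and how far apart they are.

```python
import numpy as np

rng = np.random.default_rng(0)
x = 100.0 + rng.random(1_000_000)   # assumed data: values in [100, 101)
tau = 100

out_sub = np.log(x[tau:]) - np.log(x[:-tau])   # two logs, one subtraction
out_div = np.log(x[tau:] / x[:-tau])           # one division, one log

# Are the two results bit-identical, and how large is the largest gap?
print(np.array_equal(out_sub, out_div))
print(np.abs(out_sub - out_div).max())
```

The gap is on the order of floating-point rounding error, which is why assert_allclose with a tolerance passes even though the arrays need not be identical.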

Speeding up Numba distance calculation

I've been recently trying to compute distances to top 2 nearest neighbors in Python Numba as follows
@jit(nopython=True)
def _latent_dim_kernel(data, pointers, indices, nrange, sampling_percentage = 1):
    pdists_t2 = np.zeros((nrange, 2))
    for a in range(nrange):
        rct = 0
        for b in range(nrange):
            if np.random.random() > 1- sampling_percentage:
                if a == b:
                    continue
                r1 = _get_sparse_row(a, data, pointers, indices)
                r2 = _get_sparse_row(b, data, pointers, indices)
                dist = np.linalg.norm(r2 - r1)
                if rct > 1:
                    if pdists_t2[a,0] > dist:
                        pdists_t2[a,0] = dist
                    elif pdists_t2[a,1] > dist:
                        pdists_t2[a,1] = dist
                else:
                    pdists_t2[a,rct] = dist
                    rct += 1
    return pdists_t2
The data, pointers and indices are x.data, x.indptr, x.indices of a CSR matrix (scipy).
This works fine, however, is substantially slower than doing
squareform(pdist(matrix)).sort(axis=1)[:,1:3]
How can I speed this further without additional memory overhead?
Thanks!
Make use of pairwise distances from sklearn
Pairwise distances of sparse matrices are supported (no dense temporary array needed)
This algorithm uses an algebraic reformulation like in this answer
It can be a lot faster on high-dimensional problems like yours (20k), since most of the calculation is done within a highly optimized matrix-matrix product.
Check whether this method is precise enough; it is less numerically stable than the "naive" approach pdist uses
Example
import numpy as np
from scipy import sparse
from sklearn import metrics
from scipy.spatial import distance
matrix=sparse.random(1_000, 20_000, density=0.05, format='csr', dtype=np.float64)
%%timeit
dist_2=distance.squareform(distance.pdist(matrix.todense()))
dist_2.sort(axis=1)
dist_2=dist_2[:,1:3]
#10.1 s ± 23.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
dist=metrics.pairwise.euclidean_distances(matrix,squared=True)
dist.sort(axis=1)
dist=np.sqrt(dist[:,1:3])
#401 ms ± 13.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
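For readers who want to see the algebraic reformulation itself, here is a dense NumPy sketch of the same idea, ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x·y. It is not sklearn's implementation, and the small dense matrix X is an assumed stand-in for the sparse data. It also uses np.partition instead of a full sort, since only the two smallest distances per row are needed.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((200, 50))                 # small dense stand-in for the sparse matrix

sq_norms = np.einsum("ij,ij->i", X, X)    # ||x_i||^2 for every row
sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
np.maximum(sq_dists, 0.0, out=sq_dists)   # rounding can leave tiny negative values
np.fill_diagonal(sq_dists, np.inf)        # exclude each point's distance to itself

# np.partition puts the two smallest values of each row into the first two columns.
two_smallest = np.partition(sq_dists, 1, axis=1)[:, :2]
top2 = np.sqrt(np.sort(two_smallest, axis=1))
print(top2[:3])
```

Taking the square root only of the final (n, 2) result, as the answer does, avoids a square root over the whole distance matrix.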

Numerical issues for alternative way to compute (squared) euclidean distance

I want to compute the squared euclidean in a fast way, as described here:
What is the fastest way to compute an RBF kernel in python?
Note1: I am only interested in the distance, not the RBF kernel.
Note2: I am neglecting numexpr here and only use numpy directly.
In short, I compute:
|| x - y ||^2 = ||x||^2 + ||y||^2 - 2. * (x @ y.T)
I am able to compute the distance matrix faster by a factor of ~10 compared to scipy.pdist with this. However, I observe numerical issues, which get worse if I take the square root to get the euclidean distance. I have values that are in the order of 1E-8 - 1E-7, which should be exactly zero (i.e. duplicated points or distance to self point).
Question:
Are there ways or ideas to overcome these numerical issues (preferably without sacrificing too much of the evaluation speed)? Or are the numerical issues the reason why this path is not taken (e.g. by scipy.pdist) in the first place?
Example:
This is a small code example to show the numerical issues (not the speed ups, please look at the answers of the linked SO thread above).
import numpy as np
M = np.random.rand(1000, 10)
M_norm = np.sum(M**2, axis=1)
res = M_norm[:, np.newaxis] + M_norm[np.newaxis, :] - 2. * M @ M.T
unique = np.unique(np.diag(res)) # analytically all diag values are exactly zero
sqrt_unique = np.sqrt(unique)
print(unique)
print(sqrt_unique)
Example output:
[-2.66453526e-15 -1.77635684e-15 -8.88178420e-16 -4.44089210e-16
0.00000000e+00 4.44089210e-16 8.88178420e-16 1.77635684e-15
3.55271368e-15]
[ nan nan nan nan
0.00000000e+00 2.10734243e-08 2.98023224e-08 4.21468485e-08
5.96046448e-08]
As you can see some values are also negative (which results in nan after taking the sqrt). Of course these are easy to catch -- but the small positives have a large error for the euclidean case (e.g. abs_error=5.96046448e-08)
as per my comment, using abs is probably your best option for cleaning up the numerical stability inherent in this algorithm. as you're concerned about performance you should probably be using the mutating assignment operators as they cause less garbage to be created and hence can be much faster. also, when running this with many features (e.g. 10k) I see pdist being slower than this implementation.
putting the above together we get:
import numpy as np
def edist0(M):
    "calculate pairwise euclidean distance"
    M_norm = np.sum(M**2, axis=1)
    res = M_norm[:, np.newaxis] + M_norm[np.newaxis, :] - 2. * M @ M.T
    return np.sqrt(np.abs(res))
def edist1(M):
    "optimised calculation of pairwise euclidean distance"
    M_norm = np.einsum('ij,ij->i', M, M)
    res = M @ M.T
    res *= -2.
    res += M_norm[:, np.newaxis]
    res += M_norm[np.newaxis, :]
    return np.sqrt(np.abs(res, out=res), out=res)
timing this in IPython with:
from scipy.spatial import distance
M = np.random.rand(1000, 10000)
%timeit distance.squareform(distance.pdist(M))
%timeit edist0(M)
%timeit edist1(M)
I get:
2.82 s ± 60.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
296 ms ± 6.07 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
153 ms ± 1.58 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
and no errors/warnings from sqrt
the linked question also points to scikit-learn as having good implementations of distance kernels, the euclidean one being pairwise_distances, which benchmarks as:
from sklearn.metrics import pairwise_distances
%timeit pairwise_distances(M)
170 ms ± 5.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
which might be nice to use if you're already using that package
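A possible variation on the abs approach, sketched here under my own assumptions rather than taken from the answer: clip negative values to zero with np.maximum and set the diagonal to exactly zero with np.fill_diagonal, since the self-distances are known analytically. The function name edist_clipped is hypothetical. Duplicate points off the diagonal can still show small positive errors, so this only addresses the NaN and self-distance cases.

```python
import numpy as np

def edist_clipped(M):
    """Pairwise Euclidean distances; negatives clipped, diagonal forced to 0."""
    M_norm = np.einsum("ij,ij->i", M, M)
    res = M @ M.T
    res *= -2.0
    res += M_norm[:, np.newaxis]
    res += M_norm[np.newaxis, :]
    np.maximum(res, 0.0, out=res)   # clip rounding-induced negatives to zero
    np.fill_diagonal(res, 0.0)      # a point's distance to itself is exactly 0
    return np.sqrt(res, out=res)

M = np.random.rand(1000, 10)
D = edist_clipped(M)
print(np.isnan(D).any(), np.diag(D).max())
```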

A faster numpy.polynomial?

I have a very simple problem: in my python toolbox, I have to compute the values of polynomials (usually degree 3 or 2, seldom others, always integer degree) from a large vector (size >> 10^6). Storing the result in a buffer is not an option because I have several of these vectors so I would quickly run out of memory, and I usually have to compute it only once in any case. The performance of numpy.polyval is actually quite good, but still this is my bottleneck. Can I somehow make the evaluation of the polynomial faster?
Addendum
I think that the pure-numpy solution of Joe Kington is good for me, in particular because it avoids potential issues at installation time of other libraries or cython. For those who asked, the numbers in the vector are large (order 10^4), so I don't think that the suggested approximations would work.
You actually can speed it up slightly by doing the operations in-place (or using numexpr or numba which will automatically do what I'm doing manually below).
numpy.polyval is a very short function. Leaving out a few type checks, etc, it amounts to:
def polyval(p, x):
    y = np.zeros_like(x)
    for i in range(len(p)):
        y = x * y + p[i]
    return y
The downside to this approach is that a temporary array will be created inside the loop as opposed to doing the operation in-place.
What I'm about to do is a micro-optimization and is only worthwhile for very large x inputs. Furthermore, we'll have to assume floating-point output instead of letting the upcasting rules determine the output's dtype. However, it will speed this up slightly and make it use less memory:
def faster_polyval(p, x):
    y = np.zeros(x.shape, dtype=float)
    for i, v in enumerate(p):
        y *= x
        y += v
    return y
As an example, let's say we have the following input:
# Third order polynomial
p = [4.5, 9.8, -9.2, 1.2]
# One-million element array
x = np.linspace(-10, 10, int(1e6))
The results are identical:
In [3]: np_result = np.polyval(p, x)
In [4]: new_result = faster_polyval(p, x)
In [5]: np.allclose(np_result, new_result)
Out[5]: True
And we get a modest 2-3x speedup (which is mostly independent of array size, as it relates to memory allocation, not number of operations):
In [6]: %timeit np.polyval(p, x)
10 loops, best of 3: 20.7 ms per loop
In [7]: %timeit faster_polyval(p, x)
100 loops, best of 3: 7.46 ms per loop
For really huge inputs, the memory usage difference will matter more than the speed differences. The "bare" numpy version will use ~2x more memory at peak usage than the faster_polyval version.
I ended up here, when I wanted to know whether np.polyval or np.polynomial.polynomial.polyval is faster.
And it is interesting to see that simple implementations are faster as @Joe Kington shows. (I hoped for some optimisation by numpy.)
So here is my comparison with np.polynomial.polynomial.polyval and a slightly faster version.
def fastest_polyval(x, a):
    y = a[-1]
    for ai in a[-2::-1]:
        y *= x
        y += ai
    return y
It avoids the initial zero array and needs one loop less.
y_np = np.polyval(p, x)
y_faster = faster_polyval(p, x)
prev = 1 * p[::-1] # reverse coefficients
y_np2 = np.polynomial.polynomial.polyval(x, prev)
y_fastest = fastest_polyval(x, prev)
np.allclose(y_np, y_faster), np.allclose(y_np, y_np2), np.allclose(y_np, y_fastest)
# (True, True, True)
%timeit np.polyval(p, x)
%timeit faster_polyval(p, x)
%timeit np.polynomial.polynomial.polyval(x, prev)
%timeit fastest_polyval(x, prev)
# 6.51 ms ± 17.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# 3.69 ms ± 27.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# 6.28 ms ± 43.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# 2.65 ms ± 35.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
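One detail in the comparison above that is easy to trip over is the argument and coefficient order. The short sketch below (using a tiny x for illustration) shows that np.polyval takes the coefficients first, highest degree first, while np.polynomial.polynomial.polyval takes x first and expects the lowest-degree coefficient first, which is why the coefficient list is reversed.

```python
import numpy as np
from numpy.polynomial import polynomial as P

p = [4.5, 9.8, -9.2, 1.2]        # highest degree first: 4.5*x**3 + 9.8*x**2 - 9.2*x + 1.2
x = np.linspace(-10, 10, 5)

y_old = np.polyval(p, x)         # coefficients first, highest degree first
y_new = P.polyval(x, p[::-1])    # x first, lowest degree first

print(np.allclose(y_old, y_new))
```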

Is there a way to efficiently invert an array of matrices with numpy?

Normally I would invert an array of 3x3 matrices in a for loop like in the example below. Unfortunately for loops are slow. Is there a faster, more efficient way to do this?
import numpy as np
A = np.random.rand(3,3,100)
Ainv = np.zeros_like(A)
for i in range(100):
Ainv[:,:,i] = np.linalg.inv(A[:,:,i])
It turns out that you're getting burned two levels down in the numpy.linalg code. If you look at numpy.linalg.inv, you can see it's just a call to numpy.linalg.solve(A, identity(A.shape[0])). This has the effect of recreating the identity matrix in each iteration of your for loop. Since all your arrays are the same size, that's a waste of time. Skipping this step by pre-allocating the identity matrix shaves ~20% off the time (fast_inverse). My testing suggests that pre-allocating the array or allocating it from a list of results doesn't make much difference.
Look one level deeper and you find the call to the lapack routine, but it's wrapped in several sanity checks. If you strip all these out and just call lapack in your for loop (since you already know the dimensions of your matrix and maybe know that it's real, not complex), things run MUCH faster (Note that I've made my array larger):
import numpy as np
A = np.random.rand(1000,3,3)
def slow_inverse(A):
    Ainv = np.zeros_like(A)
    for i in range(A.shape[0]):
        Ainv[i] = np.linalg.inv(A[i])
    return Ainv
def fast_inverse(A):
    identity = np.identity(A.shape[2], dtype=A.dtype)
    Ainv = np.zeros_like(A)
    for i in range(A.shape[0]):
        Ainv[i] = np.linalg.solve(A[i], identity)
    return Ainv
def fast_inverse2(A):
    identity = np.identity(A.shape[2], dtype=A.dtype)
    return np.array([np.linalg.solve(x, identity) for x in A])
# NOTE: lapack_lite.dgesv is a private interface of older NumPy releases and may be missing in current ones.
from numpy.linalg import lapack_lite
lapack_routine = lapack_lite.dgesv
# Looking one step deeper, we see that solve performs many sanity checks.
# Stripping these, we have:
def faster_inverse(A):
    b = np.identity(A.shape[2], dtype=A.dtype)
    n_eq = A.shape[1]
    n_rhs = A.shape[2]
    pivots = np.zeros(n_eq, np.intc)
    identity = np.eye(n_eq)
    def lapack_inverse(a):
        b = np.copy(identity)
        pivots = np.zeros(n_eq, np.intc)
        results = lapack_lite.dgesv(n_eq, n_rhs, a, n_eq, pivots, b, n_eq, 0)
        if results['info'] > 0:
            raise np.linalg.LinAlgError('Singular matrix')
        return b
    return np.array([lapack_inverse(a) for a in A])
%timeit -n 20 aI11 = slow_inverse(A)
%timeit -n 20 aI12 = fast_inverse(A)
%timeit -n 20 aI13 = fast_inverse2(A)
%timeit -n 20 aI14 = faster_inverse(A)
The results are impressive:
20 loops, best of 3: 45.1 ms per loop
20 loops, best of 3: 38.1 ms per loop
20 loops, best of 3: 38.9 ms per loop
20 loops, best of 3: 13.8 ms per loop
EDIT: I didn't look closely enough at what gets returned in solve. It turns out that the 'b' matrix is overwritten and contains the result in the end. This code now gives consistent results.
A few things have changed since this question was asked and answered, and now numpy.linalg.inv supports multidimensional arrays, handling them as stacks of matrices with matrix indices being last (in other words, arrays of shape (...,M,N,N)). This seems to have been introduced in numpy 1.8.0. Unsurprisingly this is by far the best option in terms of performance:
import numpy as np
A = np.random.rand(3,3,1000)
def slow_inverse(A):
    """Looping solution for comparison"""
    Ainv = np.zeros_like(A)
    for i in range(A.shape[-1]):
        Ainv[...,i] = np.linalg.inv(A[...,i])
    return Ainv
def direct_inverse(A):
    """Compute the inverse of matrices in an array of shape (N,N,M)"""
    return np.linalg.inv(A.transpose(2,0,1)).transpose(1,2,0)
Note the two transposes in the latter function: the input of shape (N,N,M) has to be transposed to shape (M,N,N) for np.linalg.inv to work, then the result has to be permuted back to shape (N,N,M).
A check and timing results using IPython, on python 3.6 and numpy 1.14.0:
In [5]: np.allclose(slow_inverse(A),direct_inverse(A))
Out[5]: True
In [6]: %timeit slow_inverse(A)
19 ms ± 138 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [7]: %timeit direct_inverse(A)
1.3 ms ± 6.39 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
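If the transpose tuples are hard to read, np.moveaxis expresses the same axis shuffling by naming only the axis that moves. The sketch below is an equivalent formulation under that assumption. The function name is hypothetical, and the test matrices are made diagonally dominant (my choice) so that every one of them is safely invertible.

```python
import numpy as np

# Stack of shape (N, N, M); adding 3*I keeps every matrix well conditioned.
A = np.random.rand(3, 3, 1000) + 3.0 * np.eye(3)[:, :, None]

def direct_inverse_moveaxis(A):
    """Invert an (N, N, M) stack by moving the stack axis to the front and back."""
    stacked = np.moveaxis(A, -1, 0)                     # (N, N, M) -> (M, N, N)
    return np.moveaxis(np.linalg.inv(stacked), 0, -1)   # (M, N, N) -> (N, N, M)

Ainv = direct_inverse_moveaxis(A)
print(Ainv.shape)
```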
Numpy-Blas calls are not always the fastest possibility
On problems where you have to calculate lots of inverses, eigenvalues, dot products of small 3x3 matrices or similar cases, numpy-MKL, which I use, can often be outperformed by quite a margin.
These external BLAS routines are usually made for problems with larger matrices; for smaller ones you can write out a standard algorithm or take a look at e.g. Intel IPP.
Please also keep in mind that Numpy uses C-ordered arrays by default (last dimension changes fastest).
For this example I took the code from Matrix inversion (3,3) python - hard coded vs numpy.linalg.inv and modified it a bit.
import numpy as np
import numba as nb
import time
@nb.njit(fastmath=True)
def inversion(m):
    minv=np.empty(m.shape,dtype=m.dtype)
    for i in range(m.shape[0]):
        determinant_inv = 1./(m[i,0]*m[i,4]*m[i,8] + m[i,3]*m[i,7]*m[i,2] + m[i,6]*m[i,1]*m[i,5] - m[i,0]*m[i,5]*m[i,7] - m[i,2]*m[i,4]*m[i,6] - m[i,1]*m[i,3]*m[i,8])
        minv[i,0]=(m[i,4]*m[i,8]-m[i,5]*m[i,7])*determinant_inv
        minv[i,1]=(m[i,2]*m[i,7]-m[i,1]*m[i,8])*determinant_inv
        minv[i,2]=(m[i,1]*m[i,5]-m[i,2]*m[i,4])*determinant_inv
        minv[i,3]=(m[i,5]*m[i,6]-m[i,3]*m[i,8])*determinant_inv
        minv[i,4]=(m[i,0]*m[i,8]-m[i,2]*m[i,6])*determinant_inv
        minv[i,5]=(m[i,2]*m[i,3]-m[i,0]*m[i,5])*determinant_inv
        minv[i,6]=(m[i,3]*m[i,7]-m[i,4]*m[i,6])*determinant_inv
        minv[i,7]=(m[i,1]*m[i,6]-m[i,0]*m[i,7])*determinant_inv
        minv[i,8]=(m[i,0]*m[i,4]-m[i,1]*m[i,3])*determinant_inv
    return minv
#I was too lazy to modify the code from the link above more thoroughly
def inversion_3x3(m):
    m_TMP=m.reshape(m.shape[0],9)
    minv=inversion(m_TMP)
    return minv.reshape(minv.shape[0],3,3)
#Testing
A = np.random.rand(1000000,3,3)
#Warmup to not measure compilation overhead on the first call
#You may also use @nb.njit(fastmath=True,cache=True) but this has also about 0.2s
#overhead on first call
Ainv = inversion_3x3(A)
t1=time.time()
Ainv = inversion_3x3(A)
print(time.time()-t1)
t1=time.time()
Ainv2 = np.linalg.inv(A)
print(time.time()-t1)
print(np.allclose(Ainv2,Ainv))
Performance
np.linalg.inv: 0.36 s
inversion_3x3: 0.031 s
For loops are indeed not necessarily much slower than the alternatives and also in this case, it will not help you much. But here is a suggestion:
import numpy as np
A = np.random.rand(100,3,3) # this makes it possible to index the matrices as A[i]
Ainv = np.array(list(map(np.linalg.inv, A))) # list() is needed on Python 3, where map returns an iterator
Timing this solution vs. your solution yields a small but noticeable difference:
# The for loop:
100 loops, best of 3: 6.38 ms per loop
# The map:
100 loops, best of 3: 5.81 ms per loop
I tried to use the numpy routine 'vectorize' with the hope of creating an even cleaner solution, but I'll have to take a second look into that. The change of ordering in the array A is probably the most significant change, since numpy arrays are stored in row-major (C) order by default, so with the matrix index first each 3x3 matrix is contiguous in memory and a linear readout of the data is ever so slightly faster this way.
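The effect of the index ordering on memory layout can be checked directly. In this small sketch (array names are my own), NumPy's default layout is C order, meaning the last index varies fastest, so a single matrix taken from a (100, 3, 3) array is one contiguous block, while a single matrix taken from a (3, 3, 100) array is a strided view.

```python
import numpy as np

A_first = np.zeros((100, 3, 3))   # matrix index first: A_first[i] is one matrix
A_last = np.zeros((3, 3, 100))    # matrix index last: A_last[:, :, i] is one matrix

print(A_first.flags["C_CONTIGUOUS"])           # True: default layout is C order
print(A_first[0].flags["C_CONTIGUOUS"])        # True: one matrix is a contiguous block
print(A_last[:, :, 0].flags["C_CONTIGUOUS"])   # False: one matrix is a strided view
print(A_first.strides, A_last.strides)         # byte steps along each axis
```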
