What is the most efficient way to map a function over a numpy array? I am currently doing:
import numpy as np
x = np.array([1, 2, 3, 4, 5])
# Obtain array of square of each element in x
squarer = lambda t: t ** 2
squares = np.array([squarer(xi) for xi in x])
However, this is probably very inefficient, since I am using a list comprehension to construct the new array as a Python list before converting it back to a numpy array. Can we do better?
I've tested all suggested methods plus np.array(list(map(f, x))) with perfplot (a small project of mine).
Message #1: If you can use numpy's native functions, do that.
If the function you're trying to vectorize already is vectorized (like the x**2 example in the original post), using that is much faster than anything else (note the log scale):
If you actually need to vectorize a plain Python function, it doesn't really matter much which variant you use.
Code to reproduce the plots:
import numpy as np
import perfplot
import math
def f(x):
    # return math.sqrt(x)
    return np.sqrt(x)

vf = np.vectorize(f)

def array_for(x):
    return np.array([f(xi) for xi in x])

def array_map(x):
    return np.array(list(map(f, x)))

def fromiter(x):
    return np.fromiter((f(xi) for xi in x), x.dtype)

def vectorize(x):
    return np.vectorize(f)(x)

def vectorize_without_init(x):
    return vf(x)

b = perfplot.bench(
    setup=np.random.rand,
    n_range=[2 ** k for k in range(20)],
    kernels=[
        f,
        array_for,
        array_map,
        fromiter,
        vectorize,
        vectorize_without_init,
    ],
    xlabel="len(x)",
)
b.save("out1.svg")
b.show()
How about using numpy.vectorize?
import numpy as np
x = np.array([1, 2, 3, 4, 5])
squarer = lambda t: t ** 2
vfunc = np.vectorize(squarer)
vfunc(x)
# Output : array([ 1, 4, 9, 16, 25])
TL;DR
As noted by @user2357112, a "direct" method of applying the function is always the fastest and simplest way to map a function over NumPy arrays:
import numpy as np
x = np.array([1, 2, 3, 4, 5])
f = lambda x: x ** 2
squares = f(x)
Generally avoid np.vectorize, as it does not perform well, and has (or had) a number of issues. If you are handling other data types, you may want to investigate the other methods shown below.
Comparison of methods
Here are some simple tests comparing a few methods of mapping a function, using Python 3.6 and NumPy 1.15.4. First, the set-up functions for testing:
import timeit
import numpy as np

f = lambda x: x ** 2
vf = np.vectorize(f)

def test_array(x, n):
    t = timeit.timeit(
        'np.array([f(xi) for xi in x])',
        'from __main__ import np, x, f', number=n)
    print('array: {0:.3f}'.format(t))

def test_fromiter(x, n):
    t = timeit.timeit(
        'np.fromiter((f(xi) for xi in x), x.dtype, count=len(x))',
        'from __main__ import np, x, f', number=n)
    print('fromiter: {0:.3f}'.format(t))

def test_direct(x, n):
    t = timeit.timeit(
        'f(x)',
        'from __main__ import x, f', number=n)
    print('direct: {0:.3f}'.format(t))

def test_vectorized(x, n):
    t = timeit.timeit(
        'vf(x)',
        'from __main__ import x, vf', number=n)
    print('vectorized: {0:.3f}'.format(t))
Testing with five elements (sorted from fastest to slowest):
x = np.array([1, 2, 3, 4, 5])
n = 100000
test_direct(x, n) # 0.265
test_fromiter(x, n) # 0.479
test_array(x, n) # 0.865
test_vectorized(x, n) # 2.906
With 100s of elements:
x = np.arange(100)
n = 10000
test_direct(x, n) # 0.030
test_array(x, n) # 0.501
test_vectorized(x, n) # 0.670
test_fromiter(x, n) # 0.883
And with 1000s of array elements or more:
x = np.arange(1000)
n = 1000
test_direct(x, n) # 0.007
test_fromiter(x, n) # 0.479
test_array(x, n) # 0.516
test_vectorized(x, n) # 0.945
Different versions of Python/NumPy and compiler optimization will have different results, so do a similar test for your environment.
There are also numexpr, numba and cython around; the goal of this answer is to take these possibilities into consideration.
But first let's state the obvious: no matter how you map a Python function onto a numpy array, it stays a Python function, which means that for every evaluation:
the numpy array element must be converted to a Python object (e.g. a float);
all calculations are done with Python objects, which means the overhead of the interpreter, dynamic dispatch and immutable objects.
So which machinery is used to actually loop through the array doesn't play a big role because of the overhead mentioned above; it stays much slower than using numpy's built-in functionality.
Let's take a look at the following example:
# numpy-functionality
def f(x):
    return x+2*x*x+4*x*x*x

# python-function as ufunc
import numpy as np
vf = np.vectorize(f)
vf.__name__ = "vf"
np.vectorize is picked as a representative of the pure-python function class of approaches. Using perfplot (see code in the appendix of this answer) we get the following running times:
We can see that the numpy approach is 10x-100x faster than the pure Python version. The drop in performance for bigger array sizes is probably because the data no longer fits the cache.
It is also worth mentioning that vectorize uses a lot of memory, so memory usage is often the bottleneck (see the related SO question). Also note that numpy's documentation on np.vectorize states that it is "provided primarily for convenience, not for performance".
Other tools should be used when performance is desired. Besides writing a C extension from scratch, there are the following possibilities:
One often hears that numpy's performance is as good as it gets, because it is pure C under the hood. Yet there is a lot of room for improvement!
The vectorized numpy version uses a lot of additional memory and memory accesses. The numexpr library tries to tile the numpy arrays and thus get better cache utilization:
# less cache misses than numpy-functionality
import numexpr as ne
def ne_f(x):
    return ne.evaluate("x+2*x*x+4*x*x*x")
This leads to the following comparison:
I cannot explain everything in the plot above: we can see bigger overhead for the numexpr library at the beginning, but because it utilizes the cache better, it is about 10 times faster for bigger arrays!
Another approach is to jit-compile the function and thus get a real, pure-C ufunc. This is numba's approach:
# runtime generated C-function as ufunc
import numba as nb

@nb.vectorize(target="cpu")
def nb_vf(x):
    return x+2*x*x+4*x*x*x
It is 10 times faster than the original numpy-approach:
However, the task is embarrassingly parallel, so we could also use prange to calculate the loop in parallel:
@nb.njit(parallel=True)
def nb_par_jitf(x):
    y = np.empty(x.shape)
    for i in nb.prange(len(x)):
        y[i] = x[i]+2*x[i]*x[i]+4*x[i]*x[i]*x[i]
    return y
As expected, the parallel function is slower for smaller inputs, but faster (almost factor 2) for larger sizes:
While numba specializes in optimizing operations with numpy arrays, Cython is a more general tool. It is more complicated to extract the same performance as with numba; often it comes down to llvm (numba) vs the local compiler (gcc/MSVC):
%%cython -c=/openmp -a
import numpy as np
import cython

# single core:
@cython.boundscheck(False)
@cython.wraparound(False)
def cy_f(double[::1] x):
    y_out = np.empty(len(x))
    cdef Py_ssize_t i
    cdef double[::1] y = y_out
    for i in range(len(x)):
        y[i] = x[i]+2*x[i]*x[i]+4*x[i]*x[i]*x[i]
    return y_out

# parallel:
from cython.parallel import prange

@cython.boundscheck(False)
@cython.wraparound(False)
def cy_par_f(double[::1] x):
    y_out = np.empty(len(x))
    cdef double[::1] y = y_out
    cdef Py_ssize_t i
    cdef Py_ssize_t n = len(x)
    for i in prange(n, nogil=True):
        y[i] = x[i]+2*x[i]*x[i]+4*x[i]*x[i]*x[i]
    return y_out
Cython results in somewhat slower functions:
Conclusion
Obviously, testing only one function doesn't prove anything. Also, one should keep in mind that for the chosen example function, memory bandwidth was the bottleneck for sizes larger than 10^5 elements; thus we had the same performance for numba, numexpr and cython in this region.
In the end, the ultimate answer depends on the type of function, hardware, Python distribution and other factors. For example, the Anaconda distribution uses Intel's VML for numpy's functions and thus easily outperforms numba (unless it uses SVML, see this SO post) for transcendental functions like exp, sin, cos and similar; see e.g. the following SO post.
Yet from this investigation and from my experience so far, I would say that numba seems to be the easiest tool with the best performance, as long as no transcendental functions are involved.
Plotting running times with perfplot-package:
import perfplot

perfplot.show(
    setup=lambda n: np.random.rand(n),
    n_range=[2**k for k in range(0, 24)],
    kernels=[
        f,
        vf,
        ne_f,
        nb_vf, nb_par_jitf,
        cy_f, cy_par_f,
    ],
    logx=True,
    logy=True,
    xlabel='len(x)',
)
squares = squarer(x)
Arithmetic operations on arrays are automatically applied elementwise, with efficient C-level loops that avoid all the interpreter overhead that would apply to a Python-level loop or comprehension.
Most of the functions you'd want to apply to a NumPy array elementwise will just work, though some may need changes. For example, if statements don't work elementwise. You'd want to convert those to use constructs like numpy.where:
def using_if(x):
    if x < 5:
        return x
    else:
        return x**2
becomes
def using_where(x):
    return numpy.where(x < 5, x, x**2)
It seems that no one has mentioned a built-in factory method for producing ufuncs in the numpy package: np.frompyfunc. I have tested it against np.vectorize, and it outperformed it by about 20-30%. Of course it will not perform as well as prescribed C code or even numba (which I have not tested), but it can be a better alternative than np.vectorize:
f = lambda x, y: x * y
f_arr = np.frompyfunc(f, 2, 1)
vf = np.vectorize(f)
arr = np.linspace(0, 1, 10000)
%timeit f_arr(arr, arr) # 307ms
%timeit vf(arr, arr) # 450ms
I have also tested larger samples, and the improvement is proportional. See also the documentation here.
Edit: the original answer was misleading; np.sqrt was applied directly to the array, just with a small overhead.
In multidimensional cases where you want to apply a builtin function that operates on a 1d array, numpy.apply_along_axis is a good choice, also for more complex function compositions from numpy and scipy.
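For instance, a minimal sketch (the row function here is arbitrary, chosen just for illustration) of applying a 1d function to every row of a 2d array:
import numpy as np

a = np.arange(12).reshape(3, 4)

# apply a 1-D function (here a reversed cumulative sum) to each row;
# apply_along_axis calls it once per row and stacks the results
result = np.apply_along_axis(lambda row: np.cumsum(row[::-1])[::-1], 1, a)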
Previous misleading statement:
Adding the method:
def along_axis(x):
    return np.apply_along_axis(f, 0, x)
to the perfplot code gives performance results close to np.sqrt.
I believe in newer versions of numpy (I use 1.13) you can simply call the function by passing the numpy array to the function that you wrote for the scalar type; it will automatically apply the function call to each element of the numpy array and return another numpy array:
>>> import numpy as np
>>> squarer = lambda t: t ** 2
>>> x = np.array([1, 2, 3, 4, 5])
>>> squarer(x)
array([ 1, 4, 9, 16, 25])
As mentioned in this post, just use generator expressions like so:
numpy.fromiter((<some_func>(x) for x in <something>),<dtype>,<size of something>)
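For example, a concrete instance of that template (reusing the squaring function from the question):
import numpy as np

x = np.array([1, 2, 3, 4, 5])
squarer = lambda t: t ** 2

# the generator expression feeds fromiter; count is optional,
# but passing it lets numpy preallocate the output array
squares = np.fromiter((squarer(xi) for xi in x), dtype=x.dtype, count=len(x))
# array([ 1,  4,  9, 16, 25])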
All the above answers compare well, but what if you need to use a custom function for mapping, you have a numpy.ndarray, and you need to retain the shape of the array?
I have compared just two approaches, but both retain the shape of the ndarray. I have used an array with 1 million entries for comparison. Here I use the square function, which is also built into numpy and gives a great performance boost; since the example just needs some function, you can use a function of your choice.
import numpy, time

def timeit():
    y = numpy.arange(1000000)

    now = time.time()
    numpy.array([x * x for x in y.reshape(-1)]).reshape(y.shape)
    print(time.time() - now)

    now = time.time()
    numpy.fromiter((x * x for x in y.reshape(-1)), y.dtype).reshape(y.shape)
    print(time.time() - now)

    now = time.time()
    numpy.square(y)
    print(time.time() - now)
Output
>>> timeit()
1.162431240081787 # list comprehension and then building numpy array
1.0775556564331055 # from numpy.fromiter
0.002948284149169922 # using inbuilt function
Here you can clearly see that numpy.fromiter works well compared to the simple approach, and if a built-in function is available, please use that.
Use numpy.fromfunction(function, shape, **kwargs)
See "https://docs.scipy.org/doc/numpy/reference/generated/numpy.fromfunction.html"
I have a numpy array of dimension (i, j) in which I would like to sum over the first dimension to receive an array of shape (j,). Normally, I'd use NumPy's own sum:
import numpy
a = numpy.random.rand(100, 77)
numpy.sum(a, axis=0)
but in my case it doesn't cut it: Some of the sums are very ill-conditioned, so the computed results only have a few correct digits.
math.fsum is fantastic at keeping the errors at bay, but it only applies to iterables of one dimension. numpy.vectorize doesn't do the job either.
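To make the precision issue concrete, here is a minimal illustration (made-up numbers, not from my actual data) of the kind of cancellation where plain summation loses digits but math.fsum does not:
import math

a = [1e16, 1.0, -1e16]
print(sum(a))        # 0.0  -- the 1.0 is lost to intermediate rounding
print(math.fsum(a))  # 1.0  -- exact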
How can I efficiently apply math.fsum to an array of multiple dimensions?
This one works fast enough for me.
import numpy
import math
a = numpy.random.rand(100, 77)
a = numpy.swapaxes(a, 0, 1)
a = numpy.array([math.fsum(row) for row in a])
Hopefully it's the axis you are looking for (returns 77 sums).
Check out the signature keyword to vectorize.
_math_fsum_vec = numpy.vectorize(math.fsum, signature='(m)->()')
Unfortunately, it's slower than the for solution:
Code to reproduce the plot:
import math
import numpy
import perfplot
_math_fsum_vec = numpy.vectorize(math.fsum, signature='(m)->()')
def fsum_vectorize(a):
return _math_fsum_vec(a.T).T
def fsum_for(a):
return numpy.array([math.fsum(row) for row in a.T])
perfplot.save(
'fsum.png',
setup=lambda n: numpy.random.rand(n, 100),
kernels=[fsum_vectorize, fsum_for],
n_range=[2**k for k in range(12)],
logx=True,
logy=True,
)
I have a numpy array with rows of different sizes
a = np.array([[1,2,3,4,5],[1,2,3],[1]])
and I would like to turn it into a dense matrix (fixed n x m size, no variable-length rows). Until now I have tried something like this:
size = (len(a),5)
result = np.zeros(size)
result[[0],[len(a[0])]]=a[0]
But I receive an error telling me
shape mismatch: value array of shape (5,) could not be broadcast to
indexing result of shape (1,)
I also tried padding with np.pad, but according to the documentation of numpy.pad it seems I need to specify in pad_width the previous size of the rows (which is variable, and produced errors when I tried -1, 0, and the biggest row size).
I know I can do it by padding lists per row as shown here, but I need to do that with a much bigger array of data.
If someone can help me with the answer to this question, I would be glad to know of it.
There's really no way to pad a jagged array such that it loses its jaggedness without iterating over the rows of the array. You'll even have to iterate over the array twice: once to find the maximum length you need to pad to, and again to actually do the padding.
The code proposal you've linked to will get the job done, but it's not very efficient, because it adds zeroes in a python for-loop that iterates over the elements of the rows, whereas that appending could have been precalculated, thereby pushing more of that code to C.
The code below precomputes an array of the required minimal dimensions, filled with zeroes and then simply adds the row from the jagged array M in place, which is far more efficient.
import random
import numpy as np
M = [[random.random() for n in range(random.randint(0,m))] for m in range(10000)] # play-data
def pad_to_dense(M):
    """Appends the minimal required amount of zeroes at the end of each
    array in the jagged array `M`, such that `M` loses its jaggedness."""
    maxlen = max(len(r) for r in M)

    Z = np.zeros((len(M), maxlen))
    for enu, row in enumerate(M):
        Z[enu, :len(row)] += row
    return Z
To give you some idea for speed:
from timeit import timeit
n = [10, 100, 1000, 10000]
s = [timeit(stmt='Z = pad_to_dense(M)', setup='from __main__ import pad_to_dense; import numpy as np; from random import random, randint; M = [[random() for n in range(randint(0,m))] for m in range({})]'.format(ni), number=1) for ni in n]
print('\n'.join(map(str,s)))
# 7.838103920221329e-05
# 0.0005027339793741703
# 0.01208890089765191
# 0.8269036808051169
If you want to prepend zeroes to the arrays, rather than append, that's a simple enough change to the code, which I'll leave to you.
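For reference, a minimal sketch of that prepend variant (same structure as pad_to_dense above; only the slice changes, and the function name is just illustrative):
def pad_to_dense_front(M):
    """Like pad_to_dense, but puts the zeroes in front of each row."""
    maxlen = max(len(r) for r in M)
    Z = np.zeros((len(M), maxlen))
    for enu, row in enumerate(M):
        # write the row into the rightmost len(row) slots instead of the leftmost
        Z[enu, maxlen - len(row):] += row
    return Z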
You can do something like this with numpy.pad
import numpy as np
a = np.array([[1,2,3,4,5],[1,2,3],[1]])
l = np.array([len(a[i]) for i in range(len(a))])
width = l.max()
b = []
for i in range(len(a)):
    if len(a[i]) != width:
        x = np.pad(a[i], (0, width - len(a[i])), 'constant', constant_values=0)
    else:
        x = a[i]
    b.append(x)
b = np.array(b)
print(b)
The above piece of code outputs something like this:
b = [[1, 2, 3, 4, 5],
[1, 2, 3, 0, 0],
[1, 0, 0, 0, 0]]
You can read back your input version of the data by doing something like the following:
a = []
for i in range(len(b)):
    a.append(b[i][0:l[i]])
a = np.array(a)
print(a)
where you get the following output
a = array([array([1, 2, 3, 4, 5]), array([1, 2, 3]), array([1])], dtype=object)
Hopefully this helps someone who struggled like me to solve the issue.
Thank you.
I have an array of values a = (2,3,0,0,4,3)
y = 0
for x in a:
    y = (y + x) * .95
Is there any way to use cumsum in numpy and apply the .95 decay to each row before adding the next value?
You're asking for a simple IIR Filter. Scipy's lfilter() is made for that:
import numpy as np
from scipy.signal import lfilter
data = np.array([2, 3, 0, 0, 4, 3], dtype=float) # lfilter wants floats
# Conventional approach:
result_conv = []
last_value = 0
for elmt in data:
    last_value = (last_value + elmt) * .95
    result_conv.append(last_value)

# IIR Filter:
result_IIR = lfilter([.95], [1, -.95], data)

if np.allclose(result_IIR, result_conv, 1e-12):
    print("Values are equal.")
If you're only dealing with a 1D array, then short of scipy conveniences or writing a custom reduce ufunc for numpy, in Python 3.3+ you can use itertools.accumulate, e.g.:
from itertools import accumulate
a = (2,3,0,0,4,3)
y = list(accumulate(a, lambda x,y: (x+y)*0.95))
# [2, 4.75, 4.5125, 4.286875, 7.87253125, 10.3289046875]
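One caveat: accumulate seeds the running value with the first element of a, so its first result is 2 rather than the 1.9 produced by the loop in the question. On Python 3.8+ you can pass initial=0 and drop the leading zero to match the loop exactly:
from itertools import accumulate

a = (2, 3, 0, 0, 4, 3)
# seed with 0 (as in the original loop) and drop the seed from the output
y = list(accumulate(a, lambda x, y: (x + y) * 0.95, initial=0))[1:]
# first value is now 1.9, matching the loop in the question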
Numba provides an easy way to vectorize a function, creating a universal function (thus providing ufunc.accumulate):
import numpy
from numba import vectorize, float64

@vectorize([float64(float64, float64)])
def f(x, y):
    return 0.95 * (x + y)
>>> a = numpy.array([2, 3, 0, 0, 4, 3])
>>> f.accumulate(a)
array([ 2. , 4.75 , 4.5125 , 4.286875 ,
7.87253125, 10.32890469])
I don't think that this can be done easily in NumPy alone, without using a loop.
One array-based idea would be to calculate the matrix M_ij = .95**i * a[N-j] (where N is the number of elements in a). The numbers that you are looking for are found by summing entries diagonally (with i-j constant). You could thus use multiple numpy.diagonal(…).sum() calls.
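In the same array-based spirit, here is a minimal sketch using an explicit lower-triangular weight matrix instead of the diagonal trick (it reproduces the loop's result, at the cost of building an n x n matrix, so it only pays off for moderate n):
import numpy as np

a = np.array([2, 3, 0, 0, 4, 3], dtype=float)
decay = 0.95
n = len(a)

# W[i, j] = decay**(i - j + 1) for j <= i and 0 above the diagonal,
# so that result[i] = sum_j W[i, j] * a[j] matches the recurrence
i, j = np.ogrid[:n, :n]
W = np.where(j <= i, decay ** (i - j + 1), 0.0)

result = W @ a
# result[0] == 1.9, result[1] == 4.655, ... as in the original loop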
The good old algorithm that you outline is clearer and probably quite fast already (otherwise you can use Cython).
Doing what you want through NumPy without a single loop sounds like wizardry to me. Hats off to anybody who can pull this off.
I have a large matrix A of shape (n, n, 3, 3), where n is about 5000. Now I want to find the inverse and transpose of matrix A:
import numpy as np
A = np.random.rand(1000, 1000, 3, 3)
identity = np.identity(3, dtype=A.dtype)
Ainv = np.zeros_like(A)
Atrans = np.zeros_like(A)
for i in range(1000):
    for j in range(1000):
        Ainv[i, j] = np.linalg.solve(A[i, j], identity)
        Atrans[i, j] = np.transpose(A[i, j])
Is there a faster, more efficient way to do this?
This is taken from a project of mine, where I also do vectorized linear algebra on many 3x3 matrices.
Note that there is only a loop over 3, not a loop over n, so the code is vectorized in the important dimensions. I don't want to vouch for how this compares to a C/numba extension doing the same thing, performance-wise; that is likely to be substantially faster still, but at least this blows the loops over n out of the water.
import numpy as np

def adjoint(A):
    """compute inverse without division by det; ...xv3xc3 input, or array of matrices assumed"""
    AI = np.empty_like(A)
    for i in range(3):
        AI[..., i, :] = np.cross(A[..., i-2, :], A[..., i-1, :])
    return AI

def inverse_transpose(A):
    """
    efficiently compute the inverse-transpose for stack of 3x3 matrices
    """
    I = adjoint(A)
    det = dot(I, A).mean(axis=-1)
    return I / det[..., None, None]

def inverse(A):
    """inverse of a stack of 3x3 matrices"""
    return np.swapaxes(inverse_transpose(A), -1, -2)

def dot(A, B):
    """dot arrays of vecs; contract over last indices"""
    return np.einsum('...i,...i->...', A, B)

A = np.random.rand(2, 2, 3, 3)
I = inverse(A)
print(np.einsum('...ij,...jk', A, I))
for the transpose:
testing a bit in ipython showed:
In [1]: import numpy
In [2]: x = numpy.ones((5,6,3,4))
In [3]: numpy.transpose(x,(0,1,3,2)).shape
Out[3]: (5, 6, 4, 3)
so you can just do
Atrans = numpy.transpose(A,(0,1,3,2))
to transpose the second and third dimensions (while leaving dimension 0 and 1 the same)
for the inversion:
the last example of http://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.inv.html#numpy.linalg.inv
Inverses of several matrices can be computed at once:
>>> from numpy.linalg import inv
>>> a = np.array([[[1., 2.], [3., 4.]], [[1, 3], [3, 5]]])
>>> inv(a)
array([[[-2.  ,  1.  ],
        [ 1.5 , -0.5 ]],

       [[-1.25,  0.75],
        [ 0.75, -0.25]]])
So I guess in your case, the inversion can be done with just
Ainv = inv(A)
and it will know that the last two dimensions are the ones it is supposed to invert over, and that the first dimensions are just how you stacked your data. This should be much faster
speed difference
for the transpose: your method needs 3.77557015419 sec, and mine needs 2.86102294922e-06 sec (which is a speedup of over 1 million times)
for the inversion: I guess my numpy version is not high enough to try that numpy.linalg.inv trick with (n,n,3,3) shape to see the speedup there (my version is 1.6.2, and the docs I based my solution on are for 1.8, but it should work on 1.8; can someone else test that?)
Numpy has the array.T property, which is a shortcut for transpose.
For inversions, you use np.linalg.inv(A).
As posted by wim, A.I also works on a matrix, e.g.
print (A.I)
For a numpy matrix object, use matrix.getI, e.g.
A=numpy.matrix('1 3;5 6')
print (A.getI())