NumPy vectorization without the use of numpy.vectorize - python

I've found myself using NumPy arrays for memory management and speed of computation more and more lately, on large volumes of structured data (such as points and polygons). In doing so, there is always a situation where I need to perform some function f(x) on the entire array. From experience, and Googling, iterating over the array is not the way to do this; instead, the function should be vectorized and broadcast over the entire array.
Looking at the documentation for numpy.vectorize we get this example:
def myfunc(a, b):
    "Return a-b if a>b, otherwise return a+b"
    if a > b:
        return a - b
    else:
        return a + b
>>> vfunc = np.vectorize(myfunc)
>>> vfunc([1, 2, 3, 4], 2)
array([3, 4, 1, 2])
And per the docs, it really just creates a for loop, so it doesn't reach the lower-level C loops used for truly vectorized operations (whether in BLAS or via SIMD). So that got me wondering: if the above is "vectorized", what is this?:
def myfunc_2(a, b):
    cond = a > b
    a[cond] -= b
    a[~cond] += b
    return a
>>> myfunc_2(np.array([1, 2, 3, 4]), 2)
array([3, 4, 1, 2])
Or even this:
>>> a = np.array([1, 2, 3, 4])
>>> b = 2
>>> np.where(a > b, a - b, a + b)
array([3, 4, 1, 2])
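(For the timings below, assume myfunc_3 simply wraps that np.where call; it isn't defined anywhere above:)
def myfunc_3(a, b):
    # assumed definition: the np.where one-liner, wrapped so it can be timed
    return np.where(a > b, a - b, a + b)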
So I ran some tests on these, what I believe to be comparable examples:
>>> arr = np.random.randint(200, size=(1000000,))
>>> setup = 'from __main__ import vfunc, arr'
>>> timeit('vfunc(arr, 50)', setup=setup, number=1)
0.60175449999997
>>> arr = np.random.randint(200, size=(1000000,))
>>> setup = 'from __main__ import myfunc_2, arr'
>>> timeit('myfunc_2(arr, 50)', setup=setup, number=1)
0.07464979999997468
>>> arr = np.random.randint(200, size=(1000000,))
>>> setup = 'from __main__ import myfunc_3, arr'
>>> timeit('myfunc_3(arr, 50)', setup=setup, number=1)
0.0222587000000658
And with a larger number of iterations:
>>> arr = np.random.randint(200, size=(1000000,))
>>> setup = 'from __main__ import vfunc, arr'
>>> timeit('vfunc(arr, 50)', setup=setup, number=1000)
621.5853878000003
>>> arr = np.random.randint(200, size=(1000000,))
>>> setup = 'from __main__ import myfunc_2, arr'
>>> timeit('myfunc_2(arr, 50)', setup=setup, number=1000)
98.19819199999984
>>> arr = np.random.randint(200, size=(1000000,))
>>> setup = 'from __main__ import myfunc_3, arr'
>>> timeit('myfunc_3(arr, 50)', setup=setup, number=1000)
26.128515100000186
Clearly the other options are major improvements over numpy.vectorize. This leads me to wonder why anybody would use numpy.vectorize at all, when you can write what appear to be "purely vectorized" functions or use batteries-included functions like numpy.where.
Now for the questions:
What are the requirements to say a function is "vectorized" if not converted via numpy.vectorize? Just broadcastable in its entirety?
How does NumPy determine if a function is "vectorized"/broadcastable?
Why isn't this form of vectorization documented anywhere? (I.e., why doesn't NumPy have a "How to write a vectorized function" page?)

"vectorization" can mean be different things depending on context. Use of low level C code with BLAS or SIMD is just one.
In physics 101, a vector represents a point or velocity whose numeric representation can vary with coordinate system. Thus I think of "vectorization", broadly speaking, as performing math operations on the "whole" array, without explicit control over numerical elements.
numpy basically adds an ndarray class to Python. It has a large number of methods (and operators and ufuncs) that do indexing and math in compiled code (not necessarily using processor-specific SIMD). The big speed gain, relative to Python-level iteration, comes from using compiled code optimized for the ndarray data structure. Python-level iteration (interpreted code) over arrays is actually slower than over equivalent lists.
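A quick sketch of that gap (numbers are machine dependent, illustrative only):
from timeit import timeit
import numpy as np

arr = np.arange(1000000)
lst = arr.tolist()

timeit(lambda: arr + 1, number=100)               # compiled whole-array ufunc call: fastest
timeit(lambda: [x + 1 for x in lst], number=100)  # interpreted loop over a list
timeit(lambda: [x + 1 for x in arr], number=100)  # interpreted loop over an array: slowest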
I don't think numpy formally defines "vectorization". There isn't a "vector" class. I haven't searched the documentation for those terms. Here, and possibly on other forums, it just means, writing code that makes optimal use of ndarray methods. Generally that means avoiding python level iteration.
np.vectorize is a tool for applying arrays to functions that only accept scalar inputs. It does not compile or otherwise "look inside" that function. But it does accept and apply arguments with full broadcasting, as in:
In [162]: vfunc(np.arange(3)[:,None],np.arange(4))
Out[162]:
array([[0, 1, 2, 3],
       [1, 2, 3, 4],
       [2, 1, 4, 5]])
Speedwise np.vectorize is slower than the equivalent list comprehension, at least for smaller sample cases. Recent testing shows that it scales better, so for large inputs it may be better. But still the performance is nothing like your myfunc_2.
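For reference, the equivalent list comprehension for the 1d timing case would be something like:
np.array([myfunc(x, 50) for x in arr])  # one Python-level function call per element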
myfunc is not "vectorized" simply because expressions like if a > b do not work with arrays.
np.where(a > b, a - b, a + b) is "vectorized" because all arguments to the where work with arrays, and where itself uses them with full broadcasting powers.
In [163]: a,b = np.arange(3)[:,None], np.arange(4)
In [164]: np.where(a>b, a-b, a+b)
Out[164]:
array([[0, 1, 2, 3],
       [1, 2, 3, 4],
       [2, 1, 4, 5]])
myfunc_2 is "vectorized", at least with respect to a:
In [168]: myfunc_2(a,2)
Out[168]:
array([[4],
       [1],
       [2]])
It does not work when b is an array; it's trickier to match the a[cond] shape with anything but a scalar:
In [169]: myfunc_2(a,b)
Traceback (most recent call last):
  Input In [169], in <cell line: 1>
    myfunc_2(a,b)
  Input In [159], in myfunc_2
    a[cond] -= b
IndexError: boolean index did not match indexed array along dimension 1; dimension is 1 but corresponding boolean dimension is 4
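For what it's worth, a broadcast-safe variant is possible if you broadcast explicitly and mask b with the same condition (my sketch, not part of the original function):
def myfunc_2b(a, b):
    a, b = np.broadcast_arrays(a, b)
    out = a.copy()            # broadcast views can be read-only; copy before writing
    cond = a > b
    out[cond] -= b[cond]
    out[~cond] += b[~cond]
    return out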
===
What are the requirements to say a function is "vectorized" if not converted via numpy.vectorize? Just broadcastable in its entirety?
In your examples, myfunc is not "vectorized" because it only works with scalars. vfunc is fully "vectorized", but not faster. where is also "vectorized" and (probably) faster, though this may be scale dependent. myfunc_2 is only "vectorized" with respect to a.
How does NumPy determine if a function is "vectorized"/broadcastable?
numpy doesn't determine anything like this. numpy provides the ndarray class and its many methods; it's the use of those methods that makes a block of code "vectorized".
Why isn't this form of vectorization documented anywhere? (I.e., why doesn't NumPy have a "How to write a vectorized function" page?)
Keep in mind the distinction between "vectorization" as a performance strategy, and the basic idea of operating on whole arrays.

Vectorize Documentation
The documentation provides a good example in mypolyval: there is no good way to write it as a where condition or with simple whole-array logic.
def mypolyval(p, x):
    _p = list(p)
    res = _p.pop(0)
    while _p:
        res = res*x + _p.pop(0)
    return res
vpolyval = np.vectorize(mypolyval, excluded=['p'])
vpolyval(p=[1, 2, 3], x=[0, 1])
array([3, 6])
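For comparison, when a built-in does exist it's usually preferable: NumPy's own np.polyval already accepts an array of points directly.
>>> np.polyval([1, 2, 3], [0, 1])
array([3, 6])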
That is, np.vectorize is exactly what the reference documentation says it is: a convenience that lets you write code in the same style as truly vectorized functions, without the performance benefits.
As for the documentation telling you how to write vectorized code: it does, in the relevant reference pages. It says exactly what you quoted above:
The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.
Remember: the documentation is an API reference guide with some additional caveats: it's not a NumPy tutorial.
UFunc Documentation
The appropriate reference documentation and glossary document this clearly:
A universal function (or ufunc for short) is a function that operates on ndarrays in an element-by-element fashion, supporting array broadcasting, type casting, and several other standard features. That is, a ufunc is a “vectorized” wrapper for a function that takes a fixed number of specific inputs and produces a fixed number of specific outputs. For detailed information on universal functions, see Universal functions (ufunc) basics.
NumPy hands off array processing to C, where looping and computation are much faster than in Python. To exploit this, programmers using NumPy eliminate Python loops in favor of array-to-array operations. "Vectorization" can refer both to this C offloading and to structuring NumPy code to take advantage of it.
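For instance, np.add is a ufunc, so broadcasting and the standard ufunc machinery (out=, reduce, etc.) come for free:
>>> a, b = np.arange(3)[:, None], np.arange(4)
>>> np.add(a, b)  # broadcasts (3,1) with (4,) to (3,4)
array([[0, 1, 2, 3],
       [1, 2, 3, 4],
       [2, 3, 4, 5]])
>>> np.add.reduce([1, 2, 3, 4])  # ufuncs also expose reductions
10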
Summary
Simply put, np.vectorize is for code legibility so you can write similar code to actually vectorized ufuncs. It is not for performance, but there are times when you have no good alternative.

Related

Faster alternative to using the `map` function

I have a function f, for example:
def f(x):
    return x**2
and want to obtain an array consisting of f evaluated at each point of an interval, for example the unit interval (0, 1). We can do this as follows:
import numpy as np
X = np.arange(0,1,0.01)
arr = np.array(list(map(f, X)))
However, this last line is very time consuming when the function is complicated (in my case it involves some integrals). Is there a way to do this faster? I am happy to have a non-elegant solution - the focus is on speed.
You could use a list comprehension to slightly decrease runtime.
arr = [f(x) for x in range(0, 5)] # range is the interval
This should work, though it will only slightly decrease runtime. You shouldn't be worried about runtime unless you're mapping over very large inputs.
If f is so complicated that it can't be expressed in terms of compiled array operations, and can only take scalars, I have found that frompyfunc gives the best performance (about 2x compared to an explicit loop).
In [76]: def f(x):
    ...:     return x**2
    ...:
In [77]: foo = np.frompyfunc(f,1,1)
In [78]: foo(np.arange(4))
Out[78]: array([0, 1, 4, 9], dtype=object)
In [79]: foo(np.arange(4)).astype(int)
Out[79]: array([0, 1, 4, 9])
It returns dtype=object, so the result needs an astype. np.vectorize uses frompyfunc as well, but is a bit slower. Both generalize to various shapes of input arrays.
For a 1d result, fromiter works with the map (dropping the list conversion):
In [84]: np.fromiter((f(x) for x in range(4)),int)
Out[84]: array([0, 1, 4, 9])
In [86]: np.fromiter(map(f, range(4)),int)
Out[86]: array([0, 1, 4, 9])
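If the output length is known up front, fromiter's count argument lets it preallocate the result instead of growing it:
np.fromiter(map(f, range(4)), int, count=4)   # array([0, 1, 4, 9])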
You'll have to do your own timings in a realistic case.
Use operations that operate on entire arrays. For example, with a function that just squares the input (slightly corrected from your example):
def f(x):
    return x**2
then you'd just do
arr = f(X)
because NumPy defines operators like ** to operate on entire arrays at once.
Your real function might not be quite as straightforward. You say there are integrals involved; to make whole-array operations work with that, you might have to pass arguments differently or change what you're using to compute the integrals. In general, though, whole-array operations will vastly outperform anything that needs to call Python-level code in a loop.
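As a hypothetical illustration, if f happened to be built from ufuncs, nothing about the call changes:
import numpy as np

def f(x):
    # np.sin and ** are ufuncs, so this accepts scalars and whole arrays alike
    return np.sin(x) ** 2 + 3 * x

X = np.arange(0, 1, 0.01)
arr = f(X)  # all 100 points evaluated in compiled code, no Python loop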
You could try numpy.vectorize. It's a very good way to apply a function to a list or array:
import numpy as np

def foo(x):
    return x**2

foo = np.vectorize(foo)
arr = np.arange(10)
In [1]: foo(arr)
Out[1]: array([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81])

When are generators converted to lists in Dask?

In Dask, when do generators get converted to lists, or are they generally consumed lazily?
For example, with the code:
from collections import Counter
import numpy as np
import dask.bag as db
def foo(n):
    for _ in range(n):
        yield np.random.randint(10)

def add_to_count(acc, x):
    acc.update(x)
    return acc

def add(x, y):
    return x + y
b1 = db.from_sequence([1,2,3,4,5])
b2 = b1.map(foo)
result = b2.fold(add_to_count, add, Counter())
I get the following output, where the generators
have (reasonably) been converted to lists for me to inspect:
>>> b2.compute()
[[5], [5, 6], [3, 6, 1], [5, 6, 6, 0], [5, 6, 6, 0, 3]]
While reasonable, it differs from how I usually expect generators to behave in Python, which would be to require an explicit conversion to a list.
So, when computing the fold (result.compute()),
is the input argument x of add_to_count
a generator, or has it already been converted to a list?
I'm interested in the case where the lists are very long,
and so lazy evaluation is more efficient, say,
b1 = db.from_sequence([10**6]*10).
I'm guessing I could also solve the above problem with bag.frequencies, but I have similar concerns about lazy evaluation and efficient reduction.
Is there a fundamental aspect of Dask that I'm not grokking, or am I just being lazy, and where could I have looked into the code to figure this out myself?
Not exactly appropriate, but I'll provide the answer to a slightly different question:
Dask.bag adds in defensive calls to `list` for you, just in case you decide to branch out and use the bag twice in a single computation:
x = b.map(func1)
y = b.map(func2)
compute(x.frequencies(), y.frequencies())
This is also useful when using backends like multiprocessing or distributed because we can't send generators across a process boundary, but can send lists.
However, these defensive calls to list are optimized away before computation when possible in an effort to promote laziness.
In summary, everything should just work the way you want when possible, but will revert to concrete non-lazy values when laziness would get in the way of correctness.
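A small sketch of the branching case described above (names are illustrative):
import dask.bag as db
from dask import compute

b = db.from_sequence(range(10))
x = b.map(lambda n: n % 3)
y = b.map(lambda n: n % 2)
# b feeds two downstream graphs; dask may materialize its partitions as
# lists so that both branches (and non-local schedulers) can consume them
fx, fy = compute(x.frequencies(), y.frequencies())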

how to speed up enumerate for numpy array / how to enumerate over numpy array efficiently?

I need to generate a lot of random numbers. I've tried using random.random, but this function is quite slow, so I switched to numpy.random.random, which is way faster! So far so good. The generated random numbers are actually used to calculate something (based on each number), so I enumerate over each number and replace the value. This seems to kill all my previously gained speedup. Here are the stats generated with timeit():
test_random - no enumerate
0.133111953735
test_np_random - no enumerate
0.0177130699158
test_random - enumerate
0.269361019135
test_np_random - enumerate
1.22525310516
As you can see, generating the numbers is almost 10 times faster using numpy, but enumerating over them erases that gain; with the Python loop, the numpy version is actually several times slower.
Below is the code that I'm using:
import numpy as np
import timeit
import random

NBR_TIMES = 10
NBR_ELEMENTS = 100000

def test_random(do_enumerate=False):
    y = [random.random() for i in range(NBR_ELEMENTS)]
    if do_enumerate:
        for index, item in enumerate(y):
            # overwrite the y value, in reality this will be some function of 'item'
            y[index] = 1 + item

def test_np_random(do_enumerate=False):
    y = np.random.random(NBR_ELEMENTS)
    if do_enumerate:
        for index, item in enumerate(y):
            # overwrite the y value, in reality this will be some function of 'item'
            y[index] = 1 + item

if __name__ == '__main__':
    from timeit import Timer

    t = Timer("test_random()", "from __main__ import test_random")
    print "test_random - no enumerate"
    print t.timeit(NBR_TIMES)

    t = Timer("test_np_random()", "from __main__ import test_np_random")
    print "test_np_random - no enumerate"
    print t.timeit(NBR_TIMES)

    t = Timer("test_random(True)", "from __main__ import test_random")
    print "test_random - enumerate"
    print t.timeit(NBR_TIMES)

    t = Timer("test_np_random(True)", "from __main__ import test_np_random")
    print "test_np_random - enumerate"
    print t.timeit(NBR_TIMES)
What's the best way to speed this up and why does enumerate slow things down so dramatically?
EDIT: the reason I use enumerate is because I need both the index and the value of the current element.
To take full advantage of numpy's speed, you want to create ufuncs whenever possible. Applying vectorize to a function as mgibsonbr suggests is one way to do that, but a better way, if possible, is simply to construct a function that takes advantage of numpy's built-in ufuncs. So something like this:
>>> import numpy
>>> a = numpy.random.random(10)
>>> a + 1
array([ 1.29738145,  1.33004628,  1.45825441,  1.46171177,  1.56863326,
        1.58502855,  1.06693054,  1.93304272,  1.66056379,  1.91418473])
>>> (a + 1) * 0.25 / 4
array([ 0.08108634,  0.08312789,  0.0911409 ,  0.09135699,  0.09803958,
        0.09906428,  0.06668316,  0.12081517,  0.10378524,  0.11963655])
What is the nature of the function you want to apply across the numpy array? If you tell us, perhaps we can help you come up with a version that uses only numpy ufuncs.
It's also possible to generate an array of indices without using enumerate. Numpy provides ndenumerate, which is an iterator and probably slower, but it also provides indices, a very quick way to generate the indices corresponding to the values in an array. So...
>>> numpy.indices(a.shape)
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
So to be more explicit, you can use the above and combine them using numpy.rec.fromarrays:
>>> a = numpy.random.random(10)
>>> ind = numpy.indices(a.shape)
>>> numpy.rec.fromarrays([ind[0], a])
rec.array([(0, 0.092473494150913438), (1, 0.20853257641948986),
           (2, 0.35141455604686067), (3, 0.12212258656960817),
           (4, 0.50986868372639049), (5, 0.0011439325711705139),
           (6, 0.50412473457942508), (7, 0.28973489788728601),
           (8, 0.20078799423168536), (9, 0.34527678271856999)],
          dtype=[('f0', '<i8'), ('f1', '<f8')])
It's starting to sound like your main concern is performing the operation in-place. That's harder to do using vectorize but it's easy with the ufunc approach:
>>> def somefunc(a):
...     a += 1
...     a /= 15
...
>>> a = numpy.random.random(10)
>>> b = a
>>> somefunc(a)
>>> a
array([ 0.07158446,  0.07052393,  0.07276768,  0.09813235,  0.09429439,
        0.08561703,  0.11204622,  0.10773558,  0.11878885,  0.10969279])
>>> b
array([ 0.07158446,  0.07052393,  0.07276768,  0.09813235,  0.09429439,
        0.08561703,  0.11204622,  0.10773558,  0.11878885,  0.10969279])
As you can see, numpy performs these operations in-place.
Check numpy.vectorize; it should let you apply arbitrary functions to numpy arrays. For your simple example, you'd do something like this:
vecFunc = numpy.vectorize(lambda x: x + 1)
vecFunc(y)
However, that will create a new numpy array instead of modifying it in-place (which may or may not be a problem in your particular case).
In general, you'll always be better off manipulating numpy structures with numpy functions than iterating with Python functions, since the former are not only optimized but implemented in C, while the latter are always interpreted.

Difference between frompyfunc and vectorize in numpy

What is the difference between vectorize and frompyfunc in numpy?
Both seem very similar. What is a typical use case for each of them?
Edit: As JoshAdel indicates, the class vectorize seems to be built upon frompyfunc. (see the source). It is still unclear to me whether frompyfunc may have any use case that is not covered by vectorize...
As JoshAdel points out, vectorize wraps frompyfunc. Vectorize adds extra features:
Copies the docstring from the original function
Allows you to exclude an argument from broadcasting rules.
Returns an array of the correct dtype instead of dtype=object
Edit: After some brief benchmarking, I find that vectorize is significantly slower (~50%) than frompyfunc for large arrays. If performance is critical in your application, benchmark your use-case first.
>>> a = numpy.indices((3,3)).sum(0)
>>> print a, a.dtype
[[0 1 2]
 [1 2 3]
 [2 3 4]] int32
>>> def f(x,y):
...     """Returns 2 times x plus y"""
...     return 2*x+y
>>> f_vectorize = numpy.vectorize(f)
>>> f_frompyfunc = numpy.frompyfunc(f, 2, 1)
>>> f_vectorize.__doc__
'Returns 2 times x plus y'
>>> f_frompyfunc.__doc__
'f (vectorized)(x1, x2[, out])\n\ndynamic ufunc based on a python function'
>>> f_vectorize(a,2)
array([[ 2,  4,  6],
       [ 4,  6,  8],
       [ 6,  8, 10]])
>>> f_frompyfunc(a,2)
array([[2, 4, 6],
       [4, 6, 8],
       [6, 8, 10]], dtype=object)
I'm not sure what the different use cases for each is, but if you look at the source code (/numpy/lib/function_base.py), you'll see that vectorize wraps frompyfunc. My reading of the code is mostly that vectorize is doing proper handling of the input arguments. There might be particular instances where you would prefer one vs the other, but it would seem that frompyfunc is just a lower level instance of vectorize.
Although both methods provide a way to build your own ufunc, numpy.frompyfunc always returns arrays of Python objects (dtype=object), while numpy.vectorize lets you specify the return dtype (via its otypes argument).
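A minimal sketch of that difference:
import numpy as np

f = lambda x: x + 0.5
np.frompyfunc(f, 1, 1)(np.arange(3))                # array([0.5, 1.5, 2.5], dtype=object)
np.vectorize(f, otypes=[np.float64])(np.arange(3))  # array([0.5, 1.5, 2.5])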

How to sort an integer array in-place in Python?

How can one sort an integer array (not a list) in-place in Python 2.6? Is there a suitable function in one of the standard libraries?
In other words, I'm looking for a function that would do something like this:
>>> a = array.array('i', [1, 3, 2])
>>> some_function(a)
>>> a
array('i', [1, 2, 3])
Thanks in advance!
Well, you can't do it with array.array, but you can with numpy.array:
In [3]: a = numpy.array([0,1,3,2], dtype=numpy.int)
In [4]: a.sort()
In [5]: a
Out[5]: array([0, 1, 2, 3])
Or you can convert directly from an array.array if you have that already:
a = array.array('i', [1, 3, 2])
a = numpy.array(a)
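If the copy itself is the problem (the comments mention memory pressure), here is a sketch that sorts the array.array's own buffer in place; it assumes the buffer is writable (array.array's is) and that 'i' is a 32-bit int on your platform:
import array
import numpy

a = array.array('i', [1, 3, 2])
view = numpy.frombuffer(a, dtype=numpy.int32)  # shares a's memory, no copy
view.sort()  # sorts the shared buffer, so `a` itself is now sorted
# a -> array('i', [1, 2, 3])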
As @Steven mentioned, numpy is the way to go here. Quoting from the NumPy documentation:
Copies vs. in-place operation
-----------------------------
Most of the functions in `numpy` return a copy of the array argument
(e.g., `sort`). In-place versions of these functions are often
available as array methods, i.e. ``x = np.array([1,2,3]); x.sort()``.
Exceptions to this rule are documented.
Looking at the array docs, I don't see a method for sorting. I think the following is about as close as you can get using standard functions, although it is really clobbering the old object with a new one with the same name:
import array
a = array.array('i', [1,3,2])
a = array.array('i', sorted(a))
Or, you could write your own.
With the extra information from the comments that you're maxing out memory, this seems inapplicable for your situation; the numpy solution is the way to go. However, I'll leave this up for reference.
