I'd like to take in a list of functions, funclist, and return a new function which takes in a list of arguments, arglist, and applies the ith function in funclist to the ith element of arglist, returning the results in a list:
def myfunc(funclist):
return lambda arglist: [ funclist[i](elt) for i, elt in enumerate(arglist) ]
This is not optimized for parallel/vectorized application of the independent functions in funclist to the independent arguments in arglist. Is there a built-in function in Python or numpy (or elsewhere) that returns a more optimized version of the lambda above? It would be similar in spirit to map or numpy.vectorize (but obviously not the same), and so far I haven't found anything.
In numpy terms, true vectorization means performing the iterative stuff in compiled code. Usually that requires using numpy functions that work with whole arrays, doing things like addition and indexing.
np.vectorize is a way of iterating over several arrays and using their elements in a function that does not handle arrays. It doesn't do much in compiled code, so it does not improve speed much. It's most valuable as a way of applying numpy broadcasting rules to your own scalar function.
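For instance (a minimal sketch; clip_diff is a made-up scalar-only function used just for illustration):
import numpy as np

# a hypothetical function that only handles scalars, not arrays
def clip_diff(a, b):
    return a - b if a > b else 0

vclip = np.vectorize(clip_diff)
print(vclip(np.arange(5), 2))                # scalar broadcast: [0 0 0 1 2]
print(vclip(np.arange(5), [4, 3, 2, 1, 0]))  # elementwise:      [0 0 0 2 4]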
map is a variant on the list comprehension, and has basically the same speed, while a list comprehension has more expressive power; both can work with several lists.
@Tore's zipped comprehension is a clear expression of this task:
[f(args) for f, args in zip(funclist, arglist)]
map can work with several input lists:
In [415]: arglist=[np.arange(3),np.arange(1,4)]
In [416]: fnlist=[np.sum, np.prod]
In [417]: [f(a) for f,a in zip(fnlist, arglist)]
Out[417]: [3, 6]
In [418]: list(map(lambda f,a: f(a), fnlist, arglist))
Out[418]: [3, 6]
Your version is a little wordier, but functionally the same.
In [423]: def myfunc(funclist):
...: return lambda arglist: [ funclist[i](elt) for i, elt in enumerate(arglist) ]
In [424]: myfunc(fnlist)
Out[424]: <function __main__.myfunc.<locals>.<lambda>>
In [425]: myfunc(fnlist)(arglist)
Out[425]: [3, 6]
It has the advantage of generating a function that can be applied to different arglists:
In [426]: flist=myfunc(fnlist)
In [427]: flist(arglist)
Out[427]: [3, 6]
In [428]: flist(arglist[::-1])
Out[428]: [6, 0]
I would have written myfunc more like:
def altfun(funclist):
    def foo(arglist):
        return [f(a) for f, a in zip(funclist, arglist)]
    return foo
but the differences are just stylistic.
================
Time test for zip vs. enumerate (with N = 1000, consistent with the Out[156] total of 499500000):
In [154]: funclist=[sum]*N
In [155]: arglist=[list(range(N))]*N
In [156]: sum([funclist[i](args) for i,args in enumerate(arglist)])
Out[156]: 499500000
In [157]: sum([f(args) for f,args in zip(funclist, arglist)])
Out[157]: 499500000
In [158]: timeit [funclist[i](args) for i,args in enumerate(arglist)]
10 loops, best of 3: 43.5 ms per loop
In [159]: timeit [f(args) for f,args in zip(funclist, arglist)]
10 loops, best of 3: 43.1 ms per loop
Basically the same. But map is 2x faster:
In [161]: timeit list(map(lambda f,a: f(a), funclist, arglist))
10 loops, best of 3: 23.1 ms per loop
Packaging the iteration in a callable is also faster:
In [165]: timeit altfun(funclist)(arglist)
10 loops, best of 3: 23 ms per loop
In [179]: timeit myfunc(funclist)(arglist)
10 loops, best of 3: 22.6 ms per loop
Related
I am using numpy. I have a matrix with 1 column and N rows and I want to get an array with N elements.
For example, if I have M = matrix([[1], [2], [3], [4]]), I want to get A = array([1, 2, 3, 4]).
To achieve it, I use A = np.array(M.T)[0]. Does anyone know a more elegant way to get the same result?
Thanks!
If you'd like something a bit more readable, you can do this:
A = np.squeeze(np.asarray(M))
Equivalently, you could also do: A = np.asarray(M).reshape(-1), but that's a bit less easy to read.
result = M.A1
https://numpy.org/doc/stable/reference/generated/numpy.matrix.A1.html
matrix.A1
1-d base array
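A quick demonstration (minimal sketch):
import numpy as np

M = np.matrix([[1], [2], [3], [4]])
print(M.A1)        # [1 2 3 4]
print(M.A1.shape)  # (4,)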
A, = np.array(M.T)
Depends what you mean by elegance, I suppose, but that's what I would do.
You can try the following variant:
result = np.array(M).flatten()
np.array(M).ravel()
if you care about speed. But if you care about memory:
np.asarray(M).ravel()
Or you could try to avoid some temps with
A = M.view(np.ndarray)
A.shape = -1
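For instance (a minimal sketch of the same trick):
import numpy as np

M = np.matrix([[1], [2], [3], [4]])
A = M.view(np.ndarray)  # reinterpret the matrix as a plain ndarray; no data copy
A.shape = -1            # flatten in place; legal because the data is contiguous
print(A)                # [1 2 3 4]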
First, Mv = numpy.asarray(M.T), which gives you a 1x4, still 2-D, array.
Then, perform A = Mv[0, :], which gives you what you want. You could put them together, as numpy.asarray(M.T)[0, :].
This will convert the matrix into an array:
A = np.ravel(M).T
ravel() and flatten() from numpy are the two techniques that I would try here. I would like to add to the posts made by Joe, Siraj, bubble and Kevad.
Ravel:
M = np.array([[1], [2], [3], [4]])
A = M.ravel()
print(A, A.shape)
>>> [1 2 3 4] (4,)
Flatten:
A = M.flatten()
print(A, A.shape)
>>> [1 2 3 4] (4,)
numpy.ravel() is faster, since it does not make a copy of the array when it can return a view. However, any change in array A will then carry over to the original array M, because A shares M's data.
flatten() is slower than ravel() because it always returns a copy. But if you create A with flatten(), changes in A will not carry over to the original array M.
numpy.squeeze() and M.reshape(-1) are slower than flatten() and ravel().
%timeit M.ravel()
>>> 1000000 loops, best of 3: 309 ns per loop
%timeit M.flatten()
>>> 1000000 loops, best of 3: 650 ns per loop
%timeit M.reshape(-1)
>>> 1000000 loops, best of 3: 755 ns per loop
%timeit np.squeeze(M)
>>> 1000000 loops, best of 3: 886 ns per loop
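A small sketch of the view-versus-copy behaviour described above:
import numpy as np

M = np.array([[1], [2], [3], [4]])
A = M.ravel()    # a view when possible: shares memory with M
A[0] = 99
print(M[0, 0])   # 99: the change carried over to M

B = M.flatten()  # always a copy
B[1] = 42
print(M[1, 0])   # 2: the original is untouched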
Came in a little late; hope this helps someone:
np.array(M.flat)
I am trying to use argpartition from numpy, but something seems to be going wrong and I cannot figure out what. Here is what's happening:
These are the first 5 elements of the sorted array norms:
np.sort(norms)[:5]
array([ 53.64759445, 54.91434479, 60.11617279, 64.09630585, 64.75318909], dtype=float32)
But when I use indices_sorted = np.argpartition(norms, 5)[:5]
norms[indices_sorted]
array([ 60.11617279, 64.09630585, 53.64759445, 54.91434479, 64.75318909], dtype=float32)
But shouldn't I get the same result as the sorted array?
It works just fine when I use 3 as the parameter: indices_sorted = np.argpartition(norms, 3)[:3]
norms[indices_sorted]
array([ 53.64759445, 54.91434479, 60.11617279], dtype=float32)
This isn't making much sense to me; I'm hoping someone can offer some insight.
EDIT: Rephrasing this question as whether argpartition preserves the order of the k partitioned elements makes more sense.
We need to use a list of indices that are to be kept in sorted order, instead of feeding the kth param as a scalar. Thus, to maintain the sorted nature across the first 5 elements, instead of np.argpartition(a,5)[:5], simply do -
np.argpartition(a,range(5))[:5]
Here's a sample run to make things clear -
In [84]: a = np.random.rand(10)
In [85]: a
Out[85]:
array([ 0.85017222, 0.19406266, 0.7879974 , 0.40444978, 0.46057793,
0.51428578, 0.03419694, 0.47708 , 0.73924536, 0.14437159])
In [86]: a[np.argpartition(a,5)[:5]]
Out[86]: array([ 0.19406266, 0.14437159, 0.03419694, 0.40444978, 0.46057793])
In [87]: a[np.argpartition(a,range(5))[:5]]
Out[87]: array([ 0.03419694, 0.14437159, 0.19406266, 0.40444978, 0.46057793])
Please note that argpartition makes sense from a performance standpoint if we are looking to get sorted indices for a small subset of elements, say k elements, where k is a small fraction of the total number of elements.
Let's use a bigger dataset and try to get sorted indices for all elements to make the above point clear -
In [51]: a = np.random.rand(10000)*100
In [52]: %timeit np.argpartition(a,range(a.size-1))[:5]
10 loops, best of 3: 105 ms per loop
In [53]: %timeit a.argsort()
1000 loops, best of 3: 893 µs per loop
Thus, to sort all elements, np.argpartition isn't the way to go.
Now, let's say I want to get sorted indices for only the first 5 elements with that big dataset and also keep the order of those -
In [68]: a = np.random.rand(10000)*100
In [69]: np.argpartition(a,range(5))[:5]
Out[69]: array([1647, 942, 2167, 1371, 2571])
In [70]: a.argsort()[:5]
Out[70]: array([1647, 942, 2167, 1371, 2571])
In [71]: %timeit np.argpartition(a,range(5))[:5]
10000 loops, best of 3: 112 µs per loop
In [72]: %timeit a.argsort()[:5]
1000 loops, best of 3: 888 µs per loop
Given the task of indirectly sorting a subset (the top k, top meaning first in sort order), there are two built-in solutions: argsort and argpartition, cf. @Divakar's answer.
If, however, performance is a consideration, then it may (depending on the sizes of the data and the subset of interest) be well worth resisting the "lure of the one-liner", investing one more line, and applying argsort to the output of argpartition:
>>> def top_k_sort(a, k):
... return np.argsort(a)[:k]
...
>>> def top_k_argp(a, k):
... return np.argpartition(a, range(k))[:k]
...
>>> def top_k_hybrid(a, k):
... b = np.argpartition(a, k)[:k]
... return b[np.argsort(a[b])]
>>> k = 100
>>> timeit.timeit('f(a,k)', 'a=rng((100000,))', number = 1000, globals={'f': top_k_sort, 'rng': np.random.random, 'k': k})
8.348663672804832
>>> timeit.timeit('f(a,k)', 'a=rng((100000,))', number = 1000, globals={'f': top_k_argp, 'rng': np.random.random, 'k': k})
9.869098862167448
>>> timeit.timeit('f(a,k)', 'a=rng((100000,))', number = 1000, globals={'f': top_k_hybrid, 'rng': np.random.random, 'k': k})
1.2305558240041137
argsort is O(n log n), argpartition with a range argument appears to be O(nk) (?), and argpartition + argsort is O(n + k log k).
Therefore, in the interesting regime n >> k >> 1, the hybrid method is expected to be fastest.
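A rough empirical check of that claim (a sketch: the array size and k values are arbitrary, top_k_hybrid mirrors the definition above, and absolute timings are machine-dependent):
import numpy as np
import timeit

def top_k_hybrid(a, k):
    b = np.argpartition(a, k)[:k]
    return b[np.argsort(a[b])]

a = np.random.random(1_000_000)  # n = 1e6
for k in (10, 100, 1000):        # regime n >> k >> 1
    t_sort = timeit.timeit(lambda: np.argsort(a)[:k], number=5)
    t_hyb = timeit.timeit(lambda: top_k_hybrid(a, k), number=5)
    print(k, t_sort, t_hyb)      # the hybrid should stay well ahead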
UPDATE: ND version:
import numpy as np
from timeit import timeit

def top_k_sort(A, k, axis=-1):
    return A.argsort(axis=axis)[(*(axis % A.ndim) * (slice(None),), slice(k))]

def top_k_partition(A, k, axis=-1):
    return A.argpartition(range(k), axis=axis)[(*(axis % A.ndim) * (slice(None),), slice(k))]

def top_k_hybrid(A, k, axis=-1):
    B = A.argpartition(k, axis=axis)[(*(axis % A.ndim) * (slice(None),), slice(k))]
    return np.take_along_axis(B, np.take_along_axis(A, B, axis).argsort(axis), axis)

A = np.random.random((100, 10000))
k = 100

for f in globals().copy():
    if f.startswith("top_"):
        print(f, timeit(f"{f}(A,k)", globals=globals(), number=10) * 100)
Sample run:
top_k_sort 63.72379460372031
top_k_partition 99.30561298970133
top_k_hybrid 10.714635509066284
Let's describe the partition method in a simplified way, which helps a lot in understanding argpartition.
Following the example in the picture (not reproduced here), where B = numpy.partition(A, 3): if we execute C = numpy.argpartition(A, 3), C will be the resulting array holding the position of every element of B with respect to the A array. I.e., with
Idx(z) = index of element z in array A
C would be
C = [ Idx(B[0]), Idx(B[1]), Idx(B[2]), Idx(B[3]), Idx(B[4]), ..... Idx(B[N]) ]
As previously mentioned, this method is very helpful and comes in very handy when you have a huge array and you are only interested in a selected group of ordered elements, not the whole array.
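To make the description concrete, here is a small self-contained illustration (a sketch with made-up numbers):
import numpy as np

A = np.array([9, 1, 7, 3, 5])
B = np.partition(A, 3)     # partitioned values: B[3] is in its final sorted spot
C = np.argpartition(A, 3)  # the same partition, expressed as indices into A

assert B[3] == np.sort(A)[3]                 # the kth element is correctly placed
assert set(A[C[:3]]) == set(np.sort(A)[:3])  # everything before it is no larger
print(B, A[C])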
I was trying out set construction in Python 2.6 and came across the following two ways. I thought the first method would be faster than the second, but timeit suggested otherwise. Why is the second method faster, even though it has an extra list instantiation followed by a set instantiation?
Method 1:
In [16]: %timeit set(node[0] for node in pwnodes if node[1].get('pm'))
1000000 loops, best of 3: 568 ns per loop
Method 2:
In [17]: %timeit set([node[0] for node in pwnodes if node[1].get('pm')])
1000000 loops, best of 3: 469 ns per loop
where pwnodes = [('e1', dict(pm=1, wired=1)), ('e2', dict(pm=1, wired=1))].
Iteration is simply faster when using a list comprehension:
In [23]: from collections import deque
In [24]: %timeit deque((node[0] for node in pwnodes if node[1].get('pm')), maxlen=0)
1000 loops, best of 3: 305 µs per loop
In [25]: %timeit deque([node[0] for node in pwnodes if node[1].get('pm')], maxlen=0)
1000 loops, best of 3: 246 µs per loop
The deque is used to illustrate iteration speed; a deque with maxlen set to 0 discards all elements taken from the iterable, so there are no memory-allocation differences to skew the results.
That's because in Python 2, list comprehensions don't use a separate namespace, while a generator expression does (it has to, by necessity). That extra namespace requires a new frame on the stack, and this is expensive. The major advantage of generator expressions is their low memory footprint, not their speed.
In Python 3, list comprehensions have a separate namespace as well, and list comprehension and generator iteration speed is comparable. You also have set comprehensions (available from Python 2.7), which are faster still, even on Python 2.
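For completeness, a minimal sketch of the three spellings side by side, reusing the pwnodes sample from the question (set-comprehension syntax needs Python 2.7 or later):
pwnodes = [('e1', dict(pm=1, wired=1)), ('e2', dict(pm=1, wired=1))]

s1 = set(node[0] for node in pwnodes if node[1].get('pm'))    # generator expression
s2 = set([node[0] for node in pwnodes if node[1].get('pm')])  # list comprehension
s3 = {node[0] for node in pwnodes if node[1].get('pm')}       # set comprehension
assert s1 == s2 == s3 == {'e1', 'e2'}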
My guess is that it's because one version involves a generator and the other doesn't. Generators are generally slower than the equivalent list if the equivalent list fits in memory:
In [4]: timeit for i in [i for i in range(1000)]: pass
10000 loops, best of 3: 47.2 µs per loop
In [5]: timeit for i in (i for i in range(1000)): pass
10000 loops, best of 3: 57.8 µs per loop
I'm wondering why this tuple construction:
x = tuple((t for t in range(100000)))
# 0.014001131057739258 seconds
took longer than this list:
y = [z for z in range(100000)]
# 0.005000114440917969 seconds
I had learned that tuples are faster than lists, since tuples are immutable.
Edit: after I changed the code to:
x = tuple(t for t in range(100000))
y = list(z for z in range(100000))
>>>
0.009999990463256836
0.0
>>>
These are the results: the tuple is still the slower one.
Tuple operations aren't necessarily faster. Being immutable at most opens the door to more optimisations, but that doesn't mean Python does them or that they apply in every case.
The difference here is very marginal, and (without profiling to confirm) it seems likely that it relates to the generator version having an extra name lookup and function call. As mentioned in the comments, if you rewrite the list comprehension as a call to list wrapped around a generator expression, the difference will likely shrink.
Using comparable methods of testing, the tuple is slightly faster:
In [12]: timeit tuple(t for t in range(100000))
100 loops, best of 3: 7.41 ms per loop
In [13]: timeit list(t for t in range(100000))
100 loops, best of 3: 7.53 ms per loop
calling list does actually create a list:
In [19]: x = list(t for t in range(10))
In [20]: x
Out[20]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
We can also see that calling list on the generator does not allocate as much space as using a list comprehension:
In [28]: x = list(t for t in range(10))
In [29]: sys.getsizeof(x)
Out[29]: 168
In [30]: x = [t for t in range(10)]
In [31]: sys.getsizeof(x)
Out[31]: 200
So both operations are very similar.
A better comparison would be creating lists and tuples as subelements:
In [41]: timeit tuple((t,) for t in range(1000000))
10 loops, best of 3: 151 ms per loop
In [42]: timeit list([t] for t in range(1000000))
1 loops, best of 3: 247 ms per loop
Now we see a much larger difference.
Good evening!
I was running some tests on list creation vs. iterator creation and came across some staggering time differences. Observe the following:
>>> timeit.timeit('map(lambda x: x**3, [1, 2, 3, 4, 5])')
0.4515998857965542
>>> timeit.timeit('list(map(lambda x: x**3, [1, 2, 3, 4, 5]))')
2.868906182460819
The iterator version returned by the first test runs more than 6x as fast as converting to a list. I understand basically why this might be occurring, but what I'm more interested in is a solution. Does anyone know of a data structure similar to a list that offers fast creation time? (Basically, I want to know if there is a way to go straight from an iterator (i.e. a map or filter function, etc.) to a list without any major performance hit.)
Things I can sacrifice for speed:
Appending, inserting, popping and deleting elements.
Slicing of elements.
Reversing the list or any inplace operators like sort.
Contains (in) operator.
Concatenation and multiplication.
All suggestions are welcome thanks!
EDIT: Indeed, this is for Python 3.
In Python 3.x, map doesn't create a list, but just an iterator, unlike in Python 2.x.
print(type(map(lambda x: x**3, [1, 2, 3, 4, 5])))
# <class 'map'>
To really get a list, consume the iterator with the list function, like this:
print(type(list(map(lambda x: x**3, [1, 2, 3, 4, 5]))))
# <class 'list'>
So, you are really not comparing two similar things.
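A fairer comparison makes both sides produce the full list; a minimal sketch (absolute numbers will vary):
import timeit

# both statements now do the same total work: apply the function and build a list
print(timeit.timeit('list(map(lambda x: x**3, [1, 2, 3, 4, 5]))'))
print(timeit.timeit('[x**3 for x in [1, 2, 3, 4, 5]]'))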
Expanding on thefourtheye's answer: the expressions inside the map will not be evaluated until you iterate over it. This example should make that pretty clear:
from time import sleep

def badass_heavy_function():
    sleep(3600)

# The calls are not evaluated here
foo = map(lambda x: x(), [badass_heavy_function, badass_heavy_function])

# The calls are evaluated here; please wait 2 hours
bar = list(map(lambda x: x(), [badass_heavy_function, badass_heavy_function]))

for _ in foo:
    # each iteration evaluates one call; please wait one hour
    pass
To further extend the answers of the other two:
You had a misconception about the iterator: what you refer to as "slow creation time" led you to look for a "faster container", based on that misinterpretation.
Note that the creation of a list object in Python is fast:
%timeit list(range(10000))
10000 loops, best of 3: 164 µs per loop
What you experience as slow is the actual loop that computes the values that need to go into the list.
Here is a very unoptimized example of slowly "creating" a new list from another list:
x = list(range(10000))

def slow_loop(x):
    new = []
    for i in x:
        new.append(i**2)
    return new
%timeit slow_loop(x)
100 loops, best of 3: 4.17 ms per loop
The time is actually spent on the loop; that is what is "slow" in Python.
This is technically what you are doing here, if you compare:
def your_loop(x):
    return list(map(lambda y: y**2, x))
%timeit your_loop(x)
100 loops, best of 3: 4.5 ms per loop
There is a way to speed this up, though:
def faster_loop(x):
    return [i**2 for i in x]
%timeit faster_loop(x)
100 loops, best of 3: 3.67 ms per loop
although not by much, given this kind of function. The thing is: the slow part here is the math, not the list and not the container. You can prove this by using numpy:
arr = np.array(x)
%timeit arr ** 2
100000 loops, best of 3: 7.44 µs per loop
Woah... crazy speedup.
With benchmarking (I find myself guilty of this quite often as well), people doubt the system too often, but themselves not often enough. So it's not that Python is very unoptimized or "slow"; it's just that you're doing it wrong. Don't doubt the efficiency of Python's lists; doubt your slow, inefficient code. You will probably get it right quicker...
It seems the pure Python ** operator is very slow here, since a simple multiplication is much quicker:
def faster_loop2(x):
    return [i * i for i in x]
%timeit faster_loop2(x)
1000 loops, best of 3: 534 µs per loop