I was trying out set comprehension for Python 2.6 and came across the following two ways. I thought the first method would be faster than the second, but timeit suggested otherwise. Why is the second method faster, even though it has an extra list instantiation followed by a set instantiation?
Method 1:
In [16]: %timeit set(node[0] for node in pwnodes if node[1].get('pm'))
1000000 loops, best of 3: 568 ns per loop
Method 2:
In [17]: %timeit set([node[0] for node in pwnodes if node[1].get('pm')])
1000000 loops, best of 3: 469 ns per loop
where pwnodes = [('e1', dict(pm=1, wired=1)), ('e2', dict(pm=1, wired=1))].
Iteration is simply faster when using a list comprehension:
In [23]: from collections import deque
In [24]: %timeit deque((node[0] for node in pwnodes if node[1].get('pm')), maxlen=0)
1000 loops, best of 3: 305 µs per loop
In [25]: %timeit deque([node[0] for node in pwnodes if node[1].get('pm')], maxlen=0)
1000 loops, best of 3: 246 µs per loop
The deque is used to illustrate iteration speed; a deque with maxlen set to 0 discards all elements taken from the iterable so there are no memory allocation differences to skew the results.
That's because in Python 2, list comprehensions don't use a separate namespace, while a generator expression does (it has to, by necessity). That extra namespace requires a new frame on the stack, and this is expensive. The major advantage of generator expressions is their low memory footprint, not their speed.
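You can see the extra code object with dis; a minimal sketch (the name data is just a placeholder, and you would run this under Python 2 to see the inlining the paragraph describes):
import dis

# The generator expression always compiles to a separate <genexpr>
# code object, which gets its own frame when it runs; under Python 2
# the list comprehension body is inlined into the enclosing code.
dis.dis(compile("[x for x in data]", "<demo>", "eval"))
dis.dis(compile("(x for x in data)", "<demo>", "eval"))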
In Python 3, list comprehensions have a separate namespace as well, and list comprehension and generator iteration speed is comparable. You also have set comprehensions, which are fastest still, even on Python 2.
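For reference, a set comprehension version of the original example would look like this (a sketch using the question's pwnodes; timings vary by machine and version):
# set comprehension (Python 2.7+): no intermediate list, no generator frame
{node[0] for node in pwnodes if node[1].get('pm')}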
My guess is because the first one involves a generator and the second one doesn't. Generators are generally slower than the equivalent list if the equivalent list fits in memory.
In [4]: timeit for i in [i for i in range(1000)]: pass
10000 loops, best of 3: 47.2 µs per loop
In [5]: timeit for i in (i for i in range(1000)): pass
10000 loops, best of 3: 57.8 µs per loop
Related
To create multiple copies of a nested list inside a list, it is unfortunately not sufficient to simply multiply it; that creates references to the same inner list rather than independent lists, as this example shows:
x = [[1, 2, 3]] * 2
x[0] is x[1] # will evaluate to True
To achieve your goal, you could use the range function in a list comprehension, for example, see this:
x = [[1, 2, 3] for _ in range(2)]
x[0] is x[1] # will evaluate to False (wanted behaviour)
This is a good way to multiply items in a list without just creating references, and it is explained many times on many different websites.
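As an aside, if the inner list already exists instead of being written out as a literal each time, a shallow copy in the comprehension gives the same independence (a minimal sketch):
inner = [1, 2, 3]
x = [inner[:] for _ in range(2)]  # inner[:] makes a shallow copy
x[0] is x[1]   # False
x[0] is inner  # False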
However, there is a more efficient way to copy the list elements. That code seems a little faster to me (measured by timeit via the command line, with different parameters n ∈ {1, 50, 100, 10000} for the code below and range(n) in the code above):
x = [[1, 2, 3] for _ in [0] * n]
But I wonder, why does this code run faster? Are there other disadvantages (more memory consumption or similar)?
python -m timeit '[[1, 2, 3] for _ in range(1)]'
1000000 loops, best of 3: 0.243 usec per loop
python -m timeit '[[1, 2, 3] for _ in range(50)]'
100000 loops, best of 3: 3.79 usec per loop
python -m timeit '[[1, 2, 3] for _ in range(100)]'
100000 loops, best of 3: 7.39 usec per loop
python -m timeit '[[1, 2, 3] for _ in range(10000)]'
1000 loops, best of 3: 940 usec per loop
python -m timeit '[[1, 2, 3] for _ in [0] * 1]'
1000000 loops, best of 3: 0.242 usec per loop
python -m timeit '[[1, 2, 3] for _ in [0] * 50]'
100000 loops, best of 3: 3.77 usec per loop
python -m timeit '[[1, 2, 3] for _ in [0] * 100]'
100000 loops, best of 3: 7.3 usec per loop
python -m timeit '[[1, 2, 3] for _ in [0] * 10000]'
1000 loops, best of 3: 927 usec per loop
# difference will be greater for larger n
python -m timeit '[[1, 2, 3] for _ in range(1000000)]'
10 loops, best of 3: 144 msec per loop
python -m timeit '[[1, 2, 3] for _ in [0] * 1000000]'
10 loops, best of 3: 126 msec per loop
This is correct; range, even in Python 3 where it produces a compact range object, is more complicated than a list, in the classical tradeoff between computation and storage.
As the list grows too large to fit in cache (the primary question if we're concerned with performance), the range object runs into a different issue: as each number in the range is created, it destroys and creates new int objects (the first 256 or so are less costly because they're interned, but their difference can still cost a few cache misses). The list would keep referring to the same one.
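You can observe the interning behaviour as follows (a CPython implementation detail; int() is called on strings here to sidestep constant folding):
a = int("300")
b = int("300")
a is b  # False: each value above the interned range is a fresh object
c = int("100")
d = int("100")
c is d  # True: small ints are interned and reused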
There are still more efficient options, though; a bytearray, for instance, would consume far less memory than the list. Probably the best function for the task is hidden away in itertools: repeat. Like a range object, it doesn't need storage for all the copies, but like the repeated list it doesn't need to create distinct objects. Something like for _ in repeat(None, x) would therefore just poke at the same few cache lines (iteration count and reference count for the object).
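A sketch of the benchmark's loop written with repeat (assuming the same n as above):
from itertools import repeat

n = 10000
# repeat(None, n) yields the same object n times: no sequence storage
# like the repeated list, and no fresh int objects like range.
x = [[1, 2, 3] for _ in repeat(None, n)]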
In the end, the main reason people stick to using range is because it's what's prominently presented (both in the idiom for a fixed count loop and among the builtins).
In other Python implementations, it's quite possible for range to be faster than repeat; this would be because the counter itself already holds the value. I'd expect such behaviours from Cython or PyPy.
I have a list of 3D arrays that are all different shapes, but I need them to all be the same shape. Also, that shape needs to be the smallest shape in the list.
For example, if my_list has three arrays with the shapes (115,115,3), (111,111,3), and (113,113,3), then they all need to be (111,111,3). They are all square color images, so they will be of shape (x,x,3).
So I have two main problems:
How do I find the smallest shape array without looping or keeping a variable while creating the list?
How do I efficiently set all arrays in a list to the smallest shape?
Currently I am keeping a variable for smallest shape while creating my_list so I can do this:
for idx, img in enumerate(my_list):
    img = img[:smallest_shape, :smallest_shape]
    my_list[idx] = img
I just feel like this is not the most efficient way, and I do realize I'm losing values by slicing, but I expect that.
I constructed a sample list with
In [513]: alist=[np.ones((512,512,3)) for _ in range(100)]
and did some timings.
Collecting shapes is fast:
In [515]: timeit [a.shape for a in alist]
10000 loops, best of 3: 31.2 µs per loop
Taking the min takes more time:
In [516]: np.min([a.shape for a in alist],axis=0)
Out[516]: array([512, 512, 3])
In [517]: timeit np.min([a.shape for a in alist],axis=0)
1000 loops, best of 3: 344 µs per loop
Slicing is faster:
In [518]: timeit [a[:500,:500,:] for a in alist]
10000 loops, best of 3: 133 µs per loop
Now try to isolate the min step:
In [519]: shapes=[a.shape for a in alist]
In [520]: timeit np.min(shapes, axis=0)
The slowest run took 5.75 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 136 µs per loop
When you have lists of objects, iteration is the only way to deal with all elements. Look at the code for np.hstack and np.vstack (and others). They do one or more list comprehensions to massage all the input arrays into the correct shape. Then they do np.concatenate which iterates too, but in compiled code.
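Putting that together for the question's my_list, a minimal sketch (it relies on the stated guarantee that every image is square with shape (x, x, 3)):
# one pass to find the smallest side, one pass to trim every image
smallest = min(a.shape[0] for a in my_list)
my_list = [a[:smallest, :smallest, :] for a in my_list]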
I'd like to take in a list of functions, funclist, and return a new function which takes in a list of arguments, arglist, and applies the ith function in funclist to the ith element of arglist, returning the results in a list:
def myfunc(funclist):
    return lambda arglist: [funclist[i](elt) for i, elt in enumerate(arglist)]
This is not optimized for parallel/vectorized application of the independent functions in funclist to the independent arguments in arglist. Is there a built-in function in Python or numpy (or otherwise) that will return a more optimized version of the lambda above? It would be similar in spirit to map or numpy.vectorize (but obviously not the same), and so far I haven't found anything.
In numpy terms, true vectorization means performing the iterative stuff in compiled code. Usually that requires using numpy functions that work with whole arrays, doing things like addition and indexing.
np.vectorize is a way of iterating over several arrays and using their elements in a function that does not handle arrays. It doesn't do much in compiled code, so it does not improve the speed much. It's most valuable as a way of applying numpy broadcasting rules to your own scalar function.
map is a variant on list comprehension, and has basically the same speed. And a list comprehension has more expressive power, working with several lists.
@Tore's zipped comprehension is a clear expression of this task:
[f(args) for f, args in zip(funclist, arglist)]
map can work with several input lists:
In [415]: arglist=[np.arange(3),np.arange(1,4)]
In [416]: fnlist=[np.sum, np.prod]
In [417]: [f(a) for f,a in zip(fnlist, arglist)]
Out[417]: [3, 6]
In [418]: list(map(lambda f,a: f(a), fnlist, arglist))
Out[418]: [3, 6]
Your version is a little wordier, but functionally the same.
In [423]: def myfunc(funclist):
...: return lambda arglist: [ funclist[i](elt) for i, elt in enumerate(arglist) ]
In [424]: myfunc(fnlist)
Out[424]: <function __main__.myfunc.<locals>.<lambda>>
In [425]: myfunc(fnlist)(arglist)
Out[425]: [3, 6]
It has the advantage of generating a function that can be applied to different arglists:
In [426]: flist=myfunc(fnlist)
In [427]: flist(arglist)
Out[427]: [3, 6]
In [428]: flist(arglist[::-1])
Out[428]: [6, 0]
I would have written myfunc more like:
def altfun(funclist):
    def foo(arglist):
        return [f(a) for f, a in zip(funclist, arglist)]
    return foo
but the differences are just stylistic.
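Usage matches myfunc (a quick sketch with the fnlist and arglist from above):
alt = altfun(fnlist)
alt(arglist)   # -> [3, 6]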
================
Time test for zip v enumerate, with N = 1000:
In [154]: funclist=[sum]*N
In [155]: arglist=[list(range(N))]*N
In [156]: sum([funclist[i](args) for i,args in enumerate(arglist)])
Out[156]: 499500000
In [157]: sum([f(args) for f,args in zip(funclist, arglist)])
Out[157]: 499500000
In [158]: timeit [funclist[i](args) for i,args in enumerate(arglist)]
10 loops, best of 3: 43.5 ms per loop
In [159]: timeit [f(args) for f,args in zip(funclist, arglist)]
10 loops, best of 3: 43.1 ms per loop
Basically the same. But map is 2x faster:
In [161]: timeit list(map(lambda f,a: f(a), funclist, arglist))
10 loops, best of 3: 23.1 ms per loop
Packaging the iteration in a callable is also faster:
In [165]: timeit altfun(funclist)(arglist)
10 loops, best of 3: 23 ms per loop
In [179]: timeit myfunc(funclist)(arglist)
10 loops, best of 3: 22.6 ms per loop
Wondering why this tuple construction:
x = tuple((t for t in range(100000)))
# 0.014001131057739258 seconds
took longer than this list comprehension:
y = [z for z in range(100000)]
# 0.005000114440917969 seconds
I had learned that tuple operations are faster than list operations since tuples are immutable.
Edit: After I changed the code to:
x = tuple(t for t in range(100000))
y = list(z for z in range(100000))
>>>
0.009999990463256836
0.0
>>>
These are the results: the tuple is still the slower one.
Tuple operations aren't necessarily faster. Being immutable at most opens the door to more optimisations, but that doesn't mean Python does them or that they apply in every case.
The difference here is very marginal and, without profiling to confirm, it seems likely that it relates to the generator version having an extra name lookup and function call. As mentioned in the comments, if you rewrite the list comprehension as a call to list wrapped around a generator expression, the difference will likely shrink.
Using comparable methods of testing, the tuple is actually slightly faster:
In [12]: timeit tuple(t for t in range(100000))
100 loops, best of 3: 7.41 ms per loop
In [13]: timeit list(t for t in range(100000))
100 loops, best of 3: 7.53 ms per loop
Calling list does actually create a list:
In [19]: x = list(t for t in range(10))
In [20]: x
Out[20]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
We can also see that calling list on the generator does not allocate as much space as using a list comprehension:
In [28]: x = list(t for t in range(10))
In [29]: sys.getsizeof(x)
Out[29]: 168
In [30]: x = [t for t in range(10)]
In [31]: sys.getsizeof(x)
Out[31]: 200
So both operations are very similar.
A better comparison would be creating lists and tuples as subelements:
In [41]: timeit tuple((t,) for t in range(1000000))
10 loops, best of 3: 151 ms per loop
In [42]: timeit list([t] for t in range(1000000))
1 loops, best of 3: 247 ms per loop
Now we see a much larger difference.
Good evening!
I was running some tests on lists and list creation vs iterator creation and I came across some staggering time differences. Observe the following:
>>> timeit.timeit('map(lambda x: x**3, [1, 2, 3, 4, 5])')
0.4515998857965542
>>> timeit.timeit('list(map(lambda x: x**3, [1, 2, 3, 4, 5]))')
2.868906182460819
The iterator version returned by the first test runs more than 6x as fast as converting to a list. I understand basically why this might be occurring, but what I'm more interested in is a solution. Does anyone know of a data structure similar to a list that offers fast creation time? (Basically, I want to know if there is a way to go straight from iterator (i.e. map or filter function, etc.), to a list without any major performance hits)
Things I can sacrifice for speed:
Appending, inserting, popping and deleting elements.
Slicing of elements.
Reversing the list or any inplace operators like sort.
Contains (in) operator.
Concatenation and multiplication.
All suggestions are welcome, thanks!
EDIT: Indeed this is for python 3.
In Python 3.x, map doesn't create a list, but just an iterator, unlike Python 2.x.
print(type(map(lambda x: x**3, [1, 2, 3, 4, 5])))
# <class 'map'>
To really get a list, consume the iterator with the list function, like this:
print(type(list(map(lambda x: x**3, [1, 2, 3, 4, 5]))))
# <class 'list'>
So, you are really not comparing two similar things.
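If you want a like-for-like comparison, make both sides produce the full list; a sketch (absolute numbers will vary by machine):
import timeit

setup = "f = lambda x: x**3; data = [1, 2, 3, 4, 5]"
print(timeit.timeit("list(map(f, data))", setup=setup))
print(timeit.timeit("[f(x) for x in data]", setup=setup))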
Expanding on thefourtheye's answer: the expressions inside the map function will not be evaluated before you iterate over it. This example should be pretty clear:
from time import sleep

def badass_heavy_function():
    sleep(3600)

# The calls are not evaluated here
foo = map(lambda x: x(), [badass_heavy_function, badass_heavy_function])

# The calls are evaluated now; please wait two hours
bar = list(map(lambda x: x(), [badass_heavy_function, badass_heavy_function]))

for _ in foo:
    # each iteration evaluates one call; please wait one hour
    pass
To further extend the other two answers: you had a misconception about the iterator, which is what you refer to as "slow creation time", and you then look for a "faster container" because of that misinterpretation.
Note that the creation of a list object in python is fast:
%timeit list(range(10000))
10000 loops, best of 3: 164 µs per loop
What you experience as slow is the actual loop that calculates the values that need to go into the list. See a very unoptimized example of slowly "creating" a new list from another list:
x = list(range(10000))
def slow_loop(x):
    new = []
    for i in x:
        new.append(i**2)
    return new
%timeit slow_loop(x)
100 loops, best of 3: 4.17 ms per loop
The time is actually spent on the loop, which is "slow" in Python.
This is actually what you are doing here technically if you compare:
def your_loop(x):
    return list(map(lambda y: y**2, x))
%timeit your_loop(x)
100 loops, best of 3: 4.5 ms per loop
There is a way to speed this up though:
def faster_loop(x):
    return [i**2 for i in x]
%timeit faster_loop(x)
100 loops, best of 3: 3.67 ms per loop
although not by much given this kind of function. The thing is: the slow part here is the math, not the list and not the container. You can prove this by using numpy:
arr = np.array(x)
%timeit arr ** 2
100000 loops, best of 3: 7.44 µs per loop
Woah... crazy speedup.
With benchmarking (I find myself guilty of this quite often as well), people doubt the system too often but themselves not often enough. So it's not that Python is very unoptimized or "slow"; it's just that you're doing it wrong. Don't doubt the Python list's efficiency. Doubt your slow, inefficient code. You will probably get it right quicker...
It seems the pure Python ** operator is very slow here, as a simple multiplication is much quicker:
def faster_loop2(x):
    return [i * i for i in x]
%timeit faster_loop2(x)
1000 loops, best of 3: 534 µs per loop