This is a contrived example to demonstrate referencing the same dictionary item multiple times in a for-loop and a list-comprehension. First, the for-loop:
dict_index_mylists = {0:['a', 'b', 'c'], 1:['b', 'c', 'a'], 2:['c', 'a', 'b']}
# for-loop
myseq = []
for i in [0, 1, 2]:
    interim = dict_index_mylists[i]
    if interim[0] == 'b' or interim[1] == 'c' or interim[2] == 'a':
        myseq.append(interim)
In the for-loop, the interim list is fetched from the dictionary once and then referenced multiple times in the if-conditional, which may make sense, particularly if the dictionary is very large and/or lives on storage. Then again, the 'interim' reference may be unnecessary, because the Python dictionary is optimized for lookup performance.
This is a list-comprehension of the for-loop:
# list-comprehension
myseq = [dict_index_mylists[i] for i in [0, 1, 2] if dict_index_mylists[i][0] == 'b' or dict_index_mylists[i][1] == 'c' or dict_index_mylists[i][2] == 'a']
The questions are:
a. Does the list-comprehension make multiple references to the dictionary item or does it reference and keep a local 'interim' list to work on?
b. What is the optimal list-comprehension expression that contains multiple conditionals on the same dictionary item and where the dictionary is very large?
You seem to be asking about optimization of common subexpressions. Your list comprehension will index into the dictionary multiple times. Python is dynamic, so it is difficult to know what side effects an operation like dict_index_mylists[i] might have, and CPython simply executes the operation as many times as you tell it to.
Other implementations like PyPy use a JIT and may optimize away subexpressions, but it is difficult to know for sure what it will do ahead of time.
If you are very concerned with performance, you need to time various options to see which is best.
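For example, a minimal timeit sketch (using the small example dictionary from the question; absolute numbers will of course depend on your data and machine) comparing the repeated-lookup comprehension against one that binds each value once via a generator expression:
import timeit

setup = "dict_index_mylists = {0: ['a', 'b', 'c'], 1: ['b', 'c', 'a'], 2: ['c', 'a', 'b']}"

repeated = ("[dict_index_mylists[i] for i in [0, 1, 2]"
            " if dict_index_mylists[i][0] == 'b'"
            " or dict_index_mylists[i][1] == 'c'"
            " or dict_index_mylists[i][2] == 'a']")

cached = ("[v for v in (dict_index_mylists[i] for i in [0, 1, 2])"
          " if v[0] == 'b' or v[1] == 'c' or v[2] == 'a']")

print(timeit.timeit(repeated, setup=setup))
print(timeit.timeit(cached, setup=setup))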
I'm no expert at looking at python bytecode, but here's my attempt to learn something new this morning:
def dostuff():
    myseq = [dict_index_mylists[i] for i in [0, 1, 2] if dict_index_mylists[i][0] == 'b' or dict_index_mylists[i][1] == 'c' or dict_index_mylists[i][2] == 'a']
import dis
dis.dis(dostuff)
If you look at the output (below), there are 4 calls to LOAD_GLOBAL, so it doesn't look like Python is keeping an interim list around. As for your second question, what you have is probably about as good as you can do. It's not as bad as you might think, though: dict objects access items by a hash function, so their lookup complexity is O(1) regardless of dictionary size. Of course, you could always use timeit to compare the two implementations (loop and list-comp) and then choose the faster one. Profiling (as always) is your friend.
APPENDIX (output of dis.dis(dostuff))
5 0 BUILD_LIST 0
3 DUP_TOP
4 STORE_FAST 0 (_[1])
7 LOAD_CONST 1 (0)
10 LOAD_CONST 2 (1)
13 LOAD_CONST 3 (2)
16 BUILD_LIST 3
19 GET_ITER
>> 20 FOR_ITER 84 (to 107)
23 STORE_FAST 1 (i)
26 LOAD_GLOBAL 0 (dict_index_mylists)
29 LOAD_FAST 1 (i)
32 BINARY_SUBSCR
33 LOAD_CONST 1 (0)
36 BINARY_SUBSCR
37 LOAD_CONST 4 ('b')
40 COMPARE_OP 2 (==)
43 JUMP_IF_TRUE 42 (to 88)
46 POP_TOP
47 LOAD_GLOBAL 0 (dict_index_mylists)
50 LOAD_FAST 1 (i)
53 BINARY_SUBSCR
54 LOAD_CONST 2 (1)
57 BINARY_SUBSCR
58 LOAD_CONST 5 ('c')
61 COMPARE_OP 2 (==)
64 JUMP_IF_TRUE 21 (to 88)
67 POP_TOP
68 LOAD_GLOBAL 0 (dict_index_mylists)
71 LOAD_FAST 1 (i)
74 BINARY_SUBSCR
75 LOAD_CONST 3 (2)
78 BINARY_SUBSCR
79 LOAD_CONST 6 ('a')
82 COMPARE_OP 2 (==)
85 JUMP_IF_FALSE 15 (to 103)
>> 88 POP_TOP
89 LOAD_FAST 0 (_[1])
92 LOAD_GLOBAL 0 (dict_index_mylists)
95 LOAD_FAST 1 (i)
98 BINARY_SUBSCR
99 LIST_APPEND
100 JUMP_ABSOLUTE 20
>> 103 POP_TOP
104 JUMP_ABSOLUTE 20
>> 107 DELETE_FAST 0 (_[1])
110 STORE_FAST 2 (myseq)
113 LOAD_CONST 0 (None)
116 RETURN_VALUE
First point: nothing (except 'myseq') is "created" here, neither in the for-loop nor in the list-comp version of your code - it's just a reference to the existing dict item.
Now to answer your questions: the list-comp version will make a lookup (a call to dict.__getitem__) for each of the dict_index_mylists[i] expressions. Each of these lookups will return a reference to the same list. You can avoid the extra lookups by retaining a local reference to the dict's item, i.e.:
myseq = [
    item for item in (dict_index_mylists[i] for i in (0, 1, 2))
    if item[0] == 'b' or item[1] == 'c' or item[2] == 'a'
]
but there's no point in writing a listcomp just for the sake of writing a listcomp.
Note that if you don't care about the original ordering and want to apply this to your whole dict, using dict.itervalues() would be simpler.
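A minimal sketch of that variant, assuming you want every value of the dict (dict.itervalues() is Python 2; on Python 3 use dict.values()):
myseq = [
    item for item in dict_index_mylists.itervalues()   # .values() on Python 3
    if item[0] == 'b' or item[1] == 'c' or item[2] == 'a'
]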
As for the second question, "optimal" is not an absolute: what do you want to optimize for? Space? Time? Readability?
Related
When using the in operator on a literal, is it most idiomatic for that literal to be a list, set, or tuple?
e.g.
for x in {'foo', 'bar', 'baz'}:
    doSomething(x)
...
if val in {1, 2, 3}:
    doSomethingElse(val)
I don't see any benefit to the list, but the tuple's immutability means it could be hoisted or reused by an efficient interpreter. And in the case of the if, if it's reused, there's an efficiency benefit.
Which is the most idiomatic, and which is most performant in cpython?
Python provides a disassembler, so you can often just check the bytecode:
In [4]: def checktup():
   ...:     for _ in range(10):
   ...:         if val in (1, 2, 3):
   ...:             print("foo")
   ...:
In [5]: def checkset():
   ...:     for _ in range(10):
   ...:         if val in {1, 2, 3}:
   ...:             print("foo")
   ...:
In [6]: import dis
For the tuple literal:
In [7]: dis.dis(checktup)
2 0 SETUP_LOOP 32 (to 34)
2 LOAD_GLOBAL 0 (range)
4 LOAD_CONST 1 (10)
6 CALL_FUNCTION 1
8 GET_ITER
>> 10 FOR_ITER 20 (to 32)
12 STORE_FAST 0 (_)
3 14 LOAD_GLOBAL 1 (val)
16 LOAD_CONST 6 ((1, 2, 3))
18 COMPARE_OP 6 (in)
20 POP_JUMP_IF_FALSE 10
4 22 LOAD_GLOBAL 2 (print)
24 LOAD_CONST 5 ('foo')
26 CALL_FUNCTION 1
28 POP_TOP
30 JUMP_ABSOLUTE 10
>> 32 POP_BLOCK
>> 34 LOAD_CONST 0 (None)
36 RETURN_VALUE
For the set-literal:
In [8]: dis.dis(checkset)
2 0 SETUP_LOOP 32 (to 34)
2 LOAD_GLOBAL 0 (range)
4 LOAD_CONST 1 (10)
6 CALL_FUNCTION 1
8 GET_ITER
>> 10 FOR_ITER 20 (to 32)
12 STORE_FAST 0 (_)
3 14 LOAD_GLOBAL 1 (val)
16 LOAD_CONST 6 (frozenset({1, 2, 3}))
18 COMPARE_OP 6 (in)
20 POP_JUMP_IF_FALSE 10
4 22 LOAD_GLOBAL 2 (print)
24 LOAD_CONST 5 ('foo')
26 CALL_FUNCTION 1
28 POP_TOP
30 JUMP_ABSOLUTE 10
>> 32 POP_BLOCK
>> 34 LOAD_CONST 0 (None)
36 RETURN_VALUE
You'll notice that in both cases the function will LOAD_CONST, i.e., both literals have been pre-built as constants. Even better, in the case of the set literal the compiler has stored a frozenset: while building the function, the peephole optimizer figured out that the set literal is never mutated and replaced it with its immutable equivalent.
Note, on Python 2, the compiler builds a set every time!:
In [1]: import dis
In [2]: def checkset():
   ...:     for _ in range(10):
   ...:         if val in {1, 2, 3}:
   ...:             print("foo")
   ...:
In [3]: dis.dis(checkset)
2 0 SETUP_LOOP 49 (to 52)
3 LOAD_GLOBAL 0 (range)
6 LOAD_CONST 1 (10)
9 CALL_FUNCTION 1
12 GET_ITER
>> 13 FOR_ITER 35 (to 51)
16 STORE_FAST 0 (_)
3 19 LOAD_GLOBAL 1 (val)
22 LOAD_CONST 2 (1)
25 LOAD_CONST 3 (2)
28 LOAD_CONST 4 (3)
31 BUILD_SET 3
34 COMPARE_OP 6 (in)
37 POP_JUMP_IF_FALSE 13
4 40 LOAD_CONST 5 ('foo')
43 PRINT_ITEM
44 PRINT_NEWLINE
45 JUMP_ABSOLUTE 13
48 JUMP_ABSOLUTE 13
>> 51 POP_BLOCK
>> 52 LOAD_CONST 0 (None)
55 RETURN_VALUE
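If you want wall-clock numbers in addition to the bytecode, a minimal timeit sketch (the winner can depend on the value tested and on container size, so measure your own case):
import timeit

print(timeit.timeit("val in (1, 2, 3)", setup="val = 3"))
print(timeit.timeit("val in {1, 2, 3}", setup="val = 3"))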
IMO, there's essentially no such thing as "idiomatic" usage of literal values as shown in the question. Such values look like "magic numbers" to me. Using literals for "performance" is probably misguided because it sacrifices readability for marginal gains. In cases where performance really matters, using literals is unlikely to help much and there are better options regardless.
I think the idiomatic thing to do would be to store such values in a global or class variable, especially if you're using them in multiple places (but also even if you aren't). This provides some documentation as to what a value's purpose is and makes it easier to update. You can then memoize these values in function/method definitions to improve performance if necessary.
As to what type of data structure is most appropriate, that would depend on what your program does and how it uses the data. For example, does ordering matter? With an if x in y, it won't, but maybe you're using the data in a for and an if. Without context, it's hard to say what the best choice would be.
Here's an example I think is readable, extensible, and also efficient. Memoizing the global ITEMS in the function definitions makes lookup fast because items is in the local namespace of the function. If you look at the disassembled code, you'll see that items is looked up via LOAD_FAST instead of LOAD_GLOBAL. This approach also avoids making multiple copies of the list of items, which might be relevant if it's big enough (although, if it was big enough, you probably wouldn't try to inline it anyway). Personally, I wouldn't bother with these kinds of optimizations most of the time, but they can be useful in some cases.
# In real code, this would have a domain-specific name instead of the
# generic `ITEMS`.
ITEMS = {'a', 'b', 'c'}
def filter_in_items(values, items=ITEMS):
    matching_items = []
    for value in values:
        if value in items:
            matching_items.append(value)
    return matching_items

def filter_not_in_items(values, items=ITEMS):
    non_matching_items = []
    for value in values:
        if value not in items:
            non_matching_items.append(value)
    return non_matching_items
print(filter_in_items(('a', 'x'))) # -> ['a']
print(filter_not_in_items(('a', 'x'))) # -> ['x']
import dis
dis.dis(filter_in_items)
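For comparison, here is a hypothetical variant without the default-argument memoization (the name filter_in_items_global is mine); disassembling it shows ITEMS fetched with LOAD_GLOBAL inside the loop instead of LOAD_FAST:
def filter_in_items_global(values):
    matching_items = []
    for value in values:
        if value in ITEMS:  # ITEMS is resolved as a global each time through the loop
            matching_items.append(value)
    return matching_items

dis.dis(filter_in_items_global)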
I am trying to find the most efficient way to check whether the given string is palindrome or not.
Firstly, I tried brute force which has running time of the order O(N). Then I optimized the code a little bit by making only n/2 comparisons instead of n.
Here is the code:
def palindrome(a):
    length=len(a)
    iterator=0
    while iterator <= length/2:
        if a[iterator]==a[length-iterator-1]:
            iterator+=1
        else:
            return False
    return True
It takes half time when compared to brute force but it is still order O(N).
Meanwhile, I also thought of a solution which uses slice operator.
Here is the code:
def palindrome_py(a):
    return a==a[::-1]
Then I did running time analysis of both. Here is the result:
Running time
Length of string used is 50
Length multiplier indicates length of the new string (50 * multiplier)
Running time for 100000 iterations

palindrome       palindrome_py    Length Multiplier
0.6559998989     0.5309998989     1
1.2970001698     0.5939998627     2
3.5149998665     0.7820000648     3
13.4249999523    1.5310001373     4
65.5319998264    5.2660000324     5
The code I used can be accessed here: Running Time Table Generator
Now, I want to know why there is a difference between the running time of the slice-based version (palindrome_py) and the palindrome function. Why am I getting this kind of running time?
Why is the slice operator so efficient compared to the palindrome function? What is happening behind the scenes?
My observations:
The running time is roughly proportional to the multiplier, i.e. the running time for multiplier n can be obtained by multiplying the running time for multiplier n-1 by n.
Generalizing, we get Running Time(n) = Running Time(n-1) * Multiplier.
Your slicing-based solution is still O(n), the constant got smaller (that's your multiplier). It's faster, because less stuff is done in Python and more stuff is done in C. The bytecode shows it all.
In [1]: import dis
In [2]: %paste
def palindrome(a):
    length=len(a)
    iterator=0
    while iterator <= length/2:
        if a[iterator]==a[length-iterator-1]:
            iterator+=1
        else:
            return False
    return True
## -- End pasted text --
In [3]: dis.dis(palindrome)
2 0 LOAD_GLOBAL 0 (len)
3 LOAD_FAST 0 (a)
6 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
9 STORE_FAST 1 (length)
3 12 LOAD_CONST 1 (0)
15 STORE_FAST 2 (iterator)
4 18 SETUP_LOOP 65 (to 86)
>> 21 LOAD_FAST 2 (iterator)
24 LOAD_FAST 1 (length)
27 LOAD_CONST 2 (2)
30 BINARY_TRUE_DIVIDE
31 COMPARE_OP 1 (<=)
34 POP_JUMP_IF_FALSE 85
5 37 LOAD_FAST 0 (a)
40 LOAD_FAST 2 (iterator)
43 BINARY_SUBSCR
44 LOAD_FAST 0 (a)
47 LOAD_FAST 1 (length)
50 LOAD_FAST 2 (iterator)
53 BINARY_SUBTRACT
54 LOAD_CONST 3 (1)
57 BINARY_SUBTRACT
58 BINARY_SUBSCR
59 COMPARE_OP 2 (==)
62 POP_JUMP_IF_FALSE 78
6 65 LOAD_FAST 2 (iterator)
68 LOAD_CONST 3 (1)
71 INPLACE_ADD
72 STORE_FAST 2 (iterator)
75 JUMP_ABSOLUTE 21
8 >> 78 LOAD_CONST 4 (False)
81 RETURN_VALUE
82 JUMP_ABSOLUTE 21
>> 85 POP_BLOCK
10 >> 86 LOAD_CONST 5 (True)
89 RETURN_VALUE
That is a whole lot of Python virtual-machine-level instructions, which essentially amount to function calls, and function calls are very expensive in Python.
Now, what about the second function?
In [4]: %paste
def palindrome_py(a):
    return a==a[::-1]
## -- End pasted text --
In [5]: dis.dis(palindrome_py)
2 0 LOAD_FAST 0 (a)
3 LOAD_FAST 0 (a)
6 LOAD_CONST 0 (None)
9 LOAD_CONST 0 (None)
12 LOAD_CONST 2 (-1)
15 BUILD_SLICE 3
18 BINARY_SUBSCR
19 COMPARE_OP 2 (==)
22 RETURN_VALUE
No Python-level iteration (no jumps) is involved here, and you only get 3 instructions that call into methods: BUILD_SLICE, BINARY_SUBSCR, COMPARE_OP, all done in C, because str is a built-in type with all of its methods written in C. To be fair, we've seen the same instructions in the first function (along with many others), but there they are repeated for each character, multiplying the method-call overhead by n. Here you pay the Python function-call overhead only once; the rest is done in C.
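If you want to reproduce the comparison yourself, a minimal timeit sketch (assuming both functions from the question are defined in the script being run; absolute numbers will differ from the table above):
import timeit

setup = "from __main__ import palindrome, palindrome_py; s = 'ab' * 500 + 'ba' * 500"
print(timeit.timeit("palindrome(s)", setup=setup, number=10000))
print(timeit.timeit("palindrome_py(s)", setup=setup, number=10000))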
The bottom line: you shouldn't do low-level stuff in Python manually, because it will run slower than the high-level counterpart (unless you have an asymptotically faster algorithm that literally requires low-level magic). Python, unlike many other languages, usually encourages you to use abstractions and rewards you with higher performance.
To my understanding, both of these approaches work for operating on every item in a generator:
let i be our operator target
let my_iter be our generator
let callable do_something_with return None
While Loop + StopIteration
try:
    while True:
        i = next(my_iter)
        do_something_with(i)
except StopIteration:
    pass
For loop / list comprehension
for i in my_iter:
    do_something_with(i)
[do_something_with(i) for i in my_iter]
Minor Edit: print(i) replaced with do_something_with(i) as suggested by #kojiro to disambiguate a use case with the interpreter mechanics.
As far as I am aware, these are both applicable ways to iterate over a generator. Is there any reason to prefer one over the other?
Right now the for loop looks superior to me, due to fewer lines, less clutter, better readability in general, and a single level of indentation.
I really only see the while approach being advantageous if you want to handily break the loop on particular exceptions.
The third option is definitely NOT the same as the first two. The third example creates a list, with one entry for each return value of print(i), which happens to be None, so not a very interesting list.
The first two are semantically similar. There is a minor, technical difference: the while loop, as presented, does not work if my_iter is not in fact an iterator (i.e., something with a __next__() method); for instance, if it's a list. The for loop works for all iterables (anything with an __iter__() method) in addition to iterators.
The correct version is thus:
my_iter = iter(my_iterable)
try:
    while True:
        i = next(my_iter)
        print(i)
except StopIteration:
    pass
Now, aside from readability, there is in fact a technical reason you should prefer the for loop: there is a penalty you pay (in CPython, anyhow) for the number of bytecodes executed in tight inner loops. Let's compare:
In [1]: def forloop(my_iter):
   ...:     for i in my_iter:
   ...:         print(i)
   ...:
In [57]: dis.dis(forloop)
2 0 SETUP_LOOP 24 (to 27)
3 LOAD_FAST 0 (my_iter)
6 GET_ITER
>> 7 FOR_ITER 16 (to 26)
10 STORE_FAST 1 (i)
3 13 LOAD_GLOBAL 0 (print)
16 LOAD_FAST 1 (i)
19 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
22 POP_TOP
23 JUMP_ABSOLUTE 7
>> 26 POP_BLOCK
>> 27 LOAD_CONST 0 (None)
30 RETURN_VALUE
7 bytecodes executed in the inner loop, vs:
In [55]: def whileloop(my_iterable):
   ....:     my_iter = iter(my_iterable)
   ....:     try:
   ....:         while True:
   ....:             i = next(my_iter)
   ....:             print(i)
   ....:     except StopIteration:
   ....:         pass
   ....:
In [56]: dis.dis(whileloop)
2 0 LOAD_GLOBAL 0 (iter)
3 LOAD_FAST 0 (my_iterable)
6 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
9 STORE_FAST 1 (my_iter)
3 12 SETUP_EXCEPT 32 (to 47)
4 15 SETUP_LOOP 25 (to 43)
5 >> 18 LOAD_GLOBAL 1 (next)
21 LOAD_FAST 1 (my_iter)
24 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
27 STORE_FAST 2 (i)
6 30 LOAD_GLOBAL 2 (print)
33 LOAD_FAST 2 (i)
36 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
39 POP_TOP
40 JUMP_ABSOLUTE 18
>> 43 POP_BLOCK
44 JUMP_FORWARD 18 (to 65)
7 >> 47 DUP_TOP
48 LOAD_GLOBAL 3 (StopIteration)
51 COMPARE_OP 10 (exception match)
54 POP_JUMP_IF_FALSE 64
57 POP_TOP
58 POP_TOP
59 POP_TOP
8 60 POP_EXCEPT
61 JUMP_FORWARD 1 (to 65)
>> 64 END_FINALLY
>> 65 LOAD_CONST 0 (None)
68 RETURN_VALUE
9 Bytecodes in the inner loop.
We can actually do even better, though.
In [58]: from collections import deque
In [59]: def deqloop(my_iter):
   ....:     deque(map(print, my_iter), 0)
   ....:
In [61]: dis.dis(deqloop)
2 0 LOAD_GLOBAL 0 (deque)
3 LOAD_GLOBAL 1 (map)
6 LOAD_GLOBAL 2 (print)
9 LOAD_FAST 0 (my_iter)
12 CALL_FUNCTION 2 (2 positional, 0 keyword pair)
15 LOAD_CONST 1 (0)
18 CALL_FUNCTION 2 (2 positional, 0 keyword pair)
21 POP_TOP
22 LOAD_CONST 0 (None)
25 RETURN_VALUE
Everything happens in C: collections.deque, map and print are all builtins (in CPython), so in this case there are no bytecodes executed for the looping at all. This is only a useful optimization when the iteration step is a C function (as is the case for print); otherwise, the overhead of a Python function call is larger than the JUMP_ABSOLUTE overhead.
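A sketch of the same idea wrapped as a reusable helper, in the spirit of the "consume" recipe from the itertools documentation (do_something_with and my_iter are the placeholder names from the question):
from collections import deque

def consume(iterator):
    # A zero-length deque throws every item away, so the whole loop runs in C.
    deque(iterator, maxlen=0)

consume(map(do_something_with, my_iter))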
The for loop is the most pythonic. Note that you can break out of for loops as well as while loops.
Don't use the list comprehension unless you need the resulting list; otherwise you are needlessly storing all the elements. Your example list comprehension will only work with the print function in Python 3; it won't work with the print statement in Python 2.
I would agree with you that the for loop is superior. As you mentioned, it is less cluttered and a lot easier to read. Programmers like to keep things as simple as possible, and the for loop does that. It is also better for novice Python programmers who might not have learned try/except yet. Also, as Alasdair mentioned, you can break out of for loops. Finally, the while loop raises an error if you pass it a list, unless you call iter() on my_iter first.
I have some pretty ugly indexing going on. For example, things like
valid[ data[ index[valid[:,0],0] ] == 0, 1] = False
where valid and index are Nx2 arrays of bools and ints respectively, and data is of length N.
If I concentrate really hard, I can convince myself that this is doing what I want... but it's incredibly obfuscated. How can I unobfuscate something like this efficiently?
I could break it up, for example:
valid_index = index[valid[:,0],0]
invalid_index = (data[ valid_index ] == 0)
valid[ invalid_index, 1 ] = False
But my arrays will have up to hundreds of millions of entries, so I don't want to duplicate the memory, and I need to remain as speed-efficient as possible.
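Worth noting, assuming NumPy's usual copy semantics for fancy indexing and comparisons: the one-liner already allocates the same temporaries internally while it is evaluated, so naming them should not duplicate the big input arrays; if peak memory is the worry, each temporary can be deleted as soon as it is no longer needed:
# Sketch: same computation as the one-liner, with the intermediates
# released explicitly as early as possible.
valid_index = index[valid[:, 0], 0]
invalid_index = (data[valid_index] == 0)
del valid_index
valid[invalid_index, 1] = False
del invalid_index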
These two code sequences are nearly identical, and should have very similar performance. That's my "gut feeling"--but then I did static analysis and ran a partial benchmark to confirm.
The clearer option requires four more bytecodes to implement, so will probably be slightly slower. But the extra work is restricted to LOAD_FAST and STORE_FAST, which are just moves from the top of stack (TOS) to/from variables. As the extra work is modest, so should be the performance impact.
You could benchmark the two approaches on your target equipment for more quantitative precision, but on my 3-year-old laptop, 100 million extra LOAD_FAST / STORE_FAST pairs take just over 3 seconds on standard CPython 2.7.5. So I estimate this clarity will cost you about 6 seconds per 100M entries. While the PyPy just-in-time Python compiler doesn't use the same bytecodes, I timed its overhead for the clear version at about half that, or 3 seconds per 100M. Compared to the other work you're doing to process the items, the clearer version is probably not a significant slowdown.
The TL;DR Backstory
My first impression was that the two code sequences, while different in readability and clarity, are technically very similar and should have very similar performance characteristics. But let's analyze a bit further using the Python disassembler. I dropped each code snippet into a function:
def one(valid, data):
    valid[ data[ index[valid[:,0],0] ] == 0, 1] = False

def two(valid, data):
    valid_index = index[valid[:,0],0]
    invalid_index = (data[ valid_index ] == 0)
    valid[ invalid_index, 1 ] = False
Then using Python's bytecode dissassember:
import dis
dis.dis(one)
print "---"
dis.dis(two)
Gives:
15 0 LOAD_GLOBAL 0 (False)
3 LOAD_FAST 0 (valid)
6 LOAD_FAST 1 (data)
9 LOAD_GLOBAL 1 (index)
12 LOAD_FAST 0 (valid)
15 LOAD_CONST 0 (None)
18 LOAD_CONST 0 (None)
21 BUILD_SLICE 2
24 LOAD_CONST 1 (0)
27 BUILD_TUPLE 2
30 BINARY_SUBSCR
31 LOAD_CONST 1 (0)
34 BUILD_TUPLE 2
37 BINARY_SUBSCR
38 BINARY_SUBSCR
39 LOAD_CONST 1 (0)
42 COMPARE_OP 2 (==)
45 LOAD_CONST 2 (1)
48 BUILD_TUPLE 2
51 STORE_SUBSCR
52 LOAD_CONST 0 (None)
55 RETURN_VALUE
18 0 LOAD_GLOBAL 0 (index)
3 LOAD_FAST 0 (valid)
6 LOAD_CONST 0 (None)
9 LOAD_CONST 0 (None)
12 BUILD_SLICE 2
15 LOAD_CONST 1 (0)
18 BUILD_TUPLE 2
21 BINARY_SUBSCR
22 LOAD_CONST 1 (0)
25 BUILD_TUPLE 2
28 BINARY_SUBSCR
29 STORE_FAST 2 (valid_index)
19 32 LOAD_FAST 1 (data)
35 LOAD_FAST 2 (valid_index)
38 BINARY_SUBSCR
39 LOAD_CONST 1 (0)
42 COMPARE_OP 2 (==)
45 STORE_FAST 3 (invalid_index)
20 48 LOAD_GLOBAL 1 (False)
51 LOAD_FAST 0 (valid)
54 LOAD_FAST 3 (invalid_index)
57 LOAD_CONST 2 (1)
60 BUILD_TUPLE 2
63 STORE_SUBSCR
64 LOAD_CONST 0 (None)
67 RETURN_VALUE
Similar but not identical, and not in the same order. A quick diff of the two shows the same, and also suggests that the clearer function requires a few more bytecodes.
I parsed the bytecode opcodes out of each function's disassembler listing, dropped them into a collections.Counter, and compared the counts:
Bytecode Count(s)
======== ========
BINARY_SUBSCR 3
BUILD_SLICE 1
BUILD_TUPLE 3
COMPARE_OP 1
LOAD_CONST 7
LOAD_FAST 3, 5 *** differs ***
LOAD_GLOBAL 2
RETURN_VALUE 1
STORE_FAST 0, 2 *** differs ***
STORE_SUBSCR 1
Here is where it becomes evident that the second, clearer approach uses only four more bytecodes, and of the simple, fast LOAD_FAST / STORE_FAST variety. Static analysis thus shows no particular reason to fear additional memory allocation or other performance-killing side effects.
I then constructed two functions, very similar to one another, that the disassembler shows differ only in that the second one has an extra LOAD_FAST / STORE_FAST pair. I ran them 100,000,000 times, and compared their runtimes. They differed by just over 3 seconds in CPython 2.7.5, and about 1.5 seconds under PyPy 2.2.1 (based on Python 2.7.3). Even when you double those times (because you have two pairs), it's pretty clear those extra load/store pairs are not going to slow you down much.
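For reference, a sketch of that kind of micro-benchmark (the answer does not show the exact functions it timed; these two are stand-ins of mine that differ only by one extra STORE_FAST / LOAD_FAST pair):
import timeit

def direct(x):
    return x + 1

def via_local(x):
    y = x + 1    # the extra STORE_FAST / LOAD_FAST pair
    return y

setup = "from __main__ import direct, via_local"
print(timeit.timeit("direct(1)", setup=setup, number=10000000))
print(timeit.timeit("via_local(1)", setup=setup, number=10000000))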
Why does list comprehension have better performance than a for loop, in Python?
list comprehension:
new_items = [a for a in items if a > 10]
for loop:
new_items = []
for a in items:
    if a > 10: new_items.append(a)
Are there other examples (not loops), where one Python structure has worse performance than another Python structure?
Essentially, list comprehensions and for loops do pretty similar things, with the list comprehension doing away with some overhead and looking prettier.
To understand why this is faster, you should look at Efficiency of list comprehensions; to quote the relevant part for your problem:
List comprehensions perform better here because you don’t need to load
the append attribute off of the list (loop program, bytecode 28) and
call it as a function (loop program, bytecode 38). Instead, in a
comprehension, a specialized LIST_APPEND bytecode is generated for a
fast append onto the result list (comprehension program, bytecode 33).
In the loop_faster program, you avoid the overhead of the append
attribute lookup by hoisting it out of the loop and placing the result
in a fastlocal (bytecode 9-12), so it loops more quickly; however, the
comprehension uses a specialized LIST_APPEND bytecode instead of
incurring the overhead of a function call, so it still trumps.
The link also details some of the possible pitfalls associated with list comprehensions, and I would recommend you go through it once.
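For completeness, a sketch of the loop_faster variant the quote refers to, applied to the example from the question (hoisting the bound append method into a local; the comprehension's LIST_APPEND opcode is still the faster option):
new_items = []
append = new_items.append   # attribute lookup done once, outside the loop
for a in items:
    if a > 10:
        append(a)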
Assuming we're talking CPython here, you could use the dis module to compare the generated bytecodes:
>> def one():
       return [a for a in items if a > 10]

>> def two():
       res = []
       for a in items:
           if a > 10:
               res.append(a)
>> dis.dis(one)
2 0 BUILD_LIST 0
3 LOAD_GLOBAL 0 (items)
6 GET_ITER
>> 7 FOR_ITER 24 (to 34)
10 STORE_FAST 0 (a)
13 LOAD_FAST 0 (a)
16 LOAD_CONST 1 (10)
19 COMPARE_OP 4 (>)
22 POP_JUMP_IF_FALSE 7
25 LOAD_FAST 0 (a)
28 LIST_APPEND 2
31 JUMP_ABSOLUTE 7
>> 34 RETURN_VALUE
>> dis.dis(two)
2 0 BUILD_LIST 0
3 STORE_FAST 0 (res)
3 6 SETUP_LOOP 42 (to 51)
9 LOAD_GLOBAL 0 (items)
12 GET_ITER
>> 13 FOR_ITER 34 (to 50)
16 STORE_FAST 1 (a)
4 19 LOAD_FAST 1 (a)
22 LOAD_CONST 1 (10)
25 COMPARE_OP 4 (>)
28 POP_JUMP_IF_FALSE 13
5 31 LOAD_FAST 0 (res)
34 LOAD_ATTR 1 (append)
37 LOAD_FAST 1 (a)
40 CALL_FUNCTION 1
43 POP_TOP
44 JUMP_ABSOLUTE 13
47 JUMP_ABSOLUTE 13
>> 50 POP_BLOCK
>> 51 LOAD_CONST 0 (None)
54 RETURN_VALUE
So for one thing, the list comprehension takes advantage of the dedicated LIST_APPEND opcode which isn't being used by the for loop.
From the Python wiki:
The for statement is most commonly used. It loops over the elements of
a sequence, assigning each to the loop variable. If the body of your
loop is simple, the interpreter overhead of the for loop itself can be
a substantial amount of the overhead. This is where the map function
is handy. You can think of map as a for moved into C code.
So simple for loops have overhead that list comprehensions get away with.
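Applied to the example in this question, the map/filter counterpart would look like the sketch below; note that with a Python-level lambda as the predicate you still pay a function call per element, so this mostly helps when the callable is itself a builtin, and it is worth timing before assuming a win:
# filter pushes the loop itself into C; the lambda is still a Python-level
# call per element, so measure before assuming a speedup.
new_items = list(filter(lambda a: a > 10, items))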