Python List Comprehension vs For [closed] - python

Why does a list comprehension have better performance than a for loop in Python?
list comprehension:
new_items = [a for a in items if a > 10]
for loop:
new_items = []
for a in items:
    if a > 10:
        new_items.append(a)
Are there other examples (not loops), where one Python structure has worse performance than another Python structure?

Essentially, a list comprehension and a for loop do pretty similar things, with the list comprehension doing away with some overhead and looking prettier.
To understand why it is faster, you should look at Efficiency of list comprehensions; to quote the relevant part for your problem:
List comprehensions perform better here because you don’t need to load
the append attribute off of the list (loop program, bytecode 28) and
call it as a function (loop program, bytecode 38). Instead, in a
comprehension, a specialized LIST_APPEND bytecode is generated for a
fast append onto the result list (comprehension program, bytecode 33).
In the loop_faster program, you avoid the overhead of the append
attribute lookup by hoisting it out of the loop and placing the result
in a fastlocal (bytecode 9-12), so it loops more quickly; however, the
comprehension uses a specialized LIST_APPEND bytecode instead of
incurring the overhead of a function call, so it still trumps.
The link also details some of the possible pitfalls associated with list comprehensions, and I would recommend you go through it once.
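As a quick sanity check, you can also measure the difference yourself with timeit; this is an illustrative sketch (the function names are mine, and the numbers will vary by machine and CPython version), not part of the linked article:

import timeit

items = list(range(20))

def with_loop():
    new_items = []
    for a in items:
        if a > 10:
            new_items.append(a)
    return new_items

def with_comprehension():
    return [a for a in items if a > 10]

# The comprehension is usually the faster of the two, but the exact gap
# depends on the interpreter version and the size of items.
print(timeit.timeit(with_loop, number=1000000))
print(timeit.timeit(with_comprehension, number=1000000))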

Assuming we're talking about CPython here, you could use the dis module to compare the generated bytecode:
>>> def one():
...     return [a for a in items if a > 10]
>>> def two():
...     res = []
...     for a in items:
...         if a > 10:
...             res.append(a)
>>> dis.dis(one)
2 0 BUILD_LIST 0
3 LOAD_GLOBAL 0 (items)
6 GET_ITER
>> 7 FOR_ITER 24 (to 34)
10 STORE_FAST 0 (a)
13 LOAD_FAST 0 (a)
16 LOAD_CONST 1 (10)
19 COMPARE_OP 4 (>)
22 POP_JUMP_IF_FALSE 7
25 LOAD_FAST 0 (a)
28 LIST_APPEND 2
31 JUMP_ABSOLUTE 7
>> 34 RETURN_VALUE
>>> dis.dis(two)
2 0 BUILD_LIST 0
3 STORE_FAST 0 (res)
3 6 SETUP_LOOP 42 (to 51)
9 LOAD_GLOBAL 0 (items)
12 GET_ITER
>> 13 FOR_ITER 34 (to 50)
16 STORE_FAST 1 (a)
4 19 LOAD_FAST 1 (a)
22 LOAD_CONST 1 (10)
25 COMPARE_OP 4 (>)
28 POP_JUMP_IF_FALSE 13
5 31 LOAD_FAST 0 (res)
34 LOAD_ATTR 1 (append)
37 LOAD_FAST 1 (a)
40 CALL_FUNCTION 1
43 POP_TOP
44 JUMP_ABSOLUTE 13
47 JUMP_ABSOLUTE 13
>> 50 POP_BLOCK
>> 51 LOAD_CONST 0 (None)
54 RETURN_VALUE
So for one thing, the list comprehension takes advantage of the dedicated LIST_APPEND opcode which isn't being used by the for loop.

From the Python wiki:
The for statement is most commonly used. It loops over the elements of
a sequence, assigning each to the loop variable. If the body of your
loop is simple, the interpreter overhead of the for loop itself can be
a substantial amount of the overhead. This is where the map function
is handy. You can think of map as a for moved into C code.
So a simple for loop carries interpreter overhead that a list comprehension avoids.
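To illustrate the wiki's point, here is a small sketch (illustrative names, not from the wiki) of moving a simple loop body into map; note that map pays off most when the mapped function is itself a builtin, since a Python-level lambda still costs one Python call per element:

# Loop version: the interpreter executes the loop-body bytecode for every element.
squares = []
for x in range(10):
    squares.append(x * x)

# map version: the iteration itself happens inside map's C implementation.
# On Python 3 map is lazy, so wrap it in list() to materialize the result.
squares_via_map = list(map(lambda x: x * x, range(10)))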

Related

Is it better to use nested loops for bigger repetitions or just to put the entire range into one loop? Which is faster / less complex?

Which one is better?
for x in range(0, 100):
    print("Lorem Ipsum")
for x in range(0, 10):
    for y in range(0, 10):
        print("Lorem Ipsum")
The second one is harder to read, and it constructs an unnecessary extra range iterable (a list in Python 2, a less memory-consuming and faster-to-create range object in Python 3).
From that unnecessary iterable, the inner for loop also constructs an unnecessary iterator (a listiterator in Python 2, a range_iterator in Python 3).
The first one is more readable and easier to understand. Use that.
Regarding performance, I doubt it makes any difference, and if it does, the 0-100 version is faster, because it compiles to less code (if the double loop is not optimized away) and thus a shorter code path.
When in doubt about such things, use the one that is easier to understand when you read the code. Premature optimization is a sin.
You can use dis from the dis module to disassemble and analyse the bytecode and see which of your loops is better (in the sense of needing less memory, fewer iterators, etc.).
Here is the disassembly:
from dis import dis

def loop1():
    for x in range(100):
        pass

def loop2():
    for x in range(10):
        for j in range(10):
            pass
Now look under the hood of each loop:
dis(loop1)
2 0 SETUP_LOOP 20 (to 23)
3 LOAD_GLOBAL 0 (range)
6 LOAD_CONST 1 (100)
9 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
12 GET_ITER
>> 13 FOR_ITER 6 (to 22)
16 STORE_FAST 0 (x)
3 19 JUMP_ABSOLUTE 13
>> 22 POP_BLOCK
>> 23 LOAD_CONST 0 (None)
26 RETURN_VALUE
And look at the amount of data and operations needed in your second loop:
dis(loop2)
2 0 SETUP_LOOP 43 (to 46)
3 LOAD_GLOBAL 0 (range)
6 LOAD_CONST 1 (10)
9 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
12 GET_ITER
>> 13 FOR_ITER 29 (to 45)
16 STORE_FAST 0 (x)
3 19 SETUP_LOOP 20 (to 42)
22 LOAD_GLOBAL 0 (range)
25 LOAD_CONST 1 (10)
28 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
31 GET_ITER
>> 32 FOR_ITER 6 (to 41)
35 STORE_FAST 1 (j)
4 38 JUMP_ABSOLUTE 32
>> 41 POP_BLOCK
>> 42 JUMP_ABSOLUTE 13
>> 45 POP_BLOCK
>> 46 LOAD_CONST 0 (None)
49 RETURN_VALUE
Because both loops do the same thing, the first one is far better.
Just imagine how you would modify the nested loop to run 101 iterations instead of 100, and the disadvantage becomes clear.
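If you want wall-clock numbers rather than bytecode, a hedged timeit sketch along these lines (the function bodies match loop1 and loop2 above) will show that the difference is usually negligible:

import timeit

setup = """
def loop1():
    for x in range(100):
        pass

def loop2():
    for x in range(10):
        for j in range(10):
            pass
"""

# Absolute numbers depend on the machine and interpreter; the point is only
# that the gap between the flat and the nested loop is small.
print(timeit.timeit("loop1()", setup=setup, number=100000))
print(timeit.timeit("loop2()", setup=setup, number=100000))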

How does List Comprehension exactly work in Python? [duplicate]

I am going through the Python 3.x docs and I have a doubt about the execution speed of list comprehensions and how exactly they work.
Let's take the following example:
Listing 1
...
L = range(0,10)
L = [x ** 2 for x in L]
...
Now, to my knowledge, this returns a new list, and it's equivalent to writing:
Listing 2
...
res = []
for x in L:
    res.append(x ** 2)
...
The main difference, if I am correct, is the speed of execution. Listing 1 is supposed to run at C speed inside the interpreter, while Listing 2 is not.
But Listing 2 is (as far as I can tell) what the list comprehension does internally, so why is Listing 1 executed at C speed while Listing 2 is not? Both are converted to bytecode before being processed, or am I missing something?
Look at the actual bytecode that is produced. I've put the two fragments of code into functions called f1 and f2.
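The answer doesn't show those wrapper functions; something along these lines (the names f1 and f2 and the surrounding boilerplate are assumed here, so exact offsets and line numbers may differ slightly) produces disassembly like the excerpts below:

import dis

def f1():
    L = range(0, 10)
    L = [x ** 2 for x in L]   # Listing 1: compiled into a separate <listcomp> code object

def f2():
    L = range(0, 10)
    res = []
    for x in L:               # Listing 2: the loop lives in f2's own bytecode
        res.append(x ** 2)

dis.dis(f1)
dis.dis(f2)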
The comprehension does this:
3 15 LOAD_CONST 3 (<code object <listcomp> at 0x7fbf6c1b59c0, file "<stdin>", line 3>)
18 LOAD_CONST 4 ('f1.<locals>.<listcomp>')
21 MAKE_FUNCTION 0
24 LOAD_FAST 0 (L)
27 GET_ITER
28 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
31 STORE_FAST 0 (L)
Notice there is no loop in the bytecode. The loop happens in C.
Now the for loop does this:
4 21 SETUP_LOOP 31 (to 55)
24 LOAD_FAST 0 (L)
27 GET_ITER
>> 28 FOR_ITER 23 (to 54)
31 STORE_FAST 2 (x)
34 LOAD_FAST 1 (res)
37 LOAD_ATTR 1 (append)
40 LOAD_FAST 2 (x)
43 LOAD_CONST 3 (2)
46 BINARY_POWER
47 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
50 POP_TOP
51 JUMP_ABSOLUTE 28
>> 54 POP_BLOCK
In contrast to the comprehension, the loop is clearly there in the bytecode. So the loop occurs in Python.
The bytecodes are different, and the first should be faster.
The answer is actually in your question.
When you run any built-in Python function, you are running something that has been written in C and compiled into machine code.
When you write your own version of it, that code is compiled to bytecode and executed by the interpreter one operation at a time, working on Python objects.
As a consequence, the built-in approach or function is generally quicker (or takes less space) in Python than writing your own equivalent.
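A hedged illustration of that point (not from the original answer): compare a builtin written in C with a hand-rolled Python equivalent.

import timeit

data = list(range(1000))

def manual_sum(values):
    # Pure-Python loop: every iteration executes several bytecodes.
    total = 0
    for v in values:
        total += v
    return total

# sum() iterates in C, so it typically wins by a wide margin.
print(timeit.timeit(lambda: manual_sum(data), number=10000))
print(timeit.timeit(lambda: sum(data), number=10000))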

python `for i in iter` vs `while True; i = next(iter)`

To my understanding, both of these approaches work for operating on every item in a generator:
let i be our operator target
let my_iter be our generator
let callable do_something_with return None
While loop + StopIteration
try:
    while True:
        i = next(my_iter)
        do_something_with(i)
except StopIteration:
    pass
For loop / list comprehension
for i in my_iter:
    do_something_with(i)

[do_something_with(i) for i in my_iter]
Minor edit: print(i) replaced with do_something_with(i), as suggested by @kojiro, to disambiguate the use case from interpreter mechanics.
As far as I am aware, these are both applicable ways to iterate over a generator. Is there any reason to prefer one over the other?
Right now the for loop looks superior to me, due to fewer lines, less clutter, better readability in general, and a single level of indentation.
I really only see the while approach being advantageous if you want to handily break out of the loop on particular exceptions.
The third option is definitely NOT the same as the first two. The third example creates a list, with one entry for each return value of print(i), which happens to be None, so not a very interesting list.
The first two are semantically similar. There is a minor technical difference: the while loop, as presented, does not work if my_iter is not in fact an iterator (i.e., an object with a __next__() method), for instance if it's a plain list. The for loop works for all iterables (objects with an __iter__() method) in addition to iterators.
The correct version is thus:
my_iter = iter(my_iterable)
try:
    while True:
        i = next(my_iter)
        print(i)
except StopIteration:
    pass
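A quick illustrative snippet (not from the original answer) of that iterator/iterable distinction:

values = [1, 2, 3]

# A list is iterable but is not itself an iterator:
# next(values) raises TypeError: 'list' object is not an iterator.
it = iter(values)   # iter() returns a list iterator
print(next(it))     # 1
print(next(it))     # 2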
Now, aside from readability, there is in fact a technical reason you should prefer the for loop: in CPython there is a penalty you pay for the number of bytecodes executed in tight inner loops. Let's compare:
In [1]: def forloop(my_iter):
   ...:     for i in my_iter:
   ...:         print(i)
   ...:
In [57]: dis.dis(forloop)
2 0 SETUP_LOOP 24 (to 27)
3 LOAD_FAST 0 (my_iter)
6 GET_ITER
>> 7 FOR_ITER 16 (to 26)
10 STORE_FAST 1 (i)
3 13 LOAD_GLOBAL 0 (print)
16 LOAD_FAST 1 (i)
19 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
22 POP_TOP
23 JUMP_ABSOLUTE 7
>> 26 POP_BLOCK
>> 27 LOAD_CONST 0 (None)
30 RETURN_VALUE
7 bytecodes executed in the inner loop, vs:
In [55]: def whileloop(my_iterable):
    ....:     my_iter = iter(my_iterable)
    ....:     try:
    ....:         while True:
    ....:             i = next(my_iter)
    ....:             print(i)
    ....:     except StopIteration:
    ....:         pass
    ....:
In [56]: dis.dis(whileloop)
2 0 LOAD_GLOBAL 0 (iter)
3 LOAD_FAST 0 (my_iterable)
6 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
9 STORE_FAST 1 (my_iter)
3 12 SETUP_EXCEPT 32 (to 47)
4 15 SETUP_LOOP 25 (to 43)
5 >> 18 LOAD_GLOBAL 1 (next)
21 LOAD_FAST 1 (my_iter)
24 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
27 STORE_FAST 2 (i)
6 30 LOAD_GLOBAL 2 (print)
33 LOAD_FAST 2 (i)
36 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
39 POP_TOP
40 JUMP_ABSOLUTE 18
>> 43 POP_BLOCK
44 JUMP_FORWARD 18 (to 65)
7 >> 47 DUP_TOP
48 LOAD_GLOBAL 3 (StopIteration)
51 COMPARE_OP 10 (exception match)
54 POP_JUMP_IF_FALSE 64
57 POP_TOP
58 POP_TOP
59 POP_TOP
8 60 POP_EXCEPT
61 JUMP_FORWARD 1 (to 65)
>> 64 END_FINALLY
>> 65 LOAD_CONST 0 (None)
68 RETURN_VALUE
9 bytecodes in the inner loop.
We can actually do even better, though.
In [58]: from collections import deque
In [59]: def deqloop(my_iter):
    ....:     deque(map(print, my_iter), 0)
    ....:
In [61]: dis.dis(deqloop)
2 0 LOAD_GLOBAL 0 (deque)
3 LOAD_GLOBAL 1 (map)
6 LOAD_GLOBAL 2 (print)
9 LOAD_FAST 0 (my_iter)
12 CALL_FUNCTION 2 (2 positional, 0 keyword pair)
15 LOAD_CONST 1 (0)
18 CALL_FUNCTION 2 (2 positional, 0 keyword pair)
21 POP_TOP
22 LOAD_CONST 0 (None)
25 RETURN_VALUE
Everything happens in C: collections.deque, map and print are all builtins (in CPython), so in this case there are no bytecodes executed for looping at all. This is only a useful optimization when the iteration step is a C function (as is the case for print); otherwise, the overhead of a Python function call is larger than the JUMP_ABSOLUTE overhead.
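The deque(..., 0) trick above is essentially the "consume" recipe from the itertools documentation; a small sketch of the general idiom:

from collections import deque

def consume(iterator):
    # Drain an iterator entirely at C speed (the itertools docs' consume recipe).
    deque(iterator, maxlen=0)

# Apply a side-effecting function to every item without building a throwaway list.
consume(map(print, range(3)))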
The for loop is the most pythonic. Note that you can break out of for loops as well as while loops.
Don't use the list comprehension unless you need the resulting list, otherwise you are needlessly storing all the elements. Your example list comprehension will only work with the print function in Python 3, it won't work with the print statement in Python 2.
I would agree with you that the for loop is superior. As you mentioned, it is less cluttered and a lot easier to read. Programmers like to keep things as simple as possible, and the for loop does that. It is also better for novice Python programmers who might not have learned try/except yet. As Alasdair mentioned, you can break out of for loops as well. Finally, the while loop raises an error if my_iter is a plain list, unless you call iter() on it first.

Python List-comprehension and referencing an object multiple times

This is a contrived example to demonstrate referencing the same dictionary item multiple times in a for-loop and a list-comprehension. First, the for-loop:
dict_index_mylists = {0: ['a', 'b', 'c'], 1: ['b', 'c', 'a'], 2: ['c', 'a', 'b']}

# for-loop
myseq = []
for i in [0, 1, 2]:
    interim = dict_index_mylists[i]
    if interim[0] == 'b' or interim[1] == 'c' or interim[2] == 'a':
        myseq.append(interim)
In the for-loop, the interim list is referenced from the dictionary object and is then referenced multiple times in the if-conditional, which may make sense, particularly if the dictionary is very large and/or lives on storage. Then again, the 'interim' reference may be unnecessary because the Python dictionary is optimized for performance.
This is a list-comprehension of the for-loop:
# list-comprehension
myseq = [dict_index_mylists[i] for i in [0, 1, 2] if dict_index_mylists[i][0] == 'b' or dict_index_mylists[i][1] == 'c' or dict_index_mylists[i][2] == 'a']
The questions are:
a. Does the list-comprehension make multiple references to the dictionary item or does it reference and keep a local 'interim' list to work on?
b. What is the optimal list-comprehension expression that contains multiple conditionals on the same dictionary item and where the dictionary is very large?
You seem to be asking only about optimization of common sub-expressions. In your list comprehension, it will index into the dictionary multiple times. Python is dynamic, it is difficult to know what side effects an operation like dict_index_mylists[i] might have, so CPython simply executes the operation as many times as you tell it to.
Other implementations like PyPy use a JIT and may optimize away subexpressions, but it is difficult to know for sure what it will do ahead of time.
If you are very concerned with performance, you need to time various options to see which is best.
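For example, a hedged timeit sketch (the statement strings are mine; the hoisted version anticipates the generator-expression approach shown in the other answer below):

import timeit

setup = "dict_index_mylists = {0: ['a', 'b', 'c'], 1: ['b', 'c', 'a'], 2: ['c', 'a', 'b']}"

repeated = """
[dict_index_mylists[i] for i in [0, 1, 2]
 if dict_index_mylists[i][0] == 'b'
 or dict_index_mylists[i][1] == 'c'
 or dict_index_mylists[i][2] == 'a']
"""

hoisted = """
[item for item in (dict_index_mylists[i] for i in [0, 1, 2])
 if item[0] == 'b' or item[1] == 'c' or item[2] == 'a']
"""

# Dict lookups are O(1), so on a toy dict like this the two versions are
# usually close; measure with data shaped like your real workload.
print(timeit.timeit(repeated, setup=setup, number=100000))
print(timeit.timeit(hoisted, setup=setup, number=100000))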
I'm no expert at looking at python bytecode, but here's my attempt to learn something new this morning:
def dostuff():
    myseq = [dict_index_mylists[i] for i in [0, 1, 2] if dict_index_mylists[i][0] == 'b' or dict_index_mylists[i][1] == 'c' or dict_index_mylists[i][2] == 'a']

import dis
dis.dis(dostuff)
If you look at the output (below), there are 4 calls to LOAD_GLOBAL, so it doesn't look like Python is storing an interim list. As for your second question, what you have is probably about as good as you can do. It's not as bad as you might think, though: dict objects access items by a hash function, so their lookup complexity is O(1) regardless of dictionary size. Of course, you could always use timeit to compare the two implementations (loop and list-comp) and then choose the faster one. Profiling (as always) is your friend.
APPENDIX (output of dis.dis(dostuff))
5 0 BUILD_LIST 0
3 DUP_TOP
4 STORE_FAST 0 (_[1])
7 LOAD_CONST 1 (0)
10 LOAD_CONST 2 (1)
13 LOAD_CONST 3 (2)
16 BUILD_LIST 3
19 GET_ITER
>> 20 FOR_ITER 84 (to 107)
23 STORE_FAST 1 (i)
26 LOAD_GLOBAL 0 (dict_index_mylists)
29 LOAD_FAST 1 (i)
32 BINARY_SUBSCR
33 LOAD_CONST 1 (0)
36 BINARY_SUBSCR
37 LOAD_CONST 4 ('b')
40 COMPARE_OP 2 (==)
43 JUMP_IF_TRUE 42 (to 88)
46 POP_TOP
47 LOAD_GLOBAL 0 (dict_index_mylists)
50 LOAD_FAST 1 (i)
53 BINARY_SUBSCR
54 LOAD_CONST 2 (1)
57 BINARY_SUBSCR
58 LOAD_CONST 5 ('c')
61 COMPARE_OP 2 (==)
64 JUMP_IF_TRUE 21 (to 88)
67 POP_TOP
68 LOAD_GLOBAL 0 (dict_index_mylists)
71 LOAD_FAST 1 (i)
74 BINARY_SUBSCR
75 LOAD_CONST 3 (2)
78 BINARY_SUBSCR
79 LOAD_CONST 6 ('a')
82 COMPARE_OP 2 (==)
85 JUMP_IF_FALSE 15 (to 103)
>> 88 POP_TOP
89 LOAD_FAST 0 (_[1])
92 LOAD_GLOBAL 0 (dict_index_mylists)
95 LOAD_FAST 1 (i)
98 BINARY_SUBSCR
99 LIST_APPEND
100 JUMP_ABSOLUTE 20
>> 103 POP_TOP
104 JUMP_ABSOLUTE 20
>> 107 DELETE_FAST 0 (_[1])
110 STORE_FAST 2 (myseq)
113 LOAD_CONST 0 (None)
116 RETURN_VALUE
First point: nothing (except 'myseq') is "created" here, neither in the for-loop nor in the list-comp version of your code; it's just a reference to an existing dict item.
Now to answer your questions: the list-comp version will make a lookup (a call to dict.__getitem__) for each of the dict_index_mylists[i] expressions. Each of these lookups will return a reference to the same list. You can avoid the extra lookups by keeping a local reference to the dict's item, i.e.:
myseq = [
item for item in (dict_index_mylists[i] for i in (0, 1, 2))
if item[0] == 'b' or item[1] == 'c' or item[2] == 'a'
]
but there's no point in writing a listcomp just for the sake of writing a listcomp.
Note that if you don't care about the original ordering and want to apply this to your whole dict, using dict.itervalues() would be simpler.
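A hedged sketch of that suggestion (Python 3 spelling; on Python 2 use itervalues()), with the result order following the dict rather than [0, 1, 2]:

dict_index_mylists = {0: ['a', 'b', 'c'], 1: ['b', 'c', 'a'], 2: ['c', 'a', 'b']}

# Iterate directly over the stored lists; no index lookups at all.
myseq = [
    item for item in dict_index_mylists.values()
    if item[0] == 'b' or item[1] == 'c' or item[2] == 'a'
]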
Regarding the second question, "optimal" is not an absolute. What do you want to optimize for? Space? Time? Readability?

python - performance difference between the two implementations

Why do the following two implementations have different performance in Python?
from cStringIO import StringIO
from itertools import imap
from sys import stdin
input = imap(int, StringIO(stdin.read()))
print '\n'.join(imap(str, sorted(input)))
AND
import sys

l = []
for line in sys.stdin:
    l.append(int(line.strip('\n')))

l.sort()
for x in l:
    print x
The first implementation is faster than the second for inputs of the order of 10^6 lines. Why so?
>>> dis.dis(first)
2 0 LOAD_GLOBAL 0 (imap)
3 LOAD_GLOBAL 1 (int)
6 LOAD_GLOBAL 2 (StringIO)
9 LOAD_GLOBAL 3 (stdin)
12 LOAD_ATTR 4 (read)
15 CALL_FUNCTION 0
18 CALL_FUNCTION 1
21 CALL_FUNCTION 2
24 STORE_FAST 0 (input)
27 LOAD_CONST 0 (None)
30 RETURN_VALUE
>>> dis.dis(second)
2 0 SETUP_LOOP 48 (to 51)
3 LOAD_GLOBAL 0 (sys)
6 LOAD_ATTR 1 (stdin)
9 CALL_FUNCTION 0
12 GET_ITER
>> 13 FOR_ITER 34 (to 50)
16 STORE_FAST 0 (line)
3 19 LOAD_GLOBAL 2 (l)
22 LOAD_ATTR 3 (append)
25 LOAD_GLOBAL 4 (int)
28 LOAD_FAST 0 (line)
31 LOAD_ATTR 5 (strip)
34 LOAD_CONST 1 ('\n')
37 CALL_FUNCTION 1
40 CALL_FUNCTION 1
43 CALL_FUNCTION 1
46 POP_TOP
47 JUMP_ABSOLUTE 13
>> 50 POP_BLOCK
4 >> 51 LOAD_GLOBAL 2 (l)
54 LOAD_ATTR 6 (sort)
57 CALL_FUNCTION 0
60 POP_TOP
61 LOAD_CONST 0 (None)
64 RETURN_VALUE
Here first is your first function and second is your second. dis shows one of the reasons why the first one is faster.
Two primary reasons:
The 2nd code explicitly constructs a list and sorts it afterwards, while the 1st version lets sorted create only an internal list while sorting at the same time.
The 2nd code explicitly loops over a list with for (on the Python VM), while the 1st version implicitly loops with imap (over the underlying structure, in C).
Anyway, why is StringIO in there? The most straightforward and probably fastest way is:
from sys import stdin, stdout
stdout.writelines(sorted(stdin, key=int))
Do a step-by-step conversion from the second to the first one and see how the performance changes with each step.
1. Remove line.strip. This will cause some speed-up; whether it is significant is another matter. The stripping is superfluous, as you and THC4k have mentioned.
2. Then replace the for loop using l.append with map(int, sys.stdin). My guess is that this would give a significant speed-up.
3. Replace map and l.sort with imap and sorted. My guess is that it won't affect the performance; there could be a slight slowdown, but it would be far from significant. Between the two, I'd usually go with the former, but with Python 3 on the horizon the latter is probably preferable.
4. Replace the for loop using print with print '\n'.join(...). My guess is that this would be another speed-up, but it would cost you some memory.
5. Add cStringIO (which is completely unnecessary, by the way) to see how it affects performance. My guess is that it would be slightly slower, but not enough to counter 4 and 2.
Then, if you try THC4k's answer, it would probably be faster than all of the above, while being simpler and easier to read, and using less memory than 4 and 5. It has slightly different behaviour (it doesn't strip leading zeros from the numbers).
Of course, try this yourself instead of trusting anyone's guesses. Also, run cProfile on your code and see which parts are losing the most time.
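For the last point, a minimal cProfile sketch (function and variable names here are illustrative, not from the question) looks something like this:

import cProfile
import pstats

def second_version(lines):
    # Stand-in for the second implementation, reading from a list
    # instead of sys.stdin so it can be profiled repeatedly.
    l = []
    for line in lines:
        l.append(int(line.strip('\n')))
    l.sort()
    return l

lines = ['%d\n' % n for n in range(100000, 0, -1)]

profiler = cProfile.Profile()
profiler.enable()
second_version(lines)
profiler.disable()

# Print the ten most expensive calls by cumulative time.
pstats.Stats(profiler).sort_stats('cumulative').print_stats(10)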
