python `for i in iter` vs `while True; i = next(iter)` - python

To my understanding, both of these approaches work for operating on every item in a generator:
let i be our operator target
let my_iter be our generator
let callable do_something_with return None
While Loop + StopIteration
try:
    while True:
        i = next(my_iter)
        do_something_with(i)
except StopIteration:
    pass
For loop / list comprehension
for i in my_iter:
    do_something_with(i)
[do_something_with(i) for i in my_iter]
Minor Edit: print(i) replaced with do_something_with(i) as suggested by @kojiro to disambiguate a use case from the interpreter mechanics.
As far as I am aware, these are both applicable ways to iterate over a generator. Is there any reason to prefer one over the other?
Right now the for loop looks superior to me, due to fewer lines, less clutter, better readability in general, and a single level of indentation.
I only see the while approach being advantageous if you want to handily break the loop on particular exceptions.

The third option is definitely NOT the same as the first two. The third example creates a list with one element for each return value of print(i), which happens to be None, so it is not a very interesting list.
The first two are semantically similar. There is a minor, technical difference: the while loop, as presented, does not work if my_iter is not in fact an iterator (i.e., an object with a __next__() method); for instance, if it's a list. The for loop works for all iterables (objects with an __iter__() method), which includes iterators.
The correct version is thus:
my_iter = iter(my_iterable)
try:
    while True:
        i = next(my_iter)
        print(i)
except StopIteration:
    pass
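A small sketch of the difference described above: calling next() directly on a list fails, while wrapping the list with iter() first works. The helper name consume_with_while and the sample data are made up for illustration.

```python
def consume_with_while(iterable, handle):
    """Drive the loop by hand: get an iterator first, then call next() until exhausted."""
    iterator = iter(iterable)  # works for lists, generators, any iterable
    try:
        while True:
            handle(next(iterator))
    except StopIteration:
        pass


seen = []
consume_with_while([1, 2, 3], seen.append)
print(seen)  # [1, 2, 3]

try:
    next([1, 2, 3])  # a list is iterable, but it is not an iterator
except TypeError as exc:
    print("TypeError:", exc)
```

Note that in this shape the try block also wraps the call to handle, so a StopIteration raised inside handle would silently end the loop; the for loop does not have that problem.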
Now, aside from readability, there is in fact a technical reason to prefer the for loop: there is a penalty you pay (in CPython, anyhow) for the number of bytecodes executed in tight inner loops. Let's compare:
In [1]: def forloop(my_iter):
   ...:     for i in my_iter:
   ...:         print(i)
   ...:
In [57]: dis.dis(forloop)
2 0 SETUP_LOOP 24 (to 27)
3 LOAD_FAST 0 (my_iter)
6 GET_ITER
>> 7 FOR_ITER 16 (to 26)
10 STORE_FAST 1 (i)
3 13 LOAD_GLOBAL 0 (print)
16 LOAD_FAST 1 (i)
19 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
22 POP_TOP
23 JUMP_ABSOLUTE 7
>> 26 POP_BLOCK
>> 27 LOAD_CONST 0 (None)
30 RETURN_VALUE
7 bytecodes called in inner loop vs:
In [55]: def whileloop(my_iterable):
   ....:     my_iter = iter(my_iterable)
   ....:     try:
   ....:         while True:
   ....:             i = next(my_iter)
   ....:             print(i)
   ....:     except StopIteration:
   ....:         pass
   ....:
In [56]: dis.dis(whileloop)
2 0 LOAD_GLOBAL 0 (iter)
3 LOAD_FAST 0 (my_iterable)
6 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
9 STORE_FAST 1 (my_iter)
3 12 SETUP_EXCEPT 32 (to 47)
4 15 SETUP_LOOP 25 (to 43)
5 >> 18 LOAD_GLOBAL 1 (next)
21 LOAD_FAST 1 (my_iter)
24 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
27 STORE_FAST 2 (i)
6 30 LOAD_GLOBAL 2 (print)
33 LOAD_FAST 2 (i)
36 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
39 POP_TOP
40 JUMP_ABSOLUTE 18
>> 43 POP_BLOCK
44 JUMP_FORWARD 18 (to 65)
7 >> 47 DUP_TOP
48 LOAD_GLOBAL 3 (StopIteration)
51 COMPARE_OP 10 (exception match)
54 POP_JUMP_IF_FALSE 64
57 POP_TOP
58 POP_TOP
59 POP_TOP
8 60 POP_EXCEPT
61 JUMP_FORWARD 1 (to 65)
>> 64 END_FINALLY
>> 65 LOAD_CONST 0 (None)
68 RETURN_VALUE
9 Bytecodes in the inner loop.
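Counting bytecodes only suggests a difference; a rough way to check it on your own interpreter is to time both shapes. This is a minimal sketch using timeit; the helper names (for_loop, while_loop, measure) and the sizes are arbitrary choices for illustration, and the numbers will vary by machine and Python version.

```python
import timeit


def for_loop(items, handle):
    for item in items:
        handle(item)


def while_loop(items, handle):
    iterator = iter(items)
    try:
        while True:
            handle(next(iterator))
    except StopIteration:
        pass


def measure(func, size=1000, repeats=200):
    """Return total seconds for `repeats` runs of func over range(size)."""
    sink = []
    return timeit.timeit(lambda: func(range(size), sink.append), number=repeats)


for_seconds = measure(for_loop)
while_seconds = measure(while_loop)
print(f"for:   {for_seconds:.4f}s")
print(f"while: {while_seconds:.4f}s")
```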
We can actually do even better, though.
In [58]: from collections import deque
In [59]: def deqloop(my_iter):
   ....:     deque(map(print, my_iter), 0)
   ....:
In [61]: dis.dis(deqloop)
2 0 LOAD_GLOBAL 0 (deque)
3 LOAD_GLOBAL 1 (map)
6 LOAD_GLOBAL 2 (print)
9 LOAD_FAST 0 (my_iter)
12 CALL_FUNCTION 2 (2 positional, 0 keyword pair)
15 LOAD_CONST 1 (0)
18 CALL_FUNCTION 2 (2 positional, 0 keyword pair)
21 POP_TOP
22 LOAD_CONST 0 (None)
25 RETURN_VALUE
Everything happens in C: collections.deque, map and print are all implemented in C (in CPython), so in this case no bytecodes are executed for looping. This is only a useful optimization when the iteration step is a C function (as is the case for print). Otherwise, the overhead of a Python function call is larger than the JUMP_ABSOLUTE overhead.
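To make the deque trick a little more concrete, here is a sketch that swaps print for list.append (also implemented in C in CPython) so the effect is visible in a variable. The helper name consume is an assumption for illustration; the second argument to deque is its maxlen.

```python
from collections import deque


def consume(iterator):
    """Exhaust an iterator, discarding every result (maxlen=0 keeps nothing)."""
    deque(iterator, maxlen=0)


collected = []
consume(map(collected.append, range(5)))  # list.append is implemented in C
print(collected)  # [0, 1, 2, 3, 4]
```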

The for loop is the most pythonic. Note that you can break out of for loops as well as while loops.
Don't use the list comprehension unless you need the resulting list, otherwise you are needlessly storing all the elements. Your example list comprehension will only work with the print function in Python 3, it won't work with the print statement in Python 2.
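A short sketch of why the comprehension is wasteful here; record is a hypothetical stand-in for do_something_with that has a side effect and returns None.

```python
def record(value, log):
    """Stand-in for do_something_with: has a side effect and returns None."""
    log.append(value)


log = []
leftover = [record(i, log) for i in range(3)]  # builds a throwaway list
print(leftover)  # [None, None, None]

log.clear()
for i in range(3):  # same side effects, nothing extra stored
    record(i, log)
print(log)  # [0, 1, 2]
```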

I would agree with you that the for loop is superior. As you mentioned, it is less cluttered and a lot easier to read. Programmers like to keep things as simple as possible, and the for loop does that. It is also better for novice Python programmers who might not have learned try/except. Also, as Alasdair mentioned, you can break out of for loops. Finally, the while loop raises an error if you are using a list unless you call iter() on it first.

Related

PyCharm not hitting Quick and Dirty breakpoint on "pass"

I want to add a quick & dirty breakpoint, e.g. when I am interested in stopping in the middle of iterating a long list.
for item in list:
    if item == 'curry':
        pass
I put a breakpoint on pass, and it is not hit(!).
If I add a following (empty) print
for item in list:
    if item == 'curry':
        pass
        print('')
and put breakpoints on both pass and print, only print is hit.
Any idea why? Windows 7, (portable) Python 3.7
[Update] As per the comment from @Adam.Er8, I tried inserting and breakpointing the ellipsis literal, ... but that was not hit, although the following print('') was.
[Update++] Hmm, it does hit a breakpoint on the pass in
for key, value in dictionary.items():
    pass
The pass doesn't actually make it into the bytecode. The code is exactly the same as if it wasn't there. You can see this using the dis module. (examples using 3.7 on linux).
>>> import dis
>>> dis.dis('for i in a:\n\tprint("i")')
1 0 SETUP_LOOP 20 (to 22)
2 LOAD_NAME 0 (a)
4 GET_ITER
>> 6 FOR_ITER 12 (to 20)
8 STORE_NAME 1 (i)
2 10 LOAD_NAME 2 (print)
12 LOAD_CONST 0 ('i')
14 CALL_FUNCTION 1
16 POP_TOP
18 JUMP_ABSOLUTE 6
>> 20 POP_BLOCK
>> 22 LOAD_CONST 1 (None)
24 RETURN_VALUE
>>> dis.dis('for i in a:\n\tpass\n\tprint("i")')
1 0 SETUP_LOOP 20 (to 22)
2 LOAD_NAME 0 (a)
4 GET_ITER
>> 6 FOR_ITER 12 (to 20)
8 STORE_NAME 1 (i)
3 10 LOAD_NAME 2 (print)
12 LOAD_CONST 0 ('i')
14 CALL_FUNCTION 1
16 POP_TOP
18 JUMP_ABSOLUTE 6
>> 20 POP_BLOCK
>> 22 LOAD_CONST 1 (None)
24 RETURN_VALUE
What the bytecode is doing isn't as relevant as the fact that both blocks are identical. The pass is simply ignored, so there is nothing for the debugger to break on.
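If reading two disassemblies side by side is tedious, a sketch like the following lists which source lines have any bytecode attached. The helper name lines_with_bytecode is made up for illustration. The result depends on the interpreter: going by the 3.7 output above, line 2 (the pass) would be missing there, but other versions may well report it.

```python
import dis


def lines_with_bytecode(source):
    """Return the source line numbers that start at least one instruction."""
    code = compile(source, "<example>", "exec")
    return sorted({line for _, line in dis.findlinestarts(code) if line is not None})


with_pass = 'for i in a:\n\tpass\n\tprint("i")'
print(lines_with_bytecode(with_pass))
```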
Try replacing pass with ...:
for item in list:
    if item == 'curry':
        ...
You should be able to set a breakpoint there.
This is called the ellipsis literal. Unlike pass, it is actually "executed" (well, sort of), which is why you can break on it like you would on any other statement, but it has no side effects and reads like "nothing" (before discovering this trick I'd just write _ = 0).
EDIT:
You can also just set a conditional breakpoint.
In PyCharm this is done by right-clicking the breakpoint and writing the condition.
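Another option in the same quick-and-dirty spirit, sketched here as an alternative rather than as what the answers above use: Python 3.7+ has a built-in breakpoint() function, which is a real call and so always has an instruction to stop at. By default it hands control to whatever sys.breakpointhook points to (pdb unless something replaces it). The function name scan and its hook parameter are hypothetical.

```python
def scan(items, target="curry", hook=breakpoint):
    """Call hook() when target is seen; the default drops into the debugger."""
    for item in items:
        if item == target:
            hook()  # a real call, so there is an instruction to stop at
    return len(items)
```

Calling scan(menu) would pause at the first 'curry'; passing a different hook keeps the function usable in automated runs.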

Idiomatic Python: `in` keyword on literal

When using the in operator on a literal, is it most idiomatic for that literal to be a list, set, or tuple?
e.g.
for x in {'foo', 'bar', 'baz'}:
    doSomething(x)
...
if val in {1, 2, 3}:
    doSomethingElse(val)
I don't see any benefit to the list, but the tuple's immutability means it could be hoisted or reused by an efficient interpreter. And in the case of the if, if it's reused, there's an efficiency benefit.
Which is the most idiomatic, and which is most performant in CPython?
Python provides a disassembler, so you can often just check the bytecode:
In [4]: def checktup():
   ...:     for _ in range(10):
   ...:         if val in (1, 2, 3):
   ...:             print("foo")
   ...:
In [5]: def checkset():
   ...:     for _ in range(10):
   ...:         if val in {1, 2, 3}:
   ...:             print("foo")
   ...:
In [6]: import dis
For the tuple literal:
In [7]: dis.dis(checktup)
2 0 SETUP_LOOP 32 (to 34)
2 LOAD_GLOBAL 0 (range)
4 LOAD_CONST 1 (10)
6 CALL_FUNCTION 1
8 GET_ITER
>> 10 FOR_ITER 20 (to 32)
12 STORE_FAST 0 (_)
3 14 LOAD_GLOBAL 1 (val)
16 LOAD_CONST 6 ((1, 2, 3))
18 COMPARE_OP 6 (in)
20 POP_JUMP_IF_FALSE 10
4 22 LOAD_GLOBAL 2 (print)
24 LOAD_CONST 5 ('foo')
26 CALL_FUNCTION 1
28 POP_TOP
30 JUMP_ABSOLUTE 10
>> 32 POP_BLOCK
>> 34 LOAD_CONST 0 (None)
36 RETURN_VALUE
For the set-literal:
In [8]: dis.dis(checkset)
2 0 SETUP_LOOP 32 (to 34)
2 LOAD_GLOBAL 0 (range)
4 LOAD_CONST 1 (10)
6 CALL_FUNCTION 1
8 GET_ITER
>> 10 FOR_ITER 20 (to 32)
12 STORE_FAST 0 (_)
3 14 LOAD_GLOBAL 1 (val)
16 LOAD_CONST 6 (frozenset({1, 2, 3}))
18 COMPARE_OP 6 (in)
20 POP_JUMP_IF_FALSE 10
4 22 LOAD_GLOBAL 2 (print)
24 LOAD_CONST 5 ('foo')
26 CALL_FUNCTION 1
28 POP_TOP
30 JUMP_ABSOLUTE 10
>> 32 POP_BLOCK
>> 34 LOAD_CONST 0 (None)
36 RETURN_VALUE
You'll notice that in both cases the function uses LOAD_CONST, i.e., both times the literal has been optimized into a constant. Even better, in the case of the set literal, the compiler has saved a frozenset: while compiling the function, the peephole optimizer worked out that the set can be replaced by its immutable equivalent.
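The same thing can be checked without reading a full disassembly by looking at the function's constants table. This is a sketch for CPython 3 only; constant_types is a made-up helper, and other implementations or versions are free to compile these literals differently.

```python
def checktup(val):
    return val in (1, 2, 3)


def checkset(val):
    return val in {1, 2, 3}


def constant_types(func):
    """Names of the types stored in the function's constants table."""
    return {type(const).__name__ for const in func.__code__.co_consts}


print(constant_types(checktup))
print(constant_types(checkset))
```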
Note, on Python 2, the compiler builds a set every time!:
In [1]: import dis
In [2]: def checkset():
   ...:     for _ in range(10):
   ...:         if val in {1, 2, 3}:
   ...:             print("foo")
   ...:
...:
In [3]: dis.dis(checkset)
2 0 SETUP_LOOP 49 (to 52)
3 LOAD_GLOBAL 0 (range)
6 LOAD_CONST 1 (10)
9 CALL_FUNCTION 1
12 GET_ITER
>> 13 FOR_ITER 35 (to 51)
16 STORE_FAST 0 (_)
3 19 LOAD_GLOBAL 1 (val)
22 LOAD_CONST 2 (1)
25 LOAD_CONST 3 (2)
28 LOAD_CONST 4 (3)
31 BUILD_SET 3
34 COMPARE_OP 6 (in)
37 POP_JUMP_IF_FALSE 13
4 40 LOAD_CONST 5 ('foo')
43 PRINT_ITEM
44 PRINT_NEWLINE
45 JUMP_ABSOLUTE 13
48 JUMP_ABSOLUTE 13
>> 51 POP_BLOCK
>> 52 LOAD_CONST 0 (None)
55 RETURN_VALUE
IMO, there's essentially no such thing as "idiomatic" usage of literal values as shown in the question. Such values look like "magic numbers" to me. Using literals for "performance" is probably misguided because it sacrifices readability for marginal gains. In cases where performance really matters, using literals is unlikely to help much and there are better options regardless.
I think the idiomatic thing to do would be to store such values in a global or class variable, especially if you're using them in multiple places (but also even if you aren't). This provides some documentation as to what a value's purpose is and makes it easier to update. You can then bind these values as default arguments in function/method definitions ("memoizing" them) to improve performance if necessary.
As to what type of data structure is most appropriate, that would depend on what your program does and how it uses the data. For example, does ordering matter? With an if x in y, it won't, but maybe you're using the data in a for and an if. Without context, it's hard to say what the best choice would be.
Here's an example I think is readable, extensible, and also efficient. Memoizing the global ITEMS in the function definitions makes lookup fast because items is in the local namespace of the function. If you look at the disassembled code, you'll see that items is looked up via LOAD_FAST instead of LOAD_GLOBAL. This approach also avoids making multiple copies of the list of items, which might be relevant if it's big enough (although, if it was big enough, you probably wouldn't try to inline it anyway). Personally, I wouldn't bother with these kinds of optimizations most of the time, but they can be useful in some cases.
# In real code, this would have a domain-specific name instead of the
# generic `ITEMS`.
ITEMS = {'a', 'b', 'c'}

def filter_in_items(values, items=ITEMS):
    matching_items = []
    for value in values:
        if value in items:
            matching_items.append(value)
    return matching_items

def filter_not_in_items(values, items=ITEMS):
    non_matching_items = []
    for value in values:
        if value not in items:
            non_matching_items.append(value)
    return non_matching_items

print(filter_in_items(('a', 'x')))  # -> ['a']
print(filter_not_in_items(('a', 'x')))  # -> ['x']

import dis
dis.dis(filter_in_items)

Is it better to use nested loop for bigger repetitions or just put the entire range into one loop? Which is faster / less complex?

Which one is better?
for x in range(0,100):
    print("Lorem Ipsum")
for x in range(0,10):
    for y in range(0,10):
        print("Lorem Ipsum")
The second one is harder to read, and you construct an unnecessary range iterable (a list in Python 2; a range object, which uses less memory and is faster to create, in Python 3).
From the unnecessary iterable, the inner for loop constructs an unnecessary iterator (a list_iterator in Python 2, a range_iterator in Python 3).
The first one is more readable and easier to understand. Use that.
Regarding performance, I doubt it makes any difference, and if it does, the 0-100 loop is faster because it has less code (if the double loop is not optimized away) and thus a shorter code path.
When in doubt about such things, use the one that is easier to understand when you read the code. Premature optimization is a sin.
You can use dis from the dis module to disassemble the bytecode and analyse which of your loops is better (i.e., which needs less memory, fewer iterators, etc.).
Here is an example:
from dis import dis
def loop1():
    for x in range(100):
        pass
def loop2():
    for x in range(10):
        for j in range(10):
            pass
Now look under the hood of each loop:
dis(loop1)
2 0 SETUP_LOOP 20 (to 23)
3 LOAD_GLOBAL 0 (range)
6 LOAD_CONST 1 (100)
9 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
12 GET_ITER
>> 13 FOR_ITER 6 (to 22)
16 STORE_FAST 0 (x)
3 19 JUMP_ABSOLUTE 13
>> 22 POP_BLOCK
>> 23 LOAD_CONST 0 (None)
26 RETURN_VALUE
And look at the amount of data and operations needed in your second loop:
dis(loop2)
2 0 SETUP_LOOP 43 (to 46)
3 LOAD_GLOBAL 0 (range)
6 LOAD_CONST 1 (10)
9 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
12 GET_ITER
>> 13 FOR_ITER 29 (to 45)
16 STORE_FAST 0 (x)
3 19 SETUP_LOOP 20 (to 42)
22 LOAD_GLOBAL 0 (range)
25 LOAD_CONST 1 (10)
28 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
31 GET_ITER
>> 32 FOR_ITER 6 (to 41)
35 STORE_FAST 1 (j)
4 38 JUMP_ABSOLUTE 32
>> 41 POP_BLOCK
>> 42 JUMP_ABSOLUTE 13
>> 45 POP_BLOCK
>> 46 LOAD_CONST 0 (None)
49 RETURN_VALUE
Because both loops do the same thing, the first one is far better.
Just imagine how you would modify the nested loop to run 101 iterations instead of 100, and the disadvantage is clear.
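If the two indices are actually needed, a single flat loop can still provide them. The sketch below derives a (row, col) pair from one counter with divmod, so changing 100 to 101 is a one-number edit. The name flat_indices and the width of 10 are illustrative assumptions.

```python
def flat_indices(total, width):
    """Yield (row, col) pairs from one flat loop instead of two nested ones."""
    for n in range(total):
        yield divmod(n, width)


pairs = list(flat_indices(101, 10))
print(len(pairs))   # 101
print(pairs[-1])    # (10, 0)
```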

Is a constant list used in a loop constructed/deleted with each pass?

Will the following snippet create and destroy the list of constants on each loop, incurring whatever (albeit small) overhead this implies, or is the list created once?
for i in <some-type-of-iterable>:
    if i in [1,3,5,18,3457,40567]:
        print(i)
I am asking about both the Python "standard", such as one exists, and about the common CPython implementation.
I am aware that this example is contrived, as well as that trying to worry about performance using CPython is silly, but I am just curious.
This depends on the Python implementation and version, and on how the "constant lists" are used. On CPython 2.7.10 with your example, it looks like the answer is that the list in the condition of the if statement is only created once...
>>> def foo():
...     for i in iterable:
...         if i in [1, 3, 5]:
...             print(i)
...
...
>>> import dis
>>> dis.dis(foo)
2 0 SETUP_LOOP 34 (to 37)
3 LOAD_GLOBAL 0 (iterable)
6 GET_ITER
>> 7 FOR_ITER 26 (to 36)
10 STORE_FAST 0 (i)
3 13 LOAD_FAST 0 (i)
16 LOAD_CONST 4 ((1, 3, 5))
19 COMPARE_OP 6 (in)
22 POP_JUMP_IF_FALSE 7
4 25 LOAD_FAST 0 (i)
28 PRINT_ITEM
29 PRINT_NEWLINE
30 JUMP_ABSOLUTE 7
33 JUMP_ABSOLUTE 7
>> 36 POP_BLOCK
>> 37 LOAD_CONST 0 (None)
40 RETURN_VALUE
Notice: 16 LOAD_CONST 4 ((1, 3, 5))
Python's peephole optimizer has turned our list into a tuple (thanks Python!) and stored it as a constant. Note that the peephole optimizer can only do these transforms on objects if it knows that you as the programmer have absolutely no way of getting a reference to the list (otherwise, you could mutate the list and change the meaning of the code). As far as I'm aware, this optimization is only done for list and set literals that are composed entirely of constants and are the RHS of an in operator. There might be other cases that I'm not aware of (dis.dis is your friend for finding these optimizations).
I hinted at it above, but you can do the same thing with set literals in more recent versions of Python (in Python 3.2+, the set is converted to a constant frozenset). The benefit there is that set/frozenset have faster membership testing on average than list/tuple.
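One way to see the membership-testing difference without relying on timings is to count equality comparisons. In this sketch, Probe is a hypothetical wrapper that hashes like the int it holds and counts calls to __eq__; a tuple has to be scanned element by element, while a frozenset jumps to the matching hash bucket.

```python
class Probe:
    """Wraps an int and counts how often it is compared for equality."""

    def __init__(self, value):
        self.value = value
        self.eq_calls = 0

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        self.eq_calls += 1
        return self.value == other


def comparisons_needed(container, value):
    """Return (found, number of equality comparisons performed)."""
    probe = Probe(value)
    found = probe in container
    return found, probe.eq_calls


numbers = range(1000)
print(comparisons_needed(tuple(numbers), 999))      # linear scan
print(comparisons_needed(frozenset(numbers), 999))  # hash lookup
```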
Another example, with Python 3.5: here the list is created on each iteration of the outer loop.
>>> import dis
>>> def func():
...     for i in iterable:
...         for j in [1,2,3]:
...             print(i+j)
...
...
>>> dis.dis(func)
2 0 SETUP_LOOP 54 (to 57)
3 LOAD_GLOBAL 0 (iterable)
6 GET_ITER
>> 7 FOR_ITER 46 (to 56)
10 STORE_FAST 0 (i)
3 13 SETUP_LOOP 37 (to 53)
16 LOAD_CONST 1 (1) # building list
19 LOAD_CONST 2 (2)
22 LOAD_CONST 3 (3)
25 BUILD_LIST 3
28 GET_ITER
>> 29 FOR_ITER 20 (to 52) # inner loop body begin
32 STORE_FAST 1 (j)
4 35 LOAD_GLOBAL 1 (print)
38 LOAD_FAST 0 (i)
41 LOAD_FAST 1 (j)
44 BINARY_ADD
45 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
48 POP_TOP
49 JUMP_ABSOLUTE 29 # inner loop body end
>> 52 POP_BLOCK
>> 53 JUMP_ABSOLUTE 7 # outer loop end,
# jumping back before list creation
>> 56 POP_BLOCK
>> 57 LOAD_CONST 0 (None)
60 RETURN_VALUE
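If you would rather not depend on what a given interpreter version optimizes, the literal can be hoisted by hand. A minimal sketch, with made-up names (OFFSETS, sums_inline, sums_hoisted), that produces the same result either way:

```python
OFFSETS = (1, 2, 3)  # built once, at import time


def sums_inline(values):
    result = []
    for i in values:
        for j in [1, 2, 3]:  # literal appears inside the outer loop
            result.append(i + j)
    return result


def sums_hoisted(values, offsets=OFFSETS):
    result = []
    for i in values:
        for j in offsets:  # reuses the same tuple on every pass
            result.append(i + j)
    return result
```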

python - performance difference between the two implementations

Why do the following two implementations have different performance in Python?
from cStringIO import StringIO
from itertools import imap
from sys import stdin
input = imap(int, StringIO(stdin.read()))
print '\n'.join(imap(str, sorted(input)))
AND
import sys
l = []
for line in sys.stdin:
    l.append(int(line.strip('\n')))
l.sort()
for x in l:
    print x
The first implementation is faster than the second for inputs of the order of 10^6 lines. Why so?
>>> dis.dis(first)
2 0 LOAD_GLOBAL 0 (imap)
3 LOAD_GLOBAL 1 (int)
6 LOAD_GLOBAL 2 (StringIO)
9 LOAD_GLOBAL 3 (stdin)
12 LOAD_ATTR 4 (read)
15 CALL_FUNCTION 0
18 CALL_FUNCTION 1
21 CALL_FUNCTION 2
24 STORE_FAST 0 (input)
27 LOAD_CONST 0 (None)
30 RETURN_VALUE
>>> dis.dis(second)
2 0 SETUP_LOOP 48 (to 51)
3 LOAD_GLOBAL 0 (sys)
6 LOAD_ATTR 1 (stdin)
9 CALL_FUNCTION 0
12 GET_ITER
>> 13 FOR_ITER 34 (to 50)
16 STORE_FAST 0 (line)
3 19 LOAD_GLOBAL 2 (l)
22 LOAD_ATTR 3 (append)
25 LOAD_GLOBAL 4 (int)
28 LOAD_FAST 0 (line)
31 LOAD_ATTR 5 (strip)
34 LOAD_CONST 1 ('\n')
37 CALL_FUNCTION 1
40 CALL_FUNCTION 1
43 CALL_FUNCTION 1
46 POP_TOP
47 JUMP_ABSOLUTE 13
>> 50 POP_BLOCK
4 >> 51 LOAD_GLOBAL 2 (l)
54 LOAD_ATTR 6 (sort)
57 CALL_FUNCTION 0
60 POP_TOP
61 LOAD_CONST 0 (None)
64 RETURN_VALUE
first is your first function.
second is your second function.
dis tells one of the reasons why the first one is faster.
Two primary reasons:
The 2nd code explicitly constructs a list and sorts it afterwards, while the 1st version lets sorted create only an internal list while sorting at the same time.
The 2nd code explicitly loops over a list with for (on the Python VM), while the 1st version implicitly loops with imap (over the underlying structure in C).
Anyways, why is StringIO in there? The most straightforward and probably fastest way is:
from sys import stdin, stdout
stdout.writelines(sorted(stdin, key=int))
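For readers on Python 3, here is a sketch of the same idea in a form that can be tried without piping data in. io.StringIO stands in for stdin, and sort_numeric_lines is a name invented for this example. Because the lines themselves are returned (only the sort key is converted), text such as leading zeros is preserved.

```python
import io


def sort_numeric_lines(lines):
    """Sort text lines by integer value, returning the lines unchanged."""
    return sorted(lines, key=int)  # int() ignores the trailing newline


source = io.StringIO("10\n9\n007\n")
print(sort_numeric_lines(source))  # ['007\n', '9\n', '10\n']
```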
Do a step-by-step conversion from the second to the first one and see how the performance changes with each step.
1. Remove line.strip. This will cause some speed-up; whether it would be significant is another matter. The stripping is superfluous, as has been mentioned by you and THC4k.
2. Then replace the for loop using l.append with map(int, sys.stdin). My guess is that this would give a significant speed-up.
3. Replace map and l.sort with imap and sorted. My guess is that it won't affect the performance; there could be a slight slowdown, but it would be far from significant. Between the two, I'd usually go with the former, but with Python 3 on the horizon the latter is probably preferable.
4. Replace the for loop using print with print '\n'.join(...). My guess is that this would be another speed-up, but it would cost you some memory.
5. Add cStringIO (which is completely unnecessary, by the way) to see how it affects performance. My guess is that it would be slightly slower, but not enough to counter 4 and 2.
Then, if you try THC4k's answer, it would probably be faster than all of the above, while being simpler and easier to read, and using less memory than 4 and 5. It has slightly different behaviour (it doesn't strip leading zeros from the numbers).
Of course, try this yourself instead of trusting anyone's guesses. Also run cProfile on your code and see which parts take the most time.
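As a starting point for the profiling suggestion, here is a minimal Python 3 sketch that profiles one call and captures the report as text. parse_and_sort is a simplified stand-in for the code in the question, and profile_call is a hypothetical helper; only cProfile.Profile.runcall and pstats.Stats are standard library calls.

```python
import cProfile
import io
import pstats


def parse_and_sort(lines):
    """Simplified stand-in for the question's code: parse ints, then sort."""
    return sorted(int(line) for line in lines)


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)  # top 5 entries
    return result, buffer.getvalue()


result, report = profile_call(parse_and_sort, ["3\n", "1\n", "2\n"])
print(result)  # [1, 2, 3]
print(report)
```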
