Which of the following if statements is more Pythonic?
if not a and not b:
do_something
OR
if not ( a or b ):
do something
Its not predicate logic so I should use the Python key words because its more readable right?
In the later solution more optimal than the other? (I don't believe so.)
Is there any PEP-8 guides on this?
Byte code of the two approaches(if it matters):
In [43]: def func1():
if not a and not b:
return
....:
....:
In [46]: def func2():
if not(a or b):
return
....:
....:
In [49]: dis.dis(func1)
2 0 LOAD_GLOBAL 0 (a)
3 UNARY_NOT
4 JUMP_IF_FALSE 13 (to 20)
7 POP_TOP
8 LOAD_GLOBAL 1 (b)
11 UNARY_NOT
12 JUMP_IF_FALSE 5 (to 20)
15 POP_TOP
3 16 LOAD_CONST 0 (None)
19 RETURN_VALUE
>> 20 POP_TOP
21 LOAD_CONST 0 (None)
24 RETURN_VALUE
In [50]: dis.dis(func2)
2 0 LOAD_GLOBAL 0 (a)
3 JUMP_IF_TRUE 4 (to 10)
6 POP_TOP
7 LOAD_GLOBAL 1 (b)
>> 10 JUMP_IF_TRUE 5 (to 18)
13 POP_TOP
3 14 LOAD_CONST 0 (None)
17 RETURN_VALUE
>> 18 POP_TOP
19 LOAD_CONST 0 (None)
22 RETURN_VALUE
I'd say whichever is easier for you to read, depending on what a and b are.
I think both your examples are equally readable, however if I wanted to "push the boat out" on readability I would go with:
not any((a, b))
Since to me this reads much more like English, and hence is the most Pythonic.
Which to use? Whichever is more readable for what you're trying to do.
As to which is more efficient, the first one does do an extra not so it is technically less efficient, but not so you'd notice in a normal situation.
They are equivalent and whether one is faster than the other depends on circumstances (the values of a and b).
So just choose the version which you find most readable and/or understandable.
I personally like the Eiffel approach, put into pythonic form
if a and then b:
dosomething
if a and b:
dosomething
The first approach differs from the second if a is false. It doesn't evaluate b in the first case, in the second it does.
The or equivalent is "or else"
http://en.wikipedia.org/wiki/Short-circuit_evaluation
and/or are eager.
and then/or else short circuit the evaluation
The nice thing about the syntax is that it reads well, and it doesn't introduce new keywords.
For a piece of code to be Pythonic, it must be both pleasing to the reader in and of itself (readable) and in the context of its surroundings (consistent). Without having the context of this piece of code, a good opinion is hard to give.
But, on the other hand... If I were being Pythonic in my opinion giving I would need to operate consistently with my surroundings, which seem not to take context into consideration (e.g. the OP).
The top one.
Related
I want to ask that is using magic methods( like int.__add__()) is quicker than using operators (like +) ?
will it make a difference even by a bit?
thanks.
Here is the disassembled byte code for 3 different ways of adding.
import dis
def add1(a, b):
return a + b
dis.dis(add1)
2 0 LOAD_FAST 0 (a)
2 LOAD_FAST 1 (b)
4 BINARY_ADD
6 RETURN_VALUE
def add2(a, b):
return a.__add__(b)
dis.dis(add2)
2 0 LOAD_FAST 0 (a)
2 LOAD_ATTR 0 (__add__)
4 LOAD_FAST 1 (b)
6 CALL_FUNCTION 1
8 RETURN_VALUE
def add3(a, b):
return int.__add__(a, b)
dis.dis(add3)
2 0 LOAD_GLOBAL 0 (int)
2 LOAD_ATTR 1 (__add__)
4 LOAD_FAST 0 (a)
6 LOAD_FAST 1 (b)
8 CALL_FUNCTION 2
10 RETURN_VALUE
a+b generates the simplest byte code, but I expect that the interpreter's code for BINARY_ADD simply calls the first arguments's __add__() method, so it's effectively the same as a.__add__(b).
int.__add__(a, b) looks like it might be faster because it doesn't have to find the method for a specific object, but looking up the int.__add__ attribute may be just as expensive.
If you really want to find out which is best, I suggest you run benchmarks.
This question already has answers here:
"x not in y" or "not x in y"
(6 answers)
Closed 9 years ago.
I've noticed that both of these work the same:
if x not in list and if not x in list.
Is there some sort of difference between the two in certain cases? Is there a reason for having both, or is it just because it's more natural for some people to write one or the other?
Which one am I more likely to see in other people's code?
The two forms make identical bytecode, as you can clearly verify:
>>> import dis
>>> dis.dis(compile('if x not in d: pass', '', 'exec'))
1 0 LOAD_NAME 0 (x)
3 LOAD_NAME 1 (d)
6 COMPARE_OP 7 (not in)
9 JUMP_IF_FALSE 4 (to 16)
12 POP_TOP
13 JUMP_FORWARD 1 (to 17)
>> 16 POP_TOP
>> 17 LOAD_CONST 0 (None)
20 RETURN_VALUE
>>> dis.dis(compile('if not x in d: pass', '', 'exec'))
1 0 LOAD_NAME 0 (x)
3 LOAD_NAME 1 (d)
6 COMPARE_OP 7 (not in)
9 JUMP_IF_FALSE 4 (to 16)
12 POP_TOP
13 JUMP_FORWARD 1 (to 17)
>> 16 POP_TOP
>> 17 LOAD_CONST 0 (None)
20 RETURN_VALUE
so obviously they're semantically identical.
As a matter of style, PEP 8 does not mention the issue.
Personally, I strongly prefer the if x not in y form -- that makes it immediately clear that not in is a single operator, and "reads like English". if not x in y may mislead some readers into thinking it means if (not x) in y, reads a bit less like English, and has absolutely no compensating advantages.
>>> dis.dis(lambda: a not in b)
1 0 LOAD_GLOBAL 0 (a)
3 LOAD_GLOBAL 1 (b)
6 COMPARE_OP 7 (not in)
9 RETURN_VALUE
>>> dis.dis(lambda: not a in b)
1 0 LOAD_GLOBAL 0 (a)
3 LOAD_GLOBAL 1 (b)
6 COMPARE_OP 7 (not in)
9 RETURN_VALUE
when you do "not a in b" it will need be converted for (not in)
so, the right way is "a not in b".
not x in L isn't explicitly disallowed because that would be silly. x not in L is explicitly allowed (though it compiles to the same bytecode) because it's more readable.
x not in L is what everyone uses, though.
When you write a not in b it is using the not in operator, whereas not a in b uses the in operator and then negates the result. But the not in operator is defined to return the same as not a in b so they do exactly the same thing. From the documentation:
The operators in and not in test for collection membership. x in s evaluates to true if x is a member of the collection s, and false otherwise. x not in s returns the negation of x in s.
Similarly there is a is not b versus not a is b.
The extra syntax was added because it makes it easier for a human to read it naturally.
It just personal preference. You could also compare if x != 3 and if not x == 3. There's no difference between the two expressions you've shown.
How are the following two implementations have different performance in Python?
from cStringIO import StringIO
from itertools import imap
from sys import stdin
input = imap(int, StringIO(stdin.read()))
print '\n'.join(imap(str, sorted(input)))
AND
import sys
for line in sys.stdin:
l.append(int(line.strip('\n')))
l.sort()
for x in l:
print x
The first implementation is faster than the second for inputs of the order of 10^6 lines. Why so?
>>> dis.dis(first)
2 0 LOAD_GLOBAL 0 (imap)
3 LOAD_GLOBAL 1 (int)
6 LOAD_GLOBAL 2 (StringIO)
9 LOAD_GLOBAL 3 (stdin)
12 LOAD_ATTR 4 (read)
15 CALL_FUNCTION 0
18 CALL_FUNCTION 1
21 CALL_FUNCTION 2
24 STORE_FAST 0 (input)
27 LOAD_CONST 0 (None)
30 RETURN_VALUE
>>> dis.dis(second)
2 0 SETUP_LOOP 48 (to 51)
3 LOAD_GLOBAL 0 (sys)
6 LOAD_ATTR 1 (stdin)
9 CALL_FUNCTION 0
12 GET_ITER
>> 13 FOR_ITER 34 (to 50)
16 STORE_FAST 0 (line)
3 19 LOAD_GLOBAL 2 (l)
22 LOAD_ATTR 3 (append)
25 LOAD_GLOBAL 4 (int)
28 LOAD_FAST 0 (line)
31 LOAD_ATTR 5 (strip)
34 LOAD_CONST 1 ('\n')
37 CALL_FUNCTION 1
40 CALL_FUNCTION 1
43 CALL_FUNCTION 1
46 POP_TOP
47 JUMP_ABSOLUTE 13
>> 50 POP_BLOCK
4 >> 51 LOAD_GLOBAL 2 (l)
54 LOAD_ATTR 6 (sort)
57 CALL_FUNCTION 0
60 POP_TOP
61 LOAD_CONST 0 (None)
64 RETURN_VALUE
first is your first function.
second is your second function.
dis tells one of the reasons why the first one is faster.
Two primary reasons:
The 2nd code explicitly constructs a list and sorts it afterwards, while the 1st version lets sorted create only a internal list while sorting at the same time.
The 2nd code explicitly loops over a list with for (on the Python VM), while the 1st version implicitly loops with imap (over the underlaying structure in C).
Anyways, why is StringIO in there? The most straightforward and probably fastest way is:
from sys import stdin, stdout
stdout.writelines(sorted(stdin, key=int))
Do a step-by-step conversion from the second to the first one and see how the performance changes with each step.
Remove line.strip. This will cause some speed up, whether it would be significant is another matter. The stripping is superfluous as has been mentioned by you and THC4k.
Then replace the for loop using l.append with map(int, sys.stdin). My guess is that this would give a significant speed-up.
Replace map and l.sort with imap and sorted. My guess is that it won't affect the performance, there could be a slight slowdown, but it would be far from significant. Between the two, I'd usually go with the former, but with Python 3 on the horizon the latter is probably preferable.
Replace the for loop using print with print '\n'.join(...). My guess is that this would be another speed-up, but it would cost you some memory.
Add cStringIO (which is completely unnecessary by the way) to see how it affects performance. My guess is that it would be slightly slower, but not enough to counter 4 and 2.
Then, if you try THC4k's answer, it would probably be faster than all of the above, while being simpler and easier to read, and using less memory than 4 and 5. It has slightly different behaviour (it doesn't strip leading zeros from the numbers).
Of course, try this yourself instead of trusting anyone guesses. Also run cProfile on your code and see which parts are losing most time.
This question already has answers here:
"x not in y" or "not x in y"
(6 answers)
Closed 9 years ago.
I've noticed that both of these work the same:
if x not in list and if not x in list.
Is there some sort of difference between the two in certain cases? Is there a reason for having both, or is it just because it's more natural for some people to write one or the other?
Which one am I more likely to see in other people's code?
The two forms make identical bytecode, as you can clearly verify:
>>> import dis
>>> dis.dis(compile('if x not in d: pass', '', 'exec'))
1 0 LOAD_NAME 0 (x)
3 LOAD_NAME 1 (d)
6 COMPARE_OP 7 (not in)
9 JUMP_IF_FALSE 4 (to 16)
12 POP_TOP
13 JUMP_FORWARD 1 (to 17)
>> 16 POP_TOP
>> 17 LOAD_CONST 0 (None)
20 RETURN_VALUE
>>> dis.dis(compile('if not x in d: pass', '', 'exec'))
1 0 LOAD_NAME 0 (x)
3 LOAD_NAME 1 (d)
6 COMPARE_OP 7 (not in)
9 JUMP_IF_FALSE 4 (to 16)
12 POP_TOP
13 JUMP_FORWARD 1 (to 17)
>> 16 POP_TOP
>> 17 LOAD_CONST 0 (None)
20 RETURN_VALUE
so obviously they're semantically identical.
As a matter of style, PEP 8 does not mention the issue.
Personally, I strongly prefer the if x not in y form -- that makes it immediately clear that not in is a single operator, and "reads like English". if not x in y may mislead some readers into thinking it means if (not x) in y, reads a bit less like English, and has absolutely no compensating advantages.
>>> dis.dis(lambda: a not in b)
1 0 LOAD_GLOBAL 0 (a)
3 LOAD_GLOBAL 1 (b)
6 COMPARE_OP 7 (not in)
9 RETURN_VALUE
>>> dis.dis(lambda: not a in b)
1 0 LOAD_GLOBAL 0 (a)
3 LOAD_GLOBAL 1 (b)
6 COMPARE_OP 7 (not in)
9 RETURN_VALUE
when you do "not a in b" it will need be converted for (not in)
so, the right way is "a not in b".
not x in L isn't explicitly disallowed because that would be silly. x not in L is explicitly allowed (though it compiles to the same bytecode) because it's more readable.
x not in L is what everyone uses, though.
When you write a not in b it is using the not in operator, whereas not a in b uses the in operator and then negates the result. But the not in operator is defined to return the same as not a in b so they do exactly the same thing. From the documentation:
The operators in and not in test for collection membership. x in s evaluates to true if x is a member of the collection s, and false otherwise. x not in s returns the negation of x in s.
Similarly there is a is not b versus not a is b.
The extra syntax was added because it makes it easier for a human to read it naturally.
It just personal preference. You could also compare if x != 3 and if not x == 3. There's no difference between the two expressions you've shown.
I was trying to figure out which integers python only instantiates once (-6 to 256 it seems), and in the process stumbled on some string behaviour I can't see the pattern in. Sometimes, equal strings created in different ways share the same id, sometimes not. This code:
A = "10000"
B = "10000"
C = "100" + "00"
D = "%i"%10000
E = str(10000)
F = str(10000)
G = str(100) + "00"
H = "0".join(("10","00"))
for obj in (A,B,C,D,E,F,G,H):
print obj, id(obj), obj is A
prints:
10000 4959776 True
10000 4959776 True
10000 4959776 True
10000 4959776 True
10000 4959456 False
10000 4959488 False
10000 4959520 False
10000 4959680 False
I don't even see the pattern - save for the fact that the first four don't have an explicit function call - but surely that can't be it, since the "+" in C for example implies a function call to add. I especially don't understand why C and G are different, seeing as that implies that the ids of the components of the addition are more important than the outcome.
So, what is the special treatment that A-D undergo, making them come out as the same instance?
In terms of language specification, any compliant Python compiler and runtime is fully allowed, for any instance of an immutable type, to make a new instance OR find an existing instance of the same type that's equal to the required value and use a new reference to that same instance. This means it's always incorrect to use is or by-id comparison among immutables, and any minor release may tweak or change strategy in this matter to enhance optimization.
In terms of implementations, the tradeoff are pretty clear: trying to reuse an existing instance may mean time spent (perhaps wasted) trying to find such an instance, but if the attempt succeeds then some memory is saved (as well as the time to allocate and later free the memory bits needed to hold a new instance).
How to solve those implementation tradeoffs is not entirely obvious -- if you can identify heuristics that indicate that finding a suitable existing instance is likely and the search (even if it fails) will be fast, then you may want to attempt the search-and-reuse when the heuristics suggest it, but skip it otherwise.
In your observations you seem to have found a particular dot-release implementation that performs a modicum of peephole optimization when that's entirely safe, fast, and simple, so the assignments A to D all boil down to exactly the same as A (but E to F don't, as they involve named functions or methods that the optimizer's authors may reasonably have considered not 100% safe to assume semantics for -- and low-ROI if that was done -- so they're not peephole-optimized).
Thus, A to D reusing the same instance boils down to A and B doing so (as C and D get peephole-optimized to exactly the same construct).
That reuse, in turn, clearly suggests compiler tactics/optimizer heuristics whereby identical literal constants of an immutable type in the same function's local namespace are collapsed to references to just one instance in the function's .func_code.co_consts (to use current CPython's terminology for attributes of functions and code objects) -- reasonable tactics and heuristics, as reuse of the same immutable constant literal within one function are somewhat frequent, AND the price is only paid once (at compile time) while the advantage is accrued many times (every time the function runs, maybe within loops etc etc).
(It so happens that these specific tactics and heuristics, given their clearly-positive tradeoffs, have been pervasive in all recent versions of CPython, and, I believe, IronPython, Jython, and PyPy as well;-).
This is a somewhat worthy and interesting are of study if you're planning to write compilers, runtime environments, peephole optimizers, etc etc, for Python itself or similar languages. I guess that deep study of the internals (ideally of many different correct implementations, of course, so as not to fixate on the quirks of a specific one -- good thing Python currently enjoys at least 4 separate production-worthy implementations, not to mention several versions of each!) can also help, indirectly, make one a better Python programmer -- but it's particularly important to focus on what's guaranteed by the language itself, which is somewhat less than what you'll find in common among separate implementations, because the parts that "just happen" to be in common right now (without being required to be so by the language specs) may perfectly well change under you at the next point release of one or another implementation and, if your production code was mistakenly relying on such details, that might cause nasty surprises;-). Plus -- it's hardly ever necessary, or even particularly helpful, to rely on such variable implementation details rather than on language-mandated behavior (unless you're coding something like an optimizer, debugger, profiler, or the like, of course;-).
Python is allowed to inline string constants; A,B,C,D are actually the same literals (if Python sees a constant expression, it treats it as a constant).
str is actually a class, so str(whatever) is calling this class' constructor, which should yield a fresh object. This explains E,F,G (note that each of these has separate identity).
As for H, I am not sure, but I'd go for explanation that this expression is too complicated for Python to figure out it's actually a constant, so it computes a new string.
I believe short strings that can be evaluated at compile time, will be interned automatically. In the last examples, the result can't be evaluated at compile time because str or join might be redefined.
in answer to S.Lott's suggestion of examining the byte code:
import dis
def moo():
A = "10000"
B = "10000"
C = "100" + "00"
D = "%i"%10000
E = str(10000)
F = str(10000)
G = "1000"+str(0)
H = "0".join(("10","00"))
I = str("10000")
for obj in (A,B,C,D,E,F,G,H, I):
print obj, id(obj), obj is A
moo()
print dis.dis(moo)
yields:
10000 4968128 True
10000 4968128 True
10000 4968128 True
10000 4968128 True
10000 2840928 False
10000 2840896 False
10000 2840864 False
10000 2840832 False
10000 4968128 True
4 0 LOAD_CONST 1 ('10000')
3 STORE_FAST 0 (A)
5 6 LOAD_CONST 1 ('10000')
9 STORE_FAST 1 (B)
6 12 LOAD_CONST 10 ('10000')
15 STORE_FAST 2 (C)
7 18 LOAD_CONST 11 ('10000')
21 STORE_FAST 3 (D)
8 24 LOAD_GLOBAL 0 (str)
27 LOAD_CONST 5 (10000)
30 CALL_FUNCTION 1
33 STORE_FAST 4 (E)
9 36 LOAD_GLOBAL 0 (str)
39 LOAD_CONST 5 (10000)
42 CALL_FUNCTION 1
45 STORE_FAST 5 (F)
10 48 LOAD_CONST 6 ('1000')
51 LOAD_GLOBAL 0 (str)
54 LOAD_CONST 7 (0)
57 CALL_FUNCTION 1
60 BINARY_ADD
61 STORE_FAST 6 (G)
11 64 LOAD_CONST 8 ('0')
67 LOAD_ATTR 1 (join)
70 LOAD_CONST 12 (('10', '00'))
73 CALL_FUNCTION 1
76 STORE_FAST 7 (H)
12 79 LOAD_GLOBAL 0 (str)
82 LOAD_CONST 1 ('10000')
85 CALL_FUNCTION 1
88 STORE_FAST 8 (I)
14 91 SETUP_LOOP 66 (to 160)
94 LOAD_FAST 0 (A)
97 LOAD_FAST 1 (B)
100 LOAD_FAST 2 (C)
103 LOAD_FAST 3 (D)
106 LOAD_FAST 4 (E)
109 LOAD_FAST 5 (F)
112 LOAD_FAST 6 (G)
115 LOAD_FAST 7 (H)
118 LOAD_FAST 8 (I)
121 BUILD_TUPLE 9
124 GET_ITER
>> 125 FOR_ITER 31 (to 159)
128 STORE_FAST 9 (obj)
15 131 LOAD_FAST 9 (obj)
134 PRINT_ITEM
135 LOAD_GLOBAL 2 (id)
138 LOAD_FAST 9 (obj)
141 CALL_FUNCTION 1
144 PRINT_ITEM
145 LOAD_FAST 9 (obj)
148 LOAD_FAST 0 (A)
151 COMPARE_OP 8 (is)
154 PRINT_ITEM
155 PRINT_NEWLINE
156 JUMP_ABSOLUTE 125
>> 159 POP_BLOCK
>> 160 LOAD_CONST 0 (None)
163 RETURN_VALUE
so it would seem that indeed the compiler understands A-D to mean the same thing, and so it saves memory by only generating it once (as suggested by Alex,Maciej and Greg). (added case I seems to just be str() realising it's trying to make a string from a string, and just passing it through.)
Thanks everyone, that's a lot clearer now.