Why is it that according to the timeit.timeit function the code boolean = True if foo else False runs faster than the code boolean = bool(foo)?
How is it that the if statement is able to determine the truthiness of foo faster than the bool function itself?
Why doesn't the bool function simply use the same mechanic?
And what is the purpose of the bool function when it can be outperformed by a factor of four by a different technique?
Or, is it so that I am misusing the timeit function and that bool(foo) is, in fact, faster?
>>> timeit.timeit("boolean = True if foo else False", setup="foo='zon-zero'")
0.021019499999965774
>>> timeit.timeit("boolean = bool(foo)", setup="foo='zon-zero'")
0.0684856000000309
>>> timeit.timeit("boolean = True if foo else False", setup="foo=''")
0.019911300000103438
>>> timeit.timeit("boolean = bool(foo)", setup="foo=''")
0.09232059999999365
Looking at these results, True if foo else False seems to be four to five times faster than bool(foo).
I suspect that the difference in speed is caused by the overhead of calling a function and that does indeed seem to be the case when I use the dis module.
>>> dis.dis("boolean = True if foo else False")
1 0 LOAD_NAME 0 (foo)
2 POP_JUMP_IF_FALSE 8
4 LOAD_CONST 0 (True)
6 JUMP_FORWARD 2 (to 10)
>> 8 LOAD_CONST 1 (False)
>> 10 STORE_NAME 1 (boolean)
12 LOAD_CONST 2 (None)
14 RETURN_VALUE
>>> dis.dis("boolean = bool(foo)")
1 0 LOAD_NAME 0 (bool)
2 LOAD_NAME 1 (foo)
4 CALL_FUNCTION 1
6 STORE_NAME 2 (boolean)
8 LOAD_CONST 0 (None)
10 RETURN_VALUE
According to the dis module, the difference between the two techniques is:
2 POP_JUMP_IF_FALSE 8
4 LOAD_CONST 0 (True)
6 JUMP_FORWARD 2 (to 10)
>> 8 LOAD_CONST 1 (False)
versus
0 LOAD_NAME 1 (bool)
4 CALL_FUNCTION 1
which makes it look like either a function call is far too expensive for something as simple as determining a boolean value, or the bool function has been written very inefficiently.
But that actually makes me wonder why anyone would use the bool function when it is this much slower, and why the bool function even exists when Python does not even seem to use it internally.
So, is the bool function slower because it has been written inefficiently, because of the function overhead, or because of a different reason?
And why would anyone use the bool function when a much faster and equally clear alternative is available?
As per the Python documentation:

class bool([x])

Return a Boolean value, i.e. one of True or False. x is converted using the standard truth testing procedure. If x is false or omitted, this returns False; otherwise it returns True. The bool class is a subclass of int (see Numeric Types — int, float, complex). It cannot be subclassed further. Its only instances are False and True.
So when the interpreter evaluates foo directly in a truth-testing context (as the conditional expression does), it calls foo.__bool__() and branches on the result. The bool function performs the same truth test, but only after Python has looked up the name bool and made a full function call.
As you said, it is that function call that makes it expensive.
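Here is a minimal sketch (the class name Noisy is made up for the example) showing that both forms end up invoking the object's __bool__ method; the extra cost of bool(foo) is the name lookup and the function call, not a different truth test:

class Noisy:
    def __bool__(self):
        print("__bool__ called")
        return True

foo = Noisy()
b1 = True if foo else False   # prints "__bool__ called"
b2 = bool(foo)                # prints "__bool__ called" too, plus a name lookup and a call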
And bool does have its uses: there are situations where you need the boolean value of an object as an actual value that you can store in a variable or pass around.
x = bool(my_object)
Writing x = my_object doesn't do that; x would just be another reference to my_object.
That's where it's useful.
Sometimes bool(foo) is also simply more readable, in places where a tiny time difference doesn't matter.
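One concrete (made-up) illustration: because bool is an ordinary callable object, you can pass it around in ways a conditional expression can't be, for example to map() or as a sort key:

values = ["zon-zero", "", [1, 2], 0, None]

flags = list(map(bool, values))                          # [True, False, True, False, False]
truthy_first = sorted(values, key=bool, reverse=True)    # truthy items before falsy ones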
You might also be interested in knowing that
x = {}
is faster than
x = dict()
Find out why... :)
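If you want a head start, the same dis trick from above works here too; this is only a sketch, and the exact opcode names vary between Python versions:

import dis

dis.dis("x = {}")      # builds the dict directly (a single BUILD_MAP-style opcode)
dis.dis("x = dict()")  # has to look up the name dict and then call it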
Is there any practical difference between list(iterable) and [*iterable] in versions of Python that support the latter?
list(x) is a function call, [*x] is an expression; list is an ordinary name, so you can reassign it and make it do something else (but you shouldn't).
Talking about CPython, b = list(a) translates to this sequence of bytecodes:
LOAD_NAME 1 (list)
LOAD_NAME 0 (a)
CALL_FUNCTION 1
STORE_NAME 2 (b)
Instead, c = [*a] becomes:
LOAD_NAME 0 (a)
BUILD_LIST_UNPACK 1
STORE_NAME 3 (c)
so you can argue that [*a] might be slightly more efficient, but marginally so.
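A rough timing sketch to back up the "marginally so" claim; absolute numbers depend on your machine and Python version:

import timeit

print(timeit.timeit("b = list(a)", setup="a = (1, 2, 3)"))
print(timeit.timeit("c = [*a]", setup="a = (1, 2, 3)"))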
You can use the standard library module dis to investigate the byte code generated by a function. In this case:
import dis
def call_list(x):
    return list(x)

def unpacking(x):
    return [*x]
dis.dis(call_list)
# 2 0 LOAD_GLOBAL 0 (list)
# 2 LOAD_FAST 0 (x)
# 4 CALL_FUNCTION 1
# 6 RETURN_VALUE
dis.dis(unpacking)
# 2 0 LOAD_FAST 0 (x)
# 2 BUILD_LIST_UNPACK 1
# 4 RETURN_VALUE
So there is a difference, and it is not only the loading of the globally defined name list (which the unpacking does not need); it also boils down to how the built-in list function is implemented versus what exactly BUILD_LIST_UNPACK does.
Note that both are actually a lot less code than writing a standard list comprehension for this:
def list_comp(x):
    return [a for a in x]
dis.dis(list_comp)
# 2 0 LOAD_CONST 1 (<code object <listcomp> at 0x7f65356198a0, file "<ipython-input-46-dd71fb182ec7>", line 2>)
# 2 LOAD_CONST 2 ('list_comp.<locals>.<listcomp>')
# 4 MAKE_FUNCTION 0
# 6 LOAD_FAST 0 (x)
# 8 GET_ITER
# 10 CALL_FUNCTION 1
# 12 RETURN_VALUE
Since [*iterable] is unpacking, it accepts assignment-like syntax, unlike list(iterable):
>>> [*[]] = []
>>> list([]) = []
File "<stdin>", line 1
SyntaxError: can't assign to function call
You can read more about this here (not useful though).
You can also use list(sequence=iterable), i.e. with a keyword argument (older CPython versions accepted this; more recent ones reject keyword arguments to list()):
>>> list(sequence=[])
[]
Again not useful.
There are always going to be some differences between two constructs that do the same thing. Thing is, I wouldn't say the differences in this case are actually practical. Both are expressions that take the iterable, iterate through it and then create a list out of it.
The contract is the same: the input is an iterable, the output is a list populated with the iterable's elements.
Yes, list can be rebound to a different name; list(it) is a function call while [*it] is a list display; [*it] is faster with smaller iterables but generally performs the same with larger ones. Heck, one could even throw in the fact that [*it] is three fewer keystrokes.
Are these practical though? Would I think of them when trying to get a list out of an iterable? Well, maybe the keystrokes, in order to stay under 79 characters and get the linter to shut up.
Apparently there’s a performance difference in CPython, where [*a] overallocates and list() doesn’t: What causes [*a] to overallocate?
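A quick, hedged way to see that difference yourself is sys.getsizeof; whether the sizes differ at all depends on your CPython version:

import sys

a = [1, 2, 3]
print(sys.getsizeof(list(a)))   # sized exactly in versions where list() doesn't overallocate
print(sys.getsizeof([*a]))      # may be larger where the unpacking path overallocates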
From some of the answers on Stack Overflow, I came to know that integers from -5 to 256 reference the same memory location (they are cached), so we get True for:
>>> a = 256
>>> a is 256
True
Now comes the twist (see this line before marking duplicate):
>>> a = 257
>>> a is 257
False
This is completely understood, but now if I do:
>>> a = 257; a is 257
True
>>> a = 12345; a is 12345
True
Why?
What you're seeing is an optimization in the compiler in CPython (which compiles your source code into the bytecode that the interpreter runs). Whenever the same immutable constant value is used in several different places within a chunk of code that is being compiled in one step, the compiler will try to use a reference to the same object for each place.
So if you do multiple assignments on the same line in an interactive session, you'll get two references to the same object, but you won't if you use two separate lines:
>>> x = 257; y = 257 # multiple statements on the same line are compiled in one step
>>> print(x is y) # prints True
>>> x = 257
>>> y = 257
>>> print(x is y) # prints False this time, since the assignments were compiled separately
Another place this optimization comes up is in the body of a function. The whole function body will be compiled together, so any constants used anywhere in the function can be combined, even if they're on separate lines:
def foo():
    x = 257
    y = 257
    return x is y  # this will always return True
While it's interesting to investigate optimizations like this one, you should never rely upon this behavior in your normal code. Different Python interpreters, and even different versions of CPython may do these optimizations differently or not at all. If your code depends on a specific optimization, it may be completely broken for somebody else who tries to run it on their own system.
As an example, the two assignments on the same line that I show in my first code block above don't result in two references to the same object when I do it in the interactive shell inside Spyder (my preferred IDE). I have no idea why that specific situation doesn't work the same way it does in a conventional interactive shell, but the different behavior is my fault, since my code relies upon implementation-specific behavior.
Generally speaking, numbers outside the range -5 to 256 are not guaranteed to get the caching that numbers within that range do. However, Python is free to apply other optimizations as appropriate. In your case, you're seeing that the same literal value used multiple times on one line is stored in a single memory location, no matter how many times it's used on that line. Here are some other examples of this behavior:
>>> s = 'a'; s is 'a'
True
>>> s = 'asdfghjklzxcvbnmsdhasjkdhskdja'; s is 'asdfghjklzxcvbnmsdhasjkdhskdja'
True
>>> x = 3.14159; x is 3.14159
True
>>> t = 'a' + 'b'; t is 'a' + 'b'
True
From the Python 2 docs:
The operators is and is not test for object identity: x is y is true
if and only if x and y are the same object. x is not y yields the
inverse truth value. [6]
From the Python 3 docs:
The operators is and is not test for object identity: x is y is true
if and only if x and y are the same object. Object identity is
determined using the id() function. x is not y yields the inverse
truth value. [4]
So basically the key to understanding those tests you've run on the REPL console is the id() function; here's an example that will show you what's going on behind the curtains:
>>> a=256
>>> id(a);id(256);a is 256
2012996640
2012996640
True
>>> a=257
>>> id(a);id(257);a is 257
36163472
36162032
False
>>> a=257;id(a);id(257);a is 257
36162496
36162496
True
>>> a=12345;id(a);id(12345);a is 12345
36162240
36162240
True
That said, usually a good way to understand what's going on behind the curtains with these types of snippets is to use either dis.dis or dis.disco. Let's take a look, for instance, at what this snippet compiles to:
import dis
import textwrap
dis.disco(compile(textwrap.dedent("""\
a=256
a is 256
a=257
a is 257
a=257;a is 257
a=12345;a is 12345\
"""), '', 'exec'))
the output would be:
1 0 LOAD_CONST 0 (256)
2 STORE_NAME 0 (a)
2 4 LOAD_NAME 0 (a)
6 LOAD_CONST 0 (256)
8 COMPARE_OP 8 (is)
10 POP_TOP
3 12 LOAD_CONST 1 (257)
14 STORE_NAME 0 (a)
4 16 LOAD_NAME 0 (a)
18 LOAD_CONST 1 (257)
20 COMPARE_OP 8 (is)
22 POP_TOP
5 24 LOAD_CONST 1 (257)
26 STORE_NAME 0 (a)
28 LOAD_NAME 0 (a)
30 LOAD_CONST 1 (257)
32 COMPARE_OP 8 (is)
34 POP_TOP
6 36 LOAD_CONST 2 (12345)
38 STORE_NAME 0 (a)
40 LOAD_NAME 0 (a)
42 LOAD_CONST 2 (12345)
44 COMPARE_OP 8 (is)
46 POP_TOP
48 LOAD_CONST 3 (None)
50 RETURN_VALUE
As we can see, in this case the bytecode output doesn't tell us very much; lines 3-4 are basically the "same" instructions as line 5. So my recommendation would be, once again, to use id() wisely so you'll know exactly what is will compare. If you want to know exactly which optimizations CPython is doing, I'm afraid you'd need to dig into its source code.
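Short of reading the CPython source, a middle ground is to inspect the constants table of a compiled block; this is just a sketch, and the exact contents of co_consts vary between versions:

code = compile("a = 257\nb = a is 257", "<demo>", "exec")
print(code.co_consts)   # 257 normally appears only once, which is why a is 257 is True within this block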
After discussion and testing in various versions, the final conclusions can be drawn.
Python interprets and compiles instructions in blocks. Depending on the syntax used, the Python version, the operating system and the distribution, different results may be achieved, because all of these can affect what instructions Python takes in as one block.
The general rules are:
(from the official documentation)
The current implementation keeps an array of integer objects for all
integers between -5 and 256
Therefore:
a = 256
id(a)
Out[2]: 1997190544
id(256)
Out[3]: 1997190544 # int actually stored once within Python
a = 257
id(a)
Out[5]: 2365489141456
id(257)
Out[6]: 2365489140880 # literal, temporary; as you see, the ids differ
id(257)
Out[7]: 2365489142192 # literal, temporary; it gets a new id every time
                      # since it is not pre-stored
The part below returns False in Python 3.6.3 |Anaconda custom (64-bit)| (default, Oct 17 2017, 23:26:12) [MSC v.1900 64 bit (AMD64)]
a = 257; a is 257
Out[8]: False
But
a=257; print(a is 257) ; a=258; print(a is 257)
>>>True
>>>False
As is evident, what Python takes in as "one block" is not deterministic from the user's perspective and can be swayed by how the code is written (single line or not), as well as by the version, operating system and distribution used.
From this link I learnt that
The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range you actually just get back a reference to the existing object
But when I tried out some examples in my session, I found that it behaves differently with tuple unpacking than with separate assignments.
Here is the snippet:
>>> a,b = 300,300
>>> a is b
True
>>> c = 300
>>> d = 300
>>> c is d
False
Because int is immutable, Python may or may not reuse an existing object. If you save the following code into a script file and run it, it will output True twice.
a, b = 300, 300
print a is b
c = 300
d = 300
print c is d
When Python compiles the code, it may reuse constants. Because you typed your code into a Python session, it is compiled line by line, so Python can't reuse the constants across lines as one object.
The documentation only says that there will be a single instance for the integers from -5 to 256; it doesn't define the behavior for other values. For immutable types, is and is not are not that important, because you can't modify the objects anyway.
import dis
def testMethod1():
    a, b = 300, 300

print dis.dis(testMethod1)
Prints:
4 0 LOAD_CONST 2 ((300, 300))
3 UNPACK_SEQUENCE 2
6 STORE_FAST 0 (a)
9 STORE_FAST 1 (b)
12 LOAD_CONST 0 (None)
15 RETURN_VALUE None
def testMethod2():
    a = 300
    b = 300

print dis.dis(testMethod2)
Prints:
7 0 LOAD_CONST 1 (300)
3 STORE_FAST 0 (a)
8 6 LOAD_CONST 1 (300)
9 STORE_FAST 1 (b)
12 LOAD_CONST 0 (None)
15 RETURN_VALUE None
So it looks essentially the same, but with the LOAD_CONST of both values happening in one step (as a pre-built tuple) in the first method and in two separate steps in the second method...
EDIT
After some testing, I discovered that both methods return False eventually; however, on a single run, i.e. without putting the methods in a loop, they seem to always return True.
The documentation only states that -5 to 256 will return the same reference; hence, you simply just shouldn't be using is for comparison (in this case) as the number's current id has no guarantee on it.
NB: You never want to use is for comparison of values, as that's not what it's for; it's for comparing identities. My point was that is's return value is not always going to be True when you're outside of the defined range.
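If you prefer not to read disassembly at all, the function's constants table tells the same story; this is a sketch in Python 3 syntax, and the exact tuple contents vary between versions:

def test_method():
    a = 300
    b = 300
    return a is b

print(test_method.__code__.co_consts)   # 300 usually appears only once in here
print(test_method())                    # typically True within a single compiled body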
When testing for membership, we can use:
x not in y
Or alternatively:
not x in y
There can be many possible contexts for this expression depending on x and y. It could be for a substring check, list membership, dict key existence, for example.
Are the two forms always equivalent?
Is there a preferred syntax?
They always give the same result.
In fact, not 'ham' in 'spam and eggs' appears to be special cased to perform a single "not in" operation, rather than an "in" operation and then negating the result:
>>> import dis
>>> def notin():
    'ham' not in 'spam and eggs'
>>> dis.dis(notin)
2 0 LOAD_CONST 1 ('ham')
3 LOAD_CONST 2 ('spam and eggs')
6 COMPARE_OP 7 (not in)
9 POP_TOP
10 LOAD_CONST 0 (None)
13 RETURN_VALUE
>>> def not_in():
    not 'ham' in 'spam and eggs'
>>> dis.dis(not_in)
2 0 LOAD_CONST 1 ('ham')
3 LOAD_CONST 2 ('spam and eggs')
6 COMPARE_OP 7 (not in)
9 POP_TOP
10 LOAD_CONST 0 (None)
13 RETURN_VALUE
>>> def not__in():
    not ('ham' in 'spam and eggs')
>>> dis.dis(not__in)
2 0 LOAD_CONST 1 ('ham')
3 LOAD_CONST 2 ('spam and eggs')
6 COMPARE_OP 7 (not in)
9 POP_TOP
10 LOAD_CONST 0 (None)
13 RETURN_VALUE
>>> def noteq():
    not 'ham' == 'spam and eggs'
>>> dis.dis(noteq)
2 0 LOAD_CONST 1 ('ham')
3 LOAD_CONST 2 ('spam and eggs')
6 COMPARE_OP 2 (==)
9 UNARY_NOT
10 POP_TOP
11 LOAD_CONST 0 (None)
14 RETURN_VALUE
I had thought at first that they always gave the same result, but that not on its own was simply a low precedence logical negation operator, which could be applied to a in b just as easily as any other boolean expression, whereas not in was a separate operator for convenience and clarity.
The disassembly above was revealing! It seems that while not is, obviously, a logical negation operator, the form not a in b is special-cased so that it doesn't actually use the general operator. This makes not a in b literally the same expression as a not in b, rather than merely an expression that results in the same value.
No, there is no difference.
The operator not in is defined to have the inverse true value of in.
—Python documentation
I would assume not in is preferred because it is more obvious and they added a special case for it.
They are identical in meaning, but the pycodestyle Python style guide checker (formerly called pep8) prefers the not in operator in rule E713:
E713: test for membership should be not in
See also "Python if x is not None or if not x is None?" for a very similar choice of style.
Others have already made it very clear that the two statements are, down to a quite low level, equivalent.
However, I don't think that anyone yet has stressed enough that since this leaves the choice up to you, you should
choose the form that makes your code as readable as possible.
And not necessarily as readable as possible to anyone, even if that's of course a nice thing to aim for. No, make sure the code is as readable as possible to you, since you are the one who is the most likely to come back to this code later and try to read it.
In Python, there is no difference. And there is no preference.
Syntactically they amount to the same thing. I would be quick to state that 'ham' not in 'spam and eggs' conveys clearer intent, but I've seen code and scenarios in which not 'ham' in 'spam and eggs' conveys a clearer meaning than the other.
Out of these not None tests:
if val != None:
if not (val is None):
if val is not None:
Which one is preferable, and why?
if val is not None:
# ...
is the Pythonic idiom for testing that a variable is not set to None. This idiom has particular uses in the case of declaring keyword functions with default parameters. is tests identity in Python. Because there is one and only one instance of None present in a running Python script/program, is is the optimal test for this. As Johnsyweb points out, this is discussed in PEP 8 under "Programming Recommendations".
As for why this is preferred to
if not (val is None):
# ...
this is simply part of the Zen of Python: "Readability counts." Good Python is often close to good pseudocode.
From Programming Recommendations, PEP 8:
Comparisons to singletons like None should always be done with is or is not, never the equality operators.
Also, beware of writing if x when you really mean if x is not None — e.g. when testing whether a variable or argument that defaults to None was set to some other value. The other value might have a type (such as a container) that could be false in a boolean context!
PEP 8 is essential reading for any Python programmer.
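A short sketch of the pitfall that warning describes, using a made-up helper: an empty list is falsy, so testing if not target instead of if target is None would silently throw away the caller's list.

def append_to(item, target=None):
    # Correct: only substitute a new list when the caller passed nothing.
    # "if not target:" would also trigger for an empty list the caller passed in.
    if target is None:
        target = []
    target.append(item)
    return target

shared = []
append_to(1, shared)   # appends to the caller's (empty but real) list
append_to(2)           # creates and returns a fresh list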
The best bet with these types of questions is to see exactly what Python does. The dis module is incredibly informative:
>>> import dis
>>> dis.dis("val != None")
1 0 LOAD_NAME 0 (val)
2 LOAD_CONST 0 (None)
4 COMPARE_OP 3 (!=)
6 RETURN_VALUE
>>> dis.dis("not (val is None)")
1 0 LOAD_NAME 0 (val)
2 LOAD_CONST 0 (None)
4 COMPARE_OP 9 (is not)
6 RETURN_VALUE
>>> dis.dis("val is not None")
1 0 LOAD_NAME 0 (val)
2 LOAD_CONST 0 (None)
4 COMPARE_OP 9 (is not)
6 RETURN_VALUE
Notice that the last two cases reduce to the same sequence of operations: Python reads not (val is None) and uses the is not operator. The first uses the != operator when comparing with None.
As pointed out by other answers, using != when comparing with None is a bad idea.
Either of the latter two, since val could potentially be of a type that defines __eq__() to return True when passed None.
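To make that concrete, here is a contrived sketch (the class name is invented) of a type whose __eq__ claims equality with None; val != None is fooled, while val is not None is not:

class NoneImpostor:
    def __eq__(self, other):
        # deliberately claims equality with None
        return other is None

val = NoneImpostor()
print(val != None)       # False, because the default __ne__ falls back to __eq__, which says it equals None
print(val is not None)   # True, identity comparison cannot be overridden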