This question already has answers here:
The `is` operator behaves unexpectedly with non-cached integers
What's with the integer cache maintained by the interpreter?
"is" operator behaves unexpectedly with integers
From this link I learnt that
The current implementation keeps an array of integer objects for all integers between -5 and 256. When you create an int in that range you actually just get back a reference to the existing object.
But when I tried to reproduce this in my interactive session, I found that it behaves differently with plain assignment and with tuple unpacking.
Here is the snippet:
>>> a,b = 300,300
>>> a is b
True
>>> c = 300
>>> d = 300
>>> c is d
False
Because int is immutable, Python may or may not reuse an existing object. If you save the following code into a script file and run it, it will print True twice.
a, b = 300, 300
print a is b
c = 300
d = 300
print c is d
When Python compiles the code, it may merge equal constants into a single object. Because you entered your code line by line in an interactive session, each line was compiled separately, so Python could not fold the constants across lines.
The documentation only says that there will be a single instance for -5 to 256; it does not define the behaviour for other values. For immutable types, whether two names share one object rarely matters, because you cannot modify the objects anyway.
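As a quick sanity check of that guarantee, here is a minimal hedged sketch; identity for values outside the cached range is an implementation detail:
a = 256
b = 256
print(a is b)   # True: 256 is inside the cached -5..256 range

c = 257
d = 257
print(c is d)   # not guaranteed: typically False when typed line by line in a
                # REPL, typically True when this whole file is compiled at once
The disassembly below shows how the two assignment styles compile: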
import dis

def testMethod1():
    a, b = 300, 300

print dis.dis(testMethod1)
Prints:
4 0 LOAD_CONST 2 ((300, 300))
3 UNPACK_SEQUENCE 2
6 STORE_FAST 0 (a)
9 STORE_FAST 1 (b)
12 LOAD_CONST 0 (None)
15 RETURN_VALUE
def testMethod2():
    a = 300
    b = 300

print dis.dis(testMethod2)
Prints:
7 0 LOAD_CONST 1 (300)
3 STORE_FAST 0 (a)
8 6 LOAD_CONST 1 (300)
9 STORE_FAST 1 (b)
12 LOAD_CONST 0 (None)
15 RETURN_VALUE
So the two look essentially the same, except that the first method loads both values with a single LOAD_CONST of the tuple (300, 300), while the second uses two separate LOAD_CONST steps....
EDIT
After some testing I discovered that both methods eventually return False; however, on a single run (i.e. without putting the methods in a loop) they seem to always return True. Sometimes Python reuses a single object, and sometimes it does not.
The documentation only states that integers from -5 to 256 share the same object; hence you simply shouldn't be using is for comparison (in this case), as an object's identity outside that range comes with no guarantee.
NB: You never want to use is for comparison of values, as that's not what it's for; it compares identities. My point was that is will not always return True once you're outside the defined range.
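To make the NB concrete, a minimal hedged sketch (the identity result depends on how the code is compiled):
c = 300
d = 300
print(c == d)   # True: equal values, always
print(c is d)   # implementation detail: often False typed line by line in a REPL,
                # often True when the whole file is compiled as one unit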
Related
The usual way to swap values in a list is to use a temporary variable.
temp = l[i]
l[i] = l[j]
l[j] = temp
But in python you can do this:
l[i], l[j] = l[j], l[i]
How does this second method work? Is it the exact same process? Does it use less / more memory?
import dis

def swap_using_temp(a, b):
    temp = a
    a = b
    b = temp

def swap_using_unpacking(a, b):
    a, b = b, a
swap_using_unpacking does not require extra memory.
Explanation:
If you disassemble both functions with the dis module, you will see that swap_using_unpacking uses the bytecode instruction ROT_TWO, which swaps the two topmost elements of the stack; no third variable is needed, hence no extra memory is consumed.
dis.dis(swap_using_unpacking)
11 0 LOAD_FAST 1 (b)
2 LOAD_FAST 0 (a)
4 ROT_TWO
6 STORE_FAST 0 (a)
8 STORE_FAST 1 (b)
10 LOAD_CONST 0 (None)
12 RETURN_VALUE
dis.dis(swap_using_temp)
5 0 LOAD_FAST 0 (a)
2 STORE_FAST 2 (temp)
6 4 LOAD_FAST 1 (b)
6 STORE_FAST 0 (a)
7 8 LOAD_FAST 2 (temp)
10 STORE_FAST 1 (b)
12 LOAD_CONST 0 (None)
14 RETURN_VALUE
You are asking the wrong question here. You are writing Python, not assembly language, so you should not worry about a few extra bytes. What really matters in Python is readability.
Anyway, both versions should be implemented internally in more or less the same way, except that the first one creates an additional identifier. It is explicitly named temp, so provided you do not use it elsewhere in that scope, it causes no real problem.
If you use a linting environment that warns you about possible problems (which you should), be aware that reusing a variable name that shadows the same name in an outer scope, while perfectly correct from the language's point of view, will trigger a warning. But as you should not use a temp identifier outside a local scope anyway (readability again), it should not be a problem either.
So it is more a matter of taste. If you or your teammates often use other languages that do not allow multiple assignment, the first way will be more natural. If you mainly use Python, the second way is IMHO more Pythonic, because it avoids adding an unnecessary identifier to the local scope. But as I have already said, it is nothing more than a matter of taste...
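For completeness, a small hedged sketch showing that both styles do the same job on a list (values and indices are made up for the example):
l = [10, 20, 30]
i, j = 0, 2

temp = l[i]                 # classic swap through a temporary name
l[i] = l[j]
l[j] = temp
print(l)                    # [30, 20, 10]

l[i], l[j] = l[j], l[i]     # tuple-unpacking swap, no temporary name left in scope
print(l)                    # [10, 20, 30]  (swapped back to the original order)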
I saw a blog post where it's mentioned "Use func.__code__.co_consts to check all the constants defined in the function".
def func():
    return 1 in {1,2,3}
func.__code__.co_consts
(None, 1, frozenset({1, 2, 3}))
Why did it return a frozenset?
def func():
    return 1 in [1,2,3]
func.__code__.co_consts
(None, 1, (1,2,3))
Why did it return a tuple instead of a list? Every object returned from __code__.co_consts is immutable. Why are the mutable constants made immutable? Why is the first element of the returned tuple always None?
This is a result of the Python peephole optimizer.
Under "Optimizations", it says:
BUILD_LIST + COMPARE_OP(in/not in): convert list to tuple
BUILD_SET + COMPARE_OP(in/not in): convert set to frozenset
See here for more information:
"Python uses peephole optimization of your code by either pre-calculating constant expressions or transforming certain data structures"
especially the part about "Membership Tests":
"What Python for membership tests is to transform mutable data structures to its inmutable version. Lists get transformed into tuples and sets into frozensets."
All objects in co_consts are constants, i.e. they are immutable. You shouldn't be able to, e.g., append to a list appearing as a literal in the source code and thereby modify the behaviour of the function.
The compiler usually represents a list literal by storing each of its elements as an individual constant:
>>> def f():
... a = [1, 2, 3]
... return 1 in a
...
>>> f.__code__.co_consts
(None, 1, 2, 3)
Looking at the bytecode of this function, we can see that a new list is built each time the function is executed:
>>> dis.dis(f)
2 0 LOAD_CONST 1 (1)
2 LOAD_CONST 2 (2)
4 LOAD_CONST 3 (3)
6 BUILD_LIST 3
8 STORE_FAST 0 (a)
3 10 LOAD_CONST 1 (1)
12 LOAD_FAST 0 (a)
14 COMPARE_OP 6 (in)
16 RETURN_VALUE
Creating a new list is required in general, because the function may modify or return the list defined by the literal, in which case it needs to operate on a new list object every time the function is executed.
In other contexts, creating a new list object is wasteful, though. For this reason, Python's peephole optimizer can replace the list with a tuple, or a set with a frozenset, in certain situations where it is known to be safe. One such situation is when the list or set literal is only used in an expression of the form x [not] in <list_literal>. Another such situation is when a list literal is used in a for loop.
The peephole optimizer is very simple. It only looks at one expression at a time. For this reason, it can't detect that this optimization would be safe in my definition of f above, which is functionally equivalent to your example.
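For comparison with the disassembly above, here is a hedged sketch of the optimized case, where the literal appears only in the membership test itself; exact opcodes and constant layout vary between CPython versions:
import dis

def g():
    return 1 in [1, 2, 3]    # literal used only in the membership test

print(g.__code__.co_consts)  # e.g. (None, 1, (1, 2, 3)): stored as a tuple constant
dis.dis(g)                   # expect a single LOAD_CONST of the tuple and no BUILD_LIST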
Why is it that according to the timeit.timeit function the code boolean = True if foo else False runs faster than the code boolean = bool(foo)?
How is it that the if statement is able to determine the trueness of foo faster then the bool function itself?
Why doesn't the bool function simply use the same mechanic?
And what is the purpose of the bool function when it can be outperformed by a factor of four by a different technique?
Or, is it so that I am misusing the timeit function and that bool(foo) is, in fact, faster?
>>> timeit.timeit("boolean = True if foo else False", setup="foo='zon-zero'")
0.021019499999965774
>>> timeit.timeit("boolean = bool(foo)", setup="foo='zon-zero'")
0.0684856000000309
>>> timeit.timeit("boolean = True if foo else False", setup="foo=''")
0.019911300000103438
>>> timeit.timeit("boolean = bool(foo)", setup="foo=''")
0.09232059999999365
Looking at these results, True if foo else False seems to be four to five times faster than bool(foo).
I suspect that the difference in speed is caused by the overhead of calling a function and that does indeed seem to be the case when I use the dis module.
>>> dis.dis("boolean = True if foo else False")
1 0 LOAD_NAME 0 (foo)
2 POP_JUMP_IF_FALSE 8
4 LOAD_CONST 0 (True)
6 JUMP_FORWARD 2 (to 10)
>> 8 LOAD_CONST 1 (False)
>> 10 STORE_NAME 1 (boolean)
12 LOAD_CONST 2 (None)
14 RETURN_VALUE
>>> dis.dis("boolean = bool(foo)")
1 0 LOAD_NAME 0 (bool)
2 LOAD_NAME 1 (foo)
4 CALL_FUNCTION 1
6 STORE_NAME 2 (boolean)
8 LOAD_CONST 0 (None)
10 RETURN_VALUE
According to the dis module, the difference between the two techniques is:
2 POP_JUMP_IF_FALSE 8
4 LOAD_CONST 0 (True)
6 JUMP_FORWARD 2 (to 10)
>> 8 LOAD_CONST 1 (False)
versus
0 LOAD_NAME 0 (bool)
4 CALL_FUNCTION 1
which makes it look like either the call to a function is far too expensive for something as simple as determining a boolean value or the bool function has been written very inefficiently.
But that actually makes me wonder why anyone would use the bool function when it is this much slower and why the bool function even exists when python does not even seem to use it internally.
So, is the bool function slower because it has been written inefficiently, because of the function overhead, or because of a different reason?
And why would anyone use the bool function when a much faster and equally clear alternative is available?
As per the Python documentation:
class bool([x])
Return a Boolean value, i.e. one of True or False. x is converted using the standard truth testing procedure. If x is false or omitted, this returns False; otherwise it returns True. The bool class is a subclass of int (see Numeric Types — int, float, complex). It cannot be subclassed further. Its only instances are False and True.
So, when you use the object directly (like foo in the conditional expression), the interpreter calls its foo.__bool__ method to determine its truth value. The bool function is a wrapper that ends up calling foo.__bool__ as well, but only after a name lookup and a function call.
As you said, the function-call overhead is what makes it more expensive.
As for the use of bool: there are situations where you need the boolean value of an object and want to store it in a variable:
x = bool(my_object)
Writing x = my_object doesn't do that; it just binds another name to the same object.
That's where bool is useful.
Sometimes bool(foo) is also simply more readable, and the small time difference can be ignored.
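A couple of hedged examples of that point, where materialising the truth value with bool() reads naturally (the variable names are illustrative only):
values = ["", "x", 0, 3, [], [1]]

flags = [bool(v) for v in values]                       # store real True/False values
truthy_first = sorted(values, key=bool, reverse=True)   # bool also works as a key function

print(flags)          # [False, True, False, True, False, True]
print(truthy_first)   # truthy items first, falsy items last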
You might also be interested in knowing that
x = {}
is faster than
x = dict()
Find out why... :)
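A hedged sketch of how you might find out for yourself, using the same tools as above (exact opcode names and timings differ between CPython versions):
import dis
import timeit

print(timeit.timeit("x = {}"))       # the literal is usually faster
print(timeit.timeit("x = dict()"))   # the call pays for a name lookup plus a function call

dis.dis("x = {}")        # BUILD_MAP, then STORE_NAME: no lookup, no call
dis.dis("x = dict()")    # a name lookup for dict, a call opcode, then STORE_NAME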
From some of the answers on Stack Overflow, I came to know that integers from -5 to 256 reference the same memory location, and thus we get True for:
>>> a = 256
>>> a is 256
True
Now comes the twist (see this line before marking duplicate):
>>> a = 257
>>> a is 257
False
This is completely understood, but now if I do:
>>> a = 257; a is 257
True
>>> a = 12345; a is 12345
True
Why?
What you're seeing is an optimization in the compiler in CPython (which compiles your source code into the bytecode that the interpreter runs). Whenever the same immutable constant value is used in several different places within a chunk of code that is being compiled in one step, the compiler will try to use a reference to the same object for each place.
So if you do multiple assignments on the same line in an interactive session, you'll get two references to the same object, but you won't if you use two separate lines:
>>> x = 257; y = 257 # multiple statements on the same line are compiled in one step
>>> print(x is y) # prints True
>>> x = 257
>>> y = 257
>>> print(x is y) # prints False this time, since the assignments were compiled separately
Another place this optimization comes up is in the body of a function. The whole function body will be compiled together, so any constants used anywhere in the function can be combined, even if they're on separate lines:
def foo():
    x = 257
    y = 257
    return x is y  # this will always return True
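A hedged way to observe that merging directly (CPython-specific; the exact constant layout may differ between versions):
print(foo.__code__.co_consts)   # e.g. (None, 257): the 257 literal is stored once
print(foo())                    # True on CPython builds that merge equal constants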
While it's interesting to investigate optimizations like this one, you should never rely upon this behavior in your normal code. Different Python interpreters, and even different versions of CPython may do these optimizations differently or not at all. If your code depends on a specific optimization, it may be completely broken for somebody else who tries to run it on their own system.
As an example, the two assignments on the same line in my first code block above do not result in two references to the same object when I run them in the interactive shell inside Spyder (my preferred IDE). I have no idea why that specific situation doesn't behave the same way as a conventional interactive shell, but the discrepancy is on me, since that code relies upon implementation-specific behavior.
Generally speaking, numbers outside the range -5 to 256 are not guaranteed the caching that numbers within that range get. However, Python is free to apply other optimizations as appropriate. In your case, you're seeing that the same literal value used several times on one line is stored as a single object, no matter how many times it appears on that line. Here are some other examples of this behavior:
>>> s = 'a'; s is 'a'
True
>>> s = 'asdfghjklzxcvbnmsdhasjkdhskdja'; s is 'asdfghjklzxcvbnmsdhasjkdhskdja'
True
>>> x = 3.14159; x is 3.14159
True
>>> t = 'a' + 'b'; t is 'a' + 'b'
True
From the Python 2 docs:
The operators is and is not test for object identity: x is y is true
if and only if x and y are the same object. x is not y yields the
inverse truth value. [6]
From the Python 3 docs:
The operators is and is not test for object identity: x is y is true
if and only if x and y are the same object. Object identity is
determined using the id() function. x is not y yields the inverse
truth value. [4]
So basically the key to understanding the tests you've run on the REPL console is to use the id() function. Here's an example that will show you what's going on behind the curtains:
>>> a=256
>>> id(a);id(256);a is 256
2012996640
2012996640
True
>>> a=257
>>> id(a);id(257);a is 257
36163472
36162032
False
>>> a=257;id(a);id(257);a is 257
36162496
36162496
True
>>> a=12345;id(a);id(12345);a is 12345
36162240
36162240
True
That said, a good way to understand what's going on behind the curtains with this type of snippet is to use either dis.dis or dis.disco. Let's take a look, for instance, at what this snippet compiles to:
import dis
import textwrap
dis.disco(compile(textwrap.dedent("""\
a=256
a is 256
a=257
a is 257
a=257;a is 257
a=12345;a is 12345\
"""), '', 'exec'))
the output would be:
1 0 LOAD_CONST 0 (256)
2 STORE_NAME 0 (a)
2 4 LOAD_NAME 0 (a)
6 LOAD_CONST 0 (256)
8 COMPARE_OP 8 (is)
10 POP_TOP
3 12 LOAD_CONST 1 (257)
14 STORE_NAME 0 (a)
4 16 LOAD_NAME 0 (a)
18 LOAD_CONST 1 (257)
20 COMPARE_OP 8 (is)
22 POP_TOP
5 24 LOAD_CONST 1 (257)
26 STORE_NAME 0 (a)
28 LOAD_NAME 0 (a)
30 LOAD_CONST 1 (257)
32 COMPARE_OP 8 (is)
34 POP_TOP
6 36 LOAD_CONST 2 (12345)
38 STORE_NAME 0 (a)
40 LOAD_NAME 0 (a)
42 LOAD_CONST 2 (12345)
44 COMPARE_OP 8 (is)
46 POP_TOP
48 LOAD_CONST 3 (None)
50 RETURN_VALUE
As we can see, in this case the bytecode output doesn't tell us very much: lines 3-4 use basically the same instructions as line 5. So my recommendation would be, once again, to use id() smartly so you'll know exactly what is will compare. If you want to know precisely which optimizations CPython is applying, I'm afraid you'd need to dig into its source code.
After discussion and testing in various versions, the final conclusions can be drawn.
Python compiles instructions in blocks. Depending on the syntax used, the Python version, the operating system and the distribution, different results may be obtained, because these factors affect which instructions Python compiles as one block.
The general rules are:
(from official documentation)
The current implementation keeps an array of integer objects for all
integers between -5 and 256
Therefore:
a = 256
id(a)
Out[2]: 1997190544
id(256)
Out[3]: 1997190544 # int actually stored once within Python
a = 257
id(a)
Out[5]: 2365489141456
id(257)
Out[6]: 2365489140880 # literal, temporary; as you see, the ids differ
id(257)
Out[7]: 2365489142192 # literal, temporary; as you see, it gets a new id every time,
# since it is not pre-stored
The part below returns False in Python 3.6.3 |Anaconda custom (64-bit)| (default, Oct 17 2017, 23:26:12) [MSC v.1900 64 bit (AMD64)]
a = 257; a is 257
Out[8]: False
But
a=257; print(a is 257); a=258; print(a is 257)
True
False
As is evident, what Python treats as "one block" is implementation-dependent and can vary with how the code is written (single line or not), as well as with the version, operating system and distribution used.
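One hedged way to observe the effect of the compilation unit directly, without depending on how the shell splits lines (results are CPython-specific):
# Compiling both assignments as one unit lets the compiler merge the constant;
# compiling them separately creates a distinct 257 object each time.
exec(compile("a = 257\nb = 257\nprint(a is b)", "<one unit>", "exec"))   # typically True

for stmt in ("a = 257", "b = 257", "print(a is b)"):
    exec(compile(stmt, "<per statement>", "exec"))                       # typically False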
This question already has answers here:
Python `if x is not None` or `if not x is None`? [closed]
Out of these not-None tests:
if val != None:
if not (val is None):
if val is not None:
Which one is preferable, and why?
if val is not None:
# ...
is the Pythonic idiom for testing that a variable is not set to None. This idiom is particularly useful when declaring functions whose keyword parameters default to None. is tests identity in Python. Because there is one and only one instance of None in a running Python script/program, is is the optimal test for this. As Johnsyweb points out, this is discussed in PEP 8 under "Programming Recommendations".
As for why this is preferred to
if not (val is None):
# ...
this is simply part of the Zen of Python: "Readability counts." Good Python is often close to good pseudocode.
From, Programming Recommendations, PEP 8:
Comparisons to singletons like None should always be done with is or is not, never the equality operators.
Also, beware of writing if x when you really mean if x is not None — e.g. when testing whether a variable or argument that defaults to None was set to some other value. The other value might have a type (such as a container) that could be false in a boolean context!
PEP 8 is essential reading for any Python programmer.
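A hedged illustration of that warning, with an argument that defaults to None (the function and names are invented for the example):
def describe(items=None):
    if items is not None:         # only the "argument not given" case falls through
        return f"{len(items)} item(s)"
    return "no argument given"

print(describe())         # 'no argument given'
print(describe([]))       # '0 item(s)': an empty list was explicitly passed
print(describe([1, 2]))   # '2 item(s)'
With if items: instead, the explicitly passed empty list would be misreported as missing.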
The best bet with these types of questions is to see exactly what python does. The dis module is incredibly informative:
>>> import dis
>>> dis.dis("val != None")
1 0 LOAD_NAME 0 (val)
2 LOAD_CONST 0 (None)
4 COMPARE_OP 3 (!=)
6 RETURN_VALUE
>>> dis.dis("not (val is None)")
1 0 LOAD_NAME 0 (val)
2 LOAD_CONST 0 (None)
4 COMPARE_OP 9 (is not)
6 RETURN_VALUE
>>> dis.dis("val is not None")
1 0 LOAD_NAME 0 (val)
2 LOAD_CONST 0 (None)
4 COMPARE_OP 9 (is not)
6 RETURN_VALUE
Notice that the last two cases reduce to the same sequence of operations: Python reads not (val is None) and uses the is not operator. The first uses the != operator when comparing with None.
As pointed out by other answers, using != when comparing with None is a bad idea.
Either of the latter two, since val could potentially be of a type that defines __eq__() to return True when passed None.
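A hedged sketch of such a type (the class is invented for illustration):
class AlwaysEqual:
    def __eq__(self, other):
        return True              # claims equality with everything, including None

val = AlwaysEqual()
print(val != None)               # False: __ne__ is derived from __eq__, so val "equals" None
print(val is not None)           # True: the identity check is unaffected by __eq__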