'is' operator behaves unexpectedly with floats - python

I came across a confusing problem when unit testing a module. The module is actually casting values and I want to compare this values.
There is a difference in comparison with == and is (partly, I'm beware of the difference)
>>> 0.0 is 0.0
True # as expected
>>> float(0.0) is 0.0
True # as expected
As expected till now, but here is my "problem":
>>> float(0) is 0.0
False
>>> float(0) is float(0)
False
Why? At least the last one is really confusing to me. The internal representation of float(0) and float(0.0) should be equal. Comparison with == is working as expected.

This has to do with how is works. It checks for references instead of value. It returns True if either argument is assigned to the same object.
In this case, they are different instances; float(0) and float(0) have the same value ==, but are distinct entities as far as Python is concerned. CPython implementation also caches integers as singleton objects in this range -> [x | x ∈ ℤ ∧ -5 ≤ x ≤ 256 ]:
>>> 0.0 is 0.0
True
>>> float(0) is float(0) # Not the same reference, unique instances.
False
In this example we can demonstrate the integer caching principle:
>>> a = 256
>>> b = 256
>>> a is b
True
>>> a = 257
>>> b = 257
>>> a is b
False
Now, if floats are passed to float(), the float literal is simply returned (short-circuited), as in the same reference is used, as there's no need to instantiate a new float from an existing float:
>>> 0.0 is 0.0
True
>>> float(0.0) is float(0.0)
True
This can be demonstrated further by using int() also:
>>> int(256.0) is int(256.0) # Same reference, cached.
True
>>> int(257.0) is int(257.0) # Different references are returned, not cached.
False
>>> 257 is 257 # Same reference.
True
>>> 257.0 is 257.0 # Same reference. As #Martijn Pieters pointed out.
True
However, the results of is are also dependant on the scope it is being executed in (beyond the span of this question/explanation), please refer to user: #Jim's fantastic explanation on code objects. Even python's doc includes a section on this behavior:
5.9 Comparisons
[7]
Due to automatic garbage-collection, free lists, and the dynamic nature of descriptors, you may notice seemingly unusual behaviour in certain uses of the is operator, like those involving comparisons between instance methods, or constants. Check their documentation for more info.

If a float object is supplied to float(), CPython* just returns it without making a new object.
This can be seen in PyNumber_Float (which is eventually called from float_new) where the object o passed in is checked with PyFloat_CheckExact; if True, it just increases its reference count and returns it:
if (PyFloat_CheckExact(o)) {
Py_INCREF(o);
return o;
}
As a result, the id of the object stays the same. So the expression
>>> float(0.0) is float(0.0)
reduces to:
>>> 0.0 is 0.0
But why does that equal True? Well, CPython has some small optimizations.
In this case, it uses the same object for the two occurrences of 0.0 in your command because they are part of the same code object (short disclaimer: they're on the same logical line); so the is test will succeed.
This can be further corroborated if you execute float(0.0) in separate lines (or, delimited by ;) and then check for identity:
a = float(0.0); b = float(0.0) # Python compiles these separately
a is b # False
On the other hand, if an int (or a str) is supplied, CPython will create a new float object from it and return that. For this, it uses PyFloat_FromDouble and PyFloat_FromString respectively.
The effect is that the returned objects differ in ids (which used to check identities with is):
# Python uses the same object representing 0 to the calls to float
# but float returns new float objects when supplied with ints
# Thereby, the result will be False
float(0) is float(0)
*Note: All previous mentioned behavior applies for the implementation of python in C i.e CPython. Other implementations might exhibit different behavior. In short, don't depend on it.

Related

Why 'is' operator is different when adding a value to a variable [duplicate]

This question already has answers here:
"is" operator behaves unexpectedly with integers
(11 answers)
Closed last month.
in shell environment,
x = 250
y = 250
x + 100 is y + 100
x + 100 is y + 100 result is False.
Why is the result value "False" while id(x), id(y) is same?
Also
x = 250
y = 250
x is y
x += 10
y += 10
x is y
last statement result is False. It is same problem.
I thought both questions would be True.
At a high-level, is checks whether the values refer to the same reference, while == calls a method of the objects to compare them
As a general rule, only use is when comparing against singletons
None
True
False
and == everywhere else
#Mark Ransom's comment gets to why this can have unexpected effects (such as numbers sometimes comparing the same, not not every time) .. small numbers have a special priority and are cached, though to my knowledge this is implementation-specific (and so it may vary between different interpreters and versions of Python) and should not be relied upon
>>> a = 100
>>> b = 100
>>> c = 1000
>>> d = 1000
>>> a is b
True
>>> c is d
False
"The is operator does not match the values of the variables, but the instances themselves."
Source : Understanding the "is" operator
To sum up, the values are equal but the instances isn't the same.
I recommend this lecture from Python official documentation : https://docs.python.org/3/tutorial/classes.html
is checks object identity. It is only True if you are comparing the same object on both sides. Just because two integers have the same value doesn't mean they have to be the same object. In fact, they usually aren't.
CPython optimizes a small set of integers (-5 through +256) to always be singletons. That is, whenever an integer is created, python checks whether the integer is in a small range of cached integers and uses those when possible. This is pretty quick - for small integers, which are the most commonly used integers, you don't need to create new objects. But its not scalable. Outside that range, python will generally create a new integer even if the integer already exists. The cost of lookup is too much.
The python compiler may also reuse immutables such as integers and strings that it knows about in a given compilation unit. This lookup is done once at compile and is relatively fast considering everything that goes on in that compile step.
But you can't rely on any of this. This is just how python happens to be implemented and the moment and it may change in the future. There are several singletons you can count on: None, True and False will always work.

Python: 'inf is inf', but '-inf is not -inf'?

Python 3.7
While writing the search code for the maximum, I encountered a strange behavior of negative infinity.
Can somebody explain why this behavior?
>>> inf = float('inf')
>>> inf is inf
True
>>> (-inf) is (-inf)
False
And I've already realized it's better to use == for comparison, but I'm interested in the answer to the question above.
inf is a variable, bound to a specific object. Any object is itself, so inf is inf.
-inf is an expression. It does math, and produces an object with value floating-point negative infinity. Python makes no promises about whether this will be the same object as any other object with that value. In your case, the two evaluations of -inf happened to produce different objects.
Again, there are no promises about what -inf is -inf will produce. The current CPython implementation happens to consistently produce False. PyPy's primitive handling produces True. A different Python version or implementation might inconsistently produce True or False based on current memory pressure, or whether you ran this as a script or interactively, or any other factor. CPython itself already has cases where object identity is different in a script or interactively; for example, this:
x = 1000
y = 1000
print(x is y)
prints different things in the current CPython implementation depending on whether you run it interactively.
-inf is an operation that produces a new float object, and you use it twice in the same expression. Two distinct objects are produced, because a Python implementation is not required to notice that both subexpressions could return a reference to the same object.
I want to add one point: in Python, when we use id() function, is operator, == operator, etc, the real target we're working with is value, not variable.
a is b equals to id(a) == id(b).
In your code, inf = float('inf') will produce a value named inf.
inf is also an expression, which will be evaluated to a value.
inf is inf will be True, because the evaluated values of two expressions are same one, in fact.
In Python, focusing on values is better and easier.
For example, in CPython implementation:
a = 1
b = 1
a is b # will be true
a = 10000
b = 10000
a is b # will be false
a = '10000'
b = '10000'
a is b # will be true
If we focus on variable, such thing is weird. If we focus on values, and if I tell you that CPython implementation has cache pool, which is used for small int and some literal str value, such thing is simple again:
Because of cache pool, small int 1 is cached, str '10000' is cached too, but big int 10000 is not cached so it will be created two times.
Try to focus on values.

Python list indexing with bool values [duplicate]

Is it guaranteed that False == 0 and True == 1, in Python (assuming that they are not reassigned by the user)? For instance, is it in any way guaranteed that the following code will always produce the same results, whatever the version of Python (both existing and, likely, future ones)?
0 == False # True
1 == True # True
['zero', 'one'][False] # is 'zero'
Any reference to the official documentation would be much appreciated!
Edit: As noted in many answers, bool inherits from int. The question can therefore be recast as: "Does the documentation officially say that programmers can rely on booleans inheriting from integers, with the values 0 and 1?". This question is relevant for writing robust code that won't fail because of implementation details!
In Python 2.x this is not guaranteed as it is possible for True and False to be reassigned. However, even if this happens, boolean True and boolean False are still properly returned for comparisons.
In Python 3.x True and False are keywords and will always be equal to 1 and 0.
Under normal circumstances in Python 2, and always in Python 3:
False object is of type bool which is a subclass of int:
object
|
int
|
bool
It is the only reason why in your example, ['zero', 'one'][False] does work. It would not work with an object which is not a subclass of integer, because list indexing only works with integers, or objects that define a __index__ method (thanks mark-dickinson).
Edit:
It is true of the current python version, and of that of Python 3. The docs for python 2 and the docs for Python 3 both say:
There are two types of integers: [...] Integers (int) [...] Booleans (bool)
and in the boolean subsection:
Booleans: These represent the truth values False and True [...] Boolean values behave like the values 0 and 1, respectively, in almost all contexts, the exception being that when converted to a string, the strings "False" or "True" are returned, respectively.
There is also, for Python 2:
In numeric contexts (for example when used as the argument to an arithmetic operator), they [False and True] behave like the integers 0 and 1, respectively.
So booleans are explicitly considered as integers in Python 2 and 3.
So you're safe until Python 4 comes along. ;-)
Here's the PEP discussing the new bool type in Python 2.3: http://www.python.org/dev/peps/pep-0285/.
When converting a bool to an int, the integer value is always 0 or 1, but when converting an int to a bool, the boolean value is True for all integers except 0.
>>> int(False)
0
>>> int(True)
1
>>> bool(5)
True
>>> bool(-5)
True
>>> bool(0)
False
In Python 2.x, it is not guaranteed at all:
>>> False = 5
>>> 0 == False
False
So it could change. In Python 3.x, True, False, and None are reserved words, so the above code would not work.
In general, with booleans you should assume that while False will always have an integer value of 0 (so long as you don't change it, as above), True could have any other value. I wouldn't necessarily rely on any guarantee that True==1, but on Python 3.x, this will always be the case, no matter what.

How does Python know the values already stored in its memory?

I want to know how Python knows (if it knows) that a value-type object is already stored in its memory (and also knows where it is).
For this code, when assigning the value 1 for b, how does it know that the value 1 is already in its memory and stores its reference in b?
>>> a = 1
>>> b = 1
>>> a is b
True
Python (CPython precisely) uses shared small integers to help quick access. Integers range from [-5, 256] already exists in memory, so if you check the address, they are the same. However, for larger integers, it's not true.
a = 100000
b = 100000
a is b # False
Wait, what? If you check the address of the numbers, you'll find something interesting:
a = 1
b = 1
id(a) # 4463034512
id(b) # 4463034512
a = 257
b = 257
id(a) # 4642585200
id(b) # 4642585712
It's called integer cache. You can read more about the integer cache here.
Thanks comments from #KlausD and #user2357112 mentioning, direct access on small integers will be using integer cache, while if you do calculations, though they might equals to a number in range [-5, 256], it's not a cached integer. e.g.
pow(3, 47159012670, 47159012671) is 1 # False
pow(3, 47159012670, 47159012671) == 1 # True
“The current implementation keeps an array of integer objects for all
integers between -5 and 256, when you create an int in that range you
actually just get back a reference to the existing object.”
Why? Because small integers are more frequently used by loops. Using reference to existing objects instead of creating a new object saves an overhead.
If you take a look at Objects/longobject.c, which implements the int type for CPython, you will see that the numbers between -5 (NSMALLNEGINTS) and 256 (NSMALLPOSINTS - 1) are pre-allocated and cached. This is done to avoid the penalty of allocating multiple unnecessary objects for the most commonly used integers. This works because integers are immutable: you don't need multiple references to represent the same number.
Python doesn't know anything until you tell it. So in your code above, when you initialize a and b, you are storing those values(in the register or RAM), and calling the place to store it a and b, so that you can reference them later. If you didn't initialize the variable first, python would just give you an error.

'is' operator behaves differently when comparing strings with spaces

I've started learning Python (python 3.3) and I was trying out the is operator. I tried this:
>>> b = 'is it the space?'
>>> a = 'is it the space?'
>>> a is b
False
>>> c = 'isitthespace'
>>> d = 'isitthespace'
>>> c is d
True
>>> e = 'isitthespace?'
>>> f = 'isitthespace?'
>>> e is f
False
It seems like the space and the question mark make the is behave differently. What's going on?
EDIT: I know I should be using ==, I just wanted to know why is behaves like this.
Warning: this answer is about the implementation details of a specific python interpreter. comparing strings with is==bad idea.
Well, at least for cpython3.4/2.7.3, the answer is "no, it is not the whitespace". Not only the whitespace:
Two string literals will share memory if they are either alphanumeric or reside on the same block (file, function, class or single interpreter command)
An expression that evaluates to a string will result in an object that is identical to the one created using a string literal, if and only if it is created using constants and binary/unary operators, and the resulting string is shorter than 21 characters.
Single characters are unique.
Examples
Alphanumeric string literals always share memory:
>>> x='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
>>> y='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
>>> x is y
True
Non-alphanumeric string literals share memory if and only if they share the enclosing syntactic block:
(interpreter)
>>> x='`!##$%^&*() \][=-. >:"?<a'; y='`!##$%^&*() \][=-. >:"?<a';
>>> z='`!##$%^&*() \][=-. >:"?<a';
>>> x is y
True
>>> x is z
False
(file)
x='`!##$%^&*() \][=-. >:"?<a';
y='`!##$%^&*() \][=-. >:"?<a';
z=(lambda : '`!##$%^&*() \][=-. >:"?<a')()
print(x is y)
print(x is z)
Output: True and False
For simple binary operations, the compiler is doing very simple constant propagation (see peephole.c), but with strings it does so only if the resulting string is shorter than 21 charcters. If this is the case, the rules mentioned earlier are in force:
>>> 'a'*10+'a'*10 is 'a'*20
True
>>> 'a'*21 is 'a'*21
False
>>> 'aaaaaaaaaaaaaaaaaaaaa' is 'aaaaaaaa' + 'aaaaaaaaaaaaa'
False
>>> t=2; 'a'*t is 'aa'
False
>>> 'a'.__add__('a') is 'aa'
False
>>> x='a' ; x+='a'; x is 'aa'
False
Single characters always share memory, of course:
>>> chr(0x20) is ' '
True
To expand on Ignacio’s answer a bit: The is operator is the identity operator. It is used to compare object identity. If you construct two objects with the same contents, then it is usually not the case that the object identity yields true. It works for some small strings because CPython, the reference implementation of Python, stores the contents separately, making all those objects reference to the same string content. So the is operator returns true for those.
This however is an implementation detail of CPython and is generally neither guaranteed for CPython nor any other implementation. So using this fact is a bad idea as it can break any other day.
To compare strings, you use the == operator which compares the equality of objects. Two string objects are considered equal when they contain the same characters. So this is the correct operator to use when comparing strings, and is should be generally avoided if you do not explicitely want object identity (example: a is False).
If you are really interested in the details, you can find the implementation of CPython’s strings here. But again: This is implementation detail, so you should never require this to work.
The is operator relies on the id function, which is guaranteed to be unique among simultaneously existing objects. Specifically, id returns the object's memory address. It seems that CPython has consistent memory addresses for strings containing only characters a-z and A-Z.
However, this seems to only be the case when the string has been assigned to a variable:
Here, the id of "foo" and the id of a are the same. a has been set to "foo" prior to checking the id.
>>> a = "foo"
>>> id(a)
4322269384
>>> id("foo")
4322269384
However, the id of "bar" and the id of a are different when checking the id of "bar" prior to setting a equal to "bar".
>>> id("bar")
4322269224
>>> a = "bar"
>>> id(a)
4322268984
Checking the id of "bar" again after setting a equal to "bar" returns the same id.
>>> id("bar")
4322268984
So it seems that cPython keeps consistent memory addresses for strings containing only a-zA-Z when those strings are assigned to a variable. It's also entirely possible that this is version dependent: I'm running python 2.7.3 on a macbook. Others might get entirely different results.
In fact your code amounts to comparing objects id (i.e. their physical address). So instead of your is comparison:
>>> b = 'is it the space?'
>>> a = 'is it the space?'
>>> a is b
False
You can do:
>>> id(a) == id(b)
False
But, note that if a and b were directly in the comparison it would work.
>>> id('is it the space?') == id('is it the space?')
True
In fact, in an expression there's sharing between the same static strings. But, at the program scale there's only sharing for word-like strings (so neither spaces nor punctuations).
You should not rely on this behavior as it's not documented anywhere and is a detail of implementation.
Two or more identical strings of consecutive alphanumeric (only) characters are stored in one structure, thus they share their memory reference. There are posts about this phenomenon all over the internet since the 1990's. It has evidently always been that way. I have never seen a reasonable guess as to why that's the case. I only know that it is. Furthermore, if you split and re-join alphanumeric strings to remove spaces between words, the resulting identical alphanumeric strings do NOT share a reference, which I find odd. See below:
Add any non-alphanumeric value identically to both strings, and they instantly become copies, but not shared references.
a ="abbacca"; b = "abbacca"; a is b => True
a ="abbacca "; b = "abbacca "; a is b => False
a ="abbacca?"; b = "abbacca?"; a is b => False
~Dr. C.
'is' operator compare the actual object.
c is d should also be false. My guess is that python make some optimization and in that case, it is the same object.

Categories

Resources