I (incorrectly?) used 'is not' in a comparison and found this curious behavior:
>>> a = 256
>>> b = int('256')
>>> c = 300
>>> d = int('300')
>>>
>>> a is not b
False
>>> c is not d
True
Obviously I should have used:
>>> a != b
False
>>> c != d
False
But it worked for a long time thanks to small-valued test cases, until I happened to use the number 495.
If this is invalid syntax, then why? And shouldn't I at least get a warning?
"is" is not a check of equality of value, but a check that two variables point to the same instance of an object.
Ints and strings are confusing here, because is and == can happen to give the same result due to how the internals of the language work.
For small numbers, Python reuses the object instances, but for larger numbers it creates new instances.
See this:
>>> a=256
>>> b=int('256')
>>> c=300
>>> d=int('300')
>>> id(a)
158013588
>>> id(b)
158013588
>>> id(c)
158151472
>>> id(d)
158151436
which is exactly why a is b, but c isn't d.
Don't use is [not] to compare integers; use == and != instead. Even though is works in current CPython for small numbers due to an optimization, it's unreliable and semantically wrong. The syntax itself is valid, but the benefits of a warning (which would have to be checked on every use of is and could be problematic with subclasses of int) are presumably not worth the trouble.
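A minimal sketch of the failure from the question, assuming CPython. The values are built at run time with int(str(n)), as in the question, so the compiler cannot hand both names one shared constant; compare is a made-up helper name. == is right for every value, while is only agrees for ints that happen to come from the small-integer cache:

```python
def compare(n):
    """Return (equal, identical) for n and an equal int built at run time."""
    built = int(str(n))  # a fresh object unless CPython returns a cached small int
    return n == built, n is built

for n in (256, 300, 495):
    equal, identical = compare(n)
    # equal is always True; identical depends on CPython's small-int cache
    print(f"{n}: == gives {equal}, is gives {identical}")
```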
This is covered elsewhere on SO, but I didn't find it just now.
An int is an object in Python, and CPython caches small integers in the range [-5, 256] by default, so any two ints in that range with the same value are the same object.
a = 256
b = 256
a is b # True
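If you create two integers outside [-5, 256] separately (for example on separate lines of the interactive interpreter), Python creates two different objects even though they have the same value. (In a script, the compiler may merge identical literals into one constant, so this example can print True there.)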
a = 257
b = 257
a is b # False
In your case, using != instead to compare the value is the right way.
a = 257
b = 257
a != b # False
To understand why this happens, take a look at the small_ints array (Python-2.6.5/Objects/intobject.c:78) and the _PyInt_Init function (Python-2.6.5/Objects/intobject.c:1292) in the Python sources.
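A rough way to see that cache from Python itself, assuming CPython (the probed range -10 to 300 is an arbitrary choice, and cached_small_ints is a hypothetical helper): build each value twice at run time and record the ones where both builds return the same object.

```python
def cached_small_ints(lo=-10, hi=300):
    """Return the ints in [lo, hi] that CPython hands out as shared objects."""
    cached = []
    for n in range(lo, hi + 1):
        # int(str(n)) creates the value at run time, so only a cache can make
        # two separate calls return one and the same object.
        if int(str(n)) is int(str(n)):
            cached.append(n)
    return cached

found = cached_small_ints()
print(found[0], found[-1])  # -5 256 on CPython
```

Other implementations are free to cache a different range, or nothing at all.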
A similar thing happens with lists:
>>> a = [12]
>>> id_a = id(a)
>>> del(a)
>>> id([1,2,34]) == id_a
True
>>>
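The deleted list's memory was reused for the new list, so the new list got the same id (CPython behaviour, not a guarantee). id() values are only unique among objects that exist at the same time.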
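The matching id above is only possible because the first list had already been freed. A short sketch of the guarantee that does hold: objects that are alive at the same time never share an id. Keeping every list referenced from alive stops any of them from being freed:

```python
# Keep every list alive by storing it, so none of their memory can be reused.
alive = [[i] for i in range(1000)]
ids = {id(obj) for obj in alive}
print(len(ids) == len(alive))  # True: live objects never share an id
```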
Related
After reading this and this, I still cannot understand the following behaviour:
a = 1000
b = 1000
print (a == b)
print (a is b)
print (f"id(a) = {id(a)} \nid(b) = {id(b)}")
As expected I get
True
True
id(a) = 2806705928816
id(b) = 2806705928816
But when I try something like this:
a = 1000
b = 1000 + a - a
print (a == b)
print (a is b)
print (f"id(a) = {id(a)} \nid(b) = {id(b)}")
I get False for the expression a is b:
True
False
id(a) = 3030783801968
id(b) = 3030783802064
Why does the variable behave differently when I assign it an integer literal versus an expression involving other variables, even though mathematically both give the same integer?
When you do something like :
(case-1)
a = 1000
b = a
or (case-2)
a = 1000
b = 1000
In the first case, b = a simply binds the name b to the same object that a refers to, so no new object is needed.
The second case is a bit different.
Python is a true object-oriented language: the literal 1000 is treated as an object too. (Intuitively, you can think of 1000 as the name of a constant object.)
When both lines are compiled together, as in a script, the compiler stores a single constant 1000, so in the second case a and b both become aliases of that one object.
Now in your example:
a = 1000
b = 1000 + a - a
print (a == b)
print (a is b)
When b is assigned, Python doesn't know beforehand (that is, before any calculation has started) what the value of a will be. So 1000 + a - a is evaluated at run time, and the result is stored in a newly created int object, which is a different object from a.
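A sketch of the difference, assuming CPython. The statements are compiled as one unit (as a script would be) so that both 1000 literals can share a single constant, while 1000 + a - a is computed at run time; exec() and compile() are used only so the example behaves the same whether pasted into a REPL or run as a file.

```python
source = """
a = 1000
b = 1000
c = 1000 + a - a
"""
namespace = {}
# Compile the three statements together, the way a module is compiled.
exec(compile(source, "<demo>", "exec"), namespace)

a, b, c = namespace["a"], namespace["b"], namespace["c"]
print(a == b, a is b)  # both literals refer to the one constant in the code object
print(a == c, a is c)  # c is a new int produced by the arithmetic at run time
```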
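It is also worth noting this:
>>> 4-1 is 3
True
In this case, Python doesn't keep 4-1 in the compiled code: the compiler folds it into the constant 3 (constant folding) as a run-time optimisation. (3 is also one of CPython's cached small integers, so this particular comparison would be True either way.)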
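One way to check the constant folding yourself, as a sketch: compile an expression and look at the code object's constants. 4000 - 1 is used instead of 4 - 1 because some recent CPython versions may load small ints without storing them in co_consts.

```python
import dis

code = compile("4000 - 1", "<expr>", "eval")
# If the compiler folded the expression, the result is stored as a constant...
print(3999 in code.co_consts)  # True when folding happened
# ...and the disassembly shows a constant load instead of a subtraction.
dis.dis(code)
```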
You already have a few accurate answers. Here I am giving a "back to basics" answer.
What is ==?
Python == means is the value on the left the same as the value on the right.
sum([5, 7]) == (48 * 3)**0.5
is True. It requires several evaluation steps to make each expression reach the value of 12. Even then, the integer 12 is being compared to the float 12.0, so a final conversion of the integer to a float is necessary.
The key takeaway: each expression is evaluated and the resulting values are compared. If they are equal, then the expression is true.
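The evaluation steps for that example can be spelled out one at a time (a small sketch; left and right are just illustrative names):

```python
left = sum([5, 7])        # int 12
right = (48 * 3) ** 0.5   # float 12.0: a float exponent gives a float result
print(type(left).__name__, type(right).__name__)  # int float
print(left == right)      # True: the values are compared, not the types
print(left is right)      # False: an int object and a float object are never the same object
```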
What is is?
Python is, on the other hand, means is the name on the left pointing to the same object as the name on the right.
a = 3.14159
b = a
a is b
is True. a has been assigned the value 3.14159. But more to the point, there is a block of memory holding an object, which in this case is the float 3.14159. a points to that object / block of memory. b = a makes b point to that same object (b does not point to the name a itself).
You can very easily test this: create two "names" that each point to a separately created number, compare them using is, and they will generally not match:
>>> a = 1239481203948
>>> b = 1239481203948
>>> a is b
False
This is False because we now have two different objects at two different locations in memory, one for each name:
>>> id(a)
140402381635344
>>> id(b)
140402391174416
(On your machine, you will get a different set of ids.)
So, in effect, you have "wasted" space because you have two objects taking up space for the same information.
But wait, there's more
If you play around with this on your own, you will find tons of exceptions to what I wrote, and confuse yourself. Here are just a few:
>>> a = 157
>>> b = 157
>>> a is b
True
What?? Why is this true? To speed Python up, the "most common numbers" have been optimized: CPython keeps designated space in memory for the integers from -5 to 256, and every use of one of those values refers to the same preallocated object.
But there are other optimizations, too:
>>> a = None
>>> b = None
>>> a is b
True
>>> a = True
>>> b = True
>>> a is b
True
These are all still following the same rule as I stated earlier: the reason why is evaluates to True is because a and b are both pointing to the same location in memory / object.
This happens in these odd cases because of optimizations in Python. But generally speaking, the only way to ensure is evaluates to True is to assign a name to an object that already has a name, like when we wrote:
>>> a = 3.14159
>>> b = a
>>> a is b
True
instead of writing
>>> a = 3.14159
>>> b = 3.14159
>>> a is b
False
the difference is in the reference to location.
'==' checks whether the two values are equal, whereas 'is' checks whether both names refer to the same object (in CPython, the same location in memory).
is returns False for these:
id(a) = 3030783801968 <----
id(b) = 3030783802064 <----
is returns True for these:
id(a) = 2806705928816 <----
id(b) = 2806705928816 <----
Python executes a statement by evaluating its expressions to values one by one, then performing some operation on those values.
Source:
https://courses.cs.washington.edu/courses/cse140/13wi/eval_rules.pdf
Basically, b = 1000 + a - a is not evaluated in one go but in several steps, and each step produces a new int object. The final result bound to b is therefore a different object from a, even though it has the same value.
Use == for equality checks.
Use "is" to check whether two names refer to the same object (the same memory location).
I have a float variable which may or may not be a number, and I want to check if that is the case. With x = float('nan'), I observed some behavior that surprised me:
print(x is math.nan)
False
This means that float('nan') and math.nan are different objects, which I didn't expect, but that's okay. However, the result is the same when I check for equality with ==:
print(x == math.nan)
False
I get the correct result for all kinds of not-a-number if I use math.isnan(x). Still, why doesn't float('nan') == math.nan evaluate to True?
"Not a number" is (in some sense) the absence of a value.
Traditionally, and per the IEEE floating-point specification, it does not equal itself.
That's because there is no meaningful value to compare.
In fact, some people use this fact to detect NaN, so you could try x != x as your condition instead (though the linked Q&A arguably has some better suggestions).
The expression math.nan is math.nan is true, though, because is does an object identity comparison rather than a value equivalence/equality comparison.
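A small sketch of the two checks mentioned above, x != x and math.isnan (the helper name is_nan is made up for this example):

```python
import math

def is_nan(x: float) -> bool:
    # Under IEEE 754 a NaN compares unequal to everything, itself included,
    # so x != x is True only for NaN.
    return x != x

for value in (float("nan"), math.nan, 1.0, float("inf")):
    print(value, is_nan(value), math.isnan(value))

print(math.nan is math.nan)  # True: the same object, even though it is not == itself
```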
This is not special behaviour: is returns whether two names actually refer to the same object (essentially, the same place in memory), and == returns whether two objects have the same value.
To see if they refer to the same thing, we can use id().
>>> a = [1,2,3]
>>> b = a
>>> id(a)
140302781856200
>>> id(b)
140302781856200
>>> a == b
True
>>> a is b
True
>>> c = [1,2,3]
>>> id(c)
140302781864904
>>> a == c
True
>>> a is c
False
Here we see that by assigning b = a, they now refer to the same list: hence is and == are True. However when we define c to be a new variable with the same value as a and b, it is ==, but is returns False.
The same is true for NaNs.
That is because NaN is just a float value. Using is doesn't check for whether the variables have the same value, it checks whether they are the same object. If you create two floats with the same value, they are not the same object, they are two objects with the same value. Take this for example:
>>> a = float('nan')
>>> b = float('nan')
>>> a is b
False
So even if you create two NaN values the same way, they are not the same object. This is true even for more trivial floats. Try this:
>>> a = 1.
>>> b = 1.
>>> a is b
False
CPython, the reference implementation, reuses some values, so that every instance of such a value is the same object. Take this for example (note the lack of a decimal point: these are integers, not floats):
>>> a = 1
>>> b = 1
>>> a is b
True
But that is an implementation detail you should never rely on: it can change at any time and can vary between Python implementations. Even so, NaN is not one of the values CPython does this for.
You can check whether two variables refer to the same object manually using the id function, which gives a unique number for each simultaneously existing object (although a number can be reused once its object has been destroyed, which can happen automatically when nothing refers to it any more).
>>> a=1.
>>> b=1.
>>> c=float('nan')
>>> d=float('nan')
>>> e=1
>>> f=1
>>> id(a)
139622774035752
>>> id(b)
139622774035872
>>> id(c)
139622774035824
>>> id(d)
139622774035800
>>> id(e)
139622781650528
>>> id(f)
139622781650528
As for why they aren't equal, that is just part of the definition of NaN as it is used on modern computers. By definition, NaN is never equal to anything, including itself. This is specified by the IEEE 754 floating-point standard, and the behavior is built into modern CPUs.
While they are not the same object (because they are from different modules where they were implemented separately) and they are not equal (by design NaN != NaN), there is the function math.isnan (and numpy.isnan if you want a vectorized version) exactly for this purpose:
import math
import numpy
math.isnan(math.nan)
# True
math.isnan(numpy.nan)
# True
math.isnan(float("nan"))
# True
Although they are all unequal to each other and not identical:
math.nan == numpy.nan or math.nan is numpy.nan
# False
math.nan == float("nan") or math.nan is float("nan")
# False
numpy.nan == float("nan") or numpy.nan is float("nan")
# False
You can use the hex() method that is built into float:
float('nan') == math.nan # FALSE
float('nan').hex() == math.nan.hex() # TRUE
float('nan').hex() == float('nan').hex() # TRUE
float('nan').hex() == numpy.nan.hex() # TRUE
This is very helpful if you are using queries in pandas. I recently was trying to use:
df.eval('A == "NaN"')
which should check whether column A is NaN. But pandas was automatically converting the string "NaN" into a float. Most people would recommend using df['A'].isna(), but in our case we were passing an expression into a method, so it had to handle any expression.
The solution was to do:
df.applymap(lambda x: 'NaN' if x.hex() == float('NaN').hex() else x).eval('A == "NaN"')
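The hex() trick works because float.hex() renders every NaN the same way. A sketch with a hypothetical helper, looks_like_nan, which also guards against the catch in the lambda above: .hex() exists only on floats, so a column holding ints or strings would raise AttributeError.

```python
import math

def looks_like_nan(x):
    """Hypothetical helper: True for any float NaN, False for everything else."""
    # float.hex() returns the string 'nan' for every NaN, whatever produced it.
    # Non-floats (ints, strings) have no .hex() method, so check the type first.
    return isinstance(x, float) and x.hex() == "nan"

print(float("nan").hex())            # nan
print(looks_like_nan(math.nan))      # True
print(looks_like_nan(float("-nan"))) # True
print(looks_like_nan(1.5), looks_like_nan(1), looks_like_nan("NaN"))  # False False False
```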
You can convert the nan value to string for comparing.
Something like this:
x = float("nan")
s_nan = str(x)
if s_nan == "nan":
    # What you need to do...
    print('x is not a number')
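A caveat with the string approach, shown as a sketch (both helper names are hypothetical): str() turns anything into a string, so an actual string "nan" passes the check too. Checking the type first and then calling math.isnan avoids that false positive.

```python
import math

def nan_by_str(x):
    # Any object whose str() is "nan" passes, including the string "nan" itself.
    return str(x) == "nan"

def nan_by_isnan(x):
    # math.isnan raises TypeError for non-numbers, so check the type first.
    return isinstance(x, float) and math.isnan(x)

for value in (float("nan"), 2.5, "nan"):
    print(repr(value), nan_by_str(value), nan_by_isnan(value))
```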
The is operator does not match the values of the variables, but the instances themselves.
What does it really mean?
I declared two variables named x and y assigning the same values in both variables, but it returns false when I use the is operator.
I need a clarification. Here is my code.
x = [1, 2, 3]
y = [1, 2, 3]
print(x is y) # It prints false!
You misunderstood what the is operator tests. It tests whether two variables point to the same object, not whether two variables have the same value.
From the documentation for the is operator:
The operators is and is not test for object identity: x is y is true if and only if x and y are the same object.
Use the == operator instead:
print(x == y)
This prints True. x and y are two separate lists:
x[0] = 4
print(y) # prints [1, 2, 3]
print(x == y) # prints False
If you use the id() function you'll see that x and y have different identifiers:
>>> id(x)
4401064560
>>> id(y)
4401098192
but if you were to assign y to x then both point to the same object:
>>> x = y
>>> id(x)
4401064560
>>> id(y)
4401064560
>>> x is y
True
and is returns True because both names refer to the same object.
Remember that in Python, names are just labels referencing values; you can have multiple names point to the same object. is tells you if two names point to one and the same object. == tells you if two names refer to objects that have the same value.
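A sketch of what "names are labels" means in practice: mutating a list through one name is visible through every name bound to the same object, but not through an equal copy.

```python
x = [1, 2, 3]
y = x          # a second label for the same list object
z = list(x)    # a new list with equal contents

x.append(4)
print(y)                       # [1, 2, 3, 4]: y is the same object as x
print(z)                       # [1, 2, 3]: z is a separate object
print(x is y, x is z, x == z)  # True False False
```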
Another duplicate was asking why two equal strings are generally not identical, which isn't really answered here:
>>> x = 'a'
>>> x += 'bc'
>>> y = 'abc'
>>> x == y
True
>>> x is y
False
So, why aren't they the same string? Especially given this:
>>> z = 'abc'
>>> w = 'abc'
>>> z is w
True
Let's put off the second part for a bit. How could the first one be true?
The interpreter would have to have an "interning table", a table mapping string values to string objects, so every time you try to create a new string with the contents 'abc', you get back the same object. Wikipedia has a more detailed discussion on how interning works.
And Python has a string interning table; you can manually intern strings with the sys.intern method.
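For example, a string built at run time is not automatically the same object as an equal literal, but passing both through sys.intern() returns one shared object. A sketch; the plain is result is CPython behaviour, not a language guarantee:

```python
import sys

x = "a"
x += "bc"      # built at run time, so CPython does not intern it automatically
y = "abc"

print(x == y)                          # True
print(x is y)                          # False in CPython (an implementation detail)
print(sys.intern(x) is sys.intern(y))  # True: intern() returns the canonical object
```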
In fact, Python is allowed to automatically intern any immutable types, but not required to do so. Different implementations will intern different values.
CPython (the implementation you're using if you don't know which implementation you're using) auto-interns small integers and some special singletons like False, but not strings (or large integers, or small tuples, or anything else). You can see this pretty easily:
>>> a = 0
>>> a += 1
>>> b = 1
>>> a is b
True
>>> a = False
>>> a = not a
>>> b = True
>>> a is b
True
>>> a = 1000
>>> a += 1
>>> b = 1001
>>> a is b
False
OK, but why were z and w identical?
That's not the interpreter automatically interning, that's the compiler folding values.
If the same compile-time string appears twice in the same module (what exactly this means is hard to define—it's not the same thing as a string literal, because r'abc', 'abc', and 'a' 'b' 'c' are all different literals but the same string—but easy to understand intuitively), the compiler will only create one instance of the string, with two references.
In fact, the compiler can go even further: 'ab' + 'c' can be converted to 'abc' by the optimizer, in which case it can be folded together with an 'abc' constant in the same module.
Again, this is something Python is allowed but not required to do. But in this case, CPython always folds small strings (and also, e.g., small tuples). (Although the interactive interpreter's statement-by-statement compiler doesn't run the same optimization as the module-at-a-time compiler, so you won't see exactly the same results interactively.)
So, what should you do about this as a programmer?
Well… nothing. You almost never have any reason to care if two immutable values are identical. If you want to know when you can use a is b instead of a == b, you're asking the wrong question. Just always use a == b except in two cases:
For more readable comparisons to the singleton values like x is None.
For mutable values, when you need to know whether mutating x will affect y.
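One concrete reason to prefer x is None, shown as a sketch with a deliberately odd, hypothetical class: == calls the object's __eq__, which can return anything, while is cannot be overridden.

```python
class EqualsEverything:
    """Hypothetical class whose __eq__ claims equality with any object."""

    def __eq__(self, other):
        return True

value = EqualsEverything()
print(value == None)  # True: __eq__ decided the answer
print(value is None)  # False: identity cannot be faked
```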
is only returns true if they're actually the same object. If they were the same, a change to one would also show up in the other. Here's an example of the difference.
>>> x = [1, 2, 3]
>>> y = [1, 2, 3]
>>> print(x is y)
False
>>> z = y
>>> print(y is z)
True
>>> print(x is z)
False
>>> y[0] = 5
>>> print(z)
[5, 2, 3]
Prompted by a duplicate question, this analogy might work:
# - Darling, I want some pudding!
# - There is some in the fridge.
pudding_to_eat = fridge_pudding
pudding_to_eat is fridge_pudding
# => True
# - Honey, what's with all the dirty dishes?
# - I wanted to eat pudding so I made some. Sorry about the mess, Darling.
# - But there was already some in the fridge.
pudding_to_eat = make_pudding(ingredients)
pudding_to_eat is fridge_pudding
# => False
is and is not are the two identity operators in Python. The is operator does not compare the values of the variables; it compares the identities of the objects they refer to. Consider this:
>>> a = [1,2,3]
>>> b = [1,2,3]
>>> hex(id(a))
'0x1079b1440'
>>> hex(id(b))
'0x107960878'
>>> a is b
False
>>> a == b
True
>>>
The above example shows that the identity (in CPython, the memory address) is different for a and b, even though their values are the same. That is why a is b returns False: the identities of the two operands don't match. However, a == b returns True, because == only checks whether both operands have the same value.
Interesting example (for the extra grade):
>>> del a
>>> del b
>>> a = 132
>>> b = 132
>>> hex(id(a))
'0x7faa2b609738'
>>> hex(id(b))
'0x7faa2b609738'
>>> a is b
True
>>> a == b
True
>>>
In the above example, even though a and b are two different variables, a is b returned True. This is because CPython caches small integers (from -5 to 256), so when b was created with the value 132 it got the same preallocated object as a. Immutability is what makes this kind of sharing safe, but it does not guarantee it: a large int or a float created separately will usually be a different object. So in this case, the identities of the variables matched and a is b turned out to be True.
Something similar often happens with short strings, which CPython may intern:
>>> del a
>>> del b
>>> a = "asd"
>>> b = "asd"
>>> hex(id(a))
'0x1079b05a8'
>>> hex(id(b))
'0x1079b05a8'
>>> a is b
True
>>> a == b
True
>>>
Hope that helps.
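x is y is the same as id(x) == id(y) provided both objects are alive at the same time; both compare the identity of objects.
As @tomasz-kurgan pointed out in the comments, is and the id() comparison can give different answers for certain objects, such as bound methods.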
E.g.
>>> class A(object):
...     def foo(self):
...         pass
...
>>> a = A()
>>> a.foo is a.foo
False
>>> id(a.foo) == id(a.foo)
True
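What happens in that example, as a sketch assuming CPython: every attribute access a.foo builds a new bound-method object. In the id() version, the first bound method is discarded before the second one is created, so its memory (and id) can be reused. Holding on to both objects makes the difference visible, while == still reports them as equal:

```python
class A:
    def foo(self):
        pass

a = A()
m1 = a.foo  # each attribute access builds a new bound-method object
m2 = a.foo
print(m1 is m2)          # False: two distinct objects, both alive
print(m1 == m2)          # True: same function bound to the same instance
print(id(m1) == id(m2))  # False while both are kept alive
```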
References:
https://docs.python.org/2/reference/expressions.html#is-not
https://docs.python.org/2/reference/expressions.html#id24
As you can check here, CPython caches small integers. Numbers above 256 are not small ints, so each one may be created as a different object.
It is better to use == instead in this case.
Further information is here: http://docs.python.org/2/c-api/int.html
x points to a list and y points to a different list. Those lists are equal, but the is operator looks at the references themselves, which are not identical.
It compares object identity, that is, whether the variables refer to the same object in memory. It's like the == in Java or C (when comparing pointers).
A simple example with fruits
fruitlist = [" apple ", " banana ", " cherry ", " durian "]
newfruitlist = fruitlist
verynewfruitlist = fruitlist[:]
print(fruitlist is newfruitlist)
print(fruitlist is verynewfruitlist)
print(newfruitlist is verynewfruitlist)
Output:
True
False
False
If you try
fruitlist = [" apple ", " banana ", " cherry ", " durian "]
newfruitlist = fruitlist
verynewfruitlist = fruitlist[:]
print(fruitlist == newfruitlist)
print(fruitlist == verynewfruitlist)
print(newfruitlist == verynewfruitlist)
The output is different:
True
True
True
That's because the == operator compares only the contents of the lists. To compare the identities of two variables, use the is operator.
To print the identification number:
print ( id( variable ) )
The is operator is not just an English version of ==: it compares the identities (IDs) of the objects.
Because the IDs of the two lists are different, the answer is False.
You can try:
a=[1,2,3]
b=a
print(b is a)  # True
because both names refer to the same list, so the IDs are the same.
So I've been trying a few things out in Python and happened to come across this:
>>> a = 10
>>> b = 10
>>> a is b
True
Apparently when creating the variable b, Python notices that there is already another (different) variable with the value 10 and simply creates a reference to it (simply to save memory, maybe?). Since integers are immutable (at least I think they are), it makes some kind of sense.
But then i tried the same thing with a bigger number and got this:
>>> a = 100200103847239642631982367
>>> b = 100200103847239642631982367
>>> a is b
False
Here, for some reason Python seems to create another int object instead of making the variable b a reference to the variable a, which does not make sense to me. Assuming the reason the references created in first example is to save memory, wouldn't it be even more efficient to create a reference in the latter case also since the numbers are way bigger?
CPython caches integers between -5 and 256 (other implementations may differ); when two names point to the same cached integer, they have the same id, and thus point to the same object:
>>> c = 10
>>> d = 10
>>> id(c) == id(d)
True
>>> c is d
True
Once you breach that cache threshold, however, the ids will change:
>>> e = 256
>>> d = 256
>>> id(e) == id(d)
True
>>> d = 257
>>> e = 257
>>> id(d) == id(e)
False
>>> d is e
False
>>> f = -5
>>> g = -5
>>> id(f) == id(g)
True
>>> f = -6
>>> g = -6
>>> id(f) == id(g)
False
You are seeing the same effect.
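Keep in mind that is does not compare values, so do not use is when you really mean "equal to". (The False below is what older interpreters print; newer CPython versions can print True, because the compiler folds 10 * 1000 into the same constant as 10000, and Python 3.8+ also emits a SyntaxWarning for is with a literal. Either way, the result of is here is not something to rely on.)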
>>> 10 * 1000 is 10000
False
>>> 10 * 1000 == 10000
True
My Python IDE PyCharm by default suggests changing the following line of Python:
if variable != 0:
to
if variable is not 0:
Why does it suggest this? Does it matter at all for the execution (i.e. does this behave different for any edge cases)?
It's a bug. You should not test integers by identity. Although it may work ok for small integers, it's just an implementation detail.
If you were checking variable is False, that would be OK. Perhaps the IDE is tripped up by the semantics.
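A sketch of an edge case where the two spellings disagree, assuming CPython: a float zero is equal to the int 0 but is a different object. Variables are used instead of the literal 0 because Python 3.8+ emits a SyntaxWarning for is with a literal.

```python
zero_int = 0
zero_float = 0.0

print(zero_float != zero_int)      # False: the values are equal
print(zero_float is not zero_int)  # True: an int and a float are different objects
```

So if variable can ever hold 0.0 (or another numeric type equal to 0), if variable is not 0: takes the wrong branch.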
The != operator checks for inequality of value. The is operator is used to check for identity. In CPython the int 0 is a cached small integer, so when variable is a plain int the two expressions happen to have the same effect; that is an implementation detail, though, and they differ for other types (0.0 != 0 is False, but 0.0 is not 0 is True). The is not 0 reads more like English, which is probably why the IDE is suggesting it (although I wouldn't accept the recommendation).
I did try some analysis. I dumped the bytecode for both expressions, and the only difference is the comparison itself: one has COMPARE_OP 3 (!=) and the other has COMPARE_OP 9 (is not). I then tried some performance runs and found that != takes negligibly longer.
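The bytecode comparison can be reproduced with the dis module. This sketch compares two variables rather than the literal 0, which avoids the SyntaxWarning Python 3.8+ emits for is with a literal; opcodes is a made-up helper name. Exact opcode names vary by version: Python 3.9+ compiles is not to IS_OP rather than COMPARE_OP.

```python
import dis

def opcodes(expression):
    """Return (opname, argrepr) pairs for the compiled expression."""
    code = compile(expression, "<expr>", "eval")
    return [(ins.opname, ins.argrepr) for ins in dis.get_instructions(code)]

not_equal = opcodes("x != y")
not_same = opcodes("x is not y")
# Print only the positions where the two instruction sequences differ;
# this is expected to be just the comparison instruction.
for left, right in zip(not_equal, not_same):
    if left != right:
        print(left, "vs", right)
```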
is not should be preferred if you are matching objects' identity, not equality.
See these examples:
>>> a=[1,2,3]
>>> b=[1,2,3] #both are equal
>>> if a is not b:
...     print('they are equal but they are not the same object')
...
they are equal but they are not the same object
>>> if a != b:
...     print('hello') #prints nothing because both have same value
...
>>> a=100000
>>> b=100000
>>> a is b
False
>>> if a is not b:
...     print('they are equal but they are not the same object')
...
they are equal but they are not the same object
>>> if a!=b:
...     print('something') #prints nothing as != matches their value not identity
...
But if the values stored in a and b are small integers or short strings, a is not b will not behave this way, because Python does some caching and both names point to the same object.
>>> a=2
>>> b=2
>>> a is b
True
>>> a='wow'
>>> b='wow'
>>> a is b
True
>>> a=9999999
>>> b=9999999
>>> a is b
False
The operator "is not" checks for object identity and the operator != checks for object equality. I don't think you should do this in your case, but maybe your IDE suggests it for the general case?