How does Python establish equality between objects?

How does Python establish equality between objects? - python

I tracked down an error in my program to a line where I was testing for the existence of an object in a list of objects. The line always returned False, which meant that the object wasn't in the list. In fact, it kept happening, even when I did the following:
class myObject(object):
__slots__=('mySlot')
def __init__(self,myArgument):
self.mySlot=myArgument
print(myObject(0)==myObject(0)) # prints False
a=0
print(myObject(a)==myObject(a)) # prints False
a=myObject(a)
print(a==a) # prints True
I've used deepcopy before, but I'm not experienced enough with Python to know when it is and isn't necessary, or mechanically what the difference is. I've also heard of pickling, but never used it. Can someone explain to me what's going on here?
Oh, and another thing. The line
if x in myIterable:
probably tests equality between x and each element in myIterable, right? So if I can change the perceived equality between two objects, I can modify the output of that line? Is there a built-in for that and all of the other inline operators?

It passes the second operand to the __eq__() method of the first.
Incorrect. It passes the first operand to the __contains__() method of the second, or iterates the second performing equality comparisons if no such method exists.
Perhaps you meant to use is, which compares identity instead of equality.

The line myObject(0)==myObject(0) in your code is creating two different instances of a myObject , and since you haven't defined __eq__ they are being compared for identity (i.e. memory location).
x.__eq__(y) <==> x==y and your line about if x in myIterable: using "comparing equal" for the in keyword is correct unless the iterable defines __contains__.
print(myObject(0)==myObject(0)) # prints False
#because id(left_hand_side_myObject) != id(right_hand_side_myObject)
a=0
print(myObject(a)==myObject(a)) # prints False
#again because id(left_hand_side_myObject) != id(right_hand_side_myObject)
a=myObject(a)
print(a==a) # prints True
#because id(a) == id(a)

myObject(a) == myObject(a) returns false because you are creating two separate instances of myObject (with the same attribute a). So the two objects have the same attributes, but they are different instances of your class, so they are not equal.
If you want to check whether an object is in a list, then yeah,
if x in myIterable
would probably be the easiest way to do that.
If you want to check whether an object has the exact same attributes as another object in a list, maybe try something like this:
x = myObject(a)
for y in myIterable:
if x.mySlot == y.mySlot:
print("Exists")
break
Or, you could use __eq__(self,other) in your class definition to set the conditions for eqality.

Related

type(x) is list vs type(x) == list

In Python, suppose one wants to test whether the variable x is a reference to a list object. Is there a difference between if type(x) is list: and if type(x) == list:? This is how I understand it. (Please correct me if I am wrong)
type(x) is list tests whether the expressions type(x) and list evaluate to the same object and type(x) == list tests whether the two objects have equivalent (in some sense) values.
type(x) == list should evaluate to True as long as x is a list. But can type(x) evaluate to a different object from what list refers to?
What exactly does the expression list evaluate to? (I am new to Python, coming from C++, and still can't quite wrap my head around the notion that types are also objects.) Does list point to somewhere in memory? What data live there?

The "one obvious way" to do it, that will preserve the spirit of "duck typing" is isinstance(x, list). Rather, in most cases, one's code won't be specific to a list, but could work with any sequence (or maybe it needs a mutable sequence). So the recomendation is actually:
from collections.abc import MutableSequence
...
if isinstance(x, MutableSequence):
...
Now, going into your specific questions:
What exactly does the expression list evaluate to? Does list point to somewhere in memory? What data live there?
list in Python points to a class. A class that can be inherited, extended, etc...and thanks to a design choice of Python, the syntax for creating an instance of a class is indistinguishable from calling a function.
So, when teaching Python to novices, one could just casually mention that list is a "function" (I prefer not, since it is straightout false - the generic term for both functions and classes in regards to that they can be "called" and will return a result is callable)
Being a class, list does live in a specific place in memory - the "where" does not make any difference when coding in Python - but yes, there is one single place in memory where a class, which in Python is also an object, an instance of type, exists as a data structure with pointers to the various methods that one can use in a Python list.
As for:
type(x) is list tests whether the expressions type(x) and list evaluate to the same object and type(x) == list tests whether the two objects have equivalent (in some sense) values.
That is correct: is is a special operator that unlike others cannot be overriden for any class and checks for object itentity - in the cPython implementation, it checks if both operands are at the same memory address (but keep in mind that that address, though visible through the built-in function id, behaves as if it is opaque from Python code). As for the "sense" in which objects are "equal" in Python: one can always override the behavior of the == operator for a given object, by creating the special named method __eq__ in its class. (The same is true for each other operator - the language data model lists all available "magic" methods).
For lists, the implemented default comparison automatically compares each element recursively (calling the .__eq__ method for each item pair in both lists, if they have the same size to start with, of course)
type(x) == list should evaluate to True as long as x is a list. But can type(x) evaluate to a different object from what list refers to?
Not if "x" is a list proper: type(x) will always evaluate to list. But == would fail if x were an instance of a subclass of list, or another Sequence implementation: that is why it is always better to compare classes using the builtins isinstance and issubclass.

is checks exact identity, and only works if there is exactly one and only one of the list type. Fortunately, that's true for the list type (unless you've done something silly), so it usually works.
== test standard equality, which is equivalent to is for most types including list
Taking those together, there is no effective difference between type(x) is list and type(x) == list, but the former is construction better describes what's happening under the hood.
Consider avoiding use of the type(x) is sometype construction in favor of the isinstance function instead, because it will work for inherited classes too:
x = [1, 2, 3]
isinstance(x, list) # True
class Y(list):
'''A class that inherits from list'''
...
y = Y([1, 2, 3])
isinstance(y, list) # True
type(y) is list # False
Better yet, if you really just want to see if something is list-like, then use isinstance with either typing or collections.abc like so:
import collections.abc
x = [1, 2, 3]
isinstance(x, collections.abc.Iterable) # True
isinstance(x, collections.abc.Sequence) # True
x = set([1, 2, 3])
isinstance(x, collections.abc.Iterable) # True
isinstance(x, collections.abc.Sequence) # False
Note that lists are both Iterable (usable in a for loop) and a Sequence (ordered). A set is Iterable but not a Sequence, because you can use it in a for loop, but it isn't in any particular order. Use Iterable when you want to use in a for loop, or Sequence if it's important that the list be in a certain order.
Final note: The dict type is Mapping, but also counts as Iterable since you can loop over it's keys in a for loop.

Questions about python dictionary equality [duplicate]

Curiously:
>>> a = 123
>>> b = 123
>>> a is b
True
>>> a = 123.
>>> b = 123.
>>> a is b
False
Seems a is b being more or less defined as id(a) == id(b). It is easy to make bugs this way:
basename, ext = os.path.splitext(fname)
if ext is '.mp3':
# do something
else:
# do something else
Some fnames unexpectedly ended up in the else block. The fix is simple, we should use ext == '.mp3' instead, but nonetheless if ext is '.mp3' on the surface seems like a nice pythonic way to write this and it's more readable than the "correct" way.
Since strings are immutable, what are the technical details of why it's wrong? When is an identity check better, and when is an equality check better?

They are fundamentally different.
== compares by calling the __eq__ method
is returns true if and only if the two references are to the same object
So in comparision with say Java:
is is the same as == for objects
== is the same as equals for objects

As far as I can tell, is checks for object identity equivalence. As there's no compulsory "string interning", two strings that just happen to have the same characters in sequence are, typically, not the same string object.
When you extract a substring from a string (or, really, any subsequence from a sequence), you will end up with two different objects, containing the same value(s).
So, use is when and only when you are comparing object identities. Use == when comparing values.

Simple rule for determining if to use is or == in Python
Here is an easy rule (unless you want to go to theory in Python interpreter or building frameworks doing funny things with Python objects):
Use is only for None comparison.
if foo is None
Otherwise use ==.
if x == 3
Then you are on the safe side. The rationale for this is already explained int the above comments. Don't use is if you are not 100% sure why to do it.

It would be also useful to define a class like this to be used as the default value for constants used in your API. In this case, it would be more correct to use is than the == operator.
class Sentinel(object):
"""A constant object that does not change even when copied."""
def __deepcopy__(self, memo):
# Always return the same object because this is essentially a constant.
return self
def __copy__(self):
# called via copy.copy(x)
return self

You should be warned by PyCharm when you use is with a literal with a warning such as SyntaxWarning: "is" with a literal. Did you mean "=="?. So, when comparing with a literal, always use ==. Otherwise, you may prefer using is in order to compare objects through their references.

Behavior of NotImpemented in comparison

I recently found out that python has a special value NotImpemented to be used with respect to binary special methods to indicate that some operation has not been implemented.
The peculiar about this is that when checked in a binary situation it is always equivalent to True.
For example using io.BytesIO (which is a case where __eq__ in not implemented for example) for two objects in comparison will virtually return True. As in this example (encoded_jpg_io1 and encoded_jpg_io2 are objects of the io.BytesIO class):
if encoded_jpg_io1.__ne__(encoded_jpg_io2):
print('Equal')
else:
print('Unequal')
Equal
if encoded_jpg_io1.__eq__(encoded_jpg_io2) == True:
print('Equal')
else:
print('Unequal')
Unequal
Since the second style is a bit too verbose and normally not prefered (even my pyCharm suggests to remove the explicit comparison with True) isn't a bit tricky behavior? I wouldn't have noticed it if I haven't explicitly print the result of the Boolean operation (which is not Boolean in this case at all).
I guess suggesting to be considered False would cause the same problem with __ne__ so we arew back to step one.
So, the only way to check out for these cases is by doing an exact comparison with True or False in the opposite case.
I know that NotImpemented is preferred over NotImplementedError for various reasons so I am not asking for any explanation over why this matter.

Per convention, objects that do not define a __bool__ method are considered truthy. From the docs:
By default, an object is considered true unless its class defines either a __bool__() method that returns False or a __len__() method that returns zero
This means that most classes, functions, and other builtin singletons are considered true, since they don't go out of their way to specify different behavior. (An exception is None, which is one of the few built-in singletons that does specifically signal it should be considered false):
>>> bool(int) # the class, not an integer object
True
>>> bool(min)
True
>>> bool(object())
True
>>> bool(...) # that's the Ellipsis object
True
>>> bool(NotImplemented)
True
There is no real reason for the NotImplemented object to break this convention. The problem with your code isn't that NotImplemented is considered truthy; the real problem is that x.__eq__(y) is not equivalent to x == y.
If you want to compare two objects for equality, doing it with x.__eq__(y) is incorrect. Using x.__eq__(y) == True instead is still incorrect.
The correct solution is to do comparison with the == operator. If, for whatever reason, you can't use the == operator directly, you should use the operator.eq function instead.

In python, is there some kind of mapping to return the "False value" of a type?

I am looking for some kind of a mapping function f() that does something similar to this:
f(str) = ''
f(complex) = 0j
f(list) = []
Meaning that it returns an object of type that evaluates to False when cast to bool.
Does such a function exist?

No, there is no such mapping. Not every type of object has a falsy value, and others have more than one. Since the truth value of a class can be customized with the __bool__ method, a class could theoretically have an infinite number of (different) falsy instances.
That said, most builtin types return their falsy value when their constructor is called without arguments:
>>> str()
''
>>> complex()
0j
>>> list()
[]

Nope, and in general, there may be no such value. The Python data model is pretty loose about how the truth-value of a type may be implemented:
object.__bool__(self)
Called to implement truth value testing and the built-in operation
bool(); should return False or True. When this method is not defined,
__len__() is called, if it is defined, and the object is considered true if its result is nonzero. If a class defines neither __len__()
nor __bool__(), all its instances are considered true.
So consider:
import random
class Wacky:
def __bool__(self):
return bool(random.randint(0,1))
What should f(Wacky) return?

This is actually called an identity element, and in programming is most often seen as part of the definition of a monoid. In python, you can get it for a type using the mzero function in the PyMonad package. Haskell calls it mempty.

Not all types have such a value to begin with. Others may have many such values. The most correct way of doing this would be to create a type-to-value dict, because then you could check if a given type was in the dict at all, and you could chose which value is the correct one if there are multiple options. The drawback is of course that you would have to somehow register every type you were interested in.
Alternatively, you could write a function using some heuristics. If you were very careful about what you passed into the function, it would probably be of some limited use. For example, all the cases you show except complex are containers that generalize with cls().
complex actually works like that too, but I mention it separately because int and float do not. So if your attempt with the empty constructor fails by returning a truthy object or raising a TypeError, you can try cls(0). And so on and so forth...
Update
#juanpa.arrivillaga's answer actually suggests a clever workaround that will work for most classes. You can extend the class and forcibly create an instance that will be falsy but otherwise identical to the original class. You have to do this by extending because dunder methods like __bool__ are only ever looked up on the class, never on an instance. There are also many types where such methods can not be replaced on the instance to begin with. As #Aran-Fey's now-deleted comment points out, you can selectively call object.__new__ or t.__new__, depending on whether you are dealing with a very special case (like int) or not:
def f(t):
class tx(t):
def __bool__(self):
return False
try:
return object.__new__(tx)
except TypeError:
return tx.__new__(tx)
This will only work for 99.9% of classes you ever encounter. It is possible to create a contrived case that raises a TypeError when passed to object.__new__ as int does, and does not allow for a no-arg version of t.__new__, but I doubt you will ever find such a thing in nature. See the gist #Aran-Fey made to demonstrate this.

No such function exists because it's not possible in general. A class may have no falsy value or it may require reversing an arbitrarily complex implementation of __bool__.
What you could do by breaking everything else is to construct a new object of that class and forcibly assign its __bool__ function to one that returns False. Though I suspect that you are looking for an object that would otherwise be a valid member of the class.
In any case, this is a Very Bad Idea in classic style.

why some types of data refer to the same memory location

Asked such a question. Why only the type only str and boolean with the same variables refer to one memory location:
a = 'something'
b = 'something'
if a is b: print('True') # True
but we did not write anywhere a = b. hence the interpreter saw that the strings are equal to each other and made a reference to one memory cell.
Of course, if we assign a new value to either of these two variables, there will be no conflict, so now the variable will refer to another memory location
b = 'something more'
if a is b: print('True') # False
with type boolean going on all the same
a = True
b = True
if a is b: print('True') # True
I first thought that this happens with all mutable types. But no. There remained one unchangeable type - tuple. But it has a different behavior, that is, when we assign the same values to variables, we already refer to different memory cells. Why does this happen only with tuple of immutable types
a = (1,9,8)
b = (1,9,8)
if a is b: print('True') # False

In Python, == checks for value equality, while is checks if basically its the same object like so: id(object) == id(object)
Python has some builtin singletons which it starts off with (I'm guessing lower integers and some commonly used strings)
So, if you dig deeper into your statement
a = 'something'
b = 'something'
id(a)
# 139702804094704
id(b)
# 139702804094704
a is b
# True
But if you change it a bit:
a = 'something else'
b = 'something else'
id(a)
# 139702804150640
id(b)
# 139702804159152
a is b
# False
We're getting False because Python uses different memory location for a and b this time, unlike before.
My guess is with tuples (and someone correct me if I'm mistaken) Python allocates different memory every time you create one.

Why do some types cache values? Because you shouldn't be able to notice the difference!
is is a very specialized operator. Nearly always you should use == instead, which will do exactly what you want.
The cases where you want to use is instead of == basically are when you're dealing with objects that have overloaded the behavior of == to not mean what you want it to mean, or where you're worried that you might be dealing with such objects.
If you're not sure whether you're dealing with such objects or not, you're probably not, which means that == is always right and you don't have to ever use is.
It can be a matter of "style points" to use is with known singleton objects, like None, but there's nothing wrong with using == there (again, in the absence of a pathological implementation of ==).
If you're dealing with potentially untrustworthy objects, then you should never do anything that may invoke a method that they control.... and that's a good place to use is. But almost nobody is doing that, and those who do should be aware of the zillion other ways a malicious object could cause problems.
If an object implements == incorrectly then you can get all kinds of weird problems. In the course of debugging those problems, of course you can and should use is! But that shouldn't be your normal way of comparing objects in code you write.
The one other case where you might want to use is rather than == is as a performance optimization, if the object you're dealing with implements == in a particularly expensive way. This is not going to be the case very often at all, and most of the time there are better ways to reduce the number of times you have to compare two objects (e.g. by comparing hash codes instead) which will ultimately have a much better effect on performance without bringing correctness into question.
If you use == wherever you semantically want an equality comparison, then you will never even notice when some types sneakily reuse instances on you.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.