When implementing a custom equality function for a class, does it make sense to check for identity first? An example:
def __eq__(self, other):
    return (self is other) or (other criteria)
This is interesting for cases where the other criteria may be more expensive (e.g. comparing some long strings).
It may be a perfectly reasonable shortcut to check for identity first, and in equality methods good shortcuts (for both equality and non equality) are what you should be looking for so that you can return as soon as possible.
But, on the other hand, it could also be a completely superfluous check if your test for equality is otherwise cheap and you are unlikely in practice to be comparing an object with itself.
For example, if equality between objects can be gauged by comparing one or two integers, the full check is already about as cheap as the identity test, so the shortcut saves next to nothing. And remember that if you check the identities and the objects don't have the same id (which is likely in most scenarios), then you've gained nothing, as you've still got to do the full check.
So if full equality checking is not cheap and it's possible that an object could be compared against itself, then checking identity first can be a good idea.
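A minimal sketch of the shortcut, assuming a hypothetical Document class whose equality depends on one long string (the class and attribute names are illustrative and not taken from the question):

```python
class Document:
    """Hypothetical class whose full equality check can be costly."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        # Cheap shortcut: in this class an object always equals itself.
        if self is other:
            return True
        # Decline for unrelated types so Python can try the other operand.
        if not isinstance(other, Document):
            return NotImplemented
        # Potentially expensive comparison of long strings.
        return self.text == other.text

    def __hash__(self):
        # Keep hashing consistent with __eq__.
        return hash(self.text)


a = Document("x" * 1_000_000)
b = Document("x" * 1_000_000)

same_object = a == a      # answered by the identity shortcut
equal_content = a == b    # falls through to the full comparison
other_type = a == 42      # both sides decline, so the result is False
print(same_object, equal_content, other_type)
```

Whether the first branch pays off depends on how often an object really is compared with itself; for every other comparison it is one extra pointer test.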
Note that another reason the check isn't done by default is that it is quite reasonable (though rare) for objects with equal identities to compare as non equal, for example:
>>> s = float('nan')
>>> s == s
False
necessary: no
does it make sense: sure, why not?
No such check is done by default, as you can see here:
class bad(object):
    def __eq__(self, other):
        return False
x = bad()
print(x is x, x == x)  # True False
When you implement custom equality in a class, you can decide for yourself whether to check for identity first. It's entirely up to you. Note that in Python, it's also perfectly valid to decide that __eq__ and __ne__ will return the same value for a given argument; so it's possible to define equality such that identity isn't a shortcut.
It's certainly a speed improvement, although how much of one depends on the complexity of the method. I generally don't bother in my custom classes, but I don't have a lot of speed-critical code (and where I do, object comparisons aren't the hotspot).
For most of my objects, the equality method looks like:
def __eq__(self, o):
    try:
        return self.x == o.x and self.y == o.y
    except AttributeError:
        return False
I could easily add an if self is o: return True check at the beginning of the method.
Also remember to override __hash__ if you override __eq__, or you'll get odd behaviors in sets and dicts.
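As a sketch of keeping __hash__ consistent with __eq__, assume a hypothetical Point class carrying the x and y attributes used in the method above. In Python 3, a class that defines __eq__ without also defining __hash__ becomes unhashable, so the override is needed before instances can go into a set or serve as dict keys.

```python
class Point:
    """Hypothetical class with the two attributes compared above."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, o):
        try:
            return self.x == o.x and self.y == o.y
        except AttributeError:
            return False

    def __hash__(self):
        # Hash exactly the fields that __eq__ compares, so that
        # equal objects always produce equal hashes.
        return hash((self.x, self.y))


points = {Point(1, 2), Point(1, 2), Point(3, 4)}
print(len(points))  # the two equal points collapse into one entry
```

Hashing on mutable attributes is only safe if they are not changed while the object sits in a set or dict.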
I asked a similar question on comp.lang.python a few years ago - here is the thread. The conclusions at that time were that the up-front identity test was worth it if you did many tests for equality of objects with themselves, or if your other equality testing logic was slow.
This is only done for performance reasons.
At one programming job I worked on, in Java, this was always done, although it does not change any functionality.
PEP 8 Programming Recommendations says:
Comparisons to singletons like None should always be done with is or is not, never the equality operators.
According to the docs, enum members are singletons. Does that mean they should also be compared by identity?
from enum import Enum

class Color(Enum):
    RED = 1
    GREEN = 2
    BLUE = 3

# like this?
if color is Color.RED:
    ...
# or like this
if color == Color.RED:
    ...
When using equality operators, I haven't noticed any issues with this to warrant such strong wording as PEP 8. What's the drawback of using equality, if any? Doesn't it just fall back to an identity-based comparison anyway? Is this just a micro-optimisation?
From https://docs.python.org/3/library/enum.html#module-enum:
Within an enumeration, the members can be compared by identity
In particular, from https://docs.python.org/3/howto/enum.html#comparisons :
Enumeration members are compared by identity
First, we can definitely rule out x.value is y.value, because those aren't singletons, those are perfectly ordinary values that you've stored in attributes.
But what about x is y?
First, I believe that, by "singletons like None", PEP 8 is specifically referring to the small, fixed set of built-in singletons that are like None in some important way. What important way? Why do you want to compare None with is?
Readability: if foo is None: reads like what it means. On the rare occasions when you want to distinguish True from other truthy values, if spam is True: reads better than if spam == True:, as well as making it more obvious that this isn't just a frivolous == True used by someone improperly following a C++ coding standard in Python. That might apply in foo is Potato.spud, but not so much in x is y.
Use as a sentinel: None is used to mean "value missing" or "search failed" or similar cases. It shouldn't be used in cases where None itself can be a value, of course. And if someone creates a class whose instances compare equal to None, it's possible to run into that problem without realizing it. is None protects against that. This is even more of a problem with True and False (again, on those rare occasions when you want to distinguish them), since 1 == True and 0 == False. This reason doesn't seem to apply here—if 1 == Potato.spud, that's only because you intentionally chose to use an IntEnum instead of an Enum, in which case that's exactly what you want…
(Quasi-)keyword status: None and friends have gradually migrated from perfectly normal builtin to keyword over the years. Not only is the default value of the symbol None always going to be the singleton, the only possible value is that singleton. This means that an optimizer, static linter, etc. can make an assumption about what None means in your code, in a way that it can't for anything defined at runtime. Again, this reason doesn't seem to apply.
Performance: This really is not a consideration at all. It might be faster, on some implementations, to compare with is than with ==, but this is incredibly unlikely to ever make a difference in real code (or, if it does, that real code probably needs a higher-level optimization, like turning a list into a set…).
So, what's the conclusion?
Well, it's hard to get away from an opinion here, but I think it's reasonable to say that:
if devo is Potato.spud: is reasonable if it makes things more readable, but as long as you're consistent within a code base, I don't think anyone will complain either way.
if x is y:, even when they're both known to be Potato objects, is not reasonable.
if x.value is Potato.spud.value is not reasonable.
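To make the distinction concrete, here is a small sketch using the standard enum module. Potato and IntPotato are hypothetical enums named after the examples in this answer; the member values are arbitrary.

```python
from enum import Enum, IntEnum


class Potato(Enum):
    spud = 1
    chip = 2


class IntPotato(IntEnum):
    spud = 1


devo = Potato.spud

by_identity = devo is Potato.spud         # True: members are singletons
by_equality = devo == Potato.spud         # True: same answer for plain Enum
plain_vs_int = Potato.spud == 1           # False: Enum members do not equal their values
intenum_vs_int = IntPotato.spud == 1      # True: IntEnum members are ints
lookup_is_same = Potato(1) is Potato.spud # True: lookup by value returns the existing member
print(by_identity, by_equality, plain_vs_int, intenum_vs_int, lookup_is_same)
```

For a plain Enum the two operators agree; they only diverge once the members are designed to compare equal to other values, as with IntEnum.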
PEP 8 says:
Comparisons to singletons like None should always be done with is or is not, never the equality operators.
I disagree with abarnert: the reason for this isn't because these are built-in or special in any way. It's because in these cases you care about having the object, not something that looks like it.
When using is None, for example, you care about whether it's the None that you put there, not a different None that had been passed in. This might be difficult in practice (there is only one None, after all) but it does sometimes matter.
Take, for example:
no_argument = object()
def foo(x=no_argument):
    if x OP no_argument:
        ...
    ...
If OP is is, this is perfectly idiomatic code. If it's ==, it's not.
For the same reason, you should make the decision like so:
If you want equality to duck type, such as with IntEnum or an Enum that you might want to subclass and override (such as when you have complex enum types with methods and other extras), it makes sense to use ==.
When you are using enums as dumb sentinels, use is.
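The difference between the two operators for sentinels can be sketched as follows. AlwaysEqual is a hypothetical class invented for the illustration; it stands in for any object whose __eq__ is more permissive than you expect.

```python
class AlwaysEqual:
    """Hypothetical object that claims to be equal to everything."""

    def __eq__(self, other):
        return True


no_argument = object()


def foo(x=no_argument):
    # Identity check: only the real sentinel means "no argument given".
    return "default" if x is no_argument else "explicit"


def foo_with_eq(x=no_argument):
    # Equality check: an object with a permissive __eq__ can impersonate
    # the sentinel, so an explicit argument is mistaken for a missing one.
    return "default" if x == no_argument else "explicit"


print(foo())                       # default
print(foo(AlwaysEqual()))          # explicit
print(foo_with_eq(AlwaysEqual()))  # default, which is the wrong answer
```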
According to the object.__eq__() documentation, the default (that is, in the object class) implementation for == is as follows:
True if x is y else NotImplemented
Still following the documentation for NotImplemented, I inferred that NotImplemented implies that the Python runtime will try the comparison the other way around. That is try y.__eq__(x) if x.__eq__(y) returns NotImplemented (in the case of the == operator).
Now, the following code prints False and True in python 3.9:
class A:
    pass
print(A() == A())
print(bool(NotImplemented))
So my question is the following: where does the documentation mention the special behavior of NotImplemented in the context of __eq__ ?
PS : I found an answer in CPython source code but I guess that this must/should be somewhere in the documentation.
According to the object.__eq__() documentation, the default (that is, in the object class) implementation for == is as follows
No; that is the default implementation of __eq__. ==, being an operator, cannot be implemented in classes.
Python's implementation of operators is cooperative. There is hard-coded logic that uses the dunder methods to figure out what should happen, and possibly falls back on a default. This logic is outside of any class.
You can see another example with the built-in len: a class can return whatever it likes from its __len__ method, and you can in principle call it directly and get a value of any type. However, this does not properly implement the protocol, and len will complain when it doesn't get a non-negative integer back. There is not any class which contains that type-checking and value-checking logic. It is external.
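The len case is easy to try out. In this sketch NegativeLen and TextLen are hypothetical classes, and try_len is a small helper added only to capture which exception the built-in raises.

```python
class NegativeLen:
    def __len__(self):
        return -1


class TextLen:
    def __len__(self):
        return "three"


def try_len(obj):
    """Return len(obj), or the name of the exception that len() raises."""
    try:
        return len(obj)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__


# Calling the method directly returns whatever the class chose.
print(NegativeLen().__len__())  # -1
print(TextLen().__len__())      # three

# The built-in applies the protocol checks that live outside any class.
print(try_len(NegativeLen()))   # ValueError
print(try_len(TextLen()))       # TypeError
```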
Still following the documentation for NotImplemented, I inferred that NotImplemented implies that the Python runtime will try the comparison the other way around. That is try y.__eq__(x) if x.__eq__(y) returns NotImplemented (in the case of the == operator).
NotImplemented is just an object. It is not syntax. It does not have any special behavior, and in Python, simply returning a value does not trigger special behavior besides that the value is returned.
The external code for binary operators will try to look for the matching __op__, and try to look for the matching __rop__ if __op__ didn't work. At this point, NotImplemented is not an acceptable answer (it is a sentinel that exists specifically for this purpose, because None is an acceptable answer). In general, if the answer so far is still NotImplemented, then the external code will raise TypeError.
As a special case, objects that don't provide their own comparison (i.e., the default from object is used for __eq__ or __ne__) will compare as "not equal" unless they are identical. The C implementation repeats the identity check (in case a class explicitly defines __eq__ or __ne__ to return NotImplemented directly, I guess). This is because it is considered sensible to give this result, and obnoxious to make == fail all the time when there is a sensible default.
However, the two objects are still not orderable without explicit logic, since there isn't a reasonable default. (You could compare the pointer values, but they're arbitrary and don't have anything to do with the Python logic that got you to that point; so ordering things that way isn't realistically useful for writing Python code.) So, for example, x < y will raise a TypeError if the comparison logic isn't provided. (It does this even if x is y; you could reasonably say that <= and >= should be true in this case, and < and > should be false, but it makes things too complicated and is not very useful.)
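The behavior described above can be observed directly. This is a sketch with two hypothetical classes: Plain relies entirely on the defaults from object, and Refuses has an __eq__ that always declines.

```python
class Plain:
    """Defines no comparison methods, so the defaults from object apply."""


class Refuses:
    """Hypothetical class whose __eq__ always declines to answer."""

    def __eq__(self, other):
        return NotImplemented


a, b = Refuses(), Refuses()

raw_result = a.__eq__(b)   # the method itself just returns the sentinel
different = a == b         # operator falls back to identity: False
same = a == a              # operator falls back to identity: True
print(raw_result, different, same)

# Ordering has no such fallback, so the operator raises TypeError.
p, q = Plain(), Plain()
ordering_error = None
try:
    p < q
except TypeError as exc:
    ordering_error = exc
    print("TypeError:", exc)
```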
[Observation: print(bool(NotImplemented)) prints True]
Well, yes; NotImplemented is an object, so it's truthy by default; and it doesn't represent a numeric value, and isn't a container, so there's no reason for it to be falsy.
However, that also doesn't tell us anything useful. We don't care about the truthiness of NotImplemented here, and it isn't used that way in the Python implementation. It is just a sentinel value.
where does the documentation mention the special behavior of NotImplemented in the context of __eq__ ?
Nowhere, because it isn't a behavior of NotImplemented, as explained above.
Okay, but that leaves the underlying question: where does the documentation explain what the == operator does by default?
Answer: because we are talking about an operator, and not about a method, it's not in the section about dunder methods. It's in section 6, which talks about expressions. Specifically, 6.10.1. Value comparisons:
The default behavior for equality comparison (== and !=) is based on the identity of the objects. Hence, equality comparison of instances with the same identity results in equality, and equality comparison of instances with different identities results in inequality. A motivation for this default behavior is the desire that all objects should be reflexive (i.e. x is y implies x == y).
I recently found out that Python has a special value NotImplemented to be used with respect to binary special methods to indicate that some operation has not been implemented.
The peculiar thing about this is that when checked in a Boolean context it is always equivalent to True.
For example, using io.BytesIO (which is a case where __eq__ is not implemented) for two objects in a comparison will effectively return True. As in this example (encoded_jpg_io1 and encoded_jpg_io2 are objects of the io.BytesIO class):
if encoded_jpg_io1.__ne__(encoded_jpg_io2):
    print('Equal')
else:
    print('Unequal')
Equal
if encoded_jpg_io1.__eq__(encoded_jpg_io2) == True:
    print('Equal')
else:
    print('Unequal')
Unequal
Since the second style is a bit too verbose and normally not preferred (even my PyCharm suggests removing the explicit comparison with True), isn't this a bit of a tricky behavior? I wouldn't have noticed it if I hadn't explicitly printed the result of the Boolean operation (which is not Boolean in this case at all).
I guess having it considered False would cause the same problem with __ne__, so we are back to step one.
So, the only way to check for these cases is by doing an exact comparison with True, or with False in the opposite case.
I know that NotImplemented is preferred over NotImplementedError for various reasons, so I am not asking for any explanation of why that is.
Per convention, objects that do not define a __bool__ method are considered truthy. From the docs:
By default, an object is considered true unless its class defines either a __bool__() method that returns False or a __len__() method that returns zero
This means that most classes, functions, and other builtin singletons are considered true, since they don't go out of their way to specify different behavior. (An exception is None, which is one of the few built-in singletons that does specifically signal it should be considered false):
>>> bool(int) # the class, not an integer object
True
>>> bool(min)
True
>>> bool(object())
True
>>> bool(...) # that's the Ellipsis object
True
>>> bool(NotImplemented)
True
There is no real reason for the NotImplemented object to break this convention. The problem with your code isn't that NotImplemented is considered truthy; the real problem is that x.__eq__(y) is not equivalent to x == y.
If you want to compare two objects for equality, doing it with x.__eq__(y) is incorrect. Using x.__eq__(y) == True instead is still incorrect.
The correct solution is to do comparison with the == operator. If, for whatever reason, you can't use the == operator directly, you should use the operator.eq function instead.
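A short sketch of the difference, assuming (as the question states) that io.BytesIO does not define its own __eq__ and therefore inherits the default from object. The buffer contents are made up for the example.

```python
import io
import operator

buf1 = io.BytesIO(b"data")
buf2 = io.BytesIO(b"data")

# Calling the dunder method directly exposes the sentinel, which is truthy.
direct = buf1.__eq__(buf2)
print(direct is NotImplemented)

# The operator and operator.eq run the full comparison protocol and
# end with the identity-based default, so the answer is a real bool.
via_operator = buf1 == buf2
via_function = operator.eq(buf1, buf2)
print(via_operator, via_function)

# To compare the buffers by content, compare the bytes they hold.
same_content = buf1.getvalue() == buf2.getvalue()
print(same_content)
```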
I am looking into ways to use a quasi Singleton Pattern in python. Quick problem description:
I have objects that describe subsets of a certain group. For simplicity, assume integer numbers like in set([1,2,3]). In my case, comparison is difficult and, where it is possible at all, expensive, so I assume that if I have
complex_set1 = ...
complex_set2 = ...
these are different. Also, all sets are immutable (like frozenset). In addition, there exist the full and empty sets for convenience
full_set = FullSet()
empty_set = EmptySet()
These seem to make sense to be singletons. One way would be to create one instance and just add it to the package on import. So that there exists only one and you cannot create another.
Now my idea:
Since I do not care if I have multiple objects as long as they are considered the same in any case (besides the fact that a is b is obviously false), I just make them look equal (like a singleton would). An example would be
>>> len(set([FullSet(), FullSet()]))
1
So, I experimented with
def __hash__(self):
    return 0 # make sure all have the same hash
def __eq__(self, other):
    if isinstance(other, FullSet):
        return True
    return NotImplemented
Does this have a name? Is it considered a singleton pattern or something else?
Should I use this or are there caveats to be aware of?
Any comments on the hash value which is also used for other purposes than comparison? Does it make more sense to use e.g. return hash(FullSet)
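For reference, a runnable sketch of the idea as described in the question, using the hash(FullSet) variant. Nothing here is a recommendation; it only shows that the approach behaves as intended in a set.

```python
class FullSet:
    """Every instance compares equal to every other FullSet instance."""

    def __hash__(self):
        # All instances share one hash, derived from the class object,
        # so they land in the same bucket without hard-coding 0.
        return hash(FullSet)

    def __eq__(self, other):
        if isinstance(other, FullSet):
            return True
        # Decline for other types so Python can try the other operand.
        return NotImplemented


a, b = FullSet(), FullSet()
print(a is b)            # False: still two distinct objects
print(a == b)            # True
print(len({a, b}))       # 1: the set treats them as the same element
print(a == frozenset())  # False: unrelated type falls back to identity
```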
What's the suggested Python semantics for ordering of objects of distinct types? In other words, what behavior should one implement when a custom-written comparison method (using rich comparisons like e.g. __lt__, but perhaps also when using the ‘poor’ comparison __cmp__ in Python 2) encounters an object of a different type than self?
Should one invent an order, e.g. “all unexpected objects compare as less than my own type”?
Should one throw a TypeError?
Is there some easy way to let the other object have a try, i.e. if one does foo < bar and foo.__lt__ doesn't know about bar's type, can it fall back to bar.__gt__?
Are there any guidelines at all about how to achieve sane ordering of objects of distinct types, preferably a total order but perhaps less?
Is there any part in the documentation which explains why 3 < "3"?
PEP 207 apparently leaves a lot of freedom of how things can be implemented, but nevertheless I expect there might be some guidelines how things should be implemented to help interoperability.
While writing the question, the “similar questions” list made me aware of this post. It mentions NotImplemented as a return value (not exception) of rich comparisons. Searching for this keyword in the docs turned up relevant parts as well:
Python 2 data model:
NotImplemented – This type has a single value. There is a single object with this value. This object is accessed through the built-in name NotImplemented. Numeric methods and rich comparison methods may return this value if they do not implement the operation for the operands provided. (The interpreter will then try the reflected operation, or some other fallback, depending on the operator.) Its truth value is true.
And later on:
A rich comparison method may return the singleton NotImplemented if it does not implement the operation for a given pair of arguments. By convention, False and True are returned for a successful comparison. However, these methods can return any value, so if the comparison operator is used in a Boolean context (e.g., in the condition of an if statement), Python will call bool() on the value to determine if the result is true or false.
PEP 207:
If the function cannot compare the particular combination of objects, it should return a new reference to Py_NotImplemented.
This answers points 2. and 3. of my original question, at least for the rich comparison scenario. The docs for __cmp__ don't mention NotImplemented at all, which might be the reason why I missed that at first. So this is one more reason to switch to rich comparisons.
I'm still not sure whether returning that value is to be preferred to inventing an order. I guess a lot depends on what ideas the other object might have. And I fear that for a mix of types to achieve any kind of sane orderings, a lot of cooperation might be needed. But perhaps someone else can shed more light on the other parts of my question.
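The fallback to the reflected method can be sketched like this in Python 3. Version and Wrapper are hypothetical classes; only Wrapper knows how to compare itself against Version, and compare_lt is a helper added to capture the TypeError.

```python
class Version:
    def __init__(self, number):
        self.number = number

    def __lt__(self, other):
        if not isinstance(other, Version):
            # Unknown type: decline, so Python tries other.__gt__(self).
            return NotImplemented
        return self.number < other.number


class Wrapper:
    """Hypothetical type that knows how to order itself against Version."""

    def __init__(self, number):
        self.number = number

    def __gt__(self, other):
        if isinstance(other, Version):
            return self.number > other.number
        return NotImplemented


def compare_lt(left, right):
    """Return left < right, or 'TypeError' if neither side can answer."""
    try:
        return left < right
    except TypeError:
        return "TypeError"


print(compare_lt(Version(1), Version(2)))  # handled by Version.__lt__
print(compare_lt(Version(1), Wrapper(2)))  # handled by Wrapper.__gt__
print(compare_lt(Version(1), "3"))         # nobody can answer
```

In Python 3 the last case raises TypeError instead of falling back to an arbitrary order, which is the behavior Python 2 had for comparisons such as 3 < "3".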