Comparison of objects of distinct types

Comparison of objects of distinct types - python

What's the suggested Python semantics for ordering of objects of distinct types? In other words, what behavior should one implement when a custom-written comparison method (using rich comparisons like e.g. __lt__, but perhaps also when using the ‘poor’ comparison __cmp__ in Python 2) encounters an object of a different type than self?
Should one invent an order, e.g. “all unexpected objects compare as less than my own type”?
Should one throw a TypeError?
Is there some easy way to let the other object have a try, i.e. if one does foo < bar and foo.__lt__ doesn't know about bar's type, can it fall back to bar.__gt__?
Are there any guidelines at all about how to achieve sane ordering of objects of distinct types, preferrably a total order but perhaps less?
Is there any part in the documentation which explains why 3 < "3"?
PEP 207 apparently leaves a lot of freedom of how things can be implemented, but nevertheless I expect there might be some guidelines how things should be implemented to help interoperability.

While writing the question, the “similar questions” list made me aware of this post. It mentions NotImplemented as a return value (not exception) of rich comparisons. Searching for this keyword in the docs turned up relevant parts as well:
Python 2 data model:
NotImplemented – This type has a single value. There is a single object with this value. This object is accessed through the built-in name NotImplemented. Numeric methods and rich comparison methods may return this value if they do not implement the operation for the operands provided. (The interpreter will then try the reflected operation, or some other fallback, depending on the operator.) Its truth value is true.
And later on:
A rich comparison method may return the singleton NotImplemented if it does not implement the operation for a given pair of arguments. By convention, False and True are returned for a successful comparison. However, these methods can return any value, so if the comparison operator is used in a Boolean context (e.g., in the condition of an if statement), Python will call bool() on the value to determine if the result is true or false.
PEP 207:
If the function cannot compare the particular combination of objects, it should return a new reference to Py_NotImplemented.
This answers points 2. and 3. of my original question, at least for the rich comparison scenario. The docs for __cmp__ don't mention NotImplemented at all, which might be the reason why I missed that at first. So this is one more reason to switch to rich comparisons.
I'm still not sure whether returning that value is to be preferred to inventing an order. I guess a lot depends on what ideas the other object might have. And I fear that for a mix of types to achieve any kind of sane orderings, a lot of cooperation might be needed. But perhaps someone else can shed more light on the other parts of my question.

Related

Is this the proper way to test if an enum is a specific value? [duplicate]

PEP 8 Programming Recommendations says:
Comparisons to singletons like None should always be done with is or is not, never the equality operators.
According to the docs, enum members are singletons. Does that mean they should also be compared by identity?
class Color(Enum):
RED = 1
GREEN = 2
BLUE = 3
# like this?
if color is Color.RED:
...
# or like this
if color == Color.RED:
...
When using equality operators, I haven't noticed any issues with this to warrant such strong wording as PEP 8. What's the drawback of using equality, if any? Doesn't it just fall back to an identity-based comparison anyway? Is this just a micro-optimisation?

From https://docs.python.org/3/library/enum.html#module-enum:
Within an enumeration, the members can be compared by identity
In particular, from https://docs.python.org/3/howto/enum.html#comparisons :
Enumeration members are compared by identity

First, we can definitely rule out x.value is y.value, because those aren't singletons, those are perfectly ordinary values that you've stored in attributes.
But what about x is y?
First, I believe that, by "singletons like None", PEP 8 is specifically referring to the small, fixed set of built-in singletons that are like None in some important way. What important way? Why do you want to compare None with is?
Readability: if foo is None: reads like what it means. On the rare occasions when you want to distinguish True from other truthy values, if spam is True: reads better than if spam == True:, as well as making it more obvious that this isn't just a frivolous == True used by someone improperly following a C++ coding standard in Python. That might apply in foo is Potato.spud, but not so much in x is y.
Use as a sentinel: None is used to mean "value missing" or "search failed" or similar cases. It shouldn't be used in cases where None itself can be a value, of course. And if someone creates a class whose instances compare equal to None, it's possible to run into that problem without realizing it. is None protects against that. This is even more of a problem with True and False (again, on those rare occasions when you want to distinguish them), since 1 == True and 0 == False. This reason doesn't seem to apply here—if 1 == Potato.spud, that's only because you intentionally chose to use an IntEnum instead of an Enum, in which case that's exactly what you want…
(Quasi-)keyword status: None and friends have gradually migrated from perfectly normal builtin to keyword over the years. Not only is the default value of the symbol None always going to be the singleton, the only possible value is that singleton. This means that an optimizer, static linter, etc. can make an assumption about what None means in your code, in a way that it can't for anything defined at runtime. Again, this reason doesn't seem to apply.
Performance: This really is not a consideration at all. It might be faster, on some implementations, to compare with is than with ==, but this is incredibly unlikely to ever make a difference in real code (or, if it does, that real code probably needs a higher-level optimization, like turning a list into a set…).
So, what's the conclusion?
Well, it's hard to get away from an opinion here, but I think it's reasonable to say that:
if devo is Potato.spud: is reasonable if it makes things more readable, but as long as you're consistent within a code base, I don't think anyone will complain either way.
if x is y:, even when they're both known to be Potato objects, is not reasonable.
if x.value is Potato.spud.value is not reasonable.

PEP 8 says:
Comparisons to singletons like None should always be done with is or is not, never the equality operators.
I disagree with abarnert: the reason for this isn't because these are built-in or special in any way. It's because in these cases you care about having the object, not something that looks like it.
When using is None, for example, you care about whether it's the None that you put there, not a different None that had been passed in. This might be difficult in practice (there is only one None, after all) but it does sometimes matter.
Take, for example:
no_argument = object()
def foo(x=no_argument):
if x OP no_argument:
...
...
If OP is is, this is perfectly idiomatic code. If it's ==, it's not.
For the same reason, you should make the decision as so:
If you want equality to duck type, such as with IntEnum or an Enum that you might want to subclass and overwrite (such as when you have complex enum types with methods and other extras) it makes sense to use ==.
When you are using enums as dumb sentinels, use is.

Where is the default behavior for object equality (`==`) defined?

According to the object.__eq__() documentation, the default (that is, in the object class) implementation for == is as follows:
True if x is y else NotImplemented
Still following the documentation for NotImplemented, I inferred that NotImplemented implies that the Python runtime will try the comparison the other way around. That is try y.__eq__(x) if x.__eq__(y) returns NotImplemented (in the case of the == operator).
Now, the following code prints False and True in python 3.9:
class A:
pass
print(A() == A())
print(bool(NotImplemented))
So my question is the following: where does the documentation mention the special behavior of NotImplemented in the context of __eq__ ?
PS : I found an answer in CPython source code but I guess that this must/should be somewhere in the documentation.

According to the object.__eq__() documentation, the default (that is, in the object class) implementation for == is as follows
No; that is the default implementation of __eq__. ==, being an operator, cannot be implemented in classes.
Python's implementation of operators is cooperative. There is hard-coded logic that uses the dunder methods to figure out what should happen, and possibly falls back on a default. This logic is outside of any class.
You can see another example with the built-in len: a class can return whatever it likes from its __len__ method, and you can in principle call it directly and get a value of any type. However, this does not properly implement the protocol, and len will complain when it doesn't get a positive integer back. There is not any class which contains that type-checking and value-checking logic. It is external.
Still following the documentation for NotImplemented, I inferred that NotImplemented implies that the Python runtime will try the comparison the other way around. That is try y.__eq__(x) if x.__eq__(y) returns NotImplemented (in the case of the == operator).
NotImplemented is just an object. It is not syntax. It does not have any special behavior, and in Python, simply returning a value does not trigger special behavior besides that the value is returned.
The external code for binary operators will try to look for the matching __op__, and try to look for the matching __rop__ if __op__ didn't work. At this point, NotImplemented is not an acceptable answer (it is a sentinel that exists specifically for this purpose, because None is an acceptable answer). In general, if the answer so far is still NotImplemented, then the external code will raise NotImplementedError.
As a special case, objects that don't provide their own comparison (i.e., the default from object is used for __eq__ or __ne__) will compare as "not equal" unless they are identical. The C implementation repeats the identity check (in case a class explicitly defines __eq__ or __ne__ to return NotImplemented directly, I guess). This is because it is considered sensible to give this result, and obnoxious to make == fail all the time when there is a sensible default.
However, the two objects are still not orderable without explicit logic, since there isn't a reasonable default. (You could compare the pointer values, but they're arbitrary and don't have anything to do with the Python logic that got you to that point; so ordering things that way isn't realistically useful for writing Python code.) So, for example, x < y will raise a TypeError if the comparison logic isn't provided. (It does this even if x is y; you could reasonably say that <= and >= should be true in this case, and < and > should be false, but it makes things too complicated and is not very useful.)
[Observation: print(bool(NotImplemented)) prints True]
Well, yes; NotImplemented is an object, so it's truthy by default; and it doesn't represent a numeric value, and isn't a container, so there's no reason for it to be falsy.
However, that also doesn't tell us anything useful. We don't care about the truthiness of NotImplemented here, and it isn't used that way in the Python implementation. It is just a sentinel value.
where does the documentation mention the special behavior of NotImplemented in the context of __eq__ ?
Nowhere, because it isn't a behavior of NotImplemented, as explained above.
Okay, but that leaves underlying question: where does the documentation explain what the == operator does by default?
Answer: because we are talking about an operator, and not about a method, it's not in the section about dunder methods. It's in section 6, which talks about expressions. Specifically, 6.10.1. Value comparisons:
The default behavior for equality comparison (== and !=) is based on the identity of the objects. Hence, equality comparison of instances with the same identity results in equality, and equality comparison of instances with different identities results in inequality. A motivation for this default behavior is the desire that all objects should be reflexive (i.e. x is y implies x == y).

Comparing if datetime.datetime exists or None

I'm running a small app on Google App Engine with Python.
In the model I have a property of type DateTimeProperty, which is datetime.datetime. When it's created there is no value (i.e. "None").
I want compare if that datetime.datetime is None, but I can't.
if object.updated_date is None or object.updated_date >= past:
object.updated_date = now
Both updated_date and past is datetime.datetime.
I get the following error.
TypeError: can't compare datetime.datetime to NoneType
What is the correct way to do this?

Given that the previous discussion seems to have established that either of the variables could be None, one approach would be (assuming you want to set object.updated_date when either of the variables is None):
if None in (past, object.updated_date) or object.updated_date >= past:
object.updated_date = now
point being that the check None in (past, object.updated_date) may be handier than the semantically equivalent alternative (past is None or object.update_date is None) (arguably an epsilon more readable thanks to its better compactness, but it is, of course, an arguable matter of style).
As an aside, and a less-arguable matter of style;-), I strongly recommend against using built-ins' names as names for your own variables (and functions, etc) -- object is such a built-in name which in this context is clearly being used for your own purposes. Using obj instead is more concise, still readable (arguably more so;-), and has no downside. You're unlikely to be "bitten" in any given case by the iffy practice of "shadowing" built-ins' names with your own, but eventually it will happen (as you happen to need the normal meaning of the shadowed name during some later ordinary maintenance operation) and you may be in for a confusing debugging situation then; meanwhile, you risk confusing other readers / maintainers... and are getting absolutely no advantage in return for these disadvantages.
I realize that many of Python's built-ins' names are an "attractive nuisance" in this sense... file, object, list, dict, set, min, max... all apparently attractive name for "a file", "an object", "a list`, etc. But, it's worthwhile to learn to resist this particular temptation!-)

Perhaps you mean:
if object.updated_date and object.updated_date >= past:
If it's truthy (which implies not null), we check that it's also >= past. This uses short-circuit evaluation, which means the second condition isn't checked if the first is falsy.

You want and, not or.
You may also want to use is None.
EDIT:
Since you've determined that object.updated_date isn't None, this only other possibility is that past is None.

Why does Python use 'magic methods'?

I'm a bit surprised by Python's extensive use of 'magic methods'.
For example, in order for a class to declare that instances have a "length", it implements a __len__ method, which it is called when you write len(obj). Why not just define a len method which is called directly as a member of the object, e.g. obj.len()?
See also: Why does Python code use len() function instead of a length method?

AFAIK, len is special in this respect and has historical roots.
Here's a quote from the FAQ:
Why does Python use methods for some
functionality (e.g. list.index()) but
functions for other (e.g. len(list))?
The major reason is history. Functions
were used for those operations that
were generic for a group of types and
which were intended to work even for
objects that didn’t have methods at
all (e.g. tuples). It is also
convenient to have a function that can
readily be applied to an amorphous
collection of objects when you use the
functional features of Python (map(),
apply() et al).
In fact, implementing len(), max(),
min() as a built-in function is
actually less code than implementing
them as methods for each type. One can
quibble about individual cases but
it’s a part of Python, and it’s too
late to make such fundamental changes
now. The functions have to remain to
avoid massive code breakage.
The other "magical methods" (actually called special method in the Python folklore) make lots of sense, and similar functionality exists in other languages. They're mostly used for code that gets called implicitly when special syntax is used.
For example:
overloaded operators (exist in C++ and others)
constructor/destructor
hooks for accessing attributes
tools for metaprogramming
and so on...

From the Zen of Python:
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
This is one of the reasons - with custom methods, developers would be free to choose a different method name, like getLength(), length(), getlength() or whatsoever. Python enforces strict naming so that the common function len() can be used.
All operations that are common for many types of objects are put into magic methods, like __nonzero__, __len__ or __repr__. They are mostly optional, though.
Operator overloading is also done with magic methods (e.g. __le__), so it makes sense to use them for other common operations, too.

Python uses the word "magic methods", because those methods really performs magic for you program. One of the biggest advantages of using Python's magic methods is that they provide a simple way to make objects behave like built-in types. That means you can avoid ugly, counter-intuitive, and nonstandard ways of performing basic operators.
Consider a following example:
dict1 = {1 : "ABC"}
dict2 = {2 : "EFG"}
dict1 + dict2
Traceback (most recent call last):
File "python", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'dict' and 'dict'
This gives an error, because the dictionary type doesn't support addition. Now, let's extend dictionary class and add "__add__" magic method:
class AddableDict(dict):
def __add__(self, otherObj):
self.update(otherObj)
return AddableDict(self)
dict1 = AddableDict({1 : "ABC"})
dict2 = AddableDict({2 : "EFG"})
print (dict1 + dict2)
Now, it gives following output.
{1: 'ABC', 2: 'EFG'}
Thus, by adding this method, suddenly magic has happened and the error you were getting earlier, has gone away.
I hope, it makes things clear to you. For more information, refer to:
A Guide to Python's Magic Methods (Rafe Kettler, 2012)

Some of these functions do more than a single method would be able to implement (without abstract methods on a superclass). For instance bool() acts kind of like this:
def bool(obj):
if hasattr(obj, '__nonzero__'):
return bool(obj.__nonzero__())
elif hasattr(obj, '__len__'):
if obj.__len__():
return True
else:
return False
return True
You can also be 100% sure that bool() will always return True or False; if you relied on a method you couldn't be entirely sure what you'd get back.
Some other functions that have relatively complicated implementations (more complicated than the underlying magic methods are likely to be) are iter() and cmp(), and all the attribute methods (getattr, setattr and delattr). Things like int also access magic methods when doing coercion (you can implement __int__), but do double duty as types. len(obj) is actually the one case where I don't believe it's ever different from obj.__len__().

They are not really "magic names". It's just the interface an object has to implement to provide a given service. In this sense, they are not more magic than any predefined interface definition you have to reimplement.

While the reason is mostly historic, there are some peculiarities in Python's len that make the use of a function instead of a method appropriate.
Some operations in Python are implemented as methods, for example list.index and dict.append, while others are implemented as callables and magic methods, for example str and iter and reversed. The two groups differ enough so the different approach is justified:
They are common.
str, int and friends are types. It makes more sense to call the constructor.
The implementation differs from the function call. For example, iter might call __getitem__ if __iter__ isn't available, and supports additional arguments that don't fit in a method call. For the same reason it.next() has been changed to next(it) in recent versions of Python - it makes more sense.
Some of these are close relatives of operators. There's syntax for calling __iter__ and __next__ - it's called the for loop. For consistency, a function is better. And it makes it better for certain optimisations.
Some of the functions are simply way too similar to the rest in some way - repr acts like str does. Having str(x) versus x.repr() would be confusing.
Some of them rarely use the actual implementation method, for example isinstance.
Some of them are actual operators, getattr(x, 'a') is another way of doing x.a and getattr shares many of the aforementioned qualities.
I personally call the first group method-like and the second group operator-like. It's not a very good distinction, but I hope it helps somehow.
Having said this, len doesn't exactly fit in the second group. It's more close to the operations in the first one, with the only difference that it's way more common than almost any of them. But the only thing that it does is calling __len__, and it's very close to L.index. However, there are some differences. For example, __len__ might be called for the implementation of other features, such as bool, if the method was called len you might break bool(x) with custom len method that does completely different thing.
In short, you have a set of very common features that classes might implement that might be accessed through an operator, through a special function (that usually does more than the implementation, as an operator would), during object construction, and all of them share some common traits. All the rest is a method. And len is somewhat of an exception to that rule.

There is not a lot to add to the above two posts, but all the "magic" functions are not really magic at all. They are part of the __ builtins__ module which is implicitly/automatically imported when the interpreter starts. I.e.:
from __builtins__ import *
happens every time before your program starts.
I always thought it would be more correct if Python only did this for the interactive shell, and required scripts to import the various parts from builtins they needed. Also probably different __ main__ handling would be nice in shells vs interactive. Anyway, check out all the functions, and see what it is like without them:
dir (__builtins__)
...
del __builtins__

Perhaps, you have noticed it is possible to use certain built-in methods (ex. len(my_list_or_my_string)), and syntaxes (ex. my_list_or_my_string[:3], my_fancy_dict['some_key']) on some native types such as list, dict. Maybe you have been curious as to why it is not possible (yet) to use these same syntaxes on some of the classes you have written.
Variables of native types (list, dict, int, str) have unique behaviours and respond to certain syntaxes because they have some special methods defined in their respective classes — these methods are called Magic Methods.
A few magic methods include: __len__, __gt__, __eq__, etc.
Read more here: https://tomisin.dev/blog/supercharging-python-classes-with-magic-methods

The advantages of having static function like len(), max(), and min() over inherited method calls

i am a python newbie, and i am not sure why python implemented len(obj), max(obj), and min(obj) as a static like functions (i am from the java language) over obj.len(), obj.max(), and obj.min()
what are the advantages and disadvantages (other than obvious inconsistency) of having len()... over the method calls?
why guido chose this over the method calls? (this could have been solved in python3 if needed, but it wasn't changed in python3, so there gotta be good reasons...i hope)
thanks!!

The big advantage is that built-in functions (and operators) can apply extra logic when appropriate, beyond simply calling the special methods. For example, min can look at several arguments and apply the appropriate inequality checks, or it can accept a single iterable argument and proceed similarly; abs when called on an object without a special method __abs__ could try comparing said object with 0 and using the object change sign method if needed (though it currently doesn't); and so forth.
So, for consistency, all operations with wide applicability must always go through built-ins and/or operators, and it's those built-ins responsibility to look up and apply the appropriate special methods (on one or more of the arguments), use alternate logic where applicable, and so forth.
An example where this principle wasn't correctly applied (but the inconsistency was fixed in Python 3) is "step an iterator forward": in 2.5 and earlier, you needed to define and call the non-specially-named next method on the iterator. In 2.6 and later you can do it the right way: the iterator object defines __next__, the new next built-in can call it and apply extra logic, for example to supply a default value (in 2.6 you can still do it the bad old way, for backwards compatibility, though in 3.* you can't any more).
Another example: consider the expression x + y. In a traditional object-oriented language (able to dispatch only on the type of the leftmost argument -- like Python, Ruby, Java, C++, C#, &c) if x is of some built-in type and y is of your own fancy new type, you're sadly out of luck if the language insists on delegating all the logic to the method of type(x) that implements addition (assuming the language allows operator overloading;-).
In Python, the + operator (and similarly of course the builtin operator.add, if that's what you prefer) tries x's type's __add__, and if that one doesn't know what to do with y, then tries y's type's __radd__. So you can define your types that know how to add themselves to integers, floats, complex, etc etc, as well as ones that know how to add such built-in numeric types to themselves (i.e., you can code it so that x + y and y + x both work fine, when y is an instance of your fancy new type and x is an instance of some builtin numeric type).
"Generic functions" (as in PEAK) are a more elegant approach (allowing any overriding based on a combination of types, never with the crazy monomaniac focus on the leftmost arguments that OOP encourages!-), but (a) they were unfortunately not accepted for Python 3, and (b) they do of course require the generic function to be expressed as free-standing (it would be absolutely crazy to have to consider the function as "belonging" to any single type, where the whole POINT is that can be differently overridden/overloaded based on arbitrary combination of its several arguments' types!-). Anybody who's ever programmed in Common Lisp, Dylan, or PEAK, knows what I'm talking about;-).
So, free-standing functions and operators are just THE right, consistent way to go (even though the lack of generic functions, in bare-bones Python, does remove some fraction of the inherent elegance, it's still a reasonable mix of elegance and practicality!-).

It emphasizes the capabilities of an object, not its methods or type. Capabilites are declared by "helper" functions such as __iter__ and __len__ but they don't make up the interface. The interface is in the builtin functions, and beside this also in the buit-in operators like + and [] for indexing and slicing.
Sometimes, it is not a one-to-one correspondance: For example, iter(obj) returns an iterator for an object, and will work even if __iter__ is not defined. If not defined, it goes on to look if the object defines __getitem__ and will return an iterator accessing the object index-wise (like an array).
This goes together with Python's Duck Typing, we care only about what we can do with an object, not that it is of a particular type.

Actually, those aren't "static" methods in the way you are thinking about them. They are built-in functions that really just alias to certain methods on python objects that implement them.
>>> class Foo(object):
... def __len__(self):
... return 42
...
>>> f = Foo()
>>> len(f)
42
These are always available to be called whether or not the object implements them or not. The point is to have some consistency. Instead of some class having a method called length() and another called size(), the convention is to implement len and let the callers always access it by the more readable len(obj) instead of obj.methodThatDoesSomethingCommon

I thought the reason was so these basic operations could be done on iterators with the same interface as containers. However, it actually doesn't work with len:
def foo():
for i in range(10):
yield i
print len(foo())
... fails with TypeError. len() won't consume and count an iterator; it only works with objects that have a __len__ call.
So, as far as I'm concerned, len() shouldn't exist. It's much more natural to say obj.len than len(obj), and much more consistent with the rest of the language and the standard library. We don't say append(lst, 1); we say lst.append(1). Having a separate global method for length is an odd, inconsistent special case, and eats a very obvious name in the global namespace, which is a very bad habit of Python.
This is unrelated to duck typing; you can say getattr(obj, "len") to decide whether you can use len on an object just as easily--and much more consistently--than you can use getattr(obj, "__len__").
All that said, as language warts go--for those who consider this a wart--this is a very easy one to live with.
On the other hand, min and max do work on iterators, which gives them a use apart from any particular object. This is straightforward, so I'll just give an example:
import random
def foo():
for i in range(10):
yield random.randint(0, 100)
print max(foo())
However, there are no __min__ or __max__ methods to override its behavior, so there's no consistent way to provide efficient searching for sorted containers. If a container is sorted on the same key that you're searching, min/max are O(1) operations instead of O(n), and the only way to expose that is by a different, inconsistent method. (This could be fixed in the language relatively easily, of course.)
To follow up with another issue with this: it prevents use of Python's method binding. As a simple, contrived example, you can do this to supply a function to add values to a list:
def add(f):
f(1)
f(2)
f(3)
lst = []
add(lst.append)
print lst
and this works on all member functions. You can't do that with min, max or len, though, since they're not methods of the object they operate on. Instead, you have to resort to functools.partial, a clumsy second-class workaround common in other languages.
Of course, this is an uncommon case; but it's the uncommon cases that tell us about a language's consistency.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.