(kind of) Singleton Pattern in Python using Hash and `==` - python

I am looking into ways to use a quasi singleton pattern in Python. Quick problem description:
I have objects that describe subsets of a certain group. For simplicity, assume integer numbers as in set([1,2,3]). In my case, comparison is difficult and, if possible at all, expensive, so I assume that if I have
complex_set1 = ...
complex_set2 = ...
these are different. Also, all sets are immutable (like frozenset). There are, however, full and empty sets for convenience:
full_set = FullSet()
empty_set = EmptySet()
These seem like natural candidates for singletons. One way would be to create one instance and add it to the package on import, so that only one exists and you cannot create another.
Now my idea:
Since I do not care whether I have multiple objects, as long as they are considered the same in every case (apart from a is b, which is obviously false), I just make them look equal (as a singleton would). An example would be
>>> len(set([FullSet(), FullSet()]))
1
So, I experimented with
def __hash__(self):
    return 0  # make sure all instances have the same hash

def __eq__(self, other):
    if isinstance(other, FullSet):
        return True
    return NotImplemented
Does this have a name? Is it considered a singleton pattern or something else?
Should I use this or are there caveats to be aware of?
Any comments on the hash value, which is also used for purposes other than comparison? Does it make more sense to use e.g. return hash(FullSet)?
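To make the comparison concrete, here is a minimal sketch of both variants; the FullSet/EmptySet names follow the text above, everything else is illustrative:

class FullSet:
    """All instances compare equal, so they behave like one value."""
    def __hash__(self):
        # tying the hash to the class avoids the magic constant 0
        return hash(FullSet)
    def __eq__(self, other):
        if isinstance(other, FullSet):
            return True
        return NotImplemented

class EmptySet:
    """A true singleton: __new__ always hands back the same instance."""
    _instance = None
    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

assert len({FullSet(), FullSet()}) == 1  # equal but distinct objects
assert EmptySet() is EmptySet()          # literally the same object

Either way the instances are interchangeable; only the second variant also makes a is b true.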

Related

Pythonic way to Implement Data Types (Python 2.7)

The majority of my programming experience has been with C++. Inspired by Bjarne Stroustrup's talk here, one of my favorite programming techniques is "type-rich" programming: the development of new, robust data types that not only reduce the amount of code I have to write by wrapping functionality into the type (for example vector addition: instead of newVec.x = vec1.x + vec2.x; newVec.y = ... etc., we can just use newVec = vec1 + vec2), but also reveal problems in your code at compile time through the strong type system.
A recent project I have undertaken in Python 2.7 requires integer values that have upper and lower bounds. My first instinct is to create a new data type (class) that will have all the same behavior as a normal number in python, but will always be within its (dynamic) boundary values.
class BoundInt:
    def __init__(self, target = 0, low = 0, high = 1):
        self.lowerLimit = low
        self.upperLimit = high
        self._value = target
        self._balance()

    def _balance(self):
        if (self._value > self.upperLimit):
            self._value = self.upperLimit
        elif (self._value < self.lowerLimit):
            self._value = self.lowerLimit
        self._value = int(round(self._value))

    def value(self):
        self._balance()
        return self._value

    def set(self, target):
        self._value = target
        self._balance()

    def __str__(self):
        return str(self._value)
This is a good start, but it requires accessing the meat of these BoundInt types like so
x = BoundInt()
y = 4
x.set(y) #it would be nicer to do something like x = y
print y #prints "4"
print x #prints "1"
z = 2 + x.value() #again, it would be nicer to do z = 2 + x
print z #prints "3"
We can add a large number of python's "magic method" definitions to the class to add some more functionality:
def __add__(self, other):
    return self._value + other

def __sub__(self, other):
    return self._value - other

def __mul__(self, other):
    return self._value * other

def __div__(self, other):
    return self._value / other

def __pow__(self, power):
    return self._value ** power

def __radd__(self, other):
    return self._value + other

# etc etc
Now the code is rapidly exploding in size, and there is a ton of repetition for very little return; this doesn't seem very pythonic at all.
Things get even more complicated when I start to want to construct BoundInt objects from normal python numbers (integers?), and other BoundInt objects
x = BoundInt()
y = BoundInt(x)
z = BoundInt(4)
Which, as far as I'm aware, requires the use of rather large/ugly if/else type-checking statements within the BoundInt() constructor, as python does not support (C-style) overloading.
All of this feels terribly like trying to write c++ code in python, a cardinal sin if one of my favorite books, Code Complete 2, is taken seriously. I feel like I am swimming against the dynamic typing current, instead of letting it carry me forward.
I very much want to learn to code python 'pythonic-ally', what is the best way to approach this sort of problem domain? What are good resources to learn proper pythonic style?
There's plenty of code in the standard library, in popular PyPI modules, and in ActiveState recipes that does this kind of thing, so you're probably better off reading examples than trying to figure it out from first principles. Also, note that this is pretty similar to creating a list-like or dict-like class, which there are even more examples of.
However, there are some answers to what you want to do. I'll start with the most serious, then work backward.
Things get even more complicated when I start to want to construct BoundInt objects from normal python numbers (integers?), and other BoundInt objects
…
Which, as far as I'm aware, requires the use of rather large/ugly if/else type-checking statements within the BoundInt() constructor, as python does not support (C-style) overloading.
Ah, but think about what you're doing: You're constructing a BoundInt from anything that can act like an integer, including, say, an actual int or a BoundInt, right? So, why not:
def __init__(self, target, low, high):
    self.target, self.low, self.high = int(target), int(low), int(high)
I'm assuming you've already added a __int__ method to BoundInt, of course (the equivalent of a C++ explicit operator int() const).
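If it isn't there yet, it can be a one-liner (a sketch, assuming the target attribute from the __init__ above):

def __int__(self):
    # the Python counterpart of C++'s explicit operator int() const
    return self.target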
Also, keep in mind that the lack of overloading isn't as serious as you're thinking coming from C++, because there is no "copy constructor" for making copies; you just pass the object around, and all that gets taken care of under the covers.
For example, imagine this C++ code:
BoundInt foo(BoundInt param) { BoundInt local = param; return local; }
BoundInt bar;
BoundInt baz = foo(bar);
This copies bar to param, param to local, local to an unnamed "return value" variable, and that to baz. Some of these will be optimized out, and others (in C++11) will use move instead of copy, but still, you've got 4 conceptual invocations of the copy/move constructors/assignment operators.
Now look at the Python equivalent:
def foo(param): local = param; return local
bar = BoundInt()
baz = foo(bar)
Here, we've just got one BoundInt instance—the one that was explicitly created—and all we're doing is binding new names to it. Even assigning baz as a member of a new object that outlives the scope of bar and baz won't make a copy. The only thing that makes a copy is explicitly calling BoundInt(baz) again. (This isn't quite 100% true, because someone can always inspect your object and attempt to clone it from the outside, and pickle, deepcopy, etc. may actually do so… but in that case, they're still not calling a "copy constructor" that you or the compiler wrote.)
Now, what about forwarding all those operators to the value?
Well, one possibility is to do it dynamically. The details depend on whether you're in Python 3 or 2 (and, for 2, how far back you need to support). But the idea is you just have a list of names, and for each one, you define a method with that name that calls the method of the same name on the value object. If you want a sketch of this, provide the extra info and ask, but you're probably better off looking for examples of dynamic method creation.
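To give a rough idea, a sketch of that dynamic forwarding might look like this (it assumes the question's BoundInt with its _value attribute, and the method list is deliberately abbreviated):

_FORWARDED = ['__add__', '__radd__', '__sub__', '__mul__', '__neg__']

def _make_forwarder(name):
    def forwarder(self, *args):
        # unwrap any BoundInt arguments, then call int's method of the same name on the value
        args = [a._value if isinstance(a, BoundInt) else a for a in args]
        return getattr(self._value, name)(*args)
    forwarder.__name__ = name
    return forwarder

for _name in _FORWARDED:
    setattr(BoundInt, _name, _make_forwarder(_name))

Using a factory function rather than defining the method directly in the loop avoids the usual late-binding surprise with closures.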
So, is that Pythonic? Well, it depends.
If you're creating dozens of "integer-like" classes, then yes, it's certainly better than copy-paste code or adding a "compile-time" generation step, and it's probably better than adding an otherwise-unnecessary base class.
And if you're trying to work across many versions of Python and don't want to have to remember "which version am I supposed to stop supplying __cmp__ to act like int again?" type questions, I might go even further and get the list of methods out of int itself (take dir(int()) and blacklist out a few names).
But if you're just doing this one class, in, say, just Python 2.6-2.7 or just 3.3+, I think it's a toss-up.
A good class to read is the fractions.Fraction class in the standard library. It's clearly-written pure Python code. And it partially demonstrates both the dynamic and explicit mechanisms (because it explicitly defines each special method in terms of generic dynamic forwarding functions), and if you've got both 2.x and 3.x around you can compare and contrast the two.
Meanwhile, it seems like your class is underspecified. If x is a BoundInt and y is an int, should x+y really return an int (as it does in your code)? If not, do you need to bound it? What about y+x? What should x+=y do? And so on.
Finally, in Python, it's often worth making "value classes" like this immutable, even if the intuitive C++ equivalent would be mutable. For example, consider this:
>>> i = BoundInt(3, 0, 10)
>>> j = i
>>> i.set(5)
>>> j
5
I don't think you'd expect this. This wouldn't happen in C++ (for a typical value class), because j = i would create a new copy, but in Python, it's just binding a new name to the same copy. (It's equivalent to BoundInt &j = i, not BoundInt j = i.)
If you want BoundInt to be immutable, besides eliminating obvious things like set, also make sure not to implement __iadd__ and friends. If you leave out __iadd__, i += 2 will be turned into i = i.__add__(2): in other words, it will create a new instance, then rebind i to that new instance, leaving the old one alone.
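You can see that rebinding with any immutable built-in; tuples, for instance, have __add__ but no __iadd__:

a = (1, 2)
b = a
a += (3,)   # really a = a.__add__((3,)): a new tuple is created and a is rebound
print(a)    # (1, 2, 3)
print(b)    # (1, 2) -- the original object is untouched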
There are likely many opinions on this. But regarding the proliferation of special methods, you will just have to do that to make it complete. At least you only do it once, in one place. Also, the built-in number types can be subclassed; that's what I did for a similar implementation, which you can look at.
Your set method is an abomination. You do not create a number with a default value of zero and then change the number into some other number. That is very much trying to program C++ in Python, and will cause you endless amounts of headaches if you actually want to treat these the same way you do numbers, because every time you pass them to functions they are passed by reference (like everything in Python). So you'll end up with large amounts of aliasing in things you think you can treat like numbers, and you will almost certainly encounter bugs due to mutating the value of numbers you don't realise are aliased, or expecting to be able to retrieve a value stored in a dictionary with a BoundInt as a key by providing another BoundInt with the same value.
To me, high and low aren't data values associated with a particular BoundInt value, they're type parameters. I want a number 7 in the type BoundInt(1, 10), not a number 7 which is constrained to be between 1 and 10, all of which being a value in the type BoundInt.
If I really wanted to do something like this, the approach I would take would be to subclass int and treat BoundInt as a class factory; you give it a range, and it gives you the type of integers restricted to be in that range. You can apply that type to any "int-like" object and it will give you a value clamped to that range. Something like:
_bound_int_cache = {}

def BoundInt(low, high):
    try:
        return _bound_int_cache[(low, high)]
    except KeyError:
        class Tmp(int):
            def __new__(cls, value):
                value = max(value, cls.low)
                value = min(value, cls.high)
                return int.__new__(cls, value)
        # set the bounds after the class statement; "low = low" inside the
        # class body would not see the factory's arguments
        Tmp.low = low
        Tmp.high = high
        Tmp.__name__ = 'BoundInt({}, {})'.format(low, high)
        _bound_int_cache[(low, high)] = Tmp
        return _bound_int_cache[(low, high)]
(The cache is just to ensure that two different attempts to get the BoundInt type for the same low/high values give you the exact same class, not two different classes that behave the same way. Probably wouldn't matter in practice most of the time, but it seems nicer.)
You would use this like:
B = BoundInt(1, 10)
x = B(7)
The "class factory" approach means that if you have a small number of meaningful ranges in which you want to bound your integers, you can create the classes for those ranges globally (with meaningful names), and then use them exactly like regular classes.
Subclassing int makes these objects immutable (which is why the initialisation had to be done in __new__), which frees you from aliasing bugs (which people don't expect to have to worry about when they're programming with simple value types like numbers, and for good reason). It also gives you all the integer methods for free, and so these BoundInt types behave exactly as int, except that when you create one the value is clamped by the type. Unfortunately that means that all operations on these types return int objects, not BoundInt objects.
If you could come up with a way of reconciling the low/high values for the two different values involved in e.g. x + y, then you can override the special methods to make them return BoundInt values. The approaches that spring to mind are:
Take the left operand's bounds and ignore the right operand (seems messy and asymmetrical; violates the assumption that x + y = y + x)
Take the maximum low value and the minimum high value. It's nicely symmetrical, and you can treat numeric values that don't have low and high values as if they were -sys.maxint - 1 and sys.maxint (i.e. just use the bounds from the other value). Doesn't make a whole lot of sense if the ranges don't overlap at all, because you'll end up with an empty range, but operating on such numbers together probably doesn't make a whole lot of sense anyway.
Take the minimum low value and the maximum high value. Also symmetrical, but here you probably want to explicitly ignore normal numbers rather than pretending they're BoundInt values that can range over the whole integer range.
Any of the above could work, and any of the above will probably surprise you at some point (e.g. negating a number constrained to be in a positive range will always give you the smallest positive number in the range, which seems weird to me).
If you take this approach, you probably don't want to subclass int. Because if you have normalInt + boundedInt, then normalInt would handle the addition without respecting your code. You instead want it not to recognise boundedInt as an int value, so that int's __add__ won't work and will give your class a chance to try __radd__. But I would still treat your class as "immutable", and make every operation that comes up with a new number construct a new object; mutating numbers in place is practically guaranteed to cause bugs sometime.
So I'd handle that approach something like this:
class BoundIntBase(object):
# Don't use this class directly; use a subclass that specifies low and high as
# class attributes.
def __init__(self, value):
self.value = min(self.high, max(self.low, int(value)))
def __int__(self):
return self.value
# add binary operations to BoundInt
for method in ['__add__', '__radd__', ...]:
def tmp(self, other):
try:
low = min(self.low, other.low)
high = max(self.high, other.high)
except AttributError:
cls = type(self)
else:
cls = BountInd(low, high)
v = getattr(int(self), method)(int(other))
return cls(v)
tmp.__name__ = method
setattr(BountIntBase, method, tmp)
_bound_int_cache = {}

def BoundInt(low, high):
    try:
        return _bound_int_cache[(low, high)]
    except KeyError:
        # subclass BoundIntBase rather than int, per the discussion above
        class Tmp(BoundIntBase):
            pass
        Tmp.low = low
        Tmp.high = high
        Tmp.__name__ = 'BoundInt({}, {})'.format(low, high)
        _bound_int_cache[(low, high)] = Tmp
        return _bound_int_cache[(low, high)]
Still seems like more code than it should be, but what you're trying to do is actually more complicated than you think it is.
A type that behaves exactly like numbers in all situations needs many special methods because of Python's rich syntax support (no other kind of type seems to require so many methods; it is much simpler, for example, to define types that behave like a list or dict: a couple of methods and you have a Sequence). There are several ways to make the code less repetitive.
ABC classes such as numbers.Integral provide default implementations for some methods; e.g., if __add__ and __radd__ are implemented in a subclass, then __sub__ and __rsub__ are available automatically.
fractions.Fraction uses _operator_fallbacks to define the __r*__ methods and to provide fallback operators that deal with other numeric types:
__op__, __rop__ = _operator_fallbacks(monomorphic_operator, operator.op)
Python allows you to generate/modify a class dynamically in a factory function/metaclass; see, e.g., "Can anyone help condense this Python code?". Even exec can be used in (very) rare cases, e.g., namedtuple().
Numbers are immutable in Python, so you should use __new__ instead of __init__.
Rare cases that are not covered by __new__ can be handled by from_sometype(cls, d: sometype) -> your_type class methods. In reverse, cases that are not covered by special methods can use as_sometype(self) -> sometype methods.
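A rough sketch of what those hooks might look like on an int subclass (the bounds, names, and the choice of Fraction are only illustrative):

from fractions import Fraction

class BoundInt(int):
    low, high = 0, 10  # illustrative bounds

    def __new__(cls, value):
        # numbers are immutable, so clamp in __new__ rather than __init__
        return int.__new__(cls, min(cls.high, max(cls.low, int(value))))

    @classmethod
    def from_fraction(cls, f):
        # a from_sometype() constructor for a case plain int(value) wouldn't handle the way you want
        return cls(round(f))

    def as_fraction(self):
        # an as_sometype() accessor for the reverse direction
        return Fraction(int(self))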
A simpler solution in your case might be to define a higher-level type specific to your application domain. The number abstraction might be too low-level; decimal.Decimal, for example, is more than 6 KLOC.

Avoiding Python sum default start arg behavior

I am working with a Python object that implements __add__, but does not subclass int. MyObj1 + MyObj2 works fine, but sum([MyObj1, MyObj2]) led to a TypeError, because sum() first attempts 0 + MyObj. In order to use sum(), my object needs __radd__ to handle 0 + MyObj, or I need to provide an empty object as the start parameter. The object in question is not designed to be empty.
Before anyone asks, the object is not list-like or string-like, so use of join() or itertools would not help.
Edit for details: the module has a SimpleLocation and a CompoundLocation. I'll abbreviate Location to Loc. A SimpleLoc contains one right-open interval, i.e. [start, end). Adding SimpleLoc yields a CompoundLoc, which contains a list of the intervals, e.g. [[3, 6), [10, 13)]. End uses include iterating through the union, e.g. [3, 4, 5, 10, 11, 12], checking length, and checking membership.
The numbers can be relatively large (say, smaller than 2^32 but commonly 2^20). The intervals probably won't be extremely long (100-2000, but could be longer). Currently, only the endpoints are stored. I am now tentatively thinking of attempting to subclass set such that the location is constructed as set(xrange(start, end)). However, adding sets will give Python (and mathematicians) fits.
Questions I've looked at:
python's sum() and non-integer values
why there's a start argument in python's built-in sum function
TypeError after overriding the __add__ method
I'm considering two solutions. One is to avoid sum() and use the loop offered in this comment. I don't understand why sum() begins by adding the 0th item of the iterable to 0 rather than adding the 0th and 1st items (like the loop in the linked comment); I hope there's an arcane integer optimization reason.
My other solution is as follows; while I don't like the hard-coded zero check, it's the only way I've been able to make sum() work.
# ...
def __radd__(self, other):
    # This allows sum() to work (the default start value is zero)
    if other == 0:
        return self
    return self.__add__(other)
In summary, is there another way to use sum() on objects that can neither be added to integers nor be empty?
Instead of sum, use:
import operator
from functools import reduce
reduce(operator.add, seq)
In Python 2, reduce was a built-in, so this looks like:
import operator
reduce(operator.add, seq)
Reduce is generally more flexible than sum - you can provide any binary function, not only add, and you can optionally provide an initial element while sum always uses one.
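A toy illustration of both points (operator.or_ and sets are just an example binary function and element type):

import operator
from functools import reduce

seq = [{1, 2}, {2, 3}, {4}]
print(reduce(operator.or_, seq))        # {1, 2, 3, 4} -- any binary function works
print(reduce(operator.or_, [], set()))  # set() -- an explicit initial element covers the empty case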
Also note: (Warning: maths rant ahead)
Providing support for add w/r/t objects that have no neutral element is a bit awkward from the algebraic point of view.
Note that all of:
naturals
reals
complex numbers
N-d vectors
NxM matrices
strings
together with addition form a Monoid - i.e. they are associative and have some kind of neutral element.
If your operation isn't associative and doesn't have a neutral element, then it doesn't "resemble" addition. Hence, don't expect it to work well with sum.
In such case, you might be better off with using a function or a method instead of an operator. This may be less confusing since the users of your class, seeing that it supports +, are likely to expect that it will behave in a monoidic way (as addition normally does).
Thanks for expanding, I'll refer to your particular module now:
There are 2 concepts here:
Simple locations,
Compound locations.
It indeed makes sense that simple locations could be added, but they don't form a monoid because their addition doesn't satisfy the basic property of closure - the sum of two SimpleLocs isn't a SimpleLoc. It's, generally, a CompoundLoc.
OTOH, CompoundLocs with addition looks like a monoid to me (a commutative monoid, while we're at it): A sum of those is a CompoundLoc too, and their addition is associative, commutative and the neutral element is an empty CompoundLoc that contains zero SimpleLocs.
If you agree with me (and the above matches your implementation), then you'll be able to use sum as following:
sum([SimpleLoc1, SimpleLoc2, SimpleLoc3], CompoundLoc())
Indeed, this appears to work.
I am now tentatively thinking of attempting to subclass set such that the location is constructed as set(xrange(start, end)). However, adding sets will give Python (and mathematicians) fits.
Well, locations are some sets of numbers, so it makes sense to throw a set-like interface on top of them (so __contains__, __iter__, __len__, perhaps __or__ as an alias of +, __and__ as the product, etc).
As for construction from xrange, do you really need it? If you know that you're storing sets of intervals, then you're likely to save space by sticking to your representation of [start, end) pairs. You could throw in a utility method that takes an arbitrary sequence of integers and translates it to an optimal SimpleLoc or CompoundLoc if you feel it's going to help.
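To make that concrete, here is a minimal sketch of a CompoundLoc with that kind of interface (purely illustrative: it keeps the [start, end) pairs and ignores merging of overlapping intervals and other real-world concerns):

class CompoundLoc(object):
    def __init__(self, intervals=()):
        # right-open [start, end) pairs, exactly as in the question
        self.intervals = tuple(intervals)

    def __add__(self, other):
        if isinstance(other, CompoundLoc):
            return CompoundLoc(self.intervals + other.intervals)
        return NotImplemented

    def __iter__(self):
        for start, end in self.intervals:
            for i in range(start, end):
                yield i

    def __contains__(self, n):
        return any(start <= n < end for start, end in self.intervals)

    def __len__(self):
        return sum(end - start for start, end in self.intervals)

# an empty CompoundLoc is the neutral element, so sum() gets a usable start value
locs = [CompoundLoc([(3, 6)]), CompoundLoc([(10, 13)])]
print(list(sum(locs, CompoundLoc())))  # [3, 4, 5, 10, 11, 12]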
I think that the best way to accomplish this is to provide the __radd__ method, or pass the start object to sum explicitly.
In case you really do not want to override __radd__ or provide a start object, how about redefining sum()?
>>> from __builtin__ import sum as builtin_sum
>>> def sum(iterable, startobj=MyCustomStartObject):
...     return builtin_sum(iterable, startobj)
...
Preferably use a function with a name like my_sum(), but I guess that is one of the things you want to avoid (even though globally redefining builtin functions is probably something that a future maintainer will curse you for)
Actually, implementing __add__ without the concept of an "empty object" makes little sense. sum needs a start parameter to support the sums of empty and one-element sequences, and you have to decide what result you expect in these cases:
sum([o1, o2]) => o1 + o2 # obviously
sum([o1]) => o1 # But how should __add__ be called here? Not at all?
sum([]) => ? # What now?
You could use an object that's universally neutral wrt. addition:
class Neutral:
    def __add__(self, other):
        return other

print(sum("A BC D EFG".split(), Neutral()))  # ABCDEFG
You could do something like:
from operator import add

try:
    total = reduce(add, whatever)  # or functools.reduce in Py3.x
except TypeError as e:
    # I'm not 100% happy about branching on the exception text, but
    # figure this msg isn't likely to be changed after so long...
    if e.args[0] == 'reduce() of empty sequence with no initial value':
        pass  # do something appropriate here if necessary
    else:
        pass  # Most likely that + isn't usable between objects...

Modifying the immutable class instance

I've got a somewhat complex data-type Mask that I'd like to be able to use fast identity checking for such as:
seen = set()
m = Mask(...)
if m in seen:
    ...
This suggests that Mask should be hashable and therefore immutable. However, I'd like to generate variants of m and Mask seems like the place to encapsulate the variation logic. Here is a minimalish example that demonstrates what I want to accomplish:
class Mask(object):
    def __init__(self, seq):
        self.seq = seq
        self.hash = reduce(lambda x, y: x ^ y, self.seq)

    # __hash__ and __cmp__ satisfy the hashable contract (§3.4.1)
    def __hash__(self):
        return self.hash

    def __cmp__(self, other):
        return cmp(self.seq, other.seq)

    def complement(self):
        # cannot modify self without violating the immutability requirement,
        # so return a new instance instead
        return Mask([-x for x in self.seq])
This satisfies all of the hashable and immutable properties. The peculiarity is having what is effectively a Factory method complement; is this a reasonable way to implement the desired function? Note that I am not at all interested in protecting against "malicious" modification as many related questions on SO are looking to achieve.
As this example is intentionally small it could be trivially solved by making a tuple of seq. The type that I am actually working with does not afford such simple casting.
Yes, this is pretty much how you write an immutable class; methods that would otherwise change an object's state become, in your terms, "factories" that create new instances.
Note that in the specific case of computing a complement, you can name the method __invert__ and use it as
inv_mask = ~mask
Operator syntax is a strong signal to client code that operations return new values of the same type.
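Concretely, wiring the existing complement logic up to ~ is just one more method on the question's Mask (a sketch):

def __invert__(self):
    # same factory-style logic as complement(), exposed through the ~ operator
    return Mask([-x for x in self.seq])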
When using immutable types, all 'variation' methods must return new objects/instances, and thus be factories in that sense.
Many languages make strings immutable (including Python). So all string operations return new strings; strB = strA.replace(...) etc.
If you could change the instance at all, it wouldn't be immutable.
Edit:
Rereading, you're asking if this is reasonable, and I say yes. The logic would not be better put somewhere else, and as I pointed out with string immutability it is a common paradigm to get new variations from existing ones.
Python doesn't enforce immutability. It is up to you to make sure that objects are not modified while they are in a set; otherwise you don't need immutability to use sets or dictionaries in Python.

Python - Function attributes or mutable default values

Say you have a function that needs to maintain some sort of state and behave differently depending on that state. I am aware of two ways to implement this where the state is stored entirely by the function:
Using a function attribute
Using a mutable default value
Using a slightly modified version of Felix Kling's answer to another question, here is an example function that can be used in re.sub() so that only the third match to a regex will be replaced:
Function attribute:
def replace(match):
    replace.c = getattr(replace, "c", 0) + 1
    return repl if replace.c == 3 else match.group(0)
Mutable default value:
def replace(match, c=[0]):
    c[0] += 1
    return repl if c[0] == 3 else match.group(0)
To me the first seems cleaner, but I have seen the second more commonly. Which is preferable and why?
I use a closure instead; no side effects.
Here is the example (I've just modified the original example from Felix Kling's answer):
def replaceNthWith(n, replacement):
    c = [0]
    def replace(match):
        c[0] += 1
        return replacement if c[0] == n else match.group(0)
    return replace
And the usage:
# reset state (in our case count, c=0) for each string manipulation
re.sub(pattern, replaceNthWith(n, replacement), str1)
re.sub(pattern, replaceNthWith(n, replacement), str2)
#or persist state between calls
replace = replaceNthWith(n, replacement)
re.sub(pattern, replace, str1)
re.sub(pattern, replace, str2)
For the mutable default, what should happen if somebody calls replace(match, c=[])?
With the function attribute you break encapsulation (yes, I know Python doesn't really enforce it in classes, for various reasons ...).
Both ways feel strange to me. The first, though, is much better. But when you think about it this way: "something with a state that can do operations with that state and additional input", it really sounds like a normal object. And when something sounds like an object, it should be an object...
So, my solution would be to use a simple object with a __call__ method:
class StatefulReplace(object):
    def __init__(self, initial_c=0):
        self.c = initial_c

    def __call__(self, match):
        self.c += 1
        return repl if self.c == 3 else match.group(0)
And then you can write in the global space or your module init:
replace = StatefulReplace(0)
How about:
Use a class
Use a global variable
True, these are not stored entirely within the function. I would probably use a class:
class Replacer(object):
    c = 0

    @staticmethod  # if you like
    def replace(match):
        Replacer.c += 1
        ...
To answer your actual question, use getattr. It's a very clear and readable way to store data away for later. It should be pretty obvious to someone reading it what you're trying to do.
The mutable default argument version is an example of a common programming error (assuming you'll get a new list every time). For that reason alone I would avoid it. Someone reading it later might decide that it's a good idea without fully understanding the consequences. And even in this case, it seems as though your function would only work once (your c value is never reset to zero).
To me, both of these approaches look dodgy. The problem is crying out for a class instance. We don't normally think about functions as maintaining state between calls; that's what classes are for.
That said, I've used function attributes before for this sort of thing. Particularly if it's a one-shot function defined within other code (i.e. not possible for it to be used from anywhere else), just tacking on attributes to it is more concise than defining a whole new class and creating an instance of it.
I would never abuse default values for this. There's a large barrier to understanding because the natural purpose of default values is to provide default values of arguments, not to maintain state between calls. Plus a default argument invites you to supply a non-default value, and you typically get very strange behaviour if you do that with a function that is abusing default values to maintain state.

Does it make sense to check for identity in __eq__?

When implementing a custom equality function for a class, does it make sense to check for identity first? An example:
def __eq__(self, other):
    return (self is other) or (other criteria)
This is interesting for cases when the other criteria may be more expensive (e.g. comparing some long strings).
It may be a perfectly reasonable shortcut to check for identity first, and in equality methods good shortcuts (for both equality and non equality) are what you should be looking for so that you can return as soon as possible.
But, on the other hand, it could also be a completely superfluous check if your test for equality is otherwise cheap and you are unlikely in practice to be comparing an object with itself.
For example, if equality between objects can be gauged by comparing one or two integers then this should be quicker than the identity test, so in less than the time it would take to compare ids you've got the whole answer. And remember that if you check the identities and the objects don't have the same id (which is likely in most scenarios) then you've not gained anything as you've still got to do the full check.
So if full equality checking is not cheap and it's possible that an object could be compared against itself, then checking identity first can be a good idea.
Note that another reason the check isn't done by default is that it is quite reasonable (though rare) for objects with equal identities to compare as non equal, for example:
>>> s = float('nan')
>>> s == s
False
Necessary: no.
Does it make sense: sure, why not?
No such check is done by default, as you can see here:
class bad(object):
    def __eq__(self, other):
        return False

x = bad()
print x is x, x == x  # True, False
When you implement custom equality in a class, you can decide for yourself whether to check for identify first. It's entirely up to you. Note that in Python, it's also perfectly valid to decide that __eq__ and __ne__ will return the same value for a given argument; so it's possible to define equality such that identity isn't a shortcut.
It's certainly a speed improvement, although how much of one depends on the complexity of the method. I generally don't bother in my custom classes, but I don't have a lot of speed-critical code (and where I do, object comparisons aren't the hotspot).
For most of my objects, the equality method looks like:
def __eq__(self, o):
    try:
        return self.x == o.x and self.y == o.y
    except AttributeError:
        return False
I could easily add a if self is o: return True check at the beginning of the method.
Also remember to override __hash__ if you override __eq__, or you'll get odd behaviors in sets and dicts.
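Putting both points together, a sketch (the Point class and its attributes are purely illustrative):

class Point(object):
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, o):
        if self is o:  # cheap identity shortcut
            return True
        try:
            return self.x == o.x and self.y == o.y
        except AttributeError:
            return False

    def __hash__(self):
        # equal objects must hash equal, or sets and dicts misbehave
        return hash((self.x, self.y))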
I asked a similar question on comp.lang.python a few years ago - here is the thread. The conclusions at that time were that the up-front identity test was worth it if you did many tests for equality of objects with themselves, or if your other equality testing logic was slow.
This is only done for performance reasons.
At one programming job I worked on, in Java, this was always done, although it does not change any functionality.
