I am using a hashable object as a key to a dictionary. The objects are hashable and I can store key-value-pairs in the dict, but when I create a copy of the same object (that gives me the same hash), I get a KeyError.
Here is some small example code:
class Object:
    def __init__(self, x): self.x = x
    def __hash__(self): return hash(self.x)

o1 = Object(1.)
o2 = Object(1.)
hash(o1) == hash(o2)  # This is True
data = {}
data[o1] = 2.
data[o2]  # Desired: This should output 2.
In my scenario above, how can I achieve that data[o2] also returns 2.?
You need to implement both __hash__ and __eq__:
class Object:
    def __init__(self, x): self.x = x
    def __hash__(self): return hash(self.x)
    def __eq__(self, other): return self.x == other.x if isinstance(other, self.__class__) else NotImplemented
Per Python documentation:
if a class does not define an __eq__() method it should not define a __hash__() operation either
After finding a matching hash, Python's dictionary compares the keys using __eq__ and finds that they are different (without a custom __eq__, objects compare by identity). That's why the lookup fails with a KeyError.
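To see the two steps (hash first, then equality) side by side, here is a minimal sketch. The class names HashOnly and HashAndEq and the helper lookup are made up for illustration; they are not from the question.

```python
class HashOnly:
    """Hypothetical class: defines __hash__ but keeps identity-based equality."""
    def __init__(self, x):
        self.x = x

    def __hash__(self):
        return hash(self.x)


class HashAndEq(HashOnly):
    """Same hash, plus value-based equality."""
    def __eq__(self, other):
        if not isinstance(other, HashAndEq):
            return NotImplemented
        return self.x == other.x

    # In Python 3, defining __eq__ sets the implicit __hash__ to None,
    # so the hash has to be declared again explicitly.
    __hash__ = HashOnly.__hash__


def lookup(cls):
    """Store a value under one instance, then look it up with an equal-looking copy."""
    data = {cls(1.0): 2.0}
    try:
        return data[cls(1.0)]
    except KeyError:
        return None


print(lookup(HashOnly))   # None: the hashes match, but the keys are not equal
print(lookup(HashAndEq))  # 2.0
```

The only difference between the two classes is __eq__, which is what decides whether the second instance counts as the same key.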
You can use the __eq__ magic method to implement an equality check on your object.
def __eq__(self, other):
    if isinstance(other, Object):
        return self.x == other.x
    return NotImplemented
You can learn more about magic methods from this link.
As stated before, your object needs to implement the __eq__ method (equality, ==). If you want to understand why:
Sometimes different objects have the same hash; this is called a collision.
A dictionary handles that by testing whether the objects are equal. If they are not, the dictionary has to manage the collision. How it does that is an implementation detail and can vary a lot. A naive implementation would be a list of (key, value) tuples.
Under the hood, a naive implementation might look like this:
dico[hash(key)] = [(object1, value1), (object2, value2)]
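The naive scheme above can be sketched as a toy hash table with one list of (key, value) pairs per bucket. ToyDict is a made-up name, and this is only an illustration of the idea; CPython's real dict uses a different collision strategy.

```python
class ToyDict:
    """Toy hash table using one list of (key, value) pairs per bucket."""

    def __init__(self, n_buckets=8):
        self._buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        # The hash only selects the bucket; it does not identify the key.
        return self._buckets[hash(key) % len(self._buckets)]

    def __setitem__(self, key, value):
        bucket = self._bucket(key)
        for i, (existing, _) in enumerate(bucket):
            if existing == key:          # equality decides whether it is the same key
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # new key: add a pair to this bucket's list

    def __getitem__(self, key):
        for existing, value in self._bucket(key):
            if existing == key:
                return value
        raise KeyError(key)


d = ToyDict()
d[1] = "one"
d[9] = "nine"      # hash(9) % 8 == hash(1) % 8, so both land in the same bucket
print(d[1], d[9])  # one nine
```

If a key type has no value-based __eq__, the `existing == key` test falls back to identity, which is exactly why the copy in the question is not found.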
I'm using set() and the __hash__ method of a Python class to prevent adding objects with the same hash to a set. According to the Python data model documentation, set() considers objects with the same hash to be the same object and adds them only once.
But it behaves different as below:
class MyClass(object):
    def __hash__(self):
        return 0

result = set()
result.add(MyClass())
result.add(MyClass())
print(len(result))  # len = 2
While in case of string value, it works correctly.
result.add('aida')
result.add('aida')
print(len(result)) # len = 1
My question is: why are objects with the same hash not treated as the same object in a set?
Your reading is incorrect. The __eq__ method is used for equality checks. The documentation only states that the __hash__ value must be the same for two objects a and b for which a == b (i.e. a.__eq__(b)) is true.
This is a common logic mistake: a == b being true implies that hash(a) == hash(b) is also true. However, an implication is not an equivalence: hash(a) == hash(b) does not, in turn, mean that a == b.
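A small sketch of that one-way implication, using a made-up class Mod10 whose hash collides on purpose:

```python
class Mod10:
    """Hypothetical value type whose hash deliberately collides."""

    def __init__(self, n):
        self.n = n

    def __eq__(self, other):
        if not isinstance(other, Mod10):
            return NotImplemented
        return self.n == other.n

    def __hash__(self):
        # Equal objects get equal hashes, but unequal objects may share one too.
        return self.n % 10


a, b, c = Mod10(3), Mod10(13), Mod10(3)
print(hash(a) == hash(b), a == b)  # True False
print(hash(a) == hash(c), a == c)  # True True
print(len({a, b, c}))              # 2
```

a and b share a hash but are both kept in the set, while a and c are equal and collapse into one element.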
To make all instances of MyClass compare equal to each other, you need to provide an __eq__ method for them; otherwise Python will compare their identities instead. This might do:
class MyClass(object):
    def __hash__(self):
        return 0
    def __eq__(self, other):
        # another object is equal to self, iff
        # it is an instance of MyClass
        return isinstance(other, MyClass)
Now:
>>> result = set()
>>> result.add(MyClass())
>>> result.add(MyClass())
>>> len(result)
1
In reality you'd base the __hash__ on those properties of your object that are used for __eq__ comparison, for example:
class Person:
    def __init__(self, name, ssn):
        self.name = name
        self.ssn = ssn
    def __eq__(self, other):
        return isinstance(other, Person) and self.ssn == other.ssn
    def __hash__(self):
        # use the hashcode of self.ssn since that is used
        # for equality checks as well
        return hash(self.ssn)

p = Person('Foo Bar', 123456789)
q = Person('Fake Name', 123456789)
print(len({p, q}))  # 1
Sets need two methods to make an object hashable: __hash__ and __eq__. Two instances must return the same hash value when they are considered equal. An instance is considered already present in a set if both the hash is present in the set and the instance is considered equal to one of the instances with that same hash in the set.
Your class doesn't implement __eq__, so the default object.__eq__ is used instead, which only returns true if obj1 is obj2 is also true. In other words, two instances are only considered equal if they are the exact same instance.
Just because their hashes match, doesn't make them unique as far as a set is concerned; even objects with different hashes can end up in the same hash table slot, as the modulus of the hash against the table size is used.
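As a rough sketch of that last point, here is how two different hashes can map to the same slot. The helper slot is hypothetical and uses a plain modulus; CPython's actual slot selection and probing are more involved.

```python
def slot(key, table_size):
    """Simplified slot computation: hash modulo the table size."""
    return hash(key) % table_size


print(hash(1) == hash(9))      # False: the hashes differ
print(slot(1, 8), slot(9, 8))  # 1 1: yet both keys map to the same slot
```

Because of this, the table always has to fall back on an equality test once it has found a candidate slot.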
Add a custom __eq__ method that returns True when two instances are supposed to be equal:
def __eq__(self, other):
    if not isinstance(other, type(self)):
        return False
    # all instances of this class are considered equal to one another
    return True
I'm coming to Python from Racket. In Racket, I would define a Point structure like this:
(struct Point (x y) #:transparent)
A point is now a structure with two fields named x and y. I can compare two structures for (deep) equality by calling equal?.
What is the equivalent in Python? It looks to me like I have to write twelve lines:
class Point():
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __eq__(self, other):
        return ((type(other) is Point)
                and self.x == other.x
                and self.y == other.y)
    def __ne__(self, other):
        return not (self == other)
... but surely there's an easier way?
Yes, well, if you need an entire class to represent your data type, then you will have to rely on the __eq__ and related dunder methods. However, in this particular case, a Pythonista would use a namedtuple:
from collections import namedtuple
Point = namedtuple('Point', ['x','y'])
Which will inherit all that from tuple.
If you don't need mutability, the simplest way to make basic classes of this sort is collections.namedtuple:
import collections
Point = collections.namedtuple('Point', 'x y')
That's it. You can just make Point objects with pt = Point(1, 2) or the like, and they work like two-tuples, but they also let you access them via named attributes, e.g. pt.x, pt.y.
The equality checking will be a little looser (Point(1, 2) == (1, 2) evaluates to True, because all namedtuples are subclasses of tuple and will compare using tuple rules, and in fact, different subclasses of tuple that don't override the comparison methods will compare equal to each other if they have the same values), but given that tuples are typically used as anonymous lightweight "classes", this is often what you want.
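A quick sketch of what that looser equality looks like in practice. Size is a made-up second namedtuple, added only to show the cross-type comparison.

```python
from collections import namedtuple

Point = namedtuple('Point', 'x y')
Size = namedtuple('Size', 'width height')  # hypothetical second type for comparison

pt = Point(1, 2)
print(pt.x, pt.y)                # 1 2
print(pt == Point(1, 2))         # True
print(pt == (1, 2))              # True: plain tuples with the same values compare equal
print(pt == Size(1, 2))          # True: so do other namedtuples
print(hash(pt) == hash((1, 2)))  # True: hashing is inherited from tuple as well
print(len({pt, Point(1, 2)}))    # 1
```

So a namedtuple gives you equality and hashing for free, at the cost of not distinguishing between types that hold the same values.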
If you need to customize some behavior (adding functionality, or make the type comparisons stricter), you can make a custom class inherit from a namedtuple to get the basic features for free, then customize the bits you care about, e.g., to prevent it testing equal to non-Point types, you can do:
class Point(collections.namedtuple('PointBase', 'x y')):
    def __eq__(self, other):
        if not isinstance(other, Point):
            return False
        return super().__eq__(other)
    # Sadly, tuple defines __ne__, so you must override it too to behave properly
    # You don't need the canonical __ne__ implementation that handles NotImplemented
    # though, since you're explicitly unfriendly to non-Point types
    def __ne__(self, other): return not (self == other)
I'm looking for the most efficient way of comparing the contents of two class instances. I have a list containing these class instances, and before appending to the list I want to determine if their property values are the same. This may seem trivial to most, but after perusing these forums I wasn't able to find anything specific to what I'm trying to do. Also note that I don't have a programming background.
This is what I have so far:
import copy

class BaseObject(object):
    def __init__(self, name=''):
        self._name = name
    def __repr__(self):
        return '<{0}: \'{1}\'>'.format(self.__class__.__name__, self._name)
    def _compare(self, other, *attributes):
        count = 0
        if isinstance(other, self.__class__):
            if len(attributes):
                for attrib in attributes:
                    if (attrib in self.__dict__.keys()) and (attrib in other.__dict__.keys()):
                        if self.__dict__[attrib] == other.__dict__[attrib]:
                            count += 1
                return (count == len(attributes))
            else:
                for attrib in self.__dict__.keys():
                    if (attrib in self.__dict__.keys()) and (attrib in other.__dict__.keys()):
                        if self.__dict__[attrib] == other.__dict__[attrib]:
                            count += 1
                return (count == len(self.__dict__.keys()))
    def _copy(self):
        return (copy.deepcopy(self))
Before adding to my list, I'd do something like:
found = False
for instance in myList:
    if instance._compare(newInstance):
        found = True
        break
if not found: myList.append(newInstance)
However, I'm unclear whether this is the most efficient or Pythonic way of comparing the contents of instances of the same class.
Implement an __eq__ special method instead:
def __eq__(self, other, *attributes):
    if not isinstance(other, type(self)):
        return NotImplemented
    if attributes:
        d = float('NaN')  # default that won't compare equal, even with itself
        return all(self.__dict__.get(a, d) == other.__dict__.get(a, d) for a in attributes)
    return self.__dict__ == other.__dict__
Now you can just use:
if newInstance in myList:
and Python will automatically use the __eq__ special method to test for equality.
In my version I retained the ability to pass in a limited set of attributes:
instance1.__eq__(instance2, 'attribute1', 'attribute2')
but using all() to make sure we only test as much as is needed.
Note that we return NotImplemented, a special singleton object to signal that the comparison is not supported; Python will ask the other object if it perhaps supports equality testing instead for that case.
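To illustrate what returning NotImplemented buys you, here is a small sketch with two made-up classes, Meters and Feet. Only Feet knows how to compare itself against Meters.

```python
class Meters:
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, Meters):
            return NotImplemented  # let Python ask the other operand
        return self.value == other.value


class Feet:
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if isinstance(other, Meters):
            # Compare with a small tolerance after converting feet to meters.
            return abs(self.value * 0.3048 - other.value) < 1e-9
        if isinstance(other, Feet):
            return self.value == other.value
        return NotImplemented


print(Meters(3.048) == Feet(10))  # True: Meters declines, so Feet.__eq__ answers
print(Meters(1) == "1")           # False: both sides decline
```

Had Meters.__eq__ returned False instead, the first comparison would have stopped there and Feet would never have been consulted.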
You can implement the comparison magic method __eq__(self, other) for your class, then simply do
if instance == newInstance:
As you apparently don't know what attributes your instance will have, you could do:
def __eq__(self, other):
    return isinstance(other, type(self)) and self.__dict__ == other.__dict__
Your method has one major flaw: if you have reference cycles with classes that both derive from BaseObject, your comparison will never finish and die with a stack overflow.
In addition, two objects of different classes but with the same attribute values compare as equal. Trivial example: any instance of BaseObject with no attributes will compare as equal to any instance of a BaseObject subclass with no attributes (because if issubclass(C, B) and a is an instance of C, then isinstance(a, B) returns True).
Finally, rather than writing a custom _compare method, just call it __eq__ and reap all the benefits of now being able to use the == operator (including containment testing in lists, container comparisons, etc.).
As a matter of personal preference, though, I'd stay away from that sort-of automatically-generated comparison, and explicitly compare explicit attributes.
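The subclass pitfall mentioned above can be sketched like this. All four class names are made up for the example, and the strict variant is just one possible way to require matching types.

```python
class Base:
    def __init__(self, name=''):
        self.name = name

    def __eq__(self, other):
        # isinstance() also accepts instances of subclasses of Base
        return isinstance(other, Base) and self.__dict__ == other.__dict__


class Derived(Base):
    pass


class StrictBase(Base):
    def __eq__(self, other):
        # Require the exact same class on both sides
        if type(other) is not type(self):
            return NotImplemented
        return self.__dict__ == other.__dict__


class StrictDerived(StrictBase):
    pass


print(Base('a') == Derived('a'))              # True
print(StrictBase('a') == StrictDerived('a'))  # False
```

Whether a subclass instance should ever equal a base class instance is a design decision; the point is only that isinstance() makes that choice for you silently.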
What is the difference between using a special method and just defining a normal class method? I was reading this site which lists a lot of them.
For example it gives a class like this.
class Word(str):
    '''Class for words, defining comparison based on word length.'''
    def __new__(cls, word):
        # Note that we have to use __new__. This is because str is an immutable
        # type, so we have to initialize it early (at creation)
        if ' ' in word:
            print("Value contains spaces. Truncating to first space.")
            word = word[:word.index(' ')]  # Word is now all chars before first space
        return str.__new__(cls, word)
    def __gt__(self, other):
        return len(self) > len(other)
    def __lt__(self, other):
        return len(self) < len(other)
    def __ge__(self, other):
        return len(self) >= len(other)
    def __le__(self, other):
        return len(self) <= len(other)
For each of those special methods why can't I just make a normal method instead, what are they doing different? I think I just need a fundamental explanation that I can't find, thanks.
It is a pythonic way to do this:
word1 = Word('first')
word2 = Word('second')
if word1 > word2:
pass
instead of calling a comparator method directly:
class NotMagicWord(str):
    def is_greater(self, other):
        return len(self) > len(other)
word1 = NotMagicWord('first')
word2 = NotMagicWord('second')
if word1.is_greater(word2):
pass
The same goes for all other magic methods. For example, you define a __len__ method to tell Python an object's length when the built-in len function is called on it. Magic methods are called implicitly during standard operations such as binary operators, calling an object, comparisons and many others. A Guide to Python's Magic Methods is really good; read it and see what behavior you can give to your objects. It is similar to operator overloading in C++, if you are familiar with that.
A method like __gt__ is called when you use comparison operators in your code. Writing something like
value1 > value2
Is the equivalent of writing
value1.__gt__(value2)
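As a small sketch of that equivalence (ByLength is a made-up str subclass, and "equivalent" is a simplification, since Python looks the method up on the type and can fall back to the other operand):

```python
class ByLength(str):
    """Hypothetical str subclass that orders instances by length."""

    def __gt__(self, other):
        return len(self) > len(other)


value1 = ByLength('apple')
value2 = ByLength('fig')

print(value1 > value2)            # True: the operator calls ByLength.__gt__
print(value1.__gt__(value2))      # True: the same call, spelled out
print(str(value1) > str(value2))  # False: plain str compares alphabetically
```

The last line shows that the operator's meaning really comes from whichever __gt__ the operand's class provides.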
"Magic methods" are used by Python to implement a lot of its underlying structure.
For example, let's say I have a simple class to represent an (x, y) coordinate pair:
class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
So, __init__ would be an example of one of these "magic methods" -- it allows me to automatically initialize the class by simply doing Point(3, 2). I could write this without using magic methods by creating my own "init" function, but then I would need to make an explicit method call to initialize my class:
class Point(object):
    def init(self, x, y):
        self.x = x
        self.y = y
        return self

p = Point().init(x, y)
Let's take another example -- if I wanted to compare two point variables, I could do:
class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __eq__(self, other):
        return self.x == other.x and self.y == other.y
This lets me compare two points by doing p1 == p2. In contrast, if I made this a normal eq method, I would have to be more explicit by doing p1.eq(p2).
Basically, magic methods are Python's way of implementing a lot of its syntactic sugar in a way that allows it to be easily customizable by programmers.
For example, I could construct a class that pretends to be a function by implementing __call__:
class Foobar(object):
    def __init__(self, a):
        self.a = a
    def __call__(self, b):
        return self.a + b

f = Foobar(3)
print(f(4))  # prints 7
Without the magic method, I would have to manually do f.call(4), which means I can no longer pretend the object is a function.
Special methods are handled specially by the rest of the Python language. For example, if you try to compare two Word instances with <, the __lt__ method of Word will be called to determine the result.
The magic methods are called when you use <, ==, > to compare the objects. functools has a helper called total_ordering that will fill in the missing comparison methods if you just define __eq__ and __gt__.
Because str already has all the comparison operations defined, it's necessary to add them as a mixin if you want to take advantage of total_ordering:
from functools import total_ordering

@total_ordering
class OrderByLen(object):
    def __eq__(self, other):
        return len(self) == len(other)
    def __ne__(self, other):
        # str defines its own __ne__, so override it to stay consistent with __eq__
        return not self == other
    def __gt__(self, other):
        return len(self) > len(other)

class Word(OrderByLen, str):
    '''Class for words, defining comparison based on word length.'''
    def __new__(cls, word):
        # Note that we have to use __new__. This is because str is an immutable
        # type, so we have to initialize it early (at creation)
        if ' ' in word:
            print("Value contains spaces. Truncating to first space.")
            word = word[:word.index(' ')]  # Word is now all chars before first space
        return str.__new__(cls, word)

print(Word('cat') < Word('dog'))        # False
print(Word('cat') > Word('dog'))        # False
print(Word('cat') == Word('dog'))       # True
print(Word('cat') <= Word('elephant'))  # True
print(Word('cat') >= Word('elephant'))  # False
I have a class which is a subclass of tuple. I want to use instances of that class as elements of a set, but I get the error that it is an unhashable type. I guess this is because I've overridden the __eq__ and __ne__ methods. What should I do to restore my type's hashability? I'm using Python 3.2.
Objects that compare equal should have the same hash value.
So it's a good idea to base the hash on the properties you are using to compare equality.
Adrien's example would be better like this:
class test(tuple):
    def __eq__(self, comp):
        return self[0] == comp[0]
    def __ne__(self, comp):
        return self[0] != comp[0]
    def __hash__(self):
        return hash((self[0],))
This simply leverages the hash of a tuple containing the stuff we care about for equality.
You will need to make your type hashable, which means implementing the __hash__() member function in your class deriving from tuple.
For example:
class test(tuple):
    def __eq__(self, comp):
        return self[0] == comp[0]
    def __ne__(self, comp):
        return self[0] != comp[0]
    def __hash__(self):
        return hash(self[0])
and this is what it looks like now:
>>> set([test([1,]),test([2,]),test([3,])])
{(1,), (2,), (3,)}
>>> hash(test([1,]))
1
note: you should absolutely read the documentation for the __hash__() function, in order to understand the relationship between the comparison operators and the hash computation.
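For reference, here is a short sketch of why the type became unhashable in the first place, assuming Python 3. EqOnly and EqAndHash are made-up names that mirror the test class above.

```python
class EqOnly(tuple):
    # Overriding __eq__ without __hash__ sets __hash__ to None in Python 3.
    def __eq__(self, other):
        return self[0] == other[0]


class EqAndHash(tuple):
    def __eq__(self, other):
        return self[0] == other[0]

    def __ne__(self, other):
        return not self == other

    def __hash__(self):
        # Hash only what __eq__ looks at, so equal objects share a hash.
        return hash(self[0])


print(EqOnly.__hash__ is None)  # True

try:
    {EqOnly([1, 2])}
except TypeError:
    print("EqOnly instances cannot be put in a set")

items = {EqAndHash([1, 2]), EqAndHash([1, 3]), EqAndHash([2, 2])}
print(len(items))  # 2: the first two compare equal and hash alike
```

Note that the hash uses only self[0], the same element __eq__ compares, which keeps the two methods consistent.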