How does Python guarantee all objects in a set are unique? [duplicate]

This question already has an answer here:
add object into python's set collection and determine by object's attribute
(1 answer)
Closed 6 years ago.
I'm using set() and the __hash__ method of a Python class to prevent objects with the same hash from being added to a set more than once. According to the Python data model documentation, set() treats objects with the same hash as the same object and adds them only once.
But it behaves differently, as shown below:
class MyClass(object):
    def __hash__(self):
        return 0

result = set()
result.add(MyClass())
result.add(MyClass())
print(len(result)) # len = 2
With string values, however, it works as expected:
result = set()
result.add('aida')
result.add('aida')
print(len(result)) # len = 1
My question is: why are objects with the same hash not treated as the same object in a set?

Your reading is incorrect. The __eq__ method is used for equality checks. The documentation only states that the __hash__ value must also be the same for two objects a and b for which a == b (i.e. a.__eq__(b)) is true.
This is a common logic mistake: a == b being true implies that hash(a) == hash(b) is also true. However, an implication is not an equivalence: hash(a) == hash(b) being true does not in turn mean that a == b.
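The one-way nature of this rule can be checked directly. The following is a minimal sketch; `Colliding` is a hypothetical class introduced only for illustration and is not part of the question.

```python
# Equal objects must share a hash: 1 == 1.0, so hash(1) == hash(1.0).
assert 1 == 1.0 and hash(1) == hash(1.0)
merged = {1, 1.0}                # treated as a single element


class Colliding:
    """Hypothetical class: every instance hashes to 0, == is the default."""

    def __hash__(self):
        return 0


a, b = Colliding(), Colliding()
same_hash = hash(a) == hash(b)   # both hash to 0
are_equal = a == b               # the default __eq__ compares identity
distinct = {a, b}                # both are kept, because a != b

print(len(merged), same_hash, are_equal, len(distinct))
```

Equal hashes only tell the set where to look; equality still decides whether an element is a duplicate.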
To make all instances of MyClass compare equal to each other, you need to provide an __eq__ method for them; otherwise Python will compare their identities instead. This might do:
class MyClass(object):
    def __hash__(self):
        return 0

    def __eq__(self, other):
        # another object is equal to self, iff
        # it is an instance of MyClass
        return isinstance(other, MyClass)
Now:
>>> result = set()
>>> result.add(MyClass())
>>> result.add(MyClass())
>>> len(result)
1
In reality you'd base the __hash__ on those properties of your object that are used for __eq__ comparison, for example:
class Person:
    def __init__(self, name, ssn):
        self.name = name
        self.ssn = ssn

    def __eq__(self, other):
        return isinstance(other, Person) and self.ssn == other.ssn

    def __hash__(self):
        # use the hashcode of self.ssn since that is used
        # for equality checks as well
        return hash(self.ssn)

p = Person('Foo Bar', 123456789)
q = Person('Fake Name', 123456789)
print(len({p, q})) # 1

Sets rely on two methods of their elements: __hash__ and __eq__. Two instances must return the same hash value when they are considered equal. An instance is considered already present in a set only if its hash is present in the set and the instance compares equal to one of the instances stored with that same hash.
Your class doesn't implement __eq__, so the default object.__eq__ is used instead, which only returns true if obj1 is obj2 is also true. In other words, two instances are only considered equal if they are the exact same instance.
Matching hashes alone don't make two objects duplicates as far as a set is concerned; even objects with different hashes can end up in the same hash table slot, because the slot is derived from the hash modulo the table size.
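To see the two-step check in action, one can log when a set consults each method. This is a sketch, assuming CPython's behaviour of comparing stored hashes before calling __eq__; `Traced` and `calls` are hypothetical names used only here. The second case deliberately breaks the hash contract to show that the hash is consulted first.

```python
calls = []  # records every __hash__ / __eq__ invocation


class Traced:
    """Hypothetical class that logs when the set consults it."""

    def __init__(self, key, hash_value):
        self.key = key
        self.hash_value = hash_value

    def __hash__(self):
        calls.append("hash")
        return self.hash_value

    def __eq__(self, other):
        calls.append("eq")
        return isinstance(other, Traced) and self.key == other.key


# Same hash, equal keys: the second add finds a match and is dropped.
s = set()
s.add(Traced("x", 1))
s.add(Traced("x", 1))
same_hash_calls = list(calls)

# Equal keys but different hashes: __eq__ is never asked, both are kept.
calls.clear()
t = set()
t.add(Traced("x", 1))
t.add(Traced("x", 2))
different_hash_calls = list(calls)

print(len(s), same_hash_calls)
print(len(t), different_hash_calls)
```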
Add a custom __eq__ method that returns True when two instances are supposed to be equal:
def __eq__(self, other):
    if not isinstance(other, type(self)):
        return False
    # all instances of this class are considered equal to one another
    return True

Related

Python: Accessing dict with hashable object fails

I am using a hashable object as a key to a dictionary. The objects are hashable and I can store key-value-pairs in the dict, but when I create a copy of the same object (that gives me the same hash), I get a KeyError.
Here is some small example code:
class Object:
    def __init__(self, x): self.x = x
    def __hash__(self): return hash(self.x)

o1 = Object(1.)
o2 = Object(1.)
hash(o1) == hash(o2) # This is True
data = {}
data[o1] = 2.
data[o2] # Desired: This should output 2.
In my scenario above, how can I achieve that data[o2] also returns 2.?

You need to implement both __hash__ and __eq__:
class Object:
    def __init__(self, x): self.x = x
    def __hash__(self): return hash(self.x)
    def __eq__(self, other): return self.x == other.x if isinstance(other, self.__class__) else NotImplemented
Per Python documentation:
if a class does not define an __eq__() method it should not define a __hash__() operation either
After finding a matching hash, Python's dictionary compares the keys using __eq__. Without a custom __eq__, that comparison falls back to identity, so the two keys are seen as different, which is why you get a KeyError instead of the stored value.
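For reference, a runnable version of the question's scenario with both methods defined might look like the sketch below. It reuses the names from the question; the extra lookup with `Object(3.0)` is added only to show that unequal keys are still reported as missing.

```python
class Object:
    def __init__(self, x):
        self.x = x

    def __hash__(self):
        return hash(self.x)

    def __eq__(self, other):
        if not isinstance(other, Object):
            return NotImplemented   # let Python try the other operand
        return self.x == other.x


o1 = Object(1.0)
o2 = Object(1.0)

data = {o1: 2.0}
print(data[o2])              # o2 is a different instance, but equal to o1
print(Object(3.0) in data)   # an unequal key is not found
```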

You can use the __eq__ magic method to implement an equality check on your object.
def __eq__(self, other):
    if isinstance(other, Object):
        return self.x == other.x
    return NotImplemented
You can learn more about magic methods from this link.

As stated before, your object needs to implement __eq__ (the equality operator ==). If you want to understand why:
Sometimes different objects have the same hash; this is called a collision.
A dictionary handles this by testing whether the objects are equal. If they are not, the dictionary has to resolve the collision. How it does that is an implementation detail and can vary a lot. A naive implementation would keep a list of (key, value) tuples.
Under the hood, such a naive implementation might look like this:
dico[key] = [(object1, value), (object2, value)]
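The naive scheme can be fleshed out into a small runnable sketch. `ToyDict` is a hypothetical class written for illustration; CPython's real dict uses a different layout (open addressing), so this only mirrors the idea of "list of (key, value) tuples per slot".

```python
class ToyDict:
    """Hypothetical chained hash table; not how CPython's dict is built."""

    def __init__(self, n_buckets=8):
        self._buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        # Different hashes can land in the same bucket after the modulo.
        return self._buckets[hash(key) % len(self._buckets)]

    def __setitem__(self, key, value):
        bucket = self._bucket(key)
        for i, (existing, _) in enumerate(bucket):
            if existing == key:          # equality decides "same key"
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # collision: keep both pairs

    def __getitem__(self, key):
        for existing, value in self._bucket(key):
            if existing == key:
                return value
        raise KeyError(key)


d = ToyDict()
d[1] = "one"
d[9] = "nine"     # hash(9) % 8 == hash(1) % 8, so both share a bucket
d[1] = "uno"      # equal key: the value is replaced, not duplicated

print(d[1], d[9], d._buckets[1])
```

Without a usable __eq__, the loop over the bucket could never recognise an equal key, and every lookup with a copy of the key would end in KeyError.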

How does one make Python objects hashable when there is nothing to distinguish them?

Let's say I want to use a set() to store a bunch of objects whose only distinction is that they exist and are not other instances of the same class. Otherwise, they are not distinguishable, e.g., no def __eq__(self, other): return self.qux == other.qux, because that qux is the same (or random) for all of them. How do you define an __eq__ and __hash__ function for that class?

You don't need to implement either __eq__ or __hash__.
User-defined classes have __eq__() and __hash__() methods by
default; with them, all objects compare unequal (except with
themselves) and x.__hash__() returns an appropriate value such that
x == y implies both that x is y and hash(x) == hash(y).
Source: Data model
The default is something like:
class OnlyExists:
    def __eq__(self, other):
        return self is other

    def __hash__(self):
        return id(self)
Because an instance is unequal to everything except itself, instances can only be found by identity. Providing even a minimal hash implementation that varies between instances (rather than returning the same hash value for every instance) means the instances don't all end up in the same "bucket", which would be a catastrophic collision and would make every dictionary/set lookup degrade to O(n).
>>> class OnlyExists:
...     pass
...
>>> a = OnlyExists()
>>> b = OnlyExists()
>>> s = {a, b}
>>> len(s)
2
>>> a in s
True
>>> b in s
True
>>> OnlyExists() in s
False
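The cost of sending every instance to the same bucket can be made visible by counting equality checks. This is a rough sketch; `Counted`, `ConstantHash` and `SpreadHash` are hypothetical classes, and the exact number of comparisons depends on the interpreter's set implementation.

```python
class Counted:
    """Hypothetical base class: counts how often __eq__ is invoked."""
    eq_calls = 0

    def __init__(self, key):
        self.key = key

    def __eq__(self, other):
        type(self).eq_calls += 1
        return isinstance(other, Counted) and self.key == other.key


class ConstantHash(Counted):
    eq_calls = 0

    def __hash__(self):
        return 0               # every instance collides


class SpreadHash(Counted):
    eq_calls = 0

    def __hash__(self):
        return hash(self.key)  # distinct keys give distinct hashes here


n = 200
bad = {ConstantHash(i) for i in range(n)}
good = {SpreadHash(i) for i in range(n)}

# With a constant hash each insert is compared against the earlier elements,
# so the total grows roughly quadratically with n.
print(len(bad), ConstantHash.eq_calls)
print(len(good), SpreadHash.eq_calls)
```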

Overriding eq and hash to compare a dict attribute of two instances

I'm struggling to understand how to correctly compare objects based on an underlying dict attribute that each instance possesses.
Since I'm overriding __eq__, do I need to override __hash__ as well? I haven't a firm grasp on when/where to do so and could really use some help.
I created a simple example below to illustrate the maximum recursion exception that I've run into. A RegionalCustomerCollection organizes account IDs by geographical region. RegionalCustomerCollection objects are said to be equal if the regions and their respective accountids are. Essentially, all items() should be equal in content.
from collections import defaultdict
class RegionalCustomerCollection(object):
    def __init__(self):
        self.region_accountids = defaultdict(set)

    def get_region_accountid(self, region_name=None):
        return self.region_accountids.get(region_name, None)

    def set_region_accountid(self, region_name, accountid):
        self.region_accountids[region_name].add(accountid)

    def __eq__(self, other):
        if (other == self):
            return True
        if isinstance(other, RegionalCustomerCollection):
            return self.region_accountids == other.region_accountids
        return False

    def __repr__(self):
        return ', '.join(["{0}: {1}".format(region, acctids)
                          for region, acctids
                          in self.region_accountids.items()])
Let's create two object instances and populate them with some sample data:
>>> a = RegionalCustomerCollection()
>>> b = RegionalCustomerCollection()
>>> a.set_region_accountid('northeast',1)
>>> a.set_region_accountid('northeast',2)
>>> a.set_region_accountid('northeast',3)
>>> a.set_region_accountid('southwest',4)
>>> a.set_region_accountid('southwest',5)
>>> b.set_region_accountid('northeast',1)
>>> b.set_region_accountid('northeast',2)
>>> b.set_region_accountid('northeast',3)
>>> b.set_region_accountid('southwest',4)
>>> b.set_region_accountid('southwest',5)
Now let's try to compare the two instances and generate the recursion exception:
>>> a == b
...
RuntimeError: maximum recursion depth exceeded while calling a Python object

Your object shouldn't return a hash because it's mutable. If you put this object into a dictionary or set and then change it afterward, you may never be able to find it again.
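A small sketch of that failure mode follows. `Box` is a hypothetical class whose hash depends on mutable state; the behaviour shown is what CPython does for this particular case.

```python
class Box:
    """Hypothetical mutable class whose hash depends on mutable state."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, Box) and self.value == other.value

    def __hash__(self):
        return hash(self.value)


b = Box(1)
s = {b}
found_before = b in s     # found under the original hash

b.value = 2               # the hash changes after insertion
found_after = b in s      # the set looks under the new hash and misses
still_stored = any(item is b for item in s)   # yet the object is still inside

print(found_before, found_after, still_stored)
```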
In order to make an object unhashable, you need to do the following:
class MyClass(object):
    __hash__ = None
This will ensure that the object is unhashable.
>>> m = MyClass()
>>> hash(m)
TypeError: unhashable type: 'MyClass'
Does this answer your question? I'm suspecting not because you were explicitly looking for a hash function.
As far as the RuntimeError you're receiving, it's because of the following line:
    if self == other:
        return True
That gets you into an infinite recursion loop. Try the following instead:
    if self is other:
        return True

You don't need to override __hash__ to compare two objects (you'll need to if you want custom hashing, i.e. to improve performance when inserting into sets or dictionaries).
Also, you have infinite recursion here:
def __eq__(self, other):
    if (other == self):
        return True
    if isinstance(other, RegionalCustomerCollection):
        return self.region_accountids == other.region_accountids
    return False
If both objects are of type RegionalCustomerCollection then you'll have infinite recursion since == calls __eq__.
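Combining the points from both answers, a corrected version of the class might look like the sketch below. Returning NotImplemented for foreign types and setting `__hash__ = None` are choices made for this illustration, not something the question requires.

```python
from collections import defaultdict


class RegionalCustomerCollection(object):
    # Mutable container: mark it unhashable explicitly.
    __hash__ = None

    def __init__(self):
        self.region_accountids = defaultdict(set)

    def set_region_accountid(self, region_name, accountid):
        self.region_accountids[region_name].add(accountid)

    def __eq__(self, other):
        if self is other:                 # identity test, no recursion
            return True
        if isinstance(other, RegionalCustomerCollection):
            return self.region_accountids == other.region_accountids
        return NotImplemented


a = RegionalCustomerCollection()
b = RegionalCustomerCollection()
for region, accountid in [('northeast', 1), ('northeast', 2), ('southwest', 4)]:
    a.set_region_accountid(region, accountid)
    b.set_region_accountid(region, accountid)

equal_before = a == b
b.set_region_accountid('southwest', 5)
equal_after = a == b

print(equal_before, equal_after)
```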

Most efficient way of comparing the contents of two class instances in python

I'm looking for the most efficient way of comparing the contents of two class instances. I have a list containing these class instances, and before appending to the list I want to determine if their property values are the same. This may seem trivial to most, but after perusing these forums I wasn't able to find anything specific to what I'm trying to do. Also note that I don't have a programming background.
This is what I have so far:
import copy

class BaseObject(object):
    def __init__(self, name=''):
        self._name = name

    def __repr__(self):
        return '<{0}: \'{1}\'>'.format(self.__class__.__name__, self._name)

    def _compare(self, other, *attributes):
        count = 0
        if isinstance(other, self.__class__):
            if len(attributes):
                for attrib in attributes:
                    if (attrib in self.__dict__.keys()) and (attrib in other.__dict__.keys()):
                        if self.__dict__[attrib] == other.__dict__[attrib]:
                            count += 1
                return (count == len(attributes))
            else:
                for attrib in self.__dict__.keys():
                    if (attrib in self.__dict__.keys()) and (attrib in other.__dict__.keys()):
                        if self.__dict__[attrib] == other.__dict__[attrib]:
                            count += 1
                return (count == len(self.__dict__.keys()))

    def _copy(self):
        return (copy.deepcopy(self))
Before adding to my list, I'd do something like:
found = False
for instance in myList:
    if instance._compare(newInstance):
        found = True
        break
if not found: myList.append(newInstance)
However, I'm unclear whether this is the most efficient or Pythonic way of comparing the contents of instances of the same class.

Implement a __eq__ special method instead:
def __eq__(self, other, *attributes):
    if not isinstance(other, type(self)):
        return NotImplemented
    if attributes:
        d = float('NaN') # default that won't compare equal, even with itself
        return all(self.__dict__.get(a, d) == other.__dict__.get(a, d) for a in attributes)
    return self.__dict__ == other.__dict__
Now you can just use:
if newInstance in myList:
and Python will automatically use the __eq__ special method to test for equality.
In my version I retained the ability to pass in a limited set of attributes:
instance1.__eq__(instance2, 'attribute1', 'attribute2')
but using all() to make sure we only test as much as is needed.
Note that we return NotImplemented, a special singleton object to signal that the comparison is not supported; Python will ask the other object if it perhaps supports equality testing instead for that case.
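The effect of returning NotImplemented can be seen in a short sketch. `Point` and `MatchesAnything` are hypothetical classes used only for this illustration.

```python
class Point:
    """Hypothetical class used only for this illustration."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented      # let Python try the other operand
        return (self.x, self.y) == (other.x, other.y)


class MatchesAnything:
    """Hypothetical class that claims equality with everything."""

    def __eq__(self, other):
        return True


points = [Point(1, 2), Point(3, 4)]

in_list = Point(1, 2) in points                    # list membership uses ==
vs_string = Point(1, 2) == "not a point"           # both sides decline
vs_wildcard = Point(1, 2) == MatchesAnything()     # reflected __eq__ answers

print(in_list, vs_string, vs_wildcard)
```

When both operands decline, Python falls back to an identity comparison, which is why the comparison with a string is simply False rather than an error.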

You can implement the comparison magic method __eq__(self, other) for your class, then simply do
if instance == newInstance:
As you apparently don't know what attributes your instance will have, you could do:
def __eq__(self, other):
    return isinstance(other, type(self)) and self.__dict__ == other.__dict__

Your method has one major flaw: if you have reference cycles with classes that both derive from BaseObject, your comparison will never finish and die with a stack overflow.
In addition, two objects of different classes but with the same attribute values compare as equal. Trivial example: any instance of BaseObject with no attributes will compare as equal to any instance of a BaseObject subclass with no attributes (because if issubclass(C, B) and a is an instance of C, then isinstance(a, B) returns True).
Finally, rather than writing a custom _compare method, just call it __eq__ and reap all the benefits of now being able to use the == operator (including contain testing in lists, container comparisons, etc.).
As a matter of personal preference, though, I'd stay away from that sort-of automatically-generated comparison, and explicitly compare explicit attributes.
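One way to read that advice is sketched below, assuming Python 3. It compares a single named attribute explicitly and requires the exact same class, so a subclass instance never equals a base instance. The `Derived` class and the choice of `_name` as the compared attribute are assumptions made for the example.

```python
class BaseObject(object):
    def __init__(self, name=''):
        self._name = name

    def __eq__(self, other):
        # Exact type match: a subclass instance is not equal to a base one.
        if type(other) is not type(self):
            return NotImplemented
        return self._name == other._name

    def __hash__(self):
        # Based on the same attribute that __eq__ compares.
        return hash(self._name)


class Derived(BaseObject):
    pass


my_list = [BaseObject('a')]
for candidate in (BaseObject('a'), Derived('a'), BaseObject('b')):
    if candidate not in my_list:      # uses __eq__ under the hood
        my_list.append(candidate)

print(len(my_list))
```

Only the duplicate `BaseObject('a')` is skipped; the `Derived('a')` instance is kept because its class differs.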

How to make a subclass of tuple hashable in Python?

I have a class which is a subclass of tuple. I want to use instances of that class as elements of a set, but I get the error that it is an unhashable type. I guess this is because I've overridden the __eq__ and __ne__ methods. What should I do to restore my type's hashability? I'm using Python 3.2.

Objects that compare equal should have the same hash value, so it's a good idea to base the hash on the properties you use to compare equality.
Adrien's example would be better like this:
class test(tuple):
    def __eq__(self,comp):
        return self[0] == comp[0]

    def __ne__(self,comp):
        return self[0] != comp[0]

    def __hash__(self):
        return hash((self[0],))
This simply leverages the hash of a tuple containing the values we care about for equality.

You will need to make your type hashable, which means implementing the __hash__() method in your class deriving from tuple.
For example:
class test(tuple):
    def __eq__(self,comp):
        return self[0] == comp[0]

    def __ne__(self,comp):
        return self[0] != comp[0]

    def __hash__(self):
        return hash(self[0])
and this is what it looks like now:
>>> set([test([1,]),test([2,]),test([3,])])
{(1,), (2,), (3,)}
>>> hash(test([1,]))
1
note: you should absolutely read the documentation for the __hash__() function, in order to understand the relationship between the comparison operators and the hash computation.
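The reason the subclass became unhashable in the first place is that, in Python 3, a class that defines __eq__ without defining __hash__ has its __hash__ implicitly set to None. The sketch below demonstrates this; `EqOnly` and `EqAndHash` are hypothetical names.

```python
class EqOnly(tuple):
    """Hypothetical subclass: overrides __eq__ but not __hash__."""

    def __eq__(self, other):
        return self[0] == other[0]


class EqAndHash(tuple):
    """Hypothetical subclass: keeps __hash__ consistent with __eq__."""

    def __eq__(self, other):
        return self[0] == other[0]

    def __hash__(self):
        return hash(self[0])


print(EqOnly.__hash__)   # None: the inherited tuple hash was removed

try:
    {EqOnly([1, 2])}
except TypeError as exc:
    print(exc)

# Elements are equal when their first items match, so only two survive.
unique = {EqAndHash([1, 2]), EqAndHash([1, 3]), EqAndHash([2, 0])}
print(len(unique))
```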
