How can I make a dataclass hash the same as a string? - python

I want to replace the string keys in the dictionaries in my code with a dataclass so that I can attach metadata to the keys for debugging. However, I still want to be able to use a plain string to look up entries. I tried implementing a dataclass with a custom __hash__ method, but my code is not working as expected:
from dataclasses import dataclass
@dataclass(eq=True, frozen=True)
class Key:
    name: str
    def __hash__(self):
        return hash(self.name)
k = "foo"
foo = Key(name=k)
d = {}
d[foo] = 1
print(d[k])  # KeyError
The two hash values are the same:
print(hash(k) == hash(foo)) # True
So I don't understand why this doesn't work.

Two objects having different hashes guarantees that they're different, but two objects having the same hash doesn't in itself guarantee that they're the same (because hash collisions exist). If you want the Key to be considered equal to a corresponding str, implement that in __eq__:
    def __eq__(self, other):
        if isinstance(other, Key):
            return self.name == other.name
        if isinstance(other, str):
            return self.name == other
        return False
This fixes the KeyError you're encountering.
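Putting the question's class and the __eq__ above together gives the following minimal sketch. The note field is a hypothetical metadata field added only for illustration; it takes part in neither hashing nor equality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Key:
    name: str
    note: str = ""  # hypothetical metadata field, ignored by __hash__/__eq__

    def __hash__(self):
        # Hash exactly like the underlying string.
        return hash(self.name)

    def __eq__(self, other):
        if isinstance(other, Key):
            return self.name == other.name
        if isinstance(other, str):
            return self.name == other
        return False


d = {Key("foo", note="created by the loader"): 1}
print(d["foo"])        # the plain string finds the Key entry
print(Key("foo") in d)
```

Because the class defines __hash__ and __eq__ itself, the dataclass decorator leaves both methods alone.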

Adding my notes here from the comments on the answer above, since comments are easy to overlook and may get cleaned up at some point.
PyCharm also produces a helpful warning:
'eq' is ignored if the class already defines '__eq__' method.
I think this means the eq=True argument should also be removed from the @dataclass(...) decorator.
Technically, you could also remove the last if isinstance(..., str): check as well as the last return statement. I'm not entirely sure what the implications of that would be, however.
Here, then, is a slightly more optimized approach (timings with the timeit module below):
@dataclass(frozen=True)
class Key:
    name: str
    def __hash__(self):
        return hash(self.name)
    def __eq__(self, other):
        return self.name == getattr(other, 'name', other)
Timings with timeit
from dataclasses import dataclass
from timeit import timeit
@dataclass(frozen=True)
class Key:
    name: str
    def __hash__(self):
        return hash(self.name)
    def __eq__(self, other):
        if isinstance(other, Key):
            return self.name == other.name
        if isinstance(other, str):
            return self.name == other
        return False
class KeyTwo(Key):
    def __eq__(self, other):
        return self.name == getattr(other, 'name', other)
k = "foo"
foo = Key(name=k)
foo_two = KeyTwo(name=k)
print('__eq__() Timings --')
print('isinstance(): ', timeit("foo == k", globals=globals()))
print('getattr(): ', timeit("foo_two == k", globals=globals()))
assert foo == foo_two == k
Results on my M1 Mac:
__eq__() Timings --
isinstance(): 0.10553250007797033
getattr(): 0.08371329202782363
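On the open question about the implications of the getattr() shortcut, one consequence can be shown with a small sketch: the key then compares equal to any object that happens to have a matching name attribute. The Unrelated class below is hypothetical and exists only to demonstrate this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Key:
    name: str

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, other):
        # Duck-typed: anything with a .name attribute is compared by that name.
        return self.name == getattr(other, "name", other)


class Unrelated:
    """Hypothetical class that merely happens to have a .name attribute."""

    def __init__(self, name):
        self.name = name


key = Key("foo")
print(key == "foo")             # strings still match
print(key == Unrelated("foo"))  # so does any object whose .name is "foo"
print(key == 42)                # objects without .name are compared directly
```

Whether that looser equality is acceptable depends on what else ends up being compared against the keys; the isinstance() version is stricter at a small cost in speed.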

Related

return NotImplemented but when printed, I get False

Here is a minimal reproducible example:
class Attribut:
    def __init__(
        self,
        name: str,
        other_name: str,
    ):
        self.name: str = name
        self.other_name: str = other_name
    def __eq__(self, other):
        if isinstance(other, Attribut):
            return self.name == other.name and self.other_name == other.other_name
        else:
            return NotImplemented
    def __hash__(self):
        return 0
If I try to do:
a = Attribut("lol", "a")
print(a==4)
I thought I would get NotImplemented, but instead I get False.
EDIT: (following chepner's answer)
Comparing an object of one class to an object of another class, instead of comparing it to an integer:
class Attribut:
    def __init__(
        self,
        name: str,
        other_name: str,
    ):
        self.name: str = name
        self.other_name: str = other_name
    def __eq__(self, other):
        if isinstance(other, Attribut):
            return self.name == other.name and self.other_name == other.other_name
        else:
            return NotImplemented
    def __hash__(self):
        return 0
class Attribut2:
    def __init__(
        self,
        name: str,
        other_name: str,
    ):
        self.name: str = name
        self.other_name: str = other_name
    def __eq__(self, other):
        if isinstance(other, Attribut2):
            return self.name == other.name and self.other_name == other.other_name
        else:
            return NotImplemented
    def __hash__(self):
        return 1
a = Attribut("lol", "a")
b = Attribut2("lol", "b")
print(a==b)
I also get False.
Also, what is the point of overriding __hash__, I cannot find a situation where this is useful?
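When a.__eq__(4) returns NotImplemented, you don't get the value back immediately. Instead, Python attempts to call the reflected version of __eq__ (__eq__ itself) with the arguments swapped, namely (4).__eq__(a). That call also returns NotImplemented, because int does not know how to compare itself with an Attribut, so Python falls back to comparing the two objects by identity, and that is what produces False.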
By returning NotImplemented, you are not saying that self and other cannot be compared for equality (for that, raise a ValueError), but rather that Attribut.__eq__ does not know how to do so, but perhaps the other argument's __eq__ method does.
In your two-class example, we have the following:
Attribut.__eq__ returns NotImplemented, so Attribut2.__eq__ (since self and other have different types) is tried.
Attribut2.__eq__ returns NotImplemented, so what gets tried next?
Since calling Attribut.__eq__ again would just put us in an endless cycle, I think that Python falls back to object.__eq__ instead (which can compare any two objects via object identity), but this is not obvious from the descriptions of either NotImplemented or object.__eq__ itself. (This is supported by the fact that if you define both methods as
    def __eq__(self, other):
        return NotImplemented
and attempt to evaluate a == a, it evaluates to True.)
The closest thing to documentation for this that I can find is the following sentence from the description of NotImplemented:
(The interpreter will then try the reflected operation, or some other fallback, depending on the operator.)
I suspect that whether or not A.__eq__ is considered the reflection of B.__eq__ depends on the context in which B.__eq__ is called. That is, A.__eq__ is not the reflection of B.__eq__ in the case where B.__eq__ was just called as the reflection of A.__eq__.
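The sequence described above can be observed with a small tracing sketch. The classes A and B and the calls list are hypothetical names used only for illustration, and the behaviour shown is what CPython does.

```python
calls = []


class A:
    def __eq__(self, other):
        calls.append("A.__eq__")
        return NotImplemented


class B:
    def __eq__(self, other):
        calls.append("B.__eq__")
        return NotImplemented


a, b = A(), B()

result = a == b       # A.__eq__ is tried first, then the reflected B.__eq__
trace = list(calls)   # snapshot of the calls made by that single comparison
print(trace, result)

same = a == a         # both attempts give NotImplemented, identity decides
print(same)

print((4).__eq__(a))  # int does not know how to compare with A either
```

When every attempt returns NotImplemented, == ends up comparing by identity, which is why a == b is False and a == a is True.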
Overriding __hash__ lets you use your Attribut objects in hash-based containers, that is, as members of a set or as keys of a dict.
To test it, create a script that builds a list of such objects
a = [Attribut("lol", "a"), Attribut("lol", "a"), Attribut("lol", "a")]
and then try to make a set out of it
set(a)
If __hash__ is implemented in the Attribut class, there is no problem. If it is not (while __eq__ is defined), the program raises TypeError: unhashable type: 'Attribut'
However, the hash is only used by Python's hash tables, and its sole purpose is to turn an object into a number.
Refer to these articles for more information:
How does hash-table in set works in python?
What does "hashable" mean in Python?
Also, more than one object can have the same hash. In that case, code that compares two instances (as set does) goes further and calls __eq__, which here checks each field of the class. For any of this to be possible, the first step must be in place: the class must be hashable.
The hash should also be equal for objects that should compare equal. When a set or dict looks up an object, Python first checks whether the hashes match; if they do not, the objects are treated as different, no matter whether they would compare 'equal'.
You can try this code to check the behaviour. Comment/uncomment the lines in the body of __hash__ to switch between the same hash and different hashes:
from random import random
class Attribut:
    def __init__(
        self,
        name: str,
        other_name: str,
    ):
        self.name: str = name
        self.other_name: str = other_name
    def __eq__(self, other):
        if isinstance(other, Attribut):
            return self.name == other.name and self.other_name == other.other_name
        else:
            return NotImplemented
    def __hash__(self):
        # Same hash
        # return 0
        # Different hash
        return hash(random())
c = [Attribut("lol", "a"), Attribut("lol", "a")]
print(set(c))
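As a complement to the experiment above, here is a small sketch of the two stable cases: a hash that is consistent with __eq__, and no __hash__ at all. The class names WithHash and WithoutHash are hypothetical.

```python
class WithHash:
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        if isinstance(other, WithHash):
            return self.name == other.name
        return NotImplemented

    def __hash__(self):
        # Equal objects produce equal hashes.
        return hash(self.name)


class WithoutHash:
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        if isinstance(other, WithoutHash):
            return self.name == other.name
        return NotImplemented
    # No __hash__: Python 3 sets __hash__ to None when only __eq__ is defined.


unique = {WithHash("lol"), WithHash("lol")}
print(len(unique))  # the two equal objects collapse into one set member

try:
    {WithoutHash("lol")}
except TypeError as exc:
    print(exc)      # unhashable type: 'WithoutHash'
```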

Equality in dunder method

I'd like to compare two objects of the same type for equality with the dunder method __eq__. Every object stores values for "word", "pronunciation", "weight", and "source", and two objects are equal when all of these are the same.
My solution looks like the following and works but it feels clunky and I am sure that there is a better way.
    def __eq__(self, other):
        if self.check_other(other):  # checks whether both objects are instances of LexicalEntity
            return_bool = True
            if self.word != other.get_word():
                return_bool = False
            if self.weight != other.get_weight():
                return_bool = False
            if self.source != other.get_source():
                return_bool = False
            if self.pron != other.get_pron():
                return_bool = False
            return return_bool
Thanks for your help.
For starters, dispense with getters and setters in Python. That will make your code much less clunky and more idiomatic, i.e., you don't need other.get_word(), you just need other.word, and remove your definition of get_word, it is useless. Python != Java.
So, then for something like this, a typical implementation would be:
    def __eq__(self, other):
        if isinstance(other, LexicalEntity):
            these_values = self.word, self.weight, self.source, self.pron
            other_values = other.word, other.weight, other.source, other.pron
            return these_values == other_values
        return NotImplemented  # important, you don't want to return None
Alternatively, you might also just use one long boolean expression:
    def __eq__(self, other):
        if isinstance(other, LexicalEntity):
            return (
                self.word == other.word and self.weight == other.weight
                and self.source == other.source and self.pron == other.pron
            )
        return NotImplemented
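For completeness, here is a runnable sketch of the tuple-comparison variant inside a full class. The constructor signature and the _key helper are assumptions made for illustration; the question does not show the rest of LexicalEntity.

```python
class LexicalEntity:
    # Assumed constructor; the original class definition is not shown.
    def __init__(self, word, pron, weight, source):
        self.word = word
        self.pron = pron
        self.weight = weight
        self.source = source

    def _key(self):
        # Hypothetical helper: the values that define equality.
        return (self.word, self.pron, self.weight, self.source)

    def __eq__(self, other):
        if isinstance(other, LexicalEntity):
            return self._key() == other._key()
        return NotImplemented  # let Python try the other operand


a = LexicalEntity("tomato", "t@meIt@U", 1.0, "dict")
b = LexicalEntity("tomato", "t@meIt@U", 1.0, "dict")
c = LexicalEntity("tomato", "t@mA:t@U", 1.0, "dict")

print(a == b)         # all four values match
print(a == c)         # pronunciation differs
print(a == "tomato")  # unrelated type: falls back to identity, so False
```

Note that a class defining __eq__ without __hash__ is unhashable in Python 3, so a matching __hash__ would be needed before storing such objects in a set or using them as dict keys.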
I think this is maybe a little more readable:
    def __eq__(self, other):
        if self.check_other(other):
            attrs = ["word", "weight", "source", "pron"]
            return all(getattr(self, attr) == getattr(other, attr) for attr in attrs)
        return NotImplemented
But I guess it's a matter of preference whether we want the more readable or the more clever solution.
Getters and setters don't make much sense in Python. If you do have important validations, use the @property decorator instead; if you're just doing this for data encapsulation, Python's principles are much looser in that respect, so just ditch the getters/setters.
As for checking equality, if you want to avoid referring to each attribute manually, the reflection below is applicable to virtually any case:
    def __eq__(self, other):
        if isinstance(other, self.__class__):
            attrs = [
                a for a in dir(self) if not a.startswith('_') and not callable(getattr(self, a))
            ]
            return all([getattr(self, attr) == getattr(other, attr) for attr in attrs])
        return NotImplemented
As @juanpa.arrivillaga already mentioned, returning NotImplemented (not the same as raising NotImplementedError, as noted in the comments below) is important because, if other is from a different class, it stops you from returning None in the equality check. A better explanation of why returning NotImplemented is the fallback in these cases is found in this answer.

Implementing hierarchy for Enum members

I would like to establish a hierarchy for the members of my Enum. My (simplified) enum aims at representing different types of food. Of course, everyone knows a burger is "superior" to a pizza and my enum needs to convey this idea:
from functools import total_ordering
from enum import IntEnum, unique
@unique
@total_ordering
class FoodType(IntEnum):
    PIZZA = 100
    COOKIE = 200
    STEAK = 300
    BURGER = 400
    def __lt__(self, other):
        if self.__class__ is other.__class__:
            return self.FOOD_HIERARCHY.index(self) < self.FOOD_HIERARCHY.index(other)
        return NotImplemented
    def __gt__(self, other):
        if self.__class__ is other.__class__:
            return self.FOOD_HIERARCHY.index(self) > self.FOOD_HIERARCHY.index(other)
        return NotImplemented
    def __eq__(self, other):
        if self.__class__ is other.__class__:
            return self.FOOD_HIERARCHY.index(self) == self.FOOD_HIERARCHY.index(other)
        return NotImplemented
# Order is important here; smallest entity first
FoodType.FOOD_HIERARCHY = [
    FoodType.COOKIE,
    FoodType.STEAK,
    FoodType.PIZZA,
    FoodType.BURGER,
]
Here my food types are arbitrary integers. They need to be integers for reasons outside of the scope of this question. I also can't use the integer values for comparison, nor the order of definition of the food types. That is why I create the hierarchy of FoodType outside the enums, and make it an attribute of the Enum after the definition.
I would like to use the positions of the food types (aka indexes) to implement the comparison methods.
However when I run a simple comparison on two of the FoodType mentioned above, I get a recursion error:
In [2]: from test import FoodType
In [3]: FoodType.PIZZA < FoodType.BURGER
---------------------------------------------------------------------------
RecursionError Traceback (most recent call last)
<ipython-input-3-1880a19bb0cd> in <module>
----> 1 FoodType.PIZZA < FoodType.BURGER
~/projects/test.py in __lt__(self, other)
13 def __lt__(self, other):
14 if self.__class__ is other.__class__:
---> 15 return self.FOOD_HIERARCHY.index(self) < self.FOOD_HIERARCHY.index(other)
16 return NotImplemented
17
~/projects//test.py in __eq__(self, other)
23 def __eq__(self, other):
24 if self.__class__ is other.__class__:
---> 25 return self.FOOD_HIERARCHY.index(self) == self.FOOD_HIERARCHY.index(other)
26 return NotImplemented
27
... last 1 frames repeated, from the frame below ...
~/projects/test.py in __eq__(self, other)
23 def __eq__(self, other):
24 if self.__class__ is other.__class__:
---> 25 return self.FOOD_HIERARCHY.index(self) == self.FOOD_HIERARCHY.index(other)
26 return NotImplemented
27
RecursionError: maximum recursion depth exceeded while calling a Python object
I can't figure out why I get a recursion error. If I use the enum values to build the hierarchy and to look up the indexes, I can make this code work, but I would like to avoid that if possible.
Any idea why I get the recursion error and how I could make this code more elegant?
EDIT: as people mentioned in the comments, I do override __eq__, __lt__ and __gt__. I wouldn't normally have done so, but in my real-life example I have two different hierarchies, and some enum members can be in both. So I first need to check that the two enum members I'm comparing are in the same hierarchy. That said, I can probably use super(). Thanks for the observation.
EDIT 2:
Based on @Ethan Furman's answer, here is what the final code looks like:
from enum import IntEnum, unique
def hierarchy(hierarchy_name, member_names):
    def decorate(enum_cls):
        for name in enum_cls.__members__:
            if not hasattr(enum_cls[name], "ordering"):
                enum_cls[name].ordering = {}
        for i, name in enumerate(member_names.split()):
            # FIXME, check if name in __members__
            # FIXME, shouldn't exist yet, check!
            enum_cls[name].ordering[hierarchy_name] = i
        return enum_cls
    return decorate
@hierarchy("food_hierarchy", "COOKIE STEAK PIZZA BURGER")
@unique
class FoodType(IntEnum):
    PIZZA = 100
    COOKIE = 200
    STEAK = 300
    BURGER = 400
    def __lt__(self, other) -> bool:
        if self.__class__ is other.__class__:
            try:
                hierarchy = (self.ordering.keys() & other.ordering.keys()).pop()
            except KeyError:
                raise ValueError("uncomparable, hierarchies don't overlap")
            return self.ordering[hierarchy] < other.ordering[hierarchy]
        return NotImplemented
    def __eq__(self, other) -> bool:
        if self.__class__ is other.__class__:
            return int(self) == int(other)
        return NotImplemented
The recursion error is not important as your design is flawed:
total_ordering is useless/harmful because IntEnum is an int and ints already have total ordering
the food items, being ints will compare with other ints
not properly comparing with other ints will be a hard-to-find bug at some point
Possible solutions:
add an extra attribute to each member to control food ordering
(optional) make FoodType be a normal Enum and add an __int__ method to easily convert to int (and keep total_ordering)
The extra attribute can be done in one of two ways:
defined with the member
added afterwards
Defined with the member could easily be confusing:
class FoodType(IntEnum):
    PIZZA = 100, 3
    COOKIE = 200, 1
    STEAK = 300, 2
    BURGER = 400, 4
So I would do it as a decorator:
@add_order('COOKIE STEAK PIZZA BURGER')
class FoodType(IntEnum):
    PIZZA = 100
    COOKIE = 200
    STEAK = 300
    BURGER = 400
If FoodType becomes an Enum you can still use total_ordering, otherwise you should use different methods for comparison; if you don't then you'll have 100 (PIZZA) not < 101 (a normal int) which will be a bug at some point -- an easy example being FoodTypes and ints both being keys in the same dict().
The decorator and __lt__ would look like:
def add_order(member_names):
    def decorate(enum_cls):
        for i, name in enumerate(member_names.split()):
            enum_cls[name].order = i
        return enum_cls
    return decorate
class FoodType(IntEnum):
    ...
    def __lt__(self, other):
        if isinstance(other, self.__class__):
            return self.order < other.order
        return NotImplemented
N.B. total_ordering had a bug regarding NotImplemented which was fixed in 3.4, and somewhere in 2.7. Make sure your version works properly if using 2.7 (or just add the comparison methods yourself).
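A runnable sketch of the decorator idea follows. It assumes add_order is written as a decorator factory (so that it can be applied with an argument) and overrides only __lt__, to show why the remaining comparison methods need attention on an IntEnum as well.

```python
from enum import IntEnum, unique


def add_order(member_names):
    """Assumed decorator factory: store each member's rank in .order."""
    def decorate(enum_cls):
        for i, name in enumerate(member_names.split()):
            enum_cls[name].order = i
        return enum_cls
    return decorate


@add_order("COOKIE STEAK PIZZA BURGER")
@unique
class FoodType(IntEnum):
    PIZZA = 100
    COOKIE = 200
    STEAK = 300
    BURGER = 400

    def __lt__(self, other):
        if isinstance(other, self.__class__):
            return self.order < other.order
        return NotImplemented


print(FoodType.COOKIE < FoodType.PIZZA)  # uses .order: 0 < 2
print(FoodType.COOKIE > FoodType.PIZZA)  # still int's __gt__: 200 > 100
print(FoodType.PIZZA < 101)              # falls back to int comparison
```

In this sketch COOKIE < PIZZA and COOKIE > PIZZA are both true, because only __lt__ uses the custom order while __gt__ is still the one inherited from int. That inconsistency is the reason for either overriding every comparison method or switching to a plain Enum.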
You get a recursion error because, in order to determine the index, the list elements need to be compared for equality, which in turn invokes __eq__.
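That list.index() relies on __eq__ can be checked with a small counting sketch; the Noisy class is a hypothetical stand-in for the enum members.

```python
class Noisy:
    calls = 0  # how many times __eq__ has been invoked

    def __init__(self, tag):
        self.tag = tag

    def __eq__(self, other):
        Noisy.calls += 1
        return isinstance(other, Noisy) and self.tag == other.tag


items = [Noisy("a"), Noisy("b"), Noisy("c")]
position = items.index(items[2])

print(position)     # 2
print(Noisy.calls)  # __eq__ ran for the elements scanned before the match
```

In CPython the matching element itself is recognised by identity, so __eq__ runs only for the elements before it. In the question, each of those __eq__ calls invokes index() again, which is where the recursion comes from.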
Alternatively you could use a mapping from the enum members to some ordering, e.g.:
FoodType.FOOD_HIERARCHIES = [
    {FoodType.COOKIE: 1, FoodType.PIZZA: 2, FoodType.BURGER: 3},
    {FoodType.STEAK: 1, FoodType.BURGER: 2},
]
This requires making the enum hashable:
    def __hash__(self):
        return hash(self._name_)
This works because the dictionary lookup checks for object identity before considering __eq__.
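The identity shortcut can be illustrated with a sketch in which __eq__ must never run. The Member class is hypothetical, and the behaviour shown is that of CPython's dict.

```python
class Member:
    def __init__(self, name):
        self._name_ = name

    def __hash__(self):
        return hash(self._name_)

    def __eq__(self, other):
        # If the dict ever called this, the lookup below would fail loudly.
        raise RuntimeError("__eq__ should not be reached")


cookie = Member("COOKIE")
ranks = {cookie: 1}

print(ranks[cookie])    # found via hash plus identity, __eq__ is not called
print(cookie in ranks)
```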
Since total_ordering won't replace the methods inherited from the base class, you'd need to override all comparison methods (or inherit from Enum instead of IntEnum):
from enum import IntEnum, unique
import operator
@unique
class FoodType(IntEnum):
    PIZZA = 100
    COOKIE = 200
    STEAK = 300
    BURGER = 400
    def __hash__(self):
        return hash(self._name_)
    def __lt__(self, other):
        return self._compare(other, operator.lt)
    def __le__(self, other):
        return self._compare(other, operator.le)
    def __gt__(self, other):
        return self._compare(other, operator.gt)
    def __ge__(self, other):
        return self._compare(other, operator.ge)
    def __eq__(self, other):
        return self._compare(other, operator.eq)
    def __ne__(self, other):
        return self._compare(other, operator.ne)
    def _compare(self, other, op):
        if self.__class__ is other.__class__:
            hierarchy = next(h for h in self.FOOD_HIERARCHIES if self in h)
            try:
                return op(hierarchy[self], hierarchy[other])
            except KeyError:
                return False  # or: return NotImplemented
        return NotImplemented
FoodType.FOOD_HIERARCHIES = [
    {FoodType.COOKIE: 1, FoodType.PIZZA: 2, FoodType.BURGER: 3},
    {FoodType.STEAK: 1, FoodType.BURGER: 2},
]
print(FoodType.COOKIE < FoodType.PIZZA)  # True
print(FoodType.STEAK > FoodType.BURGER)  # False
print(FoodType.STEAK < FoodType.PIZZA)  # False

Unexpected behavior for python set.__contains__

Quoting the docstring of set.__contains__:
print set.__contains__.__doc__
x.__contains__(y) <==> y in x.
This seems to work fine for primitive objects such as int, basestring, etc. But for user-defined objects that define the __ne__ and __eq__ methods, I get unexpected behavior. Here is some sample code:
class CA(object):
    def __init__(self,name):
        self.name = name
    def __eq__(self,other):
        if self.name == other.name:
            return True
        return False
    def __ne__(self,other):
        return not self.__eq__(other)
obj1 = CA('hello')
obj2 = CA('hello')
theList = [obj1,]
theSet = set(theList)
# Test 1: list
print (obj2 in theList) # return True
# Test 2: set (weird)
print (obj2 in theSet) # return False, unexpected
# Test 3: iterating over the set
found = False
for x in theSet:
    if x == obj2:
        found = True
print found # return True
# Test 4: Typecasting the set to a list
print (obj2 in list(theSet)) # return True
So is this a bug or a feature?
For sets and dicts, you need to define __hash__. Any two objects that are equal should hash the same in order to get consistent / expected behavior in sets and dicts.
I would recommend using a _key method, and then referencing it anywhere you need the part of the item to compare, just as you call __eq__ from __ne__ instead of reimplementing it:
class CA(object):
    def __init__(self,name):
        self.name = name
    def _key(self):
        return type(self), self.name
    def __hash__(self):
        return hash(self._key())
    def __eq__(self,other):
        if self._key() == other._key():
            return True
        return False
    def __ne__(self,other):
        return not self.__eq__(other)
This is because CA doesn't implement __hash__.
A sensible implementation would be:
    def __hash__(self):
        return hash(self.name)
A set hashes its elements to allow fast lookup. You have to override the __hash__ method so that an element can be found:
class CA(object):
    def __hash__(self):
        return hash(self.name)
Lists don't use hashing, but compare each element like your for loop does.
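A minimal Python 3 sketch of the fixed class follows, assuming equality and hashing are both based on name. The isinstance() check is an addition for safety and is not part of the question's code.

```python
class CA:
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        if isinstance(other, CA):
            return self.name == other.name
        return NotImplemented

    def __hash__(self):
        # Consistent with __eq__: equal names give equal hashes.
        return hash(self.name)


obj1 = CA("hello")
obj2 = CA("hello")

print(obj2 in [obj1])  # list membership compares element by element
print(obj2 in {obj1})  # set membership now finds the equal element via its hash
```

In Python 3 a class that defines __eq__ without __hash__ is unhashable, so the original code would raise a TypeError when building the set instead of silently returning False.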

Python: Is this an ok way of overriding __eq__ and __hash__?

I'm new to Python, and I wanted to make sure that I overrode __eq__ and __hash__ correctly, so as not to cause painful errors later:
(I'm using Google App Engine.)
class Course(db.Model):
    dept_code = db.StringProperty()
    number = db.IntegerProperty()
    title = db.StringProperty()
    raw_pre_reqs = db.StringProperty(multiline=True)
    original_description = db.StringProperty()
    def getPreReqs(self):
        return pickle.loads(str(self.raw_pre_reqs))
    def __repr__(self):
        title_msg = self.title if self.title else "Untitled"
        return "%s %s: %s" % (self.dept_code, self.number, title_msg)
    def __attrs(self):
        return (self.dept_code, self.number, self.title, self.raw_pre_reqs, self.original_description)
    def __eq__(self, other):
        return isinstance(other, Course) and self.__attrs() == other.__attrs()
    def __hash__(self):
        return hash(self.__attrs())
A slightly more complicated type:
class DependencyArcTail(db.Model):
    ''' A list of courses that is a pre-req for something else '''
    courses = db.ListProperty(db.Key)
    ''' a list of heads that reference this one '''
    forwardLinks = db.ListProperty(db.Key)
    def __repr__(self):
        return "DepArcTail %d: courses='%s' forwardLinks='%s'" % (id(self), getReprOfKeys(self.courses), getIdOfKeys(self.forwardLinks))
    def __eq__(self, other):
        if not isinstance(other, DependencyArcTail):
            return False
        for this_course in self.courses:
            if not (this_course in other.courses):
                return False
        for other_course in other.courses:
            if not (other_course in self.courses):
                return False
        return True
    def __hash__(self):
        return hash((tuple(self.courses), tuple(self.forwardLinks)))
Everything look good?
Updated to reflect @Alex's comments
class DependencyArcTail(db.Model):
    ''' A list of courses that is a pre-req for something else '''
    courses = db.ListProperty(db.Key)
    ''' a list of heads that reference this one '''
    forwardLinks = db.ListProperty(db.Key)
    def __repr__(self):
        return "DepArcTail %d: courses='%s' forwardLinks='%s'" % (id(self), getReprOfKeys(self.courses), getIdOfKeys(self.forwardLinks))
    def __eq__(self, other):
        return isinstance(other, DependencyArcTail) and set(self.courses) == set(other.courses) and set(self.forwardLinks) == set(other.forwardLinks)
    def __hash__(self):
        return hash((tuple(self.courses), tuple(self.forwardLinks)))
The first one is fine. The second one is problematic for two reasons:
there might be duplicates in .courses
two entities with identical .courses but different .forwardLinks would compare equal but have different hashes
I would fix the second one by making equality depend on both courses and forward links, with both converted to sets (hence no duplicates), and doing the same for hashing. I.e.:
    def __eq__(self, other):
        if not isinstance(other, DependencyArcTail):
            return False
        return (set(self.courses) == set(other.courses) and
                set(self.forwardLinks) == set(other.forwardLinks))
    def __hash__(self):
        return hash((frozenset(self.courses), frozenset(self.forwardLinks)))
This of course is assuming that the forward links are crucial to an object's "real value", otherwise they should be omitted from both __eq__ and __hash__.
Edit: removed from __hash__ calls to tuple which were at best redundant (and possibly damaging, as suggested by a comment by @Mark [[tx!!!]]); changed set to frozenset in the hashing, as suggested by a comment by @Phillips [[tx!!!]].
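The effect of hashing frozensets rather than tuples can be seen in a self-contained sketch. ArcTail is a hypothetical stand-in that uses plain lists of strings instead of db.Model properties.

```python
class ArcTail:
    """Hypothetical stand-in for DependencyArcTail without App Engine."""

    def __init__(self, courses, forward_links):
        self.courses = list(courses)
        self.forward_links = list(forward_links)

    def __eq__(self, other):
        if not isinstance(other, ArcTail):
            return False
        return (set(self.courses) == set(other.courses) and
                set(self.forward_links) == set(other.forward_links))

    def __hash__(self):
        # frozenset ignores order and duplicates, exactly like __eq__ does.
        return hash((frozenset(self.courses), frozenset(self.forward_links)))


a = ArcTail(["cs101", "cs102"], ["head1"])
b = ArcTail(["cs102", "cs101", "cs101"], ["head1"])  # reordered, with a duplicate

print(a == b)
print(hash(a) == hash(b))
print(len({a, b}))  # both collapse into a single set member
```

With tuple-based hashing, a and b would still compare equal but would generally hash differently, which breaks the rule that equal objects must have equal hashes.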
