python compare 2 similar objects with duck typing - python

Maybe my design is totally out of whack, but suppose I have two derived classes whose objects are comparable, where an instance of class D1 will basically always be greater than an instance of class D2 (say, comparing an Ivy Bridge to a 286). How would I implement D1's comparison to reflect that without using isinstance to check for D2?
I saw this:
Comparing two objects
and
If duck-typing in Python, should you test isinstance?
I could add a "type" attribute, and then compare the types, but then I might as well use isinstance. The easiest way would be to use isinstance... Any better suggestions?

I would ask myself "what is it about D1 that makes it always greater than D2?" In other words, do they have some common attribute that it would make sense to base the comparison on? If there is no good answer to this question, it might be worth asking whether defining comparisons between these two objects actually makes sense.
If, after considering these things, you still think that doing the comparison is a good idea, then just use isinstance. There's a reason it still exists in the language, and Python is constantly deprecating things that are considered bad practice, which implies that isinstance isn't always a bad thing.
The problem is when isinstance is used to do type checking unnecessarily. In other words, users often use it in a "Look before you leap" context which is completely unnecessary.
    if not isinstance(arg, Foo):
        raise ValueError("I want a Foo")
In this case, if the user doesn't put something that looks enough like a Foo into the function, it will raise an exception anyway. Why restrict it to only Foo objects? In your case however, it seems like the type of the objects actually matter from a conceptual standpoint. This is why isinstance exists (in my opinion).
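As a minimal sketch of that point, the function below does no type check at all. The names Foo, Duck, greet and speak are made up for illustration. Anything that provides the method works, and anything that doesn't fails on its own with an AttributeError:

```python
class Foo:
    def speak(self):
        return "foo"


class Duck:
    # Unrelated to Foo, but offers the same method.
    def speak(self):
        return "quack"


def greet(arg):
    # No isinstance check: anything with a speak() method is accepted.
    return "It says " + arg.speak()


results = [greet(Foo()), greet(Duck())]

try:
    greet(42)  # an int has no speak() method
except AttributeError as exc:
    failure = type(exc).__name__
```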

I would do something like this:
    class D1(object):
        def __val_cmp(self, other):
            # compare by attributes here
            if self.attr < other.attr:
                return -1
            elif self.attr > other.attr:
                return 1
            return 0

        # Note: __cmp__ is only used by Python 2; Python 3 ignores it.
        def __cmp__(self, other):
            greater = isinstance(other, type(self))
            lesser = isinstance(self, type(other))
            if greater and lesser:
                # same type so compare by attributes
                return self.__val_cmp(other)
            elif greater:
                return 1
            elif lesser:
                return -1
            else:
                # other type is not a parent or child type, so just compare by attributes
                return self.__val_cmp(other)
If D2 is a subtype of D1, instances of D2 will always compare less than instances of D1.
If D0 is a parent type of D1, instances of D0 will always compare greater than instances of D1.
If you compare an instance of D1 to another instance of D1, the comparison will be by the class's attributes.
If you compare an instance of D1 to an instance of an unknown class, the comparison will be by the class's attributes.
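Python 3 no longer calls __cmp__, so the same rules would have to be expressed through rich comparison methods there. The following is only a sketch of one way to do that with functools.total_ordering. The helper name _rank, the constructor, and the hasattr guard are assumptions added for illustration and are not part of the answer above:

```python
from functools import total_ordering


@total_ordering
class D1:
    def __init__(self, attr):
        self.attr = attr

    def _rank(self, other):
        """Return -1, 0 or 1 using the same rules as the __cmp__ version."""
        greater = isinstance(other, type(self))
        lesser = isinstance(self, type(other))
        if greater and not lesser:
            return 1    # other is a subtype of our type: we win
        if lesser and not greater:
            return -1   # we are a subtype of other's type: we lose
        # same type, or unrelated types: compare by attributes
        return (self.attr > other.attr) - (self.attr < other.attr)

    def __eq__(self, other):
        if not hasattr(other, "attr"):
            return NotImplemented
        return self._rank(other) == 0

    def __lt__(self, other):
        if not hasattr(other, "attr"):
            return NotImplemented
        return self._rank(other) < 0


class D2(D1):
    pass


ordered = [obj.attr for obj in sorted([D1(1), D2(100), D1(3)])]
```

Note that defining __eq__ without __hash__ makes these instances unhashable in Python 3, which may or may not matter for your use.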

Related

In Python, can an object become hashable after it is created?

I want to create recursive data structures via a class Wrapper, and I want to be able to do something like this:
A = Wrapper()
A.assign_value((A, 5))
my_set = set()
my_set.add(A)
This requires A to be hashable, but one of the requirements is (according to docs):
it has a hash value which never changes during its lifetime
The Wrapper object can become hashable, and then used freely in sets and as dictionary keys, so long as assign_value can only be used once, but I'm not sure if that meets the definition, since the value changes over its lifetime (even though the value is guaranteed not to change any more). Also, I'm unsure how I would implement a __hash__ function that sometimes indicates "this object is not actually hashable", if it's even possible. It seems the very existence of a valid __hash__ function indicates that an object is hashable, regardless of what it returns when called, in standard Python.
My current idea is to set the __hash__ function in the instance rather than the class, like this:
    class Wrapper():
        def __init__(self):
            self.assigned = False
        def _hash(self):
            # hash() takes a single argument, so hash a tuple
            return hash((Wrapper, self.val))
        def _eq(self, a):
            if isinstance(a, Wrapper) and a.assigned:
                return self.val == a.val
            return NotImplemented
        def assign_value(self, val):
            if self.assigned:
                raise NotImplementedError()
            self.assigned = True
            self.val = val
            self.__hash__ = self._hash
            self.__eq__ = self._eq
I am realizing that the hash function will enter an infinite loop, constantly trying to compute hash(A), in my given example. __eq__ will also face this problem, though in that case it can be resolved the same way recursive lists are handled in Python. Setting a default hash value (like hash(Wrapper)) to be used in the computation for Wrapper objects does seem to fix the infinite recursion problem in theory though, making the whole thing still technically possible. I could also just say def _hash(self): return 0, despite the performance cost for sets and dicts.
Would this work as I expect? Is it the best way to do this? Most importantly, should I be doing this in the first place?
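One detail that bears on the per-instance idea: when hash() is called, Python looks up __hash__ on the object's type, not on the instance, so a __hash__ stored in the instance is not consulted. The sketch below first shows that with a throwaway class (Plain is a made-up name), then shows one possible class-level alternative that raises TypeError until a value is assigned. The constant hash and the identity-based equality are simplifying assumptions chosen to avoid the recursion discussed above, not a recommendation:

```python
class Plain:
    __hash__ = None  # instances of this class are unhashable


p = Plain()
p.__hash__ = lambda: 42  # stored on the instance only

try:
    hash(p)
    instance_hook_used = True
except TypeError:
    # hash() consulted type(p).__hash__, which is None
    instance_hook_used = False


class Wrapper:
    def __init__(self):
        self.assigned = False

    def assign_value(self, val):
        if self.assigned:
            raise RuntimeError("value already assigned")
        self.val = val
        self.assigned = True

    def __hash__(self):
        if not self.assigned:
            raise TypeError("unhashable: no value assigned yet")
        # Constant per class, so hashing never recurses into self.val.
        return hash(Wrapper)


a = Wrapper()
try:
    hash(a)
    hashable_before = True
except TypeError:
    hashable_before = False

a.assign_value((a, 5))
my_set = {a}
```

With this approach isinstance(a, collections.abc.Hashable) is still True before assignment, which matches the observation in the question that the mere presence of __hash__ is what marks an object as hashable.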

Is there a hash of a class instance in Python?

Let's suppose I have a class like this:
    class MyClass:
        def __init__(self, a):
            self._a = a
And I construct such instances:
obj1 = MyClass(5)
obj2 = MyClass(12)
obj3 = MyClass(5)
Is there a general way to hash my objects such that objects constructed with same values have equal hashes? In this case:
myhash(obj1) != myhash(obj2)
myhash(obj1) == myhash(obj3)
By general I mean a Python function that can work with objects created by any class I can define. For different classes and same values the hash function must return different results, of course; otherwise this question would be about hashing of several arguments instead.
    def myhash(obj):
        items = sorted(obj.__dict__.items(), key=lambda it: it[0])
        return hash((type(obj),) + tuple(items))
This solution obviously has limitations:
It assumes that all fields in __dict__ are important.
It assumes that __dict__ is present, e.g. this won't work with __slots__.
It assumes that all values are hashable
It breaks the Liskov substitution principle.
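To make the behavior and one of the listed limitations concrete, here is a small runnable check of myhash. The classes Other and WithList are hypothetical and exist only for this illustration:

```python
def myhash(obj):
    items = sorted(obj.__dict__.items(), key=lambda it: it[0])
    return hash((type(obj),) + tuple(items))


class MyClass:
    def __init__(self, a):
        self._a = a


class Other:
    # Same field name as MyClass, but a different type.
    def __init__(self, a):
        self._a = a


class WithList:
    def __init__(self, values):
        self.values = values  # a list is not hashable


obj1, obj2, obj3 = MyClass(5), MyClass(12), MyClass(5)
same_values_match = myhash(obj1) == myhash(obj3)

try:
    myhash(WithList([1, 2, 3]))
    unhashable_field_ok = True
except TypeError:
    # the "all values are hashable" assumption is violated
    unhashable_field_ok = False
```

Objects with different values or different types will almost always get different results, but since these are hashes, a collision is never strictly ruled out.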
The question is badly formed for a couple reasons:
Hashes don't test equality, just inequality. That is, they guarantee that hash(a) != hash(b) implies a != b, but the reverse does not hold true. For example, checking "aKey" in myDict will do a linear search through all keys in myDict that have the same hash as "aKey".
You seem to want to do something with storage. Note that the hash of "aKey" will change between runs, so don't write it to a file. See the note at the bottom of the __hash__ documentation for more information.
In general, you need to think carefully about subclasses, hashes, and equality. There is a pit here, so even the official documentation quietly sidesteps what the hash of instance means. Do note that each instance has a __dict__ for local variables and the __class__ with more information.
Hope this helps those who come after you.

Hashing Custom Objects

The hash() function in Python can map immutable objects to a hash value. However, I cannot understand the behavior of hash() for objects of user-defined classes. Some resources say that if a user-defined class does not define the __hash__() and __eq__() methods, its objects cannot be hashed. Others claim the opposite.
In other words, what role do the __eq__() and __hash__() methods play in hashing custom objects?
If you don't implement __hash__, hash() will use the default implementation. If you don't implement __eq__, the default implementation will be used when you compare two instances.
    class C:
        pass

    class D:
        def __hash__(self):
            return 1
        def __eq__(self, other):
            return True

    print(hash(C()))   # changing for every C instance
    print(C() == C())  # False since objects are different
    print(hash(D()))   # always 1
    print(D() == D())  # always True
Basically, 'hash' should be quick, and act as a "triage" calculation to know whether two objects might be equal or not.
The 'eq' should be precisely the function that tells whether the objects are definitely equal or not. Maybe this function has to perform a lot of checks (for instance, if you want to define the equality of your objects by the equality of all the member fields, and maybe there are a lot of them).
The purpose of these two functions is to have a quick way of saying "no, they are not equal" (the hash function), since comparisons are used a lot, and most often two objects are not supposed to be equal.
Instead of executing a lot of "eq" functions, you execute a lot of quick "hash" functions, and only if both hashes match do you execute "eq" to confirm the equality or not.
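The conflicting claims mentioned in the question may come from one specific rule in Python 3: a class that defines __eq__ without defining __hash__ gets its __hash__ set to None, so its instances become unhashable. A short sketch with made-up class names:

```python
class NoMethods:
    pass  # inherits the default identity-based __eq__ and __hash__


class EqOnly:
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        return isinstance(other, EqOnly) and self.x == other.x
    # No __hash__ here: Python 3 sets EqOnly.__hash__ to None.


class EqAndHash:
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        return isinstance(other, EqAndHash) and self.x == other.x

    def __hash__(self):
        return hash(self.x)  # consistent with __eq__


default_is_hashable = isinstance(hash(NoMethods()), int)

try:
    hash(EqOnly(1))
    eq_only_hashable = True
except TypeError:
    eq_only_hashable = False

unique = {EqAndHash(1), EqAndHash(1), EqAndHash(2)}
```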

return something other than the object on creation of object

I want to make a class that knows to not instantiate based on input parameters. For a simple example if I want to create an object that can only exist if one of its input parameters is > 1 then
foo = new_object(0.1)
should return None to foo rather than the object.
It strikes me as an elegant way to create objects, as it means I need no code outside the class to decide whether to create it or not.
Is there a way to do this, or equally useful, would this be bad practice, and why?
You'll need to override __new__ -- make sure it takes the same arguments as __init__:
    class Test(object):
        def __init__(self, value):
            self.value = value
        def __new__(cls, value):
            if value > 1:
                return object.__new__(cls)
            return None
        def __repr__(self):
            return "Test value %d" % self.value

    t1 = Test(2)
    print(repr(t1))  # Test value 2
    t2 = Test(1)
    print(repr(t2))  # None
Python has support for returning objects of different types from __new__ but it's a fairly rare practice.
In your use-case, if you are choosing between
    if value <= 1:
        foo = None
    else:
        foo = Test(value)
and
    foo = Test(value)  # will be None if value <= 1
and this is something you have to do many times, then I would definitely consider having the class do it.
In those cases where you don't have control over new_object you can make your own factory function:
    def maybe_foo(value):
        if value > 1:
            return new_object(value)
        return None
You can override __new__() to effectively turn object instantiation in to a factory-like operation like you want to do here. I sometimes like to use __new__() of an abstract base class as a factory for the concrete subclasses as long as the list of concrete subclasses can be limited and known. Just make sure it is the best solution for your problem, as it probably isn't...
Quite obviously this would be a bad practice, for the very simple reason that nobody does it like this. Calling a constructor is supposed to construct the object instance, not selectively decide if it wants to or not. You're not supposed to need to check for failure when constructing objects. So it has a quite high "wtf quota", which is never a good idea.
That said, I'm not sure if it's even possible, since __init__() is run on the instance after it's already been created (and doesn't end with return self). This being Python, I'm sure something can be wrangled. My point is that doing so is a bad idea.
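On the question of whether it is possible at all: Python only calls __init__ when __new__ returns an instance of the class, so returning None from __new__ simply skips initialization. A minimal sketch that counts the __init__ calls (the init_calls counter is added purely for illustration):

```python
class Test:
    init_calls = 0  # class-level counter, for demonstration only

    def __new__(cls, value):
        if value > 1:
            return super().__new__(cls)
        return None  # not an instance of cls, so __init__ is skipped

    def __init__(self, value):
        Test.init_calls += 1
        self.value = value


t1 = Test(2)  # a real Test instance
t2 = Test(1)  # None
```

Whether this is good practice is a separate matter, as the answers above discuss.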

Python: detect duplicates using a set

I have a large number of objects I need to store in memory for processing in Python. Specifically, I'm trying to remove duplicates from a large set of objects. I want to consider two objects "equal" if a certain instance variable in the object is equal. So, I assumed the easiest way to do this would be to insert all my objects into a set, and override the __hash__ method so that it hashes the instance variable I'm concerned with.
So, as a test I tried the following:
    class Person:
        def __init__(self, n, a):
            self.name = n
            self.age = a
        def __hash__(self):
            return hash(self.name)
        def __str__(self):
            return "{0}:{1}".format(self.name, self.age)

    myset = set()
    myset.add(Person("foo", 10))
    myset.add(Person("bar", 20))
    myset.add(Person("baz", 30))
    myset.add(Person("foo", 1000))  # try adding a duplicate
    for p in myset: print(p)
Here, I define a Person class, and any two instances of Person with the same name variable are to be equal, regardless of the value of any other instance variable. Unfortunately, this outputs:
baz:30
foo:10
bar:20
foo:1000
Note that foo appears twice, so this program failed to notice duplicates. Yet the expression hash(Person("foo", 10)) == hash(Person("foo", 1000)) is True. So why doesn't this properly detect duplicate Person objects?
You forgot to also define __eq__().
If a class does not define a __cmp__() or __eq__() method it should not define a __hash__() operation either; if it defines __cmp__() or __eq__() but not __hash__(), its instances will not be usable in hashed collections. If a class defines mutable objects and implements a __cmp__() or __eq__() method, it should not implement __hash__(), since hashable collection implementations require that a object’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).
A set obviously will have to deal with hash collisions. If the hash of two objects matches, the set will compare them using the == operator to make sure they are really equal. In your case, this will only yield True if the two objects are the same object (the standard implementation for user-defined classes).
Long story short: Also define __eq__() to make it work.
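A sketch of the Person class with __eq__ added, assuming (as in the question) that two people count as equal whenever their names match:

```python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __eq__(self, other):
        if not isinstance(other, Person):
            return NotImplemented
        return self.name == other.name  # equality by name only

    def __hash__(self):
        return hash(self.name)  # consistent with __eq__

    def __str__(self):
        return "{0}:{1}".format(self.name, self.age)


people = set()
people.add(Person("foo", 10))
people.add(Person("bar", 20))
people.add(Person("baz", 30))
people.add(Person("foo", 1000))  # equal to the first "foo", so not added

names = sorted(p.name for p in people)
```

The set keeps the element that was inserted first, so the surviving "foo" is the one with age 10.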
A hash function alone is not enough to distinguish objects; you also have to implement the comparison function (i.e. __eq__).
A hash function effectively says "A maybe equals B" or "A not equals B (for sure)".
If it says "maybe equals" then equality has to be checked anyway to make sure, which is why you also need to implement __eq__.
Nevertheless, defining __hash__ will significantly speed things up by making "A not equal B (for sure)" an O(1) operation.
The hash function must however always follow the "hash rule":
"hash rule": equal things must hash to the same value
(justification: or else we'd say "A not equals B (for sure)" when that is not the case)
For example you could hash everything by def __hash__(self): return 1. This would still be correct, but it would be inefficient because you'd have to check __eq__ each time, which may be a long process if you have complicated large data structures (e.g. with large lists, dictionaries, etc.).
Do note that you technically follow the "hash rule" by ignoring age in your implementation, def __hash__(self): return hash(self.name). If Bob is a person of age 20 and Bob is another person of age 30 and they are different people (likely, unless this is some sort of keeps-track-of-people-over-time-as-they-age program), then they will hash to the same value and have to be compared with __eq__. This is perfectly fine, but I would implement it like so:
    def __hash__(self):
        return hash((self.name, self.age))
Do note that your way is still correct. It would however have been a coding error to use hash( (self.name, self.age) ) in a world where Person("Bob", age=20) and Person("Bob", age=30) were actually the same person, because the hash function would be saying they're different while the equals function would not (but be ignored).
You also need the __eq__() method.
