I have a class, and I wish to write a __hash__() method for this class. The method I want to write will return the default hash of the object in some cases, and a hash of one of its attributes in some others. So, as a simple test case:
class Foo:
    def __init__(self, bar):
        self.bar = bar

    def __hash__(self):
        if self.bar < 10:
            return hash(self)  # <- this line doesn't work
        else:
            return hash(self.bar)
The problem with this is that hash(self) simply calls self.__hash__(), leading to infinite recursion.
I gather that the hash of an object is based on the id() of that object, so I could rewrite return hash(self) as return id(self), or return id(self) / 16, but it seems bad form to me to recreate the default implementation in my own code.
It also occurred to me that I could rewrite it as return object.__hash__(self). This works, but seems even worse, as special methods are not intended to be called directly.
So, what I'm asking is; is there a way to use the default hash of an object without implicitly calling the __hash__() method of the class that object is an instance of?
To call the parent implementation, use:
super(Foo, self).__hash__()
(or simply super().__hash__() in Python 3)
It also occurred to me that I could rewrite it as return
object.__hash__(self). This works, but seems even worse, as special
methods are not intended to be called directly.
You are overriding a magic method, so it's OK to call the parent's implementation directly.
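Applied to the Foo class from the question, a minimal sketch (using the zero-argument super() form available in Python 3):

```python
class Foo:
    def __init__(self, bar):
        self.bar = bar

    def __hash__(self):
        if self.bar < 10:
            # delegate to object.__hash__ via the MRO; no recursion
            return super().__hash__()
        return hash(self.bar)

small = Foo(5)    # hashed by the default identity-based hash
big = Foo(100)    # hashed by its attribute
```

Foo(100) and any other object whose bar equals 100 now hash alike, while small instances keep distinct identity-based hashes.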
I have a class as follows:
class Lz:
    def __init__(self, b):
        self.b = b

    def __getattr__(self, item):
        return self.b.__getattribute__(item)
And I create an instance and print :
a = Lz('abc')
print(a)
Result is: abc
I set a breakpoint at the line return self.b.__getattribute__(item), and item shows '__str__'.
I don't know why __getattr__ is called, nor why item is '__str__', when I print the instance.
print calls __str__ (see this question for details), but as Lz does not have a __str__ method, a lookup for an attribute named '__str__' takes place using __getattr__.
So if you add a __str__ method, __getattr__ should not be called anymore when printing objects of the Lz class.
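For instance (a sketch; the added __str__ here simply delegates to the wrapped object):

```python
class Lz:
    def __init__(self, b):
        self.b = b

    def __str__(self):
        # explicit __str__: print() no longer needs __getattr__ to find it
        return str(self.b)

    def __getattr__(self, item):
        # still handles ordinary delegated attribute access
        return getattr(self.b, item)

a = Lz('abc')
print(a)          # abc, via Lz.__str__
print(a.upper())  # ABC, delegated to the wrapped str via __getattr__
```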
print(obj) invokes str(obj) (to get a printable representation), which in turn tries to invoke obj.__str__() (and falls back to something else if this fails, but that's not the point here).
You defined Lz as an old-style class, so it doesn't have a __str__ method by default (new-style classes inherit it from object), but you defined a __getattr__() method, so that is what gets invoked in the end (__getattr__() is the last thing the attribute lookup will invoke when everything else has failed).
NB: in case you don't already know: since everything in Python is an object - including classes, functions, methods etc. - Python doesn't make a difference between "data" attributes and "method" attributes - those are all attributes, period.
NB2: directly accessing __magic__ names is considered bad practice. Those names are implementation support for operators or operator-like generic functions (i.e. len(), type() etc.), and you are supposed to use the operator or generic function instead. IOW, this:
return self.b.__getattribute__(item)
should be written as
return getattr(self.b, item)
(getattr() is the generic function version of the "dot" attribute lookup operator (.))
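Rewritten that way, the delegating class from the question becomes (a sketch):

```python
class Lz:
    def __init__(self, b):
        self.b = b

    def __getattr__(self, item):
        # generic-function form of the dot lookup, instead of
        # self.b.__getattribute__(item)
        return getattr(self.b, item)

a = Lz('abc')
print(a.upper())  # ABC, delegated to the wrapped str
```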
I've stumbled upon a really weird Python 3 issue, the cause of which I do not understand.
I'd like to compare my objects by checking if all their attributes are equal.
Some of the child classes will have fields that contain references to methods bound to self - and that causes RecursionError
Here's the PoC:
class A:
    def __init__(self, field):
        self.methods = [self.method]
        self.field = field

    def __eq__(self, other):
        if type(self) != type(other):
            return False
        return self.__dict__ == other.__dict__

    def method(self):
        pass
first = A(field='foo')
second = A(field='bar')
print(first == second)
Running the code above in Python 3 raises RecursionError and I'm not sure why. It seems that A.__eq__ is used to compare the functions kept in self.methods. So my first question is: why? Why is the object's __eq__ called to compare a bound method of that object?
The second question is: what kind of filter on __dict__ should I use to protect __eq__ from this issue? I mean, in the PoC above self.method is kept simply in a list, but sometimes it may be in another structure. The filtering would have to cover all the possible containers that can hold the self-reference.
One clarification: I do need to keep the self.method function in a self.methods field. The usecase here is similar to unittest.TestCase._cleanups - a stack of methods that are to be called after the test is finished. The framework must be able to run the following code:
# obj is an instance of a child class of A
obj.methods.append(obj.child_method)

for method in obj.methods:
    method()
Another clarification: the only code I can change is the __eq__ implementation.
"Why the object's __eq__ is called to compare bound function of that object?":
Because bound methods compare by the following algorithm:
1. Is the self bound to each method equal?
2. If so, is the function implementing the method the same?
Step 1 causes your infinite recursion; in comparing the __dict__, it eventually ends up comparing the bound methods, and to do so, it has to compare the objects to each other again, and now you're right back where you started, and it continues forever.
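The two steps can be observed directly with a throwaway class (a sketch; on Python 3.8+ step 1 is an identity check, but with object's default identity-based __eq__ the outcome below is the same on older versions):

```python
class C:
    def m(self):
        pass

a, b = C(), C()

assert a.m == a.m  # same instance, same function: equal
assert a.m != b.m  # different instances: not equal
assert a.m != C.m  # bound method vs. plain function: not equal
```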
The only "solution"s I can come up with off-hand are:
Something like the reprlib.recursive_repr decorator (which would be extremely hacky, since you'd be heuristically determining if you're comparing for bound method related reasons based on whether __eq__ was re-entered), or
A wrapper for any bound methods you store that replaces equality testing of the respective selfs with identity testing.
The wrapper for bound methods isn't terrible at least. You'd basically just make a simple wrapper of the form:
import types

class IdentityComparableMethod:
    __slots__ = '_method',

    def __new__(cls, method):
        # Using __new__ prevents reinitialization, part of immutability contract
        # that justifies defining __hash__
        self = super().__new__(cls)
        self._method = method
        return self

    def __getattr__(self, name):
        '''Attribute access should match bound method's'''
        return getattr(self._method, name)

    def __eq__(self, other):
        '''Comparable to other instances, and normal methods'''
        if not isinstance(other, (IdentityComparableMethod, types.MethodType)):
            return NotImplemented
        return (self.__self__ is other.__self__ and
                self.__func__ is other.__func__)

    def __hash__(self):
        '''Hash identically to the method'''
        return hash(self._method)

    def __call__(self, *args, **kwargs):
        '''Delegate to method'''
        return self._method(*args, **kwargs)

    def __repr__(self):
        return '{0.__class__.__name__}({0._method!r})'.format(self)
then when storing bound methods, wrap them in that class, e.g.:
self.methods = [IdentityComparableMethod(self.method)]
You may want to make methods itself enforce this via additional magic (so it only stores functions or IdentityComparableMethods), but that's the basic idea.
Other answers address more targeted filtering, this is just a way to make that filtering unnecessary.
Performance note: I didn't heavily optimize for performance; __getattr__ is the simplest way of reflecting all the attributes of the underlying method. If you want comparisons to go faster, you can fetch out __self__ during initialization and cache it on self directly to avoid __getattr__ calls, changing the __slots__ and __new__ declaration to:
__slots__ = '_method', '__self__'

def __new__(cls, method):
    # Using __new__ prevents reinitialization, part of immutability contract
    # that justifies defining __hash__
    self = super().__new__(cls)
    self._method = method
    self.__self__ = method.__self__
    return self
That makes a pretty significant difference in comparison speed; in local %timeit tests, the first == second comparison dropped from 2.77 μs to 1.05 μs. You could cache __func__ as well if you like, but since it's the fallback comparison, it's less likely to be checked at all (and you'd slow construction a titch for an optimization you're less likely to use).
Alternatively, instead of caching, you can just manually define @property accessors for __self__ and __func__, which are slower than raw attributes (the comparison ran in 1.41 μs) but incur no construction-time cost at all (so if no comparison is ever run, you don't pay the lookup cost).
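A sketch of what that property-based variant might look like (same class layout as above; import types is needed for the isinstance check in __eq__):

```python
import types

class IdentityComparableMethod:
    __slots__ = ('_method',)

    def __new__(cls, method):
        self = super().__new__(cls)
        self._method = method
        return self

    # properties instead of cached attributes: slightly slower lookups,
    # but no extra work at construction time
    @property
    def __self__(self):
        return self._method.__self__

    @property
    def __func__(self):
        return self._method.__func__

    def __eq__(self, other):
        if not isinstance(other, (IdentityComparableMethod, types.MethodType)):
            return NotImplemented
        return (self.__self__ is other.__self__ and
                self.__func__ is other.__func__)

    def __hash__(self):
        return hash(self._method)

    def __call__(self, *args, **kwargs):
        return self._method(*args, **kwargs)
```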
The reason why self.methods = [self.method] followed by an __eq__ comparison ends up raising a RecursionError is nicely explained in one of the comments on this question by @Aran-Fey:
self.getX == other.getX compares two bound methods. Bound methods are considered equal if the method is the same, and the instances they're bound to are equal. So comparing two bound methods also compares the instances, which calls the __eq__ method again, which compares bound methods again, etc
One way to resolve it is to perform key-wise comparison on self.__dict__ and ignore methods key
class A:
    def __init__(self, field):
        self.methods = [self.method]
        self.field = field

    def __eq__(self, other):
        # Iterate through all keys
        for key in self.__dict__:
            # Perform comparison on values, except under the key 'methods'
            if key != 'methods':
                if self.__dict__[key] != other.__dict__[key]:
                    return False
        return True

    def method(self):
        pass
first = A(field='foo')
second = A(field='bar')
print(first == second)
The output will be False
Edit:
I think the == causes the error. You can install deepdiff and modify your code to:
class A:
    def __init__(self, field):
        self.methods = [self.method]
        self.field = field

    def __eq__(self, other):
        import deepdiff
        if type(self) != type(other):
            return False
        return deepdiff.DeepDiff(self.__dict__, other.__dict__) == {}

    def method(self):
        pass
Then,
A(field='foo') == A(field='bar') returns False
and
A(field='foo') == A(field='foo') returns True
Original Answer:
Try replacing
self.methods = [self.method]
with
self.methods = [A.method]
And the result is False
The issue you're running into is being caused by a very old bug in CPython. The good news is that it has already been fixed for Python 3.8 (which will soon be getting its first beta release).
To understand the issue, you need to understand how the equality check for methods from Python 2.5 through 3.7 worked. A bound method has a self and a func attribute. In the versions of Python where this bug is an issue, a comparison of two bound methods would compare both the func and the self values for Python-level equality (using the C-API equivalent to the Python == operator). With your class, this leads to infinite recursion, since the objects want to compare the bound methods stored in their methods lists, and the bound methods need to compare their self attributes.
The fixed code uses an identity comparison, rather than an equality comparison, for the self attribute of bound method objects. This has additional benefits, as methods of "equal" but not identical objects will no longer be considered equal when they shouldn't be. The motivating example was a set of callbacks. You might want your code to avoid calling the same callback several times if it was registered multiple times, but you wouldn't want to incorrectly skip over a callback if it was bound to an equal (but not identical) object. For instance, below, each of two empty (and therefore equal, but not identical) containers has its append method registered, and you wouldn't want the two bound methods to compare equal:
class MyContainer(list):  # inherits the == operator from list, so empty containers are equal
    def append(self, value):
        super().append(value)

callbacks = []

def register_callback(cb):
    if cb not in callbacks:  # this does an == test against all previously registered callbacks
        callbacks.append(cb)

def do_callbacks(*args):
    for cb in callbacks:
        cb(*args)

container1 = MyContainer()
register_callback(container1.append)

container2 = MyContainer()
register_callback(container2.append)

do_callbacks('foo')
print(container1 == container2)  # this should be True, if both callbacks got run
The print call at the end of the code will output False on Python 3.7 and earlier, but on Python 3.8, thanks to the bug fix, it will print True, as it should.
I'll post the solution I came up with (inspired by @devesh-kumar-singh's answer), though it does seem bittersweet.
def __eq__(self, other):
    if type(self) != type(other):
        return False
    for key in self.__dict__:
        try:
            flag = self.__dict__[key] == other.__dict__[key]
            if flag is False:
                # if one of the attributes is different, the objects are as well
                return False
        except RecursionError:
            # We stumbled upon an attribute that is somehow bound to self
            pass
    return flag
The benefit over @tianbo-ji's solution is that it's faster if we find a difference in __dict__ values before we stumble upon a bound method. But if we don't, it's an order of magnitude slower.
I'm porting a legacy codebase from Python 2.7 to Python 3.6. In that codebase I have a number of instances of things like:
class EntityName(unicode):
    @staticmethod
    def __new__(cls, s):
        clean = cls.strip_junk(s)
        return super(EntityName, cls).__new__(cls, clean)

    def __init__(self, s):
        self._clean = s
        self._normalized = normalized_name(self._clean)
        self._simplified = simplified_name(self._clean)
        self._is_all_caps = None
        self._is_all_lower = None
        super(EntityName, self).__init__(self._clean)
It might be called like this:
EntityName("Guy DeFalt")
When porting this to Python 3, the above code fails because unicode is no longer a class you can extend (at least, if there is an equivalent class, I cannot find it). Given that str is unicode now, I tried to just swap str in, but the parent __init__ doesn't take the string value I'm trying to pass:
TypeError: object.__init__() takes no parameters
This makes sense because str does not have an __init__ method - this does not seem to be an idiomatic way of using this class. So my question has two major branches:
Is there a better way to be porting classes that sub-classed the old unicode class?
If subclassing str is appropriate, how should I modify the __init__ function for idiomatic behavior?
The right way to subclass a string or another immutable class in Python 3 is the same as in Python 2:
class MyString(str):
    def __new__(cls, initial_arguments):  # no @staticmethod needed
        desired_string_value = get_desired_string_value(initial_arguments)
        return super(MyString, cls).__new__(cls, desired_string_value)
        # can be shortened to super().__new__(...)

    def __init__(self, initial_arguments):  # arguments are unused
        self.whatever = whatever(self)
        # no need to call super().__init__(), but if you must, do not pass arguments
There are several issues with your sample. First, why is __new__ decorated with @staticmethod? __new__ is special-cased by Python as an implicit static method, so the decorator is redundant and you don't need to specify it. Second, the code seems to operate under the assumption that when you call __new__ of the superclass, it somehow calls your __init__ as well. I'm deriving this from looking at how self._clean is supposed to be set. This is not the case. When you call MyString(arguments), the following happens:
First Python calls __new__ with the class parameter (usually called cls) and arguments. __new__ must return the class instance. To do this it can create it, as we do, or do something else; e.g. it may return an existing one or, in fact, anything.
Then Python calls __init__ with the instance it received from __new__ (this parameter is usually called self) and the same arguments.
(There's a special case: Python won't call __init__ if __new__ returned something that is not a subclass of the passed class.)
Python uses class hierarchy to see which __new__ and __init__ to call. It's up to you to correctly sort out the arguments and use proper superclass calls in these two methods.
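The order of the two calls is easy to verify with a small str subclass (Upper and its cleanup logic are made up for illustration):

```python
class Upper(str):
    def __new__(cls, raw):
        # __new__ builds the immutable string value itself
        return super().__new__(cls, raw.strip().upper())

    def __init__(self, raw):
        # __init__ runs afterwards, on the instance __new__ returned,
        # and receives the same original arguments
        self.raw = raw

s = Upper('  hello ')
print(s)      # HELLO   - value fixed by __new__
print(s.raw)  # '  hello ' - extra state attached by __init__
```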
I have a class with a custom getter, so there are situations where I need to use my custom getter, and situations where I need to use the default one.
So consider the following.
If I access the property of object c in this way:
c.somePyClassProp
In that case I need to call custom getter, and getter will return int value, not Python object.
But if I call method on this way:
c.somePyClassProp.getAttributes()
In this case I need to use the default getter: the first access must return the Python object, and then we call the getAttributes method of the returned Python object (from c.somePyClassProp).
Note that somePyClassProp is actually a property of the class, and its value is an instance of another Python class.
So, is there any way in Python by which we can know whether some other method will be called after the first attribute access?
No. c.someMethod is a self-contained expression; its evaluation cannot be influenced by the context in which the result will be used. If it were possible to achieve what you want, this would be the result:
x = c.someMethod
c.someMethod.getAttributes() # Works!
x.getAttributes() # AttributeError!
This would be confusing as hell.
Don't try to make c.someMethod behave differently depending on what will be done with it, and if possible, don't make c.someMethod a method call at all. People will expect c.someMethod to return a bound method object that can then be called to execute the method; just define the method the usual way and call it with c.someMethod().
You don't want to return different values based on which attribute is accessed next, you want to return an int-like object that also has the required attribute on it. To do this, we create a subclass of int that has a getAttributes() method. An instance of this class, of course, needs to know what object it is "bound" to, that is, what object its getAttributes() method should refer to, so we'll add this to the constructor.
class bound_int(int):
    def __new__(cls, value, obj):
        val = int.__new__(cls, value)
        val.obj = obj
        return val

    def getAttributes(self):
        return self.obj.somePyClassProp
Now in your getter for c.somePyClassProp, instead of returning an integer, you return a bound_int and pass it a reference to the object its getAttributes() method needs to know about (here I'll just have it refer to self, the object it's being returned from):
@property
def somePyClassProp(self):
    return bound_int(42, self)
This way, if you use c.somePyClassProp as an int, it acts just like any other int, because it is one; but if you want to further call getAttributes() on it, you can do that, too. It's the same value in both cases; it has just been built to fulfill both purposes. This approach can be adapted to pretty much any problem of this type.
It looks like you want two ways to get the property depending on what you want to do with it. I don't think there's any inherent Pythonic way to implement this, and you therefore need to store a variable or property name for each case. Maybe:
c.somePyClassProp
can be used in the __get__ and
c.somePyClassProp__getAttributes()
can be implemented in a more custom way inside the __getattribute__ function.
One way I've used (which is probably not the best) is to check for that exact variable name:
def __getattribute__(self, var_name):
    if '__' in var_name:
        var_name, method = var_name.split('__')
        return object.__getattribute__(self, var_name).__getattribute__(method)
Using object.__getattribute__(self, var_name) uses the object class's method of getting an attribute directly.
You can store the contained Python object in a variable and then create getters via the @property decorator for whatever values you want. When you want to read the int, reference the property. When you want the contained object, use its variable name instead.
class SomePyClass(object):
    def getInt(self):
        return 1

    def getAttributes(self):
        return 'a b c'

class MyClass(object):
    def __init__(self, py_class):
        self._py_class = py_class

    @property
    def some_property(self):
        return self._py_class.getInt()

x = MyClass(SomePyClass())
y = x.some_property
x._py_class.getAttributes()
I need to decorate an object's method. It needs to happen at runtime because the decorators applied to the object depend on the arguments the user gave when calling the program (arguments supplied via argv), so the same object could be decorated 3 times, 2 times, or not be decorated at all.
Here is some context: the program is a puzzle solver; the main behavior is to find a solution for the puzzle automatically, by which I mean without user intervention. And here is where the decoration comes into play: one of the things I want to do is draw a graph of what happened during the execution, but only when the flag --draw-graph is used.
Here is what I've tried:
class GraphDecorator(object):
    def __init__(self, wrappee):
        self.wrappee = wrappee

    def method(self):
        # do my stuff here
        self.wrappee.method()
        # do more of my stuff here

    def __getattr__(self, attr):
        return getattr(self.wrappee, attr)
And why it did NOT work:
It did not work because of the way I built the application: when a method that did not exist in my decorator class was called, it fell back to the implementation of the decorated class. The problem is that the application always started by invoking the method run, which did not need to be decorated, so the undecorated fallback was used, and from inside the undecorated form only undecorated methods were ever called. What I needed was to replace the method on the object, not to proxy it:
# method responsible for replacing the undecorated form with the decorated one
def graphDecorator(obj):
    old_method = obj.method

    def method(self):
        # do my stuff here
        old_method()
        # do more of my stuff

    setattr(obj, 'method', method)  # replace with the decorated form
And here is my problem: the decorated form does not receive self when it is called, resulting in a TypeError because of the wrong number of arguments.
The problem was that I couldn't use func(self) as a method. The reason is that setattr() does not bind the function to the instance, so the function acts like a static method - not an instance method. Thanks to the introspective nature of Python, I was able to come up with this solution:
def decorator(obj):
    old_func = obj.func  # can't look it up 'by name' later because of recursion

    def decorated_func(self):
        # do my stuff here
        old_func()  # no need to pass obj
        # do some other stuff here

    # here is the magic: this gets the type of a 'normal method' of a class
    method = type(obj.func)
    # this binds the method to the object, so self is passed by default
    obj.func = method(decorated_func, obj)
I think this is the best way to decorate an object's method at runtime, though it would be nice to find a way to bind the method directly, without the method = type(obj.func) line.
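For what it's worth, the standard library exposes that method type directly as types.MethodType, so the binding can be written without inspecting type(obj.func). A sketch with made-up names (Solver, add_logging):

```python
import types

class Solver:
    def run(self):
        return 'solved'

def add_logging(obj):
    old = obj.run  # keep the original bound method

    def run(self):
        # hypothetical extra behaviour around the original call
        return 'logged:' + old()

    # bind the replacement to obj so self is passed automatically
    obj.run = types.MethodType(run, obj)

s = Solver()
add_logging(s)
print(s.run())  # logged:solved
```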
You might want to use __getattribute__ instead of __getattr__ (the latter being only called if "standard" lookup fails):
class GraphDecorator(object):
    def __init__(self, wrappee):
        self.__wrappee = wrappee

    def method(self):
        # do my stuff here
        self.__wrappee.method()
        # do more of my stuff here

    def __getattribute__(self, name):
        try:
            wrappee = object.__getattribute__(self, "_GraphDecorator__wrappee")
            return getattr(wrappee, name)
        except AttributeError:
            return object.__getattribute__(self, name)
I need to decorate a object's method. It needs to be at runtime because the decorators applied on the object depends on the arguments that the user gave when calling the program (arguments supplied with argv), so a same object could be decorated 3 times, 2 times, or not be decorated at all.
The above is unfortunately incorrect, and what you are trying to do is unnecessary.
You can do this at runtime like so. Example:
import sys

args = sys.argv[1:]

class MyClass(object):
    pass

if args[0] == '--decorateWithFoo':
    MyClass = decoratorFoo(MyClass)
if args[1] == '--decorateWithBar':
    MyClass = decoratorBar(MyClass)
The syntax:
@deco
define something

Is the same thing as:

define something
something = deco(something)
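For example, with a trivial made-up class decorator, both spellings have the same effect:

```python
def deco(cls):
    # tag the class; a real decorator could wrap or replace it instead
    cls.decorated = True
    return cls

@deco
class A:
    pass

class B:
    pass
B = deco(B)  # identical effect, applied conditionally at runtime

print(A.decorated, B.decorated)  # True True
```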
You could also make a decorator factory: @makeDecorator(command_line_arguments)
"It needs to be at runtime because the decorators applied on the object depends on the arguments that the user gave when calling the program"
Then don't use decorators. Decorators are only syntactic support for wrappers; you can just as well use normal function/method calls instead.