Why does deepcopy fail with "KeyError: '__deepcopy__'" when copying custom object? - python

I have a class that converts a dictionary to an object like this
class Dict2obj(dict):
    __getattr__ = dict.__getitem__

    def __init__(self, d):
        self.update(**dict((k, self.parse(v))
                           for k, v in d.iteritems()))

    @classmethod
    def parse(cls, v):
        if isinstance(v, dict):
            return cls(v)
        elif isinstance(v, list):
            return [cls.parse(i) for i in v]
        else:
            return v
When I try to make a deep copy of the object I get this error
import copy
my_object = Dict2obj(json_data)
copy_object = copy.deepcopy(my_object)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.py", line 172, in deepcopy
copier = getattr(x, "__deepcopy__", None)
KeyError: '__deepcopy__'
But if I override the __getattr__ method in the Dict2obj class, the deep copy works. See the example below:
class Dict2obj(dict):
    __getattr__ = dict.__getitem__

    def __init__(self, d):
        self.update(**dict((k, self.parse(v))
                           for k, v in d.iteritems()))

    def __getattr__(self, key):
        if key in self:
            return self[key]
        raise AttributeError

    @classmethod
    def parse(cls, v):
        if isinstance(v, dict):
            return cls(v)
        elif isinstance(v, list):
            return [cls.parse(i) for i in v]
        else:
            return v
Why do I need to override __getattr__ method in order to do a deepcopy of objects returned by this class?

The issue occurs with your first class because copy.deepcopy calls getattr(x, "__deepcopy__", None). The third argument is a default: if the attribute does not exist on the object, getattr() returns that default instead of raising.
This is given in the documentation for getattr() -
getattr(object, name[, default])
Return the value of the named attribute of object. name must be a string. If the string is the name of one of the object’s attributes, the result is the value of that attribute. For example, getattr(x, 'foobar') is equivalent to x.foobar. If the named attribute does not exist, default is returned if provided, otherwise AttributeError is raised.
This works because, when a default argument was provided, getattr() catches the AttributeError raised by the underlying __getattr__ and returns the default; without a default, it lets the AttributeError bubble up. Example -
>>> class C:
...     def __getattr__(self, k):
...         raise AttributeError('asd')
...
>>>
>>> c = C()
>>> getattr(c, 'a')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in __getattr__
AttributeError: asd
>>> print(getattr(c, 'a', None))
None
But in your case, since you directly assign dict.__getitem__ to __getattr__, a name that is not found in the dictionary raises a KeyError, not an AttributeError. getattr() does not handle that exception, so your copy.deepcopy() fails.
You should handle the KeyError in your __getattr__ and raise AttributeError instead. Example -
class Dict2obj(dict):
    def __init__(self, d):
        self.update(**dict((k, self.parse(v))
                           for k, v in d.iteritems()))

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)
...
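For reference, here is a self-contained sketch of the same fix that runs on Python 3. It assumes d.items() in place of the Python 2 d.iteritems(), and the sample data is made up for illustration.

```python
import copy


class Dict2obj(dict):
    """dict subclass whose keys can also be read as attributes."""

    def __init__(self, d):
        super().__init__()
        # Python 3: items() replaces the Python 2 iteritems().
        self.update((k, self.parse(v)) for k, v in d.items())

    def __getattr__(self, name):
        # Only called when normal lookup fails. Translate the KeyError
        # into the AttributeError that getattr()/hasattr() expect.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    @classmethod
    def parse(cls, v):
        if isinstance(v, dict):
            return cls(v)
        elif isinstance(v, list):
            return [cls.parse(i) for i in v]
        else:
            return v


# Hypothetical sample data, standing in for json_data.
original = Dict2obj({"a": 1, "b": {"c": [1, {"d": 2}]}})
clone = copy.deepcopy(original)
```

With this version, getattr(x, "__deepcopy__", None) returns None as copy.deepcopy expects, and the copy falls through to the generic reduce-based path.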

Related

How to make a class work nicely with `hasattr` in `__getattr__` implementation?

I have a class, where __getattr__ is implemented simply like:
def __getattr__(self, item): return self.__dict__[item]
The problem I'm seeing is that many Python libraries (e.g. numpy and pandas) try to sniff whether my object has something called __array__ using this statement:
hasattr(obj, '__array__')
But my object throws an error at them saying there is no such attribute.
My dilemma: how can I make my class behave nicely with hasattr (by returning False) instead of throwing an error, while still throwing an error if anyone requests an attribute that doesn't exist (i.e. I still want that error to be thrown in any other case)?
EDIT: reproducible code as requested:
class A:
    def __getattr__(self, item): return self.__dict__[item]

a = A()
hasattr(a, "lol")
traceback:
File "<ipython-input-31-b7d3ffac514f>", line 4, in <module>
hasattr(a, "lol")
File "<ipython-input-31-b7d3ffac514f>", line 2, in __getattr__
def __getattr__(self, item): return self.__dict__[item]
KeyError: 'lol'
From the docs:
__getattr__ ... should either return the attribute value or raise an AttributeError exception.
hasattr(object, name) ... is implemented by calling getattr(object, name) and seeing whether it raises an AttributeError or not.
So you just need to raise an AttributeError:
class A:
    def __getattr__(self, item):
        try:
            return self.__dict__[item]
        except KeyError:
            classname = type(self).__name__
            msg = f'{classname!r} object has no attribute {item!r}'
            raise AttributeError(msg)

a = A()
print(hasattr(a, "lol"))  # -> False
print(a.lol)  # -> AttributeError: 'A' object has no attribute 'lol'
(This error message is based on the one from object().lol.)

TypeError for class instance when checking attributes as suggested by @jusbueno

I am referring to the question asked in How to force/ensure class attributes are a specific type? (shown below).
The type checking works as suggested. However, the class instance raises an error: when I instantiate the class as follows and access __dict__ on it, the error comes up.
excel_parser.py:
one_foo = Foo()
one_foo.__dict__
results in:
Traceback (most recent call last):
File "C:/Users/fiona/PycharmProjects/data_processing/excel_parser.py", line 80, in <module>
Foo.__dict__
TypeError: descriptor '__dict__' for 'Foo' objects doesn't apply to a 'Foo' object
How can I prevent this from happening? Thx
def getter_setter_gen(name, type_):
    def getter(self):
        return getattr(self, "__" + name)

    def setter(self, value):
        if not isinstance(value, type_):
            raise TypeError(f"{name} attribute must be set to an instance of {type_}")
        setattr(self, "__" + name, value)

    return property(getter, setter)

def auto_attr_check(cls):
    new_dct = {}
    for key, value in cls.__dict__.items():
        if isinstance(value, type):
            value = getter_setter_gen(key, value)
        new_dct[key] = value
    # Creates a new class, using the modified dictionary as the class dict:
    return type(cls)(cls.__name__, cls.__bases__, new_dct)

@auto_attr_check
class Foo(object):
    bar = int
    baz = str
    bam = float
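One likely cause, offered as a sketch rather than a confirmed diagnosis: cls.__dict__ also contains the __dict__ and __weakref__ descriptors that Python generated for the original Foo. Copying them into new_dct gives the rebuilt class descriptors that still belong to the old class, which matches the wording of the TypeError. The variant below skips those two keys so that type() can create fresh ones. The skip list is the only change and is an assumption on my part.

```python
def getter_setter_gen(name, type_):
    def getter(self):
        return getattr(self, "__" + name)

    def setter(self, value):
        if not isinstance(value, type_):
            raise TypeError(f"{name} attribute must be set to an instance of {type_}")
        setattr(self, "__" + name, value)

    return property(getter, setter)


def auto_attr_check(cls):
    new_dct = {}
    for key, value in cls.__dict__.items():
        # Assumption: these per-class descriptors must not be carried over,
        # so the new class gets its own.
        if key in ("__dict__", "__weakref__"):
            continue
        if isinstance(value, type):
            value = getter_setter_gen(key, value)
        new_dct[key] = value
    # Creates a new class, using the modified dictionary as the class dict.
    return type(cls)(cls.__name__, cls.__bases__, new_dct)


@auto_attr_check
class Foo(object):
    bar = int
    baz = str
    bam = float


one_foo = Foo()
one_foo.bar = 3
state = one_foo.__dict__  # no TypeError with the skip in place
```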

Correct handling of AttributeError in __getattr__ when using property

I have a difficulty implementing properties and __getattr__ so that
when an error happens, it is reported correctly. This is my MWE (python 3.6):
class A:
    @property
    def F(self):
        return self.moo  # here should be an error

    @property
    def G(self):
        return self.F

    def __getattr__(self, name):
        print('call of __getattr__ with name =', name)
        if name == 'foo':
            return 0
        raise AttributeError("'{}' object has no attribute '{}'".format(type(self).__name__, name))

a = A()
print(a.G)
The output is as follows:
call of __getattr__ with name = moo
call of __getattr__ with name = F
call of __getattr__ with name = G
Traceback (most recent call last):
line 18 in <module>
print(a.G)
line 15, in __getattr__
raise AttributeError("'{}' object has no attribute '{}'".format(type(self).__name__, name))
AttributeError: 'A' object has no attribute 'G'
But the error that should be raised is:
AttributeError: 'A' object has no attribute 'moo'
I know that properties and attributes in the __dict__ are attempted before __getattr__ is called in an error-free scenario.
It seems incorrect to me that when a property exists but fails, __getattr__ is still attempted instead of letting the error from the property go through. How can this be avoided?
The initial error message about failing to get attribute 'moo' has been lost. The final error message 'A' object has no attribute 'G' is particularly misleading and annoying. How can I implement __getattr__ so that I see the initial error?
(EDIT) A related problem is simultaneously to achieve that
hasattr(a, 'moo') returns False while hasattr(a, 'G') returns True or raises an exception of the missing 'moo' attribute. Does that make sense?
What is happening?
First, a little heads up as to why this happens. From the doc on __getattr__:
Called when the default attribute access fails with an AttributeError [...] or __get__() of a name property raises AttributeError.
In this case, since you are using @property, we are looking at an AttributeError raised from the __get__ method of the property F when trying to retrieve self.moo. This is what your call stack looks like at that moment.
__main__
a.G.__get__
a.F.__get__
a.__getattr__ # called with 'moo' <-- this is where the error is raised
The attribute lookup protocol sees an error raised from inside a.F.__get__, so it falls back to calling a.__getattr__('F'), even though the error was raised because of 'moo'. The same then happens for a.G.__get__.
This behaviour is considered normal in Python, since the top-most property that failed to return a value is indeed a.G.
Solution
Now what you want is for an AttributeError raised by a __get__ method to bubble up instead of being caught. For that to happen, the class must not define a __getattr__ method.
Thus, in this particular case, what you want to use is __getattribute__ instead.
Of course, with this solution you have to make sure yourself not to override an existing attribute.
class A:
    @property
    def F(self):
        return self.moo  # here should be an error

    @property
    def G(self):
        return self.F

    def __getattribute__(self, name):
        print('call of __getattribute__ with name =', name)
        if name == 'foo':
            return 0
        else:
            return super().__getattribute__(name)
Example
A().G
Output
call of __getattribute__ with name = G
call of __getattribute__ with name = F
call of __getattribute__ with name = moo
Traceback (most recent call last):
...
AttributeError: 'A' object has no attribute 'moo'
Here's a hacky solution, replacing the AttributeError with another exception type:
from functools import wraps

def no_AttributeError(f):
    @wraps(f)
    def wrapped(self):
        try:
            return f(self)
        except AttributeError as e:
            raise Exception('AttributeError inside a property getter') from e
    return wrapped

class A:
    @property
    @no_AttributeError
    def F(self):
        return self.moo  # here should be an error

    @property
    @no_AttributeError
    def G(self):
        return self.F

    def __getattr__(self, name):
        print('call of __getattr__ with name =', name)
        if name == 'foo':
            return 0
        raise AttributeError("'{}' object has no attribute '{}'".format(type(self).__name__, name))

a = A()
print(a.G)
This results in the following output:
call of __getattr__ with name = moo
Traceback (most recent call last):
File ".\test_getattr_redir.py", line 7, in wrapped
return f(self)
File ".\test_getattr_redir.py", line 17, in F
return self.moo # here should be an error
File ".\test_getattr_redir.py", line 28, in __getattr__
raise AttributeError("'{}' object has no attribute '{}'".format(type(self).__name__, name))
AttributeError: 'A' object has no attribute 'moo'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File ".\test_getattr_redir.py", line 31, in <module>
print(a.G)
File ".\test_getattr_redir.py", line 7, in wrapped
return f(self)
File ".\test_getattr_redir.py", line 22, in G
return self.F
File ".\test_getattr_redir.py", line 9, in wrapped
raise Exception('AttributeError inside a property getter') from e
Exception: AttributeError inside a property getter
As an addendum, to make it explicit why Python does what it does, here's an excerpt from the documentation:
[__getattr__ is called] when the default attribute access fails with an AttributeError (either __getattribute__() raises an AttributeError because name is not an instance attribute or an attribute in the class tree for self; or __get__() of a name property raises AttributeError). This method should either return the (computed) attribute value or raise an AttributeError exception.
(It looks like you know this but I think it's good to have it written out for other people running into the same issue.)
So that means when self.moo raises an AttributeError, it results in A.__getattr__(a, 'F') being called, which results in another AttributeError.
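The fallback described in that excerpt can be observed with a small sketch. The class Demo, its property broken, and the calls list are made up for illustration.

```python
class Demo:
    calls = []  # records every name __getattr__ is asked for

    @property
    def broken(self):
        # 'missing' does not exist, so this lookup raises AttributeError.
        return self.missing

    def __getattr__(self, name):
        Demo.calls.append(name)
        raise AttributeError(name)


try:
    Demo().broken
except AttributeError as exc:
    reported = str(exc)
```

After this runs, Demo.calls holds ['missing', 'broken'] and reported is 'broken': the AttributeError from inside the property triggers a second __getattr__ call for the property's own name, and the original name is lost.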
Given the answers above, I have tried the following solution for the case when __getattr__ is already defined by the base class P that we cannot change.
class P:
    def __getattr__(self, name):
        print('call of __getattr__ with name =', name)
        if name == 'foo':
            return 0
        raise AttributeError("Cannot recover attribute '{}'".format(name))

class A(P):
    e = None

    @property
    def F(self):
        return self.moo

    @property
    def G(self):
        return self.F

    def __getattr__(self, name):
        raise A.e

    def __getattribute__(self, name):
        try:
            return object.__getattribute__(self, name)
        except AttributeError as e1:
            try:
                return P.__getattr__(self, name)
            except AttributeError as e2:
                A.e = AttributeError(str(e1) + ' -> ' + str(e2))
                raise AttributeError

a = A()
print(a.G)
It replicates what python does when looking for attributes: the order of calls and semantics are kept. It only changes the final error message to
AttributeError: 'A' object has no attribute 'moo' -> Cannot recover attribute 'moo' -> Cannot recover attribute 'F' -> Cannot recover attribute 'G'
However, it might be causing more problems in the derived code than it is solving, so I don't know.

How to make a class which has __getattr__ properly picklable?

I extended dict in a simple way to directly access its values with the d.key notation instead of d['key']:
class ddict(dict):
    def __getattr__(self, item):
        return self[item]

    def __setattr__(self, key, value):
        self[key] = value
Now when I try to pickle it, it will call __getattr__ to find __getstate__, which is neither present nor necessary. The same will happen upon unpickling with __setstate__:
>>> import pickle
>>> class ddict(dict):
...     def __getattr__(self, item):
...         return self[item]
...     def __setattr__(self, key, value):
...         self[key] = value
...
>>> pickle.dumps(ddict())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 3, in __getattr__
KeyError: '__getstate__'
How do I have to modify the class ddict in order to make it properly picklable?
The problem is not pickle but that your __getattr__ method breaks the expected contract by raising KeyError exceptions. You need to fix your __getattr__ method to raise AttributeError exceptions instead:
def __getattr__(self, item):
    try:
        return self[item]
    except KeyError:
        raise AttributeError(item)
Now pickle is given the expected signal for a missing __getstate__ customisation hook.
From the object.__getattr__ documentation:
This method should return the (computed) attribute value or raise an AttributeError exception.
(bold emphasis mine).
If you insist on keeping the KeyError, then at the very least you need to skip names that start and end with double underscores and raise an AttributeError just for those:
def __getattr__(self, item):
    if isinstance(item, str) and item[:2] == item[-2:] == '__':
        # skip non-existing dunder method lookups
        raise AttributeError(item)
    return self[item]
Note that you probably want to give your ddict() subclass an empty __slots__ tuple; you don't need the extra __dict__ attribute mapping on your instances, since you are diverting attributes to key-value pairs instead. That saves you a nice chunk of memory per instance.
Demo:
>>> import pickle
>>> class ddict(dict):
...     __slots__ = ()
...     def __getattr__(self, item):
...         try:
...             return self[item]
...         except KeyError:
...             raise AttributeError(item)
...     def __setattr__(self, key, value):
...         self[key] = value
...
>>> pickle.dumps(ddict())
b'\x80\x03c__main__\nddict\nq\x00)\x81q\x01.'
>>> type(pickle.loads(pickle.dumps(ddict())))
<class '__main__.ddict'>
>>> d = ddict()
>>> d.foo = 'bar'
>>> d.foo
'bar'
>>> pickle.loads(pickle.dumps(d))
{'foo': 'bar'}
That pickle tests for the __getstate__ method on the instance rather than on the class as is the norm for special methods, is a discussion for another day.
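The effect of the empty __slots__ and of the corrected __getattr__ can be checked with a short sketch (Python 3 assumed). It uses copy.deepcopy rather than repeating the pickle demo, because the copy module goes through the same __reduce_ex__ hook that pickle uses. That pairing is an illustrative choice, not part of the answer above.

```python
import copy


class ddict(dict):
    __slots__ = ()  # no per-instance __dict__ is allocated

    def __getattr__(self, item):
        try:
            return self[item]
        except KeyError:
            # Signal a missing attribute the way getattr()/hasattr() expect.
            raise AttributeError(item) from None

    def __setattr__(self, key, value):
        self[key] = value


d = ddict()
d.foo = "bar"

cloned = copy.deepcopy(d)
has_instance_dict = hasattr(d, "__dict__")  # False: __slots__ removed it
```

Because every attribute write is diverted into the mapping, nothing is lost by dropping the instance __dict__, and the copy still carries the 'foo' key.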
First of all, I think you may need to distinguish between instance attribute and class attribute.
In the official Python documentation, Chapter 11.1.4 about pickling, it says:
instances of such classes whose __dict__ or the result of calling __getstate__() is picklable (see section The pickle protocol for details).
Therefore, the error message you're getting is when you try to pickle an instance of the class, but not the class itself - in fact, your class definition will just pickle fine.
Now for pickling an object of your class, the problem is that you need to call the parent class's serialization implementation first to properly set things up. The correct code is:
In [1]: import pickle
In [2]: class ddict(dict):
   ...:
   ...:     def __getattr__(self, item):
   ...:         super.__getattr__(self, item)
   ...:         return self[item]
   ...:
   ...:     def __setattr__(self, key, value):
   ...:         super.__setattr__(self, key, value)
   ...:         self[key] = value
   ...:
In [3]: d = ddict()
In [4]: d.name = "Sam"
In [5]: d
Out[5]: {'name': 'Sam'}
In [6]: pickle.dumps(d)
Out[6]: b'\x80\x03c__main__\nddict\nq\x00)\x81q\x01X\x04\x00\x00\x00nameq\x02X\x03\x00\x00\x00Samq\x03s}q\x04h\x02h\x03sb.'

__setattr__ only for names not found in the object's attributes?

I want to use __setattr__ only when the attribute was not found in the object's attributes, like __getattr__.
Do I really have to use try-except?
def __setattr__(self, name, value):
    try:
        setattr(super(Clazz, self), name, value)
    except AttributeError:
        # implement *my* __setattr__
        pass
You can use hasattr():
def __setattr__(self, name, value):
    if hasattr(super(Clazz, self), name):
        setattr(super(Clazz, self), name, value)
    else:
        # implement *my* __setattr__
        pass
There are many times when calling hasattr won't work the way you expect (e.g., you've overridden __getattr__ to always return a value), so another way to set the right attribute in the right place would be something like this:
def __setattr__(self, k, v):
    if k in self.__dict__ or k in self.__class__.__dict__:
        super(Clazz, self).__setattr__(k, v)
    else:
        # implement *my* __setattr__
        pass
__setattr__, if it exists, is called for every attribute set on the object.
Your example code, though, is rather confusing to me. What are you trying to do with the statement:
setattr(super(Clazz, self), name, value)?
Set an attribute on self, with self viewed as an instance of its superclass? That makes no sense, because the object is still "self".
On the other hand, trying to use "setattr" on the object returned by a call to "super" will always raise an AttributeError, regardless of whether the attribute exists on the superclass or not. That is because super returns not the superclass itself, but a wrapper object that will fetch attributes there when they are needed, so you can use "hasattr" on the object returned by super, but not setattr. I thought it would behave so, and just tried it on the console:
>>> class A(object):pass
...
>>> class B(A): pass
...
>>> b = B()
>>> super(B,b)
<super: <class 'B'>, <B object>>
>>> setattr(super(B,b), "a", 5)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'super' object has no attribute 'a'
>>> A.a = 1
>>> setattr(super(B,b), "a", 5)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'super' object has no attribute 'a'
But then, you can just use "hasattr" on the object itself, and proceed like this:
def __setattr__(self, attr, value):
    if hasattr(self, attr):
        # this works because retrieving "__setattr__" from the
        # result of the super call gives the correct "__setattr__" of the superclass.
        super(Clazz, self).__setattr__(attr, value)
    else:
        # transform value /or attribute as desired in your code
        super(Clazz, self).__setattr__(attr, value)
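Putting the pieces together, here is a runnable sketch for Python 3. The class attribute known and the extras store are hypothetical and only serve the illustration. It checks hasattr(type(self), name) instead of self.__class__.__dict__ so that inherited class attributes also count as existing, which is a variation on the answers above.

```python
class Clazz:
    known = 0  # class-level attribute that should be set the normal way

    def __init__(self):
        # Bypass the custom logic while creating the fallback store.
        super().__setattr__("extras", {})

    def __setattr__(self, name, value):
        if name in self.__dict__ or hasattr(type(self), name):
            # Existing attribute: defer to the default implementation.
            super().__setattr__(name, value)
        else:
            # Unknown name: custom handling, here a side dictionary.
            self.extras[name] = value


obj = Clazz()
obj.known = 5  # goes through the default __setattr__
obj.other = 7  # lands in obj.extras
```

Checking the class rather than calling hasattr(self, name) avoids triggering a custom __getattr__ on the instance, which is the pitfall mentioned earlier.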
