__getstate__ method not being called when pickling a subclass of set - python

From what I understand, when calling pickle.dumps on an object, it will call the object's __getstate__ method (if it has one) to determine what to pickle.
If I create a class such as:
class DictClass(dict):
    def __getstate__(self):
        print "pickling"
        return self
I get this result:
>>> pickle.dumps(DictClass())
pickling
'ccopy_reg\n_reconstructor\np0...'
I can do the same thing, replacing 'dict' with 'list':
class ListClass(list):
    def __getstate__(self):
        print "pickling"
        return self
>>> pickle.dumps(ListClass())
pickling
'ccopy_reg\n_reconstructor\np0...'
But if I use 'set', something different happens:
class SetClass(set):
    def __getstate__(self):
        print "pickling"
        return self
>>> pickle.dumps(SetClass())
'c__main__\nSetClass...'
The __getstate__ method doesn't get called. Why is this, and is it possible to specify what part of a subclass of a set to pickle?

list does not implement __reduce__(), whereas set does:
>>> list().__reduce__()
...
TypeError: can't pickle list objects
>>> set().__reduce__()
(<type 'set'>, ([],), None)
It's the last tuple in the above example that gets pickled, so SetClass.__getstate__() never enters the picture.
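One way to keep control over what a set subclass pickles, regardless of how set.__reduce__() treats __getstate__(), is to override __reduce__() itself. The following is a minimal sketch written for Python 3; TaggedSet and its tag attribute are hypothetical names used only for illustration. Instead of a full pickle round trip, it replays by hand what the unpickler does with the reduce value.

```python
class TaggedSet(set):
    """Hypothetical set subclass carrying one extra attribute."""

    def __init__(self, items=(), tag=None):
        super().__init__(items)
        self.tag = tag

    def __reduce__(self):
        # pickle stores this (callable, args) pair and calls
        # callable(*args) when the object is loaded again.
        return (self.__class__, (list(self), self.tag))


original = TaggedSet([1, 2, 3], "demo")

# Replay what the unpickler does with the reduce value.
factory, args = original.__reduce__()
rebuilt = factory(*args)

print(type(rebuilt).__name__, rebuilt == original, rebuilt.tag)
```

Because the reduce value names both the elements and the extra attribute explicitly, nothing depends on whether __getstate__() is consulted.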

Related

calling a function from class vs object in python

I was actually going through descriptors python docs and came across this example
>>> class D(object):
...     def f(self, x):
...         return x
>>> d = D()
>>> D.__dict__['f'] # Stored internally as a function
<function f at 0x00C45070>
>>> id(D.__dict__['f']) # Memory location
49294384
>>> D.f # Get from a class becomes an unbound method
<unbound method D.f>
>>> id(D.f )
48549440
>>> d.f # Get from an instance becomes a bound method
<bound method D.f of <__main__.D object at 0x00B18C90>>
>>> id(d.f)
48549440
So from the above code, I understood that Python stores a class's function definition as a separate object inside the class's __dict__. When we access it directly through __dict__, its memory location is 49294384.
But why does it show up as a different function/method object, at a different memory location (48549440), when accessed through the class or an instance, as in D.f and d.f?
Isn't it supposed to refer to the same object that we get through __dict__? If not, why not?
D.f still expects the instance to be passed explicitly as its first argument (self):
x = D.f
x(d, 42)
d.f is a "bound method", i.e. a function where the self argument has already been filled in. You can say
x = d.f
x(42)
Therefore it cannot be the same thing as D.f, and has to be at a different location.
xtofi explained the difference between descriptor objects (unbound) and bound methods.
I think the missing part is that bound methods are not kept in memory, and they are actually created every time you access them. (You may get the same memory location, but it's not the same object).
Why?
Because every access through a descriptor may result in different behavior. Here is an example to explain this idea.
class A(object):
    i = 'a'
    @property
    def name(self):
        if A.i == 'a':
            return self.fa()
        else:
            return self.fb()
    def fa(self):
        print 'one function'
    def fb(self):
        print 'another function'
Accessing the name property on an instance of A results in different function calls, depending on the current value of A.i.
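The claim that bound methods are created on every access can be checked directly. The sketch below assumes Python 3, where D.f is simply the function stored in the class __dict__ (there are no unbound methods any more). Identical id() values as in the question can occur when a temporary method object is discarded and its memory is reused.

```python
class D:
    def f(self, x):
        return x


d = D()
raw = D.__dict__["f"]   # the plain function stored on the class

m1 = d.f                # each attribute access builds a new bound method
m2 = d.f

print(m1 is m2)               # False: two distinct method objects
print(m1 == m2)               # True: same function bound to the same instance
print(m1.__func__ is raw)     # True: both wrap the stored function
print(m1.__self__ is d)       # True
print(D.f is raw)             # True in Python 3
print(raw.__get__(d, D)(5))   # 5: binding by hand through the descriptor protocol
```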

Why is cls.__dict__[meth] different than getattr(cls, meth) for classmethods/staticmethods?

I've never seen anything else work like this before.
Is there anything else that does this?
>>> class NothingSpecial:
...     @classmethod
...     def meth(cls): pass
>>> NothingSpecial.meth
<bound method classobj.meth of <class __main__.NothingSpecial at 0x02C68C70>>
>>> NothingSpecial.__dict__['meth']
<classmethod object at 0x03F15FD0>
>>> getattr(NothingSpecial, 'meth')
<bound method NothingSpecial.meth of <class '__main__.NothingSpecial'>>
>>> object.__getattribute__(NothingSpecial, 'meth')
<classmethod object at 0x03FAFE90>
>>> type.__getattribute__(NothingSpecial, 'meth')
<bound method NothingSpecial.meth of <class '__main__.NothingSpecial'>>
Getattr Uses Descriptor Logic
The main difference is that the dictionary lookup does no extra processing while the attribute fetch incorporates extra logic (see my Descriptor How-To Guide for all the details).
There Are Two Different Underlying Methods
1) The call NothingSpecial.__dict__['meth'] uses the square brackets operator which dispatches to dict.__getitem__ which does a simple hash table lookup or raises KeyError if not found.
2) The call NothingSpecial.meth uses the dot operator which dispatches to type.__getattribute__ which does a simple lookup followed by a special case for descriptors. If the lookup fails, an AttributeError is raised.
How It Works
The overall logic is documented here and here.
In general, a descriptor is an object attribute with “binding
behavior”, one whose attribute access has been overridden by methods
in the descriptor protocol: __get__(), __set__(), and/or __delete__(). If
any of those methods are defined for an object, it is said to be a
descriptor.
The default behavior for attribute access is to get, set, or delete
the attribute from an object’s dictionary. For instance, a.x has a
lookup chain starting with a.__dict__['x'], then
type(a).__dict__['x'], and continuing through the base classes of
type(a) excluding metaclasses.
However, if the looked-up value is an object defining one of the
descriptor methods, then Python may override the default behavior and
invoke the descriptor method instead. Where this occurs in the
precedence chain depends on which descriptor methods were defined and
how they were called
Hope you've found all of this to be helpful. The kind of exploring you're doing is a great way to learn about Python :-)
P.S. You might also enjoy reading the original Whatsnew in Python 2.2 entry for descriptors or looking at PEP 252 where Guido van Rossum originally proposed the idea.
object.__getattribute__(NothingSpecial, 'meth')
and
NothingSpecial.__dict__['meth']
return the same object in this case. You can quickly check it by doing:
NothingSpecial.__dict__['meth'] is object.__getattribute__(NothingSpecial, 'meth')
$True
Both of them point to the same descriptor object.
On the other hand:
object.__getattribute__(NothingSpecial, 'meth') is getattr(NothingSpecial, 'meth')
$False
Basically, they are neither the same object nor the same type:
type(object.__getattribute__(NothingSpecial, 'meth'))
$<class 'classmethod'>
type(getattr(NothingSpecial, 'meth'))
$<class 'method'>
So the answer is that getattr automatically invokes the __get__ method of the object it finds, if it has one, whereas object.__getattribute__ (called on the class) and a lookup in the class's __dict__ do not. The following function illustrates that:
from inspect import isclass

class Nothing:
    @classmethod
    def a(cls):
        return cls()
    @staticmethod
    def b():
        return 'b'
    def c(self):
        return 'c'

def gitter(obj, name):
    # Fetch the attribute without the class-level descriptor step ...
    value = object.__getattribute__(obj, name)
    if hasattr(value, '__get__'):
        # ... then apply the descriptor protocol by hand.
        if isclass(obj):
            instance, cls = None, obj
        else:
            instance, cls = obj, type(obj)
        return value.__get__(instance, cls)
    return value
>>> gitter(Nothing, 'a')()
<__main__.Nothing object at 0x03E97930>
>>> gitter(Nothing, 'b')()
'b'
>>> gitter(Nothing(), 'c')()
'c'
However, gitter(Nothing(), 'b') doesn't work currently because it's not detecting that the objtype default value is None, but this is enough.
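One likely reason the instance case fails is that object.__getattribute__ already applies the descriptor protocol when it is given an instance, so the result gets bound a second time. A possible variation, sketched here for Python 3, fetches the raw attribute with inspect.getattr_static and only then calls __get__. The name manual_getattr is hypothetical, and the sketch deliberately ignores the precedence rules between instance dictionaries and data descriptors.

```python
import inspect


class Nothing:
    @classmethod
    def a(cls):
        return cls()

    @staticmethod
    def b():
        return 'b'

    def c(self):
        return 'c'


def manual_getattr(obj, name):
    """Fetch the raw attribute, then apply the descriptor protocol by hand.

    Simplification: assumes the attribute lives on the class, so the
    instance-dict versus data-descriptor precedence is not modelled.
    """
    raw = inspect.getattr_static(obj, name)
    getter = getattr(type(raw), '__get__', None)
    if getter is None:
        return raw                      # not a descriptor
    if inspect.isclass(obj):
        return getter(raw, None, obj)   # looked up on the class
    return getter(raw, obj, type(obj))  # looked up on an instance


print(manual_getattr(Nothing, 'a')())
print(manual_getattr(Nothing(), 'b')())
print(manual_getattr(Nothing(), 'c')())
```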

Use of object <type 'object'>

In Python, we all know that there is the general object type, from which every class naturally inherits.
If we type object in the console, it returns <type 'object'>.
>>> object
<type 'object'>
So far, so good.
It is also possible to instantiate a variable of type object
>>> var = object()
>>> var
<object object at 0x021684D8>
My question is:
Is there a reason for the object type being instantiable? Is there any use for this? Or is it just made for the sake of formality?
There is at least one practical reason to instantiate object: it's a quick, easy, and idiomatic way to get a value that will not compare equal to any other value (and, in Python 3, will raise an exception if you try an ordering comparison). This makes it a perfect sentinel. For example:
def dedupe_consecutive(iterable):
    # Wrapped in a generator function so that yield is valid.
    last = object()
    for value in iterable:
        if value != last:
            yield value
        last = value
If you used None as a sentinel value, it would do the wrong thing if iterable started with None, but there's no way iterable can start with the brand-new object instance you just created.
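The same idea is often used for default arguments, where None must remain a legitimate value. The following is a small sketch; first() and _MISSING are hypothetical names, not part of any library.

```python
_MISSING = object()  # hypothetical sentinel; nothing else can be this object


def first(iterable, default=_MISSING):
    """Return the first item, or default if the iterable is empty."""
    for item in iterable:
        return item
    if default is _MISSING:
        # The caller passed no default at all, which differs from default=None.
        raise ValueError("first() of an empty iterable with no default")
    return default


print(first([None, 1]))        # None, taken from the data
print(first([], default=None)) # None, taken from the explicit default
```

Comparing with `is` rather than `==` avoids calling any user-defined __eq__ on the values being inspected.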
However, even if there were no practical use for object instances, consider that every instance of a subclass of object is an object, so conceptually you're creating object instances all the time--and practically, if you're designing a subclass that uses super() in its __init__ or __new__ this is no longer just conceptual. If object.__init__ were unavailable, or raised, you'd have to create a new do-nothing root class for any hierarchy of cooperative classes just to have a do-nothing base implementation.
Try it:
class a:
    def __init__(self):
        self.num = 0
class b:
    def __init__(self):
        self.num = 1
A = a()
B = b()
type(A)
type(B)
This returns:
<class '__main__.a'>
<class '__main__.b'>
This tells you what kind of object each one is.

Difference between calling a method and accessing an attribute

I'm very new to Python, and I'm using Python 3.3.1.
class Parent:  # define parent class
    parentAttr = 100
    age = 55
    def __init__(self):
        print ("Calling parent constructor")
    def setAttr(self, attr):
        Parent.parentAttr = attr
class Child(Parent):
    def childMethod(self):
        print ('Calling child method')
Now I'll create an instance:
c = Child()
c.[here everything appears: methods and attributes (age, setAttr)]
How can I distinguish between methods and attributes? I mean, when do I use c.setAttr(argument), and when c.setAttr = value?
Methods are attributes too. They just happen to be callable objects.
You can detect if an object is callable by using the callable() function:
>>> def foo(): pass
...
>>> callable(foo)
True
>>> callable(1)
False
When you call a method, you look up the attribute (a getattr() operation) and then call the result:
c.setAttr(newvalue)
is two steps: finding the attribute (which in this case looks up the attribute on the class and treats it as a descriptor), then calling the resulting object, a method.
When you assign to an attribute, you rebind that name to a new value:
c.setAttr = 'something else'
would be a setattr() operation.
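Both operations can be made visible in a few lines. This sketch reuses a trimmed-down version of the Parent class from the question and assumes Python 3.

```python
class Parent:
    parentAttr = 100

    def setAttr(self, attr):
        Parent.parentAttr = attr


c = Parent()

method = c.setAttr         # step 1: attribute lookup yields a bound method
method(5)                  # step 2: calling the result
print(Parent.parentAttr)   # 5

c.setAttr = 'something else'      # rebinding: stored in the instance __dict__
print(callable(c.setAttr))        # False, the string shadows the method
print(callable(Parent.setAttr))   # True, the class attribute is untouched

del c.setAttr                     # remove the shadowing instance attribute
print(callable(c.setAttr))        # True, lookup reaches the class again
```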
If you wanted to intercept getting and setting attributes on instances of your class, you could provide the attribute access hooks, __getattr__, __setattr__ and __delattr__.
If you wanted to add a method to an instance, you would have to treat the function as a descriptor object, which produces a method object:
>>> class Foo: pass
...
>>> foo = Foo() # instance
>>> def bar(self): pass
...
>>> bar
<function bar at 0x10b85a320>
>>> bar.__get__(foo, Foo)
<bound method Foo.bar of <__main__.Foo instance at 0x10b85b830>>
The return value of function.__get__(), when given an instance and a class, is a bound method. Calling that method will call the underlying function with self bound to the instance.
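In practice the bound method still has to be stored on the instance. A short sketch, assuming Python 3 and using types.MethodType, which has the same effect here as calling bar.__get__(foo, Foo):

```python
import types


class Foo:
    pass


def bar(self):
    return "bar called on a %s instance" % type(self).__name__


foo = Foo()
# Bind the function to this one instance and store the result on it.
foo.bar = types.MethodType(bar, foo)

print(foo.bar())

other = Foo()
print(hasattr(other, "bar"))   # False: only foo received the method
```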
And speaking of descriptors, the property() function returns a descriptor too, making it possible to have functions that behave like attributes; they can intercept the getattr(), setattr() and delattr() operations for just that attribute and turn it into a function call:
>>> class Foo:
...     @property
...     def bar(self):
...         return "Hello World!"
...
>>> foo = Foo()
>>> foo.bar
'Hello World!'
Accessing .bar invoked the bar property get hook, which then calls the original bar method.
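A property can intercept assignment as well as reading. The sketch below is illustrative only; the Temperature class and its celsius attribute are hypothetical names.

```python
class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius      # goes through the setter below

    @property
    def celsius(self):
        # Runs on every read of obj.celsius.
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        # Runs on every obj.celsius = value assignment.
        if value < -273.15:
            raise ValueError("temperature below absolute zero")
        self._celsius = float(value)


t = Temperature(20)
t.celsius = 30          # looks like attribute assignment, calls the setter
print(t.celsius)
```

From the caller's point of view celsius is an ordinary attribute, yet every read and write passes through a function.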
In almost all situations, you are not going to need the callable() function; you document your API, and provide methods and attributes and the user of your API will figure it out without testing each and every attribute to see if it is callable. With properties, you have the flexibility of providing attributes that are really callables in any case.

Why does pickle __getstate__ accept as a return value the very instance it required __getstate__ to pickle in the first place?

I was going to ask "How to pickle a class that inherits from dict and defines __slots__". Then I realized the utterly mind-wrenching solution in class B below actually works...
import pickle
class A(dict):
    __slots__ = ["porridge"]
    def __init__(self, porridge): self.porridge = porridge
class B(A):
    __slots__ = ["porridge"]
    def __getstate__(self):
        # Returning the very item being pickled in 'self'??
        return self, self.porridge
    def __setstate__(self, state):
        print "__setstate__(%s) type(%s, %s)" % (state, type(state[0]),
                                                  type(state[1]))
        self.update(state[0])
        self.porridge = state[1]
Here is some output:
>>> saved = pickle.dumps(A(10))
TypeError: a class that defines __slots__ without defining __getstate__ cannot be pickled
>>> b = B('delicious')
>>> b['butter'] = 'yes please'
>>> loaded = pickle.loads(pickle.dumps(b))
__setstate__(({'butter': 'yes please'}, 'delicious')) type(<class '__main__.B'>, <type 'str'>)
>>> b
{'butter': 'yes please'}
>>> b.porridge
'delicious'
So basically, pickle cannot pickle a class that defines __slots__ without also defining __getstate__. Which is a problem if the class inherits from dict - because how do you return the content of the instance without returning self, which is the very instance pickle is already trying to pickle, and can't do so without calling __getstate__. Notice how __setstate__ is actually receiving an instance of B as part of the state.
Well, it works... but can someone explain why? Is it a feature or a bug?
Maybe I'm a bit late to the party, but this question didn't get an answer that actually explains what's happening, so here we go.
Here's a quick summary for those who don't want to read this whole post (it got a bit long...):
You don't need to take care of the contained dict instance in __getstate__() -- pickle will do this for you.
If you include self in the state anyway, pickle's cycle detection will prevent an infinite loop.
Writing __getstate__() and __setstate__() methods for custom classes derived from dict
Let's start with the right way to write the __getstate__() and __setstate__() methods of your class. You don't need to take care of pickling the contents of the dict instance contained in B instances -- pickle knows how to deal with dictionaries and will do this for you. So this implementation will be enough:
class B(A):
    __slots__ = ["porridge"]
    def __getstate__(self):
        return self.porridge
    def __setstate__(self, state):
        self.porridge = state
Example:
>>> a = B("oats")
>>> a[42] = "answer"
>>> b = pickle.loads(pickle.dumps(a))
>>> b
{42: 'answer'}
>>> b.porridge
'oats'
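To see that the dictionary contents and the __getstate__() value really travel separately, the reduce value can be inspected and replayed by hand. This is a sketch for Python 3 with pickle protocol 2, where the reduce value has a different shape from the Python 2 output discussed in this answer; the variable names are illustrative.

```python
class A(dict):
    __slots__ = ["porridge"]

    def __init__(self, porridge):
        self.porridge = porridge


class B(A):
    __slots__ = ["porridge"]

    def __getstate__(self):
        return self.porridge

    def __setstate__(self, state):
        self.porridge = state


b = B("oats")
b[42] = "answer"

reduced = b.__reduce_ex__(2)
factory, args, state = reduced[0], reduced[1], reduced[2]
dictitems = reduced[4]        # iterator over the (key, value) pairs

# Replay what the unpickler does with these pieces.
clone = factory(*args)        # bare B instance; __init__ is not called
clone.update(dictitems)       # dict contents are restored on their own
clone.__setstate__(state)     # the slot value comes from __getstate__()

print(state, dict(clone), clone.porridge)
```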
What's happening in your implementation?
Why does your implementation work as well, and what's happening under the hood? That's a bit more involved, but -- once we know that the dictionary gets pickled anyway -- not too hard to figure out. If the pickle module encounters an instance of a user-defined class, it calls the __reduce__() method of this class, which in turn calls __getstate__() (actually, it usually calls the __reduce_ex__() method, but that does not matter here). Let's define B again as you originally did, i.e. using the "recursive" definition of __getstate__(), and let's see what we get when calling __reduce__() for an instance of B now:
>>> a = B("oats")
>>> a[42] = "answer"
>>> a.__reduce__()
(<function _reconstructor at 0xb7478454>,
(<class '__main__.B'>, <type 'dict'>, {42: 'answer'}),
({42: 'answer'}, 'oats'))
As we can see from the documentation of __reduce__(), the method returns a tuple of 2 to 5 elements. The first element is a function that will be called to reconstruct the instance when unpickling, the second element is the tuple of arguments that will be passed to this function, and the third element is the return value of __getstate__(). We can already see that the dictionary information is included twice. The function _reconstructor() is an internal function of the copy_reg module that reconstructs the base class before __setstate__() is called when unpickling. (Have a look at the source code of this function if you like -- it's short!)
Now the pickler needs to pickle the return value of a.__reduce__(). It basically pickles the three elements of this tuple one after the other. The second element is a tuple again, and its items are also pickled one after the other. The third item of this inner tuple (i.e. a.__reduce__()[1][2]) is of type dict and is pickled using the internal pickler for dictionaries. The third element of the outer tuple (i.e. a.__reduce__()[2]) is also a tuple again, consisting of the B instance itself and a string. When pickling the B instance, the cycle detection of the pickle module kicks in: pickle realises this exact instance has already been dealt with, and only stores a reference to its id() instead of really pickling it -- this is why no infinite loop occurs.
When unpickling this mess again, the unpickler first reads the reconstruction function and its arguments from the stream. The function is called, resulting in a B instance with the dictionary part already initialised. Next, the unpickler reads the state. It encounters a tuple consisting of a reference to an already unpickled object -- namely our instance of B -- and a string, "oats". This tuple now is passed to B.__setstate__(). The first element of state and self are the same object now, as can be seen by adding the line
print self is state[0]
to your __setstate__() implementation (it prints True!). The line
self.update(state[0])
consequently simply updates the instance with itself.
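The cycle detection described above can be observed with built-in containers alone. A small sketch, assuming Python 3:

```python
import pickle

inner = {"name": "shared"}
outer = [inner, inner]     # the same dict is referenced twice
outer.append(outer)        # and the list contains itself

loaded = pickle.loads(pickle.dumps(outer))

print(loaded[0] is loaded[1])   # True: stored once, referenced twice
print(loaded[2] is loaded)      # True: the self-reference survives
print(loaded[0] is inner)       # False: these are copies of the originals
```

Without the memo of already-pickled objects, the self-referencing list would send the pickler into endless recursion.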
Here's the thinking as I understand it. If your class uses __slots__, it's a way to guarantee that there aren't any unexpected attributes. Unlike a regular Python object, one that's implemented with slots cannot have attributes dynamically added to it.
When Python unserializes an object with __slots__, it doesn't want to just make an assumption that whatever attributes were in the serialized version are compatible with your runtime class. So it punts that off to you, and you can implement __getstate__ and __setstate__.
But the way you implemented your __getstate__ and __setstate__, you appear to be circumventing that check. Here's the code that's raising that exception:
try:
    getstate = self.__getstate__
except AttributeError:
    if getattr(self, "__slots__", None):
        raise TypeError("a class that defines __slots__ without "
                        "defining __getstate__ cannot be pickled")
    try:
        dict = self.__dict__
    except AttributeError:
        dict = None
else:
    dict = getstate()
In a roundabout way, you're telling the pickle module to set its objections aside and serialize and unserialize your objects as normal.
That may or may not be a good idea -- I'm not sure. But I think that could come back to bite you if, for example, you change your class definition and then unserialize an object with a different set of attributes than what your runtime class expects.
That's why, when using slots especially, your __getstate__ and __setstate__ should be more explicit. I would be explicit and be clear that you're just sending the dictionary key/values back and forth, like this:
class B(A):
    __slots__ = ["porridge"]
    def __getstate__(self):
        return dict(self), self.porridge
    def __setstate__(self, state):
        self.update(state[0])
        self.porridge = state[1]
Notice the dict(self) -- that casts your object to a dict, which should make sure that the first element in your state tuple is only your dictionary data.
