Given an arbitrary pythonic object like this:
class ExampleObj(object):
    def __init__(self):
        self.a = 'a'
        self.b = 'b'
        self.c = 'c'

obj = ExampleObj()
Is there any functional difference between these two serialization approaches?
Piecewise Pickling
import pickle

base = type(obj)
name = obj.__class__.__name__
pickled_data = {}
for key, val in obj.__dict__.items():
    pickled_data[key] = pickle.dumps(val)

vars = {k: pickle.loads(v) for k, v in pickled_data.items()}
restored = type(name, (base,), vars)
Standard Pickling
restored = pickle.loads( pickle.dumps(obj) )
I can't envision any, but I'm worried there may be some edge case I'm not considering.
(In my application, some objects may have attributes that aren't serializable. We were hoping to implement piecewise pickling so we can better identify which variables are preventing us from pickling the object.)
In the first case, you're creating an instance of type, whereas in the second case, you're creating an instance of the type ExampleObj. To see how the two results are functionally different, I'll name restored_1 the result of your first example, and restored_2 the second.
type(restored_1) # type
type(restored_2) # __main__.ExampleObj
Thus, restored_1 and restored_2 will not be functionally equivalent in the sense that you mention you're looking for.
As a simple illustration, add a method or property to ExampleObj and try to use the restored object from either procedure in various ways.
class ExampleObj(object):
    def __init__(self):
        self.a = 'a'
        self.b = 'b'
        self.c = 'c'
    def foo(self):
        print('bar')
    @property
    def baz(self):
        print(self.a + self.b)

obj = ExampleObj()
After executing your first code, which returns an instance of type:
restored_1.foo()     # raises TypeError: restored_1 is a class, not an ExampleObj instance
restored_1.baz       # returns the property object itself, e.g. <property at 0x107863138>
restored_1.__dict__  # returns a mappingproxy object
After executing your second code, which returns an instance of ExampleObj:
restored_2.foo()     # prints bar
restored_2.baz       # prints ab
restored_2.__dict__  # {'a': 'a', 'b': 'b', 'c': 'c'}
If you're looking for a discussion on approaches to see for which instance attrs pickling failed, see this question: How to tell for which object attribute pickle fails?
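In the meantime, a minimal sketch of such a per-attribute probe might look like this (the helper name unpicklable_attrs is my own invention, not from that question):

import pickle

def unpicklable_attrs(obj):
    # Try to pickle each instance attribute on its own and
    # collect the ones that fail, keyed by attribute name.
    failed = {}
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception as exc:
            failed[name] = exc
    return failed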
Related
I am making a constructor in Python. When called with an existing object as its input, it should set the "new" object to that same object. Here is a 10 line demonstration:
class A:
    def __init__(self, value):
        if isinstance(value, A):
            self = value
        else:
            self.attribute = value

a = A(1)
b = A(a)  # a and b should be references to the same object
print("b is a", b is a)  # this should be true: the identities should be the same
print("b == a", b == a)  # this should be true: the values should be the same
I want the object A(a) constructed from the existing object a to be a. Why is it not? To be clear, I want A(a) to reference the same object as a, NOT a copy.
self, like any other argument, is among the local variables of a function or method. Assignment to the bare name of a local variable never affects anything outside of that function or method, it just locally rebinds that name.
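A minimal demonstration of that rule, independent of classes:

def rebind(x):
    x = 42  # rebinds the local name 'x' only

value = [1, 2, 3]
rebind(value)
print(value)  # [1, 2, 3] -- the caller's object is untouched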
As a comment rightly suggests, it's unclear why you wouldn't just do
b = a
Assuming you have a sound reason, what you need to override is not __init__, but rather __new__ (then take some precaution in __init__ to avoid double initialization). It's not an obvious course so I'll wait for you to explain what exactly you're trying to accomplish.
Added: having clarified the need I agree with the OP that a factory function (ideally, I suggest, as a class method) is better -- and clearer than __new__, which would work (it is a class method after all) but in a less-sharply-clear way.
So, I would code as follows:
class A(object):
    @classmethod
    def make(cls, value):
        if isinstance(value, cls):
            return value
        return cls(value)
    def __init__(self, value):
        self.attribute = value
Now,
a = A.make(1)
b = A.make(a)
accomplishes the OP's desires, polymorphically over the type of argument passed to A.make.
The only way to make it work exactly as you have it is to implement __new__, the constructor, rather than __init__, the initialiser (the behaviour can get rather complex if both are implemented). It would also be wise to implement __eq__ for equality comparison, although this will fall back to identity comparison. For example:
>>> class A(object):
...     def __new__(cls, value):
...         if isinstance(value, cls):
...             return value
...         inst = super(A, cls).__new__(cls)
...         inst.attribute = value
...         return inst
...     def __eq__(self, other):
...         return self.attribute == other.attribute
>>> a = A(1)
>>> b = A(a)
>>> a is b
True
>>> a == b
True
>>> a == A(1)
True # also equal to other instance with same attribute value
You should have a look at the data model documentation, which explains the various "magic methods" available and what they do. See e.g. __new__.
__init__ is an initializer, not a constructor. You would have to mess around with __new__ to do what you want, and it's probably not a good idea to go there.
Try
a = b = A(1)
instead.
If you call a constructor, it's going to create a new object. The simplest thing is to do what hacatu suggested and simply assign b to a's value. If not, perhaps you could have an if statement checking if the value passed in is equal to the object you want referenced and if it is, simply return that item before ever calling the constructor. I haven't tested so I'm not sure if it'd work.
I have already seen this post and even though the symptoms are similar, the way I am defining my class is different as I am using __init__:
>>> class foo(object):
...     def __init__(self, x):
...         self.x = x
...
>>>
I next define an instance of this class:
>>> inst1 = foo(10)
>>> inst1.x
10
Now, I would like to copy the same instance into a new variable and then change the value of x:
>>> inst2 = inst1
>>> inst2.x = 20
>>> inst2.x
20
It seems, however, (like a class-level attribute) all data attributes are shared between inst1 and inst2 since changing the value of x for inst2 will also change that for inst1:
>>> inst1.x
20
I do know that an alternative method is to say:
>>> inst2 = foo(20)
However, I don't like to do this because my actual class takes a lot of input arguments, out of which I need to change only one or two specific data attributes when creating different instances (i.e., the rest of the input arguments remain the same for all instances). Any suggestions are greatly appreciated!
You are not copying the class instance (the object). The following line
>>> inst2 = inst1
copies a reference of inst1. It does not copy the object.
Easy way to confirm this is to look at the result of the builtin id() function, which returns an identifier that is unique for each Python object (in CPython, the object's memory address). The value for both inst1 and inst2 will be the same in this case.
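For example, continuing the snippet above:

>>> id(inst1) == id(inst2)  # one object, two names
True
>>> inst1 is inst2          # the idiomatic spelling of the same test
True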
And an easy way to solve it is to use Joran's answer.
class foo(object):
    def __init__(self, x):
        self.x = x
    def copy(self):
        return foo(self.x)

foo2 = foo1.copy()  # assuming an existing instance foo1 = foo(10)
is a pretty safe way to implement it
there is also the builtin copy module
from copy import deepcopy
foo2 = deepcopy(foo1)
if you define a __copy__ method, the copy.copy will use your own __copy__
class foo2:
    def __init__(self, val):
        self.state = 0
        self.val = val
    def __copy__(self):
        newfoo = foo2(self.val)
        newfoo.state = self.state
        print("Copied self:", self)
        return newfoo

from copy import copy
f = foo2(5)
f2 = copy(f)  # will do the copy we defined
this makes your code somewhat more generic, allowing you to just call copy on any object without worrying about what the object is
I have a class which contains data as attributes and which has a method to return a tuple containing these attributes:
class myclass(object):
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c
    def tuple(self):
        return (self.a, self.b, self.c)
I use this class essentially as a tuple where the items (attributes) can be modified/read through their attribute name. Now I would like to create objects of this class, which would be constants and have pre-defined attribute values, which I could then assign to a variable/mutable object, thereby initializing this variable object's attributes to match the constant object, while at the same time retaining the ability to modify the attributes' values. For example I would like to do this:
constant_object = myclass(1,2,3)
variable_object = constant_object
variable_object.a = 999
Now of course this doesn't work in python, so I am wondering what is the best way to get this kind of functionality?
Now I would like to create objects of this class, which would be constants and have pre-defined attribute values, which I could then assign to a variable/mutable object, thereby initializing this variable object's attributes to match the constant object,
Well, you can't have that. Assignment in Python doesn't initialize anything. It doesn't copy or create anything. All it does is give a new name to the existing value.
If you want to initialize an object, the way to do that in Python is to call the constructor.
So, with your existing code:
new_object = myclass(old_object.a, old_object.b, old_object.c)
If you look at most built-in and stdlib classes, it's a lot more convenient. For example:
a = set([1, 2, 3])
b = set(a)
How do they do that? Simple. Just define an __init__ method that can be called with an existing instance. (In the case of set, this comes for free, because a set can be initialized with any iterable, and sets are iterable.)
If you don't want to give up your existing design, you're going to need a pretty clumsy __init__, but it's at least doable. Maybe this:
_sentinel = object()

def __init__(self, myclass_or_a, b=_sentinel, c=_sentinel):
    if isinstance(myclass_or_a, myclass):
        self.a, self.b, self.c = myclass_or_a.a, myclass_or_a.b, myclass_or_a.c
    else:
        self.a, self.b, self.c = myclass_or_a, b, c
… plus some error handling to check that b is _sentinel in the first case and that it isn't in the other case.
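That error handling might look something like this (a sketch, reusing _sentinel and the parameter names from above):

def __init__(self, myclass_or_a, b=_sentinel, c=_sentinel):
    if isinstance(myclass_or_a, myclass):
        if b is not _sentinel or c is not _sentinel:
            raise TypeError("no extra arguments allowed when copying an instance")
        self.a, self.b, self.c = myclass_or_a.a, myclass_or_a.b, myclass_or_a.c
    else:
        if b is _sentinel or c is _sentinel:
            raise TypeError("b and c are required when not copying an instance")
        self.a, self.b, self.c = myclass_or_a, b, c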
So, however you do it:
constant_object = myclass(1,2,3)
variable_object = myclass(constant_object)
variable_object.a = 999
import copy

class myclass(object):
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c
    def tuple(self):
        return (self.a, self.b, self.c)

constant_object = myclass(1, 2, 3)
variable_object = copy.deepcopy(constant_object)
variable_object.a = 999

print(constant_object.a)
print(variable_object.a)
Output:
1
999
Deepcopying is not entirely necessary in this case, because of the way you've set up your tuple method:
class myclass(object):
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c
    def tuple(self):
        return (self.a, self.b, self.c)

constant_object = myclass(1, 2, 3)
variable_object = myclass(*constant_object.tuple())
variable_object.a = 999
>>> constant_object.a
1
>>> variable_object.a
999
Usually (as others have suggested), you'd want to deepcopy. This creates a brand new object, with no ties to the object being copied. However, given that you are using only ints, deepcopy is overkill. You're better off doing a shallow copy. As a matter of fact, it might even be faster to call the class constructor on the parameters of the object you already have, seeing as these parameters are ints. This is why I suggested the above code.
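For reference, the shallow-copy variant mentioned above would be (using the stdlib copy module):

import copy

variable_object = copy.copy(constant_object)  # shallow copy; safe here because the attributes are ints
variable_object.a = 999  # constant_object.a stays 1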
I'm working on a problem where I'm instantiating many instances of an object. Most of the time the instantiated objects are identical. To reduce memory overhead, I'd like to have all the identical objects point to the same address. When I modify the object, though, I'd like a new instance to be created--essentially copy-on-write behavior. What is the best way to achieve this in Python?
The Flyweight Pattern comes close. An example (from http://codesnipers.com/?q=python-flyweights):
import weakref

class Card(object):
    _CardPool = weakref.WeakValueDictionary()
    def __new__(cls, value, suit):
        obj = Card._CardPool.get(value + suit, None)
        if not obj:
            obj = object.__new__(cls)
            Card._CardPool[value + suit] = obj
            obj.value, obj.suit = value, suit
        return obj
This behaves as follows:
>>> c1 = Card('10', 'd')
>>> c2 = Card('10', 'd')
>>> id(c1) == id(c2)
True
>>> c2.suit = 's'
>>> c1.suit
's'
>>> id(c1) == id(c2)
True
The desired behavior would be:
>>> c1 = Card('10', 'd')
>>> c2 = Card('10', 'd')
>>> id(c1) == id(c2)
True
>>> c2.suit = 's'
>>> c1.suit
'd'
>>> id(c1) == id(c2)
False
Update: I came across the Flyweight Pattern and it seemed to almost fit the bill. However, I'm open to other approaches.
Do you need id(c1)==id(c2) to be identical, or is that just a demonstration, where the real objective is avoiding creating duplicated objects?
One approach would be to have each object be distinct, but hold an internal reference to the 'real' object like you have above. Then, on any __setattr__ call, change the internal reference.
I've never done __setattr__ stuff before, but I think it would look like this:
class MyObj:
    def __init__(self, value, suit):
        # bypass our own __setattr__ when installing the backing object
        object.__setattr__(self, '_internal', Card(value, suit))
    def __setattr__(self, name, new_value):
        if name == 'suit':
            object.__setattr__(self, '_internal', Card(self._internal.value, new_value))
        else:
            object.__setattr__(self, '_internal', Card(new_value, self._internal.suit))
And similarly, expose the attributes through __getattr__.
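For completeness, that read side might look like this (a sketch, reusing the MyObj/_internal names from above):

def __getattr__(self, name):
    # only called when normal lookup fails; delegate reads to the shared backing Card
    return getattr(self._internal, name)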
You'd still have lots of duplicated objects, but only one copy of the 'real' backing object behind them. So this would help if each object is massive, and wouldn't help if they are lightweight, but you have millions of them.
Impossible.
id(c1) == id(c2)
says that c1 and c2 are references to the exact same object. So
c2.suit = 's' is exactly the same as saying c1.suit = 's'.
Python has no way of distinguishing the two (unless you allow introspection of prior call frames, which leads to a dirty hack.)
Since the two assignments are identical, there is no way for Python to know that c2.suit = 's' should cause the name c2 to reference a different object.
To give you an idea of what the dirty hack would look like,
import traceback
import re
import sys
import weakref

class Card(object):
    _CardPool = weakref.WeakValueDictionary()
    def __new__(cls, value, suit):
        obj = Card._CardPool.get(value + suit, None)
        if not obj:
            obj = object.__new__(cls)
            Card._CardPool[value + suit] = obj
            obj._value, obj._suit = value, suit
        return obj
    @property
    def suit(self):
        return self._suit
    @suit.setter
    def suit(self, suit):
        filename, line_number, function_name, text = traceback.extract_stack()[-2]
        name = text[:text.find('.suit')]
        setattr(sys.modules['__main__'], name, Card(self._value, suit))

c1 = Card('10', 'd')
c2 = Card('10', 'd')
assert id(c1) == id(c2)

c2.suit = 's'
print(c1.suit)
# 'd'
assert id(c1) != id(c2)
This use of traceback only works with those implementations of Python that uses frames, such as CPython, but not Jython or IronPython.
Another problem is that
name = text[:text.find('.suit')]
is extremely fragile, and would screw up, for example, if the assignment were to look like
if True: c2.suit = 's'
or
c2.suit = (
's')
or
setattr(c2, 'suit', 's')
Yet another problem is that it assumes the name c2 is global. It could just as easily be a local variable (say, inside a function), or an attribute (obj.c2.suit = 's').
I do not know a way to address all the ways the assignment could be made.
In any of these cases, the dirty hack would fail.
Conclusion: Don't use it. :)
This is impossible in your current form. A name (c1 and c2 in your example) is a reference, and you cannot simply change the reference by using __setattr__, not to mention all the other references to the same object.
The only way this would be possible is something like this:
c1 = c1.changesuit("s")
Where c1.changesuit returns a reference to the (newly created) object. But this only works if each object is referenced by only one name. Alternatively you might be able to do some magic with locals() and stuff like that, but please - don't.
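A sketch of that copy-on-write-by-convention style (this Card is a simplified stand-in, not the flyweight version from the question):

class Card:
    def __init__(self, value, suit):
        self.value, self.suit = value, suit
    def changesuit(self, suit):
        # return a new object instead of mutating self
        return Card(self.value, suit)

c1 = Card('10', 'd')
c1 = c1.changesuit('s')  # rebinding the name is the caller's job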
I need a way to inspect a class so I can safely identify which attributes are user-defined class attributes. The problem is that functions like dir(), inspect.getmembers() and friends return all class attributes, including the pre-defined ones like __class__, __doc__, __dict__ and __hash__. This is of course understandable, and one could argue that I could just make a list of named members to ignore, but unfortunately these pre-defined attributes are bound to change with different versions of Python, making my project vulnerable to changes in the Python project - and I don't like that.
example:
>>> class A:
...     a = 10
...     b = 20
...     def __init__(self):
...         self.c = 30
>>> dir(A)
['__doc__', '__init__', '__module__', 'a', 'b']
>>> get_user_attributes(A)
['a','b']
In the example above I want a safe way to retrieve only the user-defined class attributes ['a','b'], not 'c', as it is an instance attribute. So my question is: can anyone help me with the above fictive function get_user_attributes(cls)?
I have spent some time trying to solve the problem by parsing the class in AST level which would be very easy. But I can't find a way to convert already parsed objects to an AST node tree. I guess all AST info is discarded once a class has been compiled into bytecode.
Below is the hard way. Here's the easy way. Don't know why it didn't occur to me sooner.
import inspect

def get_user_attributes(cls):
    boring = dir(type('dummy', (object,), {}))
    return [item
            for item in inspect.getmembers(cls)
            if item[0] not in boring]
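Used on the A class from the question, this yields name/value pairs rather than bare names (a rough check; on newer Pythons a few extra bookkeeping entries such as __qualname__ may slip through, since class statements put them in the class dict while the type()-made dummy has none):

get_user_attributes(A)  # [('a', 10), ('b', 20)]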
Here's a start
def get_user_attributes(cls):
    boring = dir(type('dummy', (object,), {}))
    attrs = {}
    bases = reversed(inspect.getmro(cls))
    for base in bases:
        if hasattr(base, '__dict__'):
            attrs.update(base.__dict__)
        elif hasattr(base, '__slots__'):
            if hasattr(base, base.__slots__[0]):
                # We're dealing with a non-string sequence or one char string
                for item in base.__slots__:
                    attrs[item] = getattr(base, item)
            else:
                # We're dealing with a single identifier as a string
                attrs[base.__slots__] = getattr(base, base.__slots__)
    for key in boring:
        attrs.pop(key, None)  # pop, not del: not every boring key is guaranteed to be present
    return attrs
This should be fairly robust. Essentially, it works by getting the attributes that are on a default subclass of object to ignore. It then gets the mro of the class that's passed to it and traverses it in reverse order so that subclass keys can overwrite superclass keys. It returns a dictionary of key-value pairs. If you want a list of key, value tuples like in inspect.getmembers then just return either attrs.items() or list(attrs.items()) in Python 3.
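A quick check of the mro traversal with a made-up hierarchy (the same caveat about version-specific bookkeeping entries applies):

class Base(object):
    x = 1

class Child(Base):
    y = 2

get_user_attributes(Child)  # {'x': 1, 'y': 2} -- the inherited 'x' is picked up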
If you don't actually want to traverse the mro and just want attributes defined directly on the subclass then it's easier:
def get_user_attributes(cls):
    boring = dir(type('dummy', (object,), {}))
    attrs = {}
    if hasattr(cls, '__dict__'):
        attrs = cls.__dict__.copy()
    elif hasattr(cls, '__slots__'):
        if hasattr(cls, cls.__slots__[0]):
            # We're dealing with a non-string sequence or one char string
            for item in cls.__slots__:
                attrs[item] = getattr(cls, item)
        else:
            # We're dealing with a single identifier as a string
            attrs[cls.__slots__] = getattr(cls, cls.__slots__)
    for key in boring:
        attrs.pop(key, None)  # pop, not del: most boring keys won't be in the subclass __dict__
    return attrs
Double underscores on both ends of 'special attributes' have been part of Python since before 2.0. It is very unlikely that this will change any time in the near future.
class Foo(object):
    a = 1
    b = 2

def get_attrs(klass):
    return [k for k in klass.__dict__.keys()
            if not k.startswith('__')
            and not k.endswith('__')]

print(get_attrs(Foo))
['a', 'b']
Thanks aaronasterling, you gave me the expression I needed :-)
My final class attribute inspector function looks like this:
def get_user_attributes(cls, exclude_methods=True):
    base_attrs = dir(type('dummy', (object,), {}))
    this_cls_attrs = dir(cls)
    res = []
    for attr in this_cls_attrs:
        if attr in base_attrs or (callable(getattr(cls, attr)) and exclude_methods):
            continue
        res += [attr]
    return res
It either returns class attribute variables only (exclude_methods=True) or also retrieves the methods.
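For example (a quick check with a made-up class; as above, newer Pythons may add a stray bookkeeping name or two):

class A(object):
    a = 10
    b = 20
    def f(self):
        pass

get_user_attributes(A)                         # ['a', 'b']
get_user_attributes(A, exclude_methods=False)  # ['a', 'b', 'f']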
My initial tests of the above function support both old and new-style Python classes.
/ Jakob
If you use new style classes, could you simply subtract the attributes of the parent class?
class A(object):
    a = 10
    b = 20
    #...

def get_attrs(Foo):
    return [k for k in dir(Foo) if k not in dir(Foo.__bases__[0])]
Edit: Not quite. __dict__,__module__ and __weakref__ appear when inheriting from object, but aren't there in object itself. You could special case these--I doubt they'd change very often.
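A sketch of that special-casing (the extra set below is exactly the three names from the Edit; newer Pythons may need more, e.g. __qualname__):

def get_attrs(Foo):
    extra = {'__dict__', '__module__', '__weakref__'}
    return [k for k in dir(Foo)
            if k not in dir(Foo.__bases__[0]) and k not in extra]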
Sorry for necro-bumping the thread. I'm surprised that there's still no simple function (or a library) to handle such common usage as of 2019.
I'd like to thank aaronasterling for the idea. Actually, set container provides a more straightforward way to express it:
class dummy:
    pass

def abridged_set_of_user_attributes(obj):
    return set(dir(obj)) - set(dir(dummy))

def abridged_list_of_user_attributes(obj):
    return list(abridged_set_of_user_attributes(obj))
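For example, with the A class from the question (sets are unordered, hence the sorted; the usual caveat about version-specific dunder entries applies):

sorted(abridged_set_of_user_attributes(A))  # ['a', 'b']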
The original solution using a list comprehension is really two levels of loops, because the in membership test adds a hidden inner loop; having only one for keyword makes it look like less work than it is.
This worked for me to include user-defined attributes with __ (name-mangled) that might not be found in cls.__dict__:
import inspect
class A:
    __a = True
    def __init__(self, _a, b, c):
        self._a = _a
        self.b = b
        self.c = c
    def test(self):
        return False

cls = A(1, 2, 3)
members = inspect.getmembers(cls, predicate=lambda x: not inspect.ismethod(x))
attrs = set(dict(members).keys()).intersection(set(cls.__dict__.keys()))
__attrs = {m[0] for m in members if m[0].startswith(f'_{cls.__class__.__name__}')}
attrs.update(__attrs)
This will correctly yield: {'_A__a', '_a', 'b', 'c'}
You can post-process the result to strip the name-mangling prefix (f'_{cls.__class__.__name__}') if you wish.