python: need a deepcopy equivalent breaking all shared identity

Due to some constraints, I need to create a fresh copy of an object along with fresh copies of all its attributes, the attributes of those attributes, and so on recursively.
Existing deepcopy() is recursive, but when multiple objects within the tree being copied have the same starting identity, they also have the same ending identity (even though their ending identities don't match their starting identities).
For the following case:
import copy
class A:
    def __init__(self, x):
        self.x = x
v = A(1)
o = [v, v]
copy.deepcopy does the following:
dc_o = copy.deepcopy(o)
assert dc_o[0] is not o[0] # new identity from the original
assert dc_o[0] is dc_o[1] # but maintains identity within the copied tree
assert dc_o[0] == dc_o[1] # ...as well as value
But, what I need is:
r_dc_o = recursive_deepcopy(o)
assert r_dc_o[0] is not o[0] # new identity from the original
assert r_dc_o[0] is not r_dc_o[1] # also new identity from elsewhere inside copy
assert r_dc_o[0] == r_dc_o[1] # while maintaining the same value
How can I do this?

Fully automating a recursive deepcopy in a way that didn't memoize objects would be extremely dangerous -- it would mean you couldn't have any kind of objects with internal references preserved in a way that would make those references useful after the copy operation (think about objects with a "parent" link, or objects that link to a shared registry or similar resource). That said, if you really wanted to do this (and you shouldn't -- it will break a great many objects passed through the operation), you can accomplish it by constructing a memo dictionary that ignored attempts at adding keys, and passing that as a second argument to deepcopy().
So, here we are:
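import copy
class baddict(dict):
    # a memo dict that silently drops every attempt to record a copied object
    def __setitem__(self, k, v):
        pass
class A:
    def __init__(self, x):
        self.x = x
    def __eq__(self, other):
        # the comparison result must be returned, or == always yields None
        return self.x == other.x
v = A(1)
o = [v, v]
r_dc_o = copy.deepcopy(o, baddict())
assert r_dc_o[0] is not r_dc_o[1]
assert r_dc_o[0] == r_dc_o[1]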
I'd suggest thinking about why you need this behavior, and trying to come up with a better way to accomplish it. Even a baddict implementation that looked at the value and skipped memoizing only if values were instances of a specific class would be safer than what we're doing here.
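The safer variant mentioned above could look roughly like the following. This is only a sketch: SkipMemoFor is a made-up name, and it relies on deepcopy storing each finished copy in the memo under the id of the original, which is how CPython's copy module behaves. Instances of the listed classes are never memoized, so each reference to them gets its own copy, while everything else keeps its shared identity. Note that a reference cycle running through one of the un-memoized classes would recurse forever.

```python
import copy

class SkipMemoFor(dict):
    """Memo dict that refuses to remember copies of the given classes (hypothetical helper)."""
    def __init__(self, *unshared_types):
        super().__init__()
        self._unshared_types = unshared_types

    def __setitem__(self, key, value):
        # deepcopy stores memo[id(original)] = copy; drop it for the chosen classes
        if isinstance(value, self._unshared_types):
            return
        super().__setitem__(key, value)

class A:
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        return isinstance(other, A) and self.x == other.x

v = A(1)
registry = ["shared"]          # stands in for a resource that should stay shared
o = [v, v, registry, registry]
dup = copy.deepcopy(o, SkipMemoFor(A))

assert dup[0] is not dup[1]    # A instances are split apart
assert dup[0] == dup[1]        # but keep the same value
assert dup[2] is dup[3]        # other objects keep their shared identity
```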

Related

Efficient list manipulation in python

I have a large list and regularly need to find an item satisfying a rather complex condition (not equality), i.e. I am forced to check every item in the list until I find one. The conditions change, but some items match more often than others. So I would like to bring the matching item to the front of the list each time I find one, so frequently matching items are found more quickly.
Is there an efficient, pythonic way to do this?
Sequences ([]) are backed by an array, so removing an item somewhere in the middle and prepending it to the array means moving every previous item. That's in O(n) time, not good.
In C you could build a linked list and move the item on your own when found. In Python there is a deque, but afaik you cannot reference the node objects nor have access to .next pointers.
And a self-made linked list is very slow in Python. (In fact it's slower than ordinary linear search without moving any item.)
Sadly, a dict or set finds items based on value equality and thus doesn't fit my problem.
As an illustration, here's the condition:
u, v, w = n.value # list item
if v in g[u] and w in g[v] and u not in g[w]:
    ...

Consider instead a Pythonic approach. As Ed Post once put it, "The determined Real Programmer can write FORTRAN programs in any language" -- and this generalizes... you're trying to write C in Python and it isn't working well for you:-)
Rather, think of putting an auxiliary dict cache next to the list -- caching the indices where items are found (needs to be invalidated only on "deep" changes to the list's structure). Much simpler and faster...
Probably best done by having list and dict in a small class:
class Seeker(object):
    def __init__(self, *a, **k):
        self.l = list(*a, **k)
        self.d = {}
    def find(self, value):
        where = self.d.get(value)
        if where is None:
            # lists have no find(); index() raises ValueError if value is absent
            self.d[value] = where = self.l.index(value)
        return where
    def __setitem__(self, index, value):
        if value in self.d: del self.d[value]
        self.l[index] = value
    # and so on for other mutators that invalidate self.d; then,
    def __getattr__(self, name):
        # delegate everything else to the list
        return getattr(self.l, name)
You need only define the mutators you actually need to use -- e.g, if you won't do insert, sort, __delitem__, &c, no need to define those, you can just delegate them to the list.
Added: in Python 3.2 or better, functools.lru_cache can actually do most of the work for you -- use it to decorate find and you'll get a better implementation of caching, with the ability to limit cache size if you so desire. To clear the cache, you'll need to call self.find.cache_clear() at the appropriate spots (where I above use self.d = {}) -- unfortunately, that crucial functionality is not (yet!-) documented (the volunteers updating the docs are not the same ones updating the code...!-)... but, trust me, it's not going to disappear on you:-).
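A minimal sketch of that lru_cache variant, assuming hashable list items; CachedSeeker and the maxsize value are illustrative choices, not part of the code above. Two things to keep in mind: the cache belongs to the decorated function, so it is shared by all instances and cache_clear() empties it for all of them, and it keeps a reference to every self it has seen.

```python
from functools import lru_cache

class CachedSeeker:
    def __init__(self, items):
        self.l = list(items)

    @lru_cache(maxsize=128)       # bounded cache, keyed on (self, value)
    def find(self, value):
        return self.l.index(value)

    def __setitem__(self, index, value):
        self.l[index] = value
        # any structural change may invalidate cached positions
        self.find.cache_clear()

s = CachedSeeker("abcde")
first = s.find("d")    # linear search, result cached
s[3] = "z"             # mutation clears the cache
s[0] = "d"
second = s.find("d")   # searched again after the invalidation
```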
Added: the OP edited the Q to clarify that he's not after "value equality", but rather some more complex set of conditions, exemplified by a predicate such as:
def good_for_g(g, n):
    # for some container `g` and item value `n`:
    u, v, w = n.value
    return v in g[u] and w in g[v] and u not in g[w]
Presumably, then, the desire to bring "good" items towards the front is in turn predicated on their "goodness" being "sticky", i.e., g staying pretty much the same for a while. In this case, one can use the predicate itself as a feature extraction and checking function, which forms the key into the dictionary -- so for example:
class FancySeeker(object):
    def __init__(self, *a, **k):
        self.l = list(*a, **k)
        self.d = {}
    def _find_in_list(self, predicate):
        for i, n in enumerate(self.l):
            if predicate(n):
                return i
        return -1
    def find(self, predicate):
        where = self.d.get(predicate)
        if where is None:
            where = self._find_in_list(predicate)
            self.d[predicate] = where
        return where
and so forth.
So the remaining difficulty is to put predicate in a form suitable for effective indexing into a dict. If predicate is just a function, no problem. But if predicate is a function with parameters, as formed e.g by functools.partial or as a bound method of some instance, that requires a bit of further processing/wrapping to make the indexing work.
Two calls to functools.partial with the same bound argument(s) and function, for example, do not return equal objects -- one has, rather, to inspect the .args and .func of the returned objects to ensure, so to speak, a "singleton" is returned for any given (func, args) pair.
Moreover, if some of the bound arguments are mutable, one needs to use their id in lieu of their hash (or else the raw functools.partial object would not be hashable). It gets even hairier for bound methods, though they can similarly be wrapped into e.g a hashable, "equality adjusted" Predicate class.
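One possible shape for such a wrapper, as a rough sketch only: the Predicate class and edge_test function below are hypothetical, and items are plain (u, v, w) tuples rather than objects with a .value attribute. Equality and hashing are based on the function plus the ids of the bound arguments, so two wrappers built from the same function and the same container land on the same dict key, which two raw functools.partial objects do not.

```python
import functools

class Predicate:
    """Hashable, equality-adjusted wrapper around func(*bound_args, item)."""
    def __init__(self, func, *bound_args):
        self.func = func
        self.bound_args = bound_args   # held here, so the ids below stay valid
        # identity-based key: mutable arguments such as dicts are not hashable
        self._key = (func, tuple(id(a) for a in bound_args))

    def __call__(self, item):
        return self.func(*self.bound_args, item)

    def __eq__(self, other):
        return isinstance(other, Predicate) and self._key == other._key

    def __hash__(self):
        return hash(self._key)

def edge_test(g, triple):
    u, v, w = triple
    return v in g[u] and w in g[v] and u not in g[w]

g = {1: {2}, 2: {3}, 3: set()}
cache = {Predicate(edge_test, g): "cached index"}
hit = cache.get(Predicate(edge_test, g))    # found: same func, same g
partials_equal = functools.partial(edge_test, g) == functools.partial(edge_test, g)
matches = Predicate(edge_test, g)((1, 2, 3))
```

Because the key uses identity, mutating g in place does not change the key; the cache still has to be invalidated by hand when g changes.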
Lastly, if these gyrations prove too cumbersome and you really want a fast implementation of a linked list instead, look at https://pypi.python.org/pypi/llist/0.4 -- it's a C-coded implementation of singly and doubly linked lists for Python (for each kind, it implements three types: the list itself, the list node, and the list's iterator).

You can do exactly what you want using deque.rotate.
from collections import deque
class Collection:
    "Linked List collection that moves searched for items to the front of the collection"
    def __init__(self, seq):
        self._deque = deque(seq)
    def __contains__(self, target):
        for i, item in enumerate(self._deque):
            if item == target:
                self._deque.rotate(-i)    # rotate left so the match is at the front
                self._deque.popleft()     # remove it
                self._deque.rotate(i)     # restore the order of the remaining items
                self._deque.appendleft(item)
                return True
        return False
    def __str__(self):
        return "Collection({})".format(str(self._deque))
c = Collection(range(10))
print(c)
print("5 in c:", 5 in c)
print(c)
Gives the following output:
Collection(deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))
5 in c: True
Collection(deque([5, 0, 1, 2, 3, 4, 6, 7, 8, 9]))

Most efficient way of comparing the contents of two class instances in python

I'm looking for the most efficient way of comparing the contents of two class instances. I have a list containing these class instances, and before appending to the list I want to determine if their property values are the same. This may seem trivial to most, but after perusing these forums I wasn't able to find anything specific to what I'm trying to do. Also note that I don't have a programming background.
This is what I have so far:
import copy
class BaseObject(object):
    def __init__(self, name=''):
        self._name = name
    def __repr__(self):
        # the attribute is stored as _name, so use that here
        return '<{0}: \'{1}\'>'.format(self.__class__.__name__, self._name)
    def _compare(self, other, *attributes):
        count = 0
        if isinstance(other, self.__class__):
            if len(attributes):
                for attrib in attributes:
                    if (attrib in self.__dict__.keys()) and (attrib in other.__dict__.keys()):
                        if self.__dict__[attrib] == other.__dict__[attrib]:
                            count += 1
                return (count == len(attributes))
            else:
                for attrib in self.__dict__.keys():
                    if (attrib in self.__dict__.keys()) and (attrib in other.__dict__.keys()):
                        if self.__dict__[attrib] == other.__dict__[attrib]:
                            count += 1
                return (count == len(self.__dict__.keys()))
    def _copy(self):
        return (copy.deepcopy(self))
Before adding to my list, I'd do something like:
found = False
for instance in myList:
    if instance._compare(newInstance):
        found = True
        break
if not found: myList.append(newInstance)
However, I'm unclear whether this is the most efficient or Pythonic way of comparing the contents of instances of the same class.

Implement an __eq__ special method instead:
def __eq__(self, other, *attributes):
    if not isinstance(other, type(self)):
        return NotImplemented
    if attributes:
        d = float('NaN') # default that won't compare equal, even with itself
        return all(self.__dict__.get(a, d) == other.__dict__.get(a, d) for a in attributes)
    return self.__dict__ == other.__dict__
Now you can just use:
if newInstance in myList:
and Python will automatically use the __eq__ special method to test for equality.
In my version I retained the ability to pass in a limited set of attributes:
instance1.__eq__(instance2, 'attribute1', 'attribute2')
but using all() to make sure we only test as much as is needed.
Note that we return NotImplemented, a special singleton object to signal that the comparison is not supported; Python will ask the other object if it perhaps supports equality testing instead for that case.
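To make the NotImplemented fallback concrete, here is a small self-contained sketch. Point and AlwaysEqual are made-up classes for illustration only; the __eq__ on Point is the no-attributes case of the method above.

```python
class Point:
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        if not isinstance(other, type(self)):
            return NotImplemented      # let Python try the other operand
        return self.__dict__ == other.__dict__

class AlwaysEqual:
    def __eq__(self, other):
        return True

same = Point(1) == Point(1)            # attribute dicts match
vs_int = Point(1) == 1                 # neither side supports it: falls back to identity
vs_other = Point(1) == AlwaysEqual()   # Python asks AlwaysEqual.__eq__ instead
found = Point(1) in [Point(2), Point(1)]   # `in` uses __eq__ too
```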

You can implement the comparison magic method __eq__(self, other) for your class, then simply do
if instance == newInstance:
As you apparently don't know what attributes your instance will have, you could do:
def __eq__(self, other):
    return isinstance(other, type(self)) and self.__dict__ == other.__dict__

Your method has one major flaw: if you have reference cycles with classes that both derive from BaseObject, your comparison will never finish and die with a stack overflow.
In addition, two objects of different classes but with the same attribute values compare as equal. Trivial example: any instance of BaseObject with no attributes will compare as equal to any instance of a BaseObject subclass with no attributes (because if issubclass(C, B) and a is an instance of C, then isinstance(a, B) returns True).
Finally, rather than writing a custom _compare method, just call it __eq__ and reap all the benefits of now being able to use the == operator (including containment testing in lists, container comparisons, etc.).
As a matter of personal preference, though, I'd stay away from that sort-of automatically-generated comparison, and explicitly compare explicit attributes.

Python: Passing mutable(?) object to method

I'm trying to implement a class with a method that calls another method with an object that's part of the class where the lowest method mutates the object. My implementation is a little more complicated, so I'll post just some dummy code so you can see what I'm talking about:
class test:
    def __init__(self,list):
        self.obj = list
    def mult(self, x, n):
        x = x*n
    def numtimes(self, n):
        self.mult(self.obj, n)
Now, if I create an object of this type and run the numtimes method, it won't update self.obj:
m = test([1,2,3,4])
m.numtimes(3)
m.obj #returns [1,2,3,4]
Whereas I'd like it to give me [1,2,3,4,1,2,3,4,1,2,3,4]
Basically, I need to pass self.obj to the mult method and have it mutate self.obj so that when I call m.obj, I'll get [1,2,3,4,1,2,3,4,1,2,3,4] instead of [1,2,3,4].
I feel like this is just a matter of understanding how python passes objects as arguments to methods (like it's making a copy of the object, and instead I need to use a pointer), but maybe not. I'm new to python and could really use some help here.
Thanks in advance!!

Allow me to take on the bigger subject of mutability.
Lists are mutable objects, and support both mutable operations and immutable operations: that is, operations that change the list in-place, and operations that return a new list. Tuples, by contrast, are immutable and support only the latter.
So, to multiply a list, you can choose two methods:
a *= b
This is a mutable operation, that will change 'a' in-place.
a = a * b
This is an immutable operation. It will evaluate 'a*b', create a new list with the correct value, and assign 'a' to that new list.
Here, already, lies a solution to your problem. But, I suggest you read on a bit. When you pass around lists (and other objects) as parameters, you are only passing a new reference, or "pointer" to that same list. So running mutable operations on that list will also change the one that you passed. The result might be a very subtle bug, when you write:
>>> my_list = [1,2,3]
>>> t = test(my_list)
>>> t.numtimes(2)
>>> my_list
[1,2,3,1,2,3] # Not what you intended, probably!
So here's my final recommendation. You can choose to use mutable operations, that's fine. But then create a new copy from your arguments, as such:
def __init__(self,l):
    self.obj = list(l)
OR use immutable operations, and reassign them to self:
def mult(self, x, n):
    # rebind the attribute the class actually exposes (self.obj)
    self.obj = x*n
Or do both, there's no harm in being extra safe :)
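Putting both recommendations together might look like this. It is a sketch only; Repeater is a made-up class name standing in for the asker's test class. The constructor takes a private copy of its argument, and numtimes builds a new list and rebinds the attribute.

```python
class Repeater:
    def __init__(self, items):
        self.obj = list(items)       # private copy: the caller's list is never touched

    def numtimes(self, n):
        self.obj = self.obj * n      # builds a new list and rebinds the attribute

my_list = [1, 2, 3, 4]
m = Repeater(my_list)
m.numtimes(3)
```

After this, m.obj holds the repeated list while my_list is unchanged.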

The multiplication x * n creates a new instance and does not alter the existing list. See here:
a = [1]
print (id (a) )
a = a * 2
print (id (a) )
This should work:
class test:
    def __init__(self,list):
        self.obj = list
    def mult(_, x, n):
        x *= n
    def numtimes(self, n):
        self.mult(self.obj, n)

How to remove duplicates in set for objects?

I have a set of objects:
import random
class Test(object):
    def __init__(self):
        self.i = random.randint(1,10)
res = set()
for i in range(0,1000):
    res.add(Test())
print(len(res))  # prints 1000
How can I remove duplicates from a set of objects?
Thanks for the answers, it works:
class Test(object):
    def __init__(self, i):
        self.i = i
        # self.i = random.randint(1,10)
        # self.j = random.randint(1,20)
    def __keys(self):
        t = ()
        for key in self.__dict__:
            t = t + (self.__dict__[key],)
        return t
    def __eq__(self, other):
        return isinstance(other, Test) and self.__keys() == other.__keys()
    def __hash__(self):
        return hash(self.__keys())
res = set()
res.add(Test(2))
...
res.add(Test(8))
result: [2,8,3,4,5,6,7]
But how can I preserve the order? Sets do not support ordering. Can I use a list instead of a set, for example?

Your objects must be hashable (i.e. must have __eq__() and __hash__() defined) for sets to work properly with them:
class Test(object):
    def __init__(self):
        self.i = random.randint(1, 10)
    def __eq__(self, other):
        return self.i == other.i
    def __hash__(self):
        return self.i
An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() or __cmp__() method). Hashable objects which compare equal must have the same hash value.
Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.
If you have several attributes, hash and compare a tuple of them (thanks, delnan):
class Test(object):
    def __init__(self):
        self.i = random.randint(1, 10)
        self.k = random.randint(1, 10)
        self.j = random.randint(1, 10)
    def __eq__(self, other):
        return (self.i, self.k, self.j) == (other.i, other.k, other.j)
    def __hash__(self):
        return hash((self.i, self.k, self.j))

Your first question is already answered by Pavel Anossov.
But you have another question:
But how can I preserve the order? Sets do not support ordering. Can I use a list instead of a set, for example?
You can use a list, but there are a few downsides:
You get the wrong interface.
You don't get automatic handling of duplicates. You have to explicitly write if foo not in res: res.append(foo). Obviously, you can wrap this up in a function instead of writing it repeatedly, but it's still extra work.
It's going to be a lot less efficient if the collection can get large. Basically, adding a new element, checking whether an element already exists, etc. are all going to be O(N) instead of O(1).
What you want is something that works like an ordered set. Or, equivalently, like a list that doesn't allow duplicates.
If you do all your adds first, and then all your lookups, and you don't need lookups to be fast, you can get around this by first building a list, then using unique_everseen from the itertools recipes to remove duplicates.
Or you could just keep a set and a list of elements in order (or a list plus a set of elements seen so far). But that can get a bit complicated, so you might want to wrap it up.
Ideally, you want to wrap it up in a type that has exactly the same API as set. Something like an OrderedSet akin to collections.OrderedDict.
Fortunately, if you scroll to the bottom of that docs page, you'll see that exactly what you want already exists; there's a link to an OrderedSet recipe at ActiveState.
So, copy it, paste it into your code, then just change res = set() to res = OrderedSet(), and you're done.
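For reference, the "set plus list" idea mentioned above can be wrapped up in a few lines. This is a minimal sketch, not the ActiveState recipe: SimpleOrderedSet is a made-up name, it only supports adding, membership tests, iteration and len(), and it requires hashable items.

```python
class SimpleOrderedSet:
    def __init__(self, iterable=()):
        self._seen = set()     # fast membership tests
        self._order = []       # remembers first-insertion order
        for item in iterable:
            self.add(item)

    def add(self, item):
        if item not in self._seen:
            self._seen.add(item)
            self._order.append(item)

    def __contains__(self, item):
        return item in self._seen

    def __iter__(self):
        return iter(self._order)

    def __len__(self):
        return len(self._order)

res = SimpleOrderedSet([2, 8, 3, 2, 8, 4])
res.add(3)                     # duplicate, ignored
ordered = list(res)
```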

I think you can easily do what you want with a list as you asked in your first post since you defined the eq operator :
l = []
if Test(0) not in l :
    l.append(Test(0))
My 2 cts ...

Pavel Anossov's answer is great for allowing your class to be used in a set with the semantics you want. However, if you want to preserve the order of your items, you'll need a bit more. Here's a function that de-duplicates a list, as long as the list items are hashable:
def dedupe(lst):
    seen = set()
    results = []
    for item in lst:
        if item not in seen:
            seen.add(item)
            results.append(item)
    return results
A slightly more idiomatic version would be a generator, rather than a function that returns a list. This gets rid of the results variable, using yield rather than appending the unique values to it. I've also renamed the lst parameter to iterable, since it will work just as well on any iterable object (such as another generator).
def dedupe(iterable):
    seen = set()
    for item in iterable:
        if item not in seen:
            seen.add(item)
            yield item

Hierarchy / Flyweight / Instancing Problem in Python

Here is the problem I am trying to solve, (I have simplified the actual problem, but this should give you all the relevant information). I have a hierarchy like so:
1.A
1.B
1.C
2.A
3.D
4.B
5.F
(This is hard to illustrate - each number is the parent, each letter is the child).
1. Creating an instance of the 'letter' objects is expensive (IO, database costs, etc), so should only be done once.
2. The hierarchy needs to be easy to navigate.
3. Children in the hierarchy need to have just one parent.
4. Modifying the contents of the letter objects should be possible directly from the objects in the hierarchy.
5. There needs to be a central store containing all of the 'letter' objects (and only those in the hierarchy).
6. 'letter' and 'number' objects need to be possible to create from a constructor (such as Letter(**kwargs) ).
7. It is perfectly acceptable to expect that when a letter changes from the hierarchy, all other letters will respect the same change.
Hope this isn't too abstract to illustrate the problem.
What would be the best way of solving this? (Then I'll post my solution)
Here's an example script:
one = Number('one')
a = Letter('a')
one.addChild(a)
two = Number('two')
a = Letter('a')
two.addChild(a)
for child in one:
    child.method1()
for child in two:
    print('%s' % child.method2())

A basic approach will use builtin data types. If I get your drift, the Letter object should be created by a factory with a dict cache to keep previously generated Letter objects. The factory will create only one Letter object for each key.
A Number object can be a sub-class of list that will hold the Letter objects, so that append() can be used to add a child. A list is easy to navigate.
A crude outline of a caching factory:
>>> class Letters(object):
...     def __init__(self):
...         self.cache = {}
...     def create(self, v):
...         l = self.cache.get(v, None)
...         if l:
...             return l
...         l = self.cache[v] = Letter(v)
...         return l
>>> factory=Letters()
>>> factory.cache
{}
>>> factory.create('a')
<__main__.Letter object at 0x00EF2950>
>>> factory.create('a')
<__main__.Letter object at 0x00EF2950>
>>>
To fulfill requirement 6 (constructor), here is a more contrived example, using __new__, of a caching constructor. This is similar to Recipe 413717: Caching object creation.
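class Letter(object):
    cache = {}
    def __new__(cls, v):
        o = cls.cache.get(v, None)
        if o:
            return o
        else:
            o = cls.cache[v] = object.__new__(cls)
            return o
    def __init__(self, v):
        # __init__ runs on every Letter(v) call, even for a cached instance,
        # so skip re-initialisation to keep an existing refcount intact
        if hasattr(self, 'refcount'):
            return
        self.v = v
        self.refcount = 0
    def addAsChild(self, chain):
        if self.refcount > 0:
            return False
        self.refcount += 1
        chain.append(self)
        return True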
Testing the cache functionality
>>> l1 = Letter('a')
>>> l2 = Letter('a')
>>> l1 is l2
True
>>>
For enforcing a single parent, you'll need a method on Letter objects (not Number) - with a reference counter. When called to perform the addition it will refuse addition if the counter is greater than zero.
l1.addAsChild(num4)
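Tying the pieces together, one way the whole thing could look is sketched below. It is an illustration under assumptions, not the asker's final design: Number subclasses list as suggested earlier, addChild mirrors the name used in the question's script, and a parent attribute stands in for the reference counter.

```python
class Letter:
    _cache = {}                          # central store: one Letter per key

    def __new__(cls, v):
        if v not in cls._cache:
            obj = super().__new__(cls)
            obj.v = v
            obj.parent = None            # set once, when added to a Number
            cls._cache[v] = obj
        return cls._cache[v]

class Number(list):
    def __init__(self, name):
        super().__init__()
        self.name = name

    def addChild(self, letter):
        if letter.parent is not None:    # enforce a single parent
            return False
        letter.parent = self
        self.append(letter)
        return True

one = Number('one')
two = Number('two')
added_first = one.addChild(Letter('a'))
added_again = two.addChild(Letter('a'))  # same cached object, already has a parent
```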
