I want to use a WeakKeyDictionary where the keys are tuples of other objects, e.g. of type Tuple[A, B], used like this:
# a,b,c defined somewhere
d = WeakKeyDictionary()
d[(a, b)] = c
This does not work, failing with TypeError: cannot create weak reference to 'tuple' object. But even if a weak ref to a tuple could be created, there is another problem: the tuple object here ((a, b)) is not referenced anywhere else, so after this code runs, the dict d would be empty again.
In principle, however, such a weak key dict over tuples should be possible. I think the behavior is unambiguous and straightforward: whenever any part of the key (a or b) gets deleted, the whole entry gets removed.
How can I get this? Any easy way using the existing builtins? Or do I need to implement my own? Or is there some other library providing this?
You need to "implement your own" in this case - and the problem is that you will need an auxiliar dictionary with stand-alone keys - so that when the time comes to del one the paired keys, you are able to find them back.
The implementation of WeakrefDicts themselves are pretty simple and straightforward, using collection.abc helpers for mappings - you could even pick the code from there and evolve it- but I think a minimal one can be done from scratch like bellow.
To be clear: this is a fresh implementation of weak-key dicts, doing exactly what you asked in the question: the keys should e any sequence of weak-referenceable objects, and when any object from the sequence is destroyed, the item is cleared in the dictionary. This is done using the callback mechanism of low-level weakref.ref objects.
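To illustrate that mechanism in isolation (a standalone sketch of plain weakref.ref usage, my illustration rather than part of the recipe itself): weakref.ref takes an optional callback that fires when the referent is finalized.

import weakref

class A:
    pass

a = A()
r = weakref.ref(a, lambda wref: print("referent gone:", wref))
del a  # on CPython, refcounting fires the callback here

The full mapping class built on this follows.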
import weakref
from collections.abc import MutableMapping
class MultiWeakKeyDict(MutableMapping):
    def __init__(self, **kw):
        self.data = {}     # maps tuple-of-weakrefs -> value
        self.helpers = {}  # maps each individual weakref -> set of composite keys it takes part in
        self.update(**kw)

    def _remove(self, wref):
        # Callback fired when any referent dies: drop every composite key it participates in.
        for data_key in self.helpers.pop(wref, ()):
            try:
                del self.data[data_key]
            except KeyError:
                pass

    def _build_key(self, keys):
        return tuple(weakref.ref(item, self._remove) for item in keys)

    def __setitem__(self, keys, value):
        weakrefs = self._build_key(keys)
        for item in weakrefs:
            self.helpers.setdefault(item, set()).add(weakrefs)
        self.data[weakrefs] = value

    def __getitem__(self, keys):
        # weakrefs to the same live object compare (and hash) equal, so a
        # freshly built key tuple finds the stored entry
        return self.data[self._build_key(keys)]

    def __delitem__(self, keys):
        del self.data[self._build_key(keys)]

    def __iter__(self):
        for key in self.data:
            yield tuple(item() for item in key)

    def __len__(self):
        return len(self.data)

    def __repr__(self):
        return f"{self.__class__.__name__}({', '.join('{!r}:{!r}'.format(k, v) for k, v in self.items())})"
And it works:
In [142]: class A:
...:     def __repr__(s): return "A obj"
...:
In [143]: a, b, c = [A() for _ in (1,2,3)]
In [144]: d = MultiWeakKeyDict()
In [145]: d[a,b] = 1
In [146]: d[b,c] = 2
In [147]: d
Out[147]: MultiWeakKeyDict((A obj, A obj):1, (A obj, A obj):2)
In [148]: len(d)
Out[148]: 2
In [149]: del b
In [150]: len(d)
Out[150]: 0
In [151]: d
Out[151]: MultiWeakKeyDict()
Related
How to increment d['a']['b']['c'][1][2][3] if d is a defaultdict of defaultdicts, without code duplication?
from collections import defaultdict
nested_dict_type = lambda: defaultdict(nested_dict_type)
nested_dict = nested_dict_type()
# incrementation
if type(nested_dict['a']['b']['c']['d'][1][2][3][4][5][6]) != int:
    nested_dict['a']['b']['c']['d'][1][2][3][4][5][6] = 0
nested_dict['a']['b']['c']['d'][1][2][3][4][5][6] += 1  # ok, now it contains 1
Here we can see that we duplicated (in the code) a chain of keys 3 times.
Question: Is it possible to write a function inc that will take nested_dict['a']['b']...[6] and do the same job as above? So:
def inc(x):
    if type(x) != int:
        x = 0
    x += 1

inc(nested_dict['a']['b']['c']['d'][1][2][3][4][5][6])  # ok, now it contains 1
Update (20 Aug 2018):
There is still no answer to the question. It's clear that there are options "how to do what I want", but the question is straightforward: there is a "value", we pass it to a function, and the function modifies it. It looks like that's not possible.
Just a value, without any "additional keys", etc.
If it is so, can we make an answer more generic?
Notes:
What is defaultdict of defaultdicts - SO.
This question is not about "storing of integers in a defaultdict", so I'm not looking for a hierarchy of defaultdicts with an int type at the leaves.
Assume that the type (int in the examples) is known in advance / can even be parametrized (including the ability to apply the += operator) - the question is how to dereference the object, pass it for modification, and store it back in the context of a defaultdict of defaultdicts.
Is the answer to this question related to the mutability? See example below:
Example:
def inc(x):
    x += 1

d = {'a': int(0)}
inc(d['a'])
# d['a'] == 0, immutable

d = {'a': Int(0)}
inc(d['a'])
# d['a'] == 1, mutated
Where Int is:
class Int:
    def __init__(self, value):
        self.value = value

    def __add__(self, v):
        # mutates in place, then returns self
        self.value += v
        return self

    def __repr__(self):
        return str(self.value)
It's not exactly about mutability; it's more about how assignment performs name binding.
When you do x = 0 in your inc function you bind a new object to the name x, and any connection between that name and the previous object bound to that name is lost. That doesn't depend on whether or not x is mutable.
But since x is an item in a mutable object we can achieve what you want by passing the parent mutable object to inc along with the key needed to access the desired item.
from collections import defaultdict

nested_dict_type = lambda: defaultdict(nested_dict_type)
nested_dict = nested_dict_type()

# incrementation
def inc(ref, key):
    if not isinstance(ref[key], int):
        ref[key] = 0
    ref[key] += 1

d = nested_dict['a']['b']['c']['d'][1][2][3][4][5]
inc(d, 6)
print(d)
output
defaultdict(<function <lambda> at 0xb730553c>, {6: 1})
Now we aren't binding a new object, we're merely mutating an existing one, so the original d object gets updated correctly.
BTW, that deeply nested dict is a bit painful to work with. Maybe there's a better way to organize your data... But anyway, one thing that can be handy when working with deep nesting is to use lists or tuples of keys. E.g.:
q = nested_dict
keys = 'a', 'b', 'c', 'd', 1, 2, 3, 4, 5
for k in keys:
    q = q[k]
# q now refers to nested_dict['a']['b']['c']['d'][1][2][3][4][5]
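Building on that, here is a small sketch that walks a key sequence generically (the helper names nested_get and nested_set are mine, not from the answer, and nested_dict is assumed from above):

from functools import reduce
from operator import getitem

def nested_get(d, keys):
    # walk down the nesting one key at a time
    return reduce(getitem, keys, d)

def nested_set(d, keys, value):
    # navigate to the parent of the last key, then assign
    nested_get(d, keys[:-1])[keys[-1]] = value

nested_set(nested_dict, ['a', 'b', 'c', 'd', 1, 2, 3, 4, 5, 6], 1)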
You can't have multiple default types with defaultdict. You have the following options:
Nested defaultdict of defaultdict objects indefinitely;
defaultdict of int objects, which likely won't suit your needs;
defaultdict of defaultdict down to a specific level with int defined for the last level, e.g. d = defaultdict(lambda: defaultdict(int)) for a single nesting;
Similar to (3), but for counting you can use collections.Counter instead, i.e. d = defaultdict(Counter).
I recommend the 3rd or 4th options if you are always going to go down to a set level. In other words, a scalar value will only be supplied at the nth level, where n is constant.
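To make options 3 and 4 concrete, a minimal sketch (assuming exactly two levels of nesting):

from collections import defaultdict, Counter

# Option 3: ints only at the last level
d3 = defaultdict(lambda: defaultdict(int))
d3['a']['b'] += 1

# Option 4: a Counter at the last level
d4 = defaultdict(Counter)
d4['a']['b'] += 1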
Otherwise, one manual option is to have a function perform the type-testing. In this case, try / except may be a good alternative. Here we also define a helper that lets you feed a list of keys rather than chaining manual __getitem__ calls.
from collections import defaultdict
from functools import reduce
from operator import getitem

nested_dict_type = lambda: defaultdict(nested_dict_type)
d = nested_dict_type()
d[1][2] = 10

def inc(d_in, L):
    try:
        reduce(getitem, L[:-1], d_in)[L[-1]] += 1
    except TypeError:
        reduce(getitem, L[:-1], d_in)[L[-1]] = 1

inc(d, [1, 2])
inc(d, [1, 3])
print(d)
defaultdict({1: defaultdict({2: 11, 3: 1})})
I'd like to use instances of any type as a key in a single dict.
def add_to_dict(my_object, d, arbitrary_val='123'):
    d[my_object] = arbitrary_val

d = {}
add_to_dict('my_str', d)
add_to_dict(my_list, d)
add_to_dict(my_int, d)

my_object = myclass()
my_object.__hash__ = None
add_to_dict(my_object, d)
The above won't work because my_list and my_object can't be hashed.
My first thought was to just pass in the id value of the object using the id() function.
def add_to_dict(my_object, d, arbitrary_val='123'):
    d[id(my_object)] = arbitrary_val
However, that won't work because id('some string') == id('some string') is not guaranteed to always be True.
My second thought was to test if the object has the __hash__ attribute. If it does, use the object, otherwise, use the id() value.
def add_to_dict(my_object, d, arbitrary_val='123'):
    d[my_object if my_object.__hash__ else id(my_object)] = arbitrary_val
However, since hash() and id() both return ints, I believe I will eventually get a collision.
How can I write add_to_dict(obj, d) above to ensure that no matter what obj is (list, int, str, object, dict), it will correctly set the item in the dictionary and do so without collision?
We could make some kind of dictionary that allows us to insert mutable objects as well:
class DictionaryMutable:
    nullobject = object()

    def __init__(self):
        self._inner_dic = {}
        self._inner_list = []

    def __getitem__(self, name):
        try:
            return self._inner_dic[name]
        except TypeError:
            # unhashable key: fall back to a linear scan of the list
            for key, val in self._inner_list:
                if name == key:
                    return val
            raise KeyError(name)

    def __setitem__(self, name, value):
        try:
            self._inner_dic[name] = value
        except TypeError:
            # unhashable key: update an existing entry, or append a new one
            for elm in self._inner_list:
                if name == elm[0]:
                    elm[1] = value
                    break
            else:
                self._inner_list.append([name, value])

    # ...
This works as follows: the DictionaryMutable consists of a dictionary and a list. The dictionary contains the hashable keys, and the list contains sublists where each sublist holds two elements: a key and a value.
For each lookup we first attempt the lookup on the dictionary; in case the key name is unhashable, a TypeError is raised. In that case we iterate through the list, check if one of the keys matches, and return the corresponding value if it does. If no such element exists, we raise a KeyError.
Setting elements works approximately the same way: first we attempt to set the element in the dictionary. If the key turns out to be unhashable, we search linearly through the list and update the matching entry; if there is none, we append the new key/value pair at the end of the list.
This implementation has some major disadvantages:
if the dictionary lookup fails due to the key being unhashable, we fall back to a linear search, which can significantly slow down the lookup; and
if you mutate an object that is used as a key, the stored key changes along with it, so a later search for that object can fail. This can result in unpredictable behavior.
This is only a basic implementation. For instance __iter__, etc. need to be implemented as well.
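For instance, a rough sketch of those remaining methods, following the same dict-then-list pattern (my addition, not part of the original snippet):

    # inside DictionaryMutable:
    def __delitem__(self, name):
        try:
            del self._inner_dic[name]
        except TypeError:
            for i, (key, _) in enumerate(self._inner_list):
                if name == key:
                    del self._inner_list[i]
                    return
            raise KeyError(name)

    def __iter__(self):
        yield from self._inner_dic
        for key, _ in self._inner_list:
            yield key

    def __len__(self):
        return len(self._inner_dic) + len(self._inner_list)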
Instead of the id() of the object, you could use the pickled byte stream representation of the object pickle.dumps() returns for it. pickle works with most built-in types, and there are ways to extend it to work with most values it doesn't know how to do automatically.
Note: I used the repr() of the object as its "arbitrary value" in an effort to make it easier to identify them in the output displayed.
try:
    import cPickle as pickle  # Python 2
except ModuleNotFoundError:
    import pickle
from pprint import pprint

def add_to_dict(d, obj, arbitrary_val='123'):
    d[pickle.dumps(obj)] = arbitrary_val
class MyClass: pass
my_string = 'spam'
my_list = [13, 'a']
my_int = 42
my_instance = MyClass()
d = {}
add_to_dict(d, my_string, repr(my_string))
add_to_dict(d, my_list, repr(my_list))
add_to_dict(d, my_int, repr(my_int))
add_to_dict(d, my_instance, repr(my_instance))
pprint(d)
Output:
{b'\x80\x03K*.': '42',
b'\x80\x03X\x04\x00\x00\x00spamq\x00.': "'spam'",
b'\x80\x03]q\x00(K\rX\x01\x00\x00\x00aq\x01e.': "[13, 'a']",
b'\x80\x03c__main__\nMyClass\nq\x00)\x81q\x01.': '<__main__.MyClass object at '
'0x021C1630>'}
I need to compare hundreds of objects stored in a unique list to find duplicates:
object_list = {Object_01, Object_02, Object_03, Object_04, Object_05, ...}
I've written a custom function, which returns True, if the objects are equal and False if not:
object_01.compare(object_02)
>>> True
Compare method works well, but takes a lot of time per execution. I'm currently using itertools.combinations(x, 2) to iterate through all combinations. I thought it would be a good idea to use a dict for storing already-compared objects and to create new sets dynamically, like:
dct = {'Compared': set()}

import itertools

for a, b in itertools.combinations(x, 2):
    if b.name not in dct['Compared']:
        if a.compare(b) == True:
            # print(a, b)
            key = a.name
            value = b.name
            if key not in dct:
                dct[key] = set()
                dct[key].add(value)
            else:
                dct[key].add(value)
            dct[key].add(key)
            dct['Compared'].add(b.name)
Current Output:
Compared: {'Object_02', 'Object_01', 'Object_03', 'Object_04', 'Object_05'}
Object_01: {'Object_02', 'Object_03', 'Object_01'}
Object_04: {'Object_05', 'Object_04'}
Object_05: {'Object_04'}
...
I would like to know: Is there a faster way to iterate through all combinations and how to break/prevent the iteration of an object, which is already assigned to a list of duplicates?
Desired Output:
Compared: {'Object_02', 'Object_01', 'Object_03', 'Object_04', 'Object_05'}
Object_01: {'Object_02', 'Object_03', 'Object_01'}
Object_04: {'Object_05', 'Object_04'}
...
Note: the compare method is a C wrapper. The requirement is to find an algorithm around it.
You don't need to calculate all combinations, you just need to check if a given item is a duplicate:
for i, a in enumerate(x):
    if any(a.compare(b) for b in x[:i]):
        pass  # a is a duplicate of an already seen item, so do something
This is still technically O(n^2), but you've cut out at least half the checks required, and should be a bit faster.
In short, x[:i] returns all items in the list before index i. If the item x[i] appears in that list, you know it's a duplicate. If not, there may be a duplicate after it in the list, but you worry about that when you get there.
Using any is also important here: if it finds any true item, it will immediately stop, without checking the rest of the iterable.
You could also improve the number of checks by removing known duplicates from the list you're checking against:
x_copy = x[:]
removed = 0
for i, a in enumerate(x):
    if any(a.compare(b) for b in x_copy[:i-removed]):
        del x_copy[i-removed]
        removed += 1
        # a is a duplicate of an already seen item, so do something
Note that we use a copy, to avoid changing the sequence we're iterating over, and we need to take account for the number of items we've removed when using indexes.
Next, we just need to figure out how to build the dictionary.
This might be a little more complex. The first step is to figure out exactly which element is a duplicate. This can be done by realising any is just a wrapper around a for loop:
def any(iterable):
    for item in iterable:
        if item: return True
    return False
We can then make a minor change, and pass in a function:
def first(iterable, fn):
    for item in iterable:
        if fn(item): return item
    return None
Now, we change our duplicate finder as follows:
import collections

d = collections.defaultdict(list)
x_copy = x[:]
removed = 0
for i, a in enumerate(x):
    b = first(x_copy[:i-removed], a.compare)
    if b is not None:
        # b is the first occurring duplicate of a
        del x_copy[i-removed]
        removed += 1
        d[b.name].append(a)
    else:
        # we've not seen a yet, but might see it later
        d[a.name].append(a)
This will put every element in the list into a dict(-like). If you only want the duplicates, it's then just a case of getting all the entries with a length greater than 1.
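For example, extracting just the duplicate groups could look like this (a one-liner sketch over the d built above):

duplicates = {name: objs for name, objs in d.items() if len(objs) > 1}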
If you want to find the dups by grouping on attributes, group the objects into a dict keyed on those attributes:
class Foo:
    def __init__(self, i, j):
        self.i = i
        self.j = j

object_list = {Foo(1, 2), Foo(3, 4), Foo(1, 2), Foo(3, 4), Foo(5, 6)}

from collections import defaultdict

d = defaultdict(list)
for obj in object_list:
    d[(obj.i, obj.j)].append(obj)

print(d)
defaultdict(<type 'list'>, {(1, 2): [<__main__.Foo instance at 0x7fa44ee7d098>, <__main__.Foo instance at 0x7fa44ee7d128>],
(5, 6): [<__main__.Foo instance at 0x7fa44ee7d1b8>],
(3, 4): [<__main__.Foo instance at 0x7fa44ee7d0e0>, <__main__.Foo instance at 0x7fa44ee7d170>]})
If it's not the name, then use a tuple storing all the attributes you use to check for comparison.
Or sort the list by the attributes that matter and use groupby to group:
class Foo:
    def __init__(self, i, j):
        self.i = i
        self.j = j

object_list = {Foo(1, 2), Foo(3, 4), Foo(1, 2), Foo(3, 4), Foo(5, 6)}

from itertools import groupby
from operator import attrgetter

groups = [list(v) for k, v in groupby(sorted(object_list, key=attrgetter("i", "j")),
                                      key=attrgetter("i", "j"))]
print(groups)
[[<__main__.Foo instance at 0x7f794a944d40>, <__main__.Foo instance at 0x7f794a944dd0>], [<__main__.Foo instance at 0x7f794a944d88>, <__main__.Foo instance at 0x7f794a944e18>], [<__main__.Foo instance at 0x7f794a944e60>]]
You could also implement __lt__, __eq__ and __hash__ to make your objects sortable and hashable:
class Foo(object):
    def __init__(self, i, j):
        self.i = i
        self.j = j

    def __lt__(self, other):
        return (self.i, self.j) < (other.i, other.j)

    def __hash__(self):
        return hash((self.i, self.j))

    def __eq__(self, other):
        return (self.i, self.j) == (other.i, other.j)

object_list = [Foo(1, 2), Foo(3, 4), Foo(1, 2), Foo(3, 4), Foo(5, 6)]

print(set(object_list))  # __hash__/__eq__ collapse equal objects
object_list.sort()       # __lt__ makes the list sortable
print(map(lambda x: (getattr(x, "i"), getattr(x, "j")), object_list))
set([<__main__.Foo object at 0x7fdff2fc08d0>, <__main__.Foo object at 0x7fdff2fc09d0>, <__main__.Foo object at 0x7fdff2fc0810>])
[(1, 2), (1, 2), (3, 4), (3, 4), (5, 6)]
Obviously the attributes need to be hashable; if you had lists you could change them to tuples, etc.
How to nest an OrderedDict?
I tried:
table=collections.OrderedDict()
table['E']['a']='abc'
but this raises a KeyError.
I also tried:
table=collections.OrderedDict(OrderedDict())
table['E']['a']='abc'
which also raises a KeyError.
Then I tried:
table=collections.OrderedDict()
table['E']=collections.OrderedDict()
table['E']['a']='abc'
This works fine.
In my code I had to use it like this:
table = collections.OrderedDict()
for lhs in left:
    table[lhs] = collections.OrderedDict()
    for val in terminal:
        table[lhs][val] = 0
which works fine. But is there any other method? As I read, Python manages its data structures automatically.
Is there any way to declare a dictionary in one line, specifying how deep the nesting will be and what the data structures of its nests will be?
Using an extra loop just to declare a dictionary feels like I'm missing something in Python.
You can define your own custom subclass of OrderedDict and handle the __missing__ method to support infinite nesting.
from collections import OrderedDict
class MyDict(OrderedDict):
    def __missing__(self, key):
        val = self[key] = MyDict()
        return val
Demo:
>>> d = MyDict()
>>> d['b']['c']['e'] = 100
>>> d['a']['c']['e'] = 100
>>> d.keys()
['b', 'a']
>>> d['a']['d']['e'] = 100
>>> d['a'].keys()
['c', 'd']
If you really want to do it in one line, then this would work
table = collections.OrderedDict([(lhs, collections.OrderedDict(zip(terminal, [0] * len(terminal)))) for lhs in left])
You would be best off (especially if terminal has a lot of members) building the pairs once and reusing them; note that on Python 3 zip returns a one-shot iterator, so materialize it with list() before reuse:
zipped = list(zip(terminal, [0] * len(terminal)))
table = collections.OrderedDict([(lhs, collections.OrderedDict(zipped)) for lhs in left])
It's simple enough to subclass OrderedDict with defaultdict-like behavior:
class OrderedDefaultDict(OrderedDict):
    def __init__(self, default_factory=None, *args, **kwargs):
        super(OrderedDefaultDict, self).__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        val = self[key] = self.default_factory()
        return val
You can then use an OrderedDefaultDict as follows:
table = OrderedDefaultDict(OrderedDict)
table['a']['b'] = 3
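Note: on Python 3.7+, where plain dicts preserve insertion order, a regular defaultdict already behaves like an ordered default dict, so the subclass is only needed on older versions:

from collections import defaultdict

table = defaultdict(dict)  # insertion-ordered on Python 3.7+
table['a']['b'] = 3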
I'm using Python 2.6. Is a cmp-style heapify available in a higher version of Python?
Otherwise, is there any other way I can maintain priority queues for lists of objects of non-trivial classes?
What I need is something like this:
>>> l = [ ['a', 3], ['b', 1] ]
>>> def foo(x, y):
...     return x[1] - y[1]
>>> heap = heapify(l, cmp=foo)
Any suggestions ?
Solution: Wrap data with the new comparison
Since the builtin functions don't directly support cmp functions, we need to build new variants of heapify and heappop:
from heapq import heapify, heappop
from functools import cmp_to_key

def new_heapify(data, cmp):
    s = list(map(cmp_to_key(cmp), data))
    heapify(s)
    return s

def new_heappop(data):
    return heappop(data).obj
Those are used just like your example:
>>> l = [ ['a', 3], ['b', 1] ]
>>> def foo(x, y):
...     return x[1] - y[1]
...
>>> heap = new_heapify(l, cmp=foo)
>>> new_heappop(heap)
['b', 1]
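If you also need to push new items onto such a heap, they must be wrapped the same way; a sketch following the same pattern (new_heappush is my name for it, not part of the answer above):

from heapq import heappush
from functools import cmp_to_key

def new_heappush(data, item, cmp):
    # wrap the item so comparisons go through cmp, like new_heapify does
    heappush(data, cmp_to_key(cmp)(item))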
Solution: Store Augmented Tuples
A more traditional solution is to store (priority, task) tuples on the heap:
pq = [ ]
heappush(pq, (10, task1))
heappush(pq, (5, task2))
heappush(pq, (15, task3))
priority, task = heappop(pq)
This works fine as long as no two tasks have the same priority; otherwise, the tasks themselves are compared (which might not work at all in Python 3).
The regular docs give guidance on how to implement priority queues using heapq:
http://docs.python.org/library/heapq.html#priority-queue-implementation-notes
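Sketching the pattern those notes describe: store (priority, count, task) entries, where a monotonically increasing counter breaks priority ties so the tasks themselves are never compared:

import heapq
import itertools

pq = []
counter = itertools.count()  # tie-breaker: insertion order

def push(task, priority):
    heapq.heappush(pq, (priority, next(counter), task))

def pop():
    priority, _, task = heapq.heappop(pq)
    return task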
Just write an appropriate __lt__ method for the objects in the list so they sort correctly:
class FirstList(list):
    def __lt__(self, other):
        return self[0] < other[0]

lst = [['a', 3], ['b', 1]]
lst = [FirstList(item) for item in lst]
Only __lt__ is needed by Python for sorting, though it's a good idea to define all of the comparisons or use functools.total_ordering.
You can check that it is working by using two items with the same first value and different second values: heapify orders them the same way no matter what the second values are, because only the first values are compared (lst[0] < lst[1] is always False when the first values are equal). If you need the heapify to be stable, you need a more complex comparison.
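A quick sketch of the functools.total_ordering route mentioned above, on a plain wrapper class (Prioritized is a made-up name for illustration); it fills in __le__, __gt__ and __ge__ from __eq__ and __lt__:

from functools import total_ordering

@total_ordering
class Prioritized:
    def __init__(self, priority, payload):
        self.priority = priority
        self.payload = payload

    def __eq__(self, other):
        return self.priority == other.priority

    def __lt__(self, other):
        return self.priority < other.priority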
Well, this is terrible and awful and you definitely shouldn't do it… But it looks like the heapq module defines a cmp_lt function (a private helper in older Pythons; it no longer exists in modern Python 3), which you could monkey patch if you really wanted a custom compare function.
With these Heap and HeapBy classes I tried to simplify the usage of heapq. You can use HeapBy to pass a key sorting function.
Note that Raymond said that his solution won't work if priorities are repeated and the values are not sortable. That's why I added an example of HeapBy with a NonComparable class.
I took the __lt__ idea from agf's solution.
Usage:
# Use HeapBy with a lambda for sorting
max_heap = HeapBy(key=lambda x: -x)
max_heap.push(3)
max_heap.push(1)
max_heap.push(2)
assert max_heap.pop() == 3
assert max_heap.pop() == 2
assert max_heap.pop() == 1
# Use Heap as a convenience facade for heapq
min_heap = Heap()
min_heap.push(3)
min_heap.push(1)
min_heap.push(2)
assert min_heap.pop() == 1
assert min_heap.pop() == 2
assert min_heap.pop() == 3
# HeapBy also works with non-comparable objects.
# Note that I push a duplicated value
# to make sure heapq will not try to call __lt__ on it.
class NonComparable:
    def __init__(self, val):
        self.val = val
# Using non comparable values
max_heap = HeapBy(key=lambda x: -x.val)
max_heap.push(NonComparable(1))
max_heap.push(NonComparable(1))
max_heap.push(NonComparable(3))
max_heap.push(NonComparable(2))
assert max_heap.pop().val == 3
assert max_heap.pop().val == 2
assert max_heap.pop().val == 1
assert max_heap.pop().val == 1
Classes:
import heapq

class Heap:
    """
    Convenience class for simplifying heapq usage
    """
    def __init__(self, array=None, heapify=True):
        if array:
            self.heap = array
            if heapify:
                heapq.heapify(self.heap)
        else:
            self.heap = []

    def push(self, x):
        heapq.heappush(self.heap, x)

    def pop(self):
        return heapq.heappop(self.heap)

class HeapBy(Heap):
    """
    Heap where you can specify a key function for sorting
    """
    # Item only uses the key function to sort elements,
    # just in case the values are not comparable
    class Item:
        def __init__(self, value, key):
            self.key = key
            self.value = value

        def __lt__(self, other):
            return self.key(self.value) < other.key(other.value)

    def __init__(self, key, array=None, heapify=True):
        super().__init__(array, heapify)
        self.key = key

    def push(self, x):
        super().push(self.Item(x, self.key))

    def pop(self):
        return super().pop().value
I don't know if this is better, but it is like Raymond Hettinger's solution except that the priority is determined from the object.
Let this be your object, and say you want to sort by the x attribute:
class Item:
def __init__(self, x):
self.x = x
Then have a function which applies the pairing
def create_pairs(items):
    return map(lambda item: (item.x, item), items)
Then apply the function to the lists as input into heapq.merge
list(heapq.merge(create_pairs([Item(1), Item(3)]),
create_pairs([Item(2), Item(5)])))
Which gave me the following output
[(1, <__main__.Item instance at 0x2660cb0>),
(2, <__main__.Item instance at 0x26c2830>),
(3, <__main__.Item instance at 0x26c27e8>),
(5, <__main__.Item instance at 0x26c2878>)]