Why does {}.values() == {}.values() return False? [duplicate] - python

With Python 3:
>>> from collections import OrderedDict
>>> d1 = OrderedDict([('foo', 'bar')])
>>> d2 = OrderedDict([('foo', 'bar')])
I wanted to check for equality:
>>> d1 == d2
True
>>> d1.keys() == d2.keys()
True
But:
>>> d1.values() == d2.values()
False
Do you know why values are not equal?
I've tested this with Python 3.4 and 3.5.
Following this question, I posted on the Python-Ideas mailing list to have additional details:
https://mail.python.org/pipermail/python-ideas/2015-December/037472.html

In Python 3, dict.keys() and dict.values() return special iterable view classes - respectively a collections.abc.KeysView and a collections.abc.ValuesView. The first one inherits its __eq__ method from Set; the second uses the default object.__eq__, which tests object identity.

In Python 3, d1.values() and d2.values() are collections.abc.ValuesView objects:
>>> d1.values()
ValuesView(OrderedDict([('foo', 'bar')]))
Don't compare them as objects; convert them to lists and then compare those:
>>> list(d1.values()) == list(d2.values())
True
Investigating why it works for comparing keys: in _collections_abc.py of CPython, KeysView inherits from Set while ValuesView does not:
class KeysView(MappingView, Set):
class ValuesView(MappingView):
Tracing for __eq__ in ValuesView and its parents:
MappingView ==> Sized ==> ABCMeta ==> type ==> object.
__eq__ is implemented only in object and not overridden.
KeysView, on the other hand, inherits __eq__ directly from Set.
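To see the practical difference, keys views support value-based (set-like) comparison and set operations, while values views fall back to identity comparison. A quick illustration with two throwaway dicts:

a = {'foo': 'bar'}
b = {'foo': 'baz'}

print(a.keys() == b.keys())      # True: set-like comparison of the keys
print(a.keys() & b.keys())       # {'foo'}: keys views support set operations
print(a.values() == b.values())  # False: default object.__eq__, i.e. identity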

Unfortunately, both current answers don't address why this is; they focus on how it is done. That mailing list discussion was amazing, so I'll sum things up:
For odict.keys/dict.keys and odict.items/dict.items:
odict.keys (a subclass of dict.keys) supports comparison due to its conformance to collections.abc.Set (it's a set-like object). This is possible because the keys inside a dictionary (ordered or not) are guaranteed to be unique and hashable.
odict.items (a subclass of dict.items) also supports comparison, for the same reason .keys does. itemsview is allowed to do this since it raises the appropriate error if one of the items (specifically, the second element, representing the value) is not hashable; uniqueness is guaranteed regardless, because the keys are unique:
>>> od = OrderedDict({'a': []})
>>> set() & od.items()
TypeError                                 Traceback (most recent call last)
<ipython-input-41-a5ec053d0eda> in <module>()
----> 1 set() & od.items()

TypeError: unhashable type: 'list'
For both of these views (keys, items), the comparison uses a simple function called all_contained_in (pretty readable) that uses the objects' __contains__ method to check membership of the elements in the views involved.
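A rough Python sketch (not the actual C code) of what that containment-based comparison amounts to:

def views_equal(view_a, view_b):
    # Same length, and every element of one is contained in the other.
    if len(view_a) != len(view_b):
        return False
    return all(elem in view_b for elem in view_a)

d1 = {'foo': 'bar'}
d2 = {'foo': 'bar'}
print(views_equal(d1.keys(), d2.keys()))    # True
print(views_equal(d1.items(), d2.items()))  # True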
Now, about odict.values/dict.values:
As noted, odict.values (a subclass of dict.values [shocker]) doesn't compare like a set-like object. This is because the values of a values view cannot be represented as a set, for two reasons:
Most importantly, the view might contain duplicates which cannot be dropped.
The view might contain non-hashable objects (which, on its own, isn't sufficient reason not to treat the view as set-like).
As stated in a comment by @user2357112 and by @abarnett on the mailing list, odict.values/dict.values is a multiset, a generalization of sets that allows multiple instances of its elements.
Trying to compare these is not as trivial as comparing keys or items, due to the inherent duplication, the ordering, and the fact that you probably need to take into consideration the keys that correspond to those values. Should dict_values that look like this:
>>> {1:1, 2:1, 3:2}.values()
dict_values([1, 1, 2])
>>> {1:1, 2:1, 10:2}.values()
dict_values([1, 1, 2])
actually be equal, even though the keys that correspond to those values aren't the same? Maybe? Maybe not? It isn't straightforward either way and would lead to inevitable confusion.
The point to be made, though, is that it isn't trivial to compare these the way keys and items are compared. To sum up, here is another comment from @abarnett on the mailing list:
If you're thinking we could define what multisets should do, despite not having a standard multiset type or an ABC for them, and apply that to values views, the next question is how to do that in better than quadratic time for non-hashable values. (And you can't assume ordering here, either.) Would having a values view hang for 30 seconds and then come back with the answer you intuitively wanted instead of giving the wrong answer in 20 millis be an improvement? (Either way, you're going to learn the same lesson: don't compare values views. I'd rather learn that in 20 millis.)
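For completeness: if you decide multiset equality is what you actually want and your values happen to be hashable, collections.Counter gives it directly (a workaround, not something the views do for you):

from collections import Counter

d1 = {1: 1, 2: 1, 3: 2}
d2 = {1: 1, 2: 1, 10: 2}

# Same multiset of values, even though the corresponding keys differ.
print(Counter(d1.values()) == Counter(d2.values()))  # True
# For non-hashable values you'd have to fall back to sorting (if the values
# are orderable) or a quadratic membership check, as the quote above notes.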

Related

How does hash-table in set works in python?

As far as I know, set in Python works via a hash table to achieve O(1) lookup complexity. Since it is a hash table, every entry in a set must be hashable (or immutable).
So this piece of code raises an exception:
>>> {dict()}
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict'
Because dict is not hashable. But we can create our own class inherited from dict and implement the __hash__ magic method. I created my own in this way:
>>> class D(dict):
...     def __hash__(self):
...         return 3
...
I know it should not work properly but I just wanted to experiment with it. So I checked that I can now use this type in set:
>>> {D()}
{{}}
>>> {D(name='ali')}
{{'name': 'ali'}}
So far so good, but I thought that this way of implementing the __hash__ magic method would screw up the look up in set. Because every object of the D has the same hash value.
>>> d1 = D(n=1)
>>> d2 = D(n=2)
>>> hash(d1), hash(d2)
(3, 3)
>>>
>>> {d1, d2}
{{'n': 2}, {'n': 1}}
But the surprise for me was this:
>>> d3 = D()
>>> d3 in {d1, d2}
False
I expected the result to be True, because the hash of d3 is 3 and there are already values in our set with the same hash value. How does the set work internally?
To be usable in sets and dicts, a __hash__ method must guarantee that if x == y, then hash(x) == hash(y). But that's a one-sided implication. It's not at all required that if hash(x) == hash(y) then x == y must be true. Indeed, that's impossible to achieve in general (for example, there are an unbounded number of distinct Python ints, but only a finite number of hash codes - there must be distinct ints that have the same hash value).
That your hashes are all the same is fine. They only tell the set/dict where to start looking. All objects in the container with the same hash are then compared, one by one, for equality, until success, or until all such objects have been tried without success.
However, while making all hashes the same doesn't hurt correctness, it's a disaster for performance: it effectively turns the set/dict into an exceptionally slow way to do an O(n) linear search.
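A small sketch reproducing the behaviour above: the constant hash puts every D in the same bucket, but membership still hinges on equality, which D inherits from dict:

class D(dict):
    def __hash__(self):
        return 3

d1, d2, d3 = D(n=1), D(n=2), D()

print(hash(d1) == hash(d2) == hash(d3))  # True: all land in the same bucket
print(d3 == d1, d3 == d2)                # False False: dict.__eq__ compares contents
print(d3 in {d1, d2})                    # False: same hash, but no equal element
print(D(n=1) in {d1, d2})                # True: equal contents and equal hash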

Why is user-defined object hashable but not list? [duplicate]

I'm a bit confused about what can/can't be used as a key for a python dict.
dicked = {}
dicked[None] = 'foo' # None ok
dicked[(1,3)] = 'baz' # tuple ok
import sys
dicked[sys] = 'bar' # wow, even a module is ok !
dicked[(1,[3])] = 'qux' # oops, not allowed
So a tuple is an immutable type but if I hide a list inside of it, then it can't be a key.. couldn't I just as easily hide a list inside a module?
I had some vague idea that the key has to be "hashable" but I'm just going to admit my own ignorance about the technical details; I don't know what's really going on here. What would go wrong if you tried to use lists as keys, with the hash as, say, their memory location?
There's a good article on the topic in the Python wiki: Why Lists Can't Be Dictionary Keys. As explained there:
What would go wrong if you tried to use lists as keys, with the hash as, say, their memory location?
It can be done without really breaking any of the requirements, but it leads to unexpected behavior. Lists are generally treated as if their value were derived from their contents' values, for instance when checking (in)equality. Many would - understandably - expect that you can use any list [1, 2] to get the same key, whereas you'd actually have to keep around exactly the same list object. But lookup by value breaks as soon as a list used as a key is modified, and lookup by identity requires you to keep around exactly the same list - which isn't required for any other common list operation (at least none I can think of).
Other objects, such as modules, make a much bigger deal out of their object identity anyway (when was the last time you had two distinct module objects called sys?), and are compared by identity anyway. Therefore, it's less surprising - or even expected - that they, when used as dict keys, compare by identity in that case as well.
Why can't I use a list as a dict key in python?
>>> d = {repr([1,2,3]): 'value'}
>>> d
{'[1, 2, 3]': 'value'}
(for anybody who stumbles on this question looking for a way around it)
As explained by others here, indeed you cannot. You can, however, use its string representation instead if you really want to use your list.
Just found out you can convert the list into a tuple, then use it as a key.
d = {tuple([1,2,3]): 'value'}
The issue is that tuples are immutable, and lists are not. Consider the following
d = {}
li = [1,2,3]
d[li] = 5
li.append(4)
What should d[li] return? Is it the same list? How about d[[1,2,3]]? It has the same values, but is a different list?
Ultimately, there is no satisfactory answer. For example, if the only key that works is the original key, then if you have no reference to that key, you can never again access the value. With every other allowed key, you can construct a key without a reference to the original.
If both of my suggestions work, then you have very different keys that return the same value, which is more than a little surprising. If only the original contents work, then your key will quickly go bad, since lists are made to be modified.
Here's an answer http://wiki.python.org/moin/DictionaryKeys
What would go wrong if you tried to use lists as keys, with the hash as, say, their memory location?
Looking up different lists with the same contents would produce different results, even though comparing lists with the same contents would indicate them as equivalent.
What about Using a list literal in a dictionary lookup?
Because lists are mutable, dict keys (and set members) need to be hashable, and hashing mutable objects is a bad idea because hash values should be computed on the basis of instance attributes.
In this answer, I will give some concrete examples, hopefully adding value on top of the existing answers. Every insight applies to the elements of the set data structure as well.
Example 1: hashing a mutable object where the hash value is based on a mutable characteristic of the object.
>>> class stupidlist(list):
...     def __hash__(self):
...         return len(self)
...
>>> stupid = stupidlist([1, 2, 3])
>>> d = {stupid: 0}
>>> stupid.append(4)
>>> stupid
[1, 2, 3, 4]
>>> d
{[1, 2, 3, 4]: 0}
>>> stupid in d
False
>>> stupid in d.keys()
False
>>> stupid in list(d.keys())
True
After mutating stupid, it cannot be found in the dict any longer because the hash changed. Only a linear scan over the list of the dict's keys finds stupid.
Example 2: ... but why not just a constant hash value?
>>> class stupidlist2(list):
...     def __hash__(self):
...         return id(self)
...
>>> stupidA = stupidlist2([1, 2, 3])
>>> stupidB = stupidlist2([1, 2, 3])
>>>
>>> stupidA == stupidB
True
>>> stupidA in {stupidB: 0}
False
That's not a good idea either, because equal objects should hash identically so that you can find them in a dict or set.
Example 3: ... ok, what about constant hashes across all instances?!
>>> class stupidlist3(list):
...     def __hash__(self):
...         return 1
...
>>> stupidC = stupidlist3([1, 2, 3])
>>> stupidD = stupidlist3([1, 2, 3])
>>> stupidE = stupidlist3([1, 2, 3, 4])
>>>
>>> stupidC in {stupidD: 0}
True
>>> stupidC in {stupidE: 0}
False
>>> d = {stupidC: 0}
>>> stupidC.append(5)
>>> stupidC in d
True
Things seem to work as expected, but think about what's happening: when all instances of your class produce the same hash value, you will have a hash collision whenever there are two or more instances as keys in a dict or present in a set.
Finding the right instance with my_dict[key] or key in my_dict (or item in my_set) needs to perform as many equality checks as there are instances of stupidlist3 in the dict's keys (in the worst case). At this point, the purpose of the dictionary - O(1) lookup - is completely defeated. This is demonstrated in the following timings (done with IPython).
Some Timings for Example 3
>>> lists_list = [[i] for i in range(1000)]
>>> stupidlists_set = {stupidlist3([i]) for i in range(1000)}
>>> tuples_set = {(i,) for i in range(1000)}
>>> l = [999]
>>> s = stupidlist3([999])
>>> t = (999,)
>>>
>>> %timeit l in lists_list
25.5 µs ± 442 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit s in stupidlists_set
38.5 µs ± 61.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit t in tuples_set
77.6 ns ± 1.5 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
As you can see, the membership test in our stupidlists_set is even slower than a linear scan over the whole lists_list, while you have the expected super fast lookup time (factor 500) in a set without loads of hash collisions.
TL;DR: you can use tuple(yourlist) as a dict key, because tuples are immutable and hashable.
The simple answer to your question is that the class list does not implement __hash__, which is required for any object that wishes to be used as a key in a dictionary. However, the reason why __hash__ is not implemented the same way it is in, say, the tuple class (based on the contents of the container) is that a list is mutable, so editing the list would require the hash to be recalculated, which may mean the list is now located in the wrong bucket within the underlying hash table. Note that since you cannot modify a tuple (immutable), it doesn't run into this problem.
As a side note, the actual implementation of the dict object's lookup is based on Algorithm D from Knuth Vol. 3, Sec. 6.4. If you have that book available it might be a worthwhile read; in addition, if you're really, really interested you may like to take a peek at the developer comments on the actual implementation of dictobject here. It goes into great detail as to exactly how it works. There is also a Python lecture on the implementation of dictionaries which you may be interested in. They go through the definition of a key and what a hash is in the first few minutes.
Your answer can be found here:
Why Lists Can't Be Dictionary Keys
Newcomers to Python often wonder why, while the language includes both a tuple and a list type, tuples are usable as dictionary keys, while lists are not. This was a deliberate design decision, and can best be explained by first understanding how Python dictionaries work.
Source & more info: http://wiki.python.org/moin/DictionaryKeys
A dictionary is a hash map: it stores a mapping from your keys to your values, where each key is converted to a hashed key that maps to the value.
Something like (pseudo code):
{key : val}
hash(key) = val
If you are wondering which options are available to use as a key for your dictionary: anything that is hashable (i.e. it can be converted to a hash and holds a static value, in other words is immutable, so it can serve as a hashed key as stated above) is eligible. List or set objects, however, can vary on the go, so hash(key) would also need to vary just to stay in sync with your list or set.
You can try :
hash(<your key here>)
If it works fine it can be used as key for your dictionary or else convert it to something hashable.
In short:
Convert that list to a tuple: tuple(<your list>).
Convert that list to a string: str(<your list>).
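Probing with hash() as suggested above looks like this (exact hash values vary by Python build):

print(hash((1, 2, 3)))     # works: a tuple of hashables can be a key
print(hash("spam"))        # works: strings are hashable
try:
    hash([1, 2, 3])        # fails: lists are mutable and unhashable
except TypeError as exc:
    print(exc)             # unhashable type: 'list'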
Simply, we can keep in mind that dict keys need to be immutable (to be exact, hashable). Lists are mutable (to be exact, lists do not provide a valid __hash__ method).
Here an immutable object (unchangeable object) is an object whose state cannot be modified after it is created. This is in contrast to a mutable object (changeable object), which can be modified after it is created.
According to the Python 2.7.2 documentation:
An object is hashable if it has a hash value which never changes
during its lifetime (it needs a __hash__() method), and can be
compared to other objects (it needs an __eq__() or __cmp__() method).
Hashable objects which compare equal must have the same hash value.
Hashability makes an object usable as a dictionary key and a set
member, because these data structures use the hash value internally.
All of Python’s immutable built-in objects are hashable, while no
mutable containers (such as lists or dictionaries) are. Objects which
are instances of user-defined classes are hashable by default; they
all compare unequal, and their hash value is their id().
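A minimal illustration of the last point of that quote, using a made-up class:

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

p = Point(1, 2)
q = Point(1, 2)

d = {p: 'first'}   # works: instances of user-defined classes are hashable by default
print(p in d)      # True  - the same object is found
print(q in d)      # False - an equal-looking but distinct instance is not
print(p == q)      # False - the default __eq__ compares identity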
A tuple is immutable in the sense that you cannot add, remove or replace its elements, but the elements themselves may be mutable. A list's hash value would depend on the hash values of its elements, and so it would change when you change the elements.
Using id's for list hashes would imply that all lists compare differently, which would be surprising and inconvenient.

Why is the __dict__ of instances so much smaller in size in Python 3?

In Python, dictionaries created for the instances of a class are tiny compared to the dictionaries created containing the same attributes of that class:
import sys

class Foo(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b

f = Foo(20, 30)
When using Python 3.5.2, the following calls to getsizeof produce:
>>> sys.getsizeof(vars(f)) # vars gets obj.__dict__
96
>>> sys.getsizeof(dict(vars(f)))
288
288 - 96 = 192 bytes saved!
Using Python 2.7.12, on the other hand, the same calls return:
>>> sys.getsizeof(vars(f))
280
>>> sys.getsizeof(dict(vars(f)))
280
0 bytes saved.
In both cases, the dictionaries obviously have exactly the same contents:
>>> vars(f) == dict(vars(f))
True
so this isn't a factor. Note that this applies to Python 3 only.
So, what's going on here? Why is the size of the __dict__ of an instance so tiny in Python 3?
In short:
Instance __dict__'s are implemented differently from the 'normal' dictionaries created with dict or {}. The dictionaries of instances share the keys and hashes, and keep a separate array for the part that differs: the values. sys.getsizeof only counts those values when calculating the size of the instance dict.
A bit more:
Dictionaries in CPython are, as of Python 3.3, implemented in one of two forms:
Combined dictionary: All values of the dictionary are stored alongside the key and hash for each entry. (me_value member of the PyDictKeyEntry struct). As far as I know, this form is used for dictionaries created with dict, {} and the module namespace.
Split table: The values are stored separately in an array, while the keys and hashes are shared (Values stored in ma_values of PyDictObject)
Instance dictionaries are always implemented in a split-table form (a Key-Sharing Dictionary) which allows instances of a given class to share the keys (and hashes) for their __dict__ and only differ in the corresponding values.
This is all described in PEP 412 -- Key-Sharing Dictionary. The implementation of the split dictionary landed in Python 3.3, so previous versions of the 3.x family as well as Python 2.x don't have it.
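A quick way to see the key sharing in action; the sizes below are the ones from the question's CPython 3.5 example and will differ across versions and platforms:

import sys

class Foo:
    def __init__(self, a, b):
        self.a = a
        self.b = b

# Each instance __dict__ shares one keys/hashes table with its siblings,
# so getsizeof only reflects the per-instance values array.
instances = [Foo(i, i + 1) for i in range(3)]
print([sys.getsizeof(vars(obj)) for obj in instances])   # e.g. [96, 96, 96]
print(sys.getsizeof(dict(vars(instances[0]))))           # a combined copy, e.g. 288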
The implementation of __sizeof__ for dictionaries takes this fact into account and only considers the size that corresponds to the values array when calculating the size for a split dictionary.
It's thankfully, self-explanatory:
Py_ssize_t size, res;

size = DK_SIZE(mp->ma_keys);
res = _PyObject_SIZE(Py_TYPE(mp));
if (mp->ma_values)                    /* Add the values to the result */
    res += size * sizeof(PyObject*);
/* If the dictionary is split, the keys portion is accounted-for
   in the type object. */
if (mp->ma_keys->dk_refcnt == 1)      /* Add keys/hashes size to res */
    res += sizeof(PyDictKeysObject) + (size-1) * sizeof(PyDictKeyEntry);
return res;
As far as I know, split-table dictionaries are created only for the namespace of instances, using dict() or {} (as also described in the PEP) always results in a combined dictionary that doesn't have these benefits.
As an aside, since it's fun, we can always break this optimization. There are two ways I've found so far: a silly one, and a more sensible scenario:
Being silly:
>>> f = Foo(20, 30)
>>> getsizeof(vars(f))
96
>>> vars(f).update({1:1}) # add a non-string key
>>> getsizeof(vars(f))
288
Split tables only support string keys; adding a non-string key (which really makes zero sense) breaks this rule, and CPython turns the split table into a combined one, losing all the memory gains.
A scenario that might happen:
>>> f1, f2 = Foo(20, 30), Foo(30, 40)
>>> for i, j in enumerate([f1, f2]):
...     setattr(j, 'i'+str(i), i)
...     print(getsizeof(vars(j)))
96
288
Different keys being inserted into the instances of a class will eventually lead to the split table getting combined. This doesn't apply only to the instances already created; all subsequent instances created from the class will have a combined dictionary instead of a split one.
# after running previous snippet
>>> getsizeof(vars(Foo(100, 200)))
288
Of course, there's no good reason, other than for fun, to do this on purpose.
If anyone is wondering, Python 3.6's dictionary implementation doesn't change this fact. The two aforementioned forms of dictionary, while still available, are just further compacted (the implementation of dict.__sizeof__ also changed, so some differences should come up in the values returned by getsizeof).

What's the idiomatic way to fake __hash__() for dicts?

EDIT: as @BrenBarn pointed out, the original didn't make sense.
Given a list of dicts (courtesy of csv.DictReader--they all have str keys and values) it'd be nice to remove duplicates by stuffing them all in a set, but this can't be done directly since dict isn't hashable. Some existing questions touch on how to fake __hash__() for sets/dicts but don't address which way should be preferred.
# i. concise but ugly round trip
filtered = [eval(x) for x in {repr(d) for d in pile_o_dicts}]
# ii. wordy but avoids round trip
filtered = []
keys = set()
for d in pile_o_dicts:
    key = str(d)
    if key not in keys:
        keys.add(key)
        filtered.append(d)
# iii. introducing another class for this seems Java-like?
filtered = {hashable_dict(x) for x in pile_o_dicts}
# iv. something else entirely
In the spirit of the Zen of Python what's the "obvious way to do it"?
Based on your example code, I take your question to be something slightly different from what you literally say. You don't actually want to override __hash__() -- you just want to filter out duplicates in linear time, right? So you need to ensure the following for each dictionary: 1) every key-value pair is represented, and 2) they are represented in a stable order. You could use a sorted tuple of key-value pairs, but instead, I would suggest using frozenset. frozensets are hashable, and they avoid the overhead of sorting, which should improve performance (as this answer seems to confirm). The downside is that they take up more memory than tuples, so there is a space/time tradeoff here.
Also, your code uses sets to do the filtering, but that doesn't make a lot of sense. There's no need for that ugly eval step if you use a dictionary:
filtered = {frozenset(d.iteritems()):d for d in pile_o_dicts}.values()
Or in Python 3, assuming you want a list rather than a dictionary view:
filtered = list({frozenset(d.items()):d for d in pile_o_dicts}.values())
These are both a bit clunky. For readability, consider breaking it into two lines:
dict_o_dicts = {frozenset(d.iteritems()):d for d in pile_o_dicts}
filtered = dict_o_dicts.values()
The alternative is an ordered tuple of tuples:
filtered = {tuple(sorted(d.iteritems())):d for d in pile_o_dicts}.values()
And a final note: don't use repr for this. Dictionaries that evaluate as equal can have different representations:
>>> d1 = {str(i):str(i) for i in range(300)}
>>> d2 = {str(i):str(i) for i in range(299, -1, -1)}
>>> d1 == d2
True
>>> repr(d1) == repr(d2)
False
The artfully named pile_o_dicts can be converted to a canonical form by sorting their items lists:
groups = {}
for d in pile_o_dicts:
    k = tuple(sorted(d.items()))
    groups.setdefault(k, []).append(d)
This will group identical dictionaries together.
FWIW, the technique of using sorted(d.items()) is currently used in the standard library for functools.lru_cache() in order to recognize function calls that have the same keyword arguments. IOW, this technique is tried and true :-)
If the dicts all have the same keys, you can use a namedtuple
>>> from collections import namedtuple
>>> nt = namedtuple('nt', pile_o_dicts[0])
>>> set(nt(**d) for d in pile_o_dicts)
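A hedged end-to-end sketch of that namedtuple approach, with made-up rows standing in for the csv.DictReader output:

from collections import namedtuple

pile_o_dicts = [
    {'name': 'ada', 'lang': 'py'},
    {'name': 'ada', 'lang': 'py'},   # duplicate
    {'name': 'bob', 'lang': 'c'},
]

nt = namedtuple('nt', pile_o_dicts[0])              # field names come from the first row's keys
unique = {nt(**d) for d in pile_o_dicts}            # namedtuples are hashable, so a set dedups
filtered = [dict(row._asdict()) for row in unique]  # back to plain dicts if needed
print(len(filtered))                                # 2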

A data-structure for 1:1 mappings in python?

I have a problem which requires a reversable 1:1 mapping of keys to values.
That means sometimes I want to find the value given a key, but at other times I want to find the key given the value. Both keys and values are guaranteed unique.
x = D[y]
y == D.inverse[x]
The obvious solution is to simply invert the dictionary every time I want a reverse lookup. Inverting a dictionary is very easy (there's a recipe here), but for a large dictionary it can be very slow.
The other alternative is to make a new class which unites two dictionaries, one for each kind of lookup. That would most likely be fast but would use up twice as much memory as a single dict.
So is there a better structure I can use?
My application requires that this be very fast and use as little memory as possible.
The structure must be mutable, and it's strongly desirable that mutating the object should not cause it to be slower (e.g. to force a complete re-index)
We can guarantee that either the key or the value (or both) will be an integer
It's likely that the structure will be needed to store thousands or possibly millions of items.
Keys & values are guaranteed to be unique, i.e. len(set(x)) == len(x) for x in [D.keys(), D.values()]
The other alternative is to make a new class which unites two dictionaries, one for each kind of lookup. That would most likely be fast but would use up twice as much memory as a single dict.
Not really. Have you measured that? Since both dictionaries would use references to the same objects as keys and values, the memory spent would be just the dictionary structure. That's a lot less than twice, and it's a fixed amount regardless of your data size.
What I mean is that the actual data wouldn't be copied. So you'd spend little extra memory.
Example:
a = "some really really big text spending a lot of memory"
number_to_text = {1: a}
text_to_number = {a: 1}
Only a single copy of the "really big" string exists, so you end up spending just a little more memory. That's generally affordable.
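If you want to verify that claim, a rough measurement (the exact numbers vary by Python version):

import sys

a = "some really really big text spending a lot of memory" * 10000
number_to_text = {1: a}
text_to_number = {a: 1}

# The big string exists once; each dict only holds a reference to it.
print(sys.getsizeof(a))               # large: roughly half a megabyte here
print(sys.getsizeof(number_to_text))  # small: just the dict structure itself
print(sys.getsizeof(text_to_number))  # small: just the dict structure itself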
I can't imagine a solution where you'd have the key lookup speed when looking by value, if you don't spend at least enough memory to store a reverse lookup hash table (which is exactly what's being done in your "unite two dicts" solution).
class TwoWay:
    def __init__(self):
        self.d = {}
    def add(self, k, v):
        self.d[k] = v
        self.d[v] = k
    def remove(self, k):
        self.d.pop(self.d.pop(k))
    def get(self, k):
        return self.d[k]
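For example, the class above could be used like this (note that keys and values share one dict, so they must not collide with each other):

tw = TwoWay()
tw.add('foo', 1)
print(tw.get('foo'))   # 1
print(tw.get(1))       # 'foo'
tw.remove(1)           # removes both directions: 1 <-> 'foo'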
The other alternative is to make a new class which unites two dictionaries, one for each kind of lookup. That would most likely use up twice as much memory as a single dict.
Not really, since they would just be holding two references to the same data. In my mind, this is not a bad solution.
Have you considered an in-memory database lookup? I am not sure how it will compare in speed, but lookups in relational databases can be very fast.
Here is my own solution to this problem: http://github.com/spenthil/pymathmap/blob/master/pymathmap.py
The goal is to make it as transparent to the user as possible. The only introduced significant attribute is partner.
OneToOneDict subclasses dict - I know that isn't generally recommended, but I think I have the common use cases covered. The backend is pretty simple: it (dict1) keeps a weakref to a 'partner' OneToOneDict (dict2) which is its inverse. When dict1 is modified, dict2 is updated accordingly as well, and vice versa.
From the docstring:
>>> dict1 = OneToOneDict()
>>> dict2 = OneToOneDict()
>>> dict1.partner = dict2
>>> assert(dict1 is dict2.partner)
>>> assert(dict2 is dict1.partner)
>>> dict1['one'] = '1'
>>> dict2['2'] = '1'
>>> dict1['one'] = 'wow'
>>> assert(dict1 == dict((v,k) for k,v in dict2.items()))
>>> dict1['one'] = '1'
>>> assert(dict1 == dict((v,k) for k,v in dict2.items()))
>>> dict1.update({'three': '3', 'four': '4'})
>>> assert(dict1 == dict((v,k) for k,v in dict2.items()))
>>> dict3 = OneToOneDict({'4':'four'})
>>> assert(dict3.partner is None)
>>> assert(dict3 == {'4':'four'})
>>> dict1.partner = dict3
>>> assert(dict1.partner is not dict2)
>>> assert(dict2.partner is None)
>>> assert(dict1.partner is dict3)
>>> assert(dict3.partner is dict1)
>>> dict1.setdefault('five', '5')
>>> dict1['five']
'5'
>>> dict1.setdefault('five', '0')
>>> dict1['five']
'5'
When I get some free time, I intend to make a version that doesn't store things twice. No clue when that'll be though :)
Assuming that you have a key with which you look up a more complex mutable object, just make the key a property of that object. It does seem you might be better off thinking about the data model a bit.
"We can guarantee that either the key or the value (or both) will be an integer"
That's weirdly written -- "key or the value (or both)" doesn't feel right. Either they're all integers, or they're not all integers.
It sounds like they're all integers.
Or, it sounds like you're thinking of replacing the target object with an integer value so you only have one copy referenced by an integer. This is a false economy. Just keep the target object. All Python objects are -- in effect -- references. Very little actual copying gets done.
Let's pretend that you simply have two integers and can do a lookup on either one of the pair. One way to do this is to use heap queues or the bisect module to maintain ordered lists of integer key-value tuples.
See http://docs.python.org/library/heapq.html#module-heapq
See http://docs.python.org/library/bisect.html#module-bisect
You have one heapq of (key, value) tuples. Or, if your underlying object is more complex, (key, object) tuples.
You have another heapq of (value, key) tuples. Or, if your underlying object is more complex, (otherkey, object) tuples.
An "insert" becomes two inserts, one to each heapq-structured list.
A key lookup is in one queue; a value lookup is in the other queue. Do the lookups using bisect(list,item).
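A minimal sketch of that two-sorted-lists idea; the helper names here are made up for illustration:

import bisect

by_key = []    # sorted list of (key, value) tuples
by_value = []  # sorted list of (value, key) tuples

def insert(key, value):
    # An "insert" becomes two inserts, one per sorted list.
    bisect.insort(by_key, (key, value))
    bisect.insort(by_value, (value, key))

def lookup_by_key(key):
    i = bisect.bisect_left(by_key, (key,))
    if i < len(by_key) and by_key[i][0] == key:
        return by_key[i][1]
    raise KeyError(key)

def lookup_by_value(value):
    i = bisect.bisect_left(by_value, (value,))
    if i < len(by_value) and by_value[i][0] == value:
        return by_value[i][1]
    raise KeyError(value)

insert(1, 'foo')
insert(5, 'bar')
print(lookup_by_key(5))        # bar
print(lookup_by_value('foo'))  # 1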
It so happens that I find myself asking this question all the time (yesterday in particular). I agree with the approach of making two dictionaries. Do some benchmarking to see how much memory it's taking. I've never needed to make it mutable, but here's how I abstract it, if it's of any use:
class BiDict(list):
    def __init__(self, *pairs):
        super(list, self).__init__(pairs)
        self._first_access = {}
        self._second_access = {}
        for pair in pairs:
            self._first_access[pair[0]] = pair[1]
            self._second_access[pair[1]] = pair[0]
            self.append(pair)
    def _get_by_first(self, key):
        return self._first_access[key]
    def _get_by_second(self, key):
        return self._second_access[key]
    # You'll have to do some overrides to make it mutable
    # Methods such as append, __add__, __del__, __iadd__
    # to name a few will have to maintain ._*_access

class Constants(BiDict):
    # An implementation expecting an integer and a string
    get_by_name = BiDict._get_by_second
    get_by_number = BiDict._get_by_first

t = Constants(
    (1, 'foo'),
    (5, 'bar'),
    (8, 'baz'),
)
>>> print t.get_by_number(5)
bar
>>> print t.get_by_name('baz')
8
>>> print t
[(1, 'foo'), (5, 'bar'), (8, 'baz')]
How about using sqlite? Just create a :memory: database with a two-column table. You can even add indexes, then query by either one. Wrap it in a class if it's something you're going to use a lot.
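A minimal sketch of the in-memory sqlite idea; the table and column names are made up for illustration:

import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE mapping (key INTEGER PRIMARY KEY, value TEXT UNIQUE)')
conn.executemany('INSERT INTO mapping VALUES (?, ?)', [(1, 'foo'), (5, 'bar')])

# Both columns are indexed (PRIMARY KEY / UNIQUE), so either direction is a fast lookup.
print(conn.execute('SELECT value FROM mapping WHERE key = ?', (5,)).fetchone()[0])      # bar
print(conn.execute('SELECT key FROM mapping WHERE value = ?', ('foo',)).fetchone()[0])  # 1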
