Given a list iterator, you can find the original list via the pickle protocol:
>>> L = [1, 2, 3]
>>> Li = iter(L)
>>> Li.__reduce__()[1][0] is L
True
Given a dict iterator, how can you find the original dict? I could only find a hacky way using CPython implementation details (via the garbage collector):
>>> import gc
>>> def get_dict(dict_iterator):
...     [d] = gc.get_referents(dict_iterator)
...     return d
...
>>> d = {}
>>> get_dict(iter(d)) is d
True
There is no API to find the source iterable object from an iterator. This is intentional: iterators are seen as single-use objects; iterate and discard. As such, they often drop their iterable reference once they have reached the end; what's the point of keeping it if you can't get more elements anyway?
You see this in both the list and dict iterators: the hacks you found produce either empty objects or None once you are done iterating. List iterators switch to an empty list when pickled:
>>> l = [1]
>>> it = iter(l)
>>> it.__reduce__()[1][0] is l
True
>>> list(it) # exhaust the iterator
[1]
>>> it.__reduce__()[1][0] is l
False
>>> it.__reduce__()[1][0]
[]
and the dictionary iterator just sets the pointer to the original dictionary to null, so there are no referents left after that:
>>> import gc
>>> it = iter({'foo': 42})
>>> gc.get_referents(it)
[{'foo': 42}]
>>> list(it)
['foo']
>>> gc.get_referents(it)
[]
Both your hacks are just that: hacks. They are implementation dependent, and can and probably will change between Python releases. Currently, iter(dictionary).__reduce__() gets you the equivalent of iter(list(copy(self))) rather than access to the dictionary, because that's deemed a better implementation, but future versions might use something different altogether.
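You can see the current behaviour directly; the output below is from CPython 3.8, and the exact shape of the reduce tuple is version dependent:
>>> d = {'foo': 42}
>>> iter(d).__reduce__()
(<built-in function iter>, (['foo'],))
Note that the reduce tuple carries only a list of the remaining keys, not a reference to the dictionary itself.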
For dictionaries, the only other option currently available is to access the di_dict pointer in the dictiter struct, with ctypes:
import ctypes

class PyObject_HEAD(ctypes.Structure):
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
    ]

class dictiterobject(ctypes.Structure):
    _fields_ = [
        ("ob_base", PyObject_HEAD),
        ("di_dict", ctypes.py_object),
        ("di_used", ctypes.c_ssize_t),
        ("di_pos", ctypes.c_ssize_t),
        ("di_result", ctypes.py_object),  # always NULL for dictkeys_iter
        ("len", ctypes.c_ssize_t),
    ]

def dict_from_dictiter(it):
    di = dictiterobject.from_address(id(it))
    try:
        return di.di_dict
    except ValueError:  # null pointer, the dict reference was cleared
        return None
This is just as much of a hack as relying on gc.get_referents():
>>> d = {'foo': 42}
>>> it = iter(d)
>>> dict_from_dictiter(it)
{'foo': 42}
>>> dict_from_dictiter(it) is d
True
>>> list(it)
['foo']
>>> dict_from_dictiter(it) is None
True
For now, at least in CPython versions up to and including Python 3.8, there are no other options available.
Related
A common issue on SO is removing duplicates from a list of lists. Since lists are unhashable, set([[1, 2], [3, 4], [1, 2]]) throws TypeError: unhashable type: 'list'. Answers to this kind of question usually involve using tuples, which are immutable and therefore hashable.
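As a sketch of that standard approach: convert each inner list to a hashable tuple, deduplicate, and convert back (dict.fromkeys preserves insertion order on Python 3.7+, unlike a plain set):
>>> lists = [[1, 2], [3, 4], [1, 2]]
>>> [list(t) for t in dict.fromkeys(tuple(x) for x in lists)]
[[1, 2], [3, 4]]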
This answer to What makes lists unhashable? includes the following:
If the hash value changes after it gets stored at a particular slot in the dictionary, it will lead to an inconsistent dictionary. For example, initially the list would have gotten stored at location A, which was determined based on the hash value. If the hash value changes, and if we look for the list we might not find it at location A, or as per the new hash value, we might find some other object.
but I don't quite understand this, because other types that can be used as dictionary keys can be changed without issue:
>>> d = {}
>>> a = 1234
>>> d[a] = 'foo'
>>> a += 1
>>> d[a] = 'bar'
>>> d
{1234: 'foo', 1235: 'bar'}
It is obvious that if the value of a changes, it will hash to a different location in the dictionary. Why is the same assumption dangerous for a list? Why is the following an unsafe method for hashing a list, since it is what we all do when we need to hash one anyway?
>>> class my_list(list):
...     def __hash__(self):
...         return tuple(self).__hash__()
...
>>> a = my_list([1, 2])
>>> b = my_list([3, 4])
>>> c = my_list([1, 2])
>>> foo = [a, b, c]
>>> foo
[[1, 2], [3, 4], [1, 2]]
>>> set(foo)
set([[1, 2], [3, 4]])
It seems that this solves the set() problem, so why is this an issue? Lists may be mutable, but they are ordered, which seems like all that's needed for hashing.
You seem to confuse mutability with rebinding. a += 1 assigns a new object, the int object with the numeric value 1235, to a. Under the hood, for immutable objects like int, a += 1 is just the same as a = a + 1.
The original 1234 object is not mutated. The dictionary is still using an int object with numeric value 1234 as the key. The dictionary still holds a reference to that object, even though a now references a different object. The two references are independent.
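You can watch the rebinding happen with id(): the int gets a new identity after +=, while an in-place list mutation keeps the same object:
>>> a = 1234
>>> before = id(a)
>>> a += 1  # rebinds a to a new int object
>>> id(a) == before
False
>>> l = [1, 2]
>>> before = id(l)
>>> l += [3]  # list.__iadd__ mutates the list in place
>>> id(l) == before
True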
Try this instead:
>>> class BadKey:
...     def __init__(self, value):
...         self.value = value
...     def __eq__(self, other):
...         return other == self.value
...     def __hash__(self):
...         return hash(self.value)
...     def __repr__(self):
...         return 'BadKey({!r})'.format(self.value)
...
>>> badkey = BadKey('foo')
>>> d = {badkey: 42}
>>> badkey.value = 'bar'
>>> print(d)
{BadKey('bar'): 42}
Note that I altered the attribute value on the badkey instance. I didn't even touch the dictionary, yet the dictionary reflects the change; the key object itself was mutated, the same object that both the name badkey and the dictionary reference.
However, you now can't access that key anymore:
>>> badkey in d
False
>>> BadKey('bar') in d
False
>>> for key in d:
...     print(key, key in d)
...
BadKey('bar') False
I have thoroughly broken my dictionary, because I can no longer reliably locate the key.
That's because BadKey violates the principle of hashability: the hash value must remain stable. You can only guarantee that if you don't change anything about the object that the hash is based on. And the hash must be based on whatever makes two instances equal.
For lists, the contents make two list objects equal. And you can change those, so you can't produce a stable hash either.
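You can demonstrate this with the my_list class from the question above: mutating the list changes its hash, and set lookups start failing just as they did with BadKey:
>>> a = my_list([1, 2])
>>> s = set([a])
>>> a in s
True
>>> a.append(3)  # the hash is now based on (1, 2, 3)
>>> a in s
False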
Why can't I do the following:
a = (1,2,3)
dict[a] = 'hi'
TypeError: 'type' object does not support item assignment
It can be done. The problem is that you're trying to assign an item on the dict type itself.
>>> a = (1,2,3)
>>> d = {}
>>> d[a] = "hi"
>>> d
{(1, 2, 3): 'hi'}
As @mgilson put it in a comment: "Tuples can be hashed as long as all of their elements can be hashed."
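For example, a tuple that contains a list is itself unhashable:
>>> d = {}
>>> d[(1, [2, 3])] = 'nope'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'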
(Note that you should never name your dictionaries dict, or lists list, etc. This shadows the built-in name, and the built-ins are often handy to have around, e.g. dict(zip(keys, values)).)
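For example, once the name is shadowed, the built-in becomes unreachable until you delete the shadowing binding:
>>> dict = {}
>>> dict(zip(['a', 'b'], [1, 2]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'dict' object is not callable
>>> del dict  # the built-in is visible again
>>> dict(zip(['a', 'b'], [1, 2]))
{'a': 1, 'b': 2}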
You can use a tuple as a key (as long as all of its items are hashable):
>>> a = (1,2,3)
>>> b = {a:'hi'}
>>> b[(1,2,3)]
'hi'
>>>
Your problem is that you are trying to index the built-in function dict:
>>> dict
<type 'dict'>
>>>
dict is a type. You want to make an instance of that type.
d = {}
a = (1, 2, 3)
d[a] = 'hi'
Instead of this:
a = {"foo": None, "bar": None}
Is there a way to write this?
b = {"foo", "bar"}
And still let b have constant time access (i.e. not a Python set, which cannot be keyed into)?
Actually, in Python 2.7 and 3.x, this really does work:
>>> b = {"foo", "bar"}
>>> b
set(['foo', 'bar'])
You can't use [] access on a set ("key into"), but you can test for inclusion:
>>> 'x' in b
False
>>> 'foo' in b
True
Sets are as close to value-less dictionaries as it gets. They have average-case constant-time access, require hashable objects (i.e. no storing lists or dicts in sets), and even support their own comprehension syntax:
{x**2 for x in xrange(100)}
Yes, sets:
set() -> new empty set object
set(iterable) -> new set object
Build an unordered collection of unique elements.
Related: How is set() implemented?
Time complexity: https://wiki.python.org/moin/TimeComplexity#set
In order to "key" into a set in constant time use in:
>>> s = set(['foo', 'bar', 'baz'])
>>> 'foo' in s
True
>>> 'fork' in s
False
I will let the following terminal session speak for itself:
>>> import shelve
>>> s = shelve.open('TestShelve')
>>> from collections import deque
>>> s['store'] = deque()
>>> d = s['store']
>>> print s['store']
deque([])
>>> print d
deque([])
>>> s['store'].appendleft('Teststr')
>>> d.appendleft('Teststr')
>>> print s['store']
deque([])
>>> print d
deque(['Teststr'])
Shouldn't d and s['store'] point to the same object? Why does appendleft work on d but not on s['store']?
shelve is pickling (serializing) the object. Of necessity, this makes a copy. So the objects you get back from shelve won't have the same identity as the ones you put in, though they will be equivalent.
If it's important, you could write a deque subclass that automatically re-shelves itself whenever it's modified, although this would probably have poor performance in many use cases.
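A minimal sketch of such a subclass, assuming you are willing to pay a full re-pickle per mutation; the shelf and key parameters are names invented here, not part of the shelve API:
import shelve
from collections import deque

class ShelvedDeque(deque):
    # A deque that writes itself back to its shelf after each mutation.
    def __init__(self, shelf, key, iterable=()):
        super().__init__(iterable)
        self._shelf = shelf
        self._key = key
        self._sync()

    def _sync(self):
        # Store a plain deque copy so the shelf reference is not pickled.
        self._shelf[self._key] = deque(self)

    def append(self, item):
        super().append(item)
        self._sync()

    def appendleft(self, item):
        super().appendleft(item)
        self._sync()
With s = shelve.open('TestShelve') and d = ShelvedDeque(s, 'store'), a d.appendleft('Teststr') is immediately visible in s['store']; extending the same pattern to pop, extend and the other mutating methods is mechanical.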
It turns out they're not the same object, so operations you perform on one won't be reflected in the other:
>>> import shelve
>>> s = shelve.open('TestShelve')
>>> from collections import deque
>>> s['store'] = deque()
>>> d = s['store']
>>> id(s['store'])
27439296
>>> id(d)
27439184
To modify items the way your code does, you need to open the shelf with writeback=True:
s = shelve.open('TestShelve', writeback=True)
See the documentation:
If the writeback parameter is True, the object will hold a cache of
all entries accessed and write them back to the dict at sync and close
times. This allows natural operations on mutable entries, but can
consume much more memory and make sync and close take a long time.
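With writeback=True the original session behaves as expected, because lookups return the cached object and the cache is flushed at sync and close (a sketch reusing the session above):
>>> s = shelve.open('TestShelve', writeback=True)
>>> s['store'] = deque()
>>> s['store'].appendleft('Teststr')
>>> print s['store']
deque(['Teststr'])
>>> s.close()  # cached entries are written back to disk here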
You can also do it with writeback=False, but then you need to write the code exactly as in the example from the documentation:
# having opened d without writeback=True, you need to code carefully:
temp = d['xx'] # extracts the copy
temp.append(5) # mutates the copy
d['xx'] = temp # stores the copy right back, to persist it
To define a singleton (one-element tuple) in Python, one would use singleton = ('singleton'). A Python dictionary can use a tuple as a key, as in
{('one', 'two'): 5}
But is it possible to do
{('singleton'): 5}
somehow?
Yes, you can do this, but not with ('singleton'). You've got to use ('singleton',).
The reason for this is that Python interprets single parentheses around a single item as merely the item itself. Adding a trailing comma forces the tuple interpretation.
>>> d = {}
>>> d[('Thing')] = "one"
>>> d.keys()
['Thing']
>>> d[('Thing',)] = "another"
>>> d
{'Thing': 'one', ('Thing',): 'another'}
To make it work, signal to Python that 'singleton' is a tuple:
>>> a = {}
>>> a[('singleton',)] = 5
>>> a
{('singleton',): 5}