Bizarre behavior trying to store a deque in a Shelve - python

I will let the following terminal session speak for itself:
>>> import shelve
>>> s = shelve.open('TestShelve')
>>> from collections import deque
>>> s['store'] = deque()
>>> d = s['store']
>>> print s['store']
deque([])
>>> print d
deque([])
>>> s['store'].appendleft('Teststr')
>>> d.appendleft('Teststr')
>>> print s['store']
deque([])
>>> print d
deque(['Teststr'])
Shouldn't d and s['store'] point to the same object? Why does appendleft work on d but not on s['store']?

shelve is pickling (serializing) the object. Of necessity, this makes a copy, so the objects you get back from shelve won't have the same identity as the ones you put in, though they will be equivalent.
If it's important, you could write a deque subclass that automatically re-shelves itself whenever it's modified, although this would probably have poor performance in many use cases.
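A minimal sketch of that idea (the class name, the shelf/key constructor arguments, and the decision to write back a plain-deque copy are all made up for illustration; only two mutating methods are shown):
from collections import deque
class ShelvedDeque(deque):
    def __init__(self, shelf, key, iterable=()):
        super(ShelvedDeque, self).__init__(iterable)
        self._shelf = shelf
        self._key = key
    def _persist(self):
        # write back a plain deque copy so the shelf handle itself never gets pickled
        self._shelf[self._key] = deque(self)
    def append(self, item):
        super(ShelvedDeque, self).append(item)
        self._persist()
    def appendleft(self, item):
        super(ShelvedDeque, self).appendleft(item)
        self._persist()
Every other mutating method (pop, extend, and so on) would need the same treatment, and every mutation triggers a full re-pickle, which is where the performance cost comes from.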

It turns out they're not the same object, so operations you perform on one won't be reflected in the other:
>>> import shelve
>>> s = shelve.open('TestShelve')
>>> from collections import deque
>>> s['store'] = deque()
>>> d = s['store']
>>> id(s['store'])
27439296
>>> id(d)
27439184
To modify items as you coded, you need to pass the parameter writeback=True:
s = shelve.open('TestShelve', writeback=True)
See the documentation:
If the writeback parameter is True, the object will hold a cache of
all entries accessed and write them back to the dict at sync and close
times. This allows natural operations on mutable entries, but can
consume much more memory and make sync and close take a long time.
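With writeback=True, the session from the question behaves as expected, because s['store'] keeps returning the same cached object until the shelf is synced or closed:
>>> s = shelve.open('TestShelve', writeback=True)
>>> s['store'] = deque()
>>> s['store'].appendleft('Teststr')
>>> print s['store']
deque(['Teststr'])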
You can also do it with writeback=False, but then you need to write the code exactly as in the example from the documentation:
# having opened d without writeback=True, you need to code carefully:
temp = d['xx'] # extracts the copy
temp.append(5) # mutates the copy
d['xx'] = temp # stores the copy right back, to persist it
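Applied to the names from the question, the same pattern looks like this:
temp = s['store']            # extracts a freshly unpickled copy of the deque
temp.appendleft('Teststr')   # mutates the copy
s['store'] = temp            # stores the copy right back, to persist it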


Given a dict iterator, get the dict

Given a list iterator, you can find the original list via the pickle protocol:
>>> L = [1, 2, 3]
>>> Li = iter(L)
>>> Li.__reduce__()[1][0] is L
True
Given a dict iterator, how can you find the original dict? I could only find a hacky way using CPython implementation details (via the garbage collector):
>>> import gc
>>> def get_dict(dict_iterator):
...     [d] = gc.get_referents(dict_iterator)
...     return d
...
>>> d = {}
>>> get_dict(iter(d)) is d
True
There is no API to find the source iterable object from an iterator. This is intentional: iterators are seen as single-use objects; iterate and discard. As such, they often drop their iterable reference once they have reached the end; what's the point of keeping it if you can't get more elements, anyway?
You see this in both the list and dict iterators: the hacks you found either produce empty objects or None once you are done iterating. List iterators use an empty list when pickled:
>>> l = [1]
>>> it = iter(l)
>>> it.__reduce__()[1][0] is l
True
>>> list(it) # exhaust the iterator
[1]
>>> it.__reduce__()[1][0] is l
False
>>> it.__reduce__()[1][0]
[]
and the dictionary iterator just sets the pointer to the original dictionary to null, so there are no referents left after that:
>>> import gc
>>> it = iter({'foo': 42})
>>> gc.get_referents(it)
[{'foo': 42}]
>>> list(it)
['foo']
>>> gc.get_referents(it)
[]
Both your hacks are just that: hacks. They are implementation dependent and can, and probably will, change between Python releases. Currently, calling iter(dictionary).__reduce__() gets you the equivalent of iter(list(copy(self))) rather than access to the dictionary, because that's deemed a better implementation, but future versions might use something different altogether.
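You can see that keys-list copy directly; the exact output below is from CPython 3.x and is itself an implementation detail:
>>> d = {'foo': 42}
>>> iter(d).__reduce__()
(<built-in function iter>, (['foo'],))
>>> iter(d).__reduce__()[1][0] is d
False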
For dictionaries, the only other option currently available is to access the di_dict pointer in the dictiter struct, with ctypes:
import ctypes

class PyObject_HEAD(ctypes.Structure):
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
    ]

class dictiterobject(ctypes.Structure):
    _fields_ = [
        ("ob_base", PyObject_HEAD),
        ("di_dict", ctypes.py_object),
        ("di_used", ctypes.c_ssize_t),
        ("di_pos", ctypes.c_ssize_t),
        ("di_result", ctypes.py_object),  # always NULL for dictkeys_iter
        ("len", ctypes.c_ssize_t),
    ]

def dict_from_dictiter(it):
    di = dictiterobject.from_address(id(it))
    try:
        return di.di_dict
    except ValueError:  # null pointer
        return None
This is just as much of a hack as relying on gc.get_referents():
>>> d = {'foo': 42}
>>> it = iter(d)
>>> dict_from_dictiter(it)
{'foo': 42}
>>> dict_from_dictiter(it) is d
True
>>> list(it)
['foo']
>>> dict_from_dictiter(it) is None
True
For now, at least in CPython versions up to and including Python 3.8, there are no other options available.

Alternative to using deepcopy for nested dictionaries?

I have a nested dict like this, but much larger:
d = {'a': {'b': 'c'}, 'd': {'e': {'f':2}}}
I've written a function which takes a dictionary and a path of keys as input and returns the value associated with that path.
>>> p = 'd/e'
>>> get_from_path(d, p)
{'f': 2}
Once I get the nested dictionary, I will need to modify it; however, d cannot be modified. Do I need to use deepcopy, or is there a more efficient solution that doesn't require constantly making copies of the dictionary?
Depending on your use case, one approach to avoid making changes to an existing dictionary is to wrap it in a collections.ChainMap:
>>> import collections
>>> # here's a dictionary we want to avoid dirtying
>>> d = {i: i for i in range(10)}
>>> # wrap into a chain map and make changes there
>>> c = collections.ChainMap({}, d)
Now we can add new keys and values to c without corresponding changes happening in d:
>>> c[0] = -100
>>> print(c[0], d[0])
-100 0
Whether this solution is appropriate depends on your use case. In particular, the ChainMap will not behave like a regular map when it comes to some things, like deleting keys:
>>> del c[0]
>>> print(c[0])
0
and it will still allow you to modify values in place:
>>> d = dict(a=[])
>>> collections.ChainMap({}, d)["a"].append(1)
which will alter the list in d.
However, if you are merely wishing to take your embedded dictionary and pop some new keys and values on it, then ChainMap may be appropriate.
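For the nested case from the question, the same trick can be applied to the sub-dictionary you pull out (get_from_path here stands in for the question's own helper, whose implementation isn't shown):
>>> import collections
>>> d = {'a': {'b': 'c'}, 'd': {'e': {'f': 2}}}
>>> sub = collections.ChainMap({}, d['d']['e'])  # i.e. wrap what get_from_path(d, 'd/e') returns
>>> sub['g'] = 3   # new keys land in the ChainMap's own front dict
>>> sub['f'], d['d']['e']
(2, {'f': 2})
New keys and overrides stay in the empty dict at the front of the chain, while the original nested dictionary is left untouched.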

is there a way to instantiate variables from iterated output in python?

Say I have a list
my_list = ['a','b','c']
and I have a set of values
my_values = [1,2,3]
Is there a way to iterate through my list and bind each name in my_list to the corresponding value in my_values?
for i in range(len(my_list)):
    ## an operation that instantiates my_list[i] as the variable a = my_values[i]
    ...
>>> print a
1
I just want to do this without copying the text of the file that holds the program to a new file and inserting the new lines as strings where they need to go in the program. I'd like to skip the create, rename, destroy file operations if possible, as I'm dealing with pretty large sets of stuff.
This is probably hackery that you shouldn't do, but since the globals() dict has all the global variables in it, you can add them to the global dict for the module:
>>> my_list = ['a','b','c']
>>> my_values = [1,2,3]
>>> for k, v in zip(my_list, my_values):
...     globals()[k] = v
...
>>> a
1
>>> b
2
>>> c
3
But caveat emptor: it's best not to mix your namespace with your variable values; I don't see anything good coming of it.
I recommend using a normal dict to store your values instead of loading them into the global or local namespace.
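With the question's lists, the dict version is a one-liner:
>>> my_list = ['a', 'b', 'c']
>>> my_values = [1, 2, 3]
>>> values = dict(zip(my_list, my_values))
>>> values['a']
1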

Is it possible to use python mock for complex multidimensional dictionary input?

I have started reading about the mock library but haven't quite figured out how to use it as an input value to test my sync function.
My sync function takes a multidimensional dictionary from an external source, then parses it and translates it into various Django database records.
I have bravely tried:
sync(MagicMock())
But, as expected, it failed bluntly due to the type of the values the mock returns.
So I thought I'd better manually set some return values, and tried the following experiment:
>>> m = MagicMock()
>>> m['categories'] = [1,2,3]
>>> m['categories'].__class__
<class 'mock.MagicMock'>
>>> m['categories'][0]
<MagicMock name='mock.__getitem__().__getitem__()' id='4557691280'>
I also tried return_value:
>>> m = MagicMock()
>>> m['categories'].return_value = [1,2]
>>> m['categories']
<MagicMock name='mock.__getitem__()' id='4557733712'>
But the code inside the sync function is expecting integer values from the dictionary...
You can do this with PropertyMock:
>>> m = MagicMock()
>>> p = PropertyMock(return_value=3)
>>> type(m).foo = p
>>> m.foo
3
You can do this with patch.dict:
from mock import patch

m = dict()
with patch.dict(m, {'categories': [1, 2, 3]}):
    print(m['categories'])  # inside the patch, the key exists
print(m)                    # outside, the dict is restored
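If you do want a MagicMock rather than a plain dict, another common pattern (not shown in the answers above; the data name is made up) is to route its __getitem__ to a real dict so subscription returns real values:
from mock import MagicMock

data = {'categories': [1, 2, 3]}
m = MagicMock()
m.__getitem__.side_effect = data.__getitem__
m['categories']     # returns the real list [1, 2, 3]
m['categories'][0]  # returns 1, an actual int for sync() to consume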

Is there a Python dict without values?

Instead of this:
a = {"foo": None, "bar": None}
Is there a way to write this?
b = {"foo", "bar"}
And still let b have constant time access (i.e. not a Python set, which cannot be keyed into)?
Actually, in Python 2.7 and 3.x, this really does work:
>>> b = {"foo", "bar"}
>>> b
set(['foo', 'bar'])
You can't use [] access on a set ("key into"), but you can test for inclusion:
>>> 'x' in b
False
>>> 'foo' in b
True
Sets are as close to value-less dictionaries as it gets. They have average-case constant-time access, require hashable objects (i.e. no storing lists or dicts in sets), and even support their own comprehension syntax:
{x**2 for x in xrange(100)}
Yes, sets:
set() -> new empty set object
set(iterable) -> new set object
Build an unordered collection of unique elements.
Related: How is set() implemented?
Time complexity: https://wiki.python.org/moin/TimeComplexity#set
In order to "key" into a set in constant time, use in:
>>> s = set(['foo', 'bar', 'baz'])
>>> 'foo' in s
True
>>> 'fork' in s
False
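If you really do need [] access rather than just membership tests, dict.fromkeys builds the value-less dict from the question without spelling out the Nones:
>>> b = dict.fromkeys(["foo", "bar"])
>>> b["foo"] is None
True
>>> "foo" in b
True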
