I have a nested dict like this, but much larger:
d = {'a': {'b': 'c'}, 'd': {'e': {'f':2}}}
I've written a function which takes a dictionary and a path of keys as input and returns the value associated with that path.
>>> p = 'd/e'
>>> get_from_path(d, p)
{'f': 2}
Once I get the nested dictionary, I will need to modify it; however, d cannot be modified. Do I need to use deepcopy, or is there a more efficient solution that doesn't require constantly making copies of the dictionary?
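The question doesn't show get_from_path, so here is a hypothetical minimal version, assuming keys are separated by "/". It returns the nested dict itself rather than a copy, which is exactly why modifying the result would also modify d.

```python
from functools import reduce

def get_from_path(d, path, sep="/"):
    """Walk a nested dict along a sep-separated key path (hypothetical sketch)."""
    # each step indexes one level deeper; no copies are made
    return reduce(lambda node, key: node[key], path.split(sep), d)

d = {'a': {'b': 'c'}, 'd': {'e': {'f': 2}}}
print(get_from_path(d, 'd/e'))  # {'f': 2}
```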
Depending on your use case, one approach to avoid making changes to an existing dictionary is to wrap it in a collections.ChainMap:
>>> import collections
>>> # here's a dictionary we want to avoid dirtying
>>> d = {i: i for i in range(10)}
>>> # wrap into a chain map and make changes there
>>> c = collections.ChainMap({}, d)
Now we can add new keys and values to c without corresponding changes happening in d
>>> c[0] = -100
>>> print(c[0], d[0])
-100 0
Whether this solution is appropriate depends on your use case. In particular, the ChainMap will:
not behave like a regular map when it comes to some things, like deleting keys (del only removes the key from the first mapping, so the original value from d shows through again):
>>> del c[0]
>>> print(c[0])
0
still allow you to modify mutable values in place:
>>> d = dict(a=[])
>>> collections.ChainMap({}, d)["a"].append(1)
which will alter the list in d.
However, if you are merely wishing to take your embedded dictionary and pop some new keys and values on it, then ChainMap may be appropriate.
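Applied to the original question, a sketch like the following wraps only the nested dict you retrieved, not the whole of d. To stay self-contained it indexes d['d']['e'] directly instead of calling get_from_path. Writes land in the empty overlay dict, so the nested dict (and therefore d) stays untouched, as long as you only assign keys and don't mutate values in place.

```python
import collections

d = {'a': {'b': 'c'}, 'd': {'e': {'f': 2}}}

# Overlay a fresh dict on top of the nested dict; writes go to the overlay only.
nested = d['d']['e']
view = collections.ChainMap({}, nested)
view['f'] = 3        # shadows the original value instead of replacing it
view['g'] = 'new'    # new key lives only in the overlay

print(view['f'], nested['f'])  # 3 2
```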
I came across this behavior that surprised me in Python 2.6 and 3.2:
>>> xs = dict.fromkeys(range(2), [])
>>> xs
{0: [], 1: []}
>>> xs[0].append(1)
>>> xs
{0: [1], 1: [1]}
However, dict comprehensions in 3.2 show a more polite demeanor:
>>> xs = {i:[] for i in range(2)}
>>> xs
{0: [], 1: []}
>>> xs[0].append(1)
>>> xs
{0: [1], 1: []}
Why does fromkeys behave like that?
Your Python 2.6 example is equivalent to the following, which may help to clarify:
>>> a = []
>>> xs = dict.fromkeys(range(2), a)
Each entry in the resulting dictionary will have a reference to the same object. The effects of mutating that object will be visible through every dict entry, as you've seen, because it's one object.
>>> xs[0] is a and xs[1] is a
True
Use a dict comprehension, or if you're stuck on Python 2.6 or older and you don't have dictionary comprehensions, you can get the dict comprehension behavior by using dict() with a generator expression:
xs = dict((i, []) for i in range(2))
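The difference can be checked directly with an identity test: fromkeys stores one list under every key, while the generator-expression form creates a separate list per key.

```python
# fromkeys stores the *same* list object under every key
shared = dict.fromkeys(range(2), [])
print(shared[0] is shared[1])  # True

# the generator-expression form (like a dict comprehension) builds a new list per key
separate = dict((i, []) for i in range(2))
print(separate[0] is separate[1])  # False

separate[0].append(1)
print(separate)  # {0: [1], 1: []}
```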
In the first version, you use the same empty list object as the value for both keys, so if you change one, you change the other, too.
Look at this:
>>> empty = []
>>> d = dict.fromkeys(range(2), empty)
>>> d
{0: [], 1: []}
>>> empty.append(1) # same as d[0].append(1) because d[0] references empty!
>>> d
{0: [1], 1: [1]}
In the second version, a new empty list object is created in every iteration of the dict comprehension, so both are independent from each other.
As to "why" fromkeys() works like that - well, it would be surprising if it didn't work like that. fromkeys(iterable, value) constructs a new dict with keys from iterable that all have the value value. If that value is a mutable object, and you change that object, what else could you reasonably expect to happen?
To answer the actual question being asked: fromkeys behaves like that because there is no other reasonable choice. It is not reasonable (or even possible) to have fromkeys decide whether or not your argument is mutable and make new copies every time. In some cases it doesn't make sense, and in others it's just impossible.
The second argument you pass in is therefore just a reference, and is copied as such. An assignment of [] in Python means "a single reference to a new list", not "make a new list every time I access this variable". The alternative would be to pass in a function that generates new instances, which is the functionality that dict comprehensions supply for you.
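To make the "pass in a function" alternative concrete, here is a hypothetical helper (fromkeys_factory is not part of the dict API) that calls a factory once per key. Under the hood it is just a dict comprehension.

```python
def fromkeys_factory(keys, factory):
    """Like dict.fromkeys, but calls factory() once per key (hypothetical helper)."""
    return {key: factory() for key in keys}

xs = fromkeys_factory(range(2), list)
xs[0].append(1)
print(xs)  # {0: [1], 1: []}
```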
Here are some options for creating multiple actual copies of a mutable container:
As you mention in the question, dict comprehensions allow you to evaluate an arbitrary expression for each element:
d = {k: [] for k in range(2)}
The important thing here is that this is equivalent to putting the assignment d[k] = [] in a for loop. Each iteration creates a new list and assigns it to a key.
Use the form of the dict constructor suggested by @Andrew Clark:
d = dict((k, []) for k in range(2))
This creates a generator which again makes the assignment of a new list to each key-value pair when it is executed.
Use a collections.defaultdict instead of a regular dict:
import collections
d = collections.defaultdict(list)
This option is a little different from the others. Instead of creating the new list references up front, defaultdict will call list every time you access a key that's not already there. You can therefore add the keys as lazily as you want, which can be very convenient sometimes:
for k in range(2):
    d[k].append(42)
Since you've set up the factory for new elements, this will actually behave exactly as you expected fromkeys to behave in the original question.
Use dict.setdefault when you access potentially new keys. This does something similar to what defaultdict does, but it has the advantage of being more controlled, in the sense that only the access you want to create new keys actually creates them:
d = {}
for k in range(2):
    d.setdefault(k, []).append(42)
The disadvantage is that a new empty list object gets created every time you call setdefault, even if it never gets assigned to a value. This is not a huge problem, but it could add up if you call it frequently and/or your container is not as simple as list.
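The cost described above can be observed with a factory that logs its calls. In this sketch (make_list is a made-up helper), setdefault builds its default argument on every call because Python evaluates arguments before the call, while defaultdict runs its factory only for a missing key.

```python
import collections

calls = []  # records one entry per factory call

def make_list():
    """Factory that logs each call before returning a new empty list."""
    calls.append(1)
    return []

# setdefault: the default argument is built before the call, on every call
d = {}
for k in [0, 0, 0]:
    d.setdefault(k, make_list()).append(42)
setdefault_calls = len(calls)   # 3: two of those lists were thrown away

# defaultdict: the factory runs only when the key is actually missing
calls.clear()
dd = collections.defaultdict(make_list)
for k in [0, 0, 0]:
    dd[k].append(42)
defaultdict_calls = len(calls)  # 1

print(setdefault_calls, defaultdict_calls)  # 3 1
```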
I wonder how to accomplish something like this in python:
d = {'a': 3}
a_value_ref = d['a']
a_value_ref = 6
assert d['a'] == 6
I want to first calculate a reference to a specific value in a multilevel dict and then modify the value through that reference. Is it possible? It's easy in C/C++. Thank you for your help.
You can't do that: assigning to a_value_ref just rebinds that name to a different object, and since integers are immutable there is no way to change the value in place.
You could do it by wrapping the integer in a single-element list, which is mutable:
d = {'a': [3]}
a_value_ref = d['a']
a_value_ref[0] = 6
assert d['a'] == [6]
This is not how Python works, and workarounds are not good solutions. I have changed the way I access the dict's keys, and now I update values using the dict[key] = value syntax.
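For a multilevel dict, the dict[key] = value approach still works if you "calculate the reference" as the parent dict plus the final key. A minimal sketch, assuming keys are given as a "/"-separated path (set_at_path is a hypothetical helper):

```python
def set_at_path(d, path, value, sep="/"):
    """Assign value at a sep-separated key path inside a nested dict (hypothetical helper)."""
    *parents, last = path.split(sep)
    node = d
    for key in parents:
        node = node[key]     # walk down to the dict that holds the final key
    node[last] = value       # assigning via the parent dict updates d itself

d = {'a': {'b': {'c': 3}}}
set_at_path(d, 'a/b/c', 6)
assert d['a']['b']['c'] == 6
```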
I am using tuples as the key for a dictionary I created. For example:
example_dict = {}
example_dict[("A", "B")] = "1"
Later when I wish to modify the value of an entry in the dictionary I don't currently have control over the order of the tuple. For example:
("B", "A") may be the case, instead of ("A", "B")
I'm aware that these tuples are not equal from a simple == comparison that I tried in the python shell.
What I am wondering is how I could work around this? How could I make the following not produce a KeyError:
print(example_dict["B", "A"])
Is there a way to consistently order the elements of a tuple? Is there a way to ignore order completely? Any other workarounds? I'm aware I could just include all arrangements of the tuples as keys in the dictionary, and then collate the values of the different permutations later. I strongly want to avoid doing this as that only adds difficulty and complexity to the problem.
The usual ways are to either sort the keys:
example_dict[tuple(sorted(key_tuple))] = "1"
use frozensets as keys (if there won't be duplicate elements in the tuples):
example_dict[frozenset(key_tuple)] = "1"
or use frozensets of (item, count) tuples as keys (if there can be duplicate elements in the tuples), using Counter from the collections module:
example_dict[frozenset(Counter(key_tuple).items())] = "1"
Whichever option you choose, you'll have to apply the same transformation when you look up values.
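One way to guarantee the same transformation is used for both storing and lookup is to route every key through one function. A sketch using the Counter variant (order_free_key is a hypothetical name):

```python
from collections import Counter

def order_free_key(key_tuple):
    """Map a tuple to a hashable key that ignores element order (hypothetical helper)."""
    # Counter keeps multiplicities, so ('A', 'A', 'B') and ('A', 'B') stay distinct
    return frozenset(Counter(key_tuple).items())

example_dict = {}
example_dict[order_free_key(("A", "B"))] = "1"
print(example_dict[order_free_key(("B", "A"))])  # 1
```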
You want your dictionary keys to be "sets" (a set is a collection for which an item is either in or not in the set, but that has no concept of order). Luckily python has what you need. Specifically because you need something hashable you want to use frozenset.
>>> example_dict = {}
>>> example_dict[frozenset(("A", "B"))] = "1"
>>> example_dict[frozenset(("B", "A"))]
'1'
>>> example_dict[frozenset(("A", "B"))]
'1'
Instead of using a tuple, use a frozenset. A frozenset is just a constant set, just as a tuple can be thought of as a constant list.
Here's an example (from Python 3, but it will work in Python 2 as well):
>>> d = {}
>>> k1 = frozenset((1, 2))
>>> k2 = frozenset((2, 1))
>>> k1
frozenset({1, 2})
>>> k2
frozenset({1, 2})
>>> k1 == k2
True
>>> d[k1] = 123
>>> d[k2]
123
In trying to use a list comprehension to make a list given a conditional, I see the following:
In [1]: mydicts = [{'foo':'val1'},{'foo':''}]
In [2]: mylist = [d for d in mydicts if d['foo']]
In [3]: mylist
Out[3]: [{'foo': 'val1'}]
In [4]: mydicts[1]['foo'] = 'val2'
In [5]: mydicts
Out[5]: [{'foo': 'val1'}, {'foo': 'val2'}]
In [6]: mylist
Out[6]: [{'foo': 'val1'}]
I've been reading the docs to try and understand this but have come up with nothing so far, so I'll ask my question here: why is it that mylist never includes {'foo': 'val2'} even though the reference in the list comprehension points to mydicts, which by In [6] contains {'foo': 'val2'}? Is this because Python eagerly evaluates list comprehensions? Or is the lazy/eager dichotomy totally irrelevant to this?
There's no lazy evaluation of lists in Python. List comprehensions simply create a new list. If you want "lazy" evaluation, use a generator expression instead.
my_generator_expression = (d for d in mydicts if d['foo']) # note parentheses
mydicts[1]['foo'] = 'val2'
print(my_generator_expression) # >>> <generator object <genexpr> at 0x00000000>
for d in my_generator_expression:
    print(d)  # >>> {'foo': 'val1'}
              # >>> {'foo': 'val2'}
Note that generators differ from lists in several important ways. Perhaps the most notable is that once you iterate over them, they are exhausted, so they're best to use if you only need the data they contain once.
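A short sketch of both points: the generator sees the mutation because its filter runs during iteration, and a second pass over it yields nothing.

```python
mydicts = [{'foo': 'val1'}, {'foo': ''}]
gen = (d for d in mydicts if d['foo'])
mydicts[1]['foo'] = 'val2'   # change happens before iteration, so the generator sees it

first_pass = list(gen)
second_pass = list(gen)      # the generator is already exhausted
print(first_pass)   # [{'foo': 'val1'}, {'foo': 'val2'}]
print(second_pass)  # []
```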
I think you're a bit confused about what list comprehensions do.
When you do this:
[d for d in mydicts if d['foo']]
That evaluates to a new list. So, when you do this:
mylist = [d for d in mydicts if d['foo']]
You're assigning that list as the value of mylist. You can see this very easily:
assert type(mylist) == list
You're not assigning "a list comprehension" that gets reevaluated every time to mylist. There are no magic values in Python that get reevaluated every time. (You can fake them by, e.g., creating a class with a @property, but that's not really an exception; it's the expression myobj.myprop that's being reevaluated, not myprop itself.)
In fact, mylist = [d for d in mydicts if d['foo']] is basically the same as mylist = [1, 2, 3].* In both cases, you're creating a new list, and assigning it to mylist. You wouldn't expect the second one to re-evaluate [1, 2, 3] each time (otherwise, doing mylist[0] = 0 wouldn't do much good, because as soon as you try to view mylist you'd be getting a new, pristine list!). The same is true here.
* In Python 3.x, they aren't just basically the same; they're both just different types of list displays. In 2.x, it's a bit more murky, and they just happen to both evaluate to new list objects.
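For illustration, here is what the @property approach mentioned above might look like. The class Filtered and its attribute matching are made-up names; each access to view.matching re-runs the comprehension. A plain function that returns the comprehension would work just as well.

```python
class Filtered:
    """Recomputes the filtered list each time .matching is accessed (illustrative class)."""

    def __init__(self, dicts):
        self.dicts = dicts

    @property
    def matching(self):
        # the comprehension is re-evaluated on every attribute access
        return [d for d in self.dicts if d['foo']]

mydicts = [{'foo': 'val1'}, {'foo': ''}]
view = Filtered(mydicts)
print(view.matching)   # [{'foo': 'val1'}]
mydicts[1]['foo'] = 'val2'
print(view.matching)   # [{'foo': 'val1'}, {'foo': 'val2'}]
```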
mylist contains the result of a previous list comprehension evaluation; it won't magically update just because you update a variable that was used for its computation.
I'm going through a whole bunch of tuples with a many-to-many correlation, and I want to make a dictionary where each b of (a,b) has a list of all the a's that correspond to a b. It seems awkward to test for a list at key b in the dictionary, then look for an a, then append a if it's not already there, every single time through the tuple digesting loop; but I haven't found a better way yet. Does one exist? Is there some other way to do this that's a lot prettier?
See the docs for the setdefault() method:
setdefault(key[, default])
If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.
You can use this as a single call that will get b if it exists, or set b to an empty list if it doesn't already exist - and either way, return b:
>>> d = {}
>>> key = 'b'
>>> val = 'a'
>>> print(d)
{}
>>> d.setdefault(key, []).append(val)
>>> print(d)
{'b': ['a']}
>>> d.setdefault(key, []).append('zee')
>>> print(d)
{'b': ['a', 'zee']}
Combine this with a simple "not in" check and you've done what you're after in three lines:
>>> val = 'c'
>>> b = d.setdefault('b', [])
>>> if val not in b:
...     b.append(val)
...
>>> print(d)
{'b': ['a', 'zee', 'c']}
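Put together as the tuple-digesting loop from the question, a sketch might look like this (the pairs list is made-up sample data). Note that the "not in" check scans the list, so it is O(len(bucket)) per tuple.

```python
pairs = [('x', 1), ('y', 1), ('x', 1), ('z', 2)]   # hypothetical (a, b) tuples

d = {}
for a, b in pairs:
    bucket = d.setdefault(b, [])   # existing list for b, or a new empty one
    if a not in bucket:            # skip duplicates, keep first-seen order
        bucket.append(a)

print(d)  # {1: ['x', 'y'], 2: ['z']}
```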
Assuming you're not really tied to lists, defaultdict and set are quite handy.
import collections
d = collections.defaultdict(set)
for a, b in mappings:
    d[b].add(a)
If you really want lists instead of sets, you could follow this with a
for k, v in d.items():
    d[k] = list(v)
And if you really want a dict instead of a defaultdict, you can say
d = dict(d)
I don't really see any reason you'd want to, though.
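A runnable version of the approach above, with made-up sample data standing in for mappings. Because sets have no defined order, this sketch uses sorted() rather than list() when converting, so the output is deterministic.

```python
import collections

mappings = [('x', 1), ('y', 1), ('x', 1), ('z', 2)]   # sample (a, b) pairs, made up

d = collections.defaultdict(set)
for a, b in mappings:
    d[b].add(a)          # the set silently ignores the duplicate ('x', 1)

# optional: convert to a plain dict of sorted lists
result = {b: sorted(a_values) for b, a_values in d.items()}
print(result)  # {1: ['x', 'y'], 2: ['z']}
```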
Use collections.defaultdict:
from collections import defaultdict
your_dict = defaultdict(list)
for (a, b) in your_list:
    your_dict[b].append(a)
You can sort your tuples in O(n log n) and then create your dictionary in O(n),
or, more simply, build it in a single O(n) pass (though this could use a lot of memory if there are many tuples):
your_dict = {}
for (a, b) in your_list:
    if b in your_dict:
        your_dict[b].append(a)
    else:
        your_dict[b] = [a]
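The sort-then-build option is described without code; one way to do it (our choice, not necessarily what the answer had in mind) is sorted() followed by itertools.groupby. The sample list is borrowed from another answer in this thread.

```python
from itertools import groupby
from operator import itemgetter

your_list = [('a', 1), ('a', 3), ('b', 1), ('f', 1), ('a', 2), ('z', 1)]

# sort by b (O(n log n)), then group consecutive tuples sharing the same b (O(n))
by_b = sorted(your_list, key=itemgetter(1))
your_dict = {b: [a for a, _ in group] for b, group in groupby(by_b, key=itemgetter(1))}
print(your_dict)  # {1: ['a', 'b', 'f', 'z'], 2: ['a'], 3: ['a']}
```

groupby only merges adjacent items, which is why the sort must come first; sorted() is stable, so each list keeps the original order of its a values.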
Hmm it's pretty much the same as you've described. What's awkward about that?
You could also consider using an sql database to do the dirty work.
Instead of using an if, AFAIK it is more Pythonic to use a try block:
your_list = [('a', 1), ('a', 3), ('b', 1), ('f', 1), ('a', 2), ('z', 1)]
your_dict = {}
for (a, b) in your_list:
    try:
        your_dict[b].append(a)
    except KeyError:
        your_dict[b] = [a]
print(your_dict)
I am not sure how you will get out of the key test, but once the key/value pair has been initialized it is easy :)
d = {}
if 'b' not in d:
    d['b'] = set()
d['b'].add('a')
The set will ensure that only 1 of 'a' is in the collection. You need to do the initial 'b' check though to make sure the key/value exist.
Dict get method?
It returns the value of my_dict[some_key] if some_key is in the dictionary, and if not, returns some default value ([] in the example below). Since list.append returns None, concatenate instead of appending so that the assignment stores a list:
my_dict[some_key] = my_dict.get(some_key, []) + [something_else]
There's another way that's rather efficient (though maybe not as efficient as sets) and simple. It's similar in practice to defaultdict but does not require an additional import.
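Given that your dict starts out with keys that have empty values, you also create those keys somewhere. If you know the keys up front, you can create them all with a dict comprehension that gives each key its own empty list. (dict.fromkeys can also set a default value for all keys, but with a mutable default such as [] every key would share the same list object.)
keylist = ['key1', 'key2']
result = {key: [] for key in keylist}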
where result will be:
{'key1': [], 'key2': []}
Then you can do your loop and use result['key1'].append(..) directly