Introduction
The following dictionary has three levels of keys and then a value:
d = {
    1: {
        'A': {
            'i': 100,
            'ii': 200
        },
        'B': {
            'i': 300
        }
    },
    2: {
        'A': {
            'ii': 500
        }
    }
}
Examples of entries that need to be added:
d[1]['B']['ii'] = 600   # OK
d[2]['C']['iii'] = 700  # KeyError on 'C'
d[3]['D']['iv'] = 800   # KeyError on 3
Problem Statement
I wanted code that creates the necessary nested keys on the fly and avoids any KeyError.
Solution 1
The first solution I came up with was:
def NewEntry_1(d, lv1, lv2, lv3, value):
    if lv1 in d:
        if lv2 in d[lv1]:
            d[lv1][lv2][lv3] = value
        else:
            d[lv1][lv2] = {lv3: value}
    else:
        d[lv1] = {lv2: {lv3: value}}
Seems legit, but embedding this in other pieces of code made it mind-boggling. I explored Stack Overflow for other solutions and read up on the get() and setdefault() methods.
Solution 2
There is plenty of material to find about get() and setdefault(), but not so much on nested dictionaries. Ultimately I was able to come up with:
def NewEntry_2(d, lv1, lv2, lv3, value):
    return d.setdefault(lv1, {}).setdefault(lv2, {}).setdefault(lv3, value)
It is one line of code, so it is not really necessary to make it a function. It is also easily modified to include operations:
d[lv1][lv2][lv3] = d.setdefault(lv1, {}).setdefault(lv2, {}).setdefault(lv3, 0) + value
Seems perfect?
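To see the chained setdefault approach end to end, here is a minimal self-contained sketch using the sample dictionary from above. Note this variant assigns on the final level instead of calling setdefault there, so an existing value is overwritten rather than kept (the name new_entry is my own):

```python
def new_entry(d, lv1, lv2, lv3, value):
    # Each setdefault returns the existing sub-dict (or inserts an empty one),
    # so missing levels are created on the fly without any KeyError.
    d.setdefault(lv1, {}).setdefault(lv2, {})[lv3] = value

d = {1: {'A': {'i': 100, 'ii': 200}, 'B': {'i': 300}}, 2: {'A': {'ii': 500}}}
new_entry(d, 1, 'B', 'ii', 600)   # existing lv1 and lv2
new_entry(d, 2, 'C', 'iii', 700)  # new lv2
new_entry(d, 3, 'D', 'iv', 800)   # new lv1
print(d[1]['B']['ii'], d[2]['C']['iii'], d[3]['D']['iv'])  # 600 700 800
```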
Question
When adding large quantities of entries and doing many modifications, is option 2 better than option 1? Or should I define function 1 and call it? The answers I'm looking for should take into account speed and/or potential for errors.
Examples
NewEntry_1(d, 1, 'B', 'ii', 600)
# output = {1: {'A': {'i': 100, 'ii': 200}, 'B': {'i': 300, 'ii': 600}}, 2: {'A': {'ii': 500}}}
NewEntry_1(d, 2, 'C', 'iii', 700)
# output = {1: {'A': {'i': 100, 'ii': 200}, 'B': {'i': 300, 'ii': 600}}, 2: {'A': {'ii': 500}, 'C': {'iii': 700}}}
NewEntry_1(d, 3, 'D', 'iv', 800)
# output = {1: {'A': {'i': 100, 'ii': 200}, 'B': {'i': 300, 'ii': 600}}, 2: {'A': {'ii': 500}, 'C': {'iii': 700}}, 3: {'D': {'iv': 800}}}
More background
I'm a business analyst exploring Python for creating a graph DB that would help me with very specific analyses. The dictionary structure is used to store the influence one node has on one of its neighbors:
lv1 is Node From
lv2 is Node To
lv3 is Iteration
value is Influence (in %)
In the first iteration Node 1 has direct influence on Node 2. In the second iteration Node 1 influences all the Nodes that Node 2 is influencing.
I'm aware of packages that can help me with this (networkx), but I'm trying to understand Python/GraphDB before I start using them.
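To make the intended use concrete, here is a hypothetical sketch of one propagation step over that structure. The multiplicative composition of influence (and storing it as a fraction rather than a percentage) is my assumption, not something stated above:

```python
def propagate(d, it):
    # Hypothetical: if frm influences mid and mid influences to at iteration
    # `it`, record frm's indirect influence on to at iteration it + 1.
    # Assumes influence composes multiplicatively (my assumption).
    for frm in list(d):
        for mid in list(d[frm]):
            w1 = d[frm][mid].get(it)
            if w1 is None or mid not in d:
                continue
            for to in list(d[mid]):
                w2 = d[mid][to].get(it)
                if w2 is not None:
                    d.setdefault(frm, {}).setdefault(to, {})[it + 1] = w1 * w2

# Node 1 -> Node 2 at 0.5, Node 2 -> Node 3 at 0.4
d = {1: {2: {1: 0.5}}, 2: {3: {1: 0.4}}}
propagate(d, 1)
print(d[1][3][2])  # 0.2, the indirect influence of node 1 on node 3
```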
As for the nested dictionaries, you should take a look at defaultdict. Using it will save you a lot of function-calling overhead. The nested defaultdict construction resorts to lambda functions for its default factories:
from collections import defaultdict

d = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))  # new, shiny, empty
d[1]['B']['ii'] = 600  # OK
d[2]['C']['iii'] = 700  # OK
d[3]['D']['iv'] = 800  # OK
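If you want to check the overhead claim on your own machine, a rough timeit sketch could look like the following; absolute numbers vary by interpreter and hardware:

```python
import timeit

# Micro-benchmark sketch: repeated writes through a setdefault chain
# versus a nested defaultdict. Numbers are machine-dependent.
t_setdefault = timeit.timeit(
    "d.setdefault(1, {}).setdefault('B', {})['ii'] = 600",
    setup="d = {}",
    number=50_000)
t_defaultdict = timeit.timeit(
    "d[1]['B']['ii'] = 600",
    setup=("from collections import defaultdict\n"
           "d = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))"),
    number=50_000)
print(f"setdefault chain: {t_setdefault:.4f}s, defaultdict: {t_defaultdict:.4f}s")
```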
Update: A useful trick for creating an arbitrarily deeply nested defaultdict is the following:
def tree():
    return defaultdict(tree)

d = tree()
# now any depth is possible
# d[1][2][3][4][5][6][7][8] = 9
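One caveat with the tree() trick: printing such an object shows layers of defaultdict wrappers, and merely reading a missing key silently creates it. A small helper of my own converts back to plain dicts for display:

```python
from collections import defaultdict

def tree():
    return defaultdict(tree)

def to_plain(d):
    # Recursively strip the defaultdict wrappers for clean printing
    return {k: to_plain(v) for k, v in d.items()} if isinstance(d, dict) else d

d = tree()
d[1]['B']['ii'] = 600
d[3]['D']['iv'] = 800
print(to_plain(d))  # {1: {'B': {'ii': 600}}, 3: {'D': {'iv': 800}}}
```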
Related
Imagine 2 different yaml files (or, for demonstration purposes, 2 dictionaries):
a = {'A': 'yes',
     'B': 2,
     'C': [-1, 0, 2],
     'D': {
         'E': True
     }}
b = {'A': 'yes',
     'G': 2,
     'C': [-1, 0, 1],
     'F': {
         'E': False
     }}
Obviously they look very similar, but they have different keys for what is intentionally meant to be similar values.
If we do a comparison of the two:
from deepdiff import DeepDiff

print(DeepDiff(a, b, ignore_order=True, significant_digits=10, verbose_level=2).pretty())
we get this kind of expected result:
Item root['G'] (2) added to dictionary.
Item root['F'] ({'E': False}) added to dictionary.
Item root['B'] (2) removed from dictionary.
Item root['D'] ({'E': True}) removed from dictionary.
Value of root['C'][2] changed from 2 to 1.
This is because DeepDiff doesn't know that the keys represent the same "things".
It is possible to rename the keys:
b['D'] = b.pop('F')
b['B'] = b.pop('G')
and now the same DeepDiff call results in
Value of root['C'][2] changed from 2 to 1.
Value of root['D']['E'] changed from True to False.
So, is there an efficient way to create a "translator" for b to a and automatically interpret those differences, without manually writing over each key or creating a new dictionary for comparison?
We could create a "mapping dictionary" and iterate through it:
translator_b2a = {'D': 'F',
                  'B': 'G'}
for key in translator_b2a:
    value = translator_b2a[key]
    b[key] = b.pop(value)
and get the same result... I just wonder if there is a method/process that is more efficient or already designed. This method will obviously break down when the yaml/dictionaries get more complex, such as when the levels of the nested keys are different, i.e.
b = {'A': 'yes',
     'G': 2,
     'C': [-1, 0, 1],
     'F': {
         'E': {'H': False}
     }}
Depending on your needs there are actually several options.
Option 1a: concentrate on the value comparison, but not the structure
from pandas.io.json._normalize import nested_to_record  # note: private pandas API

aflat = nested_to_record(a, sep='')
bflat = nested_to_record(b, sep='')
bflat = dict(zip(list(aflat.keys()), bflat.values()))
print(DeepDiff(aflat, bflat, ignore_order=True, significant_digits=10, verbose_level=2).pretty())
###Output:
###Value of root['C'][2] changed from 2 to 1.
###Value of root['DE'] changed from True to False.
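Since nested_to_record lives in a private pandas module and may move between versions, a small hand-rolled flattener (my own sketch, handling only nested dicts) does the same job:

```python
def flatten_dict(d, parent='', sep=''):
    # Recursively flatten nested dicts: {'D': {'E': True}} -> {'DE': True}
    flat = {}
    for k, v in d.items():
        key = f"{parent}{sep}{k}" if parent else str(k)
        if isinstance(v, dict):
            flat.update(flatten_dict(v, key, sep))
        else:
            flat[key] = v
    return flat

a = {'A': 'yes', 'B': 2, 'C': [-1, 0, 2], 'D': {'E': True}}
print(flatten_dict(a))  # {'A': 'yes', 'B': 2, 'C': [-1, 0, 2], 'DE': True}
```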
Option 1b: concentrate on the value comparison, but not the structure
import pandas as pd

def flatten(d):
    df = pd.json_normalize(d, sep='')
    return df.to_dict(orient='records')[0]

aflat = flatten(a)
bflat = flatten(b)
bflat = dict(zip(list(aflat.keys()), bflat.values()))
print(DeepDiff(aflat, bflat, ignore_order=True, significant_digits=10, verbose_level=2).pretty())
###Output:
###Value of root['C'][2] changed from 2 to 1.
###Value of root['DE'] changed from True to False.
Option 2: keep the structure of the first dict
import pandas as pd

def updateDict(init, values, count=0):
    # Rebuild the structure of `init`, filling in the flattened `values`
    # in order; the count of consumed values is returned so the recursion
    # stays in sync across sibling keys.
    items = {}
    for k, v in init.items():
        if isinstance(v, dict):
            items[k], count = updateDict(v, values, count)
        else:
            items[k] = values[count]
            count += 1
    return items, count

dfb = pd.json_normalize(b, sep='')
b, _ = updateDict(a, dfb.values[0])
print(DeepDiff(a, b, ignore_order=True, significant_digits=10, verbose_level=2).pretty())
###Output:
###Value of root['C'][2] changed from 2 to 1.
###Value of root['D']['E'] changed from True to False.
Well... this works for now, until I get more creative with finding ways to break it... I'm sure someone can make it better. This handles the second example, where the nested keys might not have a 1:1 relationship, and it also handles an error in the translator: if the requested key is not in the "b" dictionary, it is left as it was. However, if the requested key is not in the "a" dictionary, you'll still have a "difference" show up in the DeepDiff.
The first inner loop doesn't really need "enumerate". The tupley portion converts a popped (key, value) tuple back into dict format.
import copy

def translatorB2A():
    return [{'b': ['F', 'E', 'H'], 'a': ['D', 'E']},
            {'b': ['G'], 'a': ['B']}]

def convertB2A(b):
    translator = translatorB2A()
    bNew = copy.deepcopy(b)
    for translation in translator:
        skip = False
        value = bNew
        lastKey = ""
        for key in translation['b']:
            if key in value.keys():
                lastKey = key
                value = value.pop(key)
            else:
                skip = True
                if lastKey:
                    bNew[lastKey] = value
                break
        newObj = {}
        if not skip:
            for j, key in enumerate(reversed(translation['a'])):
                if j == 0:
                    newObj[key] = value
                else:
                    tupley = newObj.popitem()
                    newObj[key] = {tupley[0]: tupley[1]}
            bNew = bNew | newObj
    return bNew
Now you can run it on some dictionaries that represent YAMLs
a = {'A': 'yes',
     'B': 2,
     'C': [-1, 0, 2],
     'D': {
         'E': True
     }}
b = {'A': 'yes',
     'G': 2,
     'C': [-1, 0, 1],
     'F': {
         'E': {'H': False}
     }}
bNew = convertB2A(b)
print(bNew)
print(DeepDiff(a, bNew, ignore_order=True, significant_digits=10, verbose_level=2).pretty())
You should get a result like:
{'A': 'yes', 'C': [-1, 0, 1], 'D': {'E': False}, 'B': 2}
Value of root['C'][2] changed from 2 to 1.
Value of root['D']['E'] changed from True to False.
It would be nice if there were a way to insert the changed keys in the same order they appear in the "translated to" dictionary, i.e. find a way to have b return as
{'A': 'yes', 'B': 2, 'C': [-1, 0, 1], 'D': {'E': False}}
so that at a quick glance it appears in the same format as "a":
{'A': 'yes', 'B': 2, 'C': [-1, 0, 2], 'D': {'E': True}}
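One possible way to get that ordering (a sketch of my own, relying on the insertion-order guarantee of dicts in Python 3.7+): rebuild the translated dict in the reference dict's key order and append any leftover keys at the end:

```python
def reorder_like(ref, d):
    # Keys shared with `ref` first, in ref's order; any extras keep their order.
    ordered = {k: d[k] for k in ref if k in d}
    ordered.update({k: v for k, v in d.items() if k not in ref})
    return ordered

a = {'A': 'yes', 'B': 2, 'C': [-1, 0, 2], 'D': {'E': True}}
bNew = {'A': 'yes', 'C': [-1, 0, 1], 'D': {'E': False}, 'B': 2}
print(reorder_like(a, bNew))
# {'A': 'yes', 'B': 2, 'C': [-1, 0, 1], 'D': {'E': False}}
```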
I have a complex dictionary:
l = {10: [{'a':1, 'T':'y'}, {'a':2, 'T':'n'}], 20: [{'a':3,'T':'n'}]}
When I try to iterate over the dictionary, I don't get back the keys with their list-of-dictionaries values; instead, each item comes out as a tuple, like so:
for m in l.items():
    print(m)
(10, [{'a': 1, 'T': 'y'}, {'a': 2, 'T': 'n'}])
(20, [{'a': 3, 'T': 'n'}])
But when I just print l I get my original dictionary:
In [7]: l
Out[7]: {10: [{'a': 1, 'T': 'y'}, {'a': 2, 'T': 'n'}], 20: [{'a': 3, 'T': 'n'}]}
How do I iterate over the dictionary? I still need the keys and to process each dictionary in the value list.
There are two questions here. First, you ask why this is turned into a "tuple": the answer is that this is what the .items() method on dictionaries returns - a tuple for each key/value pair.
Knowing this, you can then decide how to use it. You can unpack the tuple into its two parts during iteration:
for k, v in l.items():
    # Now k holds the key and v the value.
    # So you can either use the value directly
    print(v[0])
    # or access it via the key
    value = l[k]
    print(value[0])
    # Both yield the same value
With a dictionary, you can unpack each item into two variables while iterating over it:
for key, value in l.items():
    print(key, value)
I often rely on pprint when processing a nested object to know at a glance what structure that I am dealing with.
from pprint import pprint
l = {10: [{'a':1, 'T':'y'}, {'a':2, 'T':'n'}], 20: [{'a':3,'T':'n'}]}
pprint(l, indent=4, width=40)
Output:
{   10: [   {'T': 'y', 'a': 1},
            {'T': 'n', 'a': 2}],
    20: [{'T': 'n', 'a': 3}]}
Others have already answered with implementations.
Thanks for all the help. I did figure out how to process this. Here is the implementation I came up with:
for m in l.items():
    k, v = m
    print(f"key: {k}, val: {v}")
    for n in v:
        print(f"key: {n['a']}, val: {n['T']}")
Thanks for everyone's help!
I would like to shuffle the key value pairs in this dictionary so that the outcome has no original key value pairs. Starting dictionary:
my_dict = {'A': 'a',
           'K': 'k',
           'P': 'p',
           'Z': 'z'}
Example of unwanted outcome:
my_dict_shuffled = {'Z': 'a',
                    'K': 'k',  # <-- original key-value pair
                    'A': 'p',
                    'P': 'z'}
Example of wanted outcome:
my_dict_shuffled = {'Z': 'a',
                    'A': 'k',
                    'K': 'p',
                    'P': 'z'}
I have tried while loops and for loops with no luck. Please help! Thanks in advance.
Here's a fool-proof algorithm I learned from a Numberphile video :)
import itertools
import random

my_dict = {'A': 'a',
           'K': 'k',
           'P': 'p',
           'Z': 'z'}

# Shuffle the keys and values together.
my_dict_items = list(my_dict.items())
random.shuffle(my_dict_items)
shuffled_keys, shuffled_values = zip(*my_dict_items)

# Offset the shuffled values by one.
shuffled_values = itertools.cycle(shuffled_values)
next(shuffled_values, None)  # skip one value

# Guaranteed to have each value paired to a random different key!
my_random_dict = dict(zip(shuffled_keys, shuffled_values))
Disclaimer (thanks for mentioning, @jf328): this will not generate all possible permutations! It will only generate permutations with exactly one "cycle". Put simply, the algorithm will never give you the following outcome:
{'A': 'k',
'K': 'a',
'P': 'z',
'Z': 'p'}
However, I imagine you can extend this solution by building a random list of sub-cycles:
(2, 2, 3) => concat(zip(*items[0:2]), zip(*items[2:4]), zip(*items[4:7]))
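Here is one possible sketch of that extension (my own, not from the video): shuffle the items, cut them into random cycles of length at least two, and rotate the values within each cycle. It assumes at least two items with distinct values, and it does not sample uniformly over all derangements:

```python
import random

def deranged_dict(d):
    # Shuffle items, split into random cycles of length >= 2,
    # then rotate values by one within each cycle so no key keeps its value.
    items = list(d.items())
    random.shuffle(items)
    result = {}
    i, n = 0, len(items)
    while i < n:
        remaining = n - i
        size = random.randint(2, remaining)
        if remaining - size == 1:  # never strand a single leftover item
            size = remaining
        keys = [k for k, _ in items[i:i + size]]
        vals = [v for _, v in items[i:i + size]]
        result.update(zip(keys, vals[1:] + vals[:1]))  # rotate by one
        i += size
    return result

my_dict = {'A': 'a', 'K': 'k', 'P': 'p', 'Z': 'z'}
shuffled = deranged_dict(my_dict)
print(shuffled)
```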
A shuffle which doesn't leave any element in the same place is called a derangement. Essentially, there are two parts to this problem: first to generate a derangement of the keys, and then to build the new dictionary.
We can randomly generate a derangement by shuffling until we get one; on average it should only take 2-3 tries even for large dictionaries, but this is a Las Vegas algorithm in the sense that there's a tiny probability it could take a much longer time to run than expected. The upside is that this trivially guarantees that all derangements are equally likely.
from random import shuffle

def derangement(keys):
    if len(keys) == 1:
        raise ValueError('No derangement is possible')
    new_keys = list(keys)
    while any(x == y for x, y in zip(keys, new_keys)):
        shuffle(new_keys)
    return new_keys

def shuffle_dict(d):
    return {x: d[y] for x, y in zip(d, derangement(d))}
Usage:
>>> shuffle_dict({ 'a': 1, 'b': 2, 'c': 3 })
{'a': 2, 'b': 3, 'c': 1}
theonewhocodes, does this work? If it doesn't give you the right answer, can you update your question with a second use case?
import random

my_dict = {'A': 'a',
           'K': 'k',
           'P': 'p',
           'Z': 'z'}

while True:
    new_dict = dict(zip(list(my_dict.keys()),
                        random.sample(list(my_dict.values()), len(my_dict))))
    if new_dict.items() & my_dict.items():
        continue
    else:
        break

print(my_dict)
print(new_dict)
I have the following (very simplified) dict. The get_details function is an API call that I would like to avoid doing twice.
ret = {
    'a': a,
    'b': [{
        'c': item.c,
        'e': item.get_details()[0].e,
        'h': [func_h(detail) for detail in item.get_details()],
    } for item in items]
}
I could of course rewrite the code like this:
b = []
for item in items:
    details = item.get_details()
    b.append({
        'c': item.c,
        'e': details[0].e,
        'h': [func_h(detail) for detail in details],
    })
ret = {
    'a': a,
    'b': b
}
but would like to use the first approach since it seems more pythonic.
You could use an intermediary generator to extract the details from your items. Something like this:
ret = {
    'a': a,
    'b': [{
        'c': item.c,
        'e': details[0].e,
        'h': [func_h(detail) for detail in details],
    } for (item, details) in ((item, item.get_details()) for item in items)]
}
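On Python 3.8+, an assignment expression can achieve the same single-call behaviour inside the comprehension itself, since the keys and values of a dict display are evaluated left to right ('e' before 'h'). The Item class and func_h below are stand-ins for illustration only:

```python
# Stand-in item for illustration; in the real code, `items` and `func_h`
# come from the surrounding program.
class Item:
    calls = 0  # counts how many times get_details() runs

    def __init__(self, c):
        self.c = c

    def get_details(self):
        Item.calls += 1
        return [type('Detail', (), {'e': self.c * 10})()]

func_h = lambda detail: detail.e
items = [Item(1), Item(2)]
a = 'A'

ret = {
    'a': a,
    'b': [{
        'c': item.c,
        'e': (details := item.get_details())[0].e,  # walrus: call once (3.8+)
        'h': [func_h(detail) for detail in details],
    } for item in items]
}
print(ret['b'], Item.calls)  # get_details was called once per item
```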
I don't find the second one particularly un-pythonic; you have a complex initialization, and you shouldn't expect to boil down to a single simple expression. That said, you don't need the temporary list b; you can work directly with ret['b']:
ret = {
    'a': a,
    'b': []
}
for item in items:
    details = item.get_details()
    d = details[0]
    ret['b'].append({
        'c': item.c,
        'e': d.e,
        'h': map(func_h, details)
    })
This is also a case where I would choose map over a list comprehension. (If this were Python 3, you would need to wrap that in an additional call to list.)
I wouldn't try too hard to be more pythonic if it means looking like your first approach. I would take your second approach a step further, and just use a separate function:
ret = {
'a': a,
'b': get_b_from_items(items)
}
I think that's as clean as it can get. Use comments/docstrings to indicate what 'b' is, test the function, and then the next person who comes along can quickly read and trust your code. I know you know how to write the function, but for the sake of completeness, here's how I would do it:
# and add this in where you want it
def get_b_from_items(items):
    """Return a list of (your description here)."""
    result = []
    for item in items:
        details = item.get_details()
        result.append({
            'c': item.c,
            'e': details[0].e,
            'h': [func_h(detail) for detail in details],
        })
    return result
That is plenty pythonic (note the docstring: very pythonic), and very readable. And of course, it has the advantage of being slightly more granularly testable, with complex logic abstracted away from the higher-level logic, and all the other advantages of using functions.
I'm trying to invert a simple dictionary like:
{'a' : 1, 'b' : 2, 'c' : 3, 'd' : 4}
I'm using this function:
def invert(d):
    return dict([(x, y) for y, x in d.iteritems()])
Now when I invert my dictionary, everything works out fine. When I invert it twice however, I get:
{'a': 1, 'c': 3, 'b': 2, 'd': 4}
which is not in the same order as the dictionary I started with. Is there a problem with my invert function? Sorry I'm kinda new to python, but thanks for any help!
That is correct; dictionaries are unordered in Python.
From another SO answer:
CPython implementation detail: Keys and values are listed in an
arbitrary order which is non-random, varies across Python
implementations, and depends on the dictionary’s history of insertions
and deletions.
From the docs:
It is best to think of a dictionary as an unordered set of key: value
pairs, with the requirement that the keys are unique (within one
dictionary). A pair of braces creates an empty dictionary: {}. Placing
a comma-separated list of key:value pairs within the braces adds
initial key:value pairs to the dictionary; this is also the way
dictionaries are written on output.
Python dictionaries are unsorted by design.
You can use collections.OrderedDict instead if you really need this behaviour.
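A minimal sketch of that suggestion: build the dictionary as an OrderedDict and make invert return one too, so the double inversion round-trips the order:

```python
from collections import OrderedDict

d = OrderedDict([('a', 1), ('b', 2), ('c', 3), ('d', 4)])

def invert(d):
    # Preserves the iteration order of the input
    return OrderedDict((v, k) for k, v in d.items())

print(list(invert(invert(d)).items()))
# [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
```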
Try running this code:
d = {
    'a': 1, 'b': 2,
    'c': 3, 'd': 4
}

def invert(d):
    return dict([(x, y) for y, x in d.iteritems()])

print d
d = invert(d)
print d
d = invert(d)
print d
This is the output:
{'a': 1, 'c': 3, 'b': 2, 'd': 4}
{1: 'a', 2: 'b', 3: 'c', 4: 'd'}
{'a': 1, 'c': 3, 'b': 2, 'd': 4}
As you can see, it technically is the same dictionary, but when you declare it, it is unordered.
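Worth noting for current readers: since Python 3.7, plain dicts are guaranteed to preserve insertion order, so on a modern interpreter the double inversion round-trips the original order (assuming the values are unique and hashable):

```python
d = {'a': 1, 'b': 2, 'c': 3, 'd': 4}

def invert(d):
    # dict.items() replaces the Python 2 iteritems() used above
    return {v: k for k, v in d.items()}

print(invert(invert(d)))  # {'a': 1, 'b': 2, 'c': 3, 'd': 4}
```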