transform ordered dictionaries to list of dictionaries iteratively - python

EDIT: I realized I made a mistake in my data structure and updated it.
I have an ordered dict with some values as list and some as list of lists:
odict = OrderedDict([('A', [33]),
                     ('B',
                      [[{'AA': 'DOG', 'BB': '2'},
                        {'AA': 'CAT', 'BB': '1'}]]),
                     ('C',
                      [['01', '012']])
                     ])
I am currently manually adding the keys and values in this function:
def odict_to_listdict(odict, key1: str, key2: str, key3: str) -> list:
    return [{key1: v1, key2: v2, key3: v3} for v1, v2, v3 in
            zip(odict[key1], odict[key2], odict[key3])]
odict_to_listdict(odict, 'A', 'B', 'C')
That gives me the expected output of a list of dictionaries:
[{'A': 33,
  'B': [{'AA': 'DOG', 'BB': '2'}, {'AA': 'CAT', 'BB': '1'}],
  'C': ['01', '012']}]
I plan to add more keys and values. How do I iterate through the ordered dict without explicitly typing the keys, while maintaining the expected output? For example:
[{'A': 33,
  'B': [{'AA': 'DOG', 'BB': '2'}, {'AA': 'CAT', 'BB': '1'}],
  'C': ['01', '012'],
  'D': 42,
  'E': [{'A': 1}]
}]

First off, since Python 3.7 a plain dict preserves insertion order, so you can use one instead of an OrderedDict.
You can rewrite your function like this:
def odict_to_listdict(odict):
    keys = list(odict)
    return [dict(zip(keys, i)) for i in zip(*odict.values())]
Test:
from collections import OrderedDict
odict = OrderedDict()
odict['key1'] = ['1', '2']
odict['key2'] = ['Apple', 'Orange']
odict['key3'] = ['bla', 'bla']
def odict_to_listdict(odict):
    keys = list(odict)
    return [dict(zip(keys, i)) for i in zip(*odict.values())]
print(odict_to_listdict(odict))
output:
[{'key1': '1', 'key2': 'Apple', 'key3': 'bla'}, {'key1': '2', 'key2': 'Orange', 'key3': 'bla'}]
Explanation:
You can get the keys from the dictionary itself, so there is no need to pass them to the function.
The iteration part of the list comprehension, zip(*odict.values()), walks through all the value lists in parallel and yields one tuple of values per position.
The expression part, dict(zip(keys, i)), pairs each key with the matching value from that tuple.
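As a quick check against the data from the question, here is a minimal sketch that runs the same function on the original ordered dict extended with the 'D' and 'E' keys. It assumes the new values are wrapped in a one-element list like the existing ones (that wrapping is a guess, since the question only shows the desired output).

```python
from collections import OrderedDict


def odict_to_listdict(odict):
    # Keys come from the mapping itself, so adding a key needs no code change.
    keys = list(odict)
    return [dict(zip(keys, values)) for values in zip(*odict.values())]


odict = OrderedDict([
    ('A', [33]),
    ('B', [[{'AA': 'DOG', 'BB': '2'}, {'AA': 'CAT', 'BB': '1'}]]),
    ('C', [['01', '012']]),
    ('D', [42]),            # assumed: wrapped in a list like 'A'
    ('E', [[{'A': 1}]]),    # assumed: wrapped in a list like 'B'
])

result = odict_to_listdict(odict)
print(result)
```

Because every value list has length one, zip yields a single tuple and the result is a list containing one dictionary.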

Related

Reshaping a large dictionary

I am working on XBRL document parsing. I got to a point where I have a large dictionary structured like this:
sample of a dictionary I'm working on
Since it's a bit challenging to describe the pattern of what I'm trying to achieve, I put together an example of what I'd like it to be:
sample of what I'm trying to achieve
Since I'm fairly new to programming, I have been struggling with this for days, trying different approaches with loops and list and dict comprehensions, starting from here:
for k in storage_gaap:
    if 'context_ref' in storage_gaap[k]:
        for _k in storage_gaap[k]['context_ref']:
            storage_gaap[k]['context_ref'] = {_k}
storage_gaap is the master dictionary. Sorry for attaching pictures, but it's just much clearer to see the dictionary that way.
I'd really appreciate any and all help.
Here's a solution using zip and dictionary comprehension to do what you're trying to do using toy data in a similar structure.
import itertools
import pprint
# Sample data similar to provided screenshots
data = {
    'a': {
        'id': 'a',
        'vals': ['a1', 'a2', 'a3'],
        'val_num': [1, 2, 3]
    },
    'b': {
        'id': 'b',
        'vals': ['b1', 'b2', 'b3'],
        'val_num': [4, 5, 6]
    }
}
# Takes a tuple of keys, and a list of tuples of values, and transforms them into a list of dicts
# i.e. ('id', 'val'), [('a', 1), ('b', 2)] => [{'id': 'a', 'val': 1}, {'id': 'b', 'val': 2}]
def get_list_of_dict(keys, list_of_tuples):
    list_of_dict = [dict(zip(keys, values)) for values in list_of_tuples]
    return list_of_dict
def process_dict(key, values):
    # Transform the dict with lists of values into a list of dicts
    list_of_dicts = get_list_of_dict(
        ('id', 'val', 'val_num'),
        zip(itertools.repeat(key, len(values['vals'])), values['vals'], values['val_num']))
    # Dictionary comprehension to group them based on the 'val' property of each dict
    return {d['val']: {k: v for k, v in d.items() if k != 'val'} for d in list_of_dicts}
# Reorganize to put dict under a 'context_values' key
processed = {k: {'context_values': process_dict(k, v)} for k, v in data.items()}
# {'a': {'context_values': {'a1': {'id': 'a', 'val_num': 1},
#                           'a2': {'id': 'a', 'val_num': 2},
#                           'a3': {'id': 'a', 'val_num': 3}}},
#  'b': {'context_values': {'b1': {'id': 'b', 'val_num': 4},
#                           'b2': {'id': 'b', 'val_num': 5},
#                           'b3': {'id': 'b', 'val_num': 6}}}}
pprint.pprint(processed)
OK, here is the updated solution for my case. The catch for me was the zip function, since it stops at the shortest iterable passed to it. The solution was itertools.cycle, which repeats the single-element lists for as long as needed. Here is the code:
import itertools
import pprint
data = {'us-gaap_WeightedAverageNumberOfDilutedSharesOutstanding': {
    'context_ref': ['D20210801-20220731',
                    'D20200801-20210731',
                    'D20190801-20200731',
                    'D20210801-20220731',
                    'D20200801-20210731',
                    'D20190801-20200731'],
    'decimals': ['-5', '-5', '-5', '-5', '-5', '-5'],
    'id': ['us-gaap:WeightedAverageNumberOfDilutedSharesOutstanding'],
    'master_id': ['us-gaap_WeightedAverageNumberOfDilutedSharesOutstanding'],
    'unit_ref': ['shares', 'shares', 'shares', 'shares', 'shares', 'shares'],
    'value': ['98500000',
              '96400000',
              '96900000',
              '98500000',
              '96400000',
              '96900000']}}
def get_list_of_dict(keys, list_of_tuples):
    list_of_dict = [dict(zip(keys, values)) for values in list_of_tuples]
    return list_of_dict
def process_dict(k, values):
    # 'id' and 'master_id' hold a single element, so cycle them to match the longer lists
    list_of_dicts = get_list_of_dict(
        ('context_ref', 'decimals', 'id', 'master_id', 'unit_ref', 'value'),
        zip(values['context_ref'], values['decimals'], itertools.cycle(values['id']),
            itertools.cycle(values['master_id']), values['unit_ref'], values['value']))
    # Group by 'context_ref'; repeated context_ref values overwrite earlier ones
    return {d['context_ref']: {k: v for k, v in d.items() if k != 'context_ref'} for d in list_of_dicts}
processed = {k: {'context_values': process_dict(k, v)} for k, v in data.items()}
pprint.pprint(processed)
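To make the zip behaviour concrete, here is a small sketch with shortened, made-up values (the names refs and ids and their contents are placeholders, not taken from the filing). It shows zip truncating to the shortest input, itertools.cycle padding the short list, and itertools.zip_longest as an alternative that may suit some cases.

```python
import itertools

refs = ['D2021', 'D2020', 'D2019']   # placeholder context references
ids = ['us-gaap:Example']            # placeholder single-element list

# zip stops as soon as the shortest iterable is exhausted.
truncated = list(zip(refs, ids))

# cycle repeats the short list forever; zip still stops because refs is finite.
cycled = list(zip(refs, itertools.cycle(ids)))

# zip_longest pads the shorter input with a fill value instead.
padded = list(itertools.zip_longest(refs, ids, fillvalue=ids[0]))

print(truncated)
print(cycled)
print(padded)
```

Note that at least one argument to zip must be finite when itertools.cycle is involved, otherwise the iteration never ends.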

How to merge data from multiple dictionaries with repeating keys?

I have two dictionaries:
dict1 = {'a': '2', 'b': '10'}
dict2 = {'a': '25', 'b': '7'}
I need to save all the values for the same key in a new dictionary.
The best I can do so far is: defaultdict(<class 'list'>, {'a': ['2', '25'], 'b': ['10', '7']})
from collections import defaultdict
dd = defaultdict(list)
for d in (dict1, dict2):
    for key, value in d.items():
        dd[key].append(value)
print(dd)
That does not fully solve the problem, since the desired result is:
a = {'dict1':'2', 'dict2':'25'}
b = {'dict1':'10', 'dict2':'7'}
Also, if possible, I would like each key in the new dictionaries to be the name of the initial dictionary the value came from.
Your main problem is that you're trying to cross the implementation boundary between a string value and a variable name. This is almost always bad design. Instead, start with all of your labels as string data:
table = {
"dict1": {'a': '2', 'b': '10'},
"dict2": {'a': '25', 'b': '7'}
}
... or, in terms of your original post:
table = {
"dict1": dict1,
"dict2": dict2
}
From here, you should be able to invert the levels to obtain
invert = {
    "a": {'dict1': '2', 'dict2': '25'},
    "b": {'dict1': '10', 'dict2': '7'}
}
Is that enough to get your processing where it needs to be? Keeping the data in comprehensive dicts like this will make it easier to iterate through the sub-dicts as needed.
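One way to do that inversion, sketched here under the assumption that the labels are stored as string keys in a table dict as described above, is a nested loop with dict.setdefault:

```python
table = {
    "dict1": {'a': '2', 'b': '10'},
    "dict2": {'a': '25', 'b': '7'},
}

invert = {}
for name, inner in table.items():
    for key, value in inner.items():
        # Create the inner dict for this key on first sight, then store the
        # value under the label of the dict it came from.
        invert.setdefault(key, {})[name] = value

print(invert)
```

This swaps the outer and inner levels without relying on variable names at runtime.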
As @Prune suggested, structuring your result as a nested dictionary will be easier:
{'a': {'dict1': '2', 'dict2': '25'}, 'b': {'dict1': '10', 'dict2': '7'}}
Which could be achieved with a dict comprehension:
{k: {"dict%d" % i: v2 for i, v2 in enumerate(v1, start=1)} for k, v1 in dd.items()}
If you prefer doing it without a comprehension, you could do this instead:
result = {}
for k, v1 in dd.items():
    inner_dict = {}
    for i, v2 in enumerate(v1, start=1):
        inner_dict["dict%d" % i] = v2
    result[k] = inner_dict
Note: This assumes you always want to keep the "dict1", "dict2", ... key structure.

Convert list to dictionary with duplicate keys using dict comprehension [duplicate]

This question already has answers here:
How can one make a dictionary with duplicate keys in Python?
(9 answers)
Closed 6 months ago.
Good day all,
I am trying to convert a list of length-2 items to a dictionary using the below:
my_list = ["b4", "c3", "c5"]
my_dict = {key: value for (key, value) in my_list}
The issue is that when a key occurrence is more than one in the list, only the last key and its value are kept.
So in this case instead of
my_dict = {'c': '3', 'c': '5', 'b': '4'}
I get
my_dict = {'c': '5', 'b': '4'}
How can I keep all key:value pairs even if there are duplicate keys?
Thanks
For one key in a dictionary you can only store one value.
You can choose to make that value a list:
{'b': ['4'], 'c': ['3', '5']}
The following code will do that for you:
new_dict = {}
for (key, value) in my_list:
    if key in new_dict:
        new_dict[key].append(value)
    else:
        new_dict[key] = [value]
print(new_dict)
# output: {'b': ['4'], 'c': ['3', '5']}
The same thing can be done with setdefault. Thanks to @Aadit M Shah for pointing it out.
new_dict = {}
for (key, value) in my_list:
    new_dict.setdefault(key, []).append(value)
print(new_dict)
# output: {'b': ['4'], 'c': ['3', '5']}
The same thing can be done with defaultdict. Thanks to @MMF for pointing it out.
from collections import defaultdict
new_dict = defaultdict(list)
for (key, value) in my_list:
    new_dict[key].append(value)
print(new_dict)
# output: defaultdict(<class 'list'>, {'b': ['4'], 'c': ['3', '5']})
You can also choose to store the values as a list of dictionaries:
[{'b': '4'}, {'c': '3'}, {'c': '5'}]
The following code will do that for you:
new_list = [{key: value} for (key, value) in my_list]
If you don't care about the O(n^2) asymptotic behaviour you can use a dict comprehension including a list comprehension:
>>> {key: [i[1] for i in my_list if i[0] == key] for (key, value) in my_list}
{'b': ['4'], 'c': ['3', '5']}
or the iteration_utilities.groupedby function (which might be even faster than using collections.defaultdict):
>>> from iteration_utilities import groupedby
>>> from operator import itemgetter
>>> groupedby(my_list, key=itemgetter(0), keep=itemgetter(1))
{'b': ['4'], 'c': ['3', '5']}
You can use defaultdict to avoid checking whether a key is already in the dictionary:
from collections import defaultdict
my_dict = defaultdict(list)
for k, v in my_list:
    my_dict[k].append(v)
Output :
defaultdict(list, {'b': ['4'], 'c': ['3', '5']})

Match two dictionaries by key and return array of values

I have two dictionaries that I want to match by key in order to create a new dictionary, with every value in dict1 as a key and a list of the matching keys' values in dict2 as the value. The example should be less confusing:
dict1 = {'AAA': 'id5', 'BBB': 'id3', 'CCC': 'id8', 'DDD': 'id3'}
dict2 = {'AAA': 'value8', 'BBB': 'value24', 'CCC': 'value13', 'DDD': 'value2'}
What I have tried:
keys = set(dict1) & set(dict2)
complete = {}
for x in keys:
    key = dict1[x]
    value = dict2[x]
    complete[key] = [value]
Output:
complete = {'id3': ['value24'], 'id5': ['value8'], 'id8': ['value13']}
Desired output:
complete = {'id3': ['value24', 'value2'], 'id5': ['value8'], 'id8': ['value13']}
In reality the dictionaries are quite large so performance is an important factor. Any help is appreciated.
The dict.keys() method returns a dictionary view that already acts as a set. All you need to do is take the intersection of those views with the & operator.
If your values from dict1 are not unique, use dict.setdefault() to build lists of values:
output = {}
for key in dict1.keys() & dict2.keys():
    output.setdefault(dict1[key], []).append(dict2[key])
Demo:
>>> dict1 = {'AAA': 'id5', 'BBB': 'id3', 'CCC': 'id8', 'DDD': 'id3'}
>>> dict2 = {'AAA': 'value8', 'BBB': 'value24', 'CCC': 'value13', 'DDD': 'value2'}
>>> output = {}
>>> for key in dict1.keys() & dict2.keys():
...     output.setdefault(dict1[key], []).append(dict2[key])
...
>>> output
{'id8': ['value13'], 'id3': ['value24', 'value2'], 'id5': ['value8']}
This is about as efficient as it'll get.
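For reference, here is a short sketch of the set operators that key views support. The data is a trimmed variant of the example, with a hypothetical 'ZZZ' key added so that the two dictionaries do not share every key.

```python
dict1 = {'AAA': 'id5', 'BBB': 'id3', 'ZZZ': 'id9'}   # 'ZZZ' is hypothetical
dict2 = {'AAA': 'value8', 'BBB': 'value24', 'CCC': 'value13'}

common = dict1.keys() & dict2.keys()   # intersection: keys present in both
either = dict1.keys() | dict2.keys()   # union: keys present in at least one

print(sorted(common))
print(sorted(either))

output = {}
for key in sorted(common):             # sorted only to make the order repeatable
    output.setdefault(dict1[key], []).append(dict2[key])
print(output)
```

Only the intersection is safe to index into both dictionaries; iterating over the union would raise KeyError for 'ZZZ' or 'CCC'.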

Python: Combine several nested lists into a dictionary

I have a bunch of lists like the following two:
['a', ['b', ['x', '1'], ['y', '2']]]
['a', ['c', ['xx', '4'], ['gg', ['m', '3']]]]
What is the easiest way to combine all of them into a single dictionary that looks like:
{'a': {
    'b': {
        'x': 1,
        'y': 2
    },
    'c': {
        'xx': 4,
        'gg': {
            'm': 3
        }
    }
}}
The depth of nesting is variable.
Here's a very crude implementation. It does not handle edge cases such as lists with fewer than two elements, and it overwrites duplicate keys, but it's something to get you started:
l1 = ['a', ['b', ['x', '1'], ['y', '2']]]
l2 = ['a', ['c', ['xx', '4'], ['gg', ['m', '3']]]]
def combine(d, l):
    # The first element is the key; create its sub-dict on first sight
    if l[0] not in d:
        d[l[0]] = {}
    for v in l[1:]:
        if isinstance(v, list):
            # Nested list: merge it into the sub-dict for this key
            combine(d[l[0]], v)
        else:
            # Plain value: store it directly under the key
            d[l[0]] = v
h = {}
combine(h, l1)
combine(h, l2)
print(h)
Output:
{'a': {'b': {'x': '1', 'y': '2'}, 'c': {'xx': '4', 'gg': {'m': '3'}}}}
It's not really 'pythonic', but I don't see a good way to do this without recursion:
def listToDict(l):
    if not isinstance(l, list):
        return l
    return {l[0]: listToDict(l[1])}
It made the most sense to me to break this problem into two parts (well, that and I misread the question the first time through).
transformation
The first part transforms the [key, list1, list2] data structure into nested dictionaries:
def recdict(elements):
    """Create recursive dictionaries from [k, v1, v2, ...] lists.

    >>> import pprint, functools
    >>> pprint = functools.partial(pprint.pprint, width=2)
    >>> pprint(recdict(['a', ['b', ['x', '1'], ['y', '2']]]))
    {'a': {'b': {'x': '1',
                 'y': '2'}}}
    >>> pprint(recdict(['a', ['c', ['xx', '4'], ['gg', ['m', '3']]]]))
    {'a': {'c': {'gg': {'m': '3'},
                 'xx': '4'}}}
    """
    def rec(item):
        if isinstance(item[1], list):
            return [item[0], dict(rec(e) for e in item[1:])]
        return item
    return dict([rec(elements)])
It expects that
every list has at least two elements
the first element of every list is a key
if the second element of a list is a list, then all subsequent elements are also lists; these are combined into a dictionary.
The tricky bit (at least for me) was realizing that you have to return a list from the recursive function rather than a dictionary. Otherwise, you can't combine the parallel lists that form the second and third elements of some of the lists.
To make this more generally useful (i.e. for tuples and other sequences), I would change
if isinstance(item[1], list):
to
if (isinstance(item[1], collections.abc.Sequence)
        and not isinstance(item[1], str)):
(on Python 2, use collections.Sequence and basestring instead).
You can also make it work for any iterable but that requires a little bit of reorganization.
merging
The second part merges the dictionaries that result from running the first routine on the two given data structures. I think this will recursively merge any number of dictionaries that don't have conflicting keys, though I didn't really test it for anything other than this use case.
def mergedicts(*dicts):
    """Recursively merge an arbitrary number of dictionaries.

    >>> import pprint
    >>> d1 = {'a': {'b': {'x': '1',
    ...                   'y': '2'}}}
    >>> d2 = {'a': {'c': {'gg': {'m': '3'},
    ...                   'xx': '4'}}}
    >>> pprint.pprint(mergedicts(d1, d2), width=2)
    {'a': {'b': {'x': '1',
                 'y': '2'},
           'c': {'gg': {'m': '3'},
                 'xx': '4'}}}
    """
    keys = set(k for d in dicts for k in d)

    def vals(key):
        """Returns all values for `key` in all `dicts`."""
        # dict.has_key() was removed in Python 3; use the `in` operator
        withkey = (d for d in dicts if key in d)
        return [d[key] for d in withkey]

    def recurse(*values):
        """Recurse if the values are dictionaries."""
        if isinstance(values[0], dict):
            return mergedicts(*values)
        if len(values) == 1:
            return values[0]
        raise TypeError("Multiple non-dictionary values for a key.")

    return dict((key, recurse(*vals(key))) for key in keys)
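Putting the two parts together might look like the sketch below. It reuses recdict as given above, but swaps in a simpler pairwise helper, merge, which is a hypothetical stand-in for mergedicts written for this example, and folds it over the converted lists with functools.reduce.

```python
from functools import reduce


def recdict(elements):
    """Turn a [key, v1, v2, ...] list into nested dictionaries."""
    def rec(item):
        if isinstance(item[1], list):
            return [item[0], dict(rec(e) for e in item[1:])]
        return item
    return dict([rec(elements)])


def merge(a, b):
    """Hypothetical pairwise helper: merge dict b into a copy of dict a."""
    out = dict(a)
    for key, value in b.items():
        if key not in out:
            out[key] = value
        elif isinstance(out[key], dict) and isinstance(value, dict):
            out[key] = merge(out[key], value)
        else:
            raise TypeError("Multiple non-dictionary values for a key.")
    return out


l1 = ['a', ['b', ['x', '1'], ['y', '2']]]
l2 = ['a', ['c', ['xx', '4'], ['gg', ['m', '3']]]]

# Convert each list, then merge the resulting dictionaries one by one.
combined = reduce(merge, map(recdict, [l1, l2]), {})
print(combined)
```

The same call works for any number of input lists, as long as no two of them assign different plain values to the same key.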
