Merge dictionaries retaining values for duplicate keys [duplicate] - python

This question already has answers here:
How to merge dicts, collecting values from matching keys?
Given n dictionaries, write a function that will return a single dictionary with a list of values for duplicate keys.
Example:
d1 = {'a': 1, 'b': 2}
d2 = {'c': 3, 'b': 4}
d3 = {'a': 5, 'd': 6}
result:
>>> newdict
{'c': 3, 'd': 6, 'a': [1, 5], 'b': [2, 4]}
My code so far:
>>> def merge_dicts(*dicts):
...     x = []
...     for item in dicts:
...         x.append(item)
...     return x
...
>>> merge_dicts(d1, d2, d3)
[{'a': 1, 'b': 2}, {'c': 3, 'b': 4}, {'a': 5, 'd': 6}]
What would be the best way to produce a new dictionary that yields a list of values for those duplicate keys?

Python provides a simple and fast solution to this: the defaultdict in the collections module. From the examples in the documentation:
Using list as the default_factory, it is easy to group a sequence of
key-value pairs into a dictionary of lists:
>>> from collections import defaultdict
>>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
>>> d = defaultdict(list)
>>> for k, v in s:
...     d[k].append(v)
...
>>> sorted(d.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
When each key is encountered for the first time, it is not already in
the mapping; so an entry is automatically created using the
default_factory function which returns an empty list. The
list.append() operation then attaches the value to the new list. When
keys are encountered again, the look-up proceeds normally (returning
the list for that key) and the list.append() operation adds another
value to the list.
In your case, that would be roughly:
import collections

def merge_dicts(*dicts):
    res = collections.defaultdict(list)
    for d in dicts:
        for k, v in d.items():  # d.iteritems() on Python 2
            res[k].append(v)
    return res

>>> merge_dicts(d1, d2, d3)
defaultdict(<class 'list'>, {'a': [1, 5], 'b': [2, 4], 'c': [3], 'd': [6]})
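A small variant of the code above, sketched for Python 3: it returns a plain dict instead of the defaultdict itself. The final dict() conversion is an addition, not part of the original answer; it stops later lookups of missing keys from silently inserting empty lists.

```python
import collections

def merge_dicts(*dicts):
    """Collect every value seen for each key into a list."""
    res = collections.defaultdict(list)
    for d in dicts:
        for k, v in d.items():
            res[k].append(v)
    # Return a plain dict: with a defaultdict, res[missing_key] would
    # quietly create an empty list instead of raising KeyError.
    return dict(res)

d1 = {'a': 1, 'b': 2}
d2 = {'c': 3, 'b': 4}
d3 = {'a': 5, 'd': 6}
merged = merge_dicts(d1, d2, d3)
print(merged)  # {'a': [1, 5], 'b': [2, 4], 'c': [3], 'd': [6]}
```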

def merge_dicts(*dicts):
    d = {}
    for dct in dicts:  # avoid shadowing the built-in name dict
        for key in dct:
            try:
                d[key].append(dct[key])
            except KeyError:
                d[key] = [dct[key]]
    return d
This returns:
{'a': [1, 5], 'b': [2, 4], 'c': [3], 'd': [6]}
There is a slight difference from the question: here all dictionary values are lists. If you don't want single-element lists, add:
for key in d:
    if len(d[key]) == 1:
        d[key] = d[key][0]
before the return d statement. However, I cannot really imagine when you would want to remove the list. (Consider the situation where you have lists as values; then removing the list around the items leads to ambiguous situations.)
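Putting the grouping and the unwrapping step together gives exactly the output the question asks for. This is a sketch combining the defaultdict approach with the single-element unwrapping described above; note the ambiguity just mentioned if some original values are themselves lists.

```python
from collections import defaultdict

def merge_dicts(*dicts):
    """Merge dicts: keys seen once keep their bare value, repeated keys get a list."""
    grouped = defaultdict(list)
    for d in dicts:
        for key, value in d.items():
            grouped[key].append(value)
    # Unwrap single-element lists so non-duplicated keys keep their original value.
    return {key: values[0] if len(values) == 1 else values
            for key, values in grouped.items()}

d1 = {'a': 1, 'b': 2}
d2 = {'c': 3, 'b': 4}
d3 = {'a': 5, 'd': 6}
print(merge_dicts(d1, d2, d3))  # {'a': [1, 5], 'b': [2, 4], 'c': 3, 'd': 6}
```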

Related

Convert each list element into a nested dictionary key

There is this list of strings that I need to use to create a nested dictionary with some values: ['C/A', 'C/B/A', 'C/B/B']
The output will be in the format {'C': {'A': [1, 2, 3], 'B': {'A': [1, 2, 3], 'B': [1, 2, 3]}}}
I've tried to use the code below to create the nested dictionary and update the value, but instead I get {'C': {'A': [1, 2, 3], 'C': {'B': {'A': [1, 2, 3], 'C': {'B': {'B': [1, 2, 3]}}}}}} as the output, which is not the correct format. I'm still trying to figure out a way. Any ideas?
s = ['C/A', 'C/B/A', 'C/B/B']
new = current = dict()
for each in s:
    lst = each.split('/')
    for i in range(len(lst)):
        current[lst[i]] = dict()
        if i != len(lst)-1:
            current = current[lst[i]]
        else:
            current[lst[i]] = [1,2,3]
print(new)
You can create a custom Tree class:
class Tree(dict):
    '''
    Create arbitrarily nested dicts.
    >>> t = Tree()
    >>> t[1][2][3] = 4
    >>> t
    {1: {2: {3: 4}}}
    >>> t.set_nested_item('a', 'b', 'c', value=5)
    >>> t
    {1: {2: {3: 4}}, 'a': {'b': {'c': 5}}}
    '''
    def __missing__(self, key):
        self[key] = type(self)()
        return self[key]

    def set_nested_item(self, *keys, value):
        head, *rest = keys
        if not rest:
            self[head] = value
        else:
            self[head].set_nested_item(*rest, value=value)
>>> s = ['C/A', 'C/B/A', 'C/B/B']
>>> output = Tree()
>>> default = [1, 2, 3]
>>> for item in s:
...     output.set_nested_item(*item.split('/'), value=list(default))
...
>>> output
{'C': {'A': [1, 2, 3], 'B': {'A': [1, 2, 3], 'B': [1, 2, 3]}}}
You do not need numpy for this problem, but you may want to use recursion. Here is a recursive function add that walks the list of string keys lst down through the dictionary d and stores a list of numbers at the last key:
def add(lst, d):
    key = lst[0]
    if len(lst) == 1:  # if the list has only 1 element
        d[key] = [1, 2, 3]  # That element is the last key
        return
    if key not in d:  # Haven't seen that key before
        d[key] = dict()
    add(lst[1:], d[key])  # The recursive part
To use the function, create a new dictionary and apply the function to each split string:
d = dict()
for each in s:
    add(each.split("/"), d)
# d
# {'C': {'A': [1, 2, 3], 'B': {'A': [1, 2, 3], 'B': [1, 2, 3]}}}
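For comparison, here is an iterative sketch (not taken from either answer) that walks each path with dict.setdefault instead of recursion. The function name build_tree and its leaf_factory parameter are hypothetical. It assumes no path is a prefix of another path's leaf.

```python
def build_tree(paths, leaf_factory=lambda: [1, 2, 3]):
    """Build a nested dict from 'A/B/C'-style paths, iteratively."""
    tree = {}
    for path in paths:
        *parents, leaf = path.split('/')
        node = tree
        for key in parents:
            # Reuse the existing sub-dict, or create it the first time.
            node = node.setdefault(key, {})
        # A fresh list per leaf, so leaves never share one mutable object.
        node[leaf] = leaf_factory()
    return tree

s = ['C/A', 'C/B/A', 'C/B/B']
print(build_tree(s))
# {'C': {'A': [1, 2, 3], 'B': {'A': [1, 2, 3], 'B': [1, 2, 3]}}}
```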

How to convert a list to a dict and concatenate values?

I have a list with schema as shown below:
list=[('a',2),('b',4),('a',1),('c',6)]
What I would like to do is convert it to a dict, using the first value of each pair as the key. I would also like pairs with the same key to be concatenated. For the above, the result would be:
dict={ 'a':[2,1] , 'b':[4] , 'c':[6] }
I don't care about the order of the concatenated values, meaning we could also have 'a':[1,2].
How could this be done in python?
Do this:
l = [('a',2),('b',4),('a',1),('c',6)]
d = {}
for item in l:
    if item[0] in d:
        d[item[0]].append(item[1])
    else:
        d[item[0]] = [item[1]]
print(d) # {'a': [2, 1], 'b': [4], 'c': [6]}
To make it cleaner, you could use defaultdict and unpack each pair directly in the for loop:
from collections import defaultdict
l = [('a',2),('b',4),('a',1),('c',6)]
d = defaultdict(list)
for key, val in l:
    d[key].append(val)
print(dict(d)) # {'a': [2, 1], 'b': [4], 'c': [6]}
You can also use the setdefault method on dict to set a list as the default entry if a key is not in the dictionary yet:
l=[('a',2),('b',4),('a',1),('c',6)]
d = {}
for k, v in l:
    d.setdefault(k, []).append(v)
d
{'a': [2, 1], 'b': [4], 'c': [6]}
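A different technique, itertools.groupby, also works, sketched here as an alternative rather than a recommendation: groupby only merges adjacent equal keys, so the pairs must be sorted by key first (which assumes the keys are mutually comparable and costs O(n log n)).

```python
from itertools import groupby
from operator import itemgetter

pairs = [('a', 2), ('b', 4), ('a', 1), ('c', 6)]

# Sort by key so equal keys are adjacent; sorted() is stable, so each
# key's values keep their original relative order.
by_key = sorted(pairs, key=itemgetter(0))
d = {k: [v for _, v in group] for k, group in groupby(by_key, key=itemgetter(0))}
print(d)  # {'a': [2, 1], 'b': [4], 'c': [6]}
```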

Python nested dict comprehension

I have a dictionary:
my_dict = {'a': [1, 2, 3], 'c': 3, 'b': 2}
And I want a comprehension like add_dict = (x x +1 for x my_dict)
what would be the best approach to take when writing a comprehension to deal with keys with multiple values?
So the output would look like {'a': [2, 3, 4], 'c': 4, 'b':3} or maybe I might want to only +1 to values 1 and 2 of each key, keys 'b' and 'c' ... would be skipped.
I tried this (the first two lines are kind of redundant; I was messing about):
my_dict = {'a': [1, 2, 3], 'b': 2, 'c': 3}
D = {x: y for (x, y) in zip(my_dict.keys(), my_dict.values())}
test = (v for v in D.values())
for x in test:
    try:
        if len(x):
            for i in x:
                print i +1
    except:
        print x +1
if __name__ == '__main__':
    main()
output was
2
3
4
object of type 'int' has no len()
object of type 'int' has no len()
I was trying to find a more elegant way of doing this that worked using comprehensions.
Here's a one-liner for you (in Python 3), assuming the dictionary values never become double-nested:
>>> {k:([x+1 for x in v] if not isinstance(v,int) else v+1) for k,v in my_dict.items()}
{'a': [2, 3, 4], 'b': 3, 'c': 4}
On Python 2.7 you can replace my_dict.items() with my_dict.iteritems() to avoid building an intermediate list.
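If the values might be nested more deeply, a small recursive helper lifts the "never double-nested" assumption. This is a sketch; the helper name bump and its step parameter are hypothetical, and it assumes every leaf is a number and every container is a list.

```python
def bump(value, step=1):
    """Add `step` to a number, or recursively to every number inside a list."""
    if isinstance(value, list):
        return [bump(item, step) for item in value]
    return value + step

my_dict = {'a': [1, 2, 3], 'b': 2, 'c': 3}
print({k: bump(v) for k, v in my_dict.items()})
# {'a': [2, 3, 4], 'b': 3, 'c': 4}

nested = {'a': [1, [2, 3]], 'b': 2}
print({k: bump(v) for k, v in nested.items()})
# {'a': [2, [3, 4]], 'b': 3}
```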

How do I build a dict using list comprehension?

I have two lists.
series = [1,2,3,4,5]
categories = ['A', 'B', 'A', 'C','B']
I want to build a dict where the categories are the keys.
Thanks for your answers. I'm looking to produce:
{'A' : [1, 3], 'B' : [2, 5], 'C' : [4]}
because the keys can't exist twice.
You have to have a list of tuples. The tuples are key/value pairs. You don't need a comprehension in this case, just zip:
dict(zip(categories, series))
Produces {'A': 3, 'B': 5, 'C': 4} (as pointed out by comments)
Edit: After looking at the keys, note that you can't have duplicate keys in a dictionary. So without further clarifying what you want, I'm not sure what solution you're looking for.
Edit: To get what you want, it's probably easiest to just do a for loop with either setdefault or a defaultdict.
categoriesMap = {}
for k, v in zip(categories, series):
    categoriesMap.setdefault(k, []).append(v)
That should produce {'A': [1, 3], 'B': [2, 5], 'C': [4]}
from collections import defaultdict
series = [1,2,3,4,5]
categories = ['A', 'B', 'A', 'C','B']
result = defaultdict(list)
for key, val in zip(categories, series):
    result[key].append(val)
Rather than being clever (I have an itertools solution I'm fond of) there's nothing wrong with a good, old-fashioned for loop:
>>> from collections import defaultdict
>>>
>>> series = [1,2,3,4,5]
>>> categories = ['A', 'B', 'A', 'C','B']
>>>
>>> d = defaultdict(list)
>>> for c,s in zip(categories, series):
...     d[c].append(s)
...
>>> d
defaultdict(<class 'list'>, {'A': [1, 3], 'B': [2, 5], 'C': [4]})
This doesn't use a list comprehension because a list comprehension is the wrong way to do it. But since you seem to really want one for some reason, how about:
>>> dict([(c0, [s for (c,s) in zip(categories, series) if c == c0]) for c0 in categories])
{'A': [1, 3], 'B': [2, 5], 'C': [4]}
That has not one but two list comprehensions, and is very inefficient to boot.
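If a comprehension is required anyway, one cheap improvement is to iterate over the distinct categories only. This sketch uses dict.fromkeys() to de-duplicate while keeping first-seen order; it is still O(n·k) for n items and k categories, so the for-loop versions remain preferable.

```python
series = [1, 2, 3, 4, 5]
categories = ['A', 'B', 'A', 'C', 'B']

pairs = list(zip(categories, series))
# dict.fromkeys() yields each category once, in first-seen order, so the
# inner list comprehension runs once per distinct key, not once per item.
d = {c0: [s for c, s in pairs if c == c0] for c0 in dict.fromkeys(categories)}
print(d)  # {'A': [1, 3], 'B': [2, 5], 'C': [4]}
```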
In principle you can do as Kris suggested, dict(zip(categories, series)), but be aware that a dict cannot hold duplicate keys: a repeated category (as in your sample data) overwrites the earlier value.
EDIT :
Now that you've clarified what you intended, this will work as expected:
from collections import defaultdict
d = defaultdict(list)
for k, v in zip(categories, series):
    d[k].append(v)
d = {k: [] for k in categories}
# map() is lazy in Python 3, so list() is needed to actually run the appends
list(map(lambda k, v: d[k].append(v), categories, series))
Result:
d is now {'A': [1, 3], 'B': [2, 5], 'C': [4]}
or (equivalently) using setdefault (thanks Kris R.):
d = {}
list(map(lambda k, v: d.setdefault(k, []).append(v), categories, series))

List of dicts to/from dict of lists

I want to change back and forth between a dictionary of (equal-length) lists:
DL = {'a': [0, 1], 'b': [2, 3]}
and a list of dictionaries:
LD = [{'a': 0, 'b': 2}, {'a': 1, 'b': 3}]
For those of you that enjoy clever/hacky one-liners.
Here is DL to LD:
v = [dict(zip(DL,t)) for t in zip(*DL.values())]
print(v)
and LD to DL:
v = {k: [dic[k] for dic in LD] for k in LD[0]}
print(v)
LD to DL is a little hackier since you are assuming that the keys are the same in each dict. Also, please note that I do not condone the use of such code in any kind of real system.
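Wrapping the two one-liners in functions makes the round trip easy to check. The names to_list_of_dicts and to_dict_of_lists are hypothetical; this sketch assumes equal-length lists (zip silently truncates to the shortest) and a non-empty LD whose dicts all share the first dict's keys.

```python
def to_list_of_dicts(dl):
    # zip(*dl.values()) yields one tuple per position; zip(dl, t) pairs keys with it.
    return [dict(zip(dl, t)) for t in zip(*dl.values())]

def to_dict_of_lists(ld):
    # Keys come from the first dict, so every dict must share them.
    return {k: [dic[k] for dic in ld] for k in ld[0]}

DL = {'a': [0, 1], 'b': [2, 3]}
LD = to_list_of_dicts(DL)
print(LD)                    # [{'a': 0, 'b': 2}, {'a': 1, 'b': 3}]
print(to_dict_of_lists(LD))  # {'a': [0, 1], 'b': [2, 3]}
```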
If you're allowed to use outside packages, Pandas works great for this:
import pandas as pd
pd.DataFrame(DL).to_dict(orient="records")
Which outputs:
[{'a': 0, 'b': 2}, {'a': 1, 'b': 3}]
Going the other way, pd.DataFrame(LD).to_dict(orient="list") gets back the original structure:
{'a': [0, 1], 'b': [2, 3]}
Perhaps consider using numpy:
import numpy as np
arr = np.array([(0, 2), (1, 3)], dtype=[('a', int), ('b', int)])
print(arr)
# [(0, 2) (1, 3)]
Here we access columns indexed by names, e.g. 'a', or 'b' (sort of like DL):
print(arr['a'])
# [0 1]
Here we access rows by integer index (sort of like LD):
print(arr[0])
# (0, 2)
Each value in the row can be accessed by column name (sort of like LD):
print(arr[0]['b'])
# 2
To go from the list of dictionaries, it is straightforward:
You can use this form:
DL={'a':[0,1],'b':[2,3], 'c':[4,5]}
LD=[{'a':0,'b':2, 'c':4},{'a':1,'b':3, 'c':5}]
nd={}
for d in LD:
    for k,v in d.items():
        try:
            nd[k].append(v)
        except KeyError:
            nd[k]=[v]
print(nd)
#{'a': [0, 1], 'b': [2, 3], 'c': [4, 5]}
Or use defaultdict:
import collections as cl
nd=cl.defaultdict(list)
for d in LD:
    for key,val in d.items():
        nd[key].append(val)
print(dict(nd))
#{'a': [0, 1], 'b': [2, 3], 'c': [4, 5]}
Going the other way is problematic. You need some information about the order in which the dictionary's keys go into the list. Before Python 3.7, the order of keys in a dict was not guaranteed to match the original insertion order.
For giggles, assume the insertion order is based on sorted keys. You can then do it this way:
nl=[]
nl_index=[]
for k in sorted(DL.keys()):
    nl.append({k:[]})
    nl_index.append(k)
for key,l in DL.items():
    for item in l:
        nl[nl_index.index(key)][key].append(item)
print(nl)
#[{'a': [0, 1]}, {'b': [2, 3]}, {'c': [4, 5]}]
If your question was based on curiosity, there is your answer. If you have a real-world problem, let me suggest you rethink your data structures. Neither of these seems to be a very scalable solution.
Here are the one-line solutions (spread out over multiple lines for readability) that I came up with:
If dl is your original dict of lists:
dl = {"a":[0, 1],"b":[2, 3]}
Then here's how to convert it to a list of dicts:
ld = [{key: value[index] for key, value in dl.items() if index < len(value)}
      for index in range(max(map(len, dl.values())))]
Which, if you assume that all your lists are the same length, you can simplify and gain a performance increase by going to:
ld = [{key: value[index] for key, value in dl.items()}
      for index in range(len(next(iter(dl.values()))))]
Here's how to convert that back into a dict of lists:
import functools
dl2 = {key: [item[key] for item in ld if key in item]
       for key in list(functools.reduce(
           lambda x, y: x.union(y),
           (set(dicts.keys()) for dicts in ld)
       ))
      }
If you're using Python 2 instead of Python 3, you can just use reduce instead of functools.reduce there.
You can simplify this if you assume that all the dicts in your list will have the same keys:
dl2 = {key:[item[key] for item in ld] for key in ld[0].keys() }
cytoolz.dicttoolz.merge_with
from cytoolz.dicttoolz import merge_with
merge_with(list, *LD)
{'a': [0, 1], 'b': [2, 3]}
Non-cython version
from toolz.dicttoolz import merge_with
merge_with(list, *LD)
{'a': [0, 1], 'b': [2, 3]}
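Without installing toolz, a few lines of standard library reproduce the behavior shown above for these inputs. This is a hypothetical stand-in written for illustration, not the toolz implementation: it groups values by key, then applies func to each group.

```python
from collections import defaultdict

def merge_with(func, *dicts):
    """Stdlib-only sketch: group values by key, then apply func to each list."""
    grouped = defaultdict(list)
    for d in dicts:
        for k, v in d.items():
            grouped[k].append(v)
    return {k: func(vs) for k, vs in grouped.items()}

LD = [{'a': 0, 'b': 2}, {'a': 1, 'b': 3}]
print(merge_with(list, *LD))  # {'a': [0, 1], 'b': [2, 3]}
print(merge_with(sum, *LD))   # {'a': 1, 'b': 5}
```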
The pandas library gives you an easy-to-understand solution. As a complement to @chiang's answer, both the DL-to-LD and LD-to-DL conversions are as follows:
import pandas as pd
DL = {'a': [0, 1], 'b': [2, 3]}
out1 = pd.DataFrame(DL).to_dict('records')
Output:
[{'a': 0, 'b': 2}, {'a': 1, 'b': 3}]
In the other direction:
LD = [{'a': 0, 'b': 2}, {'a': 1, 'b': 3}]
out2 = pd.DataFrame(LD).to_dict('list')
Output:
{'a': [0, 1], 'b': [2, 3]}
Cleanest way I can think of on a summer Friday. As a bonus, it supports lists of different lengths (but in that case, DLtoLD(LDtoDL(l)) is no longer the identity).
From list to dict
Actually less clean than @dwerk's defaultdict version.
def LDtoDL(l):
    result = {}
    for d in l:
        for k, v in d.items():
            result[k] = result.get(k, []) + [v]  # inefficient
    return result
From dict to list
def DLtoLD(d):
    if not d:
        return []
    # reserve as many *distinct* dicts as the longest sequence
    result = [{} for i in range(max(map(len, d.values())))]
    # fill each dict, one key at a time
    for k, seq in d.items():
        for oneDict, oneValue in zip(result, seq):
            oneDict[k] = oneValue
    return result
I needed such a method which works for lists of different lengths (so this is a generalization of the original question). Since I did not find any code here that works the way I expected, here's my code, which works for me:
import itertools
from typing import Dict, List, TypeVar

S = TypeVar("S")
T = TypeVar("T")

def dict_of_lists_to_list_of_dicts(dict_of_lists: Dict[S, List[T]]) -> List[Dict[S, T]]:
    keys = list(dict_of_lists.keys())
    list_of_values = [dict_of_lists[key] for key in keys]
    product = list(itertools.product(*list_of_values))
    return [dict(zip(keys, product_elem)) for product_elem in product]
Examples:
>>> dict_of_lists_to_list_of_dicts({1: [3], 2: [4, 5]})
[{1: 3, 2: 4}, {1: 3, 2: 5}]
>>> dict_of_lists_to_list_of_dicts({1: [3, 4], 2: [5]})
[{1: 3, 2: 5}, {1: 4, 2: 5}]
>>> dict_of_lists_to_list_of_dicts({1: [3, 4], 2: [5, 6]})
[{1: 3, 2: 5}, {1: 3, 2: 6}, {1: 4, 2: 5}, {1: 4, 2: 6}]
>>> dict_of_lists_to_list_of_dicts({1: [3, 4], 2: [5, 6], 7: [8, 9, 10]})
[{1: 3, 2: 5, 7: 8},
{1: 3, 2: 5, 7: 9},
{1: 3, 2: 5, 7: 10},
{1: 3, 2: 6, 7: 8},
{1: 3, 2: 6, 7: 9},
{1: 3, 2: 6, 7: 10},
{1: 4, 2: 5, 7: 8},
{1: 4, 2: 5, 7: 9},
{1: 4, 2: 5, 7: 10},
{1: 4, 2: 6, 7: 8},
{1: 4, 2: 6, 7: 9},
{1: 4, 2: 6, 7: 10}]
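As the examples show, itertools.product yields every combination of values, which is a different operation from the position-by-position transposition in the original question. This short sketch puts the two side by side on the same input.

```python
import itertools

dl = {1: [3, 4], 2: [5, 6]}
keys = list(dl)

# itertools.product: every combination of values (Cartesian product).
combos = [dict(zip(keys, vals)) for vals in itertools.product(*dl.values())]
print(combos)  # [{1: 3, 2: 5}, {1: 3, 2: 6}, {1: 4, 2: 5}, {1: 4, 2: 6}]

# zip: pair values by position (the DL -> LD transposition).
rows = [dict(zip(keys, vals)) for vals in zip(*dl.values())]
print(rows)    # [{1: 3, 2: 5}, {1: 4, 2: 6}]
```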
Here is my small script:
a = {'a': [0, 1], 'b': [2, 3]}
elem = {}
result = []
for i in range(len(a['a'])):  # (1)
    for key, value in a.items():
        elem[key] = value[i]
    result.append(elem)
    elem = {}
print(result)
I'm not sure this is the most elegant way.
(1) This assumes that all the lists have the same length.
Here is a solution without any libraries used:
def dl_to_ld(initial):
    finalList = []
    neededLen = 0
    for key in initial:
        if len(initial[key]) > neededLen:
            neededLen = len(initial[key])
    for i in range(neededLen):
        finalList.append({})
    for i in range(len(finalList)):
        for key in initial:
            try:
                finalList[i][key] = initial[key][i]
            except IndexError:  # this key's list is shorter; skip it
                pass
    return finalList
You can call it as follows:
dl = {'a':[0,1],'b':[2,3]}
print(dl_to_ld(dl))
#[{'a': 0, 'b': 2}, {'a': 1, 'b': 3}]
If you don't mind a generator, you can use something like
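def f(dl):
    iters = [(k, iter(v)) for k, v in dl.items()]
    while True:
        try:
            # next() raises StopIteration as soon as any list runs out
            d = {k: next(i) for k, i in iters}
        except StopIteration:
            return
        if not d:  # dl had no keys at all
            return
        yield d
It's not as "clean" as it could be for technical reasons: my original implementation did yield dict(...), but this ends up being the empty dictionary because (in Python 2.5) a for b in c does not distinguish between a StopIteration exception when iterating over c and a StopIteration exception when evaluating a. In Python 3.7+ (PEP 479), a StopIteration escaping a generator expression is turned into a RuntimeError, so the version above catches StopIteration explicitly.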
On the other hand, I can't work out what you're actually trying to do; it might be more sensible to design a data structure that meets your requirements instead of trying to shoehorn it in to the existing data structures. (For example, a list of dicts is a poor way to represent the result of a database query.)
List of dicts ⟶ dict of lists
from collections import defaultdict
from typing import TypeVar
K = TypeVar("K")
V = TypeVar("V")
def ld_to_dl(ld: list[dict[K, V]]) -> dict[K, list[V]]:
    dl = defaultdict(list)
    for d in ld:
        for k, v in d.items():
            dl[k].append(v)
    return dl
defaultdict creates an empty list if one does not exist upon key access.
Dict of lists ⟶ list of dicts
Collecting into "jagged" dictionaries
from typing import TypeVar
K = TypeVar("K")
V = TypeVar("V")
def dl_to_ld(dl: dict[K, list[V]]) -> list[dict[K, V]]:
    ld = []
    for k, vs in dl.items():
        ld += [{} for _ in range(len(vs) - len(ld))]
        for i, v in enumerate(vs):
            ld[i][k] = v
    return ld
This generates a list of dictionaries ld that may be missing items if the lengths of the lists in dl are unequal. It loops over all key-values in dl, and creates empty dictionaries if ld does not have enough.
Collecting into "complete" dictionaries only
(Usually intended only for equal-length lists.)
from typing import TypeVar
K = TypeVar("K")
V = TypeVar("V")
def dl_to_ld(dl: dict[K, list[V]]) -> list[dict[K, V]]:
    ld = [dict(zip(dl.keys(), v)) for v in zip(*dl.values())]
    return ld
This generates a list of dictionaries ld whose length equals that of the shortest list in dl, because zip stops at its shortest input.
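If you want complete dictionaries but also want to keep every position of the longest list, itertools.zip_longest can pad the shorter lists. This is a sketch; the function name dl_to_ld_filled and the fill parameter are hypothetical, and the filler value (None by default) is an assumption you may want to change.

```python
from itertools import zip_longest

def dl_to_ld_filled(dl, fill=None):
    """Like the zip version, but pads shorter lists with `fill`, so every
    dict has every key and the result is as long as the longest list."""
    keys = list(dl)
    return [dict(zip(keys, vals))
            for vals in zip_longest(*dl.values(), fillvalue=fill)]

print(dl_to_ld_filled({'a': [0, 1, 2], 'b': [3]}))
# [{'a': 0, 'b': 3}, {'a': 1, 'b': None}, {'a': 2, 'b': None}]
```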
DL={'a':[0,1,2,3],'b':[2,3,4,5]}
LD=[{'a':0,'b':2},{'a':1,'b':3}]
Empty_list = []
Empty_dict = {}
# find the length of the longest list among the dictionary's values
len_list = 0
for i in DL.values():
    if len_list < len(i):
        len_list = len(i)
# assumes all lists have this length; a shorter list would raise IndexError
for k in range(len_list):
    for i,j in DL.items():
        Empty_dict[i] = j[k]
    Empty_list.append(Empty_dict)
    Empty_dict = {}
LD = Empty_list
