How can I create a dictionary of dictionaries of Unique Key Pairs? - python

I have a pandas dataframe with columns IssuerID and Sedol. The IssuerIDs will be the same in some instances, but the Sedol will always be different.
I would like to create a dictionary or multi-level index that aggregates these such that I can easily traverse them. For instance I currently have:
IssuerID Sedol
1 1
1 2
2 3
3 4
And I want to somehow create:
[{1: [1,2]},{2: 3},{3:4}]

If you do groupby + apply(list) and then call to_dict, you get:
d = df.groupby("IssuerID")["Sedol"].apply(list).to_dict()
print(d)
#{1: [1, 2], 2: [3], 3: [4]}
Now just reformat d to get your desired output.
If you want a dictionary, use a dict comprehension:
new_dict = {k: v if len(v) > 1 else v[0] for k, v in d.items()}
print(new_dict)
#{1: [1, 2], 2: 3, 3: 4}
If you want a list of dictionaries, use a list comprehension:
new_list = [{k: v if len(v) > 1 else v[0]} for k, v in d.items()]
print(new_list)
#[{1: [1, 2]}, {2: 3}, {3: 4}]
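If pandas isn't actually needed, the same grouping can be sketched with the standard library. This assumes the rows are available as plain (IssuerID, Sedol) tuples; `rows` is a hypothetical stand-in for the DataFrame's two columns.

```python
from collections import defaultdict

# Hypothetical input: (IssuerID, Sedol) pairs standing in for the DataFrame rows.
rows = [(1, 1), (1, 2), (2, 3), (3, 4)]

# Group every Sedol under its IssuerID; keys keep first-seen order.
grouped = defaultdict(list)
for issuer_id, sedol in rows:
    grouped[issuer_id].append(sedol)

# Unwrap single-element lists to match the requested shape.
result = [{k: v if len(v) > 1 else v[0]} for k, v in grouped.items()]
print(result)  # [{1: [1, 2]}, {2: 3}, {3: 4}]
```

With an actual DataFrame, `zip(df["IssuerID"], df["Sedol"])` yields pairs of this form.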

Using groupby
l = [{x: y.tolist()} for x, y in df.groupby('IssuerID')['Sedol']]
l
[{1: [1, 2]}, {2: [3]}, {3: [4]}]

Related

How to return a string containing information about how many values exist for each key

I currently have the following:
mydict = {'a': [1, 2, 3], 'b': [1, 2]}
I want to return a string containing quantities of items available in the dictionary. For example
a: 3
b: 2
However, I want my output to update if I add another key value pair to the dictionary. For example mydict['c'] = [1, 2, 3]
I have thought about how to do this and this is all that comes to mind:
def quantities() -> str:
    mydict = {'a': [1, 2, 3], 'b': [1, 2]}
    for k, v in mydict:
        print(f'{k}: {len(v)}')
But I am not sure if this is correct. Are there any other ways to do this?
The statement:
for <variable> in mydict:
Iterates through only the keys of the dictionary. So, you can either use the key to get the item like:
mydict = {'a': [1, 2, 3], 'b': [1, 2]}
for k in mydict:
    print(f'{k}: {len(mydict[k])}')
Or use mydict.items(), which iterates through every (key, value) pair. Use it as:
mydict = {'a': [1, 2, 3], 'b': [1, 2]}
for k, v in mydict.items():
    print(f'{k}: {len(v)}')
I don't think your sample code will work. I used this documentation and sorted(); I think what you want is something like this:
mydict = {'a': [1, 2, 3, 4], 'b': [1, 2]}
def quantities():
    for k, v in sorted(mydict.items()):
        print(k, len(v))
quantities()
You can do this with str.join and a generator expression:
def quantities(mydict):
    return '\n'.join('{}: {}'.format(k, len(v)) for k, v in mydict.items())
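A quick check that the join-based function picks up keys added later. Because the dictionary is passed in and read at call time rather than hard-coded inside the function, the output follows the current contents:

```python
def quantities(mydict):
    # One "key: count" line per entry, in insertion order.
    return '\n'.join('{}: {}'.format(k, len(v)) for k, v in mydict.items())

mydict = {'a': [1, 2, 3], 'b': [1, 2]}
print(quantities(mydict))
# a: 3
# b: 2

mydict['c'] = [1, 2, 3]
print(quantities(mydict))
# a: 3
# b: 2
# c: 3
```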

How to convert a list to a dict and concatenate values?

I have a list with schema as shown below:
list=[('a',2),('b',4),('a',1),('c',6)]
What I would like to do is convert it to a dict using the first value of each pair as the key. I would also like pairs with the same key to be concatenated. For the above, the result would be:
dict={ 'a':[2,1] , 'b':[4] , 'c':[6] }
I don't care about the order of the concatenated values, meaning we could also have 'a':[1,2].
How could this be done in python?
Do this:
l = [('a',2),('b',4),('a',1),('c',6)]
d = {}
for item in l:
    if item[0] in d:
        d[item[0]].append(item[1])
    else:
        d[item[0]] = [item[1]]
print(d)  # {'a': [2, 1], 'b': [4], 'c': [6]}
To make it cleaner, you could use defaultdict and unpack each pair into two loop variables:
from collections import defaultdict
l = [('a',2),('b',4),('a',1),('c',6)]
d = defaultdict(list)
for key, val in l:
    d[key].append(val)
print(dict(d))  # {'a': [2, 1], 'b': [4], 'c': [6]}
You can also use the setdefault method on dict to set a list as the default entry if a key is not in the dictionary yet:
l=[('a',2),('b',4),('a',1),('c',6)]
d = {}
for k, v in l:
    d.setdefault(k, []).append(v)
d
{'a': [2, 1], 'b': [4], 'c': [6]}
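Another option, sketched here with itertools.groupby. It only groups *consecutive* items with equal keys, so the list has to be sorted by key first. sorted() is stable, so values for a key keep their original relative order, but the keys come out in sorted order rather than first-seen order.

```python
from itertools import groupby
from operator import itemgetter

l = [('a', 2), ('b', 4), ('a', 1), ('c', 6)]

# Sort by key so equal keys become adjacent (stable: 2 stays before 1).
pairs = sorted(l, key=itemgetter(0))
d = {k: [v for _, v in grp] for k, grp in groupby(pairs, key=itemgetter(0))}
print(d)  # {'a': [2, 1], 'b': [4], 'c': [6]}
```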

Merge two dictionaries and keep the values for duplicate keys in Python [duplicate]

This question already has answers here:
How to merge dicts, collecting values from matching keys?
Let's suppose that I have two dictionaries:
dic1 = { "first":1, "second":4, "third":8}
dic2 = { "first":9, "second":5, "fourth":3}
Is there a straightforward way to obtain something like the below?
dic3 = { "first":[1,9], "second":[4,5], "third":[8], "fourth":[3]}
I used lists to store values, but tuples are fine as well.
You can use a defaultdict to hold lists, and then just append the values to them. This approach easily extends to an arbitrary number of dictionaries.
from collections import defaultdict
dd = defaultdict(list)
dics = [dic1, dic2]
for dic in dics:
    for key, val in dic.items():  # .iteritems() in Python 2.
        dd[key].append(val)
>>> dict(dd)
{'first': [1, 9], 'fourth': [3], 'second': [4, 5], 'third': [8]}
All of the keys with a single value are still held within a list, which is probably the best way to go. You could, however, change anything of length one into the actual value, e.g.
for key, val in dd.items():  # .iteritems() in Python 2.
    if len(val) == 1:
        dd[key] = val[0]
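Instead of rewriting `dd` in place, the flattening can also be done as a separate comprehension, which leaves the list-valued version intact for later use. A minimal sketch with the question's two dictionaries:

```python
from collections import defaultdict

dic1 = {"first": 1, "second": 4, "third": 8}
dic2 = {"first": 9, "second": 5, "fourth": 3}

dd = defaultdict(list)
for dic in (dic1, dic2):
    for key, val in dic.items():
        dd[key].append(val)

# Build a new dict; single-item lists become bare values, dd is untouched.
flat = {k: v[0] if len(v) == 1 else v for k, v in dd.items()}
print(flat)  # {'first': [1, 9], 'second': [4, 5], 'third': 8, 'fourth': 3}
```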
Here's a naive solution: copy one of the dictionaries into the result, then iterate over the other dictionary's keys and values, adding lists to the result as necessary. Since there are only two dictionaries, no merged list will have more than 2 items.
dic1 = {"first": 1, "second": 4, "third": 8}
dic2 = {"first": 9, "second": 5, "fourth": 3}
dic3 = dict(dic2)
for k, v in dic1.items():
    dic3[k] = [dic3[k], v] if k in dic3 else v
print(dic3) # => {'first': [9, 1], 'second': [5, 4], 'fourth': 3, 'third': 8}
If you'd like single values to be lists (likely better design; mixed types aren't much fun to deal with) you can use:
dic3 = {k: [v] for k, v in dic2.items()}
for k, v in dic1.items():
    dic3[k] = dic3[k] + [v] if k in dic3 else [v]
print(dic3) # => {'first': [9, 1], 'second': [5, 4], 'fourth': [3], 'third': [8]}
Generalizing it to any number of dictionaries:
def merge_dicts(*dicts):
    """
    >>> merge_dicts({"a": 2}, {"b": 4, "a": 3}, {"a": 1})
    {'a': [2, 3, 1], 'b': [4]}
    """
    merged = {}
    for d in dicts:
        for k, v in d.items():
            if k not in merged:
                merged[k] = []
            merged[k].append(v)
    return merged
You can use collections.defaultdict to clean it up a bit if you don't mind the import:
from collections import defaultdict
def merge_dicts(*dicts):
    """
    >>> merge_dicts({"a": 2}, {"b": 4, "a": 3}, {"a": 1})
    defaultdict(<class 'list'>, {'a': [2, 3, 1], 'b': [4]})
    """
    merged = defaultdict(list)
    for d in dicts:
        for k, v in d.items():
            merged[k].append(v)
    return merged
Given:
dic1 = { "first":1, "second":4, "third":8}
dic2 = { "first":9, "second":5, "fourth":3}
You can use .setdefault:
dic_new = {}
for k, v in list(dic1.items()) + list(dic2.items()):
    dic_new.setdefault(k, []).append(v)
else:
    dic_new = {k: v if len(v) > 1 else v[0] for k, v in dic_new.items()}
>>> dic_new
{'first': [1, 9], 'second': [4, 5], 'third': 8, 'fourth': 3}
This produces the output in the question. I think that flattening single-element lists to a different object type adds unnecessary complexity.
With the edit, this produces the desired result:
dic_new={}
for k, v in list(dic1.items()) + list(dic2.items()):
    dic_new.setdefault(k, []).append(v)
>>> dic_new
{'first': [1, 9], 'second': [4, 5], 'third': [8], 'fourth': [3]}
Using set operations and a dictionary comprehension:
L = [dic1, dic2]
dups = dic1.keys() & dic2.keys()
d = {k: [L[0][k], L[1][k]] if k in dups else i[k] for i in L for k in i}
{'first': [1, 9], 'second': [4, 5], 'third': 8, 'fourth': 3}
In general, I would say it's bad practice to store the values of different keys as different object types. I would simply do something like:
def merge_values(val1, val2):
    if val1 is None:
        return [val2]
    elif val2 is None:
        return [val1]
    else:
        return [val1, val2]
dict3 = {
    key: merge_values(dic1.get(key), dic2.get(key))
    for key in set(dic1).union(dic2)
}
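One caveat with `merge_values` above: `dict.get` returns `None` for a missing key, so a key whose stored value really is `None` is indistinguishable from a missing one. A sketch using a private sentinel object (`_MISSING` is a name made up here) keeps such values:

```python
_MISSING = object()  # hypothetical sentinel; compares unequal to every real value

def merge_values(val1, val2):
    # Keep only the values that were actually present in each dict.
    return [v for v in (val1, val2) if v is not _MISSING]

dic1 = {"first": 1, "second": None, "third": 8}
dic2 = {"first": 9, "second": 5, "fourth": 3}

dict3 = {
    key: merge_values(dic1.get(key, _MISSING), dic2.get(key, _MISSING))
    for key in set(dic1).union(dic2)
}
print(dict3["second"])  # [None, 5]
```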
Create a new dictionary dic whose keys are the keys of dic1 and dic2, each mapped to an empty list, then iterate over dic1 and dic2, appending their values to dic:
dic1 = { "first":1, "second":4, "third":8}
dic2 = { "first":9, "second":5, "fourth":3}
dic = {key:[] for key in list(dic1.keys()) + list(dic2.keys())}
for key in dic1.keys():
    dic[key].append(dic1[key])
for key in dic2.keys():
    dic[key].append(dic2[key])
Solution for a dict of lists (adapted from @dawg's answer):
dic1 = { "first":[1], "second":[4], "third":[8]}
dic2 = { "first":[9], "second":[5], "fourth":[3]}
dic_new={}
for k, v in list(dic1.items()) + list(dic2.items()):
    dic_new.setdefault(k, []).extend(v)
>>> dic_new
{'first': [1, 9], 'second': [4, 5], 'third': [8], 'fourth': [3]}
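A recursive version that also merges nested dictionaries. Values present in only one dict are kept as they are, and falsy values (0, '', None, empty containers) are skipped when two values are combined:
from copy import deepcopy
def _add_value_to_list(value, lis):
    # Skip falsy values; splice lists in, append anything else.
    if value:
        if isinstance(value, list):
            lis.extend(value)
        else:
            lis.append(value)
def _merge_value(value_a, value_b):
    merged_value = []
    _add_value_to_list(value_a, merged_value)
    _add_value_to_list(value_b, merged_value)
    return merged_value
def _deep_update(base, other):
    # Like dict.update, but recurses into nested dicts so keys only in base survive.
    for k, v in other.items():
        if isinstance(v, dict) and isinstance(base.get(k), dict):
            _deep_update(base[k], v)
        else:
            base[k] = v
    return base
def _recursion_merge_dict(new_dic, dic_a, dic_b):
    # Key missing (or falsy) on one side: keep the value already in new_dic.
    if not dic_a or not dic_b:
        return new_dic
    if isinstance(new_dic, dict):
        for k, v in new_dic.items():
            new_dic[k] = _recursion_merge_dict(v, dic_a.get(k, {}), dic_b.get(k, {}))
        return new_dic
    return _merge_value(dic_a, dic_b)
def merge_dicts(dic_a, dic_b):
    # Deep-copy both inputs so neither caller dict is mutated.
    new_dic = _deep_update(deepcopy(dic_a), deepcopy(dic_b))
    return _recursion_merge_dict(new_dic, dic_a, dic_b)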

Delete dictionary keys that are out of bounds (python)

If you have a dictionary with integer keys:
d = {
1:[0],
2:[1],
3:[0,1,2,3,4],
4:[0],
5:[1],
6:[0,1,2,3,4],
11:[0],
22:[1],
33:[0,1,2,3,4],
44:[0],
55:[1],
66:[0,1,2,3,4]
}
You want to:
Validate that the keys are between 0 and 25.
Delete any keys that are outside of the range as they are not valid and will ruin the data set.
Dictionary keys are not naturally sorted.
Given that, how would you validate that the keys are in the required range?
My try:
for x, y in d.items():
    if x < 0 or x > 25:
        del d[x]
When run, I get the error:
RuntimeError: dictionary changed size during iteration
How would I compensate for this?
In your example, you are mutating d while looping through it, which Python does not allow for dictionaries.
The easiest way to do this, if you don't need to modify the original d in place, is to build a new dictionary with a comprehension:
d = {k: v for k, v in d.items() if 0 <= k <= 25}
If you want to delete keys while iterating, you need to iterate over a copy instead and pop keys that don't hold to your condition:
d = {1:[0], 2:[1], 3:[0,1,2,3,4], 4:[0], 5:[1], 6:[0,1,2,3,4], 11:[0], 22:[1], 33:[0,1,2,3,4], 44:[0], 55:[1], 66:[0,1,2,3,4]}
for k in d.copy():  # or list(d)
    if not 0 <= k <= 25:
        d.pop(k)  # or del d[k]
Afterwards, d is:
{1: [0], 2: [1], 3: [0, 1, 2, 3, 4], 4: [0], 5: [1], 6: [0, 1, 2, 3, 4], 11: [0], 22: [1]}
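A small self-contained sketch (with a shortened, hypothetical `d`) that reproduces the error and then applies the copy-of-keys fix side by side:

```python
d = {1: [0], 33: [0, 1], 22: [1], 66: [0]}

# Deleting keys while iterating over d itself fails on the next iteration step.
error = None
try:
    for k in d:
        if not 0 <= k <= 25:
            del d[k]
except RuntimeError as exc:
    error = exc
print(error)  # dictionary changed size during iteration

# Iterating over a snapshot of the keys (list(d)) is safe.
d = {1: [0], 33: [0, 1], 22: [1], 66: [0]}
for k in list(d):
    if not 0 <= k <= 25:
        del d[k]
print(d)  # {1: [0], 22: [1]}
```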
As others have shown, reconstructing a new dictionary is always an easy way around this.
You can use a basic dict comprehension here:
{k: d[k] for k in d if 0 <= k <= 25}
Or even a functional approach with filter():
dict(filter(lambda x: 0 <= x[0] <= 25, d.items()))
You can use a dictionary comprehension:
d = { 1:[0], 2:[1], 3:[0,1,2,3,4], 4:[0], 5:[1], 6:[0,1,2,3,4], 11:[0], 22:[1], 33:[0,1,2,3,4], 44:[0], 55:[1], 66:[0,1,2,3,4] }
new_d = {a:b for a, b in d.items() if a <= 25 and a >= 0}
Output:
{1: [0], 2: [1], 3: [0, 1, 2, 3, 4], 4: [0], 5: [1], 6: [0, 1, 2, 3, 4], 11: [0], 22: [1]}

List of dicts to/from dict of lists

I want to change back and forth between a dictionary of (equal-length) lists:
DL = {'a': [0, 1], 'b': [2, 3]}
and a list of dictionaries:
LD = [{'a': 0, 'b': 2}, {'a': 1, 'b': 3}]
For those of you who enjoy clever/hacky one-liners:
Here is DL to LD:
v = [dict(zip(DL,t)) for t in zip(*DL.values())]
print(v)
and LD to DL:
v = {k: [dic[k] for dic in LD] for k in LD[0]}
print(v)
LD to DL is a little hackier since you are assuming that the keys are the same in each dict. Also, please note that I do not condone the use of such code in any kind of real system.
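A quick round-trip check of the two one-liners on the question's data. For equal-length lists and identical keys in every dict, each conversion undoes the other:

```python
DL = {'a': [0, 1], 'b': [2, 3]}

# DL -> LD: zip the value lists into rows, then pair each row with the keys.
LD = [dict(zip(DL, t)) for t in zip(*DL.values())]
# LD -> DL: assumes every dict has the same keys as LD[0].
DL2 = {k: [dic[k] for dic in LD] for k in LD[0]}

print(LD)   # [{'a': 0, 'b': 2}, {'a': 1, 'b': 3}]
print(DL2)  # {'a': [0, 1], 'b': [2, 3]}
```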
If you're allowed to use outside packages, Pandas works great for this:
import pandas as pd
pd.DataFrame(DL).to_dict(orient="records")
Which outputs:
[{'a': 0, 'b': 2}, {'a': 1, 'b': 3}]
You can also use orient="list" to get back the original structure:
{'a': [0, 1], 'b': [2, 3]}
Perhaps consider using numpy:
import numpy as np
arr = np.array([(0, 2), (1, 3)], dtype=[('a', int), ('b', int)])
print(arr)
# [(0, 2) (1, 3)]
Here we access columns indexed by names, e.g. 'a', or 'b' (sort of like DL):
print(arr['a'])
# [0 1]
Here we access rows by integer index (sort of like LD):
print(arr[0])
# (0, 2)
Each value in the row can be accessed by column name (sort of like LD):
print(arr[0]['b'])
# 2
Going from the list of dictionaries to a dictionary of lists is straightforward. You can use this form:
DL={'a':[0,1],'b':[2,3], 'c':[4,5]}
LD=[{'a':0,'b':2, 'c':4},{'a':1,'b':3, 'c':5}]
nd = {}
for d in LD:
    for k, v in d.items():
        try:
            nd[k].append(v)
        except KeyError:
            nd[k] = [v]
print(nd)
#{'a': [0, 1], 'b': [2, 3], 'c': [4, 5]}
Or use defaultdict:
from collections import defaultdict
nd = defaultdict(list)
for d in LD:
    for key, val in d.items():
        nd[key].append(val)
print(dict(nd))
#{'a': [0, 1], 'b': [2, 3], 'c': [4, 5]}
Going the other way is problematic. You need some information about the order in which the keys' values should go into the list. Recall that before Python 3.7, the order of keys in a dict was not guaranteed to match the original insertion order (dicts have preserved insertion order since 3.7).
For giggles, assume the insertion order is based on sorted keys. You can then do it this way:
nl = []
nl_index = []
for k in sorted(DL.keys()):
    nl.append({k: []})
    nl_index.append(k)
for key, l in DL.items():
    for item in l:
        nl[nl_index.index(key)][key].append(item)
print(nl)
#[{'a': [0, 1]}, {'b': [2, 3]}, {'c': [4, 5]}]
If your question was based on curiosity, there is your answer. If you have a real-world problem, let me suggest you rethink your data structures. Neither of these seems to be a very scalable solution.
Here are the one-line solutions (spread out over multiple lines for readability) that I came up with:
If dl is your original dict of lists:
dl = {"a":[0, 1],"b":[2, 3]}
Then here's how to convert it to a list of dicts:
ld = [{key: value[index] for key, value in dl.items() if index < len(value)}
      for index in range(max(map(len, dl.values())))]
If you assume that all your lists are the same length, you can simplify this (and gain some performance) by going to:
ld = [{key: value[index] for key, value in dl.items()}
      for index in range(len(next(iter(dl.values()))))]
Here's how to convert that back into a dict of lists:
import functools
dl2 = {key: [item[key] for item in ld if key in item]
       for key in functools.reduce(
           lambda x, y: x.union(y),
           (set(dicts.keys()) for dicts in ld)
       )
}
If you're using Python 2 instead of Python 3, you can just use reduce instead of functools.reduce there.
You can simplify this if you assume that all the dicts in your list will have the same keys:
dl2 = {key:[item[key] for item in ld] for key in ld[0].keys() }
cytoolz.dicttoolz.merge_with
from cytoolz.dicttoolz import merge_with
merge_with(list, *LD)
{'a': [0, 1], 'b': [2, 3]}
Non-cython version
from toolz.dicttoolz import merge_with
merge_with(list, *LD)
{'a': [0, 1], 'b': [2, 3]}
The pandas module gives an easy-to-understand solution. As a complement to @chiang's answer, here are both the DL-to-LD and LD-to-DL conversions:
import pandas as pd
DL = {'a': [0, 1], 'b': [2, 3]}
out1 = pd.DataFrame(DL).to_dict('records')
Output:
[{'a': 0, 'b': 2}, {'a': 1, 'b': 3}]
In the other direction:
LD = [{'a': 0, 'b': 2}, {'a': 1, 'b': 3}]
out2 = pd.DataFrame(LD).to_dict('list')
Output:
{'a': [0, 1], 'b': [2, 3]}
The cleanest way I can think of on a summer Friday. As a bonus, it supports lists of different lengths (but in that case, DLtoLD(LDtoDL(l)) is no longer the identity).
From list to dict
Actually less clean than @dwerk's defaultdict version.
def LDtoDL(l):
    result = {}
    for d in l:
        for k, v in d.items():
            result[k] = result.get(k, []) + [v]  # inefficient: copies the list on every append
    return result
From dict to list
def DLtoLD(d):
    if not d:
        return []
    # create as many *distinct* dicts as the longest sequence has items
    result = [{} for i in range(max(map(len, d.values())))]
    # fill each dict, one key at a time
    for k, seq in d.items():
        for oneDict, oneValue in zip(result, seq):
            oneDict[k] = oneValue
    return result
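To make the "no longer the identity" remark concrete, here is a self-contained check. The two functions are restated compactly (with `setdefault` in place of the list concatenation marked as inefficient); their results on these inputs are the same.

```python
def LDtoDL(l):
    result = {}
    for d in l:
        for k, v in d.items():
            result.setdefault(k, []).append(v)
    return result

def DLtoLD(d):
    if not d:
        return []
    result = [{} for _ in range(max(map(len, d.values())))]
    for k, seq in d.items():
        for one_dict, one_value in zip(result, seq):
            one_dict[k] = one_value
    return result

jagged = [{'a': 1}, {'b': 2}]
print(LDtoDL(jagged))          # {'a': [1], 'b': [2]}
print(DLtoLD(LDtoDL(jagged)))  # [{'a': 1, 'b': 2}]  (original positions are lost)
```

For equal-length, same-key input such as the question's LD, the round trip does return the original list.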
I needed a method that works for lists of different lengths (so this is a generalization of the original question). Since I did not find any code here that worked the way I expected, here's my code, which works for me:
import itertools
from typing import Dict, List, TypeVar
S = TypeVar("S")
T = TypeVar("T")
def dict_of_lists_to_list_of_dicts(dict_of_lists: Dict[S, List[T]]) -> List[Dict[S, T]]:
    keys = list(dict_of_lists.keys())
    list_of_values = [dict_of_lists[key] for key in keys]
    product = list(itertools.product(*list_of_values))
    return [dict(zip(keys, product_elem)) for product_elem in product]
Examples:
>>> dict_of_lists_to_list_of_dicts({1: [3], 2: [4, 5]})
[{1: 3, 2: 4}, {1: 3, 2: 5}]
>>> dict_of_lists_to_list_of_dicts({1: [3, 4], 2: [5]})
[{1: 3, 2: 5}, {1: 4, 2: 5}]
>>> dict_of_lists_to_list_of_dicts({1: [3, 4], 2: [5, 6]})
[{1: 3, 2: 5}, {1: 3, 2: 6}, {1: 4, 2: 5}, {1: 4, 2: 6}]
>>> dict_of_lists_to_list_of_dicts({1: [3, 4], 2: [5, 6], 7: [8, 9, 10]})
[{1: 3, 2: 5, 7: 8},
{1: 3, 2: 5, 7: 9},
{1: 3, 2: 5, 7: 10},
{1: 3, 2: 6, 7: 8},
{1: 3, 2: 6, 7: 9},
{1: 3, 2: 6, 7: 10},
{1: 4, 2: 5, 7: 8},
{1: 4, 2: 5, 7: 9},
{1: 4, 2: 5, 7: 10},
{1: 4, 2: 6, 7: 8},
{1: 4, 2: 6, 7: 9},
{1: 4, 2: 6, 7: 10}]
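Note that this function pairs every value of one list with every value of the others (a Cartesian product), which is different from the positional pairing in the original question. A small comparison on the same input:

```python
import itertools

dl = {1: [3, 4], 2: [5, 6]}

# Positional pairing (zip): the i-th value of each list goes into the i-th dict.
positional = [dict(zip(dl, row)) for row in zip(*dl.values())]

# Cartesian product: one dict per combination of values.
combos = [dict(zip(dl, row)) for row in itertools.product(*dl.values())]

print(positional)   # [{1: 3, 2: 5}, {1: 4, 2: 6}]
print(len(combos))  # 4
```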
Here is my small script:
a = {'a': [0, 1], 'b': [2, 3]}
elem = {}
result = []
for i in range(len(a['a'])):  # (1)
    for key, value in a.items():
        elem[key] = value[i]
    result.append(elem)
    elem = {}
print(result)
I'm not sure this is the most elegant way.
(1) This assumes all the lists have the same length.
Here is a solution without any libraries used:
def dl_to_ld(initial):
    finalList = []
    neededLen = 0
    for key in initial:
        if len(initial[key]) > neededLen:
            neededLen = len(initial[key])
    for i in range(neededLen):
        finalList.append({})
    for i in range(len(finalList)):
        for key in initial:
            try:
                finalList[i][key] = initial[key][i]
            except IndexError:
                # this list is shorter; leave the key out of this dict
                pass
    return finalList
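The same jagged result can be sketched more compactly with itertools.zip_longest, which pads the shorter lists with a fill value. Here a private sentinel object (`_PAD`, a made-up name) marks the padding so it can be dropped, which keeps real `None` values intact:

```python
from itertools import zip_longest

_PAD = object()  # hypothetical sentinel meaning "no value at this index"

def dl_to_ld(initial):
    keys = list(initial)
    rows = zip_longest(*initial.values(), fillvalue=_PAD)
    # Skip padded slots so shorter lists simply leave their key out of later dicts.
    return [{k: v for k, v in zip(keys, row) if v is not _PAD} for row in rows]

print(dl_to_ld({'a': [0, 1], 'b': [2, 3]}))  # [{'a': 0, 'b': 2}, {'a': 1, 'b': 3}]
print(dl_to_ld({'a': [0, 1, 2], 'b': [3]}))  # [{'a': 0, 'b': 3}, {'a': 1}, {'a': 2}]
```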
You can call it as follows:
dl = {'a':[0,1],'b':[2,3]}
print(dl_to_ld(dl))
#[{'a': 0, 'b': 2}, {'a': 1, 'b': 3}]
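If you don't mind a generator, you can use something like:
def f(dl):
    l = [(k, iter(v)) for k, v in dl.items()]
    while True:
        try:
            d = {k: next(i) for k, i in l}
        except StopIteration:
            # the shortest list is exhausted
            return
        if not d:
            # dl itself was empty
            return
        yield d
It's not as "clean" as it could be for technical reasons: my original implementation did yield dict(...), but this ends up being the empty dictionary because (in Python 2.5) a for b in c does not distinguish between a StopIteration exception when iterating over c and a StopIteration exception when evaluating a. Since Python 3.7, a StopIteration escaping a generator expression is turned into a RuntimeError, so the exception is caught explicitly around a dict comprehension instead.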
On the other hand, I can't work out what you're actually trying to do; it might be more sensible to design a data structure that meets your requirements instead of trying to shoehorn it into the existing data structures. (For example, a list of dicts is a poor way to represent the result of a database query.)
List of dicts ⟶ dict of lists
from collections import defaultdict
from typing import TypeVar
K = TypeVar("K")
V = TypeVar("V")
def ld_to_dl(ld: list[dict[K, V]]) -> dict[K, list[V]]:
    dl = defaultdict(list)
    for d in ld:
        for k, v in d.items():
            dl[k].append(v)
    return dl
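Because `ld_to_dl` returns the `defaultdict` itself, merely looking up a missing key on the result inserts an empty list. If callers might probe for keys, converting with `dict(...)` avoids that side effect. A sketch (type hints omitted for brevity):

```python
from collections import defaultdict

def ld_to_dl(ld):
    dl = defaultdict(list)
    for d in ld:
        for k, v in d.items():
            dl[k].append(v)
    return dl

dl = ld_to_dl([{'a': 0, 'b': 2}, {'a': 1, 'b': 3}])
dl['c']            # no KeyError: silently inserts 'c': []
print(sorted(dl))  # ['a', 'b', 'c']

plain = dict(ld_to_dl([{'a': 0, 'b': 2}, {'a': 1, 'b': 3}]))
print('c' in plain)  # False
```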
defaultdict creates an empty list if one does not exist upon key access.
Dict of lists ⟶ list of dicts
Collecting into "jagged" dictionaries
from typing import TypeVar
K = TypeVar("K")
V = TypeVar("V")
def dl_to_ld(dl: dict[K, list[V]]) -> list[dict[K, V]]:
    ld = []
    for k, vs in dl.items():
        ld += [{} for _ in range(len(vs) - len(ld))]
        for i, v in enumerate(vs):
            ld[i][k] = v
    return ld
This generates a list of dictionaries ld that may be missing items if the lengths of the lists in dl are unequal. It loops over all key-values in dl, and creates empty dictionaries if ld does not have enough.
Collecting into "complete" dictionaries only
(Usually intended only for equal-length lists.)
from typing import TypeVar
K = TypeVar("K")
V = TypeVar("V")
def dl_to_ld(dl: dict[K, list[V]]) -> list[dict[K, V]]:
    ld = [dict(zip(dl.keys(), v)) for v in zip(*dl.values())]
    return ld
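Since `zip` silently stops at the shortest list, unequal lengths can go unnoticed. One way to fail loudly instead, sketched with an explicit length check (`dl_to_ld_strict` is a hypothetical name, and raising ValueError is a choice made here, not part of the answer above):

```python
def dl_to_ld_strict(dl):
    # Refuse to truncate when the value lists have different lengths.
    lengths = {len(vs) for vs in dl.values()}
    if len(lengths) > 1:
        raise ValueError(f"lists have different lengths: {sorted(lengths)}")
    return [dict(zip(dl.keys(), v)) for v in zip(*dl.values())]

print(dl_to_ld_strict({'a': [0, 1], 'b': [2, 3]}))  # [{'a': 0, 'b': 2}, {'a': 1, 'b': 3}]
try:
    dl_to_ld_strict({'a': [0, 1, 2], 'b': [3, 4]})
except ValueError as exc:
    print(exc)  # lists have different lengths: [2, 3]
```

On Python 3.10 and later, `zip(*dl.values(), strict=True)` raises ValueError in a similar situation.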
This generates a list of dictionaries ld that have the length of the smallest list in dl.
DL = {'a': [0, 1, 2, 3], 'b': [2, 3, 4, 5]}
LD = [{'a': 0, 'b': 2}, {'a': 1, 'b': 3}]
Empty_list = []
Empty_dict = {}
# find the length of the longest list among the dictionary's values
len_list = 0
for i in DL.values():
    if len_list < len(i):
        len_list = len(i)
for k in range(len_list):
    for i, j in DL.items():
        if k < len(j):  # shorter lists contribute nothing at this index
            Empty_dict[i] = j[k]
    Empty_list.append(Empty_dict)
    Empty_dict = {}
LD = Empty_list
