Converting list to nested dictionary - python

How can I convert a list into a nested dictionary?
For example:
l = [1, 2, 3, 4]
I'd like to convert it to a dictionary that looks like this:
{1: {2: {3: {4: {}}}}}

For that, reverse the list, then build up the dictionary starting from an empty one:
l = [1, 2, 3, 4]
d = {}
for i in reversed(l):
    d = {i: d}
>>> print(d)
{1: {2: {3: {4: {}}}}}

You could also use functools.reduce for this.
reduce(lambda cur, k: {k: cur}, reversed(l), {})
Demo
>>> from functools import reduce
>>> l = [1, 2, 3, 4]
>>> reduce(lambda cur, k: {k: cur}, reversed(l), {})
{1: {2: {3: {4: {}}}}}
The flow of construction looks something like
{4: {}} -> {3: {4: {}}} -> {2: {3: {4: {}}}} -> {1: {2: {3: {4: {}}}}}
as reduce traverses the reverse iterator making a new single-element dict.
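To see those intermediate dicts concretely, you can swap the lambda for a hypothetical `step` function that records each partial result:

```python
from functools import reduce

l = [1, 2, 3, 4]
steps = []

def step(cur, k):
    # same fold as the lambda above, but recording each partial result
    new = {k: cur}
    steps.append(new)
    return new

result = reduce(step, reversed(l), {})
print(steps)
# [{4: {}}, {3: {4: {}}}, {2: {3: {4: {}}}}, {1: {2: {3: {4: {}}}}}]
```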

You can do something like this:
l = [1,2,3,4]
d = {}
for i in l[::-1]:
    d = {i: d}
print(d)
# {1: {2: {3: {4: {}}}}}

Here is an abstraction. setdefault is usually overshadowed by defaultdict, but it is a good fit here if you have one or more lists (iterables):
def make_nested_dict(*iterables):
    """Return a nested dictionary."""
    d = {}
    for it in iterables:
        temp = d
        for i in it:
            temp = temp.setdefault(i, {})
    return d
make_nested_dict([1, 2, 3, 4])
# {1: {2: {3: {4: {}}}}}
make_nested_dict([1, 2, 3, 4], [5, 6])
# {1: {2: {3: {4: {}}}}, 5: {6: {}}}
Nested Branches
Unlike defaultdict, this technique accepts duplicate keys by appending to existing "branches". For example, we will append a new 7 → 8 branch at the third level of the first (A) branch:
#                 A             B       C
make_nested_dict([1, 2, 3, 4], [5, 6], [1, 2, 7, 8])
# {1: {2: {3: {4: {}}, 7: {8: {}}}}, 5: {6: {}}}
Visually:
1 → 2 → 3 → 4   (A)
     \
      7 → 8     (C)
5 → 6           (B)
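To walk a branch back out of the resulting structure, a small hypothetical `get_path` helper can descend key by key (repeating make_nested_dict so the snippet runs on its own):

```python
def make_nested_dict(*iterables):
    """Return a nested dictionary built from one or more key paths."""
    d = {}
    for it in iterables:
        temp = d
        for i in it:
            temp = temp.setdefault(i, {})
    return d

def get_path(d, keys):
    """Descend through the nested dict along keys; raises KeyError if a key is absent."""
    for k in keys:
        d = d[k]
    return d

d = make_nested_dict([1, 2, 3, 4], [5, 6], [1, 2, 7, 8])
print(get_path(d, [1, 2]))
# {3: {4: {}}, 7: {8: {}}}
```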

Related

How can I create a whole new dictionary inside another dictionary?

dic = {1: 1, 3: 3, 5: 6, 6: 6}
# new_mapping is dic reversed: {1: [1], 3: [3], 6: [5, 6]}
for keys, values in new_mapping.items():
    if keys not in new_sort.items():
        if values not in new_sort.items():
            new_sort = new_mapping
for keys, values in new_sort.items():
    lens = len(values)
    ky = {keys: values}
    lll[lens] = ky
output from code:
{1: {3: [3]}, 2: {6: [5, 6]}}
desired output:
{1: {1: [1], 3: [3]}, 2: {6: [5, 6]}}
Code:
#INPUT
dic = {1: 1, 3: 3, 5: 6, 6: 6}
#Step 1 - change keys to values and values to keys
rev_dic={}
[rev_dic.update({v: rev_dic[v]+[k]}) if v in rev_dic else rev_dic.update({v: [k]}) for k, v in dic.items()]
#Step 2 - create new dictionary by list length
new_dic = {k:{} for k in range(1,max([len(v) for v in rev_dic.values()])+1)}
#Lastly, insert each rev_dic entry into new_dic, keyed by the length of its value list
[new_dic[len(val)].update({key:val}) for key,val in rev_dic.items()]
new_dic
Output:
{1: {1: [1], 3: [3]}, 2: {6: [5, 6]}}

How to obtain a set of dictionaries?

I have a list of dictionaries, but some of them are duplicates and I want to remove them (the duplicates).
The keys of the dict are a sequential number.
An example is the following:
[{1: {'a': [1, 2, 3], 'b': 4}},
 {2: {'a': [4, 5, 6], 'd': 5}},
 {3: {'a': [1, 2, 3], 'b': 4}},
 .....,
 {1000: {'a': [2, 5, 1], 'b': 99}},
]
Considering the previous example I would like to obtain:
[{1: {'a': [1, 2, 3], 'b': 4}},
 {2: {'a': [4, 5, 6], 'd': 5}},
 .....,
 {1000: {'a': [2, 5, 1], 'b': 99}},
]
In fact, the dictionaries with keys 1 and 3 are identical in their values.
I tried with a set, but since dict is not a hashable type, I am not able to do so.
How can I fix the problem?
EDIT
In my case the number of items inside the dict is not fixed, so I can have:
[{1: {'a': [1, 2, 3], 'b': 4}},
 {2: {'a': [4, 5, 6], 'd': 5}},
 .....,
 {1000: {'a': [2, 5, 1], 'b': 99, 'c': ["a", "v"]}},
]
where the dict with key 1000 has three elements inside instead of two like the others shown.
To get around the limitation of @jdehesa's solution, where [1, 2] would be treated as a duplicate of (1, 2), you can preserve the data types by using pprint.pformat instead to serialize the data structure. Since pprint.pformat sorts dicts by keys and sets by items, {1: 2, 3: 4} is properly considered the same as {3: 4, 1: 2}, but [1, 2] is not considered a duplicate of (1, 2):
from pprint import pformat
lst = [
    {1: {'a': [1, 2, 3], 'b': 4}},
    {2: {'a': [4, 5, 6], 'd': 5}},
    {3: {'b': 4, 'a': [1, 2, 3]}},
    {4: {'a': (4, 5, 6), 'd': 5}},
]
seen = set()
output = []
for d in lst:
    for k, v in d.items():
        signature = pformat(v)
        if signature not in seen:
            seen.add(signature)
            output.append({k: v})
output becomes:
[{1: {'a': [1, 2, 3], 'b': 4}},
{2: {'a': [4, 5, 6], 'd': 5}},
{4: {'a': (4, 5, 6), 'd': 5}}]
You can maybe use a function like this to turn your objects into something hashable:
def make_hashable(o):
    if isinstance(o, dict):
        return frozenset((k, make_hashable(v)) for k, v in o.items())
    elif isinstance(o, list):
        return tuple(make_hashable(elem) for elem in o)
    elif isinstance(o, set):
        return frozenset(make_hashable(elem) for elem in o)
    else:
        return o
Then you keep a set of seen objects and keep only the keys of each dictionary containing objects that you did not see before:
lst = [
    {1: {'a': [1, 2, 3], 'b': 4}},
    {2: {'a': [4, 5, 6], 'd': 5}},
    {3: {'a': [1, 2, 3], 'b': 4}},
]
seen = set()
result_keys = []
for elem in lst:
    keep_keys = []
    for k, v in elem.items():
        v_hashable = make_hashable(v)
        if v_hashable not in seen:
            seen.add(v_hashable)
            keep_keys.append(k)
    result_keys.append(keep_keys)
result = [{k: elem[k] for k in keys} for elem, keys in zip(lst, result_keys) if keys]
print(result)
# [{1: {'a': [1, 2, 3], 'b': 4}}, {2: {'a': [4, 5, 6], 'd': 5}}]
Note that, as blhsing notes in the comments, this has some limitations, such as considering (1, 2) and [1, 2] equal, as well as {1: 2} and {(1, 2)}. Also, some types may not be convertible to an equivalent hashable type.
EDIT: As a_guest suggests, you can work around the type ambiguity by returning the type itself along with the hashable object in make_hashable:
def make_hashable(o):
    t = type(o)
    if isinstance(o, dict):
        o = frozenset((k, make_hashable(v)) for k, v in o.items())
    elif isinstance(o, list):
        o = tuple(make_hashable(elem) for elem in o)
    elif isinstance(o, set):
        o = frozenset(make_hashable(elem) for elem in o)
    return t, o
If you don't need to look into the hashable object, this will easily provide strict type comparison. Note in this case even things like {1, 2} and frozenset({1, 2}) will be different.
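A quick demonstration of the type-tagged variant (the function is repeated here so the example is self-contained):

```python
def make_hashable(o):
    t = type(o)
    if isinstance(o, dict):
        o = frozenset((k, make_hashable(v)) for k, v in o.items())
    elif isinstance(o, list):
        o = tuple(make_hashable(elem) for elem in o)
    elif isinstance(o, set):
        o = frozenset(make_hashable(elem) for elem in o)
    return t, o

# The type tag keeps otherwise-identical containers distinct:
print(make_hashable([1, 2]) == make_hashable((1, 2)))             # False
print(make_hashable({1, 2}) == make_hashable(frozenset({1, 2})))  # False
print(make_hashable([1, 2]) == make_hashable([1, 2]))             # True
```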
This is the simplest solution I've been able to come up with, assuming a nested dictionary like
{1: {'a': [1, 2, 3, 5, 79], 'b': 234 ...}}
As long as the only container inside the dictionary is a list, like {'a': [1, 2, 3...]}, this will work. Otherwise you can add a simple check, as the function below shows.
def serialize(dct):  # dct is the sub-dictionary, e.g. {'a': [1, 2, 3]}
    tmp = []
    for value in dct.values():
        if type(value) == list:
            tmp.append(tuple(value))
        else:
            tmp.append(value)
    return tuple(tmp)

def clean_up(lst):
    seen = set()
    clean = []
    for dct in lst:
        # grab the 1..1000 key inside the primary dictionary,
        # assuming there is only one key (the "id" or counter)
        key = list(dct.keys())[0]
        serialized = serialize(dct[key])
        if serialized not in seen:
            seen.add(serialized)
            clean.append(dct)
    return clean
So the function serialize takes the nested dictionary and creates a simple tuple from its contents. That tuple is then checked against the set seen to verify uniqueness.
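For illustration, here is serialize alone on the question's sub-dictionaries (repeated so the snippet runs on its own). Note that it reads values() in insertion order, so the same items under a different key order produce a different signature and would not be detected as duplicates, unlike the pformat approach:

```python
def serialize(dct):
    # flatten the sub-dict's values into a hashable tuple, converting lists to tuples
    tmp = []
    for value in dct.values():
        if type(value) == list:
            tmp.append(tuple(value))
        else:
            tmp.append(value)
    return tuple(tmp)

print(serialize({'a': [1, 2, 3], 'b': 4}))  # ((1, 2, 3), 4)
print(serialize({'b': 4, 'a': [1, 2, 3]}))  # (4, (1, 2, 3)) -- different signature!
```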
benchmarks
Generate a data set using some random values:
import random
import string

lst = []
for i in range(1, 1000):
    dct = {
        i: {
            random.choice(string.ascii_letters): [n for n in range(random.randint(0, i))],
            random.choice(string.ascii_letters): random.randint(0, i)
        }
    }
    lst.append(dct)
Running the benchmarks:
%timeit clean_up(lst)
3.25 ms ± 17.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit jdehesa(lst)
126 ms ± 606 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
As seen, the function clean_up is significantly faster, though simpler (not necessarily a good thing) in the checks it performs.
You can define a custom hash of your dictionaries by subclassing dict:
class MyData(dict):
    def __hash__(self):
        # frozenset makes the hash order-independent and actually hashable;
        # hashing a bare generator would fall back to its object identity
        return hash(frozenset((k, repr(v)) for k, v in self.items()))
l = [
    {1: {'a': [1, 2, 3], 'b': 4}},
    {2: {'a': [4, 5, 6], 'd': 5}},
    {3: {'b': 4, 'a': [1, 2, 3]}},
    {4: {'a': (4, 5, 6), 'd': 5}},
]
s = set([MyData(*d.values()) for d in l])
This is assuming that all the dictionaries in the list have only one key-value pair.
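A quick check of the subclassing approach (the class is repeated here, with the hash built from a frozenset so it is independent of key insertion order):

```python
class MyData(dict):
    def __hash__(self):
        # frozenset makes the hash order-independent
        return hash(frozenset((k, repr(v)) for k, v in self.items()))

l = [
    {1: {'a': [1, 2, 3], 'b': 4}},
    {2: {'a': [4, 5, 6], 'd': 5}},
    {3: {'b': 4, 'a': [1, 2, 3]}},
    {4: {'a': (4, 5, 6), 'd': 5}},
]
s = set(MyData(*d.values()) for d in l)
print(len(s))  # entries 1 and 3 collapse into one
```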
I don't know how big your list is or how many duplicates it contains, but, just in case, here is a basic solution.
It might not be efficient, but you don't have to worry about element types:
import datetime as dt
data = [
    {1: {"b": 4, "a": [1, 2, 3]}},
    {2: {"a": [4, 5, 6], "d": 5}},
    {3: {"a": [1, 2, 3], "b": 4}},
    {4: {'a': dt.datetime(2019, 5, 10), 'd': set([4])}},
    {5: {'a': dt.datetime(2019, 5, 10), 'd': set([4])}},
    {6: {"a": [2, 5, 1], "b": 99}},
    {7: {"a": [5, 2, 1], "b": 99}},
    {8: {"a": (5, 2, 1), "b": 99}}
]
seen = []
output = []
for d in data:
    for k, v in d.items():
        if v not in seen:
            seen.append(v)
            output.append({k: v})
>>> print(output)
[{1: {'a': [1, 2, 3], 'b': 4}},
{2: {'a': [4, 5, 6], 'd': 5}},
{4: {'a': datetime.datetime(2019, 5, 10, 0, 0), 'd': {4}}},
{6: {'a': [2, 5, 1], 'b': 99}},
{7: {'a': [5, 2, 1], 'b': 99}},
{8: {'a': (5, 2, 1), 'b': 99}}]

How can I create a dictionary of dictionaries of Unique Key Pairs?

I have a pandas dataframe with columns IssuerId and Sedol. The IssuerIds will be the same in some instances but the Sedol will always be different.
I would like to create a dictionary or multi-level index that aggregates these such that I can easily traverse them. For instance I currently have:
IssuerID Sedol
1 1
1 2
2 3
3 4
And I want to somehow create:
[{1: [1,2]},{2: 3},{3:4}]
If you do groupby + apply(list) and then call to_dict, you get:
d = df.groupby("IssuerID")["Sedol"].apply(list).to_dict()
print(d)
#{1: [1, 2], 2: [3], 3: [4]}
Now just reformat d to get your desired output.
If you want a dictionary, use a dict comprehension:
new_dict = {k: v if len(v) > 1 else v[0] for k, v in d.items()}
print(new_dict)
#{1: [1, 2], 2: 3, 3: 4}
If you want a list of dictionaries, use a list comprehension:
new_list = [{k: v if len(v) > 1 else v[0]} for k, v in d.items()]
print(new_list)
#[{1: [1, 2]}, {2: 3}, {3: 4}]
Using groupby
l = [{x: y.tolist()} for x, y in df.groupby('IssuerID')['Sedol']]
l
[{1: [1, 2]}, {2: [3]}, {3: [4]}]

pythonic way of reassigning values to dictionary

I have a dictionary with lists as values:
my_dict = {1: [2,3], 2: [4, 5], 3: [6, 7]}
and I want to update the dictionary so that each value becomes the sum of the old list:
my_dict = {1: 5, 2: 9, 3: 13}
What is the most efficient/pythonic way of doing so? What I usually do is:
for key in my_dict:
    my_dict[key] = sum(my_dict[key])
Are there better ways?
You can use a dictionary comprehension:
my_dict = {1: [2,3], 2: [4, 5], 3: [6, 7]}
new_d = {a:sum(b) for a, b in my_dict.items()}
Output:
{1: 5, 2: 9, 3: 13}
You can use reduce instead of sum:
from functools import reduce
my_dict = {1: [2,3], 2: [4, 5], 3: [6, 7]}
final = {k: reduce(lambda x,y: x+y, v) for k,v in my_dict.items()}
output:
{1: 5, 2: 9, 3: 13}
Otherwise you can refer to this thread for more information.
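If you prefer a functional spelling of the same update, you can also zip the keys with mapped sums; a small sketch:

```python
my_dict = {1: [2, 3], 2: [4, 5], 3: [6, 7]}
# sum each value list and pair the results back up with the keys
summed = dict(zip(my_dict, map(sum, my_dict.values())))
print(summed)
# {1: 5, 2: 9, 3: 13}
```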

List of dicts to/from dict of lists

I want to change back and forth between a dictionary of (equal-length) lists:
DL = {'a': [0, 1], 'b': [2, 3]}
and a list of dictionaries:
LD = [{'a': 0, 'b': 2}, {'a': 1, 'b': 3}]
For those of you that enjoy clever/hacky one-liners.
Here is DL to LD:
v = [dict(zip(DL,t)) for t in zip(*DL.values())]
print(v)
and LD to DL:
v = {k: [dic[k] for dic in LD] for k in LD[0]}
print(v)
LD to DL is a little hackier since you are assuming that the keys are the same in each dict. Also, please note that I do not condone the use of such code in any kind of real system.
If you're allowed to use outside packages, Pandas works great for this:
import pandas as pd
pd.DataFrame(DL).to_dict(orient="records")
Which outputs:
[{'a': 0, 'b': 2}, {'a': 1, 'b': 3}]
You can also use orient="list" to get back the original structure
{'a': [0, 1], 'b': [2, 3]}
Perhaps consider using numpy:
import numpy as np
arr = np.array([(0, 2), (1, 3)], dtype=[('a', int), ('b', int)])
print(arr)
# [(0, 2) (1, 3)]
Here we access columns indexed by names, e.g. 'a', or 'b' (sort of like DL):
print(arr['a'])
# [0 1]
Here we access rows by integer index (sort of like LD):
print(arr[0])
# (0, 2)
Each value in the row can be accessed by column name (sort of like LD):
print(arr[0]['b'])
# 2
To go from the list of dictionaries, it is straightforward:
You can use this form:
DL={'a':[0,1],'b':[2,3], 'c':[4,5]}
LD=[{'a':0,'b':2, 'c':4},{'a':1,'b':3, 'c':5}]
nd = {}
for d in LD:
    for k, v in d.items():
        try:
            nd[k].append(v)
        except KeyError:
            nd[k] = [v]
print(nd)
# {'a': [0, 1], 'c': [4, 5], 'b': [2, 3]}
Or use defaultdict:
import collections as cl

nd = cl.defaultdict(list)
for d in LD:
    for key, val in d.items():
        nd[key].append(val)
print(dict(nd.items()))
# {'a': [0, 1], 'c': [4, 5], 'b': [2, 3]}
Going the other way is problematic. You need some information about the insertion order into the list from the dictionary's keys. Recall that in Python before 3.7, the order of keys in a dict was not necessarily the same as the original insertion order (modern dicts do preserve it).
For giggles, assume the insertion order is based on sorted keys. You can then do it this way:
nl = []
nl_index = []
for k in sorted(DL.keys()):
    nl.append({k: []})
    nl_index.append(k)
for key, l in DL.items():
    for item in l:
        nl[nl_index.index(key)][key].append(item)
print(nl)
# [{'a': [0, 1]}, {'b': [2, 3]}, {'c': [4, 5]}]
If your question was based on curiosity, there is your answer. If you have a real-world problem, let me suggest you rethink your data structures. Neither of these seems to be a very scalable solution.
Here are the one-line solutions (spread out over multiple lines for readability) that I came up with:
if dl is your original dict of lists:
dl = {"a":[0, 1],"b":[2, 3]}
Then here's how to convert it to a list of dicts:
ld = [{key: value[index] for key, value in dl.items() if index < len(value)}
      for index in range(max(map(len, dl.values())))]
Which, if you assume that all your lists are the same length, you can simplify and speed up to:
ld = [{key: value[index] for key, value in dl.items()}
      for index in range(len(next(iter(dl.values()))))]
Here's how to convert that back into a dict of lists (this needs import functools):
dl2 = {key: [item[key] for item in ld]
       for key in list(functools.reduce(
           lambda x, y: x.union(y),
           (set(dicts.keys()) for dicts in ld)))}
If you're using Python 2 instead of Python 3, you can just use reduce instead of functools.reduce there.
You can simplify this if you assume that all the dicts in your list will have the same keys:
dl2 = {key:[item[key] for item in ld] for key in ld[0].keys() }
cytoolz.dicttoolz.merge_with
Docs
from cytoolz.dicttoolz import merge_with
merge_with(list, *LD)
{'a': [0, 1], 'b': [2, 3]}
Non-cython version
Docs
from toolz.dicttoolz import merge_with
merge_with(list, *LD)
{'a': [0, 1], 'b': [2, 3]}
The pandas module can give you an easy-to-understand solution. As a complement to @chiang's answer, the solutions in both directions are as follows:
import pandas as pd
DL = {'a': [0, 1], 'b': [2, 3]}
out1 = pd.DataFrame(DL).to_dict('records')
Output:
[{'a': 0, 'b': 2}, {'a': 1, 'b': 3}]
In the other direction:
LD = [{'a': 0, 'b': 2}, {'a': 1, 'b': 3}]
out2 = pd.DataFrame(LD).to_dict('list')
Output:
{'a': [0, 1], 'b': [2, 3]}
Cleanest way I can think of on a summer Friday. As a bonus, it supports lists of different lengths (but in this case, DLtoLD(LDtoDL(l)) is no longer the identity).
From list to dict
Actually less clean than @dwerk's defaultdict version.
def LDtoDL(l):
    result = {}
    for d in l:
        for k, v in d.items():
            result[k] = result.get(k, []) + [v]  # inefficient: copies the list each time
    return result
From dict to list
def DLtoLD(d):
    if not d:
        return []
    # reserve as many *distinct* dicts as the longest sequence
    result = [{} for i in range(max(map(len, d.values())))]
    # fill each dict, one key at a time
    for k, seq in d.items():
        for oneDict, oneValue in zip(result, seq):
            oneDict[k] = oneValue
    return result
I needed such a method that works for lists of different lengths (so this is a generalization of the original question). Since I did not find any code here that worked the way I expected, here's my code, which works for me:
import itertools
from typing import Dict, List, TypeVar

S = TypeVar("S")
T = TypeVar("T")

def dict_of_lists_to_list_of_dicts(dict_of_lists: Dict[S, List[T]]) -> List[Dict[S, T]]:
    keys = list(dict_of_lists.keys())
    list_of_values = [dict_of_lists[key] for key in keys]
    product = list(itertools.product(*list_of_values))
    return [dict(zip(keys, product_elem)) for product_elem in product]
Examples:
>>> dict_of_lists_to_list_of_dicts({1: [3], 2: [4, 5]})
[{1: 3, 2: 4}, {1: 3, 2: 5}]
>>> dict_of_lists_to_list_of_dicts({1: [3, 4], 2: [5]})
[{1: 3, 2: 5}, {1: 4, 2: 5}]
>>> dict_of_lists_to_list_of_dicts({1: [3, 4], 2: [5, 6]})
[{1: 3, 2: 5}, {1: 3, 2: 6}, {1: 4, 2: 5}, {1: 4, 2: 6}]
>>> dict_of_lists_to_list_of_dicts({1: [3, 4], 2: [5, 6], 7: [8, 9, 10]})
[{1: 3, 2: 5, 7: 8},
{1: 3, 2: 5, 7: 9},
{1: 3, 2: 5, 7: 10},
{1: 3, 2: 6, 7: 8},
{1: 3, 2: 6, 7: 9},
{1: 3, 2: 6, 7: 10},
{1: 4, 2: 5, 7: 8},
{1: 4, 2: 5, 7: 9},
{1: 4, 2: 5, 7: 10},
{1: 4, 2: 6, 7: 8},
{1: 4, 2: 6, 7: 9},
{1: 4, 2: 6, 7: 10}]
Here is my small script:
a = {'a': [0, 1], 'b': [2, 3]}
elem = {}
result = []
for i in range(len(a['a'])):  # (1)
    for key, value in a.items():
        elem[key] = value[i]
    result.append(elem)
    elem = {}
print(result)
I'm not sure this is the most beautiful way.
(1) This assumes all the lists have the same length.
Here is a solution without any libraries used:
def dl_to_ld(initial):
    finalList = []
    neededLen = 0
    for key in initial:
        if len(initial[key]) > neededLen:
            neededLen = len(initial[key])
    for i in range(neededLen):
        finalList.append({})
    for i in range(len(finalList)):
        for key in initial:
            try:
                finalList[i][key] = initial[key][i]
            except IndexError:
                pass
    return finalList
You can call it as follows:
dl = {'a':[0,1],'b':[2,3]}
print(dl_to_ld(dl))
#[{'a': 0, 'b': 2}, {'a': 1, 'b': 3}]
If you don't mind a generator, you can use something like
def f(dl):
    l = list((k, v.__iter__()) for k, v in dl.items())
    while True:
        d = dict((k, i.next()) for k, i in l)
        if not d:
            break
        yield d
It's not as "clean" as it could be for Technical Reasons: My original implementation did yield dict(...), but this ends up being the empty dictionary because (in Python 2.5) a for b in c does not distinguish between a StopIteration exception when iterating over c and a StopIteration exception when evaluating a.
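For reference, a Python 3 sketch of the same generator idea; zip stops at the shortest value list, so no StopIteration handling is needed:

```python
def f(dl):
    # zip over the value lists; each resulting tuple becomes one dict
    keys = list(dl.keys())
    for values in zip(*dl.values()):
        yield dict(zip(keys, values))

print(list(f({'a': [0, 1], 'b': [2, 3]})))
# [{'a': 0, 'b': 2}, {'a': 1, 'b': 3}]
```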
On the other hand, I can't work out what you're actually trying to do; it might be more sensible to design a data structure that meets your requirements instead of trying to shoehorn it in to the existing data structures. (For example, a list of dicts is a poor way to represent the result of a database query.)
List of dicts ⟶ dict of lists
from collections import defaultdict
from typing import TypeVar
K = TypeVar("K")
V = TypeVar("V")
def ld_to_dl(ld: list[dict[K, V]]) -> dict[K, list[V]]:
    dl = defaultdict(list)
    for d in ld:
        for k, v in d.items():
            dl[k].append(v)
    return dl
defaultdict creates an empty list if one does not exist upon key access.
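For example (type annotations and TypeVars dropped for brevity):

```python
from collections import defaultdict

def ld_to_dl(ld):
    # append each dict's values onto per-key lists
    dl = defaultdict(list)
    for d in ld:
        for k, v in d.items():
            dl[k].append(v)
    return dl

LD = [{'a': 0, 'b': 2}, {'a': 1, 'b': 3}]
print(dict(ld_to_dl(LD)))
# {'a': [0, 1], 'b': [2, 3]}
```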
Dict of lists ⟶ list of dicts
Collecting into "jagged" dictionaries
from typing import TypeVar
K = TypeVar("K")
V = TypeVar("V")
def dl_to_ld(dl: dict[K, list[V]]) -> list[dict[K, V]]:
    ld = []
    for k, vs in dl.items():
        ld += [{} for _ in range(len(vs) - len(ld))]
        for i, v in enumerate(vs):
            ld[i][k] = v
    return ld
This generates a list of dictionaries ld that may be missing items if the lengths of the lists in dl are unequal. It loops over all key-values in dl, and creates empty dictionaries if ld does not have enough.
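A quick check of the jagged behavior with unequal-length lists (annotations dropped for brevity):

```python
def dl_to_ld(dl):
    ld = []
    for k, vs in dl.items():
        # grow ld with fresh dicts until it is long enough for this value list
        ld += [{} for _ in range(len(vs) - len(ld))]
        for i, v in enumerate(vs):
            ld[i][k] = v
    return ld

print(dl_to_ld({'a': [0, 1, 2], 'b': [3]}))
# [{'a': 0, 'b': 3}, {'a': 1}, {'a': 2}]
```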
Collecting into "complete" dictionaries only
(Usually intended only for equal-length lists.)
from typing import TypeVar
K = TypeVar("K")
V = TypeVar("V")
def dl_to_ld(dl: dict[K, list[V]]) -> list[dict[K, V]]:
    ld = [dict(zip(dl.keys(), v)) for v in zip(*dl.values())]
    return ld
This generates a list of dictionaries ld that have the length of the smallest list in dl.
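And the same unequal-length input through the zip-based version, showing the truncation (annotations dropped for brevity):

```python
def dl_to_ld(dl):
    # zip truncates at the shortest value list
    return [dict(zip(dl.keys(), v)) for v in zip(*dl.values())]

print(dl_to_ld({'a': [0, 1, 2], 'b': [3]}))
# [{'a': 0, 'b': 3}]
```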
DL = {'a': [0, 1, 2, 3], 'b': [2, 3, 4, 5]}
LD = [{'a': 0, 'b': 2}, {'a': 1, 'b': 3}]
Empty_list = []
Empty_dict = {}
# find the length of the longest list among the dictionary's values
len_list = 0
for i in DL.values():
    if len_list < len(i):
        len_list = len(i)
for k in range(len_list):
    for i, j in DL.items():
        Empty_dict[i] = j[k]
    Empty_list.append(Empty_dict)
    Empty_dict = {}
LD = Empty_list
