Python: Combine several nested lists into a dictionary

Python: Combine several nested lists into a dictionary - python

I have a bunch of lists like the following two:
['a', ['b', ['x', '1'], ['y', '2']]]
['a', ['c', ['xx', '4'], ['gg', ['m', '3']]]]
What is the easiest way to combine all of them into a single dictionary that looks like:
{'a': {
'b': {
'x':1,
'y':2
}
'c': {
'xx':4,
'gg': {
'm':3
}
}
}
The depth of nesting is variable.

Here's a very crude implementation, it does not handle weird cases such as lists having less than two elements and it overwrites duplicate keys, but its' something to get you started:
l1 = ['a', ['b', ['x', '1'], ['y', '2']]]
l2 = ['a', ['c', ['xx', '4'], ['gg', ['m', '3']]]]
def combine(d, l):
if not l[0] in d:
d[l[0]] = {}
for v in l[1:]:
if type(v) == list:
combine(d[l[0]],v)
else:
d[l[0]] = v
h = {}
combine(h, l1)
combine(h, l2)
print h
Output:
{'a': {'c': {'gg': {'m': '3'}, 'xx': '4'}, 'b': {'y': '2', 'x': '1'}}}

It's not really 'pythonic' but i dont see a good way to do this without recursion
def listToDict(l):
if type(l) != type([]): return l
return {l[0] : listToDict(l[1])}

It made the most sense to me to break this problem into two parts (well, that and I misread the question the first time through..'S)
transformation
The first part transforms the [key, list1, list2] data structure into nested dictionaries:
def recdict(elements):
"""Create recursive dictionaries from [k, v1, v2, ...] lists.
>>> import pprint, functools
>>> pprint = functools.partial(pprint.pprint, width=2)
>>> pprint(recdict(['a', ['b', ['x', '1'], ['y', '2']]]))
{'a': {'b': {'x': '1',
'y': '2'}}}
>>> pprint(recdict(['a', ['c', ['xx', '4'], ['gg', ['m', '3']]]]))
{'a': {'c': {'gg': {'m': '3'},
'xx': '4'}}}
"""
def rec(item):
if isinstance(item[1], list):
return [item[0], dict(rec(e) for e in item[1:])]
return item
return dict([rec(elements)])
It expects that
every list has at least two elements
the first element of every list is a key
if the second element of a list is a list, then all subsequent elements are also lists; these are combined into a dictionary.
The tricky bit (at least for me) was realizing that you have to return a list from the recursive function rather than a dictionary. Otherwise, you can't combine the parallel lists that form the second and third elements of some of the lists.
To make this more generally useful (i.e. to tuples and other sequences), I would change
if isinstance(item[1], list):
to
if (isinstance(item[1], collections.Sequence)
and not isinstance(item[1], basestring)):
You can also make it work for any iterable but that requires a little bit of reorganization.
merging
The second part merges the dictionaries that result from running the first routine on the two given data structures. I think this will recursively merge any number of dictionaries that don't have conflicting keys, though I didn't really test it for anything other than this use case.
def mergedicts(*dicts):
"""Recursively merge an arbitrary number of dictionaries.
>>> import pprint
>>> d1 = {'a': {'b': {'x': '1',
... 'y': '2'}}}
>>> d2 = {'a': {'c': {'gg': {'m': '3'},
... 'xx': '4'}}}
>>> pprint.pprint(mergedicts(d1, d2), width=2)
{'a': {'b': {'x': '1',
'y': '2'},
'c': {'gg': {'m': '3'},
'xx': '4'}}}
"""
keys = set(k for d in dicts for k in d)
def vals(key):
"""Returns all values for `key` in all `dicts`."""
withkey = (d for d in dicts if d.has_key(key))
return [d[key] for d in withkey]
def recurse(*values):
"""Recurse if the values are dictionaries."""
if isinstance(values[0], dict):
return mergedicts(*values)
if len(values) == 1:
return values[0]
raise TypeError("Multiple non-dictionary values for a key.")
return dict((key, recurse(*vals(key))) for key in keys)

Related

How to change a dictionary while iterating over its keys while iterating over a list

I have a list l.
I have a dictionary d.
I want to iterate over l. For any list-item, I want to iterate over the d.keys.
If some condition is met, I would like to 'update' my dictionary.
I have naively tried to nest two for-loops and put in an if-statement -- One cannot change the length of the object one is iterating over.
d = {'this': '1', 'is': '2', 'a': '3', 'list': '4'}
l = ['A', 'B', 'C', 'D', 'E']
for word in l:
for key in d.keys():
if len(key) < 2:#some condition
d.pop(key)
else:
print(word, key)
This is the output I get:
A this
A is
Traceback (most recent call last):
File "untitled3.py", line 6, in <module>
for key in d.keys():
RuntimeError: dictionary changed size during iteration

Instead of looping over d you could loop over a copy.
d = {'this': '1', 'is': '2', 'a': '3', 'list': '4'}
l = ['A', 'B', 'C', 'D', 'E']
for word in l:
for key in d.copy().keys(): # Notice the change
if len(key) < 2:#some condition
d.pop(key)
else:
print(word, key)

You should not change the size of a dictionary while iterating over a view of that dictionary. You can, instead, construct a new dictionary and then print whatever you like. For example:
d = {'this': '1', 'is': '2', 'a': '3', 'list': '4'}
L = ['A', 'B', 'C', 'D', 'E']
d_new = {k: v for k, v in d.items() if len(k) >= 2}
for word in L:
for key in d_new:
print(word, key)
As described in the docs:
The objects returned by dict.keys(), dict.values() and
dict.items() are view objects. They provide a dynamic view on the
dictionary’s entries, which means that when the dictionary changes,
the view reflects these changes....
Iterating views while adding or deleting entries in the dictionary may
raise a RuntimeError or fail to iterate over all entries.

Check if repeating Key or Value exists in Python Dictionary

The following is my dictionary and I need to check if I have repeated key or Value
dict = {' 1': 'a', '2': 'b', '3': 'b', '4': 'c', '5': 'd', '5': 'e'}
This should return false or some kind of indicator which helps me print out that key or value might be repeated. It would be much appreciated if I am able to identify if a key is repeated or a Value (but not required).

Dictionaries can't have duplicate keys, so in case of repeated keys it only keeps the last value, so check values (one-liner is your friend):
print(('There are duplicates' if len(set(dict.values()))!=len(values) else 'No duplicates'))

Well in a dictionary keys can't repeat so we only have to deal with values.
dict = {...}
# get the values
values = list(dict.values())
And then you can use a set() to check for duplicates:
if len(values) == len(set(values)): print("no duplicates")
else: print("duplicates)

It's not possible to check if a key repeats in a dictionary, because dictionaries in Python only support unique keys. If you enter the dictionary as is, only the last value will be associated with the redundant key:
In [4]: dict = {' 1': 'a', '2': 'b', '3': 'b', '4': 'c', '5': 'd', '5': 'e'}
In [5]: dict
Out[5]: {' 1': 'a', '2': 'b', '3': 'b', '4': 'c', '5': 'e'}

A one-liner to find repeating values
In [138]: {v: [k for k in d if d[k] == v] for v in set(d.values())}
Out[138]: {'a': [' 1'], 'b': ['2', '3'], 'c': ['4'], 'e': ['5']}
Check all the unique values of the dict with set(d.values()) and then creating a list of keys that correspond to those values.
Note: repeating keys will just be overwritten
In [139]: {'a': 1, 'a': 2}
Out[139]: {'a': 2}

What about
has_dupes = len(d) != len(set(d.values()))
I'm on my phone so I cant test it. But j think it will work.

Well, although key value should be unique according to the documentation, there is still condition where repeated key could appear.
For example,
>>> import json
>>> a = {1:10, "1":20}
>>> b = json.dumps(a)
>>> b
'{"1": 20, "1": 10}'
>>> c = json.loads(b)
>>> c
{u'1': 10}
>>>
But in general, when python finds out there's conflict, it takes the latest value assigned to that key.
For your question, you should use comparison such as
len(dict) == len(set(dict.values()))
because set in python contains an unordered collection of unique and immutable objects, it could automatically get all unique values even when you have duplicate values in dict.values()

Pythonic way to create a dictionary from a list where the keys are the elements that are found in another list and values are elements between keys

Considering that I have two lists like:
l1 = ['a', 'c', 'b', 'e', 'f', 'd']
l2 = [
'x','q','we','da','po',
'a', 'el1', 'el2', 'el3', 'el4',
'b', 'some_other_el_1', 'some_other_el_2',
'c', 'another_element_1', 'another_element_2',
'd', '', '', 'another_element_3', 'd4'
]
and I need to create a dictionary where the keys are those element from second list that are found in the first and values are lists of elements found between "keys" like:
result = {
'a': ['el1', 'el2', 'el3', 'el4'],
'b': ['some_other_el_1', 'some_other_el_2'],
'c': ['another_element_1', 'another_element_2'],
'd': ['', '', 'another_element_3', 'd4']
}
What's a more pythonic way to do this?
Currently I'm doing this :
# I'm not sure that the first element in the second list
# will also be in the first so I have to create a key
k = ''
d[k] = []
for x in l2:
if x in l1:
k = x
d[k] = []
else:
d[k].append(x)
But I'm quite positive that this is not the best way to do it and it also doesn't looks nice :)
Edit:
I also have to mention that no list is necessary ordered and neither the second list must start with an element from the first one.

I don't think you'll do much better if this is the most specific statement of the problem. I mean I'd do it this way, but it's not much better.
import collections
d = collections.defaultdict(list)
s = set(l1)
k = ''
for x in l2:
if x in s:
k = x
else:
d[k].append(x)

For fun, you can also do this with itertools and 3rd party numpy:
import numpy as np
from itertools import zip_longest, islice
arr = np.where(np.in1d(l2, l1))[0]
res = {l2[i]: l2[i+1: j] for i, j in zip_longest(arr, islice(arr, 1, None))}
print(res)
{'a': ['el1', 'el2', 'el3', 'el4'],
'b': ['some_other_el_1', 'some_other_el_2'],
'c': ['another_element_1', 'another_element_2'],
'd': ['', '', 'another_element_3', 'd4']}

Here is a version using itertools.groupby. It may or may not be more efficient than the plain version from your post, depending on how groupby is implemented, because the for loop has fewer iterations.
from itertools import groupby
from collections import defaultdict, deque
def group_by_keys(keys, values):
"""
>>> sorted(group_by_keys('abcdef', [
... 1, 2, 3,
... 'b', 4, 5,
... 'd',
... 'a', 6, 7,
... 'c', 8, 9,
... 'a', 10, 11, 12
... ]).items())
[('a', [6, 7, 10, 11, 12]), ('b', [4, 5]), ('c', [8, 9])]
"""
keys = set(keys)
result = defaultdict(list)
current_key = None
for is_key, items in groupby(values, key=lambda x: x in keys):
if is_key:
current_key = deque(items, maxlen=1).pop() # last of items
elif current_key is not None:
result[current_key].extend(items)
return result
This doesn't distinguish between keys that don't occur in values at all (like e and f), and keys for which there are no corresponding values (like d). If this information is needed, one of the other solutions might be better suited.

Updated ... Again
I misinterpreted the question. If you are using large lists then list comprehensions are the way to go and they are fairly simple once you learn how to use them.
I am going to use two list comprehensions.
idxs = [i for i, val in enumerate(l2) if val in l1] + [len(l2)+1]
res = {l2[idxs[i]]: list(l2[idxs[i]+1: idxs[i+1]]) for i in range(len(idxs)-1)}
print(res)
Results:
{'a': ['el1', 'el2', 'el3', 'el4'],
'b': ['some_other_el_1', 'some_other_el_2'],
'c': ['another_element_1', 'another_element_2'],
'd': ['', '', 'another_element_3', 'd4']}
Speed Testing for large lists:
import collections
l1 = ['a', 'c', 'b', 'e', 'f', 'd']
l2 = [
'x','q','we','da','po',
'a', 'el1', 'el2', 'el3', 'el4', *(str(i) for i in range(300)),
'b', 'some_other_el_1', 'some_other_el_2', *(str(i) for i in range(100)),
'c', 'another_element_1', 'another_element_2', *(str(i) for i in range(200)),
'd', '', '', 'another_element_3', 'd4'
]
def run_comp():
idxs = [i for i, val in enumerate(l2) if val in l1] + [len(l2)+1]
res = {l2[idxs[i]]: list(l2[idxs[i]+1: idxs[i+1]]) for i in range(len(idxs)-1)}
def run_other():
d = collections.defaultdict(list)
k = ''
for x in l2:
if x in l1:
k = x
else:
d[k].append(x)
import timeit
print('For Loop:', timeit.timeit(run_other, number=1000))
print("List Comprehension:", timeit.timeit(run_comp, number=1000))
Results:
For Loop: 0.1327093063242541
List Comprehension: 0.09343156142774986
old stuff below
This is rather simple with list comprehensions.
{key: [val for val in l2 if key in val] for key in l1}
Results:
{'a': ['a', 'a1', 'a2', 'a3', 'a4'],
'b': ['b', 'b1', 'b2', 'b3', 'b4'],
'c': ['c', 'c1', 'c2', 'c3', 'c4'],
'd': ['d', 'd1', 'd2', 'd3', 'd4'],
'e': [],
'f': []}
The code below shows what is happening above.
d = {}
for key in l1:
d[key] = []
for val in l2:
if key in val:
d[key].append(val)
The list comprehension / dictionary comprehension (First piece of code) is actually way faster. List comprehensions are creating the list in place which is much faster than walking through and appending to the list. Appending makes the program walk the list, allocate more memory, and add the data to the list which can be very slow for large lists.
References:
http://www.pythonforbeginners.com/basics/list-comprehensions-in-python
https://docs.python.org/3.6/tutorial/datastructures.html#list-comprehensions

You can use itertools.groupby:
import itertools
l1 = ['a', 'c', 'b', 'e', 'f', 'd']
l2 = ['x', 'q', 'we', 'da', 'po', 'a', 'el1', 'el2', 'el3', 'el4', 'b', 'some_other_el_1', 'some_other_el_2', 'c', 'another_element_1', 'another_element_2', 'd', '', '', 'another_element_3', 'd4']
groups = [[a, list(b)] for a, b in itertools.groupby(l2, key=lambda x:x in l1)]
final_dict = {groups[i][-1][-1]:groups[i+1][-1] for i in range(len(groups)-1) if groups[i][0]}
Output:
{'a': ['el1', 'el2', 'el3', 'el4'], 'b': ['some_other_el_1', 'some_other_el_2'], 'c': ['another_element_1', 'another_element_2'], 'd': ['', '', 'another_element_3', 'd4']}

Your code is readable, does the job and is reasonably efficient. There's no need to change much!
You could use more descriptive variable names and replace l1 with a set for faster lookup:
keys = ('a', 'c', 'b', 'e', 'f', 'd')
keys_and_values = [
'x','q','we','da','po',
'a', 'el1', 'el2', 'el3', 'el4',
'b', 'some_other_el_1', 'some_other_el_2',
'c', 'another_element_1', 'another_element_2',
'd', '', '', 'another_element_3', 'd4'
]
current_key = None
result = {}
for x in keys_and_values:
if x in keys:
current_key = x
result[current_key] = []
elif current_key:
result[current_key].append(x)
print(result)
# {'a': ['el1', 'el2', 'el3', 'el4'],
# 'c': ['another_element_1', 'another_element_2'],
# 'b': ['some_other_el_1', 'some_other_el_2'],
# 'd': ['', '', 'another_element_3', 'd4']}

def find_index():
idxs = [l2.index(i) for i in set(l1).intersection(set(l2))]
idxs.sort()
idxs+= [len(l2)+1]
res = {l2[idxs[i]]: list(l2[idxs[i]+1: idxs[i+1]]) for i in range(len(idxs)-1)}
return(res)
Comparison of methods, using justengel's test:
justengel
run_comp: .455
run_other: .244
mkrieger1
group_by_keys: .160
me
find_index: .068
Note that my method ignores keys that don't appear l2, and doesn't handle cases where keys appear more than once in l2. Adding in empty lists for keys that don't appear in l2 can be done by {**res, **{key: [] for key in set(l1).difference(set(l2))}}, which raises the time to .105.

Even cleaner than turning l1 into a set, use the keys of the dictionary you're building. Like this
d = {x: [] for x in l1}
k = None
for x in l2:
if x in d:
k = x
elif k is not None:
d[k].append(x)
This is because (in the worst case) your code would be iterating over all the values in l1 for every value in l2 on the if x in l1: line, because checking if a value is in a list takes linear time. Checking if a value is in a dictionary's keys is constant time in the average case (same with sets, as already suggested by Eric Duminil).
I set k to None and check for it because your code would've returned d with '': ['x','q','we','da','po'], which is presumably not what you want. This assumes l1 can't contain None.
My solution also assumes it's okay for the resulting dictionary to contain keys with empty lists if there are items in l1 that never appear in l2. If that's not okay, you can remove them at the end with
final_d = {k: v for k, v in d.items() if v}

Convert list to dictionary with duplicate keys using dict comprehension [duplicate]

This question already has answers here:
How can one make a dictionary with duplicate keys in Python?
(9 answers)
Closed 6 months ago.
Good day all,
I am trying to convert a list of length-2 items to a dictionary using the below:
my_list = ["b4", "c3", "c5"]
my_dict = {key: value for (key, value) in my_list}
The issue is that when a key occurrence is more than one in the list, only the last key and its value are kept.
So in this case instead of
my_dict = {'c': '3', 'c': '5', 'b': '4'}
I get
my_dict = {'c': '5', 'b': '4'}
How can I keep all key:value pairs even if there are duplicate keys.
Thanks

For one key in a dictionary you can only store one value.
You can chose to have the value as a list.
{'b': ['4'], 'c': ['3', '5']}
following code will do that for you :
new_dict = {}
for (key, value) in my_list:
if key in new_dict:
new_dict[key].append(value)
else:
new_dict[key] = [value]
print(new_dict)
# output: {'b': ['4'], 'c': ['3', '5']}
Same thing can be done with setdefault. Thanks #Aadit M Shah for pointing it out
new_dict = {}
for (key, value) in my_list:
new_dict.setdefault(key, []).append(value)
print(new_dict)
# output: {'b': ['4'], 'c': ['3', '5']}
Same thing can be done with defaultdict. Thanks #MMF for pointing it out.
from collections import defaultdict
new_dict = defaultdict(list)
for (key, value) in my_list:
new_dict[key].append(value)
print(new_dict)
# output: defaultdict(<class 'list'>, {'b': ['4'], 'c': ['3', '5']})
you can also chose to store the value as a list of dictionaries:
[{'b': '4'}, {'c': '3'}, {'c': '5'}]
following code will do that for you
new_list = [{key: value} for (key, value) in my_list]

If you don't care about the O(n^2) asymptotic behaviour you can use a dict comprehension including a list comprehension:
>>> {key: [i[1] for i in my_list if i[0] == key] for (key, value) in my_list}
{'b': ['4'], 'c': ['3', '5']}
or the iteration_utilities.groupedby function (which might be even faster than using collections.defaultdict):
>>> from iteration_utilities import groupedby
>>> from operator import itemgetter
>>> groupedby(my_list, key=itemgetter(0), keep=itemgetter(1))
{'b': ['4'], 'c': ['3', '5']}

You can use defaultdict to avoid checking if a key is in the dictionnary or not :
from collections import defaultdict
my_dict = defaultdict(list)
for k, v in my_list:
my_dict[k].append(v)
Output :
defaultdict(list, {'b': ['4'], 'c': ['3', '5']})

Python miminum value in dictionary of lists

Sorry about the question repost...I should have just edited this question in the first place. Flagged the new one for the mods. Sorry for the trouble
Had to re-write the question due to changed requirements.
I have a dictionary such as the following:
d = {'a': [4, 2], 'b': [3, 4], 'c': [4, 3], 'd': [4, 3], 'e': [4], 'f': [4], 'g': [4]}
I want to get the keys that are associated with the smallest length in the dictionary d, as well as those that have the maximum value.
In this case, the keys with the smallest length (smallest length of lists in this dictionary) should return
'e, 'f', 'g'
And those with the greatest value(the sum of the integers in each list) should return
'b' 'c'
I have tried
min_value = min(dict.itervalues())
min_keys = [k for k in d if dict[k] == min_value]
But that does not give me the result I want.
Any ideas?
Thanks!

Your problem is that your lists contain strings ('2'), and not integers (2). Leave out the quotes, or use the following:
min_value = min(min(map(int, v) for v in dct.values()))
min_keys = [k for k,v in d.items() if min_value in map(int, v)]
Similarily, to calculate the keys with the max length:
max_length = max(map(len, dct.values()))
maxlen_keys = [k for k,v in d.items() if max_length == len(v)]
Also, it's a bad idea to use dict as a variable name, as doing so overshadows the built-in dict.

You can use min() with a key= argument, and specify a key function that compares the way you want.
d = {'a': ['1'], 'b': ['1', '2'], 'c': ['2'], 'd':['1']}
min_value = min(d.values())
min_list = [key for key, value in d.items() if value == min_value]
max_len = len(max(d.values(), key=len))
long_list = [key for key, value in d.items() if len(value) == max_len]
print(min_list)
print(long_list)
Notes:
0) Don't use dict as a variable name; that's the name of the class for dictionary, and if you use it as a variable name you "shadow" it. I just used d for the name here.
1) min_value was easy; no need to use a key= function.
2) max_len uses a key= function, len(), to find the longest value.

How about using sorting and lambdas?
#!/usr/bin/env python
d = {'a': ['1'], 'b': ['1', '2'], 'c': ['8', '1'], 'd':['1'], 'e':['1', '2', '3'], 'f': [4, 1]}
sorted_by_sum_d = sorted(d, key=lambda key: sum(list(int(item) for item in d[key])))
sorted_by_length_d = sorted(d, key=lambda key: len(d[key]))
print "Sorted by sum of the items in the list : %s" % sorted_by_sum_d
print "Sorted by length of the items in the list : %s" % sorted_by_length_d
This would output:
Sorted by sum of the items in the list : ['a', 'd', 'b', 'f', 'e', 'c']
Sorted by length of the items in the list : ['a', 'd', 'c', 'b', 'f', 'e']
Be aware I changed the initial 'd' dictionary (just to make sure it was working)
Then, if you want the item with the biggest sum, you get the last element of the sorted_by_sum_d list.
(I'm not too sure this is what you want, though)
Edit:
If you can ensure that the lists are always going to be lists of integers (or numeric types, for that matter, such as long, float...), there's not need to cast strings to integers. The calculation of the sorted_by_sum_d variable can be done simply using:
d = {'a': [1], 'b': [1, 2], 'c': [8, 1], 'd':[1], 'e':[1, 2, 3], 'f': [4, 1]}
sorted_by_sum_d = sorted(d, key=lambda key: sum(d[key]))

I've found such a simple solution:
min_len = len(min(d.values(), key=(lambda value: len(value)))) # 1
min_keys = [key for i, key in enumerate(d) if len(d[key]) == min_len] # ['e', 'f', 'g']

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python: Combine several nested lists into a dictionary - python

It's not really 'pythonic' but i dont see a good way to do this without recursion def listToDict(l): if type(l) != type([]): return l return {l[0] : listToDict(l[1])}

Related

How to change a dictionary while iterating over its keys while iterating over a list

Check if repeating Key or Value exists in Python Dictionary

Pythonic way to create a dictionary from a list where the keys are the elements that are found in another list and values are elements between keys

Convert list to dictionary with duplicate keys using dict comprehension [duplicate]

Python miminum value in dictionary of lists

Categories

Resources