How do I build a dict using list comprehension? - python

How do I build a dict using list comprehension?
I have two lists.
series = [1,2,3,4,5]
categories = ['A', 'B', 'A', 'C','B']
I want to build a dict where the categories are the keys.
Thanks for your answers I'm looking to produce:
{'A' : [1, 3], 'B' : [2, 5], 'C' : [4]}
Because the keys can't exist twice

You can build a dict from an iterable of key/value tuples. You don't need a comprehension in this case, just zip:
dict(zip(categories, series))
That produces {'A': 3, 'B': 5, 'C': 4} (as pointed out in the comments), because later values overwrite earlier ones for duplicate keys.
Edit: After looking at the keys, note that you can't have duplicate keys in a dictionary. So without further clarifying what you want, I'm not sure what solution you're looking for.
Edit: To get what you want, it's probably easiest to just do a for loop with either setdefault or a defaultdict.
categoriesMap = {}
for k, v in zip(categories, series):
    categoriesMap.setdefault(k, []).append(v)
That should produce {'A': [1, 3], 'B': [2, 5], 'C': [4]}

from collections import defaultdict

series = [1,2,3,4,5]
categories = ['A', 'B', 'A', 'C','B']

result = defaultdict(list)
for key, val in zip(categories, series):
    result[key].append(val)

Rather than being clever (I have an itertools solution I'm fond of) there's nothing wrong with a good, old-fashioned for loop:
>>> from collections import defaultdict
>>>
>>> series = [1,2,3,4,5]
>>> categories = ['A', 'B', 'A', 'C','B']
>>>
>>> d = defaultdict(list)
>>> for c,s in zip(categories, series):
...     d[c].append(s)
...
>>> d
defaultdict(<type 'list'>, {'A': [1, 3], 'C': [4], 'B': [2, 5]})
This doesn't use a list comprehension because a list comprehension is the wrong way to do it. But since you seem to really want one for some reason: how about:
>>> dict([(c0, [s for (c,s) in zip(categories, series) if c == c0]) for c0 in categories])
{'A': [1, 3], 'C': [4], 'B': [2, 5]}
That has not one but two list comprehensions, and is very inefficient to boot.
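For the curious, the itertools approach alluded to above is presumably something like this sketch using groupby, which requires sorting the pairs by key first:

```python
from itertools import groupby
from operator import itemgetter

series = [1, 2, 3, 4, 5]
categories = ['A', 'B', 'A', 'C', 'B']

# groupby only groups consecutive equal keys, so sort the pairs first.
pairs = sorted(zip(categories, series), key=itemgetter(0))
d = {k: [v for _, v in grp] for k, grp in groupby(pairs, key=itemgetter(0))}
print(d)  # {'A': [1, 3], 'B': [2, 5], 'C': [4]}
```

This still pays for a sort (O(n log n)), so the plain defaultdict loop remains the simplest linear-time option.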

In principle you can do as Kris suggested: dict(zip(categories, series)); just be aware that categories cannot contain duplicate keys (as it does in your sample code).
EDIT :
Now that you've clarified what you intended, this will work as expected:
from collections import defaultdict

d = defaultdict(list)
for k, v in zip(categories, series):
    d[k].append(v)

d = { k: [] for k in categories }
list(map(lambda k, v: d[k].append(v), categories, series))  # list() forces the lazy map in Python 3
result:
d is now {'A': [1, 3], 'C': [4], 'B': [2, 5]}
or (equivalently) using setdefault (thanks Kris R.):
d = {}
list(map(lambda k, v: d.setdefault(k, []).append(v), categories, series))

Related

Sorting lists in dictionary based on other list without assigning them again

I have a large dictionary whose list values I want to sort based on one of the lists. For a simple dictionary I would do it like this:
d = {'a': [2, 3, 1], 'b': [103, 101, 102]}
d['a'], d['b'] = [list(i) for i in zip(*sorted(zip(d['a'], d['b'])))]
print(d)
Output:
{'a': [1, 2, 3], 'b': [102, 103, 101]}
My actual dictionary has many keys with list values, so unpacking the zip tuple like above becomes impractical. Is there any way to go over the keys and values without specifying them all? Something like:
d.values() = [list(i) for i in zip(*sorted(zip(d.values())))]
Using d.values() results in SyntaxError: can't assign function call, but I'm looking for something like this.
If you have many keys (and they all have equal-length list values), using pandas sort_values would be an efficient way of sorting:
import pandas as pd

d = {'a': [2, 3, 1], 'b': [103, 101, 102], 'c' : [4, 5, 6]}
d = pd.DataFrame(d).sort_values(by='a').to_dict('list')
Output:
{'a': [1, 2, 3], 'b': [102, 103, 101], 'c': [6, 4, 5]}
If memory is an issue, you can sort in place, however since that means sort_values returns None, you can no longer chain the operations:
df = pd.DataFrame(d)
df.sort_values(by='a', inplace=True)
d = df.to_dict('list')
The output is the same as above.
As far as I understand your question, you could try simple looping, keeping an unsorted copy of the reference list (since the values are reordered as you go):
ref = d['a'][:]
for k in d.keys():
    d[k] = [x for _, x in sorted(zip(ref, d[k]))]
where d['a'] stores the list which the others should be compared to. However, using dicts in this way seems slow and messy. Since every entry in your dictionary - presumably - is a list of the same length, a simple fix would be to store the data in a numpy array and call an argsort method to sort by the ith column:
a = np.array( --your data here-- )
a[a[:, i].argsort()]
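A concrete sketch of that argsort idea, assuming the dictionary's lists are stacked as columns of a 2-D array (here column 0 plays the role of 'a'):

```python
import numpy as np

d = {'a': [2, 3, 1], 'b': [103, 101, 102]}

# Stack the lists as columns so each row is one record, then sort rows by column 0.
a = np.array(list(d.values())).T
a = a[a[:, 0].argsort()]
sorted_d = {k: a[:, i].tolist() for i, k in enumerate(d)}
print(sorted_d)  # {'a': [1, 2, 3], 'b': [102, 103, 101]}
```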
Finally, the clearest approach would be to use a pandas DataFrame, which is designed to store large amounts of data using a dict-like syntax. In this way, you could just sort by the contents of a named column 'a':
df = pd.DataFrame( --your data here-- )
df.sort_values(by='a')
For further references, please see the links below:
Sorting arrays in NumPy by column
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_values.html
For the given input data and the required output then this will suffice:
from operator import itemgetter

d = {'a': [2, 3, 1], 'b': [103, 101, 102]}

def sort_dict(dict_, refkey):
    reflist = sorted([(v, i) for i, v in enumerate(dict_[refkey])], key=itemgetter(0))
    for v in dict_.values():
        v_ = v[:]
        for i, (_, p) in enumerate(reflist):
            v[i] = v_[p]

sort_dict(d, 'a')
print(d)
Output:
{'a': [1, 2, 3], 'b': [102, 103, 101]}

How to convert a list to a dict and concatenate values?

I have a list with schema as shown below:
list=[('a',2),('b',4),('a',1),('c',6)]
What I would like to do is convert it to a dict using the first value of each pair as the key. I would also like pairs with the same key to be concatenated. For the above, the result would be:
dict={ 'a':[2,1] , 'b':[4] , 'c':[6] }
I don't care about the order of the concatenated values, meaning we could also have 'a':[1,2].
How could this be done in Python?
Do this:
l = [('a',2),('b',4),('a',1),('c',6)]
d = {}
for item in l:
    if item[0] in d:
        d[item[0]].append(item[1])
    else:
        d[item[0]] = [item[1]]
print(d) # {'a': [2, 1], 'b': [4], 'c': [6]}
To make it cleaner you could use defaultdict and unpack the key/value pairs directly in the for loop:
from collections import defaultdict

l = [('a',2),('b',4),('a',1),('c',6)]
d = defaultdict(list)
for key, val in l:
    d[key].append(val)
print(dict(d)) # {'a': [2, 1], 'b': [4], 'c': [6]}
You can also use the setdefault method on dict to set a list as the default entry if a key is not in the dictionary yet:
l=[('a',2),('b',4),('a',1),('c',6)]
d = {}
for k, v in l:
    d.setdefault(k, []).append(v)
d
{'a': [2, 1], 'b': [4], 'c': [6]}

How to shorten this nested list comprehension?

This question is a continuation of this one: Comprehension list and output <generator object.<locals>.<genexpr> at 0x000002C392688C78>
I was directed to create a new question.
I have a few dicts inside another dict, and they get pretty big sometimes. Since I'm keeping them in a log, I would like to limit each of them to 30 items (key:value pairs).
So I tried something like this (in the example I limit the size to two):
main_dict = {
    'A': {
        'a1': [1,2,3],
        'a2': [4,5,6]
    },
    'B': {
        'b1': [0,2,4],
        'b2': [1,3,5]
    }
}
print([main_dict[x][i][:2] for x in main_dict.keys() for i in main_dict[x].keys()])
The output I get is this:
[[1, 2], [4, 5], [0, 2], [1, 3]]
What I expected was this:
['A':['a1':[1, 2],'a2':[4, 5]], 'B':['b1':[0, 2], 'b2':[1, 3]]]
Or something like that. It doesn't have to be exactly that, but I need to know which value belongs to which dict, which isn't clear in the output I end up getting.
To put it simply, all I want is to cut short the sub-dicts inside the dictionary. Elegantly, if possible.
This is a nice clean way to do it in one line, without altering the original dictionary:
print({key: {sub_k: ls[:2] for sub_k, ls in sub_dict.items()} for key, sub_dict in main_dict.items()})
Output:
{'A': {'a1': [1, 2], 'a2': [4, 5]}, 'B': {'b1': [0, 2], 'b2': [1, 3]}}
Your original trial used list comprehension [], but this case actually needs dict comprehension {}.
Try this:
print({key: {sub_key: lst[:2] for sub_key, lst in sub_dict.items()}
       for key, sub_dict in main_dict.items()})
Note the use of {} (dict comprehension) instead of [] (list comprehension)
A more efficient approach is to use nested for loops to delete the tail end of the sub-lists in place:
for d in main_dict.values():
    for k in d:
        del d[k][2:]
main_dict becomes:
{'A': {'a1': [1, 2], 'a2': [4, 5]}, 'B': {'b1': [0, 2], 'b2': [1, 3]}}
d = {
    'A': {
        'a1': [1,2,3],
        'a2': [4,5,6],
        'a3': [7,8,9]
    },
    'B': {
        'b1': [0,2,4],
        'b2': [1,3,5]
    }
}
If the dictionaries are only nested one-deep:
q = []
for k, v in d.items():
    keys, values = v.keys(), v.values()
    values = (value[:2] for value in values)
    q.append((k, tuple(zip(keys, values))))
I have rewritten my code based on the comments provided. See below.
my_dict = {}
for key, value in main_dict.items():
    sub_dict = {}
    for sub_key, sub_value in value.items():
        sub_dict[sub_key] = sub_value[:2]
    my_dict[key] = sub_dict
print(my_dict)
This will give you something that looks like this, and save it to a separate variable.
{'A': {'a1': [1, 2], 'a2': [4, 5]}, 'B': {'b1': [0, 2], 'b2': [1, 3]}}

iterate over only two keys of python dictionary

What is the pythonic way to iterate over a dictionary with a setup like this:
dict = {'a': [1, 2, 3], 'b': [3, 4, 5], 'c': 6}
if I only wanted to iterate a for loop over all the values in a and b and skip c. There's obviously a million ways to solve this but I'd prefer to avoid something like:
for each in dict['a']:
    # do something
    pass
for each in dict['b']:
    # do something
    pass
of something destructive like:
del dict['c']
for k, v in dict.items():
    pass
The more generic way is to use filter-like approaches by putting an if at the end of a generator expression.
If you want to iterate over every iterable value, filter with hasattr:
for key in (k for k in dict if hasattr(dict[k], '__iter__')):
    for item in dict[key]:
        print(item)
If you want to exclude some keys, use a "not in" filter:
invalid = {'c', 'd'}
for key in (k for k in dict if k not in invalid):
    ....
If you want to select only specific keys, use a "in" filter:
valid = {'a', 'b'}
for key in (k for k in dict if k in valid):
    ....
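Putting the "in" filter together with the inner loop over the values, a minimal runnable sketch (using mydict here to avoid shadowing the dict builtin):

```python
mydict = {'a': [1, 2, 3], 'b': [3, 4, 5], 'c': 6}
valid = {'a', 'b'}

collected = []
for key in (k for k in mydict if k in valid):
    for item in mydict[key]:
        collected.append(item)  # stand-in for the real per-item work
print(collected)  # [1, 2, 3, 3, 4, 5]
```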
Similar to SSDMS's solution you can also just do:
mydict = {'a': [1, 2, 3], 'b': [3, 4, 5], 'c': 6}
for each in mydict['a'] + mydict['b']:
    ....
You can use chain from the itertools module to do this:
In [29]: from itertools import chain
In [30]: mydict = {'a': [1, 2, 3], 'b': [3, 4, 5], 'c': 6}
In [31]: for item in chain(mydict['a'], mydict['b']):
...: print(item)
...:
1
2
3
3
4
5
To iterate over only the values whose entries in the dictionary are lists, simply use chain.from_iterable:
wanted_keys = ['a', 'b']
for item in chain.from_iterable(mydict[key] for key in wanted_keys if isinstance(mydict[key], list)):
    # do something with the item
    pass

Merge dictionaries retaining values for duplicate keys [duplicate]

This question already has answers here:
How to merge dicts, collecting values from matching keys?
(17 answers)
Closed 12 days ago.
Given n dictionaries, write a function that will return a unique dictionary with a list of values for duplicate keys.
Example:
d1 = {'a': 1, 'b': 2}
d2 = {'c': 3, 'b': 4}
d3 = {'a': 5, 'd': 6}
result:
>>> newdict
{'c': 3, 'd': 6, 'a': [1, 5], 'b': [2, 4]}
My code so far:
>>> def merge_dicts(*dicts):
...     x = []
...     for item in dicts:
...         x.append(item)
...     return x
...
>>> merge_dicts(d1, d2, d3)
[{'a': 1, 'b': 2}, {'c': 3, 'b': 4}, {'a': 5, 'd': 6}]
What would be the best way to produce a new dictionary that yields a list of values for those duplicate keys?
Python provides a simple and fast solution to this: the defaultdict in the collections module. From the examples in the documentation:
Using list as the default_factory, it is easy to group a sequence of
key-value pairs into a dictionary of lists:
>>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
>>> d = defaultdict(list)
>>> for k, v in s:
...     d[k].append(v)
...
>>> d.items()
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
When each key is encountered for the first time, it is not already in
the mapping; so an entry is automatically created using the
default_factory function which returns an empty list. The
list.append() operation then attaches the value to the new list. When
keys are encountered again, the look-up proceeds normally (returning
the list for that key) and the list.append() operation adds another
value to the list.
In your case, that would be roughly:
import collections

def merge_dicts(*dicts):
    res = collections.defaultdict(list)
    for d in dicts:
        for k, v in d.items():
            res[k].append(v)
    return res

>>> merge_dicts(d1, d2, d3)
defaultdict(<class 'list'>, {'a': [1, 5], 'c': [3], 'b': [2, 4], 'd': [6]})
def merge_dicts(*dicts):
    d = {}
    for dct in dicts:
        for key in dct:
            try:
                d[key].append(dct[key])
            except KeyError:
                d[key] = [dct[key]]
    return d
This returns:
{'a': [1, 5], 'b': [2, 4], 'c': [3], 'd': [6]}
There is a slight difference to the question: here all dictionary values are lists. If that is not desired for lists of length 1, then add:
for key in d:
    if len(d[key]) == 1:
        d[key] = d[key][0]
before the return d statement. However, I cannot really imagine when you would want to remove the list. (Consider the situation where you have lists as values; then removing the list around the items leads to ambiguous situations.)
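To illustrate that ambiguity: if an input dictionary's value is itself a list, unwrapping length-1 lists makes it indistinguishable from a genuinely merged group (a made-up example):

```python
d1 = {'a': [1, 2]}   # the value is itself a list
d2 = {'b': 3}

merged = {}
for d in (d1, d2):
    for k, v in d.items():
        merged.setdefault(k, []).append(v)
# merged == {'a': [[1, 2]], 'b': [3]}

# Unwrap length-1 lists:
flattened = {k: v[0] if len(v) == 1 else v for k, v in merged.items()}
print(flattened)  # {'a': [1, 2], 'b': 3}
# 'a' now looks exactly like a key merged from the two ints 1 and 2.
```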
