Given an iterable (such as a string or a list), is there a clean, O(n) way to build a dictionary that maps each element to its frequency in one line? I don't want to use any external libraries or modules.
The code should have the same functionality as the following snippet:
s = 'abcaba'
freq = {}
for i in s:
    if i not in freq:
        freq[i] = 1
    else:
        freq[i] += 1
### and now, freq = {'a':3, 'b':2, 'c':1}
This is O(n), but it's a couple of lines. I can also do this:
s = 'abcaba'
freq = {i: s.count(i) for i in s}
### same thing, now freq = {'a':3, 'b':2, 'c':1}
This is 1 line, but it's O(n^2), since count is O(n) and it is called once for every element in the loop.
There's probably an easy solution to this that I'm not thinking of. I apologize if this is a duplicate.
In [211]: import collections
In [212]: s = 'abcaba'
In [213]: collections.Counter(s)
Out[213]: Counter({'a': 3, 'b': 2, 'c': 1})
Here's another approach (though not exactly a one-liner):
In [214]: freq = {}
In [215]: for char in s: freq[char] = freq.get(char, 0)+1
In [216]: freq
Out[216]: {'a': 3, 'b': 2, 'c': 1}
>>> from collections import Counter
>>> freq = Counter('abcab')
>>> freq['a']
2
from collections import Counter
result = Counter("123245")
print(result)
Output:
Counter({'2': 2, '1': 1, '3': 1, '4': 1, '5': 1})
You can also use:
dict(result)
# {'1': 1, '2': 2, '3': 1, '4': 1, '5': 1}
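As a quick sanity check, the one-line Counter call can be compared against the explicit loop from the question. This is only a minimal sketch; the helper name `loop_freq` is made up for illustration.

```python
from collections import Counter

def loop_freq(iterable):
    """Count elements with an explicit loop, as in the question."""
    freq = {}
    for item in iterable:
        freq[item] = freq.get(item, 0) + 1
    return freq

s = 'abcaba'
one_liner = dict(Counter(s))      # a single pass over s, so O(n)
print(one_liner == loop_freq(s))  # True
```

Counter accepts any iterable of hashable items, so the same call should work for lists and tuples as well as strings.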
How can I turn a list of dicts like this
dico = [{'a':1}, {'b':2}, {'c':1}, {'d':2}, {'e':2}, {'d':3}, {'g':1}, {'h':4}, {'h':2}, {'f':6}, {'a':2}, {'b':2}]
Into a single dict like this
{'a':3, 'b':4, 'c':1, 'd':5,'e':2,'f':6 , 'g':1 ,'h':6}
At the moment, when I do this
result = {}
for d in dico:
    result.update(d)
print(result)
Result :
{'a': 2, 'b': 2, 'c': 1, 'd': 3, 'e': 2, 'g': 1, 'h': 2, 'f': 6}
Just replace your dictionary with collections.Counter and it will work:
from collections import Counter
dico = [{'a':1}, {'b':2}, {'c':1}, {'d':2}, {'e':2}, {'d':3}, {'g':1}, {'h':4}, {'h':2}, {'f':6}, {'a':2}, {'b':2}]
result = Counter()
for d in dico:
    result.update(d)
print(result)
Output:
Counter({'h': 6, 'f': 6, 'd': 5, 'b': 4, 'a': 3, 'e': 2, 'c': 1, 'g': 1})
Why this works: the docs describe Counter.update as follows:
Elements are counted from an iterable or added-in from another mapping (or counter). Like dict.update() but adds counts instead of replacing them. Also, the iterable is expected to be a sequence of elements, not a sequence of (key, value) pairs.
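The difference the docs describe can be seen side by side in a small sketch. The input list `pairs` is made up for illustration.

```python
from collections import Counter

pairs = [{'a': 1}, {'a': 2}, {'b': 5}]

plain = {}
counted = Counter()
for d in pairs:
    plain.update(d)    # dict.update overwrites: 'a' ends up as 2
    counted.update(d)  # Counter.update adds: 'a' ends up as 3

print(plain)          # {'a': 2, 'b': 5}
print(dict(counted))  # {'a': 3, 'b': 5}

# Given a plain iterable instead of a mapping, Counter.update counts its elements.
counted.update('ab')
print(dict(counted))  # {'a': 4, 'b': 6}
```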
Here's a fancy way to do it using collections.Counter, which is a kind of dictionary:
from collections import Counter
def add_dicts(dicts):
    return sum(map(Counter, dicts), Counter())
The above is not efficient for a large number of dictionaries since it creates many intermediate Counter objects for the result, rather than updating one result in-place, so it runs in quadratic time. Here's a similar solution which runs in linear time:
from collections import Counter
def add_dicts(dicts):
    out = Counter()
    for d in dicts:
        out += d
    return out
Using a defaultdict:
from collections import defaultdict
dct = defaultdict(int)
for element in dico:
    for key, value in element.items():
        dct[key] += value
print(dct)
Which yields
defaultdict(<class 'int'>,
{'a': 3, 'b': 4, 'c': 1, 'd': 5, 'e': 2, 'g': 1, 'h': 6, 'f': 6})
As for time measurements, this is a comparison between the four answers:
from collections import defaultdict, Counter
from timeit import timeit
def solution_dani():
    result = sum((Counter(e) for e in dico), Counter())
def solution_kaya():
    return sum(map(Counter, dico), Counter())
def solution_roadrunner():
    result = Counter()
    for d in dico:
        result.update(d)
    return result
def solution_jan():
    dct = defaultdict(int)
    for element in dico:
        for key, value in element.items():
            dct[key] += value
    return dct
print(timeit(solution_dani, number=10000))
print(timeit(solution_kaya, number=10000))
print(timeit(solution_roadrunner, number=10000))
print(timeit(solution_jan, number=10000))
On my MacBookAir this yields
0.839742998
0.8093687279999999
0.18643740100000006
0.04764247300000002
So the defaultdict solution is by far the fastest (roughly 15-20 times faster than the two sum-based solutions and about 4 times faster than @RoadRunner's Counter.update loop).
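These timings come from the twelve-element list in the question, so they may largely reflect per-call overhead. A rough sketch for repeating the comparison on a larger input follows. The input `big` and the function names are made up for illustration, and the numbers will vary by machine and Python version.

```python
from collections import Counter, defaultdict
from timeit import timeit

# Hypothetical larger input: 2000 single-key dicts spread over 26 keys.
big = [{chr(97 + i % 26): i % 7 + 1} for i in range(2000)]

def with_sum(dicts):
    # Builds a new Counter for every addition.
    return sum(map(Counter, dicts), Counter())

def with_update(dicts):
    # Updates a single Counter in place.
    result = Counter()
    for d in dicts:
        result.update(d)
    return result

def with_defaultdict(dicts):
    # Updates a single defaultdict in place.
    dct = defaultdict(int)
    for element in dicts:
        for key, value in element.items():
            dct[key] += value
    return dct

# Check that all three approaches agree before timing them.
assert dict(with_sum(big)) == dict(with_update(big)) == dict(with_defaultdict(big))

for func in (with_sum, with_update, with_defaultdict):
    print(func.__name__, timeit(lambda: func(big), number=20))
```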
Use collections.Counter and sum:
from collections import Counter
dico = [{'a':1}, {'b':2}, {'c':1}, {'d':2}, {'e':2}, {'d':3}, {'g':1}, {'h':4}, {'h':2}, {'f':6}, {'a':2}, {'b':2}]
result = sum((Counter(e) for e in dico), Counter())
print(result)
Output
Counter({'h': 6, 'f': 6, 'd': 5, 'b': 4, 'a': 3, 'e': 2, 'c': 1, 'g': 1})
If you need a plain dictionary, do:
result = dict(sum((Counter(e) for e in dico), Counter()))
print(result)
You could modify your approach, like this:
result = {}
for d in dico:
    for key, value in d.items():
        result[key] = result.get(key, 0) + value
print(result)
The update method replaces the values of existing keys. From the documentation:
Update the dictionary with the key/value pairs from other, overwriting existing keys.
import collections
counter = collections.Counter()
for d in dico:
    counter.update(d)
result = dict(counter)
print(result)
Output
{'a': 3, 'b': 4, 'c': 1, 'd': 5, 'e': 2, 'g': 1, 'h': 6, 'f': 6}
Suppose I have a dictionary that maps keys to their frequencies:
numbers = {'a': 1, 'b': 4, 'c': 1, 'd': 3, 'e': 3}
To find the key with the highest frequency, I know I can do:
mode = max(numbers, key=numbers.get)
print(mode)
and that prints:
b
But if I have:
numbers = {'a': 1, 'b': 0, 'c': 1, 'd': 3, 'e': 3}
and apply the 'max' function above, the output is:
d
What I need is:
d,e
Or something similar, displaying both keys.
numbers = {'a': 1, 'b': 0, 'c': 1, 'd': 3, 'e': 3}
max_value = max(numbers.values())
[k for k,v in numbers.items() if v == max_value]
prints (on Python 3.7+, where dicts preserve insertion order)
['d', 'e']
It loops over all entries via .items(), checks whether each value equals the maximum and, if so, adds the key to the list.
numbers = {'a': 1, 'b': 4, 'c': 1, 'd':4 , 'e': 3}
# max returns the (key, value) tuple with the largest value in the dictionary
mx_tuple = max(numbers.items(), key=lambda x: x[1])
# mx_tuple[1] is that maximum value; collect every key that has it
max_list = [i[0] for i in numbers.items() if i[1] == mx_tuple[1]]
print(max_list)
This code runs in O(n): finding the maximum value is O(n) and the list comprehension is O(n), so the total is O(2n), which is equivalent to O(n).
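The two passes can also be folded into a single pass that tracks the best value seen so far. This is only a sketch; the function name `keys_with_max` is made up for illustration.

```python
def keys_with_max(mapping):
    """Return all keys whose value equals the maximum, in one pass."""
    best = None
    winners = []
    for key, value in mapping.items():
        if best is None or value > best:
            best, winners = value, [key]  # new maximum: restart the list
        elif value == best:
            winners.append(key)           # tie with the current maximum
    return winners

numbers = {'a': 1, 'b': 0, 'c': 1, 'd': 3, 'e': 3}
print(keys_with_max(numbers))  # ['d', 'e'] on Python 3.7+ (insertion order)
```

It is still O(n) and returns an empty list for an empty dictionary, where max() would raise a ValueError.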
The collections.Counter object is useful for this as well. Its .most_common() method returns (key, count) pairs ordered from the most common to the least common, and passing n limits the result to the n most common:
from collections import Counter
numbers = Counter({'a': 1, 'b': 0, 'c': 1, 'd': 3, 'e': 3})
values = list(numbers.values())
max_value = max(values)
count = values.count(max_value)
numbers.most_common(n=count)
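Note that most_common returns (key, count) pairs rather than bare keys. A small sketch of extracting just the keys, under the same input as above (the variable name `top` is made up for illustration):

```python
from collections import Counter

numbers = Counter({'a': 1, 'b': 0, 'c': 1, 'd': 3, 'e': 3})
max_value = max(numbers.values())
count = list(numbers.values()).count(max_value)  # how many keys share the maximum

top = numbers.most_common(count)  # list of (key, count) pairs
print(top)                        # [('d', 3), ('e', 3)] on Python 3.7+
print([key for key, _ in top])    # ['d', 'e']
```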
You can use the .items() method and sort by a (count, key) tuple, so that the key breaks ties between equal counts:
d = ['a','b','c','b','c','d','c','d','e','d','b']
from collections import Counter
get_data = Counter(d)
# sort by count, then key
maxmax = sorted(get_data.items(), key=lambda a: (a[1],a[0]) )
for elem in maxmax:
    # the list is sorted in ascending order, so the last entry holds the maximum count
    if elem[1] == maxmax[-1][1]:
        print(elem)
Output:
('b', 3)
('c', 3)
('d', 3) # the last one is the one with "highest" key
To get the "highest" key, use maxmax[-1].
As I understand it, when I convert dicts to Counter objects and add them, any key whose value is zero disappears from the result.
from collections import Counter
a = {"a": 1, "b": 5, "d": 0}
b = {"b": 1, "c": 2}
print(Counter(a) + Counter(b))
How can I keep those keys?
This is my expected result:
Counter({'b': 6, 'c': 2, 'a': 1, 'd': 0})
You can use the update() method of Counter instead of the + operator. For example:
>>> a = {"a": 1, "b": 5, "d": 0}
>>> b = {"b": 1, "c": 2}
>>> x = Counter(a)
>>> x.update(Counter(b))
>>> x
Counter({'b': 6, 'c': 2, 'a': 1, 'd': 0})
The update() method adds counts instead of replacing them, and it does not remove keys whose count is zero. We can also build Counter(b) first and then update it with Counter(a). For example:
>>> y = Counter(b)
>>> y.update(Counter(a))
>>> y
Counter({'b': 6, 'c': 2, 'a': 1, 'd': 0})
Unfortunately, when two counters are summed with +, only elements with a positive count are kept in the result.
If you want to keep the elements with a count of zero, you could define a function like this:
from collections import Counter
def addall(a, b):
    c = Counter(a)       # copy the counter a, preserving the zero elements
    for x in b:          # for each key in the other counter
        c[x] += b[x]     # add the value in the other counter to the first
    return c
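A short usage sketch, applying the function to the dicts from the question and comparing it with the + operator. The function is repeated here only so that the example runs on its own.

```python
from collections import Counter

def addall(a, b):
    c = Counter(a)    # copy a, preserving the zero elements
    for x in b:
        c[x] += b[x]  # add the counts from b
    return c

a = {"a": 1, "b": 5, "d": 0}
b = {"b": 1, "c": 2}

kept = addall(a, b)
dropped = Counter(a) + Counter(b)

print('d' in kept, kept['d'])  # True 0
print('d' in dropped)          # False
```

Because the function copies its first argument, neither input is modified.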
You can just subclass Counter and adjust its __add__ method:
from collections import Counter
class MyCounter(Counter):
    def __add__(self, other):
        """Add counts from two counters.
        Preserves counts with zero values.
        >>> MyCounter('abbb') + MyCounter('bcc')
        MyCounter({'b': 4, 'c': 2, 'a': 1})
        >>> MyCounter({'a': 1, 'b': 0}) + MyCounter({'a': 2, 'c': 3})
        MyCounter({'a': 3, 'c': 3, 'b': 0})
        """
        if not isinstance(other, Counter):
            return NotImplemented
        result = MyCounter()
        for elem, count in self.items():
            newcount = count + other[elem]
            result[elem] = newcount
        for elem, count in other.items():
            if elem not in self:
                result[elem] = count
        return result
counter1 = MyCounter({'a': 1, 'b': 0})
counter2 = MyCounter({'a': 2, 'c': 3})
print(counter1 + counter2) # MyCounter({'a': 3, 'c': 3, 'b': 0})
To add to Anand S Kumar's answer with a further example:
update() keeps your keys even when the dict contains negative values, whereas + drops them.
from collections import Counter
a = {"a": 1, "b": 5, "d": -1}
b = {"b": 1, "c": 2}
print(Counter(a) + Counter(b))
#Counter({'b': 6, 'c': 2, 'a': 1})
x = Counter(a)
x.update(Counter(b))
print(x)
#Counter({'b': 6, 'c': 2, 'a': 1, 'd': -1})