Given an iterable (such as a string or a list), is there a clean, O(n) way to make a dictionary that maps elements to their frequency in 1 line? I don't want to use any external libraries or modules.
The code should have the same functionality as the following snippet:
s = 'abcaba'
freq = {}
for i in s:
    if i not in freq:
        freq[i] = 1
    else:
        freq[i] += 1
### and now, freq = {'a':3, 'b':2, 'c':1}
This is O(n), but it's a couple of lines. I can also do this:
s = 'abcaba'
freq = {i: s.count(i) for i in s}
### same thing, now freq = {'a':3, 'b':2, 'c':1}
This is 1 line, but it's O(n^2), since count is O(n) and it is called once for every element of s.
There's probably an easy solution to this that I'm not thinking of. I apologize if this is a duplicate.
In [211]: import collections
In [212]: s = 'abcaba'
In [213]: collections.Counter(s)
Out[213]: Counter({'a': 3, 'b': 2, 'c': 1})
Here's another approach (though not exactly a one-liner):
In [214]: freq = {}
In [215]: for char in s: freq[char] = freq.get(char, 0)+1
In [216]: freq
Out[216]: {'a': 3, 'b': 2, 'c': 1}
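If imports are off the table, one way to get a one-line call site is to wrap the dict.get loop above in a small function. This is only a sketch; the name frequencies is a hypothetical helper, not something from the standard library.

```python
def frequencies(iterable):
    """Map each element to how often it occurs (hypothetical helper name)."""
    freq = {}
    for item in iterable:
        # A single pass over the input, so the whole function is O(n)
        freq[item] = freq.get(item, 0) + 1
    return freq

freq = frequencies('abcaba')
print(freq)  # {'a': 3, 'b': 2, 'c': 1}
```

The work is still a loop, but each use is a single expression and the input is only traversed once.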
>>> from collections import Counter
>>> freq = Counter('abcab')
>>> freq['a']
2
from collections import Counter
result = Counter("123245")
print(result)
Output:
Counter({'2': 2, '1': 1, '3': 1, '4': 1, '5': 1})
You can also use:
dict(result)
# {'1': 1, '2': 2, '3': 1, '4': 1, '5': 1}
Related
How can I turn a list of dicts like this
dico = [{'a':1}, {'b':2}, {'c':1}, {'d':2}, {'e':2}, {'d':3}, {'g':1}, {'h':4}, {'h':2}, {'f':6}, {'a':2}, {'b':2}]
Into a single dict like this
{'a':3, 'b':4, 'c':1, 'd':5,'e':2,'f':6 , 'g':1 ,'h':6}
At the moment, when I do this:
result = {}
for d in dico:
    result.update(d)
print(result)
I get:
{'a': 2, 'b': 2, 'c': 1, 'd': 3, 'e': 2, 'g': 1, 'h': 2, 'f': 6}
Just replace your dictionary with collections.Counter and it will work:
from collections import Counter
dico = [{'a':1}, {'b':2}, {'c':1}, {'d':2}, {'e':2}, {'d':3}, {'g':1}, {'h':4}, {'h':2}, {'f':6}, {'a':2}, {'b':2}]
result = Counter()
for d in dico:
    result.update(d)
print(result)
Output:
Counter({'h': 6, 'f': 6, 'd': 5, 'b': 4, 'a': 3, 'e': 2, 'c': 1, 'g': 1})
The docs explain why update behaves this way for Counter:
Elements are counted from an iterable or added-in from another mapping (or counter). Like dict.update() but adds counts instead of replacing them. Also, the iterable is expected to be a sequence of elements, not a sequence of (key, value) pairs.
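The difference between the two update methods can be seen side by side in a small sketch. The list pairs is made-up sample data, shortened from the question's input.

```python
from collections import Counter

# Made-up sample data: the key 'a' appears in two of the dicts
pairs = [{'a': 1}, {'b': 2}, {'a': 2}]

plain = {}
counted = Counter()
for d in pairs:
    plain.update(d)    # dict.update overwrites: 'a' ends up as 2
    counted.update(d)  # Counter.update adds: 'a' ends up as 1 + 2 = 3

print(plain)          # {'a': 2, 'b': 2}
print(dict(counted))  # {'a': 3, 'b': 2}
```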
Here's a fancy way to do it using collections.Counter, which is a kind of dictionary:
from collections import Counter
def add_dicts(dicts):
    return sum(map(Counter, dicts), Counter())
The above is not efficient for a large number of dictionaries since it creates many intermediate Counter objects for the result, rather than updating one result in-place, so it runs in quadratic time. Here's a similar solution which runs in linear time:
from collections import Counter
def add_dicts(dicts):
    out = Counter()
    for d in dicts:
        out += d
    return out
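One behavior to be aware of when choosing between += and update on a Counter: in-place addition discards entries whose count is zero or negative, while update keeps them. The sketch below uses made-up input with a negative value to show this; with the all-positive values from the question, both give the same result.

```python
from collections import Counter

# Made-up input: the second dict cancels out the count for 'a'
dicts = [{'a': 1, 'b': 2}, {'a': -1}]

added = Counter()
updated = Counter()
for d in dicts:
    added += d         # in-place addition drops counts that are <= 0
    updated.update(d)  # update keeps every key, even with a zero count

print(added)    # Counter({'b': 2})
print(updated)  # Counter({'b': 2, 'a': 0})
```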
Using a defaultdict:
from collections import defaultdict
dct = defaultdict(int)
for element in dico:
    for key, value in element.items():
        dct[key] += value
print(dct)
Which yields
defaultdict(<class 'int'>,
{'a': 3, 'b': 4, 'c': 1, 'd': 5, 'e': 2, 'g': 1, 'h': 6, 'f': 6})
As for time measurements, this is a comparison between the four answers (dico is the list from the question):
from collections import defaultdict, Counter
from timeit import timeit
def solution_dani():
    return sum((Counter(e) for e in dico), Counter())
def solution_kaya():
    return sum(map(Counter, dico), Counter())
def solution_roadrunner():
    result = Counter()
    for d in dico:
        result.update(d)
    return result
def solution_jan():
    dct = defaultdict(int)
    for element in dico:
        for key, value in element.items():
            dct[key] += value
    return dct
print(timeit(solution_dani, number=10000))
print(timeit(solution_kaya, number=10000))
print(timeit(solution_roadrunner, number=10000))
print(timeit(solution_jan, number=10000))
On my MacBookAir this yields
0.839742998
0.8093687279999999
0.18643740100000006
0.04764247300000002
So the solution with a defaultdict is by far the fastest (by a factor of 15-20 over the two sum-based solutions), followed by RoadRunner's.
Use collections.Counter and sum:
from collections import Counter
dico = [{'a':1}, {'b':2}, {'c':1}, {'d':2}, {'e':2}, {'d':3}, {'g':1}, {'h':4}, {'h':2}, {'f':6}, {'a':2}, {'b':2}]
result = sum((Counter(e) for e in dico), Counter())
print(result)
Output
Counter({'h': 6, 'f': 6, 'd': 5, 'b': 4, 'a': 3, 'e': 2, 'c': 1, 'g': 1})
If you need a strict dictionary, do:
result = dict(sum((Counter(e) for e in dico), Counter()))
print(result)
You could modify your approach, like this:
result = {}
for d in dico:
    for key, value in d.items():
        result[key] = result.get(key, 0) + value
print(result)
The update method will replace the values of existing keys, from the documentation:
Update the dictionary with the key/value pairs from other, overwriting
existing keys.
import collections
counter = collections.Counter()
for d in dico:
    counter.update(d)
result = dict(counter)
print(result)
Output
{'a': 3, 'b': 4, 'c': 1, 'd': 5, 'e': 2, 'g': 1, 'h': 6, 'f': 6}
I want to count the occurrences of every letter in a word using a dictionary. So far I've tried adding to a dict in a for loop.
Is it possible to use a dictionary comprehension instead?
word = "aabcd"
occurrence = {}
for l in word.lower():
    if l in occurrence:
        occurrence[l] += 1
    else:
        occurrence[l] = 1
Sure it is possible.
Use a Counter.
from collections import Counter
c = Counter(word)
print(c)
Counter({'a': 2, 'b': 1, 'c': 1, 'd': 1})
Another solution using defaultdict.
from collections import defaultdict
occurrence = defaultdict(int)
for c in word.lower():
    occurrence[c] += 1
print(occurrence)
defaultdict(<class 'int'>, {'a': 2, 'b': 1, 'c': 1, 'd': 1})
Or another one without using any imports.
occurrence = {}
for c in word.lower():
    occurrence[c] = occurrence.get(c, 0) + 1
print(occurrence)
{'a': 2, 'b': 1, 'c': 1, 'd': 1}
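Since the question asks specifically about dictionary comprehensions, here is a minimal sketch of one. It calls str.count once per distinct letter, so it rescans the word for each of them; for short words that is fine, but the single-pass loops above scale better. The variable name lowered is introduced here for illustration.

```python
word = "aabcd"
lowered = word.lower()

# Iterate over set(lowered) so each distinct letter is counted only once
occurrence = {letter: lowered.count(letter) for letter in set(lowered)}

print(occurrence == {'a': 2, 'b': 1, 'c': 1, 'd': 1})  # True
```

The key order of the resulting dictionary depends on set iteration order, which is why the example compares against the expected mapping rather than printing it.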
I want to calculate the number of friends for each person given a relationship graph, without using any libraries. The graph is represented as a list of lists:
graph = [['A','B'],['A','C'],['C','B'],['B','D'],['E']]
Expected dictionary output: {'A':2, 'B':3, 'C':2, 'D':1, 'E':0}
Note: since E has no friends, E should be 0.
Straightforward solution without changing input format
>>> graph = [['A', 'B'], ['A', 'C'],['C', 'B'], ['B', 'D'], ['E']]
>>> from collections import defaultdict
>>> friends_counter = defaultdict(int)
>>> for friends in graph:
...     for person in friends:
...         friends_counter[person] += len(friends) - 1
...
>>> dict(friends_counter)
{'A': 2, 'B': 3, 'C': 2, 'D': 1, 'E': 0}
You could use a Python library specific to graphs called NetworkX. I changed the data to make it easier to load.
import networkx as nx
graph = [['A','B'],['A','C'],['C','B'],['B','D']]
G = nx.Graph()
G.add_edges_from(graph)
G.add_node('E')
dict(G.degree)
# {'A': 2, 'B': 3, 'C': 2, 'D': 1, 'E': 0}
Edit: this answer was given before the "without using any libraries" caveat was added.
Got the solution. Is there a better way to do this?
graph = [['A','B'],['A','C'],['C','B'],['B','D'],['E']]
dct ={}
for v in graph:
    for x in v:
        if x in v:
            if x in dct.keys():
                dct[x] += 1
            else:
                dct[x] = len(v) - 1
print(dct)
{'A': 2, 'B': 3, 'C': 2, 'D': 1, 'E': 0}
You can do it like this:
graph = [["A","B"],["A","C"],["C","B"],["B","D"],["E"]]
ans = {}
for n in graph:
    if len(n) == 1:
        ans[n[0]] = ans.get(n[0], 0)
    else:
        l, r = n
        ans[l] = ans.get(l, 0) + 1
        ans[r] = ans.get(r, 0) + 1
print(ans)
# {'A': 2, 'B': 3, 'C': 2, 'D': 1, 'E': 0}
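The two branches above can also be folded into one by adding len(group) - 1 for every person in a group, which is 0 for a lone person and 1 for each member of a pair. A possible import-free sketch follows; the function name count_friends is hypothetical.

```python
def count_friends(graph):
    """Count friends per person without any imports (hypothetical helper)."""
    counts = {}
    for group in graph:
        for person in group:
            # Everyone else in the same group counts as a friend:
            # 0 for a single-person group, 1 for each member of a pair
            counts[person] = counts.get(person, 0) + len(group) - 1
    return counts

graph = [['A', 'B'], ['A', 'C'], ['C', 'B'], ['B', 'D'], ['E']]
print(count_friends(graph))  # {'A': 2, 'B': 3, 'C': 2, 'D': 1, 'E': 0}
```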
If I have a dictionary of keys with their corresponding frequency values:
numbers = {'a': 1, 'b': 4, 'c': 1, 'd': 3, 'e': 3}
To find the highest, what I know is:
mode = max(numbers, key=numbers.get)
print(mode)
and that prints:
b
But if I have:
numbers = {'a': 1, 'b': 0, 'c': 1, 'd': 3, 'e': 3}
and apply the 'max' function above, the output is:
d
What I need is:
d,e
Or something similar, displaying both keys.
numbers = {'a': 1, 'b': 0, 'c': 1, 'd': 3, 'e': 3}
max_value = max(numbers.values())
[k for k,v in numbers.items() if v == max_value]
gives (on Python 3.7+, where dictionaries keep insertion order)
['d', 'e']
This loops over all entries via .items(), checks whether each value equals the maximum, and if so adds the key to the list.
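The same idea can be wrapped in a reusable function that also covers the empty-dictionary case, where max() would otherwise raise ValueError. This is a sketch; the name keys_with_max_value is hypothetical.

```python
def keys_with_max_value(mapping):
    """Return every key whose value equals the largest value (hypothetical helper)."""
    if not mapping:
        return []  # max() raises ValueError on an empty sequence
    max_value = max(mapping.values())
    return [key for key, value in mapping.items() if value == max_value]

numbers = {'a': 1, 'b': 0, 'c': 1, 'd': 3, 'e': 3}
print(keys_with_max_value(numbers))
```

The keys come back in the dictionary's own iteration order, so sort the result if a specific order matters.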
numbers = {'a': 1, 'b': 4, 'c': 1, 'd':4 , 'e': 3}
# max() returns the (key, value) tuple with the largest value in the dictionary
mx_tuple = max(numbers.items(), key=lambda x: x[1])
# mx_tuple[1] is that maximum value; collect every key that has it
max_list = [i[0] for i in numbers.items() if i[1] == mx_tuple[1]]
print(max_list)
This code runs in O(n): O(n) to find the maximum value and O(n) for the list comprehension, so the total remains O(n).
Note: O(2n) is equivalent to O(n).
The collections.Counter object is useful for this as well. Its .most_common() method returns keys and counts ordered from most to least common, and passing n limits the result to the n most common entries:
from collections import Counter
numbers = Counter({'a': 1, 'b': 0, 'c': 1, 'd': 3, 'e': 3})
values = list(numbers.values())
max_value = max(values)
count = values.count(max_value)
numbers.most_common(n=count)
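most_common returns a list of (key, count) tuples, so the keys still need to be pulled out to get something like d,e. A small sketch of that last step; the names tie_count and top_keys are introduced here for illustration.

```python
from collections import Counter

numbers = Counter({'a': 1, 'b': 0, 'c': 1, 'd': 3, 'e': 3})

max_value = max(numbers.values())
# How many keys share the maximum count
tie_count = list(numbers.values()).count(max_value)

# Keep only the keys from the (key, count) tuples; sort for a stable order
top_keys = sorted(key for key, _ in numbers.most_common(tie_count))
print(top_keys)  # ['d', 'e']
```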
You can use the .items() method and sort by a (count, key) tuple; for equal counts, the key decides the order:
d = ['a','b','c','b','c','d','c','d','e','d','b']
from collections import Counter
get_data = Counter(d)
# sort by count, then key
maxmax = sorted(get_data.items(), key=lambda a: (a[1],a[0]) )
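for elem in maxmax:
    if elem[1] == maxmax[0][1]:
        print(elem)
Output (the list is sorted in ascending order, so these are the entries sharing the lowest count):
('a', 1)
('e', 1) # the last one is the one with the "highest" key
To get the entry with the highest count (and, among ties, the "highest" key), use maxmax[-1].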