Python dictionary counting - python

I have a dictionary
a={}
a['A']=1
a['B']=1
a['C']=2
I need to output the following
1 has occurred 2 times
2 has occurred 1 times
What is the best way to do this?

This is easily (and efficiently) done with collections.Counter(), which is designed to (unsurprisingly) count things:
>>> import collections
>>> a = {"A": 1, "B": 1, "C": 2}
>>> collections.Counter(a.values())
Counter({1: 2, 2: 1})
This gives you a dictionary-like object that can trivially be used to generate the output you want.
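As a minimal sketch of that last step, the counts can be turned into the requested lines as follows. The variable names `value_counts` and `lines` are illustrative only, and sorting by value is an assumption about the desired order.

```python
from collections import Counter

a = {"A": 1, "B": 1, "C": 2}

# Count how often each value appears among the dictionary's values.
value_counts = Counter(a.values())

# Build one output line per distinct value, in ascending order of the value.
lines = [
    f"{value} has occurred {count} times"
    for value, count in sorted(value_counts.items())
]
print("\n".join(lines))
```

With the dictionary from the question this prints "1 has occurred 2 times" followed by "2 has occurred 1 times".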

Use the Counter class:
from collections import Counter
a = {}
a["A"] = 1
a["B"] = 1
a["C"] = 2
c = Counter(a.values())
c
=> Counter({1: 2, 2: 1})
From the documentation:
A Counter is a dict subclass for counting hashable objects. It is an unordered collection where elements are stored as dictionary keys and their counts are stored as dictionary values. Counts are allowed to be any integer value including zero or negative counts. The Counter class is similar to bags or multisets in other languages.
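A short sketch of the behaviour described in that passage, using a made-up list of values: a missing element reports a count of zero instead of raising KeyError, and counts may go negative after subtract().

```python
from collections import Counter

c = Counter([1, 1, 2])

# Looking up an element that was never counted returns 0 and does not add it.
missing = c[3]
print(missing)            # 0
print(3 in c)             # False

# subtract() lowers counts in place and is allowed to go below zero.
c.subtract([2, 2])
print(c[2])               # -1

# most_common(n) lists the n elements with the highest counts.
print(c.most_common(1))   # [(1, 2)]
```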

Related

Best way to update a value in a dict

I have a dict that contains parameters. I want to consider that any unspecified parameter should be read as a zero. I see several ways of doing it and wonder which one is recommended:
parameters = {
    "apples": 2
}
def gain_fruits0(quantity, fruit):
    if fruit not in parameters:
        parameters[fruit] = 0
    parameters[fruit] += quantity
def gain_fruits1(quantity, fruit):
    parameters[fruit] = quantity + parameters.get(fruit, 0)
parameters is actually way bigger than that, if that is important to know for optimization purposes.
So, what would be the best way? gain_fruits0, gain_fruits1, or something else?
This is a typical use of defaultdict, which works exactly like a regular dictionary except that it has the functionality you're after built in:
>>> from collections import defaultdict
>>> d = defaultdict(int) # specify default value `int()`, which is 0
>>> d['apples'] += 1
>>> d
defaultdict(<class 'int'>, {'apples': 1})
>>> d['apples'] # can index normally
1
>>> d['oranges'] # missing keys map to the default value
0
>>> dict(d) # can also cast to regular dict
{'apples': 1, 'oranges': 0}
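Applied to the question, a sketch might look like this. It assumes `parameters` can be created as a defaultdict in the first place, and the function name `gain_fruits` is made up for the example.

```python
from collections import defaultdict

# Hypothetical stand-in for the question's `parameters` dict,
# seeded with the one entry shown there.
parameters = defaultdict(int, {"apples": 2})

def gain_fruits(quantity, fruit):
    """Add `quantity` to `fruit`; an unseen fruit starts from 0."""
    parameters[fruit] += quantity

gain_fruits(3, "apples")
gain_fruits(1, "oranges")
print(dict(parameters))  # {'apples': 5, 'oranges': 1}
```

One thing to keep in mind: merely reading a missing key from a defaultdict inserts it with the default value. If that is unwanted, parameters.get(fruit, 0) on a plain dict (as in gain_fruits1) avoids the side effect.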

Is it possible to call an iterator in a defaultdict to get the next key-value?

Python 3.6.x
I've got a defaultdict, which is named xref_to_records. It has got strings as keys, and lists as values.
for k, v in xref_to_records.items():
    print(type(k))
    print(type(xref_to_records[k]))
    break
It produces:
<class 'str'>
<class 'list'>
What I'm trying to do is iterate through its items and compare the list of values for one key against the list for the next key. I know this question has probably been answered somewhere already, but I couldn't get any of the suggested approaches to work.
I've tried to iterate over the length of the keys, but it doesn't work.
keys = xref_to_records.keys()
for i in range(len(keys)):
    this_key = keys[i]
It raises a TypeError:
TypeError: 'dict_keys' object does not support indexing
I've also tried to iterate through the keys using next(), but without success.
frick = None
for k,v in iter(xref_to_records.items()):
    if k != frick:
        res = next(k, None)
        print(res)
        break
Again a TypeError:
TypeError: 'str' object is not an iterator
Expected output
for k, v in xref_to_records.items():
    somefunctions(k)
    somefunctions(next(k))
Strictly speaking, a dictionary did not historically guarantee any particular order. However, in newer versions of Python (guaranteed by the language from 3.7, and an implementation detail of CPython 3.6), the items in a dict are iterated in their original order of insertion, and since defaultdict is a subclass of dict, the same holds there, too.
>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> for i in range(4): d[i] = list(range(i+1))
>>> d[10] = []
>>> d[4] = []
>>> d
defaultdict(<class 'list'>, {0: [0], 1: [0, 1], 2: [0, 1, 2], 3: [0, 1, 2, 3], 10: [], 4: []})
>>> list(d)
[0, 1, 2, 3, 10, 4]
You can then create two iterators over the dictionary with tee, advance one of them by one step using next, and zip them to get pairs of (current, next) elements. (Here these are just pairs of keys, but you can of course look up the corresponding values in the dictionary.)
>>> from itertools import tee
>>> it1, it2 = tee(iter(d))
>>> next(it2)
0
>>> for a, b in zip(it1, it2):
...     print(a, b)
...
0 1
1 2
2 3
3 10
10 4
For older versions of Python, you might have to use a collections.OrderedDict instead. If, instead, you do not want insertion order but e.g. lexicographic ordering, you can just iterate over the sorted keys.
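The same pairing idea can be sketched for comparing the value lists of neighbouring keys. The sample data and the helper name `consecutive_pairs` are made up for illustration, and the sketch relies on insertion-ordered dicts (Python 3.7+).

```python
from collections import defaultdict

# Hypothetical sample data shaped like the question's xref_to_records.
xref_to_records = defaultdict(list)
xref_to_records["A"].extend([9, 12])
xref_to_records["B"].extend([99, 112])
xref_to_records["C"].extend([99, 112])

def consecutive_pairs(mapping):
    """Return ((key, value), (next_key, next_value)) pairs in iteration order."""
    items = list(mapping.items())
    # Zipping the list with itself shifted by one pairs each item with its successor.
    return zip(items, items[1:])

for (key, records), (next_key, next_records) in consecutive_pairs(xref_to_records):
    print(key, next_key, records == next_records)
```

This copies the items into a list first, which is simpler than tee at the cost of some memory. For sorted order, build the list from sorted(mapping.items()) instead.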
See below:
from collections import defaultdict
xref_to_records = defaultdict(list)
xref_to_records['A'].append(9)
xref_to_records['A'].append(12)
xref_to_records['B'].append(99)
xref_to_records['B'].append(112)
xref_to_records['C'].append(99.34)
xref_to_records['C'].append(112.88)
xref_to_records['C'].append(4112.88)
keys = list(xref_to_records.keys())
for idx,key in enumerate(keys):
    if idx > 0:
        print('compare:')
        print('current:' + str(xref_to_records[key]))
        print('previous: ' + str(xref_to_records[keys[idx-1]]))
        print('')
Output:
compare:
current:[99, 112]
previous: [9, 12]
compare:
current:[99.34, 112.88, 4112.88]
previous: [99, 112]

Find count of characters within the string in Python

I am trying to create a dictionary of each character and the number of times it occurs in a string. Suppose the string is as below:
str1 = "aabbaba"
I want to create a dictionary like this
word_count = {'a':4,'b':3}
I am trying to use dictionary comprehension to do this.
I did
dic = {x:dic[x]+1 if x in dic.keys() else x:1 for x in str}
This ends up giving an error saying
File "<stdin>", line 1
dic = {x:dic[x]+1 if x in dic.keys() else x:1 for x in str}
^
SyntaxError: invalid syntax
Can anybody tell me what's wrong with the syntax? Also, how can I create such a dictionary using a dictionary comprehension?
As others have said, this is best done with a Counter.
You can also do:
>>> {e:str1.count(e) for e in set(str1)}
{'a': 4, 'b': 3}
But that traverses the string 1+n times, where n is the number of unique characters (once to create the set, and once per unique character to count its occurrences). The running time therefore grows with the product of the string length and the number of unique characters, which is quadratic in the worst case. That is a bad result if you have a lot of unique characters in a long string. A Counter only traverses the string once.
If you want a version without imports that is more efficient than using .count, you can use .setdefault to make a counter:
>>> count={}
>>> for c in str1:
...     count[c]=count.setdefault(c, 0)+1
...
>>> count
{'a': 4, 'b': 3}
That only traverses the string once, no matter how long it is or how many unique characters it contains.
You can also use defaultdict if you prefer:
>>> from collections import defaultdict
>>> count=defaultdict(int)
>>> for c in str1:
...     count[c]+=1
...
>>> count
defaultdict(<type 'int'>, {'a': 4, 'b': 3})
>>> dict(count)
{'a': 4, 'b': 3}
But if you are going to import collections -- Use a Counter!
The ideal way to do this is to use collections.Counter:
>>> from collections import Counter
>>> str1 = "aabbaba"
>>> Counter(str1)
Counter({'a': 4, 'b': 3})
You cannot achieve this with a simple dict comprehension, because each entry would need a reference to the previous count for that element. As mentioned in Dawg's answer, as a workaround you can use str.count(e) inside the dict comprehension to count each element of the set of characters. But the time complexity will be n*m, since it traverses the complete string for each unique element (where n is the length of the string and m is the number of unique elements), whereas with Counter it will be n.
This is a nice case for collections.Counter:
>>> from collections import Counter
>>> Counter(str1)
Counter({'a': 4, 'b': 3})
It's a dict subclass, so you can work with the object much like a standard dictionary:
>>> c = Counter(str1)
>>> c['a']
4
You can do this without the Counter class as well. Simple and efficient Python code for this would be:
>>> d = {}
>>> for x in str1:
...     d[x] = d.get(x, 0) + 1
...
>>> d
{'a': 4, 'b': 3}
Note that this is not the correct way to do it since it won't count repeated characters more than once (apart from losing other characters from the original dict) but this answers the original question of whether if-else is possible in comprehensions and demonstrates how it can be done.
To answer your question, yes it's possible but the approach is like this:
dic = {x: (dic[x] + 1 if x in dic else 1) for x in str1}
The condition is applied to the value only, not to the key:value mapping.
The above can be made clearer using dict.get:
dic = {x: dic.get(x, 0) + 1 for x in str1}
0 is returned if x is not in dic.
Demo:
In [78]: s = "abcde"
In [79]: dic = {}
In [80]: dic = {x: (dic[x] + 1 if x in dic else 1) for x in s}
In [81]: dic
Out[81]: {'a': 1, 'b': 1, 'c': 1, 'd': 1, 'e': 1}
In [82]: s = "abfg"
In [83]: dic = {x: dic.get(x, 0) + 1 for x in s}
In [84]: dic
Out[84]: {'a': 2, 'b': 2, 'f': 1, 'g': 1}
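To make the caveat concrete for the string from the question, here is a small sketch contrasting the comprehension with an explicit loop. The name `counts` is introduced only for this example.

```python
str1 = "aabbaba"

dic = {}
# The comprehension is evaluated in full before the name `dic` is rebound,
# so every lookup inside it sees the old, empty dictionary.
dic = {x: dic.get(x, 0) + 1 for x in str1}
print(dic)     # {'a': 1, 'b': 1}

# An explicit loop updates the same dictionary as it goes and counts correctly.
counts = {}
for x in str1:
    counts[x] = counts.get(x, 0) + 1
print(counts)  # {'a': 4, 'b': 3}
```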

How to only store 3 values for a key in a dictionary? Python

I tried to make the program store only the last 3 scores (values) for each key (name). However, the program either stores the first 3 scores and then never updates them, or appends more values than it should.
The code I have so far:
#appends values if a key already exists
while tries < 3:
    d.setdefault(name, []).append(scores)
    tries = tries + 1
Though I could not fully understand your question, what I gather is that you want to store only the last three scores in the list. That is a simple task.
d.setdefault(name,[]).append(scores)
if len(d[name])>3:
    del d[name][0]
This code checks after every addition whether the length of the list exceeds 3. If it does, the first element (the one added before the last three) is deleted.
Use a collections.defaultdict + collections.deque with a max length set to 3:
from collections import deque,defaultdict
d = defaultdict(lambda: deque(maxlen=3))
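Then call d[name].append(score): if the key does not exist, the key/value pair is created; if it does exist, the score is simply appended. A deque with maxlen=3 discards its oldest item automatically when a fourth one is appended.
Deleting an element from the start of a list is an inefficient solution.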
Demo:
from random import randint
for _ in range(10):
    for name in range(4):
        d[name].append(randint(1,10))
print(d)
defaultdict(<function <lambda> at 0x7f06432906a8>, {0: deque([9, 1, 1], maxlen=3), 1: deque([5, 5, 8], maxlen=3), 2: deque([5, 1, 3], maxlen=3), 3: deque([10, 6, 10], maxlen=3)})
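For a deterministic illustration of the same idea, here is a small sketch with fixed scores. The names `scores_by_name` and `record_score` and the sample data are made up for the example.

```python
from collections import defaultdict, deque

# Each new name gets its own deque that silently discards its oldest item
# once it already holds three scores.
scores_by_name = defaultdict(lambda: deque(maxlen=3))

def record_score(name, score):
    """Hypothetical helper: remember `score`, keeping only the last three."""
    scores_by_name[name].append(score)

for score in [4, 7, 9, 10, 2]:
    record_score("alice", score)
record_score("bob", 5)

print(list(scores_by_name["alice"]))  # [9, 10, 2]
print(list(scores_by_name["bob"]))    # [5]
```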
One good way to keep the last N items in Python is to use a deque with maxlen N, so in this case you can use defaultdict and deque from the collections module.
Example:
>>> from collections import defaultdict ,deque
>>> l=[1,2,3,4,5]
>>> d=defaultdict(deque)
>>> d['q']=deque(maxlen=3)
>>> for i in l:
...     d['q'].append(i)
...
>>> d
defaultdict(<type 'collections.deque'>, {'q': deque([3, 4, 5], maxlen=3)})
A slight variation on another answer, in case you want to extend the list stored under name with several scores at once:
d.setdefault(name,[]).extend(scores)
if len(d[name])>3:
    del d[name][:-3]
from collections import defaultdict
d = defaultdict(list)
d[key].append(val)
d[key] = d[key][-3:]  # keep only the last three values
# or, as a one-line solution:
d[key] = (d[key] + [val])[-3:]

How to iterate over the elements of a map in python

Given a string s, I want to know how many times each character in the string occurs. Here is the code:
def main() :
    while True :
        try :
            line=raw_input('Enter a string: ')
        except EOFError :
            break;
        mp={};
        for i in range(len(line)) :
            if line[i] in mp :
                mp[line[i]] += 1;
            else :
                mp[line[i]] = 1;
        for i in range(len(line)) :
            print line[i],': ',mp[line[i]];
if __name__ == '__main__' :
    main();
When I run this code and I enter abbba, I get:
a : 2
b : 3
b : 3
b : 3
a : 2
I would like to get only:
a : 2
b : 3
I understand why this is happening, but as I'm new to python, I don't know any other ways to iterate over the elements of a map. Could anyone tell me how to do this? Thanks in advance.
You could try a Counter (Python 2.7 and above; see below for a pre-2.7 option):
>>> from collections import Counter
>>> Counter('abbba')
Counter({'b': 3, 'a': 2})
You can then access the elements just like a dictionary:
>>> counts = Counter('abbba')
>>> counts['a']
2
>>> counts['b']
3
And to iterate, you can use @BurhanKhalid's suggestion (the Counter behaves as a dictionary, where you can iterate over the key/value pairs):
>>> for k, v in Counter('abbba').iteritems():
...     print k, v
...
a 2
b 3
If you're using a pre-2.7 version of Python, you can use a defaultdict to simplify your code a bit (process is still the same - only difference is that now you don't have to check for the key first - it will 'default' to 0 if a matching key isn't found). Counter has other features built into it, but if you simply want counts (and don't care about most_common, or being able to subtract, for instance), this should be fine and can be treated just as any other dictionary:
>>> from collections import defaultdict
>>> counts = defaultdict(int)
>>> for c in 'abbba':
...     counts[c] += 1
...
>>> counts
defaultdict(<type 'int'>, {'a': 2, 'b': 3})
When you use iteritems() on a dictionary (or the Counter/defaultdict here), a key and a value are returned for each iteration (in this case, the key being the letter and the value being the number of occurrences). One thing to note about using dictionaries is that they are inherently unordered, so you won't necessarily get 'a', 'b', ... while iterating. One basic way to iterate through a dictionary in a sorted manner would be to iterate through a sorted list of the keys (here alphabetical, but sorted can be manipulated to handle a variety of options), and return the dictionary value for that key (there are other ways, but this will hopefully be somewhat informative):
>>> mapping = {'some': 2, 'example': 3, 'words': 5}
>>> mapping
{'some': 2, 'example': 3, 'words': 5}
>>> for key in sorted(mapping.keys()):
...     print key, mapping[key]
...
example 3
some 2
words 5
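For readers on Python 3, a rough equivalent of the sorted iteration might look like the sketch below. It uses print() as a function and also shows Counter.most_common() as one way to order by count instead of by key.

```python
from collections import Counter

counts = Counter("abbba")

# Alphabetical order of the characters: sorted() on a mapping sorts its keys.
for char in sorted(counts):
    print(char, counts[char])

# Highest count first; most_common() returns (element, count) pairs.
for char, count in counts.most_common():
    print(char, count)
```

The first loop prints a 2 then b 3; the second prints b 3 then a 2.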
Iterating over a mapping yields keys.
>>> d = {'foo': 42, 'bar': 'quux'}
>>> for k in d:
...     print k, d[k]
...
foo 42
bar quux
You need to look up the help for dict(). It's all there -- 'for k in mp' iterates over keys, 'for v in mp.values()' iterates over values, 'for k,v in mp.items()' iterates over key, value pairs.
Also, you don't need those semicolons. While they are legal in Python, nobody uses them; there's pretty much no reason to.
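The three iteration forms can be sketched as follows in Python 3 syntax. The name `mp` is taken from the question and the input string is the example from there. The sketch relies on insertion-ordered dicts (Python 3.7+) for the order shown.

```python
# Build the character counts the same way as in the question, without semicolons.
mp = {}
for ch in "abbba":
    mp[ch] = mp.get(ch, 0) + 1

for k in mp:               # iterates over keys
    print(k)
for v in mp.values():      # iterates over values
    print(v)
for k, v in mp.items():    # iterates over (key, value) pairs
    print(k, ":", v)
```

The last loop prints each character once ("a : 2", then "b : 3"), which is the output the question asks for.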
Python 2.5 and above:
import collections
dDict = collections.defaultdict(int)
for i in line:
    dDict[i] += 1
print dDict
