Find count of characters within the string in Python - python

I am trying to build a dictionary that maps each character to the number of times it appears in a string. Suppose the string is:
str1 = "aabbaba"
I want to create a dictionary like this
word_count = {'a':4,'b':3}
I am trying to use dictionary comprehension to do this.
I did
dic = {x:dic[x]+1 if x in dic.keys() else x:1 for x in str}
This ends up giving an error saying
File "<stdin>", line 1
dic = {x:dic[x]+1 if x in dic.keys() else x:1 for x in str}
^
SyntaxError: invalid syntax
Can anybody tell me what's wrong with the syntax? Also, how can I create such a dictionary using a dictionary comprehension?

As others have said, this is best done with a Counter.
You can also do:
>>> {e:str1.count(e) for e in set(str1)}
{'a': 4, 'b': 3}
But that traverses the string 1+m times, where m is the number of unique characters: once to create the set, and once per unique character to count its occurrences. In the worst case, when most characters are distinct, the runtime is quadratic. That is a bad result if you have a lot of unique characters in a long string. A Counter only traverses the string once.
If you want a no-import version that is more efficient than using .count, you can use .setdefault to make a counter:
>>> count = {}
>>> for c in str1:
...     count[c] = count.setdefault(c, 0) + 1
...
>>> count
{'a': 4, 'b': 3}
That only traverses the string once, no matter how long it is or how many unique characters it has.
You can also use defaultdict if you prefer:
>>> from collections import defaultdict
>>> count=defaultdict(int)
>>> for c in str1:
...     count[c] += 1
...
>>> count
defaultdict(<type 'int'>, {'a': 4, 'b': 3})
>>> dict(count)
{'a': 4, 'b': 3}
But if you are going to import collections -- Use a Counter!
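As a rough check that the approaches in this answer agree, here is a minimal sketch. The function names count_with_str_count and count_with_setdefault are made up for illustration and do not appear in the answer.

```python
from collections import Counter

def count_with_str_count(text):
    # One pass to build the set, plus one pass per distinct character.
    return {ch: text.count(ch) for ch in set(text)}

def count_with_setdefault(text):
    # Single pass over the string, no imports needed.
    counts = {}
    for ch in text:
        counts[ch] = counts.setdefault(ch, 0) + 1
    return counts

str1 = "aabbaba"
expected = {"a": 4, "b": 3}
print(count_with_str_count(str1) == expected)   # True
print(count_with_setdefault(str1) == expected)  # True
print(Counter(str1) == expected)                # True
```

All three produce the same mapping; they differ only in how many times the string is scanned.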

The ideal way to do this is with collections.Counter:
>>> from collections import Counter
>>> str1 = "aabbaba"
>>> Counter(str1)
Counter({'a': 4, 'b': 3})
You cannot achieve this with a simple dict comprehension, because the expression would need to refer to the count accumulated so far for each element, and the dictionary being built is not accessible from inside the comprehension. As mentioned in Dawg's answer, as a workaround you can call str1.count(e) for each element of set(str1) within the dict comprehension. But the time complexity will be n*m, since it traverses the complete string once for each unique element (where n is the length of the string and m is the number of unique elements), whereas with Counter it is n.

This is a nice case for collections.Counter:
>>> from collections import Counter
>>> Counter(str1)
Counter({'a': 4, 'b': 3})
It's a dict subclass, so you can work with the object much as you would with a standard dictionary:
>>> c = Counter(str1)
>>> c['a']
4
You can do this without the Counter class as well. Simple and efficient Python code for this would be:
>>> d = {}
>>> for x in str1:
...     d[x] = d.get(x, 0) + 1
...
>>> d
{'a': 4, 'b': 3}

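Note that this is not the correct way to count characters: inside the comprehension, dic refers to the dictionary that existed before the assignment, not the one being built, so repeated characters are never counted more than once (and keys of the original dict that are not in the string are lost). It does, however, answer the original question of whether if-else is possible in comprehensions and demonstrates how it can be done.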
To answer your question, yes it's possible but the approach is like this:
dic = {x: (dic[x] + 1 if x in dic else 1) for x in str1}
The conditional expression applies to the value only, not to the whole key:value pair.
The above can be made clearer using dict.get:
dic = {x: dic.get(x, 0) + 1 for x in str1}
0 is returned if x is not in dic.
Demo:
In [78]: s = "abcde"
In [79]: dic = {}
In [80]: dic = {x: (dic[x] + 1 if x in dic else 1) for x in s}
In [81]: dic
Out[81]: {'a': 1, 'b': 1, 'c': 1, 'd': 1, 'e': 1}
In [82]: s = "abfg"
In [83]: dic = {x: dic.get(x, 0) + 1 for x in s}
In [84]: dic
Out[84]: {'a': 2, 'b': 2, 'f': 1, 'g': 1}
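The demo output follows from the fact that the comprehension only reads the dictionary bound to the name before the assignment. A minimal sketch of that behaviour, using a hypothetical helper count_with_comprehension that takes the earlier dictionary as an explicit argument:

```python
def count_with_comprehension(text, previous):
    # `previous` is only read while the comprehension runs, never updated,
    # so every key ends up as previous.get(key, 0) + 1.
    return {ch: previous.get(ch, 0) + 1 for ch in text}

first = count_with_comprehension("aabbaba", {})
print(first)   # {'a': 1, 'b': 1}

second = count_with_comprehension("aabbaba", first)
print(second)  # {'a': 2, 'b': 2}
```

Each call adds at most 1 per distinct character, which is why a comprehension of this shape cannot accumulate real counts.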

Related

How to make function unique in python

Okay, so I have to make a function called unique. This is what it should do:
If the input is: s1 = [{1,2,3,4}, {3,4,5}]
unique(s1) should return: {1,2,5} because the 1, 2 and 5 are NOT in both lists.
And if the input is s2 = [{1,2,3,4}, {3,4,5}, {2,6}]
unique(s2) should return: {1,5,6} because those numbers are unique and are in only one list of this collection of 3 lists.
I tried to make something like this:
for x in s1:
    if x not in unique_list:
        unique_list.append(x)
    else:
        unique_list.remove(x)
print(unique_list)
But the problem with this is that it takes a whole list as "x" and not each element from each list.
Anyone that can help me a bit with this?
I am not allowed to import anything.
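Python set objects have a symmetric_difference() method that finds the elements in either set, but not in both. You can reduce your list with it to find the elements unique to one set. Note that when reducing over three or more sets, an element that appears in an odd number of them (for example, in all three) also survives, so this matches "in only one set" as long as no element appears in three or more sets: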
from functools import reduce
l = [{1,2,3,4}, {3,4,5}, {2,6}]
reduce(set.symmetric_difference, l)
# {1, 5, 6}
You can, of course, do this without reduce by manually looping over the list. The ^ operator produces the symmetric difference:
l = [{1,2,3,4}, {3,4,5}, {2,6}]
final = set()
for s in l:
    final = final ^ s
print(final)
# {1, 5, 6}
In [13]: def f(sets):
    ...:     c = {}
    ...:     for s in sets:
    ...:         for x in s:
    ...:             c[x] = c.setdefault(x, 0) + 1
    ...:     return {x for x, v in c.items() if v == 1}
    ...:
In [14]: f([{1,2}, {2, 3}, {3, 4}])
Out[14]: {1, 4}
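Applied to the inputs from the question, the counting approach can be sketched as below, with no imports. The name unique comes from the question; the third input s3 is a made-up case that shows where counting and chained ^ disagree.

```python
def unique(sets):
    """Return the elements that occur in exactly one of the given sets."""
    counts = {}
    for s in sets:
        for item in s:
            counts[item] = counts.get(item, 0) + 1
    return {item for item, n in counts.items() if n == 1}

s1 = [{1, 2, 3, 4}, {3, 4, 5}]
s2 = [{1, 2, 3, 4}, {3, 4, 5}, {2, 6}]
print(unique(s1) == {1, 2, 5})  # True
print(unique(s2) == {1, 5, 6})  # True

# 2 is present in all three sets: counting drops it, while chaining ^
# keeps it because it appears an odd number of times.
s3 = [{1, 2}, {2, 3}, {2, 4}]
print(unique(s3) == {1, 3, 4})               # True
print(s3[0] ^ s3[1] ^ s3[2] == {1, 2, 3, 4})  # True
```

If an element can occur in three or more sets, the counting version is the one that matches "in only one set".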

Count how many times is a character repeated in a row in Python [duplicate]

This question already has answers here:
Why does range(start, end) not include end? [duplicate]
(11 answers)
Closed 4 years ago.
I'm currently trying to solve a problem of counting repeating characters in a row in Python.
This code works until it comes to the last different character in a string, and I have no idea how to solve this problem
def repeating(word):
    count = 1
    tmp = ""
    res = {}
    for i in range(1, len(word)):
        tmp += word[i - 1]
        if word[i - 1] == word[i]:
            count += 1
        else:
            res[tmp] = count
            count = 1
            tmp = ""
    return res
word = "aabc"
print(repeating(word))
The given output should be {'aa': 2, 'b': 1, 'c' : 1},
but I am getting {'aa': 2, 'b': 1}
How do I solve this?
In this case, you can use the collections.Counter which does all the work for you.
>>> from collections import Counter
>>> Counter('aabc')
Counter({'a': 2, 'c': 1, 'b': 1})
You can also iterate over the letters in the string, since a string is iterable. But then I would use the defaultdict from collections to save on the 'counting' part.
>>> from collections import defaultdict
>>>
>>> def repeating(word):
...     res = defaultdict(int)
...     for letter in word:
...         res[letter] += 1
...     return res
...
>>> word="aabc"
>>> print (repeating(word))
defaultdict(<type 'int'>, {'a': 2, 'c': 1, 'b': 1})
I would recommend using Counter from the collections module. It does exactly what you are trying to achieve
from collections import Counter
word = "aabc"
print(Counter(word))
# Counter({'a': 2, 'b': 1, 'c': 1})
But if you want to implement it yourself, you should know that str is an Iterable. Hence you can iterate over every letter with a simple loop.
Additionally, there is something called defaultdict, which comes in quite handy in this scenario. Normally you have to check whether a key (in this case a letter) is already defined, and create it if not. With a defaultdict, you can specify that every new key starts with a default value.
from collections import defaultdict
def repeating(word):
    counter = defaultdict(int)
    for letter in word:
        counter[letter] += 1
    return counter
The result would be similar:
In [6]: repeating('aabc')
Out[6]: defaultdict(int, {'a': 2, 'b': 1, 'c': 1})
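Counter and defaultdict count characters across the whole word, so they do not produce the per-run output the question asks for ({'aa': 2, 'b': 1, 'c': 1}). The loop in the question stores a run only when the next character differs, so the final run is never recorded. One possible sketch for consecutive runs uses itertools.groupby; the function name repeating_runs is made up here. It returns a list of pairs rather than a dict, because equal runs such as the two 'aa' in "aabaa" would otherwise overwrite each other.

```python
from itertools import groupby

def repeating_runs(word):
    """Return (run, length) pairs for each stretch of identical consecutive characters."""
    runs = []
    for char, group in groupby(word):
        length = len(list(group))
        runs.append((char * length, length))
    return runs

print(repeating_runs("aabc"))        # [('aa', 2), ('b', 1), ('c', 1)]
print(dict(repeating_runs("aabc")))  # {'aa': 2, 'b': 1, 'c': 1}
print(repeating_runs("aabaa"))       # [('aa', 2), ('b', 1), ('aa', 2)]
```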

Better way to write 'assign A or if not possible - B' [duplicate]

This question already has answers here:
Check if a given key already exists in a dictionary and increment it
(12 answers)
Closed 6 years ago.
So, in my code I have a dictionary I use to count up items I have no prior knowledge of:
if a_thing not in my_dict:
    my_dict[a_thing] = 0
else:
    my_dict[a_thing] += 1
Obviously, I can't increment a value for a key that doesn't exist yet. For some reason I have a feeling (in my still-Python-inexperienced brain) that there might be a more Pythonic way to do this with, say, some construct that assigns the result of an expression to a thing and, if that is not possible, assigns something else, all in a single statement.
So, does anything like that exist in Python?
This looks like a good job for defaultdict, from collections. Observe the example below:
>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> d['a'] += 1
>>> d
defaultdict(<class 'int'>, {'a': 1})
>>> d['b'] += 1
>>> d['a'] += 1
>>> d
defaultdict(<class 'int'>, {'b': 1, 'a': 2})
defaultdict takes a single callable (a factory) that is called with no arguments to produce the initial value for a missing key. In this case you are incrementing integer values, so you want int, since int() returns 0.
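To make the role of the factory concrete, here is a small sketch. The sample lists are made up for illustration.

```python
from collections import defaultdict

print(int())  # 0, the value a defaultdict(int) stores for a missing key

counts = defaultdict(int)
for item in ["a", "b", "a"]:
    counts[item] += 1
print(dict(counts))  # {'a': 2, 'b': 1}

# Any zero-argument callable works as the factory; list is a common choice.
groups = defaultdict(list)
for word in ["apple", "avocado", "banana"]:
    groups[word[0]].append(word)
print(dict(groups))  # {'a': ['apple', 'avocado'], 'b': ['banana']}
```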
Alternatively, since you are counting items, you could also (as mentioned in comments) use Counter which will ultimately do all the work for you:
>>> from collections import Counter
>>> d = Counter(['a', 'b', 'a', 'c', 'a', 'b', 'c'])
>>> d
Counter({'a': 3, 'c': 2, 'b': 2})
It also comes with some nice bonuses. Like most_common:
>>> d.most_common()
[('a', 3), ('c', 2), ('b', 2)]
The result is ordered from the most common item to the least common.
Using the get method:
>>> d = {}
>>> d['a'] = d.get('a', 0) + 1
>>> d
{'a': 1}
>>> d['b'] = d.get('b', 2) + 1
>>> d
{'b': 3, 'a': 1}

Count words without checking that a word is "in" dictionary

I understand that there are modules out there that can do this kind of behavior, but I'm interested in how to approach the following "issue".
Whenever I used to want to count occurrences, I found it a bit silly that I first had to check whether a key is "in" the dictionary (#1). I believe at the time I even used a try...except because I didn't know how to do it properly.
# 1
words = ['a', 'b', 'c', 'a', 'b']
dicty = {}
for w in words:
    if w in dicty:
        dicty[w] += 1
    else:
        dicty[w] = 1
At this moment, I'm interested in the question what has to be done to make a class "SpecialDictionary" behave such that if a word is not in a dictionary, it automatically gets a default 0 value (#2). Which concepts are needed for this question?
Note: I understand that this "in" check could be done in the class' definition, but there must be something more pythonic/elegant?
# 2
special_dict = SpecialDictionary()
for w in words:
    special_dict[w] += 1
Subclass dict and override its __missing__ method to return 0:
class SpecialDictionary(dict):
    def __missing__(self, k):
        return 0
words = ['a', 'b', 'c', 'a', 'b']
special_dict = SpecialDictionary()
for w in words:
    special_dict[w] += 1
print(special_dict)
# {'a': 2, 'b': 2, 'c': 1}
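A short sketch of what __missing__ does and does not do in this subclass: the hook only supplies a value for d[key] lookups and stores nothing by itself. The key "x" is arbitrary.

```python
class SpecialDictionary(dict):
    def __missing__(self, key):
        # Called by dict.__getitem__ when the key is absent.
        return 0

d = SpecialDictionary()
print(d["x"])      # 0
print("x" in d)    # False: the lookup did not store anything
print(d.get("x"))  # None: get() does not consult __missing__
d["x"] += 1        # reads 0 via __missing__, then stores 1
print(d)           # {'x': 1}
```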
You need to use dict.get:
>>> my_dict = {}
>>> for x in words:
...     my_dict[x] = my_dict.get(x, 0) + 1
...
>>> my_dict
{'a': 2, 'c': 1, 'b': 2}
dict.get returns the value for the key if it is present, else a default (None unless you pass one).
Syntax: dict.get(key[, default])
You can also use try and except; if the key is not found in the dictionary, the lookup raises KeyError:
>>> my_dict = {}
>>> for x in words:
...     try:
...         my_dict[x] += 1
...     except KeyError:
...         my_dict[x] = 1
...
>>> my_dict
{'a': 2, 'c': 1, 'b': 2}
Using Counter:
>>> from collections import Counter
>>> words = ['a', 'b', 'c', 'a', 'b']
>>> my_count = Counter(words)
>>> my_count
Counter({'a': 2, 'b': 2, 'c': 1})
You can use a defaultdict. Or is this one of the “modules out there” that you wish to avoid?
from collections import defaultdict
d = defaultdict(lambda : 0)
d['a'] += 1
print(d['a'])
print(d['b'])
It will print:
1
0
The 'SpecialDictionary' that implements that kind of behavior is collections.defaultdict. It takes a function as its first parameter, which serves as a default-value factory. Whenever a lookup is performed, it checks whether the key is already in the dictionary; if that's not the case, it uses the factory function to create a value, which is then added to the dictionary (and returned by the lookup). See the docs on how it is implemented.
Counter is a related dict subclass (not a defaultdict) that returns 0 for missing keys without adding them, and provides some additional methods.
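The difference between the two on a plain lookup can be seen in a minimal sketch; the key "missing" is arbitrary.

```python
from collections import Counter, defaultdict

dd = defaultdict(int)
print(dd["missing"])  # 0
print(dict(dd))       # {'missing': 0}: the lookup inserted the key

c = Counter()
print(c["missing"])   # 0
print(dict(c))        # {}: nothing was inserted

print(isinstance(c, dict), isinstance(c, defaultdict))  # True False
```

This matters when you later iterate over the keys or check membership, since a defaultdict can pick up entries from lookups alone.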

How to count occurrences of specific dict key in list of dicts

I'm trying to count the number of times a specified key occurs in my list of dicts. I've used Counter() and most_common(n) to count up all the keys, but how can I find the count for a specific key? I have this code, which does not work currently:
def Artist_Stats(self, artist_pick):
    entries = TopData(self.filename).data_to_dict()
    for d in entries:
        x = d['artist']
        find_artist = Counter()
        print find_artist[x][artist_pick]
The "entries" data has about 60k entries and looks like this:
[{'album': 'Nikki Nack', 'song': 'Find a New Way', 'datetime': '2014-12-03 09:08:00', 'artist': 'tUnE-yArDs'},]
You could extract it, put it into a list, and calculate the list's length.
key_artists = [k['artist'] for k in entries if k.get('artist')]
len(key_artists)
Edit: using a generator expression might be better if your data is big:
key_artists = (1 for k in entries if k.get('artist'))
sum(key_artists)
2nd Edit:
for a specific artist, you would replace if k.get('artist') with if k.get('artist') == artist_pick
3rd Edit: you could loop as well, if you're not comfortable with comprehensions or generators, or if you feel that enhances code readability
n = 0 # number of artists
for k in entries:
    n += 1 if k.get('artist') == artist_pick else 0
You can add Counter objects together with +. Below is a demonstration:
>>> from collections import Counter
>>> data = [{'a':1, 'b':1}, {'a':1, 'c':1}, {'b':1, 'c':1}, {'a':1, 'c':1}, {'a':1, 'd':1}]
>>> counter = Counter(data[0])
>>> for d in data[1:]:
...     counter += Counter(d)
...
>>> counter
Counter({'a': 4, 'c': 3, 'b': 2, 'd': 1})
>>> counter['a'] # Count of 'a' key
4
>>> counter['d'] # Count of 'd' key
1
>>>
Or, if you want to get fancy, replace the for-loop with sum and a generator expression:
>>> from collections import Counter
>>> data = [{'a':1, 'b':1}, {'a':1, 'c':1}, {'b':1, 'c':1}, {'a':1, 'c':1}, {'a':1, 'd':1}]
>>> counter = sum((Counter(d) for d in data[1:]), Counter(data[0]))
>>> counter
Counter({'a': 4, 'c': 3, 'b': 2, 'd': 1})
>>>
I personally prefer the readability of the for-loop though.
If you mean to count the keys rather than the distinct values to a particular key, then without Counter():
artist_key_count = 0
for track in entries:
    if 'artist' in track.keys():
        artist_key_count += 1
If you mean to count the number of times each artist appears in your list of tracks, you can also do this without Counter():
artist_counts = {}
for track in entries:
    artist = track.get('artist')
    try:
        artist_counts[artist] += 1
    except KeyError:
        artist_counts[artist] = 1
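Since the question already uses Counter, one way to get the count for a specific artist is to count the 'artist' values once and then index the result. This is a sketch under assumptions: the entries list below is hypothetical sample data shaped like the entry shown in the question, not the real 60k-entry data, and TopData is not used.

```python
from collections import Counter

# Hypothetical sample shaped like the entries shown in the question.
entries = [
    {"song": "Find a New Way", "artist": "tUnE-yArDs"},
    {"song": "Song B", "artist": "tUnE-yArDs"},
    {"song": "Song C", "artist": "Another Artist"},
    {"song": "Song D"},  # no 'artist' key at all
]

# Count every artist value once, skipping entries without the key.
artist_counts = Counter(d["artist"] for d in entries if "artist" in d)

artist_pick = "tUnE-yArDs"
print(artist_counts[artist_pick])       # 2
print(artist_counts["Unknown Artist"])  # 0, missing keys count as zero
print(artist_counts.most_common(1))     # [('tUnE-yArDs', 2)]
```

Building the Counter once is useful when several artists are looked up; for a single lookup, the sum over a generator expression shown earlier does the same work without the intermediate mapping.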
