Get frequency of letters in a sentence - python

I am trying to write code where I can input a random sentence and count how often each letter occurs in this string:
def getfreq(lines):
    """ calculate a list with letter frequencies
    lines - list of lines (character strings)
    both lower and upper case characters are counted.
    """
    totals = 26*[0]
    chars = []
    for line in lines:
        for ch in line:
            chars.append(totals)
    return totals
    # convert totals to frequency
    freqlst = []
    grandtotal = sum(totals)
    for total in totals:
        freq = totals.count(chars)
        freqlst.append(freq)
    return freqlst
So far I have managed to append each letter of the input to the list (chars). But now I need a way to count how many times each character occurs in that list, and express this as a frequency.

Without a collections.Counter:
import collections
sentence = "A long sentence may contain repeated letters"
count = collections.defaultdict(int)  # save some time with a dictionary factory
for letter in sentence:  # iterate over each character in the sentence
    count[letter] += 1  # increase the count for this character
Or if you really want to do it fully manually:
sentence = "A long sentence may contain repeated letters"
count = {}  # a counting dictionary
for letter in sentence:  # iterate over each character in the sentence
    count[letter] = count.get(letter, 0) + 1  # get the current value and increase it by 1
In both cases the count dictionary will have each distinct character as a key, and its value will be the number of times that character was encountered, e.g.:
print(count["e"]) # 8
If you want it to be case-insensitive, be sure to call letter.lower() when adding it to the count.
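Applied to the original question, a minimal sketch of `getfreq` could lowercase every character, count only the 26 ASCII letters, and divide by the grand total so the result is a list of 26 relative frequencies for 'a' through 'z'. Using `string.ascii_lowercase` and ignoring every non-letter are assumptions, not part of the answer above.

```python
import string

def getfreq(lines):
    """Return a list of 26 relative letter frequencies ('a' to 'z').

    lines - list of lines (character strings). Upper- and lower-case
    letters are counted together; all other characters are ignored
    (an assumption about what should be counted).
    """
    # One counter per letter, keyed by the lowercase letter itself
    counts = {letter: 0 for letter in string.ascii_lowercase}
    for line in lines:
        for ch in line.lower():
            if ch in counts:  # skip spaces, digits, punctuation, non-ASCII
                counts[ch] += 1

    grandtotal = sum(counts.values())
    if grandtotal == 0:
        # Avoid dividing by zero when the input contains no letters
        return [0.0] * 26
    # Convert raw counts to frequencies, in alphabetical order
    return [counts[letter] / grandtotal for letter in string.ascii_lowercase]

freqs = getfreq(["A long sentence", "may contain repeated letters"])
print(freqs[4])  # frequency of 'e': 8 of the 38 letters
```

Keeping a dict keyed by letter avoids the manual mapping from letters to list positions; the list is only built at the end, in alphabetical order.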

There's a very handy class, Counter, in the collections module which counts how often each object occurs in an iterable:
import collections
collections.Counter('A long sentence may contain repeated letters')
which will produce:
Counter({' ': 6,
'A': 1,
'a': 3,
'c': 2,
'd': 1,
'e': 8,
'g': 1,
'i': 1,
'l': 2,
'm': 1,
'n': 5,
'o': 2,
'p': 1,
'r': 2,
's': 2,
't': 5,
'y': 1})
In your case, you might want to concatenate your lines first, e.g. using ''.join(lines), before passing them to Counter.
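For example, a small sketch (the `lines` list is made-up sample input) that joins the lines, counts only alphabetic characters case-insensitively, and uses `Counter.most_common()` to get the most frequent letter:

```python
from collections import Counter

lines = ["A long sentence", "may contain repeated letters"]  # sample input

# Join the lines into one string, then count lowercase letters only
letter_counts = Counter(ch for ch in "".join(lines).lower() if ch.isalpha())

print(letter_counts["e"])            # 8
print(letter_counts.most_common(1))  # [('e', 8)]
```

Filtering with `str.isalpha()` drops the spaces that showed up as `' ': 6` in the output above.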
If you want to achieve a similar result using raw dictionaries, you might want to do something like the following:
counts = {}
for c in my_string:
    counts[c] = counts.get(c, 0) + 1
Depending on your version of Python, this may be slower than Counter. It uses the .get() method of dict to return either the existing count or a default of 0, which is then incremented for each character in your string.

You can use a set to reduce the text to unique characters and then just count:
text = ' '.join(lines) # Create one long string
# Then create a set of all unique characters in the text
characters = {char for char in text if char.isalpha()}
statistics = {} # Create a dictionary to hold the results
for char in characters:  # Loop through unique characters
    statistics[char] = text.count(char)  # and count them

Related

How to count occurrences of a char in a list of words

I'm trying to create a function that takes a list of words and returns a dictionary, where the keys are letters which are not vowels from any word in the list. The value is the number of occurrences of the corresponding letter.
Code so far:
def get_all_alphabetic_non_vowels(words_list):
    adict = {}
    vowels = "aeiou"
    for i in range(len(words_list)):
        words_list[i] = words_list[i].lower()
    for word in words_list:
        for char in word:
            if char not in vowels:
                keys = char
                occ = ''.join(words_list).count(char)
                adict[keys] = occ
    return adict
This works, however I was penalized because we weren't supposed to use .join and .count for this question to count the occurrences of the corresponding letter. I've tried implementing another nested for loop below the keys = char line to check for occurrences, which failed, so now I'm a bit confused as to where and how I should count occurrences.
This seems like a job for collections.Counter. You can just pass it a generator expression that iterates over the letters and filters out the vowels:
from collections import Counter
words_list = ["hello", "world"]
counts = Counter(letter for word in words_list for letter in word if letter not in "aeiou")
# Counter({'l': 3, 'h': 1, 'w': 1, 'r': 1, 'd': 1})
Counter is a subclass of dict, so this still works as expected:
counts['l']
# 3
Of course, you can do it the hard way with a dict directly:
words_list = ["hello", "world"]
counts = {}
for word in words_list:
    for letter in word:
        if letter not in "aeiuo":
            counts.setdefault(letter, 0)
            counts[letter] += 1
setdefault(letter, 0) initializes the dictionary entry if it's not already set. It's basically like saying if letter not in counts: counts[letter] = 0
The reason you were penalized is that calling count inside the loop is really inefficient. Each time you call count(), Python has to iterate through the whole string, and you are already iterating through that string. This means the number of iterations grows quadratically with the length of the input (you are also joining the whole list into a string inside the loop, which is also expensive and unnecessary).
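A minimal single-pass sketch of the assignment without .join() or .count(): each character is lowercased and visited exactly once, so the work grows linearly with the total number of letters. Skipping non-alphabetic characters with `str.isalpha()` is an assumption based on the function's name, not something the question states. Unlike the original, this version does not modify the caller's list.

```python
def get_all_alphabetic_non_vowels(words_list):
    """Map each non-vowel letter in words_list to its number of occurrences."""
    vowels = "aeiou"
    counts = {}
    for word in words_list:
        for char in word.lower():
            # Assumption: only alphabetic, non-vowel characters are counted
            if char.isalpha() and char not in vowels:
                counts[char] = counts.get(char, 0) + 1
    return counts

print(get_all_alphabetic_non_vowels(["Hello", "World!"]))
# {'h': 1, 'l': 3, 'w': 1, 'r': 1, 'd': 1}
```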
You can replace ''.join(x) with
result = ''
for char in x:
    result += char
and you can replace x.count(y) with
count = 0
for char in x:
    if char == y:
        count += 1
The above only works if y is a single-character string, which it is in your case. That said, you can refactor
occ = ''.join(words_list).count(char)
as
words = ''
for word in words_list:
    words += word
occ = 0
for c in words:
    if c == char:
        occ += 1
vowels = set('aeiou')
words = ['Lorem', 'ipsum', 'dolor', 'sit', 'amet']
out_dict = {}
# join, lower, remove vowels, sort and iterate
for ch in sorted(''.join(words).lower().translate({ord(i): None for i in vowels})):
    out_dict[ch] = out_dict.setdefault(ch, 0) + 1
print(out_dict)
Output:
{'d': 1, 'l': 2, 'm': 3, 'p': 1, 'r': 2, 's': 2, 't': 2}

Python word counter sensitive to if word is surrounded by quotation marks?

I have a problem with my Python program. I am trying to make a word counter, an exercise from Exercism.
Now, my program must pass 13 tests, all of which are different strings with spaces, characters, digits, etc.
I used to have a problem because I would replace all non-letters and non-digits with a space. This created problems for words like "don't", because it would divide them into two strings, don and t. To counter this I added an if statement excluding single ' marks from being replaced, which worked.
However, one of the strings I must test is "Joe can't tell between 'large' and large.". The problem is that since I exclude ' marks, large and 'large' are counted as two different words, although they are the same word. How do I tell my program to "erase" quotes surrounding a word?
Here is my code, and I have added two scenarios, one being the string above, and the other being another string with only one ' mark that you should not delete:
def word_count(phrase):
    count = {}
    for c in phrase:
        if not c.isalpha() and not c.isdigit() and c != "'":
            phrase = phrase.replace(c, " ")
    for word in phrase.lower().split():
        if word not in count:
            count[word] = 1
        else:
            count[word] += 1
    return count
print(word_count("Joe can't tell between 'large' and large."))
print(word_count("Don't delete that single quote!"))
Thank you for your help.
The string module holds some useful text constants; the important one for you is punctuation. The collections module holds Counter, a specialized dictionary class used to count things:
from collections import Counter
from string import punctuation
# lookup in a set is fastest
ps = set(punctuation)  # !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
def cleanSplitString(s):
    """Cleans all punctuation from the string s and returns the split words."""
    return ''.join([m for m in s if m not in ps]).lower().split()
def word_count(sentence):
    return dict(Counter(cleanSplitString(sentence)))  # return a "normal" dict
print(word_count("Joe can't tell between 'large' and large."))
print(word_count("Don't delete that single quote!"))
Output:
{'joe': 1, 'cant': 1, 'tell': 1, 'between': 1, 'large': 2, 'and': 1}
{'dont': 1, 'delete': 1, 'that': 1, 'single': 1, 'quote': 1}
If you want to keep punctuation inside words (such as the apostrophe in "can't"), have word_count call this function instead:
def cleanSplitString_2(s):
    """Cleans all punctuation from the start and end of words, keeps it if inside."""
    return [w.strip(punctuation) for w in s.lower().split()]
Output:
{'joe': 1, "can't": 1, 'tell': 1, 'between': 1, 'large': 2, 'and': 1}
{"don't": 1, 'delete': 1, 'that': 1, 'single': 1, 'quote': 1}
Read up on str.strip().
Use .strip() to remove the quote characters from the start and end of each word once you have them in the list: https://python-reference.readthedocs.io/en/latest/docs/str/strip.html
def word_count(phrase):
    count = {}
    for c in phrase:
        if not c.isalpha() and not c.isdigit() and c != "'":
            phrase = phrase.replace(c, " ")
    for word in phrase.lower().split():
        word = word.strip("'")  # drop quotes around the word, keep inner ones
        if word not in count:
            count[word] = 1
        else:
            count[word] += 1
    return count
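The strip("'") call in the loop removes the given characters only from the two ends of each word, however many there are, and leaves characters in the middle alone. That is why it turns 'large' into large without breaking "can't". A quick check with made-up strings:

```python
# strip() only trims the given characters at the ends of the string
print("'large'".strip("'"))     # large
print("can't".strip("'"))       # can't
print("''quoted''".strip("'"))  # quoted
```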

using lambda and dictionaries functions

I wrote this function:
def make_upper(words):
    for word in words:
        ind = words.index(word)
        words[ind] = word.upper()
I also wrote a function that counts the frequency of occurrences of each letter:
def letter_cnt(word,freq):
    for let in word:
        if let == 'A': freq[0]+=1
        elif let == 'B': freq[1]+=1
        elif let == 'C': freq[2]+=1
        elif let == 'D': freq[3]+=1
        elif let == 'E': freq[4]+=1
Counting letter frequency would be much more efficient with a dictionary, yes. Note that you are manually lining up each letter with a number ("A" with 0, et cetera). Wouldn't it be easier if we could have a data type that directly associated a letter with the number of times it occurs, without adding an extra set of numbers in between?
Consider the code:
freq = {"A":0, "B":0, "C":0, "D":0, ... ..., "Z":0}
for letter in text:
    freq[letter] += 1
This dictionary counts frequencies much more efficiently than your current code does: each letter goes straight to its own entry instead of being tested against a chain of if/elif conditions. You just add one to the entry for a given letter each time you see it.
I will also mention that you can count frequencies effectively with certain libraries. If you are interested in analyzing frequencies, look into collections.Counter() and possibly the collections.Counter.most_common() method.
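As a rough illustration of those two names (the sample text is made up), counting the letters of a string and asking for the most common ones:

```python
from collections import Counter

text = "HELLO WORLD"
# Count only letters; the space is skipped
freq = Counter(ch for ch in text if ch.isalpha())

print(freq["L"])            # 3
print(freq.most_common(2))  # [('L', 3), ('O', 2)]
```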
Whether or not you decide to just use collections.Counter(), I would attempt to learn why dictionaries are useful in this context.
One final note: I personally found typing out the values for the "freq" dictionary to be tedious. If you want you could construct an empty dictionary of alphabet letters on-the-fly with this code:
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
freq = {letter:0 for letter in alphabet}
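Putting the two pieces together, a sketch that pre-builds the alphabet dictionary and then counts. Because freq[letter] += 1 raises a KeyError for anything that is not a key (spaces, digits, lowercase letters), this version uppercases the text and skips other characters; that is an assumption about what you want to count, and `letter_freq_dict` is a hypothetical name.

```python
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def letter_freq_dict(text):
    """Hypothetical helper: count A-Z in text, case-insensitively."""
    freq = {letter: 0 for letter in alphabet}
    for letter in text.upper():
        if letter in freq:  # skip spaces, digits, punctuation
            freq[letter] += 1
    return freq

counts = letter_freq_dict("Hello, World!")
print(counts["L"], counts["O"], counts["Z"])  # 3 2 0
```

Pre-seeding every letter with 0 means letters that never occur still appear in the result, which a plain Counter would not do.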
If you want to convert strings in the list to upper case using lambda, you may use it with map() as:
>>> words = ["Hello", "World"]
>>> map(lambda word: word.upper(), words) # In Python 2
['HELLO', 'WORLD']
# In Python 3, use it as: list(map(...))
As per the Python 2 map() documentation:
map(function, iterable, ...)
Apply function to every item of iterable and return a list of the results.
For finding the frequency of each character in a word, you may use collections.Counter() (a subclass of dict) as:
>>> from collections import Counter
>>> my_word = "hello world"
>>> c = Counter(my_word)
# where c holds dictionary as:
# {'l': 3,
# 'o': 2,
# ' ': 1,
# 'e': 1,
# 'd': 1,
# 'h': 1,
# 'r': 1,
# 'w': 1}
As per the Counter documentation:
A Counter is a dict subclass for counting hashable objects. It is an unordered collection where elements are stored as dictionary keys and their counts are stored as dictionary values.
For the letter counting, don't reinvent the wheel; use collections.Counter:
A Counter is a dict subclass for counting hashable objects. It is an unordered collection where elements are stored as dictionary keys and their counts are stored as dictionary values. Counts are allowed to be any integer value including zero or negative counts. The Counter class is similar to bags or multisets in other languages.
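To illustrate the last part of that quote (a sketch with made-up strings): a missing key reads as 0, subtract() can leave zero or negative counts, and unary + keeps only the positive ones.

```python
from collections import Counter

c = Counter("aab")     # a: 2, b: 1
print(c["z"])          # 0, a missing key is not an error

c.subtract("abbb")     # a: 2 - 1 = 1, b: 1 - 3 = -2
print(c["a"], c["b"])  # 1 -2

print(+c)              # Counter({'a': 1}), only positive counts kept
```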
def punc_remove(words):
    for word in words:
        if not word.isalnum():
            charl = []
            for char in word:
                if char.isalnum():
                    charl.append(char)
            ind = words.index(word)
            delimeter = ""
            words[ind] = delimeter.join(charl)
def letter_cnt_dic(word,freq_d):
    for let in word:
        if let in freq_d:  # skip characters outside A-Z (e.g. digits)
            freq_d[let] += 1
import string
def letter_freq(fname):
    fhand = open(fname)
    freqs = dict()
    alpha = list(string.ascii_uppercase)  # string.uppercase only exists in Python 2
    for let in alpha: freqs[let] = freqs.get(let,0)
    for line in fhand:
        line = line.rstrip()
        words = line.split()
        punc_remove(words)
        #map(lambda word: word.upper(),words)
        words = [word.upper() for word in words]
        for word in words:
            letter_cnt_dic(word,freqs)
    fhand.close()
    return freqs.values()
You can read the docs on Counter and list comprehensions, or run this small demo:
from collections import Counter
words = ["acdefg","abcdefg","abcdfg"]
# list comprehension, no need for lambda or map
new_words = [word.upper() for word in words]
print(new_words)
# Let's create a dict and a counter
letters = {}
letters_counter = Counter()
for word in words:
    # Adding two Counters sums the counts for each letter
    letters_counter += Counter(word)
    # We can also do it by hand with a plain dict
    for letter in word:
        letters[letter] = letters.get(letter,0) + 1
print(letters_counter)
print(letters)

Python - Count letters in random strings

I have a bunch of integers which are allocated values using the random module, then converted to letters depending on their position in the alphabet.
I then combine a random sample of these variables into a "master" variable, which is printed to the console.
I want to then count the occurrence of each character, which will later be written to an output file.
Any help on how I would go about doing this?
>>> from collections import Counter
>>> for letter, count in Counter("aaassd").items():
...     print("letter", letter, "count", count)
...
letter a count 3
letter s count 2
letter d count 1
Probably better to use collections.Counter(), but here is a dict comprehension:
>>> li = 'aaassd'
>>> res = {ch: sum(1 for x in li if x==ch) for ch in set(li)}
>>> res
{'d': 1, 's': 2, 'a': 3}
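Since the counts are later written to an output file, here is a sketch under some assumptions: the "master" string is simulated with `random.choice` over `string.ascii_lowercase` (the question's own generation code isn't shown), `write_letter_counts` is a hypothetical helper, and `letter_counts.txt` is a hypothetical file name.

```python
import random
import string
from collections import Counter

def write_letter_counts(master, path):
    """Hypothetical helper: write one 'letter count' line per character."""
    counts = Counter(master)
    with open(path, "w") as f:
        # Sort by letter so the file is easy to read and compare
        for letter, count in sorted(counts.items()):
            f.write(f"{letter} {count}\n")
    return counts

if __name__ == "__main__":
    # Assumption: stand-in for the question's randomly built "master" string
    master = "".join(random.choice(string.ascii_lowercase) for _ in range(20))
    print(master)
    write_letter_counts(master, "letter_counts.txt")  # hypothetical file name
```

For the input "aaassd" the file would contain the lines `a 3`, `d 1` and `s 2`.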

Can Words in List1 be Spelled by Letters in List2

I'm new to coding and python, and I'm trying a version of the Scrabble Challenge at OpenHatch: https://openhatch.org/wiki/Scrabble_challenge.
The goal is to check whether each word in a list can be spelled by letters in a tile rack. I wrote a for-loop to check whether each letter in the word is in the tile rack, and if so, remove the letter from the rack (to deal with duplicates). However, I'm stumped on how to add a word to my valid_word list if the for-loop finds that each letter of the word is in the rack.
In this example, 'age' should be valid, but 'gag' should not be, as there is only one 'g' in the rack.
word_list = ['age', 'gag']
rack = 'page'
valid_words = []
for word in word_list:
    new_rack = rack
    for x in range(len(word)):
        if word[x] in new_rack:
            new_rack = new_rack.replace(str(word[x]), "")
I would probably use a Counter here to simplify things. The Counter class maps each item in an iterable to its frequency. I can use that to check whether any character occurs more often in the word than in the rack, and print the word accordingly.
>>> from collections import Counter
>>> word_list = ['age', 'gag']
>>> rack = Counter('page')
>>> print(rack)
Counter({'p': 1, 'a': 1, 'g': 1, 'e': 1})
>>> for word in word_list:
...     word_count = Counter(word)
...     for key, val in word_count.items():
...         if rack[key] < val:
...             break
...     else:
...         print(word)
...
age
Also, Counter has the nice property that it returns 0 for a key that does not exist in it, instead of raising a KeyError. So we can skip checking whether the rack has the tile at all: for a missing letter, rack[key] < val is simply true and the word is rejected.
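An alternative sketch that also fills the question's valid_words list: subtracting Counters with `-` keeps only positive counts, so `Counter(word) - Counter(rack)` is empty exactly when the rack has enough of every letter. The helper name `can_spell` is hypothetical.

```python
from collections import Counter

def can_spell(word, rack):
    """Hypothetical helper: True if rack holds every letter of word, duplicates included."""
    # Letters the word needs beyond what the rack holds; empty means spellable
    missing = Counter(word) - Counter(rack)
    return not missing

word_list = ['age', 'gag']
rack = 'page'
valid_words = [word for word in word_list if can_spell(word, rack)]
print(valid_words)  # ['age']
```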
