count letters of a file and create a histogram

I'm looking for some help with an issue I'm facing.
I'm trying to read a text file and count the number of times each letter occurs in it, using a dictionary.
Uppercase letters are converted to lowercase, and only the English letters a-z are counted. The program should then display a star histogram of the counts, like the one below, and print the total number of letters.
The first part, counting how often each letter occurs in the file, was working until I added my histogram code.
I'm not getting an error, but the terminal just displays this when the script is run:
{'d': 1}
My current code is:
def LetterCount(file_path):
    file_path = file_path.lower().translate(file_path)
    file_path = file_path.translate(string.punctuation)
    file_path = file_path.strip(string.punctuation + string.whitespace)
    list1=list(file_path)
    lcDict= {}
    with open(file_path,'r') as f:
        for l in list1:
            if l.isalpha():
                if l in lcDict:
                    lcDict[l] +=1
                else:
                    lcDict[l]= 1
                return lcDict
file_path = '/myfolder/text.txt'
if __name__ == "__main__":
    print(LetterCount(file_path))
def histogram(file_path):
    sumValues = LetterCount(file_path)
    padding = max(len(sumValues), len('Element'))
    padding1 = max(len(str(max(sumValues))), len('Value'))
    print("\nCreating a histogram from values: ")
    print("%s %10s %10s" %("Element", "Value", "Histogram"))
    for i,n in enumerate(sumValues, start=1):
        ('{0} {1} {2}'.format(
            str(i).ljust(padding),
            str(i).rjust(padding1),
            '*'*n))
print(histogram(file_path))
What I'm trying to achieve with the histogram is this:
a | *****
b | ***
c | ******
d | ****
e | *******
f | **
h | *****
...
z | *
I'd be really grateful for any help

Because I don't have your file and cannot reproduce your specific example, I will answer the two questions separately.
First, to build the histogram as a dictionary for your file (represented here as a list of strings), use the following code:
list_of_sentences = ["this is my first code in python", "it's rainy today", "thanks"]
m_dict = {}
for sentence in list_of_sentences:
    for letter in sentence:
        if letter.isalpha():
            if letter in m_dict:
                m_dict[letter] += 1
            else:
                m_dict[letter] = 1
print(m_dict)
output:
{'t': 6, 'h': 3, 'i': 6, 's': 5, 'm': 1, 'y': 4, 'f': 1, 'r': 2, 'c': 1, 'o': 3, 'd': 2, 'e': 1, 'n': 4, 'p': 1, 'a': 3, 'k': 1}
The approach above iterates over the letters in your file and counts them. If you would rather iterate over a to z, which is much more efficient for big files (and also reports letters that don't occur in your file), use this instead:
text = ''.join(list_of_sentences)
for code in range(ord('a'), ord('z') + 1):
    m_dict[chr(code)] = text.count(chr(code))
output:
{'t': 6, 'h': 3, 'i': 6, 's': 5, 'm': 1, 'y': 4, 'f': 1, 'r': 2, 'c': 1, 'o': 3, 'd': 2, 'e': 1, 'n': 4, 'p': 1, 'a': 3, 'k': 1, 'b': 0, 'g': 0, 'j': 0, 'l': 0, 'q': 0, 'u': 0, 'v': 0, 'w': 0, 'x': 0, 'z': 0}
Now that we have the histogram in hand (let's continue with the first one), let's tackle the second part: formatting it the way you want.
def print_as_histogram(m_dict):
    for letter in sorted(m_dict.keys()):
        print(f'{letter} | {"*"*m_dict[letter]}')
print_as_histogram(m_dict)
output:
a | ***
c | *
d | **
e | *
f | *
h | ***
i | ******
k | *
m | *
n | ****
o | ***
p | *
r | **
s | *****
t | ******
y | ****
I sorted the letters because it looks better, in my opinion.
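To tie this back to the question, here is a minimal sketch that applies the same dictionary counting to lines of text, lowercases them first, keeps only a-z, and also reports the total the question asks for. The names count_letters and histogram_lines are made up for this illustration, and the list of sentences stands in for an open file object (iterating over a file yields its lines in the same way).

```python
import string

def count_letters(lines):
    """Count a-z occurrences (case-insensitively) in an iterable of lines."""
    counts = {}
    for line in lines:
        for letter in line.lower():
            # Only plain English letters are counted, as the question requires.
            if letter in string.ascii_lowercase:
                counts[letter] = counts.get(letter, 0) + 1
    return counts

def histogram_lines(counts):
    """Return one 'letter | stars' line per counted letter, alphabetically."""
    return [f"{letter} | {'*' * counts[letter]}" for letter in sorted(counts)]

# Stand-in for: with open(path) as f: counts = count_letters(f)
sample = ["This is my first code in Python", "It's rainy today", "Thanks"]
counts = count_letters(sample)
print("\n".join(histogram_lines(counts)))
print("Total letters:", sum(counts.values()))
```

Passing the file object directly avoids loading the whole file into memory, which may matter for large inputs.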

You can use some standard libraries to make your life a bit easier!
import collections
import re
# Open the file and lowercase its contents so 'A' and 'a' are counted together
with open("./file.txt", 'r') as f:
    txt = f.read().lower()
# Find all the alphabetic characters
letters = re.findall("[a-z]", txt)
# Count them
counts = collections.Counter(letters)
# Print the star histogram
for i in 'abcdefghijklmnopqrstuvwxyz':
    if i in counts:
        print(f"{i} | {'*' * counts[i]}")
    else:
        print(f"{i} | ")

Related

How to perform letter frequency?

This problem requires me to find the frequency analysis of a .txt file.
This is my code so far:
This finds the frequency of the words, but how would I get the frequency of the actual letters?
f = open('cipher.txt', 'r')
word_count = []
for c in f:
    word_count.append(c)
word_count.sort()
decoding = {}
for i in word_count:
    decoding[i] = word_count.count(i)
for n in decoding:
    print(decoding)
This outputs (as a short example, since the txt file is pretty long):
{'\n': 12, 'vlvf zev jvg jrgs gvzef\n': 1, 'z uvfgriv sbhfv bu wboof!\n': 1, "gsv yrewf zoo nbhea zaw urfsvf'\n": 1, 'xbhow ube gsv avj bjave yv\n': 1, ' gsv fcerat rf czffrat -\n': 1, 'viva gsrf tezff shg\n': 1, 'bph ab sbfbnrxsr (azeebj ebzw gb gsv wvvc abegs)\n': 1, 'cbfg rafrwv gsv shg.\n': 1, 'fb gszg lvze -- gsv fvxbaw lvze bu tvaebph [1689] -- r szw fhwwvaol gzpva\n': 1, 'fb r czgxsvw hc nl gebhfvef, chg avj xbewf ra nl fgezj szg, zaw\n': 1, 'fcrergf bu gsv ebzw yvxpbavw nv, zaw r xbhow abg xbaxvagezgv ba zalgsrat.\n': 1, 'fgbbw zg gsv xebffebzwf bu czegrat, r jvcg tbbwylv.\n': 1,
It gives me the words, but how would I get the letters, such as how many "a"'s there are, or how many "b"'s there are?

Counter is a very useful class in Python's standard library, and it solves your problem elegantly.
# count the letter frequency
from collections import Counter
with open('cipher.txt', 'r') as f:
    s = f.read()
c = Counter(s)  # the type of c is collections.Counter
# if you want a dict as your output type
decoding = dict(c)
print(decoding)
If you put "every parting from you is like a little eternity" into your cipher.txt, you'll get the following result with the code above:
{'e': 6, 'v': 1, 'r': 4, 'y': 3, ' ': 8, 'p': 1, 'a': 2, 't': 5, 'i': 5, 'n': 2, 'g': 1, 'f': 1, 'o': 2, 'm': 1, 'u': 1, 's': 1, 'l': 3, 'k': 1}
However, if you want to implement the counting by yourself, here's a possible solution, providing the same result as using Counter.
# count the letter frequency manually, without using collections.Counter
with open('cipher.txt', 'r') as f:
    s = f.read()
decoding = {}
for c in s:
    if c in decoding:
        decoding[c] += 1
    else:
        decoding[c] = 1
print(decoding)
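Both versions above count every character, including spaces and newlines. If only letters matter, as the question asks, one possible tweak is to filter before counting. This is only a sketch: the sample sentence from above is used in place of reading cipher.txt, and letter_counts is a name chosen for illustration.

```python
from collections import Counter

# Sample text stands in for the contents of cipher.txt
s = "every parting from you is like a little eternity"

# Keep alphabetic characters only, folding case so 'A' and 'a' count together
letter_counts = Counter(ch for ch in s.lower() if ch.isalpha())

print(letter_counts["e"])      # how many times 'e' occurs
print(" " in letter_counts)    # False: spaces were filtered out
```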

You can use a Counter from the collections module in the standard library; it will generate a dictionary of the counts:
from collections import Counter
s = """
This problem requires me to find the frequency analysis of a .txt file.
This is my code so far: This finds the frequency of the words, but how would I get the frequency of the actual letters?"""
c = Counter(s)
print(c.most_common(5))
This will print:
[(' ', 35), ('e', 20), ('t', 13), ('s', 11), ('o', 10)]
EDIT: Without using a Counter, we can use a dictionary and keep incrementing the count:
c = {}
for character in s:
    try:
        c[character] += 1
    except KeyError:
        c[character] = 1
print(c)
This will print:
{'\n': 4, 'T': 3, 'h': 9, 'i': 9, 's': 11, ' ': 35, 'p': 1, 'r': 9, 'o': 10, 'b': 2, 'l': 6, 'e': 20, 'm': 3, 'q': 4, 'u': 7, 't': 13, 'f': 10, 'n': 6, 'd': 5, 'c': 5, 'y': 5, 'a': 6, '.': 2, 'x': 1, ':': 1, 'w': 3, ',': 1, 'I': 1, 'g': 1, '?': 1}

How do i work on Checksum of Singapore Car License Plate with Python

I have searched the internet for how the checksum of a Singapore car license plate works. For the license plate SBA 1234, I need to convert all the characters except the S to numbers, with A being 1, B being 2, and so on. SBA 1234 is stored as a string. How do I convert B and A to numbers for the checksum calculation while making sure that the values B and A in the string do not change? The conversion of B and A to numbers is only for the calculation.
How do I do this conversion with Python? Please help. Thank you.

There are multiple ways to create a dictionary that maps the letters A through Z to the values 1 through 26. One simple way is:
value = dict(zip("ABCDEFGHIJKLMNOPQRSTUVWXYZ", range(1,27)))
An alternative is to use the ord() function.
ord('A') is 65, so subtracting 64 maps A through Z to 1 through 26. You can build the dictionary with simple code like this:
atoz = {chr(i): i - 64 for i in range(ord("A"), ord("A") + 26)}
This produces the following dictionary:
{'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 5, 'F': 6, 'G': 7, 'H': 8, 'I': 9, 'J': 10, 'K': 11, 'L': 12, 'M': 13, 'N': 14, 'O': 15, 'P': 16, 'Q': 17, 'R': 18, 'S': 19, 'T': 20, 'U': 21, 'V': 22, 'W': 23, 'X': 24, 'Y': 25, 'Z': 26}
You can look up a character in the dictionary to get its value from 1 through 26.
Alternatively, you can use ord(x) - 64 directly to get the value of a letter. If x is A, you get 1; if x is Z, you get 26.
So you can write the code to calculate the values for the Singapore number plate directly:
snp = 'SBA 1234'
Then you can get the values with:
snp_num = [ord(snp[1]) - 64, ord(snp[2]) - 64, int(snp[4]), int(snp[5]), int(snp[6]), int(snp[7])]
This will result in:
[2, 1, 1, 2, 3, 4]
I hope one of these options works for you. You can then pass these values to your checksum function to do the calculation.
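The hard-coded indices above only work for plates shaped exactly like 'SBA 1234'. As a rough sketch, the same conversion can be written as a small helper that skips the first letter and any spaces and leaves the original string untouched. The function name plate_to_numbers is made up here, and the checksum step itself is not shown because it is not described in this thread.

```python
def plate_to_numbers(plate):
    """Convert a plate such as 'SBA 1234' to a list of numbers.

    The leading letter is skipped, letters map to A=1 ... Z=26,
    and digits keep their numeric value. The input string is not modified.
    """
    numbers = []
    for ch in plate[1:]:              # skip the first letter (the 'S')
        if ch.isalpha():
            numbers.append(ord(ch.upper()) - 64)
        elif ch.isdigit():
            numbers.append(int(ch))
        # anything else (for example the space) is ignored
    return numbers

snp = 'SBA 1234'
print(plate_to_numbers(snp))  # [2, 1, 1, 2, 3, 4]
print(snp)                    # still 'SBA 1234'
```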

Indexing for a changing number python

I'm trying to figure out the indexing for this program. I want it to print a number for each letter entered in the input; for example, the string "Jon" would be:
"10 15 14"
but I keep getting an error with the for loop I created with the indexes. If anyone has any thoughts on how to fix this, it would be a great help!
a = 1
b = 2
c = 3
d = 4
e = 5
f = 6
g = 7
h = 8
i = 9
j = 10
k = 11
l = 12
m = 13
n = 14
o = 15
p = 16
q = 17
r = 18
s = 19
t = 20
u = 21
v = 22
w = 23
x = 24
y = 25
z = 26
name = input("Name: ")
lowercase = name.lower()
print("Your 'cleaned up' name is:", lowercase)
print("Your 'cleaned up name reduces to:")
length = len(name)
name1 = 0
for x in range(name[0], name[length]):
    print(name[name1])
    name1 += 1

You could save yourself all those variables, and not even need a dictionary by just utilizing ord here and calculating the numerical position in the alphabet:
Example: taking the letter c, the following should give us 3:
>>> ord('c') - 96
3
ord will:
Return the integer ordinal of a one-character string.
The 96 is used because of the positions of the letters in the ASCII table.
So, with that in mind, when you enter a word, using your example "Jon":
name = input("enter name")
print(*(ord(c) - 96 for c in name.lower()))
# 10 15 14
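A small sketch that avoids the magic number 96 by subtracting ord('a') instead, which gives the same result because ord('a') is 97. The helper name letter_position is invented for this example, and it assumes the input is a single ASCII letter.

```python
def letter_position(letter):
    """Return the 1-based alphabet position of a single ASCII letter."""
    return ord(letter.lower()) - ord('a') + 1

print(ord('a'))                                # 97
print(*(letter_position(c) for c in "Jon"))    # 10 15 14
```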

You can store each letter and its index in a dict so you can easily look up the ones in name:
>>> from string import ascii_lowercase
>>> letters = dict(zip(ascii_lowercase, range(1, len(ascii_lowercase) + 1)))
>>> for c in name.lower():
...     print(letters[c])
If you want the indices lined up in a single string:
>>> print(" ".join(str(letters[c]) for c in name.lower()))
10 15 14

You are currently passing characters to range(). range(name[0], name[length]) with a name of 'Jon' is equivalent to range('J', 'n')... or it would be if strings were 1-indexed. Unfortunately for this code snippet, a sequence does not have an element with an index equal to the sequence's length. The last element of a three-character string has an index of two, not three. Your algorithm also has zero interaction with the letter values you defined above. It has little chance of succeeding.
Rather than defining each letter's value separately, store it in a string and then look up each letter's index in that string:
name = input('Name: ')
s = 'abcdefghijklmnopqrstuvwxyz'
print(*(s.index(c)+1 for c in name.lower()))
A generator that produces the index of each of the name's characters in the master string (plus one, because you want it one-indexed) is unpacked and sent to print(), which, with the default separator of a space, produces the desired output.

Rather than defining 26 different variables, how about using a dictionary? Then you can write something like:
mapping = {
    'a': 1,
    # etc
}
name_in_numbers = ' '.join(str(mapping[letter]) for letter in name.lower())
Note that this will break for any input that contains anything other than letters.
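One way to guard against that, sketched under the assumption that characters without a mapping should simply be skipped (name_to_numbers is a hypothetical helper, not part of the answer above):

```python
import string

# Map 'a'..'z' to 1..26
mapping = {letter: i for i, letter in enumerate(string.ascii_lowercase, start=1)}

def name_to_numbers(name):
    """Return the space-separated alphabet positions of the letters in name."""
    return ' '.join(
        str(mapping[letter])
        for letter in name.lower()
        if letter in mapping          # skip spaces, digits, punctuation
    )

print(name_to_numbers("Jon"))       # 10 15 14
print(name_to_numbers("Jo-n 3"))    # 10 15 14
```

Raising an error for unexpected characters instead of skipping them would be an equally reasonable choice, depending on what the program should do with bad input.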

First of all, use a dict to store the mapping from characters to integers. You can use the string module to get all the lowercase letters, which makes your code less error-prone. Then you just need to iterate over the characters of the lowercased string and look up each one's value in the mapping:
import string
mapping = dict(zip(string.ascii_lowercase, range(1, len(string.ascii_lowercase)+1)))
name = "Anmol"
lowercase = name.lower()
print("Your 'cleaned up' name is:", lowercase)
print("Your 'cleaned up' name reduces to:")
for char in lowercase:
    print(mapping[char], end=' ')

# make up a data structure so as not to pollute the namespace with 26 variable names; a dictionary does well for this
mapping = {}
for i in range(97, 123):
    mapping[chr(i)] = i - 96
print(mapping)
name = input("Name: ")
name = name.lower()
for i in name:
    print(mapping[i], end=' ')
Output:
{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6, 'g': 7, 'h': 8, 'i': 9, 'j': 10, 'k': 11, 'l': 12, 'm': 13, 'n': 14, 'o': 15, 'p': 16, 'q': 17, 'r': 18, 's': 19, 't': 20, 'u': 21, 'v': 22, 'w': 23, 'x': 24, 'y': 25, 'z': 26}
Name: Jon
10 15 14

Personally, I prefer that you build a mapping of char:index, that you can always refer to later, this way:
>>> ascii_chrs = 'abcdefghijklmnopqrstuvwxyz'
>>> d = {x:i for i,x in enumerate(ascii_chrs, 1)}
>>> d
{'q': 17, 'i': 9, 'u': 21, 'x': 24, 'a': 1, 's': 19, 'm': 13, 'n': 14, 'e': 5, 'v': 22, 'b': 2, 'p': 16, 'g': 7, 'o': 15, 'j': 10, 't': 20, 'h': 8, 'f': 6, 'r': 18, 'y': 25, 'c': 3, 'k': 11, 'd': 4, 'z': 26, 'w': 23, 'l': 12}
>>>
>>> word = 'Salam'
>>> print(*(d[c] for c in word.lower()))
19 1 12 1 13

Hierarchical summing in python

Given the following array:
a = []
a.append({'c': 1, 'v': 10, 'p': 4})
a.append({'c': 2, 'v': 10, 'p': 4})
a.append({'c': 3, 'v': 10, 'p': None})
a.append({'c': 4, 'v': 0, 'p': None})
a.append({'c': 5, 'v': 10, 'p': 1})
a.append({'c': 6, 'v': 10, 'p': 1})
where c = code, v = value and p = parent.
The table looks like this:
c  v   p
1      4
2  10  4
3  10
4
5  10  1
6  10  1
I have to sum up each parent's value with the values of its children.
Expected result table would be:
c  v   p
1  20  4
2  10  4
3  10
4  30
5  10  1
6  10  1
How to achieve this?

First, you should derive another dictionary, mapping parents to lists of their children, instead of children to their parents. You can use collections.defaultdict for this.
from collections import defaultdict
children = defaultdict(list)
for d in a:
    children[d["p"]].append(d["c"])
Also, I suggest another dictionary, mapping codes to their values, so you don't have to search the entire list each time:
values = {}
for d in a:
    values[d["c"]] = d["v"]
Now you can very easily define a recursive function for calculating the total value. Note, however, that this will do some redundant calculations. If your data is much larger, you might want to circumvent this by using a bit of memoization.
def total_value(x):
    v = values[x]
    for c in children[x]:
        v += total_value(c)
    return v
Finally, using this function in a dict comprehension gives you the total values for each code:
>>> {x: total_value(x) for x in values}
{1: 30, 2: 10, 3: 10, 4: 40, 5: 10, 6: 10}
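The memoization mentioned above could look roughly like this, using functools.lru_cache from the standard library so that each code's total is computed only once. This is a sketch that assumes the data contains no cycles (no code is its own ancestor); the setup lines repeat the definitions from above so the block runs on its own.

```python
from collections import defaultdict
from functools import lru_cache

a = [
    {'c': 1, 'v': 10, 'p': 4},
    {'c': 2, 'v': 10, 'p': 4},
    {'c': 3, 'v': 10, 'p': None},
    {'c': 4, 'v': 0, 'p': None},
    {'c': 5, 'v': 10, 'p': 1},
    {'c': 6, 'v': 10, 'p': 1},
]

children = defaultdict(list)   # parent code -> list of child codes
values = {}                    # code -> own value
for d in a:
    children[d['p']].append(d['c'])
    values[d['c']] = d['v']

@lru_cache(maxsize=None)
def total_value(x):
    """Own value of x plus the totals of all its descendants (cached)."""
    return values[x] + sum(total_value(c) for c in children[x])

totals = {x: total_value(x) for x in values}
print(totals)  # {1: 30, 2: 10, 3: 10, 4: 40, 5: 10, 6: 10}
```

The cache only pays off when the same subtree would otherwise be visited many times, for example in deep hierarchies; for six rows the plain recursive version is just as good.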

Python dictionary, adding a new value to key if present multiple times in list

I'm trying to split a string of letters into a dictionary that automatically adds 1 to the value for every letter present more than once.
The only problem is that my code adds 1 to the value of every key. For example, if I input "aasf", the dict will be a:2, s:2, f:2. What's wrong?
word = raw_input("Write letters: ")
chars = {}
for c in word:
    chars[c] = c.count(c)
    if c in chars:
        chars[c] += 1
print chars

Python 2.7.1+ (r271:86832, Apr 11 2011, 18:13:53)
>>> from collections import Counter
>>> Counter('count letters in this sentence')
Counter({'e': 5, 't': 5, ' ': 4, 'n': 4, 's': 3, 'c': 2, 'i': 2, 'h': 1, 'l': 1, 'o': 1, 'r': 1, 'u': 1})
>>>

You must use either
chars[c] = word.count(c)
or
chars[c] += 1
but not both.
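Either option on its own can be sketched as follows. Note that the += 1 variant needs a starting value for letters not seen yet, which dict.get supplies here; the input is hard-coded to "aasf" instead of being read from the keyboard.

```python
word = "aasf"

# Option 1: ask the whole word how often each letter occurs
chars_by_count = {}
for c in word:
    chars_by_count[c] = word.count(c)

# Option 2: increment a running total, starting from 0 for unseen letters
chars_by_increment = {}
for c in word:
    chars_by_increment[c] = chars_by_increment.get(c, 0) + 1

print(chars_by_count)      # {'a': 2, 's': 1, 'f': 1}
print(chars_by_increment)  # {'a': 2, 's': 1, 'f': 1}
```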
