Python letter frequency mapping [closed] - python

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I have a Python script that reads in an encrypted text file and decrypts it in various ways. The last two options I am trying to add are to map out the most frequent letters of the file and the most frequent letters in the English language.
Here are my previous functions that display frequency:
def functOne:
    Crypt = input("what file would you like to select? ")
    filehandle = open(Crypt, "r")
    data = filehandle.read().upper()
    char_counter = collections.Counter(data)
    for char, count in char_counter.most_common():
        if char in string.ascii_uppercase:
            print(char, count)
def FunctTwo:
    print "Relative letter Freq of letters in English Language A-Z; ENGLISH = (0.0749, 0.0129, 0.0354, 0.0362, 0.1400, 0.0218, 0.0174, 0.0422, 0.0665, 0.0027, 0.0047, 0.0357, 0.0339, 0.0674, 0.0737, 0.0243, 0.0026, 0.0614, 0.0695, 0.0985, 0.0300, 0.0116, 0.0169, 0.0028, 0.0164, 0.0004)"
Here's the description of what I need to do for the next two:
Function 3:
Map the most frequent letter in text to the most frequent in the English language in descending order.
[letter in cryptogram] -> [letter in english language]
Function 4:
Allow user to manually edit frequency maps
How would I go about doing this? I'm kinda lost on the mapping part, at least on combining the two frequencies and allowing editing.

First, you have to turn your code into actual valid Python code. For example, your functions have to be defined with a list of arguments.
Then, you have to return values rather than just printing them.
Also, you don't want a string representation of a tuple of frequencies, but an actual tuple of them that you can use.
And finally, you're going to have to put the two collections into some kind of format that can be compared. ENGLISH is just a sequence of 26 frequencies; the value computed by functOne is a sequence of up to 26 (letter, count) pairs in descending order of frequency. But really, we don't need the counts or the frequencies at all; we just need the letters in descending order of frequency.
In fact, if you look at it, functTwo is completely unnecessary—it's effectively computing a constant, so you might as well just do that at module level.
While we're at it, I'd reorganize functOne so it takes the input as an argument. And close the file instead of leaking it. And give the functions meaningful names.
import collections
import string

def count_letters(data):
    # Letters of data in descending order of frequency (non-letters are skipped).
    data = data.upper()
    char_counter = collections.Counter(data)
    return [char for char, count in char_counter.most_common()
            if char in string.ascii_uppercase]

english_freqs = (0.0749, 0.0129, 0.0354, 0.0362, 0.1400, 0.0218, 0.0174,
                 0.0422, 0.0665, 0.0027, 0.0047, 0.0357, 0.0339, 0.0674,
                 0.0737, 0.0243, 0.0026, 0.0614, 0.0695, 0.0985, 0.0300,
                 0.0116, 0.0169, 0.0028, 0.0164, 0.0004)
pairs = zip(english_freqs, string.ascii_uppercase)
# English letters in descending order of frequency.
english_letters = [char for freq, char in sorted(pairs, reverse=True)]

def decrypt(data):
    # Map the n-th most frequent input letter to the n-th most frequent English letter.
    input_letters = count_letters(data)
    return {input_letter: english_letter
            for input_letter, english_letter in zip(input_letters, english_letters)}

crypt = input("what file would you like to select? ")
with open(crypt, "r") as f:
    data = f.read()
mapping = decrypt(data)
For the editing feature… you'll have to design what you want the interface to be, before you can implement it. But presumably you're going to edit the english_freqs object (which means you may want to use a list instead of a tuple) and rebuild english_letters from it (which means you may want that in a function after all).
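One possible shape for the editing feature, as a rough sketch rather than the only design: instead of editing english_freqs, edit the letter-to-letter mapping directly. The names edit_mapping and apply_mapping are made up for this example, and swapping the two affected entries (so the mapping stays one-to-one) is an assumption about the behaviour you want.

```python
def edit_mapping(mapping, cipher_letter, plain_letter):
    """Return a copy of mapping in which cipher_letter maps to plain_letter.

    Assumption: if another cipher letter already maps to plain_letter, it
    takes over the old target of cipher_letter, so no plain letter is used twice.
    """
    edited = dict(mapping)
    previous = edited.get(cipher_letter)
    for other, target in mapping.items():
        if target == plain_letter and other != cipher_letter:
            if previous is None:
                del edited[other]          # nothing to swap with
            else:
                edited[other] = previous   # swap the two targets
    edited[cipher_letter] = plain_letter
    return edited


def apply_mapping(text, mapping):
    """Substitute every mapped letter; leave all other characters unchanged."""
    return "".join(mapping.get(char, char) for char in text.upper())


mapping = {"X": "E", "Q": "T"}
edited = edit_mapping(mapping, "X", "T")
print(edited)                        # {'X': 'T', 'Q': 'E'}
print(apply_mapping("xq!", edited))  # TE!
```

A loop that asks the user for a pair of letters and calls edit_mapping each time would give you Function 4 without touching the frequency table.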

Related

Replacing multiple words in string from a dictionary

There's a dictionary of abbreviations, the key being the abbreviation and the value being its definition ("TTYL", "Talk To You Later"). When the user enters a note containing more than one abbreviation, I want the program to replace each abbreviation in the note with its definition. I got the program to work, but only for one abbreviation. I want it to be able to handle more than one abbreviation in a string. I believe the solution has something to do with the nested for loop, but I'm uncertain and need some help.
Python Code:
abbreviationsDictionary = {
    "ADBA":"As Directed By Arborist",
    "CRTS":"Crown Reduced To Shape",
    "NWIC":"Noob Will Improve Company"
}
note = input("Enter note: ")
listOfWordsInNote = note.split()
updatedNote = ""
for key in abbreviationsDictionary:
    for word in listOfWordsInNote:
        if (key==word):
            updatedNote = note.replace(key,abbreviationsDictionary[key])
print(updatedNote)
Current Output (only works for 1 abbreviation):
Enter note: mike is going to do whatever ADBA because he knows NWIC
mike is going to do whatever ADBA because he knows Noob Will Improve Company
Desired Output
Enter note: mike is going to do whatever ADBA because he knows NWIC
mike is going to do whatever As Directed By Arborist because he knows Noob Will Improve Company
Your error is in this line:
    updatedNote = note.replace(key,abbreviationsDictionary[key])
Each time a new key is found, you start again from note, which has not changed, so only the last replacement survives.
Just replace that line with:
    note = note.replace(key,abbreviationsDictionary[key])
and print(note) at the end:
mike is going to do whatever As Directed By Arborist because he knows Noob Will Improve Company
Rather than replacing in the input string, get the [whitespace delimited] tokens from the user input then use a simple generator to reconstruct:
abbreviationsDictionary = {
    "ADBA": "As Directed By Arborist",
    "CRTS": "Crown Reduced To Shape",
    "NWIC": "Noob Will Improve Company"
}
note = input("Enter note: ")
print(' '.join(abbreviationsDictionary.get(word, word) for word in note.split()))
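Both approaches above have edge cases: str.replace also rewrites an abbreviation that happens to sit inside a longer word, and the split/join version misses an abbreviation followed by punctuation (such as "NWIC."). A third option, sketched here under the assumption that abbreviations should only match as whole words, is a single regular expression with word boundaries. The function name expand_abbreviations is made up for this example.

```python
import re

ABBREVIATIONS = {
    "ADBA": "As Directed By Arborist",
    "CRTS": "Crown Reduced To Shape",
    "NWIC": "Noob Will Improve Company",
}


def expand_abbreviations(note, abbreviations):
    """Replace every whole-word abbreviation in note with its definition."""
    if not abbreviations:
        return note
    # \b anchors the match at word boundaries; re.escape guards against
    # abbreviations that contain regex metacharacters.
    alternatives = "|".join(re.escape(key) for key in abbreviations)
    pattern = re.compile(r"\b(" + alternatives + r")\b")
    return pattern.sub(lambda match: abbreviations[match.group(1)], note)


print(expand_abbreviations("do whatever ADBA, because he knows NWIC.", ABBREVIATIONS))
# do whatever As Directed By Arborist, because he knows Noob Will Improve Company.
```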

How to solve the error in the following program which is written Functional Programming way? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
Question: Write a program that reads table with given columns from input stream. Columns are name, amount, debt. Then filter the table (condition: debt is equal to 0). After that increase debt by 42% then print results.
I am a beginner in Python and have tried multiple times but still couldn't fix the problem. Help will be much appreciated.
Input:
10
Tatiana Santos 411889 36881
Yuvraj Holden 121877 0
Theia Nicholson 783887 591951
Raife Padilla 445511 0
Hamaad Millington 818507 276592
Maksim Whitehead 310884 0
Iosif Portillo 773233 0
Lachlan Daniels 115100 0
Evie-Grace Reese 545083 0
Ashlea Cooper 68771 0
Required Output:
Tatiana Santos 411889 52371.02
Theia Nicholson 783887 840570.42
Hamaad Millington 818507 392760.64
My Solution:
def input_data(n):
    tup = []
    if n>0:
        tup.append(tuple(map(str,input().split(" "))))
        input_data(n-1)  # I know there's a problem in the recursion. I am not
                         # doing anything with the return value. Please help
    return tup
def filtertuple(* tup):  # After debugging I found that at this point only one row is passed to the function
    newtuple = filter(lambda i: i[2]!=0,tup)
    return tuple(newtuple)
def increasedebt(newtuple):
    newtuple1 = tuple(map(lambda i:(i[2])*(142/100)),newtuple)
    return (newtuple1)
def output_data():
    n=int(input())
    return n
print(increasedebt(filtertuple(input_data(output_data()))))
Error:
Traceback (most recent call last):
  File "C:\Users\msi-pc\PycharmProjects\ProgramminglanguageTask3\main.py", line 28, in <module>
    print(increasedebt(filtertuple(input_data(output_data()))))
  File "C:\Users\msi-pc\PycharmProjects\ProgramminglanguageTask3\main.py", line 14, in filtertuple
    return tuple(newtuple)
  File "C:\Users\msi-pc\PycharmProjects\ProgramminglanguageTask3\main.py", line 12, in <lambda>
    newtuple = filter(lambda i: i[2] != 0, tup)
IndexError: list index out of range
I see two main issues with how your code passes the data from input_data to filtertuple.
The first issue is that the recursion in input_data is broken: you never do anything with the results of the recursive calls, so only the first row of input data gets included in the final return value. Recursion really isn't an ideal approach to this problem; a loop would be a lot simpler and cleaner. But you could make the recursion work if you do something with the value returned to you, like tup.extend(input_data(n-1)). If you stick with recursion, you'll also need to make the base case return something appropriate, like an empty list or tuple (or add an extra check for None).
The second issue is that filtertuple is written to expect many arguments, but you're only passing it one. So tup will always be a 1-tuple containing the actual argument. If you're expecting the one argument to be a list of tuples (or tuple of tuples, I'm not sure exactly what API you're aiming for), you shouldn't use *tup in the argument list, just tup is good without the star. You could call filtertuple(*input_data(...)) which would unpack your tuple of tuples into many arguments, but that would be silly if the function is just going to pack them back up into tup again.
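The effect of the star can be seen in a tiny experiment. This is only an illustration; the function names packed and plain and the sample rows are made up.

```python
def packed(*rows):
    # With the star, all positional arguments are packed into one tuple.
    return rows


def plain(rows):
    # Without the star, the argument arrives exactly as it was passed.
    return rows


table = [("Tatiana", "411889", "36881"), ("Yuvraj", "121877", "0")]

print(len(packed(table)))   # 1: a 1-tuple whose only element is the whole list
print(len(plain(table)))    # 2: the list of rows itself
print(len(packed(*table)))  # 2: unpacked at the call site, packed again inside
```

So inside filtertuple(*tup), each i in the filter is the whole list of rows rather than a single row, which is why the index lookup goes wrong.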
There may be other issues further along in the code, I was only focused on the input_data and filtertuple interactions, since that's what you were asking about.
Here's my take on solving your problem:
def gather_data(num_lines):
    if num_lines == 0:  # base case
        return []  # returns an empty list
    data = gather_data(num_lines-1)  # recursive case, always gives us a list
    *name, amount, debt = input().split()  # get one new row; the name may contain spaces
    row = (" ".join(name), int(amount), int(debt))  # (name, amount, debt)
    data.append(row)  # add it to the existing list
    return data
def filter_zeros(data):  # note, we only expect one argument (a list of tuples)
    return list(filter(lambda i: i[2] != 0, data))  # keep rows whose debt is not 0
def adjust_debt(data):  # this only returns a single column, should it return
    return list(map(lambda i: (i[2]) * (142 / 100), data))  # the whole table?
# calling code:
num_lines = int(input())  # this code really didn't deserve its own function
data = gather_data(num_lines)  # extra variables help debugging
filtered = filter_zeros(data)  # but they could be dropped later
adjusted = adjust_debt(filtered)
print(adjusted)
I did find one extra issue, you had the parentheses wrong in the function I renamed to adjust_debt.
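For comparison, here is a sketch of the loop-free, non-recursive version hinted at above, which also keeps the whole row so the output matches the required format. It assumes the last two fields of every line are amount and debt and everything before them is the name; the function names are made up for this example, and the rows are taken from the sample input rather than read from standard input.

```python
def parse_row(line):
    """Split one input line into (name, amount, debt)."""
    *name_parts, amount, debt = line.split()
    return (" ".join(name_parts), int(amount), int(debt))


def with_increased_debt(lines, percent=42):
    """Keep rows with a nonzero debt and raise that debt by percent."""
    rows = map(parse_row, lines)
    debtors = filter(lambda row: row[2] != 0, rows)
    return [(name, amount, debt * (100 + percent) / 100)
            for name, amount, debt in debtors]


def format_row(row):
    name, amount, debt = row
    return f"{name} {amount} {debt:.2f}"


sample = [
    "Tatiana Santos 411889 36881",
    "Yuvraj Holden 121877 0",
    "Theia Nicholson 783887 591951",
]
for row in with_increased_debt(sample):
    print(format_row(row))
# Tatiana Santos 411889 52371.02
# Theia Nicholson 783887 840570.42
```

To read from standard input instead, the list could be built with something like [input() for _ in range(int(input()))].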

Getting word count of doc/docx files in R

I have a stream of doc/docx documents that I need to get the word count of.
The procedure so far is to manually open the document and write down the word count offered by MS Word itself, and I am trying to automate it using R.
This is what I tried:
library(textreadr)
library(stringr)
myDocx = read_docx(myDocxFile)
docText = str_c(myDocx, collapse = " ")
wordCount = str_count(docText, "\\s+") + 1
Unfortunately, wordCount is NOT what MS Word suggests.
For example, I noticed that MS Word counts the numbers in numbered lists, whereas textreadr does not even import them.
Is there a workaround? I don't mind trying something in Python, too, although I'm less experienced there.
Any help would be greatly appreciated.
This can be done with the tidytext package in R.
library(textreadr)
library(tidytext)
library(dplyr)
# read in word file without password protection
x <- read_docx(myDocxFile)
# convert string to dataframe
text_df <- tibble(line = 1:length(x), text = x)
# tokenize dataframe to isolate separate words
words_df <- text_df %>%
    unnest_tokens(word, text)
# calculate number of words in passage
word_count <- nrow(words_df)
I tried reading the docx files with a different library (officer) and, even though the result still doesn't agree 100% with MS Word, it does significantly better this time.
Another small fix is to copy MS Word's rule for what counts as a word and what doesn't. The naive method of counting whitespace runs can be improved by not counting a standalone "En Dash" (U+2013) character as a word.
Here is my improved function:
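library(data.table)  # as.data.table
library(stringr)     # str_trim, str_count
getDocxWordCount = function(docxFile) {
    docxObject = officer::read_docx(docxFile)
    # keep the trimmed text of every element longer than one character
    myFixedText = as.data.table(officer::docx_summary(docxObject))[nchar(str_trim(text)) > 1, str_trim(text)]
    # words = whitespace runs + 1; a standalone en dash is absorbed into the separator
    wordBd = sapply(as.list(myFixedText), function(z) 1 + str_count(z, "\\s+([\u{2013}]\\s+)?"))
    return(sum(wordBd))
}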
This still has a weakness that prevents 100% accuracy:
The officer library doesn't read list markers (like bullets or hyphens), but MS Word counts those as words. So for any list, this function currently returns X fewer words, where X is the number of list items. I haven't experimented much with the attributes of the docxObject, but if it somehow holds the number of list items, then a definite improvement can be made.
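Since the question leaves the door open to Python, here is a rough sketch of the same idea using only the standard library. A .docx file is a zip archive whose main text lives in word/document.xml; paragraphs are w:p elements, text runs are w:t elements, and list paragraphs carry a w:numPr element in their paragraph properties. Whether words plus list items matches MS Word's figure for real documents is an assumption I have not verified, text in headers, footnotes and text boxes is not handled, and this does not read the old .doc format. The function name docx_word_count is made up for this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_word_count(docx_file):
    """Return (words, list_items) for a .docx path or file-like object."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    words = 0
    list_items = 0
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        text = "".join(node.text or "" for node in paragraph.iter(f"{{{W_NS}}}t"))
        # Whitespace-separated tokens, skipping a standalone en dash (U+2013).
        tokens = [token for token in text.split() if token != "\u2013"]
        words += len(tokens)
        numbered = paragraph.find(f"{{{W_NS}}}pPr/{{{W_NS}}}numPr") is not None
        if tokens and numbered:
            list_items += 1  # one list marker per non-empty list paragraph
    return words, list_items


# Tiny in-memory document so the sketch runs without a real file.
SAMPLE_XML = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    '<w:p><w:r><w:t>Hello wide \u2013 world</w:t></w:r></w:p>'
    '<w:p><w:pPr><w:numPr><w:ilvl w:val="0"/><w:numId w:val="1"/></w:numPr></w:pPr>'
    '<w:r><w:t>first item</w:t></w:r></w:p>'
    '</w:body></w:document>'
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", SAMPLE_XML)
buffer.seek(0)

words, list_items = docx_word_count(buffer)
print(words, list_items, words + list_items)  # 5 1 6
```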

struggling with python homework [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I got a .txt file with some lines in it:
325255, Jan Jansen
334343, Erik Materus
235434, Ali Ahson
645345, Eva Versteeg
534545, Jan de Wilde
345355, Henk de Vries
Write a program that starts with opening the file kaartnummers.txt
Determine the number of lines and the largest card number in the file. Then print these data.
My code isn't finished yet, but I tried at least:
def kaartinfo():
    lst = []
    infile = open('kaartnummers.txt', 'r')
    content = infile.readlines()
    print(len(content))
    for i in content:
        print(i.split())
kaartinfo()
I know that my program opens the file and counts the number of lines in it. Everything after that is wrong.
I can't figure out how to get the max number in the list. If you have an answer, please use simple, readable Python.
I'm not good at python, and there are probably much more elegant solutions, but this is how I would do it. Some may say this is like C++/Java in python, which many tend to avoid.
def kaartinfo():
    lst = []
    infile = open('kaartnummers.txt', 'r')
    content = infile.readlines()
    for i in content:
        value = i.split(',')
        value[0] = int(value[0])
        lst.append(value)
    return lst
# Use the kaartinfo() function to retrieve a list
my_list = kaartinfo()
# Assume first value is the maximum
maximumValue = my_list[0][0]
# Go through every value in the list, check if they are greater than the current maximum
# if they are, set them as the new current maximum
for ele in my_list:
    if ele[0] > maximumValue:
        maximumValue = ele[0]
# when the above loop finishes, maximumValue will be the largest value in the list.
# Convert the integer back to a string, and print the result
print(str(maximumValue) + ' is the maximum value in the file!')
This should be enough to do the job:
with open('kaartnummers.txt', 'r') as f:
    data = f.readlines()
print('There are %d lines in the file.' % len(data))
# compare the card numbers as integers rather than as strings
print('Max value is %d.' % max(int(line.split(',')[0]) for line in data))
Given the input file you provided, the output would be:
There are 6 lines in the file.
Max value is 645345.
Of course, you can put it in a function if you like.
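A function version could look roughly like this. It is only a sketch: the name card_stats is made up, it takes the lines as an argument so it can be tried without the file, and skipping blank lines is an assumption about how the file should be treated.

```python
def card_stats(lines):
    """Return (number of card lines, largest card number) for 'number, name' lines."""
    numbers = [int(line.split(",")[0]) for line in lines if line.strip()]
    # default=None avoids an exception when there are no card lines at all
    return len(numbers), max(numbers, default=None)


sample = [
    "325255, Jan Jansen\n",
    "334343, Erik Materus\n",
    "235434, Ali Ahson\n",
    "645345, Eva Versteeg\n",
    "534545, Jan de Wilde\n",
    "345355, Henk de Vries\n",
]
count, largest = card_stats(sample)
print("There are %d lines in the file." % count)  # There are 6 lines in the file.
print("Max value is %d." % largest)               # Max value is 645345.
```

With the real file, the call would be card_stats(f.readlines()) inside the with block.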

Python: How to print my simple poem [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I would like to know how I could print out sentences using my triplet poem program.
My program randomly picks a list of nouns to use.
My program:
import random
def nouns():
    nounPersons = ["cow","crowd","hound","clown"];
    nounPlace = ["town","pound","battleground","playground"];
    rhymes = ["crowned","round","gowned","found","drowned"];
    nounPersons2 = ["dog","frog","hog"];
    nounPlace2 = ["fog","Prague","smog"];
    rhymes2 = ["log","eggnog","hotdog"];
    nounList1 = [nounPersons,nounPlace,rhymes]
    nounList2 = [nounPersons2,nounPlace2,rhymes2]
    nounsList = [nounList1, nounList2]
    randomPick = random.choice(nounsList)
    return(randomPick)
verbs = ["walked","ran","rolled","biked","crawled"];
nouns()
For example, I could have "The cow walked to the town. But then it was drowned." And just replace the nouns/rhyme(cow, town,drowned) and verb(walked) with my randomizer.
Would I use random.randint in some way?
I just basically need a general print statement like the example I showed using my randomizer to randomly pick between the nouns/rhymes.
As usual (for me), there may be a more Pythonic approach, but to get what you have working, I did three things:
1. Assigned your call to the nouns() function to a 'chosen_list' variable. That way the returned 'randomPick' gets used.
2. Built in a selection step to get individual words from the lists in 'chosen_list' and your verb list.
3. Added a final print statement with formatting to assemble the words into a sentence.
The code:
import random
def nouns():
    nounPersons = ["cow","crowd","hound","clown"]
    nounPlace = ["town","pound","battleground","playground"]
    rhymes = ["crowned","round","gowned","found","drowned"]
    nounPersons2 = ["dog","frog","hog"]
    nounPlace2 = ["fog","Prague","smog"]
    rhymes2 = ["log","eggnog","hotdog"]
    nounList1 = [nounPersons,nounPlace,rhymes]
    nounList2 = [nounPersons2,nounPlace2,rhymes2]
    nounsList = [nounList1, nounList2]
    randomPick = random.choice(nounsList)
    return randomPick
verbs = ["walked","ran","rolled","biked","crawled"]
# this is change 1.
chosen_list = nouns()
# select single words from lists - this is change 2.
noun_subj = random.choice(chosen_list[0])
noun_obj = random.choice(chosen_list[1])
rhyme_word = random.choice(chosen_list[2])
verb_word = random.choice(verbs)
# insert words into text line - this is change 3.
print("The {} {} to the {}. But then it was {}.".format(noun_subj, verb_word, noun_obj, rhyme_word))
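If you want more than one poem, the same three steps can be wrapped in a function. This is just one way to arrange it: the names WORD_FAMILIES, VERBS and make_poem are made up, and passing in a random.Random instance is an optional extra that makes the output repeatable while testing.

```python
import random

# Each family groups words that rhyme with one another.
WORD_FAMILIES = [
    {"persons": ["cow", "crowd", "hound", "clown"],
     "places": ["town", "pound", "battleground", "playground"],
     "rhymes": ["crowned", "round", "gowned", "found", "drowned"]},
    {"persons": ["dog", "frog", "hog"],
     "places": ["fog", "Prague", "smog"],
     "rhymes": ["log", "eggnog", "hotdog"]},
]
VERBS = ["walked", "ran", "rolled", "biked", "crawled"]


def make_poem(rng=random):
    """Build one two-sentence poem from a randomly chosen word family."""
    family = rng.choice(WORD_FAMILIES)
    return "The {} {} to the {}. But then it was {}.".format(
        rng.choice(family["persons"]),
        rng.choice(VERBS),
        rng.choice(family["places"]),
        rng.choice(family["rhymes"]),
    )


for _ in range(3):
    print(make_poem())

# A seeded generator gives the same poem on every run.
print(make_poem(random.Random(0)))
```

This also answers the random.randint question: random.choice already picks an element for you, so there is no need to generate an index by hand.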
