I am currently working on a beginner problem
(https://www.reddit.com/r/beginnerprojects/comments/1i6sax/challenge_count_and_fix_green_eggs_and_ham/).
The challenge is to read through a file, replacing lower case 'i' with 'I' and writing a new corrected file.
I am at a point where the program reads the input file, replaces the relevant lower case characters, and writes a new corrected file. However, I need to also count the number of corrections.
I have looked through the .replace() documentation and I cannot see that it is possible to find out the number of replacements made. Is it possible to count corrections using the replace method?
def capitalize_i(file):
    file = file.replace('i ', 'I ')
    file = file.replace('-i-', '-I-')
    return file

with open("green_eggs.txt", "r") as f_open:
    file_1 = f_open.read()

file_2 = open("result.txt", "w")
file_2.write(capitalize_i(file_1))
file_2.close()
You can just use the count method, counting before each replacement (the matches disappear afterwards):
i_count = file.count('i ')
file = file.replace('i ', 'I ')
i_count += file.count('-i-')
file = file.replace('-i-', '-I-')
i_count will hold the total number of replacements made. You can also keep the counts separate by using a different variable for each pattern if you want.
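If you'd rather count and replace in a single step, the standard library's re.subn returns both the new string and the number of substitutions made. A minimal sketch; note that the \bi\b pattern is an assumption that generalizes your two literal cases, since it matches any standalone lowercase i:

import re

def capitalize_i(text):
    # \b is a word boundary, so this matches a lone lowercase 'i'
    # whether it is followed by a space or wrapped in hyphens
    fixed, n = re.subn(r'\bi\b', 'I', text)
    return fixed, n

fixed, count = capitalize_i("i am sam. sam-i-am.")
print(fixed)  # I am sam. sam-I-am.
print(count)  # 2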
Related
I have many .txt files full of data that need to be read into Python and then combined into one Excel document. I have code working that reads the data, removes the first 5 lines and last two lines, and exports all the txt files into an Excel file. The problem I'm running into now is that the txt files use whitespace inconsistently, so a simple whitespace delimiter is not working as it should.
Here is an example of the txt file.
2020MAY16 215015 2.0 1004.4 30.0 2.0 2.0 2.0 NO LIMIT OFF OFF OFF OFF -25.84 -32.50 CRVBND N/A -0.0 28.52 78.54 FCST GOES16 33.4
*This is all on one line in the text file
I'd like to be able to take this and make it look like this:
2020MAY16, 215015, 2.0, 1004.4, 30.0, 2.0, 2.0, 2.0, NO LIMIT, OFF, OFF, OFF, OFF, -25.84, -32.50, CRVBND, N/A, -0.0, 28.52, 78.54, FCST, GOES16, 33.4,
I have added the portion of code below that grabs the file from the URL address the user enters and iterates through the number of storms to change the URL text. It also removes the top 5 lines and bottom 2. If anyone has any suggestions on adding the commas, that would be great, as it would allow for an easy conversion to a CSV file later.
for i in range(1, 10, 1):
    url = mod_string + str(i) + "L-list.txt"
    storm_abrev = url[54:57:1]  # Grabs the designator from the URL to allow for simplistic naming of files
    File_Name = storm_abrev + ".txt"  # Names the file
    print(url)  # Prints URL to allow user to confirm URLs are correct
    urllib.request.urlretrieve(url, File_Name)  # Sends a request to the URL from above, grabs the file, and then saves it as designator.txt
    file = open(File_Name, "r")
    Counter = 0
    Content = file.read()
    CoList = Content.split("\n")
    for j in CoList:
        if j:
            Counter += 1
    print("This is the number of lines in the file")
    print(Counter)
    Final_Count = Counter - 2
    print(Final_Count)
    with open(File_Name, 'r') as f:
        new_lines = []
        for idx, line in enumerate(f):
            if idx in [x for x in range(5, Final_Count)]:
                new_lines.append(line)
    with open(File_Name, "w") as f:
        for line in new_lines:
            f.write(line)
Edit: fixed the issue caught by @DarryIG.
Create a list of phrases that need to remain intact. Let's call it phrase_list.
Identify a character or string that will never occur in the input file. For example, here I am assuming that an underscore will never be found in the input file, and I assign it to the variable forbidden_str. We could also use something like %$#%$#^%&%_#^ - the chances of something like that occurring are very slim.
Replace multiple spaces with single spaces. Then replace the spaces inside phrases with _ (forbidden_str). Then replace all remaining spaces with commas. Finally, replace the _s back to spaces.
You could also simplify the reading lines part of your code using readlines().
...
phrase_list = ['NO LIMIT']
forbidden_str = "_"

with open(File_Name, 'r') as f:
    new_lines = f.readlines()
    new_lines = new_lines[5:Final_Count]

with open(File_Name, "w") as f:
    for line in new_lines:
        for phrase in phrase_list:
            if phrase in line:
                line = line.replace(phrase, phrase.replace(" ", forbidden_str))
        while "  " in line:
            line = line.replace("  ", " ")  # replaces multiple spaces with single spaces
        line = line.replace(" ", ",")
        line = line.replace(forbidden_str, " ")
        f.write(line)
Output:
2020MAY16,215015,2.0,1004.4,30.0,2.0,2.0,2.0,NO LIMIT,OFF,OFF,OFF,OFF,-25.84,-32.50,CRVBND,N/A,-0.0,28.52,78.54,FCST,GOES16,33.4,
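Since the end goal is a CSV file anyway, you could also let the standard csv module write the commas for you. A sketch of the same idea (phrase_list, forbidden_str, and new_lines are the variables defined above, and "output.csv" is a hypothetical file name); note that split() with no argument already collapses runs of whitespace, so the multiple-space problem disappears:

import csv

with open("output.csv", "w", newline="") as out_f:
    writer = csv.writer(out_f)
    for line in new_lines:
        # protect multi-word phrases, split on whitespace, then restore them
        for phrase in phrase_list:
            line = line.replace(phrase, phrase.replace(" ", forbidden_str))
        fields = [f.replace(forbidden_str, " ") for f in line.split()]
        writer.writerow(fields)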
Also, a quick suggestion: it's good practice to name variables in lower case, for example final_count instead of Final_Count. Capitalized names are usually reserved for classes and the like. It just helps with readability and debugging.
If I understand your code correctly, you have a list of lines in which you want to replace each space with a comma followed by a space. In Python you can do this very simply, like this:
lines = [x.replace(" ", ", ") for x in lines]
I'm pretty new to this and I was trying to write a program which counts the words in txt files. There is probably a better way of doing this, but this was the idea I came up with, so I wanted to go through with it. I just don't understand why i, or any variable, doesn't work as an index into the string of the page that I'm counting on...
Do you guys have a solution, or should I just take a different approach?
page = open("venv\harrry_potter.txt", "r")
alphabet = "qwertzuiopüasdfghjklöäyxcvbnmßQWERTZUIOPÜASDFGHJKLÖÄYXCVBNM"

# Counting the characters
list_of_lines = page.readlines()
characternum = 0
textstr = ""  # to convert the .txt file to string
for line in list_of_lines:
    for character in line:
        characternum += 1
        textstr += character

# Counting the words
i = 0
wordnum = 1
while i <= characternum:
    if textstr[i] not in alphabet and textstr[i+1] in alphabet:
        wordnum += 1
    i += 1
print(wordnum)
page.close()
Counting the characters and converting the .txt file to a string is done a bit awkwardly, because I thought the other way could be the source of the problem...
Can you help me please?
Typically you want to use split for simplistically counting words. The way you are doing it, you will count right-minded as two words, and don't as two words. (The immediate crash in your loop, by the way, is an off-by-one: while i <= characternum lets i, and i+1 inside the test, run past the last valid index, characternum - 1, raising an IndexError.) If you can rely on spaces, then you can just use split like this:
book = "Hello, my name is Inigo Montoya, you killed my father, prepare to die."
words = book.split()
print(f'word count = {len(words)}')
You can also pass parameters to split for more options if the default doesn't suit you, as in the example below.
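For example, splitting on an explicit delimiter instead of whitespace (a toy example):

line = "alpha,beta,gamma"
print(line.split(","))     # ['alpha', 'beta', 'gamma']
print(line.split(",", 1))  # maxsplit=1 gives ['alpha', 'beta,gamma']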
https://pythonexamples.org/python-count-number-of-words-in-text-file/
You want to get the word count of a text file
The shortest code is this (that I could come up with):
with open('lorem.txt', 'r') as file:
    print(len(file.read().split()))
First off, for smaller files this is fine, but it loads all of the data into memory, so it's not that great for large files. Also, use a context manager (with); it helps with error handling and other things. What happens here is that file.read() reads the whole file and returns a string, .split() splits that string on whitespace and returns a list of the words in between spaces, and you print the length of that list.
A better approach would be this:
word_count = 0
with open('lorem.txt', 'r') as file:
    for line in file:
        word_count += len(line.split())
print(word_count)
This is better because the whole file is not held in memory: you read each line separately, and each one overwrites the previous in memory. Again, for each line you split it on whitespace and take the length of the returned list, then add that to the total word count. At the end you simply print the total word count.
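The same streaming idea can be written more compactly with sum() and a generator expression (a sketch):

with open('lorem.txt', 'r') as file:
    # each line is read, split, counted, and discarded in turn
    word_count = sum(len(line.split()) for line in file)
print(word_count)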
Useful sources:
about with
Context Managers - Efficiently Managing Resources (to learn how they work a bit in detail) by Corey Schafer
.split() "docs"
After a long time researching and asking friends, I am still a dumb-dumb and don't know how to solve this.
So, for homework, we are supposed to define a function which accesses two files, the first of which is a text file with the following sentence, from which we are to calculate the word frequencies:
In a Berlin divided by the Berlin Wall , two angels , Damiel and Cassiel , watch the city , unseen and unheard by its human inhabitants .
We are also to include commas and periods: each single item has already been tokenised (individual items are surrounded by whitespace - including the commas and periods). Then the word frequencies must be written into a new txt file as "word:count", in the order in which the words appear, i.e.:
In:1
a:1
Berlin:2
divided:1
etc.
I have tried the following:
def find_token_frequency(x, y):
    with open(x, encoding='utf-8') as fobj_1:
        with open(y, 'w', encoding='utf-8') as fobj_2:
            fobj_1list = fobj_1.split()
            unique_string = []
            for i in fobj_1list:
                if i not in unique_string:
                    unique_string.append(i)
            for i in range(0, len(unique_string)):
                fobj_2.write("{}: {}".format(unique_string[i], fobj_1list.count(unique_string[i])))
I am not sure I need to actually use .split() at all, but I don't know what else to do, and it does not work anyway, since it tells me I cannot split that object.
I am told:
Traceback (most recent call last):
[...]
fobj_1list = fobj_1.split()
AttributeError: '_io.TextIOWrapper' object has no attribute 'split'
When I remove the .split(), the displayed error is:
fobj_2.write("{}: {}".format(unique_string[i], fobj_1list.count(unique_string[i])))
AttributeError: '_io.TextIOWrapper' object has no attribute 'count'
Let's divide your problem into smaller problems so we can more easily solve this.
First we need to read a file, so let's do so and save it into a variable:
with open("myfile.txt") as fobj_1:
sentences = fobj_1.read()
Ok, so now we have your file as a string stored in sentences. Let's turn it into a list and count the occurrence of each word:
words = sentences.split(" ")
frequency = {word: words.count(word) for word in dict.fromkeys(words)}
Here frequency is a dictionary where each word in the sentences is a key, with the value being how many times it appears. Note the usage of dict.fromkeys(words): like a set, it drops duplicate words, but unlike a set it preserves the order in which the words first appear, which your expected output requires.
Finally, we can save the word frequencies into a file
with open("results.txt", 'w') as fobj_2:
for word in frequency: fobj_2.write(f"{word}:{frequency[word]}\n")
Here we use f-strings to format each line into the desired output. Note that f-strings are available on Python 3.6+.
I'm unable to comment as I don't have the required reputation, but the reason split() isn't working is that you're calling it on the file object itself, not a string. Try calling:
fobj_1list = fobj_1.readline().split()
instead. Also, when I ran this locally, I got an error saying TypeError: 'encoding' is an invalid keyword argument for this function; that happens on Python 2, where the built-in open() has no encoding parameter. You may want to remove the encoding argument from your function calls if you are on Python 2.
I think that should be enough to get you going.
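For reference, a minimal sketch of your function with those fixes applied, assuming Python 3 (so the encoding argument can stay). It uses read() instead of readline() so multi-line inputs still work, and writes "{}:{}\n" so each pair matches the expected word:count format on its own line:

def find_token_frequency(x, y):
    with open(x, encoding='utf-8') as fobj_1, \
         open(y, 'w', encoding='utf-8') as fobj_2:
        tokens = fobj_1.read().split()
        unique_tokens = []
        for token in tokens:
            # remember each token once, in order of first appearance
            if token not in unique_tokens:
                unique_tokens.append(token)
        for token in unique_tokens:
            fobj_2.write("{}:{}\n".format(token, tokens.count(token)))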
The following script should do what you want.
#!/usr/local/bin/python3

def find_token_frequency(inputFileName, outputFileName):
    # wordOrderList to maintain order
    # dict to keep track of count
    wordOrderList = []
    wordCountDict = dict()

    # read the file
    inputFile = open(inputFileName, encoding='utf-8')
    lines = inputFile.readlines()
    inputFile.close()

    # iterate over all lines in the file
    for line in lines:
        # and split them into words
        words = line.split()
        # now, iterate over all words
        for word in words:
            # and add them to the list and dict
            if word not in wordOrderList:
                wordOrderList.append(word)
                wordCountDict[word] = 1
            else:
                # or increment their count
                wordCountDict[word] = wordCountDict[word] + 1

    # store result in outputFile
    outputFile = open(outputFileName, 'w', encoding='utf-8')
    for index in range(0, len(wordOrderList)):
        word = wordOrderList[index]
        outputFile.write(f'{word}:{wordCountDict[word]}\n')
    outputFile.close()

find_token_frequency("input.txt", "output.txt")
I changed your variable names a bit to make the code more readable.
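For comparison, here is a more compact sketch of the same idea using collections.Counter, relying on the fact that Counter (like plain dicts on Python 3.7+) keeps keys in the order they were first inserted:

from collections import Counter

def find_token_frequency(input_file_name, output_file_name):
    with open(input_file_name, encoding='utf-8') as in_file:
        # Counter keys keep first-appearance order on Python 3.7+
        counts = Counter(in_file.read().split())
    with open(output_file_name, 'w', encoding='utf-8') as out_file:
        for word, count in counts.items():
            out_file.write(f'{word}:{count}\n')

find_token_frequency("input.txt", "output.txt")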
I'm new to Python and attempting to do an exercise where I open a txt file and then read its contents (probably straightforward for most, but I will admit I am struggling a bit).
I opened my file and used .read() to read it. I then proceeded to strip the file of any punctuation.
Next I created a for loop. In this loop I began by using .split() and adding to an expression:
words = words + len(characters)
words being previously defined as 0 outside the loop, and characters being what was split at the beginning of the loop.
Very long story short, the problem I'm having now is that instead of adding whole words to my counter, each individual character is being added. Anything I can do to fix that in my for loop?
my_document = open("book.txt")
readTheDocument = my_document.read
comma = readTheDocument.replace(",", "")
period = comma.replace(".", "")
stripDocument = period.strip()
numberOfWords = 0
for line in my_document:
    splitDocument = line.split()
    numberOfWords = numberOfWords + len(splitDocument)
print(numberOfWords)
A more Pythonic way is to use with:
with open("book.txt") as infile:
count = len(infile.read().split())
You've got to understand that by using .split() you are not really getting real grammatical words; you are getting word-like fragments. If you want proper words, use the nltk module:
import nltk

with open("book.txt") as infile:
    count = len(nltk.word_tokenize(infile.read()))
Just open the file and split to get the count of words.
file=open("path/to/file/name.txt","r+")
count=0
for word in file.read().split():
count = count + 1
print(count)
I am having a silly issue whereby I have a text file with user inputs structured as follows:
x = variable1
y = variable2
and so on. I want to grab the variables. To do this I was going to just import the text file and then grab out UserInputs[2], UserInputs[5], etc. I have spent a lot of time reading through how to do this; the closest I got was with the csv package, but that resulted in just getting the '=' signs when I printed the result, so I went back to using the open command and readlines, and then trying to iterate through the lines and splitting on ' '.
So far I have the following code:
Text_File_Import = open('USER_INPUTS.txt', 'r')
Text_lines = Text_File_Import.readlines()
for line in Text_lines:
    User_Inputs = line.split(' ')
    print User_Inputs
However, this only outputs the first line from my text file (i.e. I get 'x', '=', 'variable1'), but it never moves on to the next line. How would I iterate this code through the whole imported text file?
I have bodged it a bit for the time being and rearranged the text file to be variable1 = x and so on. This way I can still import the variables, and the x has the \n after it, if I just import them with the following code:
def ReadTextFile(textfilename):
    Text_File_Import = open(textfilename, 'r')
    Text_lines = Text_File_Import.readlines()
    User_Inputs = Text_lines[1].split(' ')
    User_Inputs_clength = User_Inputs[0]
    #print User_Inputs[2] + User_Inputs_clength
    User_Inputs = Text_lines[2].split(' ')
    User_Inputs_cradius = User_Inputs[0]
    #print User_Inputs[2], ' ', User_Inputs_cradius
    return User_Inputs_clength, User_Inputs_cradius
Thanks
I don't quite understand the question. If you want to store the variables:
As long as the contents of the text file are valid Python syntax (e.g. strings surrounded by quotes), here is an easy but very insecure method:
file = open('file.txt')
exec(file.read())
It will store all the variables, with their names.
If you want to split a text file between the spaces:
file = open('file.txt')
output = file.read().split(' ')
And if you want to replace newlines with spaces:
file = open('file.txt')
output = file.read().replace('\n', ' ')
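Since your file has one name = value pair per line, a more robust pattern than exec is to split each line on the first '=' and collect the pairs in a dictionary. A sketch, assuming every relevant line contains an '=':

user_inputs = {}
with open('USER_INPUTS.txt') as infile:
    for line in infile:
        if '=' in line:
            name, value = line.split('=', 1)  # split only on the first '='
            user_inputs[name.strip()] = value.strip()

print(user_inputs)  # e.g. {'x': 'variable1', 'y': 'variable2'}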
You have a lot of indentation issues. To read lines and split by space, the snippet below should help.
Demo
with open('USER_INPUTS.txt', 'r') as infile:
    data = infile.readlines()
    for i in data:
        print(i.split())