Python Runtime error counting words in file - python

After struggling with a syntax error for some time and realising I had made a foolish mistake, I corrected it, only to run into a runtime problem. I'm trying to write a program that counts the number of words in a file. However, instead of counting words, the program seems to count the number of letters, which is not useful for my purposes. Please find the relevant code below. Thanks for any and all contributions!
def GameStage02():
    global FileSelection
    global ReadFile
    global WordCount
    global WrdCount
    FileSelection = filedialog.askopenfilename(filetypes=(("*.txt files", ".txt"),("*.txt files", "")))
    with open(FileSelection, 'r') as file:
        ReadFile = file.read()
    SelectTextLabel.destroy()
    WrdCount=0
    for line in ReadFile:
        Words=line.split()
        WrdCount=WrdCount+len(Words)
    print(WrdCount)
    GameStage01Button.config(state=NORMAL)

Let's break it down:
ReadFile = file.read() will give you a string.
for line in ReadFile will iterate over the characters in that string.
Words=line.split() will give you a list with one or zero characters in it.
That's probably not what you want. Change
ReadFile = file.read()
to
ReadFile = file.readlines()
This will give you a list of lines, which you can iterate over and/or split into lists of words.
In addition, note that file is not a good variable name (in Python2), because that's already the name of a builtin.

As a continuation of timgeb's answer, here is a working piece of code that does this:
# Open file.txt, read it, and split the content into lines
with open('file.txt') as f:
    lines = f.read().splitlines()
count = 0
for line in lines:
    # split() with no arguments splits on runs of whitespace
    # and never returns empty strings, so blank lines count as zero words
    words = line.split()
    count += len(words)
print(count)
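To see why the original loop counted letters, the difference can be reproduced on an in-memory string. This is only a minimal sketch: the helper names count_words_buggy and count_words are made up for illustration and are not part of the asker's program.

```python
def count_words_buggy(text):
    # Iterating over a string yields one character at a time, so each
    # split() returns [char] for a letter or [] for whitespace.
    total = 0
    for ch in text:
        total += len(ch.split())
    return total


def count_words(text):
    # splitlines() yields whole lines; split() then yields the words on each line.
    total = 0
    for line in text.splitlines():
        total += len(line.split())
    return total


sample = "hello world\nfoo  bar\n"
print(count_words_buggy(sample))  # number of non-whitespace characters
print(count_words(sample))        # number of words
```

The first function mirrors the loop in the question, the second mirrors the readlines()/splitlines() fix.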

Related

Want to remove \n from len [duplicate]

here is my task:
You are provided a books.txt file, which includes the book titles, each one written on a separate line.
Read the title one by one and output the code for each book on a separate line.
For example, if the books.txt file contains:
Some book
Another book
Your program should output:
S9
A12
file = open("/usercode/files/books.txt", "r")
with file as f:
    lines = f.readlines()
    for i in lines:
        count = len(i)
        count = str(count - 1)
        print(i[0]+count)
file.close()
This outputs everything correctly except the last line, because i[#lastLine] is done after the last count, if that makes any sense? (I could be completely wrong; I am still learning.)
Basically I want to know where I am going wrong in my code. I believe it is the way I structured the for i in lines part, and that I should have handled the \n in a different way than
count = len(i)
count = str(count - 1)
ANSWER
Thank you for informing me, adding i = i.strip() strips new lines aka \n which sorted the problem!
Working code:
file = open("/usercode/files/books.txt", "r")
with file as f:
    lines = f.readlines()
    for i in lines:
        i = i.strip('\n') #Strips new lines aka \n
        count = str(len(i))
        print(i[0]+count)
file.close()
You can use strip() on i to remove the new lines.
strip - Returns a copy of the string with the leading and trailing characters removed.
Python docs strip
You can convert your count to a string with str() in order to print it.
str()
Returns a string containing a printable representation of an object.
Python docs str
As comments point out, suggest also changing i to line for readability.
# The with statement closes the file automatically, so no explicit close() is needed
with open("books.txt", "r") as f:
    lines = f.readlines()
    for line in lines:
        count = str(len(line.strip()))
        print(line[0]+count)
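The most likely reason only the last line was wrong is that the final line of a file often has no trailing \n, so subtracting 1 from its length removes a real character. A minimal sketch of the same logic as a function, tested on in-memory lines rather than the real books.txt; the name book_codes is hypothetical:

```python
def book_codes(lines):
    codes = []
    for line in lines:
        title = line.rstrip("\n")  # drop only the trailing newline, if present
        if title:                  # skip blank lines so title[0] cannot fail
            codes.append(title[0] + str(len(title)))
    return codes


# The last entry deliberately has no trailing newline, like the end of a file.
print(book_codes(["Some book\n", "Another book"]))
```

Stripping the newline instead of subtracting 1 gives the same result whether or not the last line ends with \n.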

File Reading and Variable Assignments

So I am trying to make a game where the 'GameMaster' picks the first word from a .txt file, then the user tries to guess the word. Once the user correctly guesses the word, the GameMaster looks to the next line in the file and the user has to guess again, and so on and so forth...
The problem I am having, is getting the program to assign variables as the game continues. The program should iteratively look until there are no more words to choose from, whether that be 2 or infinity.
Since I don't have much experience working with file interaction in python, the best example I have is something like this:
file "input.txt" will contain:
dog
cat
bird
rat
mouse
And I am looking at what is in the .txt file with this:
def file_read():
    with open ('/Users/someone/Desktop/input.txt', 'r') as myfile:
        data = myfile.read()
        for line in data:
            line.rstrip()
    return data
Your function returns the entire contents of the file, unaltered. myfile.read() returns the data from the file as a string. The for loop then iterates over every character in that string, not the lines. Furthermore, rstrip() operates only on each character. It does not affect the contents of data because data is an immutable string and the return value of rstrip() is not stored anywhere.
Something like this would better suit:
def file_read():
    with open('/Users/someone/Desktop/input.txt') as myfile:
        return [line.rstrip() for line in myfile]
This will return a list of the stripped lines from the file. Your word guessing code would then iterate over the list.
The above will work, however, it is not very efficient if the input file is large because all of the file would be read into memory to construct the list. A better way is to use a generator which yields a stripped line one at a time:
def file_read():
    with open('/Users/someone/Desktop/input.txt') as myfile:
        for line in myfile:
            yield line.rstrip()
Now that function is so simple, it seems pointless to bother with it. Your code could simply be:
with open('/Users/someone/Desktop/input.txt') as myfile:
    for line in myfile:
        user_guess_word(line.rstrip())
where user_guess_word() is a function that interacts with the user to guess what the word is, and returns once the guess is correct.
This way uses readlines to get file contents in a list line by line. readlines returns a list containing lines.
Now iterate through list to check if user input matches with line content (which is a word in this case).
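with open('/Users/someone/Desktop/input.txt', 'r') as myfile:
    words = myfile.readlines()
x = 0  # index of the word currently being guessed
while x < len(words):
    # strip() removes the trailing newline that readlines() keeps on each line
    if words[x].strip() == input('Enter word to guess: '):
        print('Predicted word correctly')
        x += 1  # move on only after a correct guess
    else:
        print('Wrong word. Try again')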
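You can do it like this:
def fun():
    with open('filename', 'r') as f:
        data = f.readlines()
    i = 0
    while i < len(data):
        user_guess = input()
        # strip() both sides so the trailing newline in data[i] does not break the match
        if user_guess.strip() == data[i].strip():
            i = i + 1
Note the strip() calls when comparing user_guess and data[i]; without them the trailing newline kept by readlines() would prevent any guess from matching.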
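The guessing loop described in these answers can also be written so that the source of guesses is a parameter, which makes it easy to try without typing. This is a sketch under assumptions: the names play and get_guess are hypothetical, and the word list is passed in already stripped.

```python
def play(words, get_guess=input):
    """Ask for each word in turn until it is guessed; return the number of wrong guesses."""
    wrong = 0
    for word in words:
        # Keep asking for the same word until the stripped guess matches.
        while get_guess("Guess the word: ").strip() != word:
            wrong += 1
            print("Wrong word. Try again")
        print("Predicted word correctly")
    return wrong


# Scripted guesses stand in for the keyboard: one miss, then two hits.
scripted = iter(["cat", "dog", "cat"])
print(play(["dog", "cat"], lambda prompt: next(scripted)))
```

With the default get_guess=input the same function reads from the keyboard, and words could come from a list of stripped lines such as [line.rstrip() for line in myfile].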

Replace words of a long document in Python

I have a dictionary dict with some words (2000) and I have a huge text, like Wikipedia corpus, in text format. For each word that is both in the dictionary and in the text file, I would like to replace it with word_1.
with open("wiki.txt",'r') as original, open("new.txt",'w') as mod:
    for line in original:
        new_line = line
        for word in line.split():
            if (dict.get(word.lower()) is not None):
                new_line = new_line.replace(word,word+"_1")
        mod.write(new_line)
This code creates a new file called new.txt with the words that appear in the dictionary replaced as I want.
This works for short files, but for the longer that I am using as input, it "freezes" my computer.
Is there a more efficient way to do that?
Edit for Adi219:
Your code seems to work, but there is a problem:
If a line looks like this: Albert is a friend of Albert, and my dictionary contains Albert, then after the for loop the line will look like this: Albert_1_1 is a friend of Albert_1. How can I replace only the exact word that I want, to avoid repetitions like _1_1_1_1?
Edit2:
To solve the previous problem, I changed your code:
with open("wiki.txt", "r") as original, open("new.txt", "w") as mod:
    for line in original:
        words = line.split()
        for word in words:
            if dict.get(word.lower()) is not None:
                mod.write(word+"_1 ")
            else:
                mod.write(word+" ")
        mod.write("\n")
Now everything should work
A few things:
You could remove the declaration of new_line. Then, change new_line = new_line.replace(...) line with line = line.replace(...). You would also have to write(line) afterwards.
You could add words = line.split() and use for word in words: for the for loop. This is mainly a readability change, since line.split() in the for statement is only evaluated once per line either way.
You could (manually(?)) split your large .txt file into multiple smaller files and have multiple instances of your program running on each file, and then you could combine the multiple outputs into one file. Note: You would have to remember to change the filename for each file you're reading/writing to.
So, your code would look like:
with open("wiki.txt", "r") as original, open("new.txt", "w") as mod:
    for line in original:
        words = line.split()
        for word in words:
            if dict.get(word.lower()) is not None:
                line = line.replace(word, word + "_1")
        mod.write(line)
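The _1_1 repetition comes from str.replace, which rewrites every occurrence of the substring each time the word is seen. One possible alternative, not taken from the answers above, is a single re.sub pass per line with a callback, so each word is visited exactly once. In this sketch the names tag_words and vocab are hypothetical, and a set of lowercase words stands in for the asker's dictionary:

```python
import re


def tag_words(line, vocab, suffix="_1"):
    # re.sub scans the original line once; the replacement text is never rescanned,
    # so a word cannot be tagged twice.
    def repl(match):
        word = match.group(0)
        return word + suffix if word.lower() in vocab else word

    return re.sub(r"\w+", repl, line)


vocab = {"albert"}
print(tag_words("Albert is a friend of Albert\n", vocab), end="")
```

Because \w+ matches whole runs of word characters, a longer word such as Alberta is left alone, and the original spacing and punctuation are preserved. Whether this is fast enough for a Wikipedia-sized file has not been measured here.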

Python 3.4.3: Iterating over each line and each character in each line in a text file

I have to write a program that iterates over each line in a text file and then over each character in each line in order to count the number of entries in each line.
Here is a segment of the text file:
N00000031,B,,D,D,C,B,D,A,A,C,D,C,A,B,A,C,B,C,A,C,C,A,B,D,D,D,B,A,B,A,C,B,,,C,A,A,B,D,D
N00000032,B,A,D,D,C,B,D,A,C,C,D,,A,A,A,C,B,D,A,C,,A,B,D,D
N00000033,B,A,D,D,C,,D,A,C,B,D,B,A,B,C,C,C,D,A,C,A,,B,D,D
N00000034,B,,D,,C,B,A,A,C,C,D,B,A,,A,C,B,A,B,C,A,,B,D,D
The first and last lines are "unusable lines" because they contain the wrong number of entries (more or fewer than 25). I would like to count the number of unusable lines in the file.
Here is my code:
for line in file:
    answers=line.split(",")
    i=0
    for i in answers:
        i+=1
unusable_line=0
for line in file:
    if i!=26:
        unusable_line+=1
print("Unusable lines in the file:", unusable_line)
I tried using this method as well:
alldata=file.read()
for line in file:
    student=alldata.split("\n")
    answer=student.split(",")
My problem is each variable I create doesn't exist when I try to run the program. I get a "students" is not defined error.
I know my coding is awful but I'm a beginner. Sorry!!! Thank you and any help at all is appreciated!!!
A simplified version of your method, using a list, len() and an if condition.
Code:
unusable_line = 0
for line in file:
    answers = line.strip().split(",")
    if len(answers) != 26:
        unusable_line += 1
print("Unusable lines in the file:", unusable_line)
Notes:
First I create a variable, unusable_line, to store the count of unusable lines.
Then I iterate over the lines of the file object.
Then I split each line at , to create a list.
Then I check whether the length of the list differs from 26 (the ID plus 25 answers), so that lines with too many entries are caught as well as lines with too few. If so, I increment the unusable_line variable.
Finally I print it.
You could use something like this and wrap it into a function. You don't need to re-iterate the items in the line, str.split() returns a list[] that has your elements in it, you can count the number of its elements with len()
my_file = open('temp.txt', 'r')
lines_count = usable = unusable = 0
for line in my_file:
    lines_count+=1
    if len(line.split(',')) == 26:
        usable+=1
    else:
        unusable+=1
my_file.close()
print("Processed %d lines, %d usable and %d unusable" % (lines_count, usable, unusable))
You can do it much shorter:
with open('my_file.txt') as fobj:
    unusable = sum(1 for line in fobj if len(line.split(',')) != 26)
The line with open('my_file.txt') as fobj: opens the file for reading and closes it automatically after leaving the indented block. I use a generator expression to go through all lines and count those whose number of comma-separated entries differs from 26.
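The same check can be wrapped in a function and tried on in-memory rows. This is a rough sketch: the name count_unusable and the expected parameter are hypothetical, and the rows below are synthetic, not taken from the asker's file.

```python
def count_unusable(lines, expected=26):
    # A line is unusable when its number of comma-separated fields differs from
    # expected. Empty fields (",,") still count, since split(",") keeps them.
    return sum(1 for line in lines if len(line.strip().split(",")) != expected)


# Synthetic rows: one ID followed by 25, 10 and 40 answers respectively.
good = ",".join(["ID1"] + ["A"] * 25)
short = ",".join(["ID2"] + ["A"] * 10)
long_ = ",".join(["ID3"] + ["A"] * 40)
print(count_unusable([good + "\n", short + "\n", long_ + "\n"]))
```

Passing an open file object instead of a list works the same way, because both yield one line per iteration.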

How do I remove duplicate entries in my output file in Python?

I'm very new to Python. I am trying to extract data from a text file in the format:
85729 block addressing index approximate text retrieval
85730 automatic query expansion based divergence
etc...
The output text file is a list of the words but with no duplicate entries. The text file that is input can have duplicates. The output will look like this:
block
addressing
index
approximate
etc....
With my code so far, I am able to get the list of words but the duplicates are included. I try to check for duplicates before I enter a word into the output file but the output does not reflect that. Any suggestions? My code:
infile = open("paper.txt", 'r')
outfile = open("vocab.txt", 'r+a')
lines = infile.readlines()
for i in lines:
    thisline = i.split()
    for word in thisline:
        digit = word.isdigit()
        found = False
        for line in outfile:
            if word in line:
                found = True
                break
        if (digit == False) and (found == False ):
            outfile.write(word);
            outfile.write("\n");
I don't understand how for loops are closed in Python. In C++ or Java, curly braces can be used to define the body of a for loop, but I'm not sure how it's done in Python. Can anyone help?
Python loops are closed by dedenting; the whitespace on the left has semantic meaning. This saves you from furiously typing curly braces or do/od or whatever, and eliminates a class of errors where your indentation accidentally doesn't reflect your control flow accurately.
Your input doesn't appear to be large enough to justify a loop over your output file (and if it did I'd probably use a gdbm table anyway), so you can probably do something like this (tested very briefly):
#!/usr/local/cpython-3.3/bin/python
with open('/etc/crontab', 'r') as infile, open('output.txt', 'w') as outfile:
    seen = set()
    for line in infile:
        for word in line.split():
            if word not in seen:
                seen.add(word)
                outfile.write('{}\n'.format(word))
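The answer's code keeps every token, including the leading numeric IDs that the question wanted to leave out. A small variation that also skips purely numeric tokens might look like the sketch below; the function name unique_words is hypothetical, and the third sample line is made up to show a duplicate being dropped.

```python
def unique_words(lines):
    seen = set()
    ordered = []  # keeps first-seen order, which a set alone does not guarantee
    for line in lines:
        for word in line.split():
            if word.isdigit() or word in seen:
                continue  # skip numeric IDs and words already recorded
            seen.add(word)
            ordered.append(word)
    return ordered


sample = [
    "85729 block addressing index approximate text retrieval\n",
    "85730 automatic query expansion based divergence\n",
    "85731 text index\n",  # invented line containing only repeated words
]
print("\n".join(unique_words(sample)))
```

Writing the result to a file is then a single outfile.write("\n".join(...)) inside a with block, with the file names from the question (paper.txt, vocab.txt) substituted as needed.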
