How to count number of occurred exceptions and print it? - python

I want to open multiple files and count the words in each one, but I also want to know how many of the files couldn't be opened.
Here is what I tried:
i = 0
def word_count(file_name):
    try:
        with open(file_name) as f:
            content = f.read()
    except FileNotFoundError:
        pass
        i = 0
        i += 1
    else:
        words = content.split()
        word_count = len(words)
        print(f'file {file_name} has {word_count} words.')
file_name = ['data1.txt','a.txt','data2w.txt','b.txt','data3w.txt','data4w.txt']
for names in file_name:
    word_count(names)
print(len(file_name) - i , 'files weren\'t found')
print (i)
So, I get this error:
runfile('D:/~/my')
file data1.txt has 13 words.
file data2w.txt has 24 words.
file data3w.txt has 21 words.
file data4w.txt has 108 words.
Traceback (most recent call last):
File "D:\~\my\readtrydeffunc.py", line 27, in <module>
print(len(file_name) - i , 'files weren\'t found')
NameError: name 'i' is not defined
I tried something else as well, but I think I don't understand scopes well. I think it's because i is assigned outside the except scope, but when I assign i = 0 inside the except scope, I can't print it at the end, because it is destroyed after execution.

Yes, you're on the right track. You need to define and increment i outside the function, or pass the value through the function, increment, and return the new value. Defining i outside the function is more common, and more Pythonic.
def count_words(file_name):
    with open(file_name) as f:
        content = f.read()
    words = content.split()
    word_count = len(words)
    #print(f'file {file_name} has {word_count} words.')
    return word_count
file_name = ['data1.txt','a.txt','data2w.txt','b.txt','data3w.txt','data4w.txt']
i = 0
for names in file_name:
    try:
        result = count_words(names)
    except FileNotFoundError:
        i += 1
print(i, 'files weren\'t found')
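The answer above mentions a second option without showing it: pass the count into the function, increment it there, and return the new value. A minimal sketch of that variant follows. The names count_words_or_miss and count_all, and the tuple return shape, are assumptions chosen for illustration and do not come from the original answer.

```python
def count_words_or_miss(file_name, missing):
    """Return (word_count, missing); word_count is None if the file is absent.

    `missing` is the running total of files that could not be opened.
    Hypothetical helper, not part of the original answer.
    """
    try:
        with open(file_name) as f:
            content = f.read()
    except FileNotFoundError:
        # hand back the incremented counter instead of touching a global
        return None, missing + 1
    return len(content.split()), missing


def count_all(file_names):
    """Count words per file and return (counts, number_of_missing_files)."""
    missing = 0
    counts = {}
    for name in file_names:
        words, missing = count_words_or_miss(name, missing)
        if words is not None:
            counts[name] = words
    return counts, missing
```

Calling count_all(file_name) would return the per-file counts together with the number of missing files, so no variable has to live outside the functions.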

I would recommend breaking this into two functions: one to handle the word counting and a second to control the flow of the script. The controlling function should handle any errors that arise, as well as the feedback from those errors.
def word_count(file_name):
    with open(file_name) as f:
        content = f.read()
    words = content.split()
    word_count = len(words)
    print(f'file {file_name} has {word_count} words.')
def file_parser(files):
    i = 0
    for file in files:
        try:
            word_count(file)
        except FileNotFoundError:
            i+=1
    if i > 0:
        print(f'{i} files were not found')
file_names = ['data1.txt','a.txt','data2w.txt','b.txt','data3w.txt','data4w.txt']
file_parser(file_names)

While refactoring your code to not use global variables should be the preferred approach (see edit for a possible refactoring), the minimal modification to get your code running is to remove pass and i = 0 within the except clause, and ask i to be used globally inside your function:
def word_count(file_name):
    global i  # use a `i` variable defined globally
    try:
        with open(file_name) as f:
            content = f.read()
    except FileNotFoundError:
        i += 1  # increment `i` when the file is not found
    else:
        words = content.split()
        word_count = len(words)
        print(f'file {file_name} has {word_count} words.')
i = 0
file_name = ['data1.txt','a.txt','data2w.txt','b.txt','data3w.txt','data4w.txt']
for names in file_name:
    word_count(names)
print(i, 'files weren\'t found')
Note that i will contain the number of files not found.
EDIT
A reasonably refactored code could look something like:
def word_count(filepath):
    result = 0
    with open(filepath) as file_obj:
        for line in file_obj:
            result += len(line.split())
    return result
def process_files(filepaths):
    result = {}
    num_missing = 0
    for filepath in filepaths:
        try:
            num_words = word_count(filepath)
        except FileNotFoundError:
            num_missing += 1
        else:
            result[filepath] = num_words
    return result, num_missing
filenames = [
    'data1.txt', 'a.txt', 'data2w.txt', 'b.txt', 'data3w.txt', 'data4w.txt']
wordcounts, num_missing = process_files(filenames)
for filepath, num_words in wordcounts.items():
    print(f'File {filepath} has {num_words} words.')
print(f'{num_missing} files weren\'t found')
Notes:
the word_count() function now does only one thing: word counting. This is done line by line to better handle potentially long files, which could fill the memory if loaded all at once.
the process_files() function extracts the essential information and stores it in a dict.
all the printing of the results is done in one place, and could easily be wrapped up in a main() function.
num_missing (roughly the former i) is now a local variable.
Finally, note that explicitly counting the exceptions is only one way; the other is to obtain this information by subtracting the number of elements in result from the number of input filepaths.
This could be done anywhere; there is no need to do it in process_files().
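The subtraction mentioned in the last note can be sketched as follows. The helper name count_missing and the sample data are assumptions made for illustration. The approach also assumes the input list has no duplicate paths, since a dict would collapse them.

```python
def count_missing(filepaths, wordcounts):
    """Number of input paths that produced no word count.

    Assumes `wordcounts` holds one entry per file that was read and that
    `filepaths` contains no duplicates (a dict would collapse them).
    """
    return len(filepaths) - len(wordcounts)


# illustrative data, not real output
filepaths = ['data1.txt', 'a.txt', 'data2w.txt']
wordcounts = {'data1.txt': 13, 'data2w.txt': 24}
print(count_missing(filepaths, wordcounts), "files weren't found")
```

With this, process_files() would only need to return the dict, at the cost of losing the distinction between different kinds of failure.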

Related

Python: Counting words from a directory of txt files and writing word counts to a separate txt file

I'm new to Python and I'm trying to count the words in a directory of text files and write the output to a separate text file. However, I want to specify conditions: if the word count is > 0, I would like to write the count and file path to one file, and if the count is == 0, I would like to write the count and file path to a separate file. Below is my code so far. I think I'm close, but I'm hung up on how to do the conditions and the separate files. Thanks.
import sys
import os
from collections import Counter
import glob
stdoutOrigin=sys.stdout
sys.stdout = open("log.txt", "w")
def count_words_in_dir(dirpath, words, action=None):
    for filepath in glob.iglob(os.path.join("path", '*.txt')):
        with open(filepath) as f:
            data = f.read()
            for key,val in words.items():
                #print("key is " + key + "\n")
                ct = data.count(key)
                words[key] = ct
            if action:
                action(filepath, words)
def print_summary(filepath, words):
    for key,val in sorted(words.items()):
        print(filepath)
        if val > 0:
            print('{0}:\t{1}'.format(
                key,
                val))
filepath = sys.argv[1]
keys = ["x", "y"]
words = dict.fromkeys(keys,0)
count_words_in_dir(filepath, words, action=print_summary)
sys.stdout.close()
sys.stdout=stdoutOrigin
I would strongly urge you to not repurpose stdout for writing data to a file as part of the normal course of your program. I also wonder how you can ever have a word "count < 0". I assume you meant "count == 0".
The main problem that your code has is in this line:
for filepath in glob.iglob(os.path.join("path", '*.txt')):
The string constant "path" I'm pretty sure doesn't belong there. I think you want filepath there instead. I would think that this problem would prevent your code from working at all.
Here's a version of your code where I fixed these issues and added the logic to write to two different output files based on the count:
import sys
import os
import glob
out1 = open("/tmp/so/seen.txt", "w")
out2 = open("/tmp/so/missing.txt", "w")
def count_words_in_dir(dirpath, words, action=None):
    for filepath in glob.iglob(os.path.join(dirpath, '*.txt')):
        with open(filepath) as f:
            data = f.read()
            for key, val in words.items():
                # print("key is " + key + "\n")
                ct = data.count(key)
                words[key] = ct
            if action:
                action(filepath, words)
def print_summary(filepath, words):
    for key, val in sorted(words.items()):
        whichout = out1 if val > 0 else out2
        print(filepath, file=whichout)
        print('{0}: {1}'.format(key, val), file=whichout)
filepath = sys.argv[1]
keys = ["country", "friend", "turnip"]
words = dict.fromkeys(keys, 0)
count_words_in_dir(filepath, words, action=print_summary)
out1.close()
out2.close()
Result:
file seen.txt:
/Users/steve/tmp/so/dir/data2.txt
friend: 1
/Users/steve/tmp/so/dir/data.txt
country: 2
/Users/steve/tmp/so/dir/data.txt
friend: 1
file missing.txt:
/Users/steve/tmp/so/dir/data2.txt
country: 0
/Users/steve/tmp/so/dir/data2.txt
turnip: 0
/Users/steve/tmp/so/dir/data.txt
turnip: 0
(excuse me for using some search words that were a bit more interesting than yours)
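The advice above against redirecting sys.stdout can be taken one step further by opening both report files in a with statement, so they are closed even if reading fails. This is only a sketch: the function name write_keyword_counts, its four parameters, and the tab-separated line layout are assumptions, not part of the answer.

```python
import glob
import os


def write_keyword_counts(dirpath, keywords, seen_path, missing_path):
    """Write "path<TAB>keyword<TAB>count" lines to one of two report files.

    Counts greater than zero go to `seen_path`, zero counts to `missing_path`.
    All four parameter names are assumptions made for this sketch. Keep the
    report files outside `dirpath` (or give them another extension), or they
    will be picked up by the *.txt pattern themselves.
    """
    with open(seen_path, "w") as seen, open(missing_path, "w") as missing:
        for filepath in sorted(glob.glob(os.path.join(dirpath, "*.txt"))):
            with open(filepath) as f:
                data = f.read()
            for key in keywords:
                count = data.count(key)  # substring count, as in the code above
                target = seen if count > 0 else missing
                print(filepath, key, count, sep="\t", file=target)
```

Passing the output paths as arguments, rather than opening module-level files, keeps the function usable from other scripts.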
Hello, I hope I understood your question correctly. This code counts how many different words are in each file and, depending on that number, writes the result to one of two files.
import os
all_words = {}
def count(file_path):
    with open(file_path, "r") as f:
        # for better performance it is a good idea to go line by line through file
        for line in f:
            # singles out all the words, by splitting string around spaces
            words = line.split(" ")
            # and checks if word already exists in all_words dictionary...
            for word in words:
                try:
                    # ...if it does increment number of repetitions
                    all_words[word.replace(",", "").replace(".", "").lower()] += 1
                except Exception:
                    # ...if it doesn't create it and give it number of repetitions 1
                    all_words[word.replace(",", "").replace(".", "").lower()] = 1
if __name__ == '__main__':
    # for every text file in your current directory count how many words it has
    for file in os.listdir("."):
        if file.endswith(".txt"):
            all_words = {}
            count(file)
            n = len(all_words)
            # depending on the number of words do something
            if n > 0:
                with open("count1.txt", "a") as f:
                    f.write(file + "\n" + str(n) + "\n")
            else:
                with open("count2.txt", "a") as f:
                    f.write(file + "\n" + str(n) + "\n")
If you want to count the same word multiple times, you can add up all the values from the dictionary, or you can eliminate the try-except block and count every word there.
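To illustrate the difference between distinct words and total words, here is a small sketch built on collections.Counter. The helper name word_stats is an assumption. It also splits on any whitespace with split(), whereas the answer splits on single spaces, so results can differ slightly for lines with trailing newlines or repeated spaces.

```python
from collections import Counter


def word_stats(text):
    """Return (distinct_words, total_words) for a string.

    Mirrors the normalisation in the answer above: commas and periods are
    stripped and words are lower-cased. Hypothetical helper for illustration.
    """
    cleaned = (w.replace(",", "").replace(".", "").lower() for w in text.split())
    counts = Counter(w for w in cleaned if w)
    # len(counts) is the number of different words,
    # sum(counts.values()) counts every occurrence
    return len(counts), sum(counts.values())


print(word_stats("The cat saw the other cat."))
```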

How to use better 'try' and 'except' in my code?

I have this code and I need to make better use of try and except and improve the code, for example by finding out which parts I should move.
This is the beginning of my code:
contador = 0
name = input("Put the name of the file:")
while name != "close":
    validation=0
    try:
        file = open(name,"r",1,"utf-8")
        validation = validation + 1
    except FileNotFoundError:
        validation = validation
    if validation >= 1:
        Games=[]
        countrylist = []
        lines = 0
        File = open(name,"r")
        line = File.readline().strip()
        while line != "":
            parts= line.split(";")
            country=parts[0]
            game= parts[1]
            sales= int(parts[2])
            price= float(parts[3])
            format= parts[4]
            Games.append(parts)
            countrylist.append(country)
            line = File.readline().strip()
            lines = lines + 1
            contador = contador + 1
I don't know exactly what the code is meant to do.
I had to work out from the code how the file is structured. Correct me if I'm wrong, but I believe the file is meant to hold a list of parameters separated by ";", with each line being one entry in that list.
You do nothing with the data. In any case, breaking the file into a list of parameters and returning that list of lists would be enough for one function, and you could do the separation later.
To check that the code was doing what I wanted, I added a print at the end to show the result.
This is the code I ended up with. I tried to explain most of the issues in comments (probably a bad idea, and I shall be berated for it till the end of ages):
# Why is there a global counter
# contador = 0
name = None  # you need to declare the name before the loop
# check if the name is empty instead of an arbitrary name
while name != "":
    name = input("Put the name of the file:")
    # have the definition of the name inside the loop so you can run the
    # loop until the name is "" (nothing)
    # otherwise, if you don't break in the except block, it will loop forever
    # since the name will be constant inside the loop
    try:
        File = open(file=name,encoding="utf-8").read()
        # keyword arguments let you skip the positional arguments you don't need
        Games=[]
        countrylist = []
        # lines = 0
        lst = File.strip().split("\n")  # break the whole text into lines
        for line in lst:  # iterate over the list of lines
            # separate it into a list of data
            parts= line.strip().split(";")  # make each line into a list that you can address
            # parts[0] -> country
            countrylist.append(parts[0])  # here you can just append directly instead of saving extra variables
            # same as the previous example
            Games.append(parts[1])
            sales= int(parts[2])
            price= float(parts[3].replace(",","."))
            style = parts[4]  # format is already an existing function, you shouldn't name your variable like that
            # line = File.readline().strip() -> you don't need to prepare the next line since all lines are
            # already in the list lst
            # lines += 1
            # contador += 1
            # you don't need to count the lines, let the language do that for you
            # and why do you need a counter in the first place
            # you were using no for loops or doing any logic based around the number of lines
            # the only logic you were doing is based on their
            print(parts)
    except FileNotFoundError as e0:
        print("File not found: " + str(e0))
    except ValueError as e1 :
        print("Value Error: " + str(e1))
For a text file with the format:
Portugal;Soccer;1000;12.5;dd/mm/yyyy
England;Cricket;2000;13,5;mm/dd/yyyy
Spain;Ruggby;1500;11;yyyy/dd/mm
I got an output in the form of:
['Portugal', 'Soccer', '1000', '12.5', 'dd/mm/yyyy']
['England', 'Cricket', '2000', '13,5', 'mm/dd/yyyy']
['Spain', 'Ruggby', '1500', '11', 'yyyy/dd/mm']
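As a possible alternative to splitting each line by hand, the standard csv module can read ";"-separated files directly. The sketch below assumes the five-column layout inferred above. The function name read_sales and the dictionary keys are assumptions taken from the variable names in the question.

```python
import csv


def read_sales(path):
    """Parse 'country;game;sales;price;format' lines into a list of dicts.

    The field names are assumptions taken from the variable names in the
    question. Raises FileNotFoundError or ValueError; the caller decides how
    to report them.
    """
    rows = []
    with open(path, encoding="utf-8", newline="") as f:
        for parts in csv.reader(f, delimiter=";"):
            if not parts:  # skip blank lines
                continue
            # a line with the wrong number of fields raises ValueError here
            country, game, sales, price, style = parts
            rows.append({
                "country": country,
                "game": game,
                "sales": int(sales),
                # accept both 13.5 and 13,5 as in the sample file
                "price": float(price.replace(",", ".")),
                "format": style,
            })
    return rows
```

Keeping the try/except around the call to read_sales, rather than inside it, leaves the prompting loop as the single place that reports errors to the user.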

How do I perform error handling with two files?

So, I have two files, and to check their validity I am performing try and except twice. But I don't think this is a good method. Can you suggest a better way?
Here is my code:
def form_density_dictionary(self,word_file,fp_exclude):
    self.freq_dictionary={}
    try:
        with open(fp_exclude,'r')as fp2:
            words_excluded=fp2.read().split() #words to be excluded stored in a list
            print("**Read file successfully :" + fp_exclude + "**")
            words_excluded=[words.lower() for words in words_excluded] # converted to lowercase
    except IOError:
        print("**Could not read file:", fp_exclude, " :Please check file name**")
        sys.exit()
    try:
        with open(word_file,'r') as file:
            print("**Read file successfully :" + word_file + "**")
            words_list=file.read()
            if not words_list:
                print("**No data in file:",word_file +":**")
                sys.exit()
            words_list=words_list.split()
            words_list=[words.lower() for words in words_list] # lowercasing entire list
            unique_words=list((set(words_list)-set(words_excluded)))
            self.freq_dictionary= {word:("%6.2f"%(float((words_list.count(word))/len(words_list))*100)) for word in unique_words}
            #print((len(self.freq_dictionary)))
    except IOError:
        print("**Could not read file:", word_file, " :Please check file name**")
        sys.exit()
Any other suggestion is also welcomed to make it more pythonic.
The first thing that jumps out is the lack of consistency and readability: in some lines you indent with 4 spaces, on others you only use two; in some places you put a space after a comma, in others you don't, in most places you don't have spaces around the assignment operator (=)...
Be consistent and make your code readable. The most commonly used formatting is to use four spaces for indenting and to always have a space after a comma but even more important than that is to be consistent, meaning that whatever you choose, stick with it throughout your code. It makes it much easier to read for everyone, including yourself.
Here are a few other things I think you could improve:
Have a single exception handling block instead of two.
You can also open both files in a single line.
Even better, combine both previous suggestions and have a separate method to read data from the files, thus eliminating code repetition and making the main method easier to read.
For string formatting it's preferred to use .format() instead of %. Check this out: https://pyformat.info/
Overall try to avoid repetition in your code. If there's something you're doing more than once, extract it to a separate function or method and use that instead.
Here's your code quickly modified to how I'd probably write it, and taking these things into account:
import sys
class AtifImam:
    def __init__(self):
        self.freq_dictionary = {}
    def form_density_dictionary(self, word_file, exclude_file):
        words_excluded = self.read_words_list(exclude_file)
        words_excluded = self.lowercase(words_excluded)
        words_list = self.read_words_list(word_file)
        if len(words_list) == 0:
            print("** No data in file: {} **".format(word_file))
            sys.exit()
        words_list = self.lowercase(words_list)
        unique_words = list((set(words_list) - set(words_excluded)))
        self.freq_dictionary = {
            word: ("{:6.2f}".format(
                float((words_list.count(word)) / len(words_list)) * 100))
            for word in unique_words
        }
    @staticmethod
    def read_words_list(file_name):
        try:
            with open(file_name, 'r') as file:
                data = file.read()
                print("** Read file successfully: {} **".format(file_name))
                return data.split()
        except IOError as e:
            print("** Could not read file: {0.filename} **".format(e))
            sys.exit()
    @staticmethod
    def lowercase(word_list):
        return [word.lower() for word in word_list]
Exceptions raised by operations that involve a file system path have a filename attribute, which can be used instead of the explicit variables word_file and fp_exclude that you use.
This means you can wrap these IO operations in the same try-except and use exception_instance.filename, which indicates the file on which the operation couldn't be performed.
For example:
try:
    with open('unknown_file1.py') as f1, open('known_file.py') as f2:
        f1.read()
        f2.read()
except IOError as e:
    print("No such file: {0.filename}".format(e))
Eventually prints out:
No such file: unknown_file1.py
While the opposite:
try:
    with open('known_file.py') as f1, open('unknown_file2.py') as f2:
        f1.read()
        f2.read()
except IOError as e:
    print("No such file: {0.filename}".format(e))
Prints out:
No such file: unknown_file2.py
To be more 'pythonic' you could use Counter from the collections library.
from collections import Counter
def form_density_dictionary(self, word_file, fp_exclude):
    success_msg = '*Read file successfully : {filename}'
    fail_msg = '**Could not read file: {filename}: Please check filename'
    empty_file_msg = '*No data in file :{filename}:**'
    exclude_read = self._open_file(fp_exclude, success_msg, fail_msg, '')
    exclude = Counter([word.lower() for word in exclude_read.split()])
    word_file_read = self._open_file(word_file, success_msg, fail_msg, empty_file_msg)
    words = Counter([word.lower() for word in word_file_read.split()])
    # Counter subtraction subtracts counts: a word is dropped only when its
    # count falls to zero or below
    unique_words = words - exclude
    self.freq_dictionary = {word: '{:.2f}'.format(count / len(unique_words))
                            for word, count in unique_words.items()}
It would also be better to create the _open_file method, like:
# requires `import sys` at module level
def _open_file(self, filename, success_msg, fail_msg, empty_file_msg):
    try:
        with open(filename, 'r') as file:
            if success_msg:
                print(success_msg.format(filename=filename))
            data = file.read()
            # report an empty file only when there really is no data
            if not data and empty_file_msg:
                print(empty_file_msg.format(filename=filename))
            return data
    except IOError:
        if fail_msg:
            print(fail_msg.format(filename=filename))
        sys.exit()
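Because subtracting one Counter from another subtracts counts rather than removing the excluded words outright, a variant that filters by membership may be closer to what the question computes. The following is a sketch under that assumption. The function name density and its parameters are hypothetical, and the percentage is taken relative to the full word list, as in the question's words_list.count(word) / len(words_list).

```python
from collections import Counter


def density(words, excluded):
    """Map each non-excluded word to its share of all words, in percent.

    `words` and `excluded` are iterables of strings; both names are
    assumptions for this sketch. Shares are relative to the full word list,
    matching the question's `words_list.count(word) / len(words_list)`.
    """
    words = [w.lower() for w in words]
    skip = {w.lower() for w in excluded}
    counts = Counter(words)  # one pass instead of list.count() per word
    total = len(words)
    return {
        word: "{:6.2f}".format(count / total * 100)
        for word, count in counts.items()
        if word not in skip
    }
```

An empty word list simply yields an empty dictionary here, so the caller can decide whether that case deserves a message or an exit.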

Calculating the average in python

I am writing a program that prompts for a file name, then opens that file and reads through it, looking for lines of the form:
X-DSPAM-Confidence: 0.8475
I want to count these lines, extract the floating point value from each of them, and compute the average of those values. Can I please get some help? I just started programming, so I need something very simple. This is the code I have already written:
fname = raw_input("Enter file name: ")
if len(fname) == 0:
    fname = 'mbox-short.txt'
fh = open(fname,'r')
count = 0
total = 0
#Average = total/num of lines
for line in fh:
    if not line.startswith("X-DSPAM-Confidence:"): continue
    count = count+1
    print line
Try adding this line inside the loop, after the count is incremented:
    total += float(line.split(' ')[1])
so that total / count gives you the answer.
Iterate over the file (using the context manager ("with") handles the closing automatically), looking for such lines (like you did), and then read them in like this:
fname = raw_input("Enter file name:")
if not fname:
    fname = "mbox-short.txt"
scores = []
with open(fname) as f:
    for line in f:
        if not line.startswith("X-DSPAM-Confidence:"):
            continue
        _, score = line.split()
        scores.append(float(score))
print sum(scores)/len(scores)
Or a bit more compact:
mean = lambda x: sum(x)/len(x)
with open(fname) as f:
    # the filter clause must come after the `for` in a list comprehension
    result = mean([float(l.split()[1]) for l in f if l.startswith("X-DSPAM-Confidence:")])
A program like the following should satisfy your needs. If you need to change what the program is looking for, just change the PATTERN variable to describe what you are trying to match. The code is written for Python 3.x but can be adapted for Python 2.x without much difficulty if needed.
Program:
#! /usr/bin/env python3
import re
import statistics
import sys
PATTERN = r'X-DSPAM-Confidence:\s*(?P<float>[+-]?\d*\.\d+)'
def main(argv):
    """Calculate the average X-DSPAM-Confidence from a file."""
    filename = argv[1] if len(argv) > 1 else input('Filename: ')
    if filename in {'', 'default'}:
        filename = 'mbox-short.txt'
    print('Average:', statistics.mean(get_numbers(filename)))
    return 0
def get_numbers(filename):
    """Extract all X-DSPAM-Confidence values from the named file."""
    with open(filename) as file:
        for line in file:
            for match in re.finditer(PATTERN, line, re.IGNORECASE):
                yield float(match.groupdict()['float'])
if __name__ == '__main__':
    sys.exit(main(sys.argv))
You may also implement the get_numbers generator in the following way if desired.
Alternative:
def get_numbers(filename):
    """Extract all X-DSPAM-Confidence values from the named file."""
    with open(filename) as file:
        yield from (float(match.groupdict()['float'])
                    for line in file
                    for match in re.finditer(PATTERN, line, re.IGNORECASE))
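One case the programs above do not cover is a file with no matching lines: statistics.mean raises StatisticsError on empty input, and sum(scores)/len(scores) would divide by zero. A small sketch of one way to handle it, where the wrapper name safe_mean and the choice of returning None are assumptions:

```python
import statistics


def safe_mean(values):
    """Return the mean of `values`, or None when there are none.

    statistics.mean raises StatisticsError on empty input; this hypothetical
    wrapper turns that into a None result.
    """
    try:
        return statistics.mean(values)
    except statistics.StatisticsError:
        return None
```

It accepts any iterable, so it could be called as safe_mean(get_numbers(filename)) and the caller could print a message when the result is None.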

inputting a words.txt file python 3

I am stuck on why the words.txt file is not showing the full grid. Below are the tasks I must carry out:
Write code to prompt the user for a filename, and attempt to open the file whose name is supplied. If the file cannot be opened, the user should be asked to supply another filename; this should continue until a file has been successfully opened.
The file will contain on each line a row from the words grid. Write code to read, in turn, each line of the file, remove the newline character and append the resulting string to a list of strings. After the input is complete, the grid should be displayed on the screen.
Below is the code I have written so far. Any help would be appreciated:
file = input("Enter a filename: ")
try:
    a = open(file)
    with open(file) as a:
        x = [line.strip() for line in a]
    print (a)
except IOError as e:
    print ("File Does Not Exist")
Note: avoid variable names like file and list, since they shadow built-in names (list in every Python version, file in Python 2).
while True:
    filename = input(' filename: ')
    try:
        lines = [line.strip() for line in open(filename)]
        print(lines)
        break
    except IOError as e:
        print('No file found')
        continue
The implementation below should work:
# loop
while(True):
    # don't use the name 'file'; it was a built-in type in Python 2
    the_file = input("Enter a filename: ")
    try:
        with open(the_file) as a:
            x = [line.strip() for line in a]
        # I think you meant to print x, not a
        print(x)
        break
    except IOError as e:
        print("File Does Not Exist")
You need a while loop:
while True:
    file = input("Enter a filename: ")
    try:
        with open(file) as a:
            x = [line.strip() for line in a]
        print(x)  # print the list of lines, not the file object
        break
    except IOError:
        pass
This will keep asking until a valid file is provided.
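The retry loop from the answers above can also be wrapped in a function that returns the grid, which makes it easier to reuse and to try out without typing at a prompt. This is a sketch only: the name read_grid and the ask parameter are assumptions introduced here. It strips just the newline, as the task asks, rather than all surrounding whitespace.

```python
def read_grid(ask=input):
    """Keep asking for a filename until one opens; return its lines.

    `ask` is any callable that takes a prompt and returns a string. It is a
    parameter only so the loop can be exercised without a keyboard; that
    design is an assumption of this sketch, not part of the task.
    """
    while True:
        filename = ask("Enter a filename: ")
        try:
            with open(filename) as f:
                # remove only the trailing newline from each row
                return [line.rstrip("\n") for line in f]
        except OSError:
            print("Could not open", repr(filename))
```

The grid could then be displayed with a loop such as: for row in read_grid(): print(row).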
