How to find and extract values from a txt file? - python

Write a program that prompts for a file name, then opens that file and reads through the file, looking for lines of the form:
X-DSPAM-Confidence: 0.8475
Count these lines, extract the floating point values from each of the lines, and compute the average of those values and produce an output as shown below. Do not use the sum() function or a variable named sum in your solution.
This is my code:
fname = input("Enter a file name:",)
fh = open(fname)
count = 0
# this variable is to add together all the 0.8745's in every line
num = 0
for ln in fh:
    ln = ln.rstrip()
    count += 1
    if not ln.startswith("X-DSPAM-Confidence: ") : continue
    for num in fh:
        if ln.find(float(0.8475)) == -1:
            num += float(0.8475)
        if not ln.find(float(0.8475)) : break
# problem: values aren't adding together and gq variable ends up being zero
gq = int(num)
jp = int(count)
avr = (gq)/(jp)
print ("Average spam confidence:",float(avr))
The problem is that when I run the code, I get an error because the value of num is zero, so I receive this:
ZeroDivisionError: division by zero
When I change the initial value of num to None a similar problem occurs:
int() argument must be a string or a number, not 'NoneType'
The Python Coursera autograder also rejects this line when I put it at the top of the code:
from __future__ import division
The file name for the sample data they have given us is "mbox-short.txt". Here's a link http://www.py4e.com/code3/mbox-short.txt

I edited your code as shown below. I think your task is to find the number next to X-DSPAM-Confidence:. I used your code to identify the X-DSPAM-Confidence: lines, then split each line on ':', took the element at index 1, and converted it to a float.
fname = input("Enter a file name:",)
fh = open(fname)
count = 0
# this variable accumulates the value from every matching line
num = 0
for ln in fh:
    ln = ln.rstrip()
    if not ln.startswith("X-DSPAM-Confidence:") : continue
    count += 1
    num += float(ln.split(":")[1])
gq = num
jp = count
avr = (gq)/(jp)
print ("Average spam confidence:",float(avr))

Open files using with, so the file is automatically closed.
See the in-line comments.
Desired lines are in the form X-DSPAM-Confidence: 0.6961, so split them on the space.
'X-DSPAM-Confidence: 0.6961'.split(' ') creates a list in which the number is at list index 1.
fname = input("Enter a file name:",)
with open(fname) as fh:
    count = 0
    num = 0  # collect and add each found value
    for ln in fh:
        ln = ln.rstrip()
        if not ln.startswith("X-DSPAM-Confidence:"):  # find this string or continue to next ln
            continue
        num += float(ln.split(' ')[1])  # split on the space and add the float
        count += 1  # increment count for each matching line
avr = num / count  # compute average
print(f"Average spam confidence: {avr}")  # print value
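The same parsing can be wrapped in a function so it can be tried without a file. This is a minimal sketch, not the course's reference solution: the name `average_confidence` is hypothetical, it accepts any iterable of lines (an open file handle or a plain list of strings), and it returns None when no line matches instead of raising ZeroDivisionError.

```python
def average_confidence(lines, prefix="X-DSPAM-Confidence:"):
    """Average the numbers that follow `prefix` on matching lines.

    Hypothetical helper: returns None when no line matches, which avoids
    dividing by a zero count.
    """
    count = 0
    total = 0.0  # running total; deliberately not named `sum`
    for line in lines:
        if not line.startswith(prefix):
            continue
        # Everything after the prefix is the value; float() ignores surrounding whitespace
        total += float(line[len(prefix):])
        count += 1
    if count == 0:
        return None
    return total / count


sample = [
    "From someone@example.com\n",
    "X-DSPAM-Confidence: 0.8475\n",
    "X-DSPAM-Confidence: 0.6178\n",
]
print("Average spam confidence:", average_confidence(sample))
```

With the real file, `with open(fname) as fh: print(average_confidence(fh))` should work the same way, since iterating a file handle yields its lines.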

Related

Bad input warning in python

The question for the code is below.
Write a program that prompts for a file name, then opens that file and reads through the file, looking for lines of the form:
X-DSPAM-Confidence: 0.8475
Count these lines and extract the floating point values from each of the lines and compute the average of those values and produce an output as shown below. Do not use the sum() function or a variable named sum in your solution.
You can download the sample data at http://www.py4e.com/code3/mbox-short.txt; when you are testing below, enter mbox-short.txt as the file name.
fname = input("Enter file name: ")
fh = open(fname,'r')
count=0
val=0
for line in fh:
    if line.find("X-DSPAM-Confidence:")==-1:
        continue
    else:
        count+=1
        pos=line.find(':')
        val+=float(line[pos+1:]
res=float(val/count)
print('Average spam confidence: ',res)
fh.close()
I am getting an error stating "bad input on line 13".
Can anyone help me understand why this is happening? Thank you in advance.
You're missing a closing parenthesis. It should be "val+=float(line[pos+1:])"
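To see why the corrected expression is enough on its own, here is a small sketch (the function name `value_after_colon` is hypothetical): slicing from just after the colon leaves a leading space and a trailing newline, and float() accepts both, so no strip() is needed.

```python
def value_after_colon(line):
    """Return the float that follows the first ':' in `line`.

    Mirrors the corrected expression float(line[pos+1:]).
    """
    pos = line.find(':')
    # line[pos + 1:] is " 0.8475\n"; float() skips the surrounding whitespace
    return float(line[pos + 1:])


print(value_after_colon("X-DSPAM-Confidence: 0.8475\n"))  # 0.8475
```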
The code below works (with the text example you have provided)
import requests

total = 0
cnt = 0
r = requests.get('https://www.py4e.com/code3/mbox-short.txt')
if r.status_code == 200:
    lines = r.text.split('\n')
    length = len('X-DSPAM-Confidence:')
    for line in lines:
        idx = line.find('X-DSPAM-Confidence:')
        if idx != -1:
            cnt += 1
            # slice from the end of the tag, wherever it occurs in the line
            val = float(line[idx + length:])
            total += val
    res = float(total / cnt)
    print('Average spam confidence: ', res)

Having some problem trying to troubleshoot my code on an assignment on python 3

I've been attempting this assignment, but I've run into a few problems that I still can't resolve. First, I can't collect the correct sum of the numbers from the text, so my average is far off. Second, on line 14 it feels strange to have to define my sum as a string before changing it back to a float, although it does not give me a traceback. Lastly, the question says not to use the sum() function, but I'm having trouble avoiding it. If possible, I would like to understand the rationale behind the question forbidding the sum() function.
Some help would be greatly appreciated!
file name: https://www.py4e.com/code3/mbox-short.txt , input should be mbox-short.txt
P.S.: I added the count to the final output just to see how many lines it registered.
Assignment :
Write a program that prompts for a file name, then opens that file and reads through the file, looking for lines of the form:
X-DSPAM-Confidence: 0.8475
Count these lines and extract the floating point values from each of the lines and compute the average of those values and produce an output as shown below. Do not use the sum() function or a variable named sum in your solution.
You can download the sample data at http://www.py4e.com/code3/mbox-short.txt; when you are testing below, enter mbox-short.txt as the file name.
fname =input("Enter file name: ")
fhand = open(fname)
for lx in fhand :
    if not lx.startswith("X-DSPAM-Confidence:") :
        continue
    ly = lx.replace("X-DSPAM-Confidence:"," ")
    ly = ly.strip()
def avg():
    sum = 0
    count = 0
    count = count
    for values in ly :
        count = count + 1
        sum = str(sum) + values
    return print("Average spam confidence:", count, float(sum) / count)
avg()
I have made some changes to your code. Store each float in a list, then iterate over this list and add the values together to find the total.
fname = input("Enter file name: ")
fhand = open(fname)
num_list = []
for lx in fhand:
    if not lx.startswith("X-DSPAM-Confidence:"):
        continue
    ly = lx.replace("X-DSPAM-Confidence:", "")
    num_list.append(float(ly))  # float() ignores the leading space and trailing newline
fhand.close()

def avg():
    total = 0
    count = 0
    for values in num_list:
        count = count + 1
        total += values
    # print directly; "return print(...)" would only return None
    print("Average spam confidence:", count, total / count)

avg()
Output:
Average spam confidence: 27 0.7507185185185187
This worked for me:
summition = 0
fname = input("Enter file name: ")
count = 0
fhand = open(fname)
for lx in fhand:
    if not lx.startswith("X-DSPAM-Confidence:"):
        continue
    ly = lx.replace("X-DSPAM-Confidence:", " ")
    ly = ly.strip()
    summition += float(ly)
    count = count + 1
fhand.close()
print("Average Spam " + str(count) + " " + str(summition / count))
Hints about the original code:
Always close the file handle.
return print(...) returns None, because print() itself returns None.
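The hint about return print() is easy to check. In this small sketch (both helper names are hypothetical), the first function prints the average but hands back None; the second returns the number and leaves printing to the caller, so the value stays usable.

```python
def report_average(total, count):
    # print() writes the text and returns None, so this function returns None too
    return print("Average spam confidence:", total / count)


def compute_average(total, count):
    # Return the number itself; the caller decides whether to print it
    return total / count


result = report_average(1.5, 2)   # prints the average, but result is None
print(result)                     # None
avg = compute_average(1.5, 2)
print("Average spam confidence:", avg)  # Average spam confidence: 0.75
```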

How to prompt the user to input a file into a function along with a min and max?

Rather than having this function rely on parameters passed in when it is called, I need it to prompt the user for a FILE name to read (e.g. they enter "dna.txt"), then prompt for a mink and a maxk, and then go through the file and find the most common substring with a length between mink and maxk. This is my current code:
def mostCommonSubstring(dna, mink, maxk):
    count = 0
    check = 0
    answer = ""
    k = mink
    while k <= maxk:
        for i in range(len(dna)-k+1):
            sub = dna[i:i+k]
            count = 0
            for i in range(len(dna)-k+1):
                if dna[i:i+k] == sub:
                    count = count + 1
            if count >= check:
                answer = sub
                check = count
        k=k+1
    print(answer)
    print(check)
I am under the impression that it needs to look something like this (but this code doesn't work):
def mostCommonSubstring():
    dnaFile = input("Enter file: ")
    dna = open(dnaFile, "r")
    mink = input("Enter a min: ")
    maxk = input("Enter a max: ")
    count = 0
    check = 0
    answer = ""
    k = mink
    while k <= maxk:
        for i in range(len(dna)-k+1):
            sub = dna[i:i+k]
            count = 0
            for i in range(len(dna)-k+1):
                if dna[i:i+k] == sub:
                    count = count + 1
            if count >= check:
                answer = sub
                check = count
        k=k+1
    print(answer)
    print(check)
(The DNA file is a large file that contains many, many a, g, t, and c sequences. I wanted the user to be able to input this file along with a min and max and then have the program find the longest common string.)
I know there is a good chance I'm wrong here, but I'll try to help anyway.
As a beginner, I examined your code, and I think you could use something like this:
with open(dnaFile, 'r') as f:
    dna = f.read()  # the whole file's contents as one string
    # Do whatever you want with dna here...
This lets you work with the file's contents as a string.
If I am not wrong, your problem was that you opened the file but never read it into a string, so you tried to access the file object as if its contents were already a string.
EDIT:
You could also do something like:
dna = open(dnaFile, 'r') # This is from your code.
dnaString = dna.read()
This also reads the file's contents into a string, so the rest of your code can run on it.
Good luck and best regards!
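Reading the file is only part of the fix: input() returns a string, so mink and maxk also have to be converted with int() before they can be compared with k or used in range(). Below is a minimal sketch under two assumptions: the search is the question's own brute-force loop, refactored to return (substring, count) rather than print, and the DNA file holds one sequence possibly split across lines, so all whitespace is removed after reading. The function names are hypothetical.

```python
def most_common_substring(dna, mink, maxk):
    """Return (substring, count) for the most frequent substring whose length
    is between mink and maxk. Same brute-force search as the question's code;
    because of >=, ties go to the last substring found."""
    check = 0
    answer = ""
    for k in range(mink, maxk + 1):
        for i in range(len(dna) - k + 1):
            sub = dna[i:i + k]
            count = 0
            for j in range(len(dna) - k + 1):
                if dna[j:j + k] == sub:
                    count += 1
            if count >= check:
                answer = sub
                check = count
    return answer, check


def prompt_and_search():
    """Prompt for a file name and the two lengths, then run the search."""
    file_name = input("Enter file: ")
    with open(file_name, "r") as f:
        # Assumption: remove newlines/whitespace so lines join into one sequence
        dna = "".join(f.read().split())
    mink = int(input("Enter a min: "))  # input() returns a str; convert to int
    maxk = int(input("Enter a max: "))
    answer, check = most_common_substring(dna, mink, maxk)
    print(answer)
    print(check)
```

Calling prompt_and_search() runs the interactive version; keeping the search in a separate function means it can also be called directly on a string, e.g. most_common_substring("ATATA", 1, 2).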

How to read an input file of integers separated by a space using readlines in Python 3?

I need to read an input file (input.txt) which contains one line of integers (13 34 14 53 56 76) and then compute the sum of the squares of each number.
This is my code:
# define main program function
def main():
    print("\nThis is the last function: sum_of_squares")
    print("Please include the path if the input file is not in the root directory")
    fname = input("Please enter a filename : ")
    sum_of_squares(fname)

def sum_of_squares(fname):
    infile = open(fname, 'r')
    sum2 = 0
    for items in infile.readlines():
        items = int(items)
        sum2 += items**2
    print("The sum of the squares is:", sum2)
    infile.close()

# execute main program function
main()
If each number is on its own line, it works fine.
But I can't figure out how to do it when all the numbers are on one line separated by spaces. In that case, I receive the error: ValueError: invalid literal for int() with base 10: '13 34 14 53 56 76'
You can use file.read() to get a string and then use str.split to split by whitespace.
You'll need to convert each number from a string to an int first and then use the built-in sum function to calculate the sum.
As an aside, you should use the with statement to open and close your file for you:
def sum_of_squares(fname):
    with open(fname, 'r') as myFile:  # This closes the file for you when you are done
        contents = myFile.read()
    sumOfSquares = sum(int(i)**2 for i in contents.split())
    print("The sum of the squares is: ", sumOfSquares)
Output:
The sum of the squares is: 13242
You are trying to turn a string with spaces in it, into an integer.
What you want to do is use the split method (here, items.split(' ')), which returns a list of strings containing the numbers, without any spaces this time. You then iterate through this list and convert each element to an int, as you are already trying to do.
I believe you will find what to do next. :)
Here is a short code example, with more pythonic methods to achieve what you are trying to do.
# The `with` statement is the proper way to open a file.
# It opens the file, and closes it accordingly when you leave it.
with open('foo.txt', 'r') as file:
    # You can directly iterate your lines through the file.
    for line in file:
        # You want a new sum number for each line.
        sum_2 = 0
        # Creating your list of numbers from your string
        # (split() with no argument splits on any whitespace and drops the newline).
        lineNumbers = line.split()
        for number in lineNumbers:
            # Casting EACH number that is still a string to an integer...
            sum_2 += int(number) ** 2
        print('For this line, the sum of the squares is {}.'.format(sum_2))
You could try splitting your items on whitespace using the split() method.
From the doc: For example, ' 1 2 3 '.split() returns ['1', '2', '3'].
def sum_of_squares(fname):
    infile = open(fname, 'r')
    sum2 = 0
    for items in infile.readlines():
        # += so that numbers on later lines are added, not overwritten
        sum2 += sum(int(i)**2 for i in items.split())
    print("The sum of the squares is:", sum2)
    infile.close()
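The difference between split() and split(' ') matters once the input has extra spaces or a trailing newline. This short sketch uses a made-up string to show why the no-argument form is the safer one to feed into int().

```python
text = " 13 34  14\n"

# split() with no argument splits on runs of whitespace and drops empty strings
print(text.split())      # ['13', '34', '14']

# split(' ') splits on every single space, keeping empty strings and the newline
print(text.split(' '))   # ['', '13', '34', '', '14\n']

# int('') raises ValueError, so only the split() result is safe to convert directly
print(sum(int(n) ** 2 for n in text.split()))  # 1521
```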
Just keep it simple; there is no need for anything complicated. Here is a commented, step-by-step solution:
def sum_of_squares(filename):
    # create a summing variable
    sum_squares = 0
    # open file
    with open(filename) as file:
        # loop over each line in file
        for line in file.readlines():
            # create a list of strings split on whitespace
            numbers = line.split()
            # loop over potential numbers
            for number in numbers:
                # check if string is a non-negative integer (isdigit() rejects '-5')
                if number.isdigit():
                    # add square to accumulated sum
                    sum_squares += int(number) ** 2
    # when we reach here, we're done, and exit the function
    return sum_squares

print("The sum of the squares is:", sum_of_squares("numbers.txt"))
Which outputs:
The sum of the squares is: 13242
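Since the question asks specifically about readlines(): it simply returns a list of line strings, so the splitting logic can be written against that list and tested without a file. A minimal sketch; the function name is hypothetical, and it uses a plain loop, although the sum() generator from the earlier answers works equally well.

```python
def sum_of_squares_from_lines(lines):
    """Sum the squares of every whitespace-separated integer in `lines`.

    `lines` is whatever infile.readlines() returned: a list of strings that
    usually end in a newline.
    """
    total = 0
    for line in lines:
        for token in line.split():  # works for one number per line or many per line
            total += int(token) ** 2
    return total


# All numbers on one line, as in the question's input.txt
print(sum_of_squares_from_lines(["13 34 14 53 56 76\n"]))  # 13242

# The same numbers one per line give the same result
print(sum_of_squares_from_lines(["13\n", "34\n", "14\n", "53\n", "56\n", "76\n"]))  # 13242
```

With a real file: `with open(fname) as infile: print(sum_of_squares_from_lines(infile.readlines()))`.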

converting float to sum in python

fname = input("Enter file name: ")
count=0
fh = open(fname)
for line in fh:
    if not line.startswith("X-DSPAM-Confidence:") : continue
    count=count+1
    halo=line.find("0")
    gh=line[halo:]
    tg=gh.rstrip()
    ha=float(tg)
    total=0
    for value in range(ha):
        total=total+value
    print total
The file contains a list of decimal numbers, like this:
0.1235
0.1236
0.1678
I convert each one to a float, but tg holds a single value, not an array or a list:
ha=float(tg)
total=0
for value in range(ha):
    total=total+value
print total
error: start must be an integer
I know it's a mistake of using range what should I use instead of range?
If you want to get a sum of floats, just use this code:
fname = input("Enter file name: ")
count = 0
total = 0
fh = open(fname)
for line in fh:
    if not line.startswith("X-DSPAM-Confidence:"): continue
    count += 1
    halo = line.find("0")
    gh = line[halo:]
    tg = gh.rstrip()
    ha = float(tg)
    total += ha
print(total)
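You are passing a float as the argument to range, which does not make sense: range only accepts integers. With a single argument n, range(n) produces the n integers from 0 to n-1 (in Python 2 it returns a list; in Python 3 it returns a range object, which list() turns into a list). For example:
>>> list(range(3))
[0, 1, 2]
So you can see that range of a float does not make sense.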
If I understand your code correctly, I think you want to replace:
for value in range(ha):
total=total+value
By
total += ha
On a separate note, and trying not to be too pedantic, I am pretty impressed by how many principles of PEP 8 your code violates. You may think it's not a big deal, but if you care, I would suggest you read it (https://www.python.org/dev/peps/pep-0008/)
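A quick check of both points in Python 3: range() rejects a float outright, and the running total only needs the float added to it directly. The three values are the sample numbers from the question.

```python
# range() only accepts integers, so passing a float raises TypeError
try:
    range(0.8475)
except TypeError as exc:
    print("TypeError:", exc)  # 'float' object cannot be interpreted as an integer

# Accumulate the floats themselves; no range() is needed
total = 0.0
for value in [0.1235, 0.1236, 0.1678]:
    total += value
print(total)
```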
