The question for the code is attached below.
Write a program that prompts for a file name, then opens that file and reads through the file, looking for lines of the form:
X-DSPAM-Confidence: 0.8475
Count these lines and extract the floating point values from each of the lines and compute the average of those values and produce an output as shown below. Do not use the sum() function or a variable named sum in your solution.
You can download the sample data at http://www.py4e.com/code3/mbox-short.txt. When you are testing, enter mbox-short.txt as the file name.
fname = input("Enter file name: ")
fh = open(fname,'r')
count=0
val=0
for line in fh:
    if line.find("X-DSPAM-Confidence:")==-1:
        continue
    else:
        count+=1
        pos=line.find(':')
        val+=float(line[pos+1:]
res=float(val/count)
print('Average spam confidence: ',res)
fh.close()
I am getting an error stating "bad input on line 13".
Can anyone help me understand why this is happening? Thank you in advance.
You're missing a closing parenthesis. It should be "val+=float(line[pos+1:])".
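To see how an unclosed parenthesis surfaces, here is a small sketch that compiles a snippet with and without the closing parenthesis. The helper name check_syntax and the sample source are made up for illustration. The exact message and the reported line number differ between Python versions, and the error can be reported on a later line than the one that is actually missing the parenthesis.

```python
# Hypothetical three-line program; line 2 is missing its closing parenthesis.
source = (
    "val = 0\n"
    "val += float('0.5'\n"
    "print(val)\n"
)

def check_syntax(code):
    """Return None if the code compiles, else a short description of the error."""
    try:
        compile(code, "<example>", "exec")
    except SyntaxError as err:
        return f"SyntaxError near line {err.lineno}: {err.msg}"
    return None

# Broken version: a SyntaxError is reported.
print(check_syntax(source))

# Fixed version: adding the ')' makes the snippet compile, so None is printed.
fixed = source.replace("float('0.5'", "float('0.5')")
print(check_syntax(fixed))
```

If the reported line looks fine, it is worth checking the line just before it for an unbalanced bracket.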
The code below works (with the text example you have provided):
import requests
total = 0
cnt = 0
r = requests.get('https://www.py4e.com/code3/mbox-short.txt')
if r.status_code == 200:
    lines = r.text.split('\n')
    length = len('X-DSPAM-Confidence:')
    for line in lines:
        idx = line.find('X-DSPAM-Confidence:')
        if idx != -1:
            cnt += 1
            val = float(line[length + 1:])
            total += val
    res = float(total / cnt)
    print('Average spam confidence: ', res)
Related
I have started my code and am off to a very good start. However, I have hit a roadblock when it comes to adding sum, average, minimum, and maximum to my code. I'm sure this is a pretty easy fix for someone who knows what they are doing. Any help would be greatly appreciated. The numbers in my file are 14, 22, and -99.
Here is my code so far:
def main ():
    contents=''
    try:
        infile = openFile()
        count, sum = readFile(infile)
        closeFile(infile)
        display(count, sum)
    except IOError:
        print('Error, input file not opened properly')
    except ValueError:
        print('Error, data within the file is corrupt')
def openFile():
    infile=open('numbers.txt', 'r')
    return infile
def readFile(inf):
    count = 0
    sum = 0
    line = inf.readline()
    while line != '':
        number = int(line)
        sum += number
        count += 1
        line = inf.readline()
    return count, sum
def closeFile(inF):
    inF.close()
def display(count, total):
    print('count = ', count)
    print('Sum = ', total)
main()
readline() returns a whole line at a time, so the while line != '' loop calls int() on each complete line. If all the numbers are on a single comma-separated line, int(line) fails with a ValueError. Instead, you can split the line with .split() and loop over the pieces. Your code could look like this (assuming that all numbers are on a single line):
def read_file():
    f=open("numbers.txt","r")
    line=f.readline()
    l=[int(g) for g in line.split(",")] # int() ignores whitespace around each number
    s=sum(l)
    avg=sum(l)/len(l)
    maximum=max(l)
    minimum=min(l)
    f.close()
    return s, avg, maximum, minimum
print(read_file())
Your code contains a number of antipatterns: you apparently tried to structure it OO-like but without using a class... But this:
line = inf.readline()
while line != '':
    number = int(line)
    sum += number
    count += 1
    line = inf.readline()
is the worst part and probably the culprit.
Idiomatic Python seldom uses readline; it just iterates over the file object. Good practice is also to strip input lines so that trailing whitespace is ignored:
for line in inf:
    if line.strip() == '':
        break
    number = int(line)
    sum += number
    count += 1
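Putting the advice above together, here is a minimal sketch of how the count, total, average, minimum and maximum could be collected in one pass. The function name summarize and the use of io.StringIO as a stand-in for numbers.txt are assumptions made for illustration, and it skips blank lines rather than stopping at the first one.

```python
import io

def summarize(lines):
    """Return (count, total, average, minimum, maximum) for one integer per line."""
    count = 0
    total = 0          # named 'total' so the built-in sum() is not shadowed
    minimum = None
    maximum = None
    for line in lines:
        text = line.strip()
        if text == '':
            continue   # skip blank lines instead of stopping
        number = int(text)
        total += number
        count += 1
        if minimum is None or number < minimum:
            minimum = number
        if maximum is None or number > maximum:
            maximum = number
    average = total / count if count else None
    return count, total, average, minimum, maximum

# Stand-in for open('numbers.txt'); the values come from the question.
sample = io.StringIO("14\n22\n-99\n")
print(summarize(sample))  # (3, -63, -21.0, -99, 22)
```

Returning the values instead of printing them inside the function keeps display() free to format them however it likes.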
Write a program that prompts for a file name, then opens that file and reads through the file, looking for lines of the form:
X-DSPAM-Confidence: 0.8475
Count these lines, extract the floating point values from each of the lines, and compute the average of those values and produce an output as shown below. Do not use the sum() function or a variable named sum in your solution.
This is my code:
fname = input("Enter a file name:",)
fh = open(fname)
count = 0
# this variable is to add together all the 0.8745's in every line
num = 0
for ln in fh:
    ln = ln.rstrip()
    count += 1
    if not ln.startswith("X-DSPAM-Confidence: ") : continue
    for num in fh:
        if ln.find(float(0.8475)) == -1:
            num += float(0.8475)
        if not ln.find(float(0.8475)) : break
    # problem: values aren't adding together and gq variable ends up being zero
    gq = int(num)
    jp = int(count)
    avr = (gq)/(jp)
    print ("Average spam confidence:",float(avr))
The problem is when I run the code it says there is an error because the value of num is zero. So I then receive this:
ZeroDivisionError: division by zero
When I change the initial value of num to None a similar problem occurs:
int() argument must be a string or a number, not 'NoneType'
This is also not accepted by the python COURSERA autograder when I put it at the top of the code:
from __future__ import division
The file name for the sample data they have given us is "mbox-short.txt". Here's a link http://www.py4e.com/code3/mbox-short.txt
I edited your code as shown below. I think your task is to find the numbers next to X-DSPAM-Confidence:. I used your code to identify the X-DSPAM-Confidence: lines, then split each string on ':', took the element at index 1, and converted it to a float.
fname = input("Enter a file name:",)
fh = open(fname)
count = 0
# this variable is to add together all the 0.8745's in every line
num = 0
for ln in fh:
    ln = ln.rstrip()
    if not ln.startswith("X-DSPAM-Confidence:") : continue
    count+=1
    num += float(ln.split(":")[1])
gq = num
jp = count
avr = (gq)/(jp)
print ("Average spam confidence:",float(avr))
Open files using with, so the file is automatically closed.
See the in-line comments.
Desired lines are in the form X-DSPAM-Confidence: 0.6961, so split them on the space.
'X-DSPAM-Confidence: 0.6961'.split(' ') creates a list with the number at index 1.
fname = input("Enter a file name:",)
with open(fname) as fh:
    count = 0
    num = 0  # collect and add each found value
    for ln in fh:
        ln = ln.rstrip()
        if not ln.startswith("X-DSPAM-Confidence:"):  # find this string or continue to next ln
            continue
        num += float(ln.split(' ')[1])  # split on the space and add the float
        count += 1  # increment count for each matching line
    avr = num / count  # compute average
print(f"Average spam confidence: {avr}")  # print value
I've been attempting this assignment but I've encountered a few problems which I am still unable to resolve. Firstly, I am unable to collect the correct sum of numbers from the text, so my average value is very off. Secondly, for line 14 it feels quite strange to have to define my sum as a string before changing it back to a float, although it does not give me a Traceback. Lastly, the question states not to use the sum() function, but I'm having trouble not using it. If possible, I would like to understand the rationale behind the question restricting us from using the sum() function.
Some help would be greatly appreciated!
file name: https://www.py4e.com/code3/mbox-short.txt , input should be mbox-short.txt
P.S.: I added the count to the final output just to see how many lines it registered.
Assignment :
Write a program that prompts for a file name, then opens that file and reads through the file, looking for lines of the form:
X-DSPAM-Confidence: 0.8475
Count these lines and extract the floating point values from each of the lines and compute the average of those values and produce an output as shown below. Do not use the sum() function or a variable named sum in your solution.
You can download the sample data at http://www.py4e.com/code3/mbox-short.txt. When you are testing, enter mbox-short.txt as the file name.
fname =input("Enter file name: ")
fhand = open(fname)
for lx in fhand :
    if not lx.startswith("X-DSPAM-Confidence:") :
        continue
    ly = lx.replace("X-DSPAM-Confidence:"," ")
    ly = ly.strip()
def avg():
    sum = 0
    count = 0
    count = count
    for values in ly :
        count = count + 1
        sum = str(sum) + values
    return print("Average spam confidence:", count, float(sum) / count)
avg()
I have made some changes to your code. Store each float in a list, then iterate over that list when you do the addition to find the total.
fname =input("Enter file name: ")
fhand = open(fname)
num_list = []
for lx in fhand :
    if not lx.startswith("X-DSPAM-Confidence:") :
        continue
    ly = lx.replace("X-DSPAM-Confidence:","")
    num_list.append(float(ly))
def avg():
    total = 0
    count = 0
    for values in num_list:
        count = count + 1
        total += values
    return print("Average spam confidence:", count, total / count)
avg()
Output:
Average spam confidence: 27 0.7507185185185187
This worked for me:
summition = 0
fname =input("Enter file name: ")
count = 0
fhand = open(fname)
for lx in fhand :
    if not lx.startswith("X-DSPAM-Confidence:") :
        continue
    ly = lx.replace("X-DSPAM-Confidence:"," ")
    ly = ly.strip()
    summition += float(ly)
    count = count + 1
fhand.close()
print("Average Spam " + str(count)+ " " + str(summition/count))
Hints on the problems in the original code:
Always close the file handle.
return print(...) returns None, because print() itself returns None.
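As a small illustration of the last hint, the sketch below compares a function that returns the result of print() with one that returns the computed value. The function names and the sample values are made up for this example.

```python
def avg_printing(values):
    total = 0
    for v in values:
        total += v
    # print() writes to the screen and returns None, so this returns None too
    return print("Average:", total / len(values))

def avg_returning(values):
    total = 0
    for v in values:
        total += v
    # return the number itself and let the caller decide how to show it
    return total / len(values)

result_a = avg_printing([1.0, 2.0, 3.0])
result_b = avg_returning([1.0, 2.0, 3.0])
print(result_a, result_b)  # None 2.0
```

The first version still displays the average, but the caller cannot use the value afterwards.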
The main issue is that I cannot identify what is causing the code to produce this value. It is supposed to read the values in the text file and then calculate the average confidence of the values, but I've received repeated errors: the one here and another which states 'could not convert string into float'. If I have it tell me which line, it will be the first one.
I'm using Repl.it to run Python 3. I've tried doing this on my computer and I get similar results; however, the error is very hard to read there, so I've moved it to Repl.it to see it better.
# Asks usr input
usrin = input("Enter in file name: ")
# establishes variables
count = 0
try:
    fmbox = open(usrin, 'r')
    rd = fmbox.readlines()
    # loops through each line and reads the file
    for line in rd:
        # line that is being read
        fmLen = len(rd)
        srchD = rd.find("X-DSPAM-Confidence: ")
        fmNum = rd[srchD + 1:fmLen] # extracts numeric val
        fltNum = float(fmNum.strip().replace(' ', ''))
        #only increments if there is a value
        if (fltNum > 0.0):
            count += 1
            total = fltNum + count
    avg = total / count
    print("The average confiedence is: ", avg)
    print("lines w pattern ", count)
The return should be the average of the numbers stripped from the file and the count of how many had values above 0.
If you need to view the text file, here it is: http://www.pythonlearn.com/code3/mbox.txt
There are several problems with your code:
You're using string methods like find() and strip() on the list rd instead of parsing the individual line.
find() returns the lowest index of the substring if there's a match (since "X-DSPAM-Confidence: " seems to occur at the beginning of the line in the text file, it will return index 0); otherwise it returns -1. However, you're not checking the return value (so you're always assuming there's a match), and rd[srchD + 1:fmLen] should be line[srchD + len("X-DSPAM-Confidence: "):fmLen-1], since you want to extract everything after the substring up to the end of the line.
total is not initialized before the loop, although it might be defined somewhere else in your code.
With total = fltNum + count, you're replacing the total at each iteration with fltNum + count. You should be adding fltNum to the total every time a match is found.
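The points above about checking the return value of find() and slicing from the end of the prefix can be sketched on a few in-memory lines. The helper name extract_confidence and the "Subject: hello" line are made up for illustration; the two confidence values are the ones quoted on this page.

```python
PREFIX = "X-DSPAM-Confidence: "

def extract_confidence(line):
    """Return the float that follows PREFIX, or None if the line has no match."""
    idx = line.find(PREFIX)          # -1 means the prefix is not in this line
    if idx == -1:
        return None
    # slice from the end of the prefix; strip() drops the trailing newline
    return float(line[idx + len(PREFIX):].strip())

lines = [
    "X-DSPAM-Confidence: 0.8475\n",
    "Subject: hello\n",              # hypothetical non-matching line
    "X-DSPAM-Confidence: 0.6961\n",
]

count = 0
total = 0.0                          # initialized before the loop
for line in lines:
    value = extract_confidence(line)
    if value is not None:
        count += 1
        total += value               # accumulate instead of overwriting

print("lines w pattern", count)
print("The average confidence is:", total / count)
```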
Working implementation:
try:
    fmbox = open('mbox.txt', 'r')
    rd = fmbox.readlines()
    # loops through each line and reads the file
    count = 0
    total = 0.0
    for line in rd:
        # line that is being read
        fmLen = len(line)
        srchD = line.find("X-DSPAM-Confidence: ")
        # only parse the confidence value if there is a match
        if srchD != -1:
            fmNum = line[srchD + len("X-DSPAM-Confidence: "):fmLen-1] # extracts numeric val
            fltNum = float(fmNum.strip().replace(' ', ''))
            # only increment if the value is non-zero
            if fltNum > 0.0:
                count += 1
                total += fltNum
    avg = total / count
    print("The average confidence is: ", avg)
    print("lines w pattern ", count)
except Exception as e:
    print(e)
Output:
The average confidence is: 0.8941280467445736
lines w pattern 1797
Demo: https://repl.it/#glhr/55679157
I need to change this function so that, rather than relying on parameters passed in by the caller, it prompts the user for a FILE name to read (e.g. they enter "dna.txt"), then prompts for a mink and a maxk, and then goes through the file to find the most common substring within the given mink and maxk. This is my current code:
def mostCommonSubstring(dna, mink, maxk):
    count = 0
    check = 0
    answer = ""
    k = mink
    while k <= maxk:
        for i in range(len(dna)-k+1):
            sub = dna[i:i+k]
            count = 0
            for i in range(len(dna)-k+1):
                if dna[i:i+k] == sub:
                    count = count + 1
            if count >= check:
                answer = sub
                check = count
        k=k+1
    print(answer)
    print(check)
I am under the impression that it needs to look something like this (but this code doesn't work):
def mostCommonSubstring():
    dnaFile = input("Enter file: ")
    dna = open(dnaFile, "r")
    mink = input("Enter a min: ")
    maxk = input("Enter a max: ")
    count = 0
    check = 0
    answer = ""
    k = mink
    while k <= maxk:
        for i in range(len(dna)-k+1):
            sub = dna[i:i+k]
            count = 0
            for i in range(len(dna)-k+1):
                if dna[i:i+k] == sub:
                    count = count + 1
            if count >= check:
                answer = sub
                check = count
        k=k+1
    print(answer)
    print(check)
(The DNA file is a large file that contains many, many a, g, t, and c sequences. I wanted to be able to have the user input this file along with a min and max and then have the program find the longest common string.)
I know I have a high chance of being wrong here but I'll try to help anyway.
As a beginner I examined your code, and I think you could use something like this:
with open(dnaFile, 'r') as f:
    dna = f.read()
    # Do whatever you want with the dna string here...
This reads the file's contents into a string.
If I am not wrong, your problem was that you did open the file, but you never actually read it into a string. Thus you tried to access the file object as if its contents were already a string.
EDIT:
You could also do something like:
dna = open(dnaFile, 'r') # This is from your code.
dnaString = dna.read()
This way you would also read the file's content into a string, and the rest of your code can work on that string.
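For reference, here is a minimal sketch of how the pieces could fit together, keeping the search logic from the question. The names most_common_substring and prompt_and_run are made up for this example. It also converts the two prompts with int(), because input() returns strings, and it removes all whitespace from the file contents on the assumption that line breaks are not part of the sequence.

```python
def most_common_substring(dna, mink, maxk):
    """Return (substring, count) using the same search as the question's code."""
    check = 0
    answer = ""
    for k in range(mink, maxk + 1):
        for i in range(len(dna) - k + 1):
            sub = dna[i:i + k]
            count = 0
            for j in range(len(dna) - k + 1):   # separate index for the inner loop
                if dna[j:j + k] == sub:
                    count += 1
            if count >= check:
                answer = sub
                check = count
    return answer, check

def prompt_and_run():
    """Ask for the file name and the two bounds, then print the result."""
    dna_file = input("Enter file: ")
    with open(dna_file, "r") as f:
        dna = "".join(f.read().split())         # read into a string, drop whitespace
    mink = int(input("Enter a min: "))          # input() returns str, so convert
    maxk = int(input("Enter a max: "))
    answer, check = most_common_substring(dna, mink, maxk)
    print(answer)
    print(check)

# Non-interactive check on a short made-up sequence:
print(most_common_substring("atatat", 2, 2))    # ('at', 3)
```

Calling prompt_and_run() runs the interactive version. Keeping the search in its own function means it can still be tested without typing anything in.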
Good luck and best regards!