OK, so I'm new to Python and I'm currently taking the Python for Everybody course (PY4E).
Our lesson 7.2 assignment is to do the following:
7.2 Write a program that prompts for a file name, then opens that file and reads through the file, looking for lines of the form:
X-DSPAM-Confidence: 0.8475
Count these lines and extract the floating point values from each of the lines and compute the average of those values and produce an output as shown below. Do not use the sum() function or a variable named sum in your solution.
You can download the sample data at http://www.py4e.com/code3/mbox-short.txt. When you are testing below, enter mbox-short.txt as the file name.
I can't figure it out. I keep getting this error:
ValueError: float: Argument: . is not number on line 12
when I run this code (see screenshot): https://gyazo.com/a61768894299970692155c819509db54
Line 12, which is num = float(balue) + float(num), keeps failing. When I remove the float from balue, I get another error, which says
"TypeError: cannot concatenate 'str' and 'float' objects on line 12".
Can any argument be converted to a float, or only a string? That might be the problem, but I don't know whether it is, and even if it is, I don't know how to fix my code after that.
Your approach was not bad, I would say. However, I don't see what you intended with for balue in linez, as this iterates over the individual characters contained in linez, so float() eventually receives "." and fails. What you wanted instead is float(linez). I came up with a solution along these lines:
fname = raw_input("Enter file name: ")
print(fname)
count = 0
num = 0
with open(fname, "r") as ins:
    for line in ins:
        line = line.rstrip()
        if line.startswith("X-DSPAM-Confidence:"):
            num += float(line[20:])
            count += 1
print(num/count)
This is only intended to get you on the right track. I have NOT verified the answer or the correctness of the script, as that should be part of your homework.
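To see why looping over the string fails, here is a minimal sketch. The variable names are only illustrative, and the sample text is the value from the assignment's example line.

```python
text = "0.8475"  # the part of the line after "X-DSPAM-Confidence: "

# Iterating over a string yields single characters: '0', '.', '8', ...
failed_on = None
for ch in text:
    try:
        float(ch)
    except ValueError:
        failed_on = ch   # float('.') is not a valid number
        break

# Converting the whole substring at once succeeds.
value = float(text)
print(repr(failed_on), value)
```

The loop stops at the first character that float() cannot parse, which is the same "." that the original error message complained about.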
I realise this is answered, but I am doing the same course and I had the same error message on that exact question. It was caused by the slice containing text that float() could not parse, and I had to read it as
number = float(line[19:26])
By the way, the Python environment in the course is very sensitive to spaces in strings - I just got the right code and it rejected it since I had ": " and the correct answer was ":" - just spent half an hour on colons.
Just for the sake of it, here is the answer I reached, which has been accepted as the correct one. Hope you got there in the end.
# Use the file name mbox-short.txt as the file name
count = 0
average = 0
filename = input("Enter file name: ")
filehandle = open(filename)
for line in filehandle:
    if not line.startswith("X-DSPAM-Confidence:") : continue
    line = line.rstrip()
    number = float(line[19:26])
    count = count + 1
    average = average + number
average = (average / count)
average = float(average)
print("Average spam confidence:",average)
# Take input from user
fname = input("Please insert file name:")
# try and except
try:
    fhand = open(fname)
except OSError:
    print("File can not be found", fname)
    quit()
# count and total
count = 0
total = 0
for line in fhand:
    line = line.rstrip()
    if not line.startswith("X-DSPAM-Confidence:"):
        continue
    else:
        line = line[20:]
        uline = line.upper()
        fline = float(uline)
        count = count+1
        total = total + fline
avr = total / count
print("Average spam confidence:", avr)
The short answer to your problem is to fix the string slice that extracts the number.
If you are looking for the colon character ":" to find the floating-point number,
remember that you have to add 1 to that index when slicing. If you do not, the slice starts at the colon and you end up with
: 0.8475
which is not a number.
So slice the string with 1 added to the starting index and you will have a fix.
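A minimal sketch of that slicing fix follows. The helper name extract_confidence is hypothetical, and the sample line is the one from the assignment.

```python
def extract_confidence(line):
    """Return the float that follows the first colon in the line."""
    colon = line.find(":")
    if colon == -1:
        raise ValueError("no colon in line: %r" % line)
    # Start one position past the colon so the ':' itself is excluded.
    # float() ignores the leading space and the trailing newline.
    return float(line[colon + 1:])


sample = "X-DSPAM-Confidence: 0.8475\n"
print(extract_confidence(sample))

# Slicing from the colon itself keeps ':' in the text and fails.
try:
    float(sample[sample.find(":"):])
    colon_slice_ok = True
except ValueError:
    colon_slice_ok = False
print(colon_slice_ok)
```

Computing the index with find() avoids hard-coding 19 or 20, which is where the off-by-one mistakes in this thread come from.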
Related
So I was able to fix the first issue with your help, but now that the program is running without any errors, it's not calculating the average correctly and I'm not sure why. Here's what it looks like:
def calcAverage():
    with open('numbers.dat', 'r') as numbers_file:
        numbers = 0
        amount = 0
        for line in numbers_file:
            amount = amount + float(line)
            numbers += 1
    average = amount / numbers
    print("The average of the numbers in the file is:",average)
You're reassigning line after you check whether line is empty in the while. The while test is testing the previous line in the file.
So when you get to the end, you read a blank line and try to add it to the amount, but get an error.
You're also never adding the first line, since you read it before the loop and never add that to amount.
Use a for loop instead of while, it will stop automatically when it reaches the end.
def calcAverage():
    with open('numbers.dat', 'r') as numbers_file:
        numbers = 0
        amount = 0
        for line in numbers_file:
            amount = amount + float(line)
            numbers += 1
    average = amount / numbers
    print("The average of the numbers in the file is:",average)
If you do want to use a while loop, do it like this:
while True:
    line = numbers_file.readline()
    if not line:
        break
    # rest of loop
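For completeness, here is one way the while-loop version could look as a whole. This is a sketch only: the function name average_with_while is hypothetical, and io.StringIO stands in for the real numbers.dat so the example runs on its own.

```python
import io


def average_with_while(numbers_file):
    """Average the numbers in a file-like object, one number per line."""
    total = 0.0
    count = 0
    while True:
        line = numbers_file.readline()
        if not line:           # readline() returns '' only at end of file
            break
        line = line.strip()
        if not line:           # a blank line inside the file: skip it
            continue
        total += float(line)
        count += 1
    return total / count if count else None


sample = io.StringIO("1.5\n2.5\n\n3.5\n")
print(average_with_while(sample))
```

Reading at the top of the loop and testing immediately afterwards means every line is converted exactly once, including the first one.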
The error shows that line contains an empty string.
You can get the same error with float('').
Your code runs in the wrong order: you read a new line before converting the previous one.
You should also strip the line, because it still ends with \n.
You need
line = numbers_file.readline()
line = line.strip() # remove `\n` (and `\t` and spaces)
while line != '':
    # convert current line
    amount = amount + float(line)
    numbers += 1
    # read next line
    line = numbers_file.readline()
    line = line.strip() # remove `\n` (and `\t` and spaces)
You could also use a for loop for this
numbers = []
for line in numbers_file:
    line = line.strip() # remove `\n` (and `\t` and spaces)
    if line:
        numbers.append( float(line) )
    #else:
    #    break
average = sum(numbers) / len(numbers)
and this could be reduced to
numbers = [float(line) for line in numbers_file if line.strip() != '']
average = sum(numbers) / len(numbers)
Other answers illustrated how to read a file line-by-line.
Some of them dealt with issues like reaching end-of-file (EOF) and invalid input for conversion to float.
Since any I/O or UI can be expected to deliver invalid input, I want to stress validating the input before processing it.
Validation
For each line read, whether from a console or a file, you should consider validating it.
What happens if the string is empty?
(See: Converting an empty string input to float)
What happens if the string contains special characters? (as Mat commented)
What happens if the number does not fit your expected float? (value range, decimal precision)
What happens if nothing was read or end-of-file was reached? (empty file or EOF)
Advice: To make your code robust, expect input errors and catch them.
(a) using a try-except construct for error handling in Python:
for line in numbers_file:
    try:
        amount = amount + float(line)
        numbers += 1
    except ValueError:
        print("Read line was not a float, but: '{}'".format(line))
(b) testing the input ahead of time:
More simply, you could also test manually using basic if statements like:
if line.strip() == "": # on empty line received (a line read from a file still ends with \n)
    print("WARNING: read an empty line! This won't be used for calculating average.") # show problem and consequences
    continue # stop here and continue with next for-iteration (jump to next line)
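The two approaches can be combined. The sketch below is one possible arrangement, not a definitive implementation: the function name average_valid_lines is hypothetical, and rejecting "inf" and "nan" with math.isfinite is my own assumption about what "does not fit your expected float" should mean.

```python
import math


def average_valid_lines(lines):
    """Return (average, skipped_lines) for an iterable of text lines."""
    total = 0.0
    count = 0
    skipped = []
    for line in lines:
        text = line.strip()
        if text == "":                 # (b) test ahead: empty line
            skipped.append(line)
            continue
        try:                           # (a) catch conversion errors
            value = float(text)
        except ValueError:
            skipped.append(line)
            continue
        if not math.isfinite(value):   # assumption: reject inf and nan
            skipped.append(line)
            continue
        total += value
        count += 1
    # An empty file (or only invalid lines) gives no average at all.
    average = total / count if count else None
    return average, skipped


average, skipped = average_valid_lines(["1.0\n", "\n", "abc\n", "3.0\n", "inf\n"])
print(average, len(skipped))
```

Returning the skipped lines instead of printing them leaves it to the caller to decide whether bad input is a warning or an error.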
I am a beginner in Python, and I have a problem reading a numeric data file that contains several lines. Each row in the input file contains a counter, three float numbers, and finally a letter, all separated by spaces. It looks like this:
1 12344567.143 12345678.154 1234.123 w
2 23456789.231 23413456.342 4321.321 f
I want to assign each item in the line to a specific variable that I can use in later steps,
like this: "NO" = first item, "X" = second item, "Y" = third item, "code" = fourth item.
I am trying to write it as follows:
f1=open('t1.txt','r')
line: float
for line in f1:
print(line.split(', ',4))
f1=float
select(line.split('')
nob: object(1)=int(line[1, 1])
cnt = 0
print(nob)
cnt +=1
But I get more errors each time I run the program. Can anyone help me?
The error is probably due to the wrong indentation: in Python indentation is part of the syntax. It would be helpful if you also included the error message in your question.
How about this:
all_first_numbers = []
with open('t1.txt', 'r') as f:
    for line in f:
        values = line.split()
        first_number = int(values[0])
        second_number = float(values[1])
        letter_code = values[4]
        # If you want to save all the first numbers in one array:
        all_first_numbers.append(first_number)
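If you want every column available by name, one option is a namedtuple per row. This is a sketch under assumptions: Row and parse_row are hypothetical names, and since the question names only NO, X, Y and code for five columns, the field name z for the third float is my guess.

```python
from collections import namedtuple

# Field names follow the question; "z" is an assumed name for the third float.
Row = namedtuple("Row", ["no", "x", "y", "z", "code"])


def parse_row(line):
    """Split one whitespace-separated line into typed fields."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError("expected 5 fields, got %d: %r" % (len(parts), line))
    return Row(int(parts[0]), float(parts[1]), float(parts[2]),
               float(parts[3]), parts[4])


sample = """1 12344567.143 12345678.154 1234.123 w
2 23456789.231 23413456.342 4321.321 f
"""
rows = [parse_row(line) for line in sample.splitlines() if line.strip()]
print(rows[0].no, rows[0].x, rows[1].code)
```

With a real file, the same list comprehension works over the open file object instead of sample.splitlines().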
So I have a text file which contains a list of numbers which I want to create a running total of. I need the first number to add to the second number and then the third number to add to the newly created second value and so on...
Like this:
Old List
0.1
0.25
0.35
0.2
0.3
New List
0.35
0.7
0.9
1.2
Here is what I have so far
import itertools
from itertools import zip_longest
open('newfile.txt','w').writelines([ line for line in open("Test1.txt") if "WIDTH" in line])
open('newfile2.txt','w').writelines([ line for line in open("Test1.txt") if "DEPTH" in line])
with open('compiled.txt', 'w') as res, open("newfile.txt") as f1, open("newfile2.txt") as f2:
    for line1, line2 in zip_longest(f1, f2, fillvalue=""):
        res.write("{} : {}\n".format(line1.rstrip(), line2.rstrip()))
for line in open("compiled.txt"):
    line = line.strip(', \n')
    parts = line.split(":")
    category = parts[0]
    value = parts[1]
    category2 = parts[2]
    value2 = parts[3]
    total = sum([int(num) for num in value])
    print (total)
However it gives me this error:
total = sum([int(num) for num in value])
ValueError: invalid literal for int() with base 10: ' '
What am I doing wrong here? I am new to python so any help would be greatly appreciated.
The answer is right in the error message:
total = sum([int(num) for num in value])
ValueError: invalid literal for int() with base 10: ' '
num has a value of ' ', and you're trying to convert it to an integer with int(). Obviously int(' ') is problematic, so Python throws an error.
What this means is that you have a bug in your stripping and splitting. Your code suggests that the input file format is a little more complex than what you said. If you post the actual input file (with the colons and whatnot) I'd be happy to help debug that step.
There are a number of things wrong, so it's hard to understand what you want and what's going on.
First, I suggest that instead of complicated file operations, you just paste the first 5 lines or so of your 'compiled.txt' in here as
numbers_string = """ 1 2
3 whatever
"""
previous_val = 0
for line in numbers_string.splitlines():
    #whatever, I assume you know what you're doing
    line = line.strip(', \n')
    parts = line.split(":")
    category = parts[0]
    value = parts[1]
    category2 = parts[2]
    value2 = parts[3]
    # Calc sum of previous total and current value:
    total = previous_val + int(value)#don't you mean float?
    previous_val= total
    print( total )
Something like this?
The rest of the code you posted is a bit convoluted and it's hard to figure out what you're trying to do, especially since we don't have the full data you're working with. But the answer to your specific question about just adding up the numbers from one file and writing them to another one is as follows:
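f = open("old.txt", "r")
# split() with no argument splits on any whitespace and drops the empty
# string that a trailing newline would otherwise leave at the end
lines = f.read().split()
f.close()
total = float(lines[0])
f = open("new.txt", "w")
for line in lines[1:]:
    total += float(line)
    f.write(str(total)+"\n")
f.close()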
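The standard library can also build the running total directly. Here is a sketch using itertools.accumulate on the values from the question, kept in a list rather than read from a file so it runs on its own. Note one assumption: accumulate keeps the first value (0.1) as the first total, so the first element is dropped to match the "New List" in the question.

```python
from itertools import accumulate

old_values = [0.1, 0.25, 0.35, 0.2, 0.3]

# accumulate yields 0.1, 0.1+0.25, 0.1+0.25+0.35, ...
running = list(accumulate(old_values))

# The question's expected list starts at the second total, so skip the first.
new_values = running[1:]

# Round for display only; the stored floats may carry tiny binary errors.
print([round(v, 2) for v in new_values])
```

If the first value should appear in the output as well, use running as it is.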
Input: a .txt file
I am working with a file that contains a lot of data, and I have to get the numbers that follow a specified phrase and then calculate the average of those numbers.
# Use the file name mbox-short.txt as the file name
count = 0
fname = raw_input("Enter file name: ")
fh = open(fname)
for lines in fh:
    if lines.startswith ("X-DSPAM-Confidence:"):
        lines = float (lines [20:50])
        count = count +1
        print lines
print count
What I get from this is:
0.8475
0.6178
0.6961
0.7565
0.7626
0.7556
0.7002
0.7615
0.7601
0.7605
0.6959
0.7606
0.7559
0.7605
0.6932
0.7558
0.6526
0.6948
0.6528
0.7002
0.7554
0.6956
0.6959
0.7556
0.9846
0.8509
0.9907
The loop finds the lines that start with the text "X-DSPAM-Confidence:"
and slices each one from 20:50 (to the end of the line).
That gives me two things: the list of numbers I need and the count, which will help later. Now I need to sum the numbers to calculate the average: the sum / count.
How can I do that? I'm looking for the simplest way possible; it's not a problem if the code gets long.
I just improved the code and removed the unwanted parts, sorry about that.
The print statements are not important; they are just there so I can see what I am doing, as I am new to Python.
Using your own code, just keep track of the total and divide at the end:
count = 0
total = 0
fname = raw_input("Enter file name: ")
fh = open(fname)
for lines in fh:
    if lines.startswith ("X-DSPAM-Confidence:"):
        count += 1
        total += float (lines [20:50])
        print lines
print count
print(total/count)
If you need to store all the data, a list comprehension is the best way to collect the floats; then sum them and divide by the length to get the average:
fname = raw_input("Enter file name: ")
with open(fname) as f:
    all_data = [float(line[20:50]) for line in f if line.startswith ("X-DSPAM-Confidence:")]
avg = sum(all_data) / len(all_data)
print(all_data)
print(avg)
It would help if we saw a sample of your data, but you should be able to do this:
sum_lines = sum(lines)
avg_lines = sum_lines / count
sum() is a built in function which will sum an iterable.
I am also wondering why you are reassigning your value to lines when you do
lines = float (lines [20:50])
I would think if those are multiple comma separated floating point numbers, you would want to assign it to a list variable like float_list and then sum using the sum() function.
If you do not want to save the average, you could put a third print that says
print sum(float_list) / count
Updated to Reflect OP Update
Yes, you definitely want to create a list. Instead of lines = float (lines [20:50]), do this:
float_list = []
float_list.append(float(lines[20:50]))
A better way would be to use a list comprehension.
float_list = [float(lines[20:50]) for lines in fh if lines.startswith("X-DSPAM-Confidence:")]
Update...
I think that I misunderstood your original use of the slice [20:50] as representing multiple numbers per line.
If it is only one number, then it would be this, which is basically the answer that Padraic Cunningham posted:
# Use the file name mbox-short.txt as the file name
fname = raw_input("Enter file name: ")
fh = open(fname)
float_list = [float(lines[20:50]) for lines in fh if lines.startswith("X-DSPAM-Confidence:")]
list_sum = sum(float_list)
count = len(float_list)
list_avg = list_sum / count
For future reference, it is helpful to post an example of your input data along with your code and desired output in your original question.
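To make the difference between assigning and appending concrete, here is a small self-contained sketch. The function name confidence_values and the sample lines are illustrative assumptions; only the two confidence values come from the output shown in the question.

```python
def confidence_values(lines, prefix="X-DSPAM-Confidence:"):
    """Collect the float after the prefix from every matching line."""
    values = []
    for line in lines:
        if line.startswith(prefix):
            # append() adds to the list; plain assignment would replace it.
            values.append(float(line[len(prefix):]))
    return values


sample_lines = [
    "From: someone@example.com\n",
    "X-DSPAM-Confidence: 0.8475\n",
    "X-DSPAM-Probability: 0.0000\n",
    "X-DSPAM-Confidence: 0.6178\n",
]
values = confidence_values(sample_lines)
average = sum(values) / len(values) if values else None
print(values, average)
```

Slicing with len(prefix) instead of a hard-coded 20 keeps the code correct if the prefix ever changes.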
It depends mostly on the formatting of your numbers: are they CSV, is each on its own line, and so on. In any case, a general solution:
As some of the comments have pointed out, your first loop will exhaust your iterator, so I combined the two loops. The if/else is redundant, as if not will work fine.
count = 0
total = 0
fname = raw_input("Enter file name: ")
fh = open(fname,'r')
lines = fh.readlines()
fh.close()
for line in lines:
    if not line.startswith ("X-DSPAM-Confidence:"):
        continue
    total += float(line[20:])
    count = count +1
avg = total/count
print avg
So previously I wrote code that prompts the user to enter 5 different test scores and then saves that list to test.txt. The code works wonderfully!
This is my code:
scorefile=open('test.txt','w')
for count in range(5):
    print('Please enter test scores')
    score=input('Test score:')
    scorefile.write(str(score)+'%' + '\n')
scorefile.close()
BUT now I've encountered a problem. I have the code to read the file, and it works great! But when I try to get the average from the list, all I keep getting is 0.0. I've been reading through my book about Python to figure out how to make this work, but I'm seriously stuck now. Help?
Here is my code:
scorefile=open('test.txt', 'r')
for line in scorefile:
    print(line, end='')
score=scorefile
average = sum(score) / 5
print('The test average is', average)
scorefile.close()
This line, score=scorefile doesn't do what you think it does. In fact, it doesn't do anything useful at all.
Perhaps you want:
with open('test.txt') as scorefile:
    scores = [int(line.replace('%','')) for line in scorefile]
average = sum(scores) / len(scores)
score=scorefile just assigns the file object to score. It does not actually read the contents and assign them to the score variable as you expected.
You need to read the lines in the file, strip the newline and the '%' character, convert each line to a float (since they are percentages, I am assuming), sum them up, and take the average.
Like this:
with open('input') as in_file:
    data = in_file.readlines()
average = sum([float(a.strip().rstrip('%')) for a in data]) / len(data)
print(average)
[float(a.strip().rstrip('%')) for a in data] is shorthand notation (called a list comprehension) for:
a_list = []
for a in data:
    a_list.append(float(a.strip().rstrip('%')))
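One detail is easy to miss: each line read from the file ends with a newline, so the '%' is not the last character and strip('%') on its own leaves it in place. The sketch below removes the whitespace first and then the '%'. The function name average_scores is hypothetical, and io.StringIO stands in for test.txt so the example is self-contained.

```python
import io


def average_scores(score_file):
    """Average lines such as '90%' read from a file-like object."""
    scores = []
    for line in score_file:
        # Remove the newline first, then the trailing percent sign.
        text = line.strip().rstrip("%")
        if text:                      # ignore blank lines
            scores.append(float(text))
    return sum(scores) / len(scores) if scores else None


sample = io.StringIO("90%\n80%\n70%\n100%\n85%\n")
print(average_scores(sample))
```

Dividing by len(scores) rather than a fixed 5 keeps the result correct if the file ever holds a different number of scores.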