So I was able to fix the first issue with your help, but now that the program runs without any errors, it's not calculating the average correctly and I'm not sure why. Here's what it looks like:
def calcAverage():
    with open('numbers.dat', 'r') as numbers_file:
        numbers = 0
        amount = 0
        for line in numbers_file:
            amount = amount + float(line)
            numbers += 1
    average = amount / numbers
    print("The average of the numbers in the file is:", average)
You're reassigning line after you check whether line is empty in the while. The while test is testing the previous line in the file.
So when you get to the end, you read a blank line and try to add it to the amount, but get an error.
You're also never adding the first line, since you read it before the loop and never add that to amount.
Use a for loop instead of while; it stops automatically when it reaches the end of the file.
def calcAverage():
    with open('numbers.dat', 'r') as numbers_file:
        numbers = 0
        amount = 0
        for line in numbers_file:
            amount = amount + float(line)
            numbers += 1
    average = amount / numbers
    print("The average of the numbers in the file is:", average)
If you do want to use a while loop, do it like this:
while True:
    line = numbers_file.readline()
    if not line:
        break
    # rest of loop
The error shows that line is an empty string.
You can get the same error with float('').
Your code runs in the wrong order: you read a new line before converting the previous one.
You should also strip the line, because it still ends with \n.
You need:
line = numbers_file.readline()
line = line.strip() # remove `\n` (and `\t` and spaces)
while line != '':
    # convert current line
    amount = amount + float(line)
    numbers += 1
    # read next line
    line = numbers_file.readline()
    line = line.strip() # remove `\n` (and `\t` and spaces)
You could also use a for loop for this:
numbers = []
for line in numbers_file:
    line = line.strip() # remove `\n` (and `\t` and spaces)
    if line:
        numbers.append(float(line))
    #else:
    #    break
average = sum(numbers) / len(numbers)
and this could be reduced to:
numbers = [float(line) for line in numbers_file if line.strip() != '']
average = sum(numbers) / len(numbers)
Other answers illustrated how to read a file line by line.
Some of them dealt with issues like reaching end-of-file (EOF) and input that cannot be converted to float.
Since any I/O or UI can be expected to deliver invalid input, I want to stress validating the input before processing it.
Validation
For each line read from a console or a file, you should consider validating it.
What happens if the string is empty? (See: converting an empty string input to float.)
What happens if the string contains special characters (as Mat commented)?
What happens if the number does not fit your expected float (value range, decimal precision)?
What happens if nothing was read or end-of-file was reached (empty file or EOF)?
Advice: To make it robust, expect input errors and catch them.
(a) using the try-except construct for error handling in Python:
for line in numbers_file:
    try:
        amount = amount + float(line)
        numbers += 1
    except ValueError:
        print("Read line was not a float, but: '{}'".format(line))
(b) testing the input ahead of time:
More simply, you could also test manually inside the loop using basic if statements like:
    if line == "": # on empty string received
        print("WARNING: read an empty line! This won't be used for calculating average.") # show problem and consequences
        continue # stop here and continue with next for-iteration (jump to next line)
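The validation questions above can be combined into one routine. The following is only a minimal sketch, not the code from any of the answers: the function name average_of_lines and the choice to return None for "no usable numbers" are assumptions made for illustration. It accepts any iterable of lines, so an open file object can be passed directly.

```python
def average_of_lines(lines):
    """Return the mean of the numeric lines, or None if there are none.

    `lines` is any iterable of strings, e.g. an open text file.
    The name and the None return value are illustrative assumptions.
    """
    total = 0.0
    count = 0
    for line in lines:
        text = line.strip()  # drop the trailing newline and surrounding spaces
        if not text:
            continue  # blank line: nothing to convert
        try:
            total += float(text)
        except ValueError:
            # The line is not a number; report it and keep going.
            print("Skipping non-numeric line: {!r}".format(text))
            continue
        count += 1
    if count == 0:
        return None  # empty file or no valid numbers: avoid dividing by zero
    return total / count
```

With a file, this would be called as `with open('numbers.dat') as f: print(average_of_lines(f))`. Returning None rather than raising is a design choice; raising an exception for an empty file would be equally reasonable.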
I'm in a first-year CompSci class and learning Python, so bear with me here. The assignment is to open a file using Python, read it, and find the maximum value in that file without using any built-in functions or concepts we haven't discussed in class. I can read through the file and get the values, but my issue is that my code treats the value "30" as "3" and "0" instead of thirty. This is what I have so far:
def maxValueInFile(fileName):
    inputFile = open(fileName, "r")
    currentMax = int(0)
    for value in inputFile.read():
        if value.isdigit():
            value = int(value)
            if value > currentMax:
                currentMax = value
    inputFile.close()
    return currentMax
When I run the file, it won't return a number higher than 9, presumably becaus
If you want to read digit by digit, you can build up the real numbers by continuously multiplying the temporary result by 10 and then adding the value of the last digit read.
def maxValueInFile(fileName):
    inputFile = open(fileName, "r")
    currentMax = int(0)
    number = 0
    for value in inputFile.read():
        if value.isdigit(): # again found a digit
            number = number * 10 + int(value)
        else: # found a non-digit
            if number > currentMax:
                currentMax = number
            number = 0
    if number > currentMax: # in case the for-loop ended with a digit, the last number still needs to be tested
        currentMax = number
    inputFile.close()
    return currentMax
You're trying to do too much in this piece of your code (which I'll comment so you can see where it's going wrong):
# inputFile.read() returns something like "30\n40\n50"
for value in inputFile.read():
    # now value is something like "3" or "0"
    if value.isdigit():
        value = int(value)
        # and now it's 3 or 0
There's no benefit to splitting the string up into digits -- so don't do that. :)
currentMax = 0
for line in inputFile.readlines():
    value = int(line)
    if value > currentMax:
        currentMax = value
Note that this code will raise a ValueError exception if line isn't convertible to an int!
A couple of style notes (which I've applied to the code above):
You don't need to say int(0) because 0 is already an int all on its own.
When you're converting types, it's better to assign a new variable to the new type (when you start using static type checking, this is required, so it's a good habit to get into). In the code above I called the line of text we read from the file line to help me remember that it's a line of text (i.e. a str), and then I name the numeric value value to help me remember that that's the actual number that I can use in a comparison (i.e. an int).
inputFile.read() returns a string, which gets broken into characters when you iterate over it. Assuming your input file has each value on a separate line, you want inputFile.read().splitlines().
However, here's how I'd write it, with notes:
def maxValueInFile(fileName):
    currentMax = 0
    with open(fileName) as inputFile: # Avoid manually opening and closing files
        for line in inputFile: # Avoid reading the whole file
            # If we can assume the file only contains numbers,
            # we can skip the ".isdigit" check.
            value = int(line) # "int()" doesn't care about whitespace (newline)
            if value > currentMax:
                currentMax = value
    return currentMax
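One thing the versions above share is that currentMax starts at 0, so a file containing only negative numbers would report 0. The sketch below shows one way around that, under the assumption that each line holds one integer; the name max_value_in_lines and the None return value for "no numbers" are illustrative choices, not part of the original answers. It takes any iterable of lines, so an open file can be passed in.

```python
def max_value_in_lines(lines):
    """Return the largest integer found, one per line, or None if there is none.

    `lines` is any iterable of strings, e.g. an open text file.
    """
    current_max = None  # "nothing seen yet", so negative values are handled too
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines instead of failing in int()
        value = int(text)  # still raises ValueError for non-numeric text
        if current_max is None or value > current_max:
            current_max = value
    return current_max
```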
This is a subroutine which reads from studentNamesfile.txt
def calculate_average():
    '''Calculates and displays average mark.'''
    test_results_file = open('studentNamesfile.txt', 'r')
    total = 0
    num_recs = 0
    line = ' '
    while line != '':
        line = test_results_file.readline()
        # Convert everything after the delimiting pipe character to an integer, and add it to total.
        total += int(line[line.find('|') + 1:])
        num_recs += 1
    test_results_file.close()
[num_recs holds the number of records read from the file.]
The format of studentNamesfile.txt is as follows:
Student 01|10
Student 02|20
Student 03|30
and so on. This subroutine is designed to read the mark for all the student records in the file, but I get this error when it runs:
Traceback (most recent call last):
File "python", line 65, in <module>
File "python", line 42, in calculate_average
ValueError: invalid literal for int() with base 10: ''
This error is pretty explicit, but I can't figure out why it's being thrown. I tried tracing the value of line[line.find('|') + 1:], but Python insists it has the correct value (e.g. 10) when I use print(line[line.find('|') + 1:]) on the previous line. What's wrong?
Update: I'm considering the possibility that line[line.find('|') + 1:] includes the newline, which is breaking int(). But using line[line.find('|') + 1:line.find('\\')] doesn't fix the problem - the same error is thrown.
Here:
while line != '':
    line = test_results_file.readline()
When you hit the end of the file, .readline() returns an empty string, but since this happens after the while line != '' test, you still try to process this line.
The canonical (and much simpler) way to iterate over a file line by line, which is to, well, iterate over the file, would avoid this problem:
for line in test_results_file:
    do_something_with(line)
You'll just have to take care of calling .rstrip() on line if you want to get rid of the ending newline character (which is the case for your code).
Also, you want to make sure that the file is properly closed whatever happens. The canonical way is to use open() as a context manager:
with open("path/to/file.txt") as f:
    for line in f:
        do_something_with(line)
This will call f.close() when exiting the with block, however it's exited (whether the for loop just finished or an exception happened).
Also, instead of doing complex computation to find the part after the pipe, you can just split your string:
for line in test_results_file:
    total += int(line.strip().split("|")[1])
    num_recs += 1
And finally, you could use the stdlib's csv module to parse your file instead of doing it manually...
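As a rough illustration of the csv suggestion, here is a minimal sketch. The function name average_mark and the decision to skip rows without a second field are assumptions for the example; csv.reader and its delimiter parameter are part of the standard library. The function takes any iterable of lines, so it also works on an open file (the csv documentation recommends opening the file with newline="").

```python
import csv


def average_mark(lines):
    """Average the integer after the pipe in lines like 'Student 01|10'.

    Returns None when no usable record is found (an illustrative choice).
    """
    total = 0
    num_recs = 0
    for row in csv.reader(lines, delimiter="|"):
        if len(row) < 2:
            continue  # blank or malformed line: no mark to read
        total += int(row[1])
        num_recs += 1
    if num_recs == 0:
        return None
    return total / num_recs
```

With the file from the question, this would be used as `with open('studentNamesfile.txt', newline='') as f: print(average_mark(f))`.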
Because it's not a numeric value, Python throws a ValueError when it cannot convert it into an integer. You can use the code below to check it.
def calculate_average():
    test_results_file = open('studentNamesfile.txt', 'r')
    total = 0
    num_recs = 0
    for line in test_results_file.readlines():
        try:
            total += int(line[line.find('|') + 1:])
            num_recs += 1
        except ValueError:
            print("Invalid Data: ", line[line.find('|') + 1:])
    test_results_file.close()
    print("total:", total)
    print("num_recs:", num_recs)
    print("Average:", float(total)/num_recs)
readlines vs readline
from io import StringIO
s = 'hello\n hi\n how are you\n'
f = StringIO(s)
l = f.readlines()
print(l)
# OUTPUT: ['hello\n', ' hi\n', ' how are you\n']
f = StringIO(s)
l1 = f.readline()
# 'hello\n'
l2 = f.readline()
# ' hi\n'
l3 = f.readline()
# ' how are you\n'
l4 = f.readline()
# ''
l5 = f.readline()
# ''
readlines
readlines returns a list of all lines, split on the \n character.
readline
The code above shows that the StringIO holds only 3 lines; once they are exhausted, readline keeps returning an empty string. Your code passes that empty string to int(), which is why you get the ValueError exception.
A simpler approach.
Demo:
total = 0
num_recs = 0
with open(filename) as infile: #Read File
    for line in infile: #Iterate Each line
        if "|" in line: #Check if | in line
            total += int(line.strip().split("|")[-1]) #Extract value and sum
            num_recs += 1
print(total, num_recs)
Ok, so I'm new to python and I'm currently taking the python for everybody course (py4e).
Our lesson 7.2 assignment is to do the following:
7.2 Write a program that prompts for a file name, then opens that file and reads through the file, looking for lines of the form:
X-DSPAM-Confidence: 0.8475
Count these lines and extract the floating point values from each of the lines and compute the average of those values and produce an output as shown below. Do not use the sum() function or a variable named sum in your solution.
You can download the sample data at http://www.py4e.com/code3/mbox-short.txt. When you are testing below, enter mbox-short.txt as the file name.
I can't figure it out. I keep getting this error
ValueError: float: Argument: . is not number on line 12
when I run this code (see screenshot): https://gyazo.com/a61768894299970692155c819509db54
Line 12, which is num = float(balue) + float(num), keeps acting up. When I remove the float from balue, I get another error, which says
"TypeError: cannot concatenate 'str' and 'float' objects on line 12".
Can arguments be converted into floats or is it only a string? That might be the problem but I don't know if it's true and even if it is I don't know how to fix my code after that.
Your approach was not so bad, I would say. However, I do not get what you intended with for balue in linez, as this iterates over the characters contained in linez. What you probably wanted is float(linez). I came up with a close solution that looks like this:
fname = input("Enter file name: ")
print(fname)
count = 0
num = 0
with open(fname, "r") as ins:
    for line in ins:
        line = line.rstrip()
        if line.startswith("X-DSPAM-Confidence:"):
            num += float(line[20:])
            count += 1
print(num/count)
This is only intended to get you on the right track; I have NOT verified the answer or the correctness of the script, as that should be part of your homework.
I realise this is answered - I am doing the same course and I had the same error message on that exact question. It was caused by the line being read as a non float and I had to read it as
number =float((line[19:26]))
By the way, the Python environment in the course is very sensitive to spaces in strings - I just got the right code and it rejected it since I had ": " and the correct answer was ":" - just spent half an hour on colons.
Just for the sake of it, here is the answer I reached, which has been accepted as the correct one. Hope you got there in the end.
# Use the file name mbox-short.txt as the file name
count = 0
average = 0
filename = input("Enter file name: ")
filehandle = open(filename)
for line in filehandle:
    if not line.startswith("X-DSPAM-Confidence:") : continue
    line = line.rstrip()
    number = float((line[19:26]))
    count = count + 1
    average = average + number
average = (average / count)
average = float(average)
print("Average spam confidence:", average)
# Take input from user
fname = input("Please insert file name:")
# try and except
try:
    fhand = open(fname)
except:
    print("File can not be found", fname)
    quit()
# count and total
count = 0
total = 0
for line in fhand:
    line = line.rstrip()
    if not line.startswith("X-DSPAM-Confidence:"):
        continue
    else:
        line = line[20:]
        uline = line.upper()
        fline = float(uline)
        count = count + 1
        total = total + fline
avr = total / count
print("Average spam confidence:", avr)
The short answer is to fix the slice that extracts the number.
If you look for the colon character ":" to locate the floating-point number, remember to add 1 to the index where the slice starts. If you do not, the slice begins at the colon and you get ": 0.8475", which is not a number.
So start the slice one position after the colon and the conversion works.
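The slicing advice can be sketched as follows. The helper name extract_confidence and its prefix parameter are made up for this illustration; str.find, str.startswith and float are standard Python. Note that float() ignores the leading space and trailing newline left after the colon.

```python
def extract_confidence(line, prefix="X-DSPAM-Confidence:"):
    """Return the float after the colon, or None if the line has another form."""
    if not line.startswith(prefix):
        return None
    colon = line.find(":")  # index of the colon itself
    # Start one position AFTER the colon; line[colon:] would still begin
    # with ":" and float() would reject it.
    return float(line[colon + 1:])
```

For example, `extract_confidence("X-DSPAM-Confidence: 0.8475\n")` evaluates the slice `" 0.8475\n"`, which float() accepts.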
I have created a method which reads a file line by line and checks whether all lines contain the same number of delimiters (see the code below). The trouble with this solution is that it works on a line-by-line basis. Given that some of the files I am dealing with are gigabytes in size, this will take a while to process. Is there a better solution which will 1) validate whether all lines contain the same number of delimiters and 2) not cause any out-of-memory issues? Thanks in advance.
def isValid(fileName):
    with open(fileName,'rb') as infile:
        for lineNumber,line in enumerate(infile,1):
            count = line.count(',')
            if lineNumber > 1 and prevCount != count:
                # this line does not contain the same number of delimiters
                return False
            prevCount = count
    return True
You can use all with a generator expression instead:
with open(file_name) as your_file:
    start = your_file.readline().count(',') # initial count
    print(all(i.count(',') == start for i in your_file))
I propose a different approach (without code):
1. read the file as binary, and in chunks of, say, 64 KB
2. count the number of end-of-line tokens in the chunk
3. count the number of delimiters in the chunk but only up to the position of the last EOL token
4. if the number of delimiters is not an exact multiple of the number of lines, stop and return False
5. At EOF, return True
As you'd have to handle the 'overlap' between the last EOL token and the end of the chunk the logic is a bit more complicated than the 'brute-force' approach. But in dealing with GBs it might pay off.
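The steps above can be sketched in code, with one deliberate change that should be stated plainly: comparing only the totals per chunk is a necessary condition, not a sufficient one (two lines with 1 and 3 delimiters pass a "divides evenly" test against an expected 2). The sketch below therefore keeps the chunked binary reading but still counts per line, carrying the unfinished line's count across chunk boundaries. The function name, the parameters, and the assumptions of a single-byte delimiter and "\n" line endings are all illustrative.

```python
def is_valid_chunked(path, delimiter=b",", chunk_size=64 * 1024):
    """Check that every line has as many delimiters as the first line.

    Reads the file as binary in fixed-size chunks, so memory use stays
    bounded even if the file contains extremely long lines.
    """
    expected = None      # delimiter count of the first complete line
    pending = 0          # delimiters seen so far in the unfinished line
    has_pending = False  # whether the unfinished line contains any bytes
    with open(path, "rb") as infile:
        while True:
            chunk = infile.read(chunk_size)
            if not chunk:
                break
            pieces = chunk.split(b"\n")
            # Every piece except the last one is closed by a newline.
            for piece in pieces[:-1]:
                count = pending + piece.count(delimiter)
                pending = 0
                has_pending = False
                if expected is None:
                    expected = count
                elif count != expected:
                    return False
            # The last piece continues into the next chunk (or ends the file).
            tail = pieces[-1]
            pending += tail.count(delimiter)
            has_pending = has_pending or bool(tail)
    # A final line without a trailing newline still has to be checked.
    if has_pending and expected is not None and pending != expected:
        return False
    return True
```

Whether this beats plain line iteration depends on the data; Python's file iteration is already buffered, so it is worth measuring both on a real file before committing to the more complicated version.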
I just noticed that - if you would want to stick with simple logic - the original code can be deflated a bit:
def isValid(fileName):
    with open(fileName,'r') as infile:
        count = infile.readline().count(',')
        for line in infile:
            if line.count(',') != count:
                return False
    return True
There is no need to keep the previous line's count as one single difference will decide it. So keep only the delim count of the first line.
Then, the file needs to be opened as a text file ('r'), not as a binary.
Lastly, by prefetching the very first line just before the loop we can discard the call to enumerate.
So I have to write a program that basically reads a string spanning several lines. Here's an example of the string I need to check:
Let's say this is first line and let's check line 4
In this second line there are number 10 and 8
Third line doesn't have any numbers
This is fourth line and we'll go on in line 12
This is fifth line and we go on in line 8
In sixth line there is number 6
This seventh line contains number 5, which means 5th line
This is eighth and the last line, it ends here.
These three lines are boring.
They really are
In eleventh line we can see that line 12 is really interesting
This is twelfth line but we go back to line 7.
I need a function that will read the first line. It'll find number 4 in it. This means the next line to check will be line 4. In line 4 there is number 12. So it goes to line 12. There's number 7 so it goes to line 7. There's number 5, so 5th line, there's number 8 and so 8th line. In 8th line, there are no more numbers.
So as a result I have to get number of line where there are no more numbers to go on.
Also, if there are 2 numbers in 1 line it should acknowledge only the first one and this should be done by another function that I wrote:
def find_number(s):
    s = s.split()
    m = None
    for word in s:
        if word.isdigit():
            m = int(word)
            return word
So basically I need to use this function to process the multi-line string. My question is: how can I "jump" from one line to another using this function?
If I understand your problem correctly (which I think I do, you stated it quite clearly), you want to find the first number in each line of a string, and then go to that line.
The first thing you need to do is split the string into lines with str.splitlines:
s = """Let's say this is first line and let's check line 4
In this second line there are number 10 and 8
Third line doesn't have any numbers
This is fourth line and we'll go on in line 12
This is fifth line and we go on in line 8
In sixth line there is number 6
This seventh line contains number 5, which means 5th line
This is eighth and the last line, it ends here.
These three lines are boring.
They really are
In eleventh line we can see that line 12 is really interesting
This is twelfth line but we go back to line 7."""
lines = s.splitlines()
Then you need to get the first integer in the first line. This is what your function does.
current_line = lines[0]
number = find_number(current_line)
Then you need to do the same thing, but with a different current_line. To get the next line, you might do this:
if number is None: # No numbers were found
    end_line = current_line
else:
    current_line = lines[number - 1] # line numbers are 1-based, list indices 0-based
    number = find_number(current_line)
You want to do this over and over again, an indefinite number of times, so you need either a while loop or recursion. This sounds like homework, so I won't give you the code for this (correct me if I'm wrong); you will have to work it out yourself. This shouldn't be too hard.
For future reference - a recursive solution:
def get_line(line):
    number = find_number(line)
    if number is None: # No numbers were found
        return line
    else:
        return get_line(lines[number - 1]) # line numbers are 1-based
I need a function that will read first line.
If you're using a list of lines, rather than a file, you don't need linecache at all; just list_of_lines[0] gets the first line.
If you're using a single long string, the splitlines method will turn it into a list of lines.
If you're using a file, you could read the whole file in: with open(filename) as f: list_of_lines = list(f). Or, the stdlib has a function, linecache.getline, that lets you get lines from a file in random-access order, without reading the whole thing into memory.
Whichever one you use, remember that Python lists use 0-based indices, so the first line of a list is list_of_lines[0]. linecache.getline is the exception: its line numbers are 1-based, so the first line is linecache.getline(filename, 1).
I'll use linecache just to show that even the most complicated version of the problem still isn't very complicated. You should be able to adapt it yourself, if you don't need that.
It'll find number 4 in it. This means the next line to check will be line 4.
Let's translate that logic into Python. You've already got the find_number function, and getline is the only other tricky part. So:
line = linecache.getline(filename, linenumber)
linenumber = find_number(line)
if linenumber is None:
    pass # We're done
else:
    pass # We got a number.
In line 4 there is number 12. So it goes to line 12. There's number 7 so it goes to line 7. There's number 5, so 5th line, there's number 8 and so 8th line. In 8th line, there's no more numbers.
So you just need to loop until linenumber is None. You do that with a while statement:
linenumber = 1
while linenumber is not None:
    line = linecache.getline(filename, linenumber)
    linenumber = find_number(line)
The only problem is that when linenumber is None, you want to be able to return the last linenumber, the one that pointed to None. That's easy:
linenumber = 1
while linenumber is not None:
    line = linecache.getline(filename, linenumber)
    new_linenumber = find_number(line)
    if new_linenumber is None:
        return linenumber
    else:
        linenumber = new_linenumber
Of course once you've done that, you don't need to re-check the linenumber at the top of the loop, so you can just change it to while True:.
Now you just need to wrap this up in a function so it can get the starting values as parameters, and you're done.
However, it's worth noting that find_number doesn't quite work. While you do compute a number, m, you don't actually return m, you return something else. You'll need to fix that to get this all working.
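To make the pieces concrete, here is one possible sketch that works on a list of lines (for example from splitlines) rather than linecache. It is not the asker's or the answerer's code: the corrected find_number returns the first integer as an int, and it additionally strips surrounding punctuation, which is an assumption added because tokens such as "5," and "7." in the sample text would otherwise fail isdigit(). The name last_line_number and the cycle check are also illustrative additions.

```python
import string


def find_number(s):
    """Return the first whole number in s as an int, or None if there is none."""
    for word in s.split():
        word = word.strip(string.punctuation)  # "5," -> "5", "7." -> "7"
        if word.isdigit():
            return int(word)
    return None


def last_line_number(lines, start=1):
    """Follow the numbers from line to line; return the 1-based number of
    the first line reached that contains no number."""
    current = start
    visited = set()
    while current not in visited:
        visited.add(current)
        number = find_number(lines[current - 1])  # list indices are 0-based
        if number is None:
            return current
        current = number
    # Coming back to a line already seen would loop forever.
    raise ValueError("the line references form a cycle")
```

For the sample text in the question, the path is 1, 4, 12, 7, 5, 8, so `last_line_number(s.splitlines())` returns 8. A number that points past the end of the text raises IndexError in this sketch.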
Here is my approach:
def check_line(line=None):
    assert line is not None
    for word in line.split():
        if word.isdigit():
            return int(word)
    return -1
# text is assumed to be the list of lines, e.g. text = s.splitlines()
next = 0
while next >= 0:
    last = next
    next = check_line(text[next]) - 1
    if next >= 0:
        print("next line:", next + 1)
    else:
        print("The last line with number is:", last + 1)
It's not the most efficient in the world, but...