Finding a maximum value in a file - python

I'm in a first year CompSci class and learning Python, so bear with me here. The assignment is to open a file using Python, read it, and find the maximum value in that file without using any built in functions or concepts we haven't discussed in class. I can read through the file and get the values, but my issue is that my code will consider the value "30" to instead be "3" and "0" instead of thirty. This is what I have so far:
def maxValueInFile(fileName):
    inputFile = open(fileName, "r")
    currentMax = int(0)
    for value in inputFile.read():
        if value.isdigit():
            value = int(value)
            if value > currentMax:
                currentMax = value
    inputFile.close()
    return currentMax
When I run the file, it won't return a number higher than 9, presumably becaus

If you want to read digit by digit, you can build up the real numbers by continuously multiplying the temporary result by 10 and then adding the value of the last digit read.
def maxValueInFile(fileName):
    inputFile = open(fileName, "r")
    currentMax = int(0)
    number = 0
    for value in inputFile.read():
        if value.isdigit():  # again found a digit
            number = number * 10 + int(value)
        else:  # found a non-digit
            if number > currentMax:
                currentMax = number
            number = 0
    if number > currentMax:  # in case the for-loop ended with a digit, the last number still needs to be tested
        currentMax = number
    inputFile.close()
    return currentMax

You're trying to do too much in this piece of your code (which I'll comment so you can see where it's going wrong):
# inputFile.read() returns something like "30\n40\n50"
for value in inputFile.read():
    # now value is something like "3" or "0"
    if value.isdigit():
        value = int(value)
        # and now it's 3 or 0
There's no benefit to splitting the string up into digits -- so don't do that. :)
currentMax = 0
for line in inputFile:  # iterating over a file object yields one line at a time
    value = int(line)
    if value > currentMax:
        currentMax = value
Note that this code will raise a ValueError exception if line isn't convertible to an int!
A couple of style notes (which I've applied to the code above):
You don't need to say int(0) because 0 is already an int all on its own.
When you're converting types, it's better to assign a new variable to the new type (when you start using static type checking, this is required, so it's a good habit to get into). In the code above I called the line of text we read from the file line to help me remember that it's a line of text (i.e. a str), and then I name the numeric value value to help me remember that that's the actual number that I can use in a comparison (i.e. an int).
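If you'd rather not crash on a bad line, here is a minimal sketch of the same loop with the ValueError handled. It assumes one value per line and that non-numeric lines should simply be skipped (that policy is my assumption, not part of the assignment). The name max_value_in_lines is hypothetical; it takes any iterable of lines, so an open file object works as well as a list.

```python
def max_value_in_lines(lines):
    """Return the largest integer found in an iterable of text lines.

    Lines that cannot be converted to int are skipped (an assumption).
    Returns None if no line held a number.
    """
    current_max = None
    for line in lines:          # line is a str, e.g. "30\n"
        try:
            value = int(line)   # value is an int; int() ignores the newline
        except ValueError:
            continue            # blank or non-numeric line
        if current_max is None or value > current_max:
            current_max = value
    return current_max


print(max_value_in_lines(["30\n", "7\n", "oops\n", "-4\n", "125\n"]))
```

Starting from None instead of 0 also gives a sensible answer when every number in the file is negative.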

inputFile.read() returns a string, which gets broken into characters when you iterate over it. Assuming your input file has each value on a separate line, you want inputFile.read().splitlines().
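To see the difference, here is a tiny sketch that uses a literal string in place of the file contents (the sample text is made up for illustration):

```python
text = "30\n40\n50"  # stand-in for what inputFile.read() returns

chars = list(text)         # iterating over a str yields single characters
lines = text.splitlines()  # splitting on line breaks yields whole values

print(chars)  # ['3', '0', '\n', '4', '0', '\n', '5', '0']
print(lines)  # ['30', '40', '50']
```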
However, here's how I'd write it, with notes:
def maxValueInFile(fileName):
    currentMax = 0
    with open(fileName) as inputFile:  # Avoid manually opening and closing files
        for line in inputFile:  # Avoid reading the whole file
            # If we can assume the file only contains numbers,
            # we can skip the ".isdigit" check.
            value = int(line)  # "int()" doesn't care about whitespace (newline)
            if value > currentMax:
                currentMax = value
    return currentMax


Iterating through tables in text file

Hello everyone.
This is the first task where I don't have a clear idea of where to start:
Create a text file (using an editor, not necessarily Python)
containing two tab-separated columns, with each column containing a
number. Then use Python to read through the file you’ve created. For
each line, multiply each first number by the second, and then sum
the results from all the lines. Ignore any line that doesn’t contain
two numeric columns.
So far I've written a couple of lines, but I'm not sure where to go next:
filename = 'path'
def sum_columns(filename):
    sum = 0
    multiply = 0
    with open (filename) as f:
Should I split my file with 2 columns and create a list of them, or should I do something else?
Thank you in advance
Here is a short solution:
def sum_columns(filename):
    counter = 0
    with open(filename) as file:
        for line in file:
            try:
                a, b = [int(x) for x in line.split('\t')]
                counter += a * b
            except ValueError:
                continue
    return counter
file_name = 'myfile.txt'
print(sum_columns(file_name))
This is what a lot of people (@martineau being the first) suggested in the comments (it's also something I learned just now), so I decided to put it in an answer.
Basically, the loop iterates over each line and builds a list of two integers from it. The list comprehension does the conversion, because otherwise both numbers would still be strings, and multiplying two strings raises a TypeError. The two values are then unpacked into a and b. This means a single except clause is enough, because the only error you can reasonably expect is ValueError (raised either when the unpacking fails or when a field can't be converted to an integer). Finally, the two values are multiplied and added to the counter, and the counter is returned once the loop ends.
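To make the "one except is enough" point concrete, here is a small sketch that pulls the parsing step into a helper and runs it over a few sample lines. The helper name parse_pair and the sample lines are made up for illustration.

```python
def parse_pair(line):
    """Return (a, b) as ints, or None if the line is not two tab-separated ints."""
    try:
        # ValueError is raised here both when int() fails and when the
        # list does not contain exactly two items to unpack.
        a, b = [int(x) for x in line.split('\t')]
    except ValueError:
        return None
    return a, b


samples = ["3\t4\n", "5\n", "1\t2\t3\n", "x\t2\n", "\n"]
for sample in samples:
    print(repr(sample), "->", parse_pair(sample))
```

Only the first sample yields a pair; the others fail for different reasons (too few fields, too many fields, a non-numeric field, an empty line), and all of them surface as ValueError.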
You can pretty much do a lot of things, given the exercise text. In my opinion, the best way would be to do something like this:
filename = 'path'
def sum_columns(filename):
    sum = 0
    with open (filename) as f:
        all_lines = f.readlines()
    # no f.close() needed: the with block closes the file
    for line in all_lines:
        splitted = line.split("\t")
        sum += int(splitted[0]) * int(splitted[1])
    return sum
You'll get all lines of the file listed into all_lines, then you can iterate through every line and split them from the tab, then multiply them and sum them to the sum variable you initialized to 0, which you'll return at the end. As hinted by someone else, you could also read the file line by line without memorizing every line into a list, but if the file is relatively small, you can go with my option.
If you have a file like this:
1 2
2 4
4 8
You can do the following:
from functools import reduce
def is_int(s):
    try:
        int(s)
        return True
    except ValueError:
        return False
filename = 'path'
def sum_columns(filename):
    with open (filename) as f:
        lines = f.readlines()
    return sum([
        reduce(lambda x, y: x * y, map(int,line.split("\t")))
        for line in lines
        if len(list(filter(is_int, line.split("\t")))) == 2
    ])
Explanation:
At the top I define a helper function that determines whether a string can be converted into an int. This will be used later to ignore lines that don't have 2 numbers. It's based on this answer.
def is_int(s):
    try:
        int(s)
        return True
    except ValueError:
        return False
Then, we open the file and read all lines into a variable. This is not the most efficient approach, as the file can be processed line by line without storing the whole file; however, for smaller files the cost is negligible.
with open (filename) as f:
    lines = f.readlines()
Next, is a single operation to perform your query, but let's break it down:
First, we iterate through all the lines:
for line in lines
Next, we only keep the lines that have exactly two numbers separated by tabs:
if len(list(filter(is_int, line.split("\t")))) == 2
Finally, we turn each number in the line into ints, and multiply them all together:
reduce(lambda x, y: x * y, map(int,line.split("\t")))
We then sum all of these and return the result
Performance consideration
If performance is a concern, you can achieve the same thing by reading the contents line by line instead of pulling the whole file into a variable. It is less elegant, but more efficient:
def sum_columns(filename):
    total = 0
    with open (filename) as f:
        for line in f:
            if len(list(filter(is_int, line.split("\t")))) != 2:
                continue
            total += reduce((lambda x, y: x * y), map(int,line.split("\t")))
    return total
(Note, that you still need the import and helpers from the above example)
input.txt
1 3
2 6
3 7
7 12
8
script.py
with open('input.txt') as f:
    total = 0
    for line in f:
        numbers = line.split('\t')  # a line is already a str, so split it directly
        try:
            line_value = int(numbers[0]) * int(numbers[1])
        except IndexError as e:
            # the line doesn't contain two numbers
            continue
        except ValueError as e:
            # a value couldn't be converted to a number
            continue
        total += line_value

Can't convert strings to int or float

So I was able to fix the first issue with your help, but now that the program runs without any errors, it's not calculating the average correctly and I'm not sure why. Here's what it looks like:
def calcAverage():
    with open('numbers.dat', 'r') as numbers_file:
        numbers = 0
        amount = 0
        for line in numbers_file:
            amount = amount + float(line)
            numbers += 1
    average = amount / numbers
    print("The average of the numbers in the file is:",average)
You're reassigning line after you check whether line is empty in the while. The while test is testing the previous line in the file.
So when you get to the end, you read a blank line and try to add it to the amount, but get an error.
You're also never adding the first line, since you read it before the loop and never add that to amount.
Use a for loop instead of while, it will stop automatically when it reaches the end.
def calcAverage():
    with open('numbers.dat', 'r') as numbers_file:
        numbers = 0
        amount = 0
        for line in numbers_file:
            amount = amount + float(line)
            numbers += 1
    average = amount / numbers
    print("The average of the numbers in the file is:",average)
If you do want to use a while loop, do it like this:
while True:
    line = numbers_file.readline()
    if not line:
        break
    # rest of loop
The error shows that line contains an empty string.
You can get the same error with float('').
Your code runs in the wrong order: you read a new line before converting the previous one.
You should also strip the line, because it still has \n at the end.
You need:
line = numbers_file.readline()
line = line.strip()  # remove `\n` (and `\t` and spaces)
while line != '':
    # convert current line
    amount = amount + float(line)
    numbers += 1
    # read next line
    line = numbers_file.readline()
    line = line.strip()  # remove `\n` (and `\t` and spaces)
You could also use for-loop for this
numbers = []
for line in numbers_file:
    line = line.strip()  # remove `\n` (and `\t` and spaces)
    if line:
        numbers.append( float(line) )
    #else:
    #    break
average = sum(numbers) / len(numbers)
and this could be reduced to
numbers = [float(line) for line in numbers_file if line.strip() != '']
average = sum(numbers) / len(numbers)
Other answers illustrated how to read a file line-by-line.
Some of them dealt with issues like reaching end-of-file (EOF) and invalid input for conversion to float.
Since any I/O or UI can be expected to deliver invalid input, I want to stress validating the input before processing it.
Validation
For each line read from an input console or a file, you should consider validating it.
What happens if the string is empty?
Converting an empty string input to float
What happens if the string contains special characters? (as commented by Mat)
What happens if the number does not fit your expected float? (value range, decimal precision)
What happens if nothing was read or end-of-file was reached? (empty file or EOF)
Advice: To make it robust, expect input errors and catch them.
(a) using try-except construct for error-handling in Python:
for line in numbers_file:
    try:
        amount = amount + float(line)
        numbers += 1
    except ValueError:
        print("Read line was not a float, but: '{}'".format(line))
(b) testing on input ahead:
More simply, you could also test manually using basic if statements like:
if line.strip() == "":  # on empty line received (a line read from a file still carries its "\n")
    print("WARNING: read an empty line! This won't be used for calculating average.")  # show problem and consequences
    continue  # stop here and continue with next for-iteration (jump to next line)
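Putting (a) and (b) together, here is a minimal sketch that also covers the empty-file case from the list above. The function name average_of_lines and the choice to return None when nothing usable was read are my assumptions; pick whatever fits your program.

```python
def average_of_lines(lines):
    """Average the numeric lines of an iterable; return None if there are none."""
    total = 0.0
    count = 0
    for line in lines:
        text = line.strip()
        if text == "":
            continue  # (b) skip empty lines up front
        try:
            total += float(text)  # (a) let float() decide what is a number
        except ValueError:
            print("Skipping line that is not a number: {!r}".format(line))
            continue
        count += 1
    if count == 0:
        return None  # empty file, or no valid numbers: avoid dividing by zero
    return total / count


print(average_of_lines(["1.5\n", "\n", "abc\n", "2.5\n"]))
```

Because the function takes any iterable of lines, you can pass it the open file object directly.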

Removing lines from a txt file based on the structure of the line

Code:
with open("filename.txt", 'r') as f: #I'm not sure about reading it as r because I would be removing lines.
    lines = f.readlines() #stores each line in the txt into 'lines'.
invalid_line_count = 0
for line in lines: #this iterates through each line of the txt file.
    if line is invalid:
        # something which removes the invalid lines.
        invalid_line_count += 1
print("There were " + str(invalid_line_count) + " amount of invalid lines.")
I have a text file like so:
1,2,3,0,0
2,3,0,1,0
0,0,0,1,2
1,0,3,0,0
3,2,1,0,0
The valid line structure is 5 values split by commas.
For a line to be valid, it must have a 1, 2, 3 and two 0's. It doesn't matter in what position these numbers are.
An example of a valid line is 1,2,3,0,0
An example of an invalid line is 1,0,3,0,0, as it does not contain a 2 and has 3 0's instead of 2.
I would like to be able to iterate through the text file and remove invalid lines.
and maybe a little message saying "There were x amount of invalid lines."
Or maybe as suggested:
As you read each line from the original file, test it for validity. If it passes, write it out to the new file. When you're finished, rename the original file to something else, then rename the new file to the original file.
I think that the csv module may help so I read the documentation and it doesn't help me.
Any ideas?
You can't remove lines from a file, per se. Rather, you have to rewrite the file, including only the valid lines. Either close the file after you've read all the data and reopen it in mode "w", or write to a new file as you process the lines (which takes less memory in the short term).
Your main problem with detecting line validity seems to be handling the input. You want to convert the input text to a list of values; this is a skill you should get from learning your tools. The ones you need here are split to divide the line, and int to convert the values. For instance:
line_vals = line.split(',')
Now iterate through line_vals, and convert each to integer with int.
Validity: you need to count the quantity of each value you have in this list. You should be able to count things by value; if not, back up to your prior lessons and review basic logic and data flow. If you want the advanced method for this, use collections.Counter, which is a convenient type of dictionary that accumulates counts from any sequence.
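As a rough sketch of the Counter approach, assuming the rule from the question (one each of 1, 2 and 3, plus two 0s); the names EXPECTED and is_valid_line are made up for illustration:

```python
from collections import Counter

# The required multiset of values: two 0s and one each of 1, 2, 3.
EXPECTED = Counter({0: 2, 1: 1, 2: 1, 3: 1})


def is_valid_line(line):
    """Return True if the comma-separated line holds exactly the expected values."""
    try:
        values = [int(part) for part in line.split(',')]
    except ValueError:
        return False  # a field was empty or not a number
    return Counter(values) == EXPECTED


print(is_valid_line("1,2,3,0,0"))  # True
print(is_valid_line("1,0,3,0,0"))  # False
```

Comparing two Counters checks both which values are present and how often, so the position of the numbers in the line doesn't matter.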
Does that get you moving? If you're still lost, I recommend some time with a local tutor.
One of the possible right approaches:
with open('filename.txt', 'r+') as f:  # opening file in read/write mode
    inv_lines_cnt = 0
    valid_list = [0, 0, 1, 2, 3]  # sorted list of valid values
    lines = f.read().splitlines()
    f.seek(0)
    f.truncate(0)  # truncating the initial file
    for l in lines:
        if sorted(map(int, l.split(','))) == valid_list:
            f.write(l+'\n')
        else:
            inv_lines_cnt += 1
print("There were {} amount of invalid lines.".format(inv_lines_cnt))
The output:
There were 2 amount of invalid lines.
The final filename.txt contents:
1,2,3,0,0
2,3,0,1,0
3,2,1,0,0
This is a mostly language-independent problem. What you would do is open another file for writing. As you read each line from the original file, test it for validity. If it passes, write it out to the new file. When you're finished, rename the original file to something else, then rename the new file to the original file.
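In Python that could look roughly like the sketch below. The function name filter_file, the ".tmp" suffix and the is_valid callback are assumptions for illustration. Note one simplification: instead of first renaming the original to something else, os.replace swaps the new file over the original in one step, so no backup copy is kept.

```python
import os


def filter_file(path, is_valid):
    """Rewrite `path`, keeping only lines for which is_valid(line) is true.

    Returns the number of lines that were dropped.
    """
    tmp_path = path + ".tmp"  # hypothetical name for the new file
    dropped = 0
    with open(path) as src, open(tmp_path, "w") as dst:
        for line in src:
            # Test the line without its trailing newline, write it unchanged.
            if is_valid(line.rstrip("\n")):
                dst.write(line)
            else:
                dropped += 1
    os.replace(tmp_path, path)  # the new file takes the original's name
    return dropped
```

You would call it with whatever validity test you settle on, for example filter_file("filename.txt", my_check), and print the returned count in your message.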
For a line to be valid, each line must have a 1, 2, 3 and 2 0's. It doesn't matter in what position these numbers are.
CHUNK_SIZE = 65536
def _is_valid(line):
    """Check if a line is valid.
    A line is valid if it is of length 5 and contains '1', '2', '3',
    in any order, as well as '0', twice.
    :param list line: The line to check.
    :return: True if the line is valid, else False.
    :rtype: bool
    """
    if len(line) != 5:
        # If there's not exactly five elements in the line, return false
        return False
    if all(x in line for x in {"1", "2", "3"}) and line.count("0") == 2:
        # Builtin `all` checks if a condition (in this case `x in line`)
        # applies to all elements of a certain iterator.
        # `list.count` returns the amount of times a specific
        # element appears in it. If "0" appears exactly twice in the line
        # and the `all` call returns True, the line is valid.
        return True
    # If the previous block doesn't execute, the line isn't valid.
    return False
def get_valid_lines(path):
    """Get the valid lines from a file.
    The valid lines will be written to `path`.
    :param str path: The path to the file.
    :return: None
    :rtype: None
    """
    invalid_lines = 0
    contents = []
    valid_lines = []
    with open(path, "r") as f:
        # Open the `path` parameter in reading mode.
        while True:
            chunk = f.read(CHUNK_SIZE)
            # Read up to `CHUNK_SIZE` characters (65536) from the file.
            if not chunk:
                # Reaching the end of the file, we get an empty string.
                break
            contents.append(chunk)
            # If the chunk is not empty, add it to the contents.
    contents = "".join(contents).splitlines()
    # `contents` was collected in chunks of size 65536. We need to join
    # them using `str.join`. We then split all of this by newlines, to get
    # each individual line.
    for line in contents:
        # `_is_valid` expects a list of fields, so split the line on commas.
        if not _is_valid(line=line.split(",")):
            invalid_lines += 1
        else:
            valid_lines.append(line)
    print("Found {} invalid lines".format(invalid_lines))
    with open(path, "w") as f:
        for line in valid_lines:
            f.write(line)
            f.write("\n")
I'm splitting this up into two functions, one to check if a line is valid according to your rules, and a second one to manipulate a file. If you want to return the valid lines instead, just remove the second with statement and replace it with return valid_lines.

Converting a String to a float and assigning that float as a value in a dictionary?

I have a file that has 'word'\t'num'\n as a string. I would like to convert it to a dictionary, which I have done, except that I don't know how to convert the value 'num' to a floating point number, so that the dict is of the format {word : num} and num is not a string but a floating point number.
Here is my script so far:
file_stream = open(infile)
file_list = file_stream.readlines()
dict_output = {}
for line in file_list:
    tmp = line.split()
    dict_output[tmp[0]] = float(tmp[1])
If I remove the float() the script runs fine and it creates a dictionary with the values as strings. When I try to convert the string to a float I get the error message:
"ValueError: could not convert string to float: stand"
You are converting values to floats correctly.
However, you have at least one line where there is more than just a tab character on the line, or where the second value is not a float. Try changing your code to:
key, value = line.rsplit('\t', 1)
try:
    dict_output[key] = float(value)
except ValueError:
    print('Unexpected line: {!r}'.format(line))
This splits the line on just the last \t tab character instead of on any whitespace. This leaves lines that may have multiple tabs on one line intact and assumes that only the last value is a float.
If this still fails, the code prints out the problem line to show us what else we may need to fix.
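Here is a small self-contained sketch of that idea, run on made-up sample lines. The function name parse_word_values is hypothetical, and I've moved the unpacking inside the try so that a line with no tab at all is reported instead of crashing; that is a small variation on the snippet above.

```python
def parse_word_values(lines):
    """Build {word: float} from lines shaped like 'word<TAB>num'."""
    result = {}
    for line in lines:
        try:
            # Split on the last tab only, so keys may themselves contain tabs.
            key, value = line.rsplit('\t', 1)
            result[key] = float(value)  # float() ignores the trailing newline
        except ValueError:
            print('Unexpected line: {!r}'.format(line))
    return result


sample = ["apple\t1.5\n", "two\twords\t2.25\n", "take a\tstand\n", "notab\n"]
print(parse_word_values(sample))
```

The last two sample lines are reported as unexpected: one because "stand" is not a number, the other because there is no tab to split on.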
Because your format is 'word'\t'num'\n, the separator between word and num is \t (a tab). You should change line.split() to line.split('\t'). So the full code should be:
file_stream = open(infile)
file_list = file_stream.readlines()
dict_output = {}
for line in file_list:
    tmp = line.split('\t')
    dict_output[tmp[0]] = float(tmp[1])

Python: read a file and replace it line by line with a certain condition

I have a file like this below.
0 0 0
0.00254 0.00047 0.00089
0.54230 0.87300 0.74500
0 0 0
I want to modify this file. If a value is less than 0.05, then a value is to be 1. Otherwise, a value is to be 0.
After python script runs, the file should be like
1 1 1
1 1 1
0 0 0
1 1 1
Would you please help me?
OK, since you're new to StackOverflow (welcome!) I'll walk you through this. I'm assuming your file is called test.txt.
with open("test.txt") as infile, open("new.txt", "w") as outfile:
opens the files we need, our input file and a new output file. The with statement ensures that the files will be closed after the block is exited.
for line in infile:
loops through the file line by line.
values = [float(value) for value in line.split()]
Now this is more complicated. Every line contains space-separated values. These can be split into a list of strings using line.split(). But they are still strings, so they must be converted to floats first. All this is done with a list comprehension. The result is that, for example, after the second line has been processed this way, values is now the following list: [0.00254, 0.00047, 0.00089].
results = ["1" if value < 0.05 else "0" for value in values]
Now we're creating a new list called results. Each element corresponds to an element of values, and it's going to be a "1" if that value < 0.05, or a "0" if it isn't.
outfile.write("       ".join(results))
converts the list of "integer strings" back to a string, separated by 7 spaces each.
outfile.write("\n")
adds a newline. Done.
The two list comprehensions could be combined into one, if you don't mind the extra complexity:
results = ["1" if float(value) < 0.05 else "0" for value in line.split()]
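For reference, here are the pieces from the walkthrough assembled into one function. Wrapping them in a function, the name threshold_file, and the limit and sep parameters are my additions for illustration; sep defaults to a single space, so pass a wider string if you want more room between the columns.

```python
def threshold_file(in_path, out_path, limit=0.05, sep=" "):
    """Write "1" for every value below `limit` and "0" otherwise, line by line."""
    with open(in_path) as infile, open(out_path, "w") as outfile:
        for line in infile:
            # Convert each whitespace-separated field and compare in one pass.
            results = ["1" if float(value) < limit else "0"
                       for value in line.split()]
            outfile.write(sep.join(results))
            outfile.write("\n")
```

Calling threshold_file("test.txt", "new.txt") reproduces the steps above for the file names used in this answer.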
If you can use libraries, I'd suggest numpy:
import numpy as np
myarray = np.genfromtxt("my_path_to_text_file.txt")
my_shape = myarray.shape  # shape is an attribute, not a method
out_array = np.where(myarray < 0.05, 1, 0)
# savetxt needs a target file; "my_output_file.txt" is a placeholder name
np.savetxt("my_output_file.txt", out_array, fmt="%d")
You can add formatting as arguments to the savetxt function. The docstrings of the function are pretty self-explanatory.
If you are stuck with pure Python:
with open("my_path_to_text_file") as my_file:
    list_of_lines = my_file.readlines()
list_of_lines = [[int( float(x) < 0.05) for x in line.split()] for line in list_of_lines]
Then write that list to a file as you see fit.
You can use this code
f_in=open("file_in.txt", "r") #opens a file in the reading mode
in_lines=f_in.readlines() #reads it line by line
out=[]
for line in in_lines:
    list_values=line.split() #separate elements by the spaces, returning a list with the numbers as strings
    for i in range(len(list_values)):
        list_values[i]=float(list_values[i]) #converts them to floats (float() is safer than eval() on file contents)
        # print list_values[i],
        if list_values[i]<0.05: #your condition
            # print ">>", 1
            list_values[i]=1
        else:
            # print ">>", 0
            list_values[i]=0
    out.append(list_values) #stores the numbers in a list, where each list corresponds to a line's content
f_in.close() #closes the file
f_out=open("file_out.txt", "w") #opens a new file in the writing mode
for cur_list in out:
    for i in cur_list:
        f_out.write(str(i)+"\t") #writes each number, plus a tab
    f_out.write("\n") #writes a newline
f_out.close() #closes the file
The following code performs the replacements in place; for that, the file is opened in 'rb+' mode. It's absolutely mandatory to open it in binary mode (b). The + in 'rb+' means that the file can be both read and written. Note that the mode can also be written 'r+b'.
But using 'rb+' is awkward:
If you read with for line in f, the file is read in chunks, and several lines are kept in a buffer from which they are actually read one after the other, until another chunk of data is loaded into the buffer. That makes it harder to perform transformations, because you have to track the position of the file pointer with tell() and move it with seek(), and in fact I haven't completely understood how that must be done.
Happily, there's a solution with readline(). I don't know why, but I trust the facts: when readline() reads a line, the file pointer doesn't go further on disk than the end of the line (that is to say, it stops at the newline).
Now it's easy to know and to move the position of the file pointer.
To write after reading, seek() must be executed, even if it is only seek(0,1), meaning a move of 0 characters from the current position. That must change the state of the file pointer, or something like that.
Well, for your problem, the code is as follows:
import re
from os import fsync
from os.path import getsize
reg = re.compile(rb'[\d.]+')  # bytes pattern, because the file is opened in binary mode
def ripl(m):
    g = m.group()
    # threshold is 0.05 as in the question; pad with spaces so the line keeps its length
    return (b'1' if float(g)<0.05 else b'0').ljust(len(g))
path = ...........'
print('length of file before : %d' % getsize(path))
with open('Copie de tixti.txt','rb+') as f:
    line = b'go'
    while line:
        line = f.readline()
        lg = len(line)
        f.seek(-lg,1)  # move back to the start of the line just read
        f.write(reg.sub(ripl,line))
        f.flush()
        fsync(f.fileno())
print('length of file after : %d' % getsize(path))
flush() and fsync() must be executed to ensure that the instruction f.write(reg.sub(ripl,line)) actually writes at the moment it is ordered to.
Note that I've never handled a file encoded in a Unicode encoding this way. It's certainly more difficult, since Unicode characters may be encoded on several bytes (and in the case of UTF-8, a variable number of bytes depending on the character).
