Max, min, and average of the numbers in a file (Python)

I'm trying to create a program that asks for the name of a file, opens the file, determines the maximum and minimum values in it, and computes the average of the numbers in the file. I want to print the max and min values and return the average. The file has one number per line, with many different numbers listed top to bottom. Here is my program so far:
def summaryStats():
    fileName = input("Enter the file name: ") #asking user for input of file
    file = open(fileName)
    highest = 1001
    lowest = 0
    sum = 0
    for element in file:
        if element.strip() > (lowest):
            lowest = element
        if element.strip() < (highest):
            highest = element
        sum += element
    average = sum/(len(file))
    print("the maximum number is ") + str(highest) + " ,and the minimum is " + str(lowest)
    file.close()
    return average
When I run my program, it is giving me this error:
summaryStats()
Enter the file name: myFile.txt
Traceback (most recent call last):
File "/Applications/Wing101.app/Contents/MacOS/src/debug/tserver/_sandbox.py", line 1, in <module>
# Used internally for debug sandbox under external interpreter
File "/Applications/Wing101.app/Contents/MacOS/src/debug/tserver/_sandbox.py", line 8, in summaryStats
builtins.TypeError: unorderable types: str() > int()
I think I'm struggling to determine which part should be a string. What do you guys think?

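You are comparing two incompatible types, str and int. You need to make sure you are comparing values of the same type. You may want to rewrite your for loop to convert each line with int() so that you are comparing two int values. It also helps to track the largest and smallest values under the matching names, and to count the numbers as you go, because a file object has no len():
highest = float('-inf')  # largest value seen so far
lowest = float('inf')    # smallest value seen so far
total = 0                # avoids shadowing the built-in sum()
count = 0
for element in file:
    element_value = int(element.strip())
    if element_value > highest:
        highest = element_value
    if element_value < lowest:
        lowest = element_value
    total += element_value
    count += 1
average = total / count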
When Python reads a file, it gives you each whole line as a str. You call strip() to remove surrounding whitespace and the newline character. You then need to parse the remaining str into the correct type (int) for comparison and arithmetic.
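For reference, here is a minimal sketch of the whole function in Python 3 with that conversion applied. It assumes one integer per line; taking fileName as a parameter (instead of calling input() inside the function) and skipping blank lines are assumptions added here to keep it easy to test.

```python
def summaryStats(fileName):
    """Print the max and min of the integers in fileName and return their average.

    Assumes one integer per line; blank lines are skipped.
    """
    highest = None  # largest value seen so far
    lowest = None   # smallest value seen so far
    total = 0
    count = 0
    with open(fileName) as file:  # closes the file automatically, even on errors
        for element in file:
            text = element.strip()
            if not text:
                continue  # ignore blank lines, e.g. an empty last line
            value = int(text)  # convert the str line to an int before comparing
            if highest is None or value > highest:
                highest = value
            if lowest is None or value < lowest:
                lowest = value
            total += value
            count += 1
    if count == 0:
        raise ValueError("the file contains no numbers")
    print("the maximum number is " + str(highest) + ", and the minimum is " + str(lowest))
    return total / count
```

Called as summaryStats(input("Enter the file name: ")), it behaves like the original program.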
You should read through your error messages; they are there to tell you where and why your code failed to run. The traceback shows where the error took place. The line
File "/Applications/Wing101.app/Contents/MacOS/src/debug/tserver/_sandbox.py", line 8, in summaryStats
tells you to examine line 8 of summaryStats, which is where the error occurs.
The next line:
builtins.TypeError: unorderable types: str() > int()
tells you what is going wrong. A quick search of the Python docs finds a description of the error. An easy way to find advice is to look in the language documentation, or to search for the entire error message: you are probably not the first person with this problem, and there is likely a discussion with advice for your specific error.
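To see the failure in isolation, the snippet below (a small illustration, not taken from the question) compares a line of text with an int. On Python 3.6 and later the message reads '>' not supported between instances of 'str' and 'int' instead of unorderable types: str() > int(), but it is the same TypeError.

```python
# A line read from a file is a str, and Python 3 refuses to order a str against an int.
line = "42\n"   # what iterating over a file yields for one line
lowest = 0

try:
    line.strip() > lowest      # str > int
except TypeError as exc:
    print("TypeError:", exc)

print(int(line) > lowest)      # convert first, then compare: True
```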

Lines like these:
if element.strip() > (lowest):
should explicitly convert to a number. Currently you're comparing a str to an int. Converting with int() takes care of surrounding whitespace, so int(' 1 ') is 1 and you don't even need strip():
if int(element) > lowest:
Alternatively, you could do it like this (Python 3):
# Assuming test.txt is a file with a number on each line.
with open('test.txt') as f:
    nums = [int(x) for x in f.readlines()]
print('Max: {0}'.format(max(nums)))
print('Min: {0}'.format(min(nums)))
print('Average: {0}'.format(sum(nums) / len(nums)))

When you call open(filename), you get a file object. Iterating over it in a for loop yields one line at a time as a str, so each line must be converted before you do arithmetic with it.
If each value is on its own line, you can also read all the lines into a list first:
lines = file.readlines()
Then loop through those lines and convert each one to an int:
for line in lines:
    value = int(line)
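As a sketch of the direct-iteration style, the hypothetical helper read_ints below loops over the file object itself and converts each non-blank line. The helper name and the blank-line handling are illustrative choices, not part of the original answer.

```python
def read_ints(filename):
    """Return a list of the integers in filename, one per line (hypothetical helper)."""
    values = []
    with open(filename) as file:
        for line in file:          # iterating a file object yields one str per line
            line = line.strip()    # drop the trailing newline and surrounding spaces
            if line:               # skip blank lines instead of crashing on int('')
                values.append(int(line))
    return values
```

With nums = read_ints("myFile.txt"), the statistics are max(nums), min(nums) and sum(nums) / len(nums).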

Related

Conversion to O(log n) in Python 3.7

I have this code that works great and does what I want; however, it runs in linear time, which is way too slow for the size of my data files, so I want to convert it to O(log n). I tried this code and many others posted here but still had no luck getting it to work. I will post both sets of code and give examples of what I expect.
import pandas
import fileinput
'''This code runs fine and does what I expect removing duplicates from big
file that are in small file, however it is a linear function.'''
with open('small.txt') as fin:
    exclude = set(line.rstrip() for line in fin)
for line in fileinput.input('big.txt', inplace=True):
    if line.rstrip() not in exclude:
        print(line, end='')
    else:
        print('')
'''This code is my attempt at conversion to a log function.'''
def log_search(small, big):
    first = 0
    last = len(big.txt) - 1
    while first <= last:
        mid = (first + last) / 2
        if str(mid) == small.txt:
            return True
        elif small.txt < str(mid):
            last = mid - 1
        else:
            first = mid + 1
    with open('small.txt') as fin:
        exclude = set(line.rstrip() for line in fin)
    for line in fileinput.input('big.txt', inplace=True):
        if line.rstrip() not in exclude:
            print(line, end='')
        else:
            print('')
    return log_search(small, big)
The big file has millions of lines of int data.
The small file has hundreds of lines of int data.
I want to compare the data and remove the duplicated values from the big file, but leave those lines blank so the line numbers stay the same.
Running the first block of code works, but it takes too long to search through the big file. Maybe I am approaching the problem the wrong way. My attempt at converting it to O(log n) runs without error but does nothing.
I don't think there is a better or faster way to do this than what you are currently doing in your first approach. (Update: There is, see below.) Storing the lines from small.txt in a set and iterating over the lines in big.txt, checking whether each is in that set, has complexity O(b), with b being the number of lines in big.txt.
What you seem to be trying is to reduce this to O(s*logb), with s being the number of lines in small.txt, by using binary search to check for each line in small.txt whether it is in big.txt and removing/overwriting it then.
This would work well if all the lines were in a list with random access to any element, but you only have the file, which does not allow random access to any line. It does, however, allow random access to any character with file.seek, which (at least in some cases) seems to be O(1). But then you still have to find the line break preceding that position before you can actually read the line. Also, you cannot just replace lines with empty lines; you have to overwrite the number with the same number of characters, e.g. spaces.
So, yes, theoretically it can be done in O(s*logb), if you do the following:
1. Implement binary search, searching not on the lines but on the characters of the big file.
2. For each position, backtrack to the last line break, then read the line to get the number.
3. Try again in the lower/upper half as usual with binary search.
4. If the number is found, replace it with as many spaces as there are digits in the number.
5. Repeat with the next number from the small file.
On my system, reading and writing a file with 10 million lines of numbers only took 3 seconds each, or about 8 seconds with fileinput.input and print. Thus, IMHO, this is not really worth the effort, but of course this may depend on how often you have to do this operation.
Okay, so I got curious myself (and who needs a lunch break anyway?), so I tried to implement this... and it works surprisingly well. This will find the given number in the file and replace it with a matching number of - characters (not just a blank line; that's impossible without rewriting the entire file). Note that I did not thoroughly test the binary-search algorithm for edge cases, off-by-one errors, etc.
import os

def getlineat(f, pos):
    pos = f.seek(pos)
    while pos > 0 and f.read(1) != "\n":
        pos = f.seek(pos-1)
    return pos+1 if pos > 0 else 0

def bsearch(f, num):
    lower = 0
    upper = os.stat(f.name).st_size - 1
    while lower <= upper:
        mid = (lower + upper) // 2
        pos = getlineat(f, mid)
        line = f.readline()
        if not line: break # end of file
        val = int(line)
        if val == num:
            return (pos, len(line.strip()))
        elif num < val:
            upper = mid - 1
        elif num > val:
            lower = mid + 1
    return (-1, -1)

def overwrite(filename, to_remove):
    with open(filename, "r+") as f:
        positions = [bsearch(f, n) for n in to_remove]
        for n, (pos, length) in sorted(zip(to_remove, positions)):
            print(n, pos)
            if pos != -1:
                f.seek(pos)
                f.write("-" * length)

import random
to_remove = [random.randint(-500, 1500) for _ in range(10)]
overwrite("test.txt", to_remove)
This first collects all the positions to be overwritten and then does the actual overwriting in a second step; otherwise the binary search would have problems when it hits one of the previously "removed" lines. I tested this with a file holding all the numbers from 0 to 1,000 in sorted order and a list of random numbers (both in- and out-of-bounds) to be removed, and it worked just fine.
Update: Also tested it with a file with random numbers from 0 to 100,000,000 in sorted order (944 MB) and overwriting 100 random numbers, and it finished immediately, so this should indeed be O(s*logb), at least on my system (the complexity of file.seek may depend on file system, file type, etc.).
The bsearch function could also be generalized to accept another parameter value_function instead of hardcoding val = int(line). Then it could be used for binary-searching in arbitrary files, e.g. huge dictionaries, gene databases, csv files, etc., as long as the lines are sorted by that same value function.
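A minimal sketch of that generalization, repeating getlineat from above so the block is self-contained and adding a value_function parameter that defaults to int. Like the original, it assumes ASCII text with \n line endings: Python only guarantees seek() on text files for offsets returned by tell(), so seeking to arbitrary offsets works here in practice rather than by contract. It has not been tested for every edge case.

```python
import os


def getlineat(f, pos):
    """Seek to the start of the line containing offset pos and return that offset."""
    pos = f.seek(pos)
    while pos > 0 and f.read(1) != "\n":
        pos = f.seek(pos - 1)
    return pos + 1 if pos > 0 else 0


def bsearch(f, target, value_function=int):
    """Binary-search an open file whose lines are sorted by value_function.

    Returns (offset, length) of the matching line's stripped text, or (-1, -1).
    """
    lower = 0
    upper = os.stat(f.name).st_size - 1
    while lower <= upper:
        mid = (lower + upper) // 2
        pos = getlineat(f, mid)
        line = f.readline()
        if not line:
            break  # ran off the end of the file
        val = value_function(line)  # sort key of the line under the cursor
        if val == target:
            return (pos, len(line.strip()))
        elif target < val:
            upper = mid - 1
        else:
            lower = mid + 1
    return (-1, -1)
```

For a sorted word list, bsearch(f, "banana", value_function=str.strip) returns the offset and length of the line holding banana; for a CSV sorted by its first column, a key such as lambda line: line.split(",")[0] would play the same role.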

Problems reading a file in, and editing certain contents of it

So I have a file where each line has the form first name(space)last name(tab)grade.
For example:
Wanda Barber 96
I'm having trouble reading this in as a list and then editing the number.
My current code is:
def TopStudents(n):
    original = open(n)
    contents = original.readlines()
    x = contents.split('/t')
    for y in x[::2]:
        y - 100
        if y > 0: (????)
This is the point where I'm confused. I am just trying to get the first and last names of students who scored over 100%. I thought of creating a new list of students who meet this qualification, but I'm not sure how to write out the corresponding first and last names. I know I need to take every other element of the list, since the odd positions will always be the first and last names. Thank you in advance for the help!
There are several things wrong with your code:
- The open file must be closed (#1)
- readlines() returns a list, and a list has no split() method, so each line has to be split on its own (#2)
- The split uses a forward slash ('/t') instead of a backslash ('\t'), so it does not split on tabs (#3)
- The way you loop with x[::2] is not a good fit if you want to access all the members of each line (#4)
- The items you loop over are strings, so they must be converted with int() before any arithmetic or comparison with a number (#5)
- You must store the result of the calculation y - 100 somewhere; on its own the expression is thrown away (#6)
def TopStudents(n):
    original = open(n) #1
    contents = original.readlines()
    x = contents.split('/t') #2, #3
    for y in x[::2]: #4
        y - 100 #5, #6
        if y > 0:
That said, a fixed version could be:
def TopStudents(n):
    original = open(n, 'r')
    for line in original:
        name, score = line.split('\t')
        # If needed, you could split the name into first and last name:
        # first_name, last_name = name.split(' ')
        # 'score' is a string, we must convert it to an int before comparing to one, so...
        score = int(score)
        if score > 100:
            print("The student " + name + " has the score " + str(score))
    original.close() #1 - Closed the file
Note: I have focused on readability, with several comments to help you understand the code.
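A small sketch of the parsing step on its own, assuming each line looks like First Last<TAB>score as in the question. The function name students_over and the cutoff parameter are hypothetical; it accepts any iterable of lines, so an open file works directly. Splitting the name with name.split(' ', 1) puts everything after the first space into the last name, which is a guess about how multi-word names should be handled.

```python
def students_over(lines, cutoff=100):
    """Return (first_name, last_name, score) for each line whose score exceeds cutoff.

    Each line is assumed to look like 'First Last<TAB>score'.
    """
    results = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, score = line.split('\t')
        first_name, last_name = name.split(' ', 1)  # split on the first space only
        score = int(score)  # the grade arrives as a str
        if score > cutoff:
            results.append((first_name, last_name, score))
    return results
```

With a file, this becomes: with open(n) as f: results = students_over(f).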
I always prefer to use with open() because it closes the file automatically. For simplicity I used a .txt file with comma-separated values, but you can just replace the comma with \t.
def TopStudents():
    with open('temp.txt', 'r') as original:
        contents = list(filter(None, (line.strip() for line in original)))
    x = list(part.split(',') for part in contents)
    for y in x:
        if int(y[1]) > 100:
            print(y[0], y[1])

TopStudents()
This opens the file and loads all lines into contents as a list, removing blank lines and line breaks. Then it splits each line, giving a list of lists.
You then iterate through each list in x, looking at the second value (y[1]), which is the grade. If int(y[1]) is greater than 100, print the name and the grade.

Python: Trying to convert string to int, getting error: invalid literal for int() with base 10: ''

I am new to Python, so I apologize if this is a simple fix. I have been stuck on a CodeEval problem (Happy Numbers) for quite some time and I am not sure what is going wrong.
Problem Description:
Starting with any positive integer, replace the number by the sum of the squares of its digits, and repeat the process until the number equals 1, or it loops endlessly in a cycle which does not include 1. Those numbers for which this process ends in 1 are happy, while those that do not end in 1 are unhappy.
For example:
7 is a happy number (7->49->97->130->10->1)
22 is not a happy number (22->8->64->52->29->85->89->145->42->20->4->16->37->58->89 ...)
My test input and expected outcome:
1 --> 1
7 --> 1
22 --> 0
If the number is a happy number, print out 1. If not, print out 0.
Here is the full Traceback:
Traceback (most recent call last):
File "/happy_number.py", line 58, in <module>
happy_number_check("happy_numbers.txt")
File "/happy_number.py", line 55, in happy_number_check
happy_or_not(line)
File "/happy_number.py", line 33, in happy_or_not
i = int(i)
ValueError: invalid literal for int() with base 10: ''
Here is my code:
# test = 7
def happy_or_not(number):
    number = str(number)
    if number == 1:
        print 1
    else:
        new_num = 0
        for i in number:
            i = int(i)
            if i == " ":
                continue
            else:
                new_num += i**2
        if new_num == 10 or new_num == 10:
            print 1
        else:
            try:
                happy_or_not(new_num)
            except RuntimeError:
                print 0
# happy_or_not(test)
def happy_number_check(file):
    f = open(file, 'r+')
    for line in f:
        if line == "":
            continue
        else:
            happy_or_not(line)
happy_number_check("happy_numbers.txt")
What I have already tried:
Based on what I gathered from other similar questions, the issue may be that I am not able to convert a str into an int when I hit the line i = int(i). It is my understanding that I have to convert the str type into an int type before doing any math on it, yet it looks like that is where it is failing.
I tested the happy_or_not function by itself, and it does print out the value that I expect. It seems to me that the issue comes when I try to call happy_or_not inside the happy_number_check function, which reads my txt file (containing a list of numbers to test). I must not be grasping a larger principle here, so any explanations would be helpful.
This is also my first real attempt at a recursive function and there probably is a better way to structure this, so any suggestions on how to change things up to be more effective is most welcome.
Thank you in advance!
Try changing happy_number_check like this (validate each line is an integer):
def happy_number_check(file):
    with open(file, 'r+') as f: # Safer way to open a file. Will automatically close for you even if something goes wrong
        for line in f:
            if line.strip().isdigit():
                happy_or_not(line.strip())
The strip() will also make it so that you can remove this code:
if i == " ":
    continue
else:
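To see where the bad literal comes from: each line read from the file keeps its trailing newline, and happy_or_not loops over every character of str(number). A quick illustration in Python 3 (not from the original answer):

```python
# Each line read from a file keeps its trailing newline.
line = "7\n"
print(list(line))            # ['7', '\n']

# Looping over the characters therefore eventually calls int('\n'),
# which raises a ValueError like the one in the question.
try:
    for ch in line:
        int(ch)
except ValueError as exc:
    print("ValueError:", exc)

# After strip(), only digits remain, so every int() call succeeds.
print([int(ch) for ch in line.strip()])   # [7]
```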
By the way, you also have a bug in your logic. You are relying on a RuntimeError (a stack overflow, interestingly enough :-)) to terminate a test. You should really keep track of which numbers have been tried, and if you see the same number again, treat it as unhappy and print 0.
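A minimal sketch of that idea in Python 3 (this is not the Rosetta Code version): keep a set of numbers already seen and stop as soon as 1 appears or a number repeats. The name is_happy is hypothetical; happy_number_check keeps the question's name but prints the result instead of recursing.

```python
def is_happy(number):
    """Return True if number is happy, using a set of seen values to detect cycles."""
    seen = set()
    while number != 1 and number not in seen:
        seen.add(number)
        # replace the number by the sum of the squares of its digits
        number = sum(int(digit) ** 2 for digit in str(number))
    return number == 1


def happy_number_check(filename):
    """Print 1 for each happy number in the file and 0 otherwise (one number per line)."""
    with open(filename) as f:
        for line in f:
            line = line.strip()
            if line.isdigit():
                print(1 if is_happy(int(line)) else 0)
```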
Don't click this link if you don't want a straight up solution to the problem, but if you do, here is an iterative solution: http://rosettacode.org/wiki/Happy_numbers#Python

Reading from file in python, with splitting

Trouble with Python 3.x with input and output from files
So, I'm doing an assignment for my computer science class and I'm having a slight problem. My professor wants us to add some lines of code that make the program open a .txt file and read the data from it. In this case, my program is a monthly payment program. Usually, you'd ask the user to input how much they're borrowing, the interest rate, and the term in years. But the data for all three of those are already pre-written in the .txt file, which he wants us to read from. Now, I'm having trouble with my code.
This is my code:
import decimal
print("\t".join(s.rjust(15) for s in ("Payment", "Amount Paid", "Balance")))
print("-"*54)
filename = "LoanData.txt"
values = []
with open(filename) as f:
    for line in f:
        values.append([int(n) for n in line.strip().split(' ')])
for arr in values:
    try:
        balance,rate,term = arr[0],arr[1],arr[2]
    except IndexError:
        print ("Index error occured, a line doesn't have the crucial amount of entries.")
    balance *= (1 + rate * term)
    payment = balance / (12 * term)
    total = 0
    for month in range(12 * term):
        if balance < payment:
            payment = balance
        print(("{: >15.2f}\t"*3)[:-1].format(payment, total, balance))
        total += payment
        balance -= payment
And this is the error I'm getting:
Traceback (most recent call last):
File "C:/Users/Python/Desktop/loan.py", line 11, in <module>
values.append([int(n) for n in line.strip().split(' ')])
File "C:/Users/Python/Desktop/loan.py", line 11, in <listcomp>
values.append([int(n) for n in line.strip().split(' ')])
ValueError: invalid literal for int() with base 10: '5.5'
This is what the file looks like:
5000 5.5 10
25000 10.0 10
100000 8.5 20
The reason this isn't working is that you are trying to convert a decimal value (such as 5.5) into an int. Now, even if you change it to convert to a float, an additional fix is still needed, because range() in your for loop does not accept a float:
1. Change
balance,rate,term = arr[0],arr[1],arr[2]
to
balance,rate,term = int(arr[0]),arr[1],int(arr[2])
2. Change
values.append([int(n) for n in line.strip().split(' ')])
to
values.append([float(n) for n in line.strip().split(' ')])
This should get your code working. It converts all inputs into floats, and then converts balance and term into integers so that they can be used in your for loop. I tried the code on my PC, and it should be working.
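The two changes combined amount to the parsing step below, shown as a hypothetical helper parse_loan_line so it can be tried without the data file. Like the answer, it truncates balance and term to ints; whether dropping cents from the balance is acceptable depends on the assignment.

```python
def parse_loan_line(line):
    """Split 'balance rate term' into (int balance, float rate, int term).

    Every field goes through float() first, so '5.5' and '10.0' both parse;
    term must end up an int because range(12 * term) needs one.
    """
    balance, rate, term = (float(n) for n in line.split())
    return int(balance), rate, int(term)
```

In the original loop this would be values.append(parse_loan_line(line)), after which range(12 * term) receives an int.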
See the exception traceback. The error is on line 11, and it basically says '5.5' is not an int, which is correct: it's a float (a decimal).
Line 11 is currently:
values.append([int(n) for n in line.strip().split(' ')])
Try:
values.append([float(n) for n in line.strip().split(' ')])
You're trying to call int() on a decimal value (a float in Python). That's why it's not working.
Try int('5.5') in your interactive shell (IDLE), and you get the same error.
Try this:
values.append([int(float(n)) for n in line.strip().split()])
This will do the job if you don't mind losing the precision of your values, as int() truncates them toward zero (int(5.5) is 5) rather than rounding them. If you don't mind them all being floats, just use float instead of int.
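The difference between truncation and rounding, as a quick interactive-style check:

```python
# int() truncates toward zero; it does not round to the nearest whole number.
print(int(float("5.5")))    # 5
print(int(float("8.9")))    # 8
print(int(float("-2.7")))   # -2

# round() rounds to the nearest integer instead
# (Python 3 rounds exact .5 ties to the nearest even number).
print(round(float("8.9")))  # 9
print(round(2.5))           # 2
```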

Python 'int' object is not subscriptable

I'm trying to read a file and make sure that each value is in order. I don't think I'm converting the string into an integer correctly. Here is some of my code. I am also trying to use flags.
fileName = input("What file name? ")
infile = open(fileName,'r')
correct_order_flag = False
i = 0
line = infile.readline()
while line !="":
    for xStr in line.split(" "):
        if eval(xStr) [i] < i:
            correct_order_flag = True
        else:
            correct_order_flag = False
        i = i+1
if correct_order_flag:
    print("Yes, the numbers were in order")
else:
    print("No, the numbers were not in order")
count = i - 1
print("There were", count, "numbers.")
You are correct: by writing eval(xStr)[i] you are treating eval(xStr) as a sequence (like a list) that can be subscripted, but evaluating a number string gives an int, which cannot be subscripted. Since you say you want to convert the string to an int, what you probably want is just int(xStr), making that whole line:
if int(xStr) < i:
For starters, you never read past the first line: readline() is called only once, so the while loop keeps processing the same line forever. Try this:
with open(fileName) as f:
    for line in f:
        pass  # here goes your code
I'm not sure what you mean by "each value is in order", but using eval() is a VERY bad idea for any purpose.
I would like to add that, because you compare each value to its position i and only set the flag to True when the value is smaller than i, an increasing sequence such as 1 2 3 4 5 never sets the flag to True, so it would print "No, the numbers were not in order".
As Chris indicated, int(s) is the preferred way to convert a string to an integer. eval(s) is too broad and can be a security risk when evaluating data from an untrusted source.
In addition, there is another error in the script. The correct_order_flag is set on every iteration, so one entry with incorrect order can be masked by a subsequent entry in the correct order. Accordingly, you should break out of the loop when incorrect ordering is found.
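Putting these points together, a minimal sketch that checks the order and counts the numbers. It assumes "in order" means non-decreasing (the question does not say) and that numbers are separated by whitespace. Instead of breaking out of the loop, it never resets the flag once it turns False, so it can still count every number; check_order is a hypothetical name.

```python
def check_order(lines):
    """Return (in_order, count) for whitespace-separated integers across lines.

    in_order is True when every value is >= the one before it (non-decreasing).
    """
    in_order = True
    previous = None
    count = 0
    for line in lines:
        for x_str in line.split():
            value = int(x_str)          # int() instead of eval(): safe and explicit
            count += 1
            if previous is not None and value < previous:
                in_order = False        # one out-of-order pair is enough; never reset
            previous = value
    return in_order, count
```

With a file: with open(fileName) as f: in_order, count = check_order(f).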
