I'm having some trouble with my code in Python. I just started learning input and output in class, and how to have Python read in data from text files (barely; I'm still a huge beginner). Anyway, my assignment is to have my program read in data from a file and process it. Problem is, I don't have a good idea of how to do that and was wondering if you could help me out. The text file just contains a huge list of numbers for my program to use. My program finds the mean, median, and standard deviation of a list of numbers that is given to it. Now, instead of data entered by the user, my professor wants Python to use data from a file that was already pre-written.
My code:
import math
def mean(values):
    average = sum(values)*1.0/len(values)
    return average
def deviation(values):
    length = len(values)
    m = mean(values)
    total_sum = 0
    for i in range(length):
        total_sum += (values[i]-m)**2
    root = total_sum*1.0/length
    return math.sqrt(root)
def median(values):
    if len(values)%2 != 0:
        return sorted(values)[len(values)//2]
    else:
        # / rather than //: floor division would round down, e.g. (16 + 17) // 2.0 gives 16.0 instead of 16.5
        midavg = (sorted(values)[len(values)//2] + sorted(values)[len(values)//2-1])/2.0
        return midavg
def main():
    x = [15, 17, 40, 16, 9]
    print(mean(x))
    print(deviation(x))
    print(median(x))
main()
Now I have to edit my code so that it opens the file, takes the data, and runs the data through my functions. The only problem is that I don't have a good idea of how to do that. Could anyone please help?
I've tried basic input and output myself, but it hasn't helped me with the bigger picture:
def main():
    total=0
    input = open('Stats.txt')
    for nextline in input:
        mylist = nextline.split()
        for n in mylist:
            total+=int(n)
    print(total)
You have to fill your list from the file.
Open the file and iterate over the lines. Convert the content of each line to an integer and append it to your list. If you don't convert the data you'll get strings, and those won't work with mathematical operations. Close your file.
Now work with your list.
filename = 'newfile.txt'
data = []
source = open(filename)
for line in source:
    data.append(int(line))
source.close()
print(mean(data))
print(deviation(data))
# more stuff with data
There is a way to let Python close the file for you so you won't have to remember it.
with open(filename) as source:
    for line in source:
        data.append(int(line))
According to your edit this might not be what you want. If the numbers are in one line, rather than one number per line, you'll have to take a different approach (split).
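If the numbers sit on one line, or several to a line, a minimal sketch is to split every line on whitespace and convert each piece. `read_numbers` is a hypothetical helper, `float` is used so decimals also work (use `int` if the file holds only whole numbers), and an `io.StringIO` stands in for the real file so the example runs on its own.

```python
import io

def read_numbers(source):
    """Collect every whitespace-separated number from a file-like object."""
    data = []
    for line in source:
        # split() with no argument splits on any run of whitespace and
        # yields nothing for a blank line, so both layouts work
        for token in line.split():
            data.append(float(token))
    return data

# Stand-in for open('Stats.txt'): two lines, several numbers on each
fake_file = io.StringIO("15 17 40\n16 9\n")
numbers = read_numbers(fake_file)
print(numbers)  # [15.0, 17.0, 40.0, 16.0, 9.0]
```

With the real file you would write `with open(filename) as source: data = read_numbers(source)` and then pass `data` to `mean`, `deviation`, and `median` as before.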
I am working on a small project which is intended to read, create, and manipulate virtual to-do lists. There is a checkbox class, a function open_create_boxes to create a list of checkbox objects based on the contents of a plaintext file, and a function create_to_do_list_file to write a file which can be read by the other function.
# Define the checkbox class
class checkbox:
    def __init__(self,label):
        self.label = label
        self.checked = False
    def check(self):
        self.checked = True
    def read(self):
        if self.checked:
            return f"x | {self.label}"
        if not self.checked:
            return f"o | {self.label}"
def open_create_boxes(file):
    # Open the to-do list
    opened_list = open(f"{file}.dat", "r")
    #Split the to-do list by line
    parsed_line = opened_list.read().split("\n")
    next_parsed = []
    # Split each element in parsed list by the pipe symbol
    for i in parsed_line:
        next_parsed.append(i.split("|"))
    to_do_list = []
    # Iterates through the new list, creates checkbox object, checks to see if it is "checked" or not
    for i in next_parsed:
        b = checkbox(i[1])
        if i[0] == "x":
            b.check()
        to_do_list.append(b)
    return to_do_list
def create_to_do_list_file(list,label):
    # Open or create a file label.dat
    new_file = open(f"{label}.dat", "w")
    for n in list:
        new_file.write(f"\no|{n}")
create_to_do_list_file(["Task"],"filename")
open_create_boxes("filename")
Running this code gives me the error:
File "/home/genie/Desktop/checkboxes/checkboxes.py", line 41, in <module>
open_create_boxes("filename")
File "/home/genie/Desktop/checkboxes/checkboxes.py", line 28, in open_create_boxes
b = checkbox(i[1])
IndexError: list index out of range
So something is going wrong in my open_create_boxes function, where the list is coming out with <2 elements. I have re-written this code several times and get the same, or similar, errors.
Any help here? I'm a beginner, so I imagine there's an obvious fix, but I can't seem to manage.
Thanks!!
Quickly running your code and examining the created file reveals that its first line is empty. Of course you will only get one field if you attempt to split that. Why are you writing \n in front of the data?
Similarly, your code to read the file fails to account for the empty line, whether at the beginning or the end of the file.
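To see the failure in isolation, here is a small reproduction that skips the file entirely: it splits the exact text the original `create_to_do_list_file(["Task"], "filename")` writes (a leading `\n`, then `o|Task`) the same way `open_create_boxes` does.

```python
# The text the original writer produces for ["Task"]: a newline *before* each entry
contents = "\no|Task"

# Same parsing steps as open_create_boxes: split into lines, then on "|"
rows = [line.split("|") for line in contents.split("\n")]
print(rows)  # [[''], ['o', 'Task']]

# The first row has only one field, so rows[0][1] is what raises IndexError
print(len(rows[0]))  # 1
```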
More tangentially, you are forgetting to close the file you write, and generally overcomplicate matters.
Here is a quick refactoring of the two file-management functions.
def open_create_boxes(file):
    to_do_list = []
    with open(f"{file}.dat", "r") as lines:
        for line in lines:
            i = line.rstrip("\n").split("|")
            # print("#", i)
            b = checkbox(i[1])
            if i[0] == "x":
                b.check()
            to_do_list.append(b)
    return to_do_list
# Don't call your variable "list"
def create_to_do_list_file(items, label):
    with open(f"{label}.dat", "w") as new_file:
        for n in items:
            new_file.write(f"o|{n}\n")
As a general first tip for how to debug things, break your problem into smaller steps and add a print statement at various points to verify that the variables contain what you hope they should, or use a debugger and set a breakpoint to examine the program's state at those spots.
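A slightly more defensive variant of the reader, as a sketch: it skips blank lines (so a stray leading or trailing newline no longer crashes it) and splits only on the first `|`, so a label that itself contains `|` stays intact. The class is a condensed copy of the question's `checkbox` so the block runs on its own, and the temporary directory exists only to keep the example self-contained.

```python
import os
import tempfile

class checkbox:
    # Condensed copy of the question's class so this block runs on its own
    def __init__(self, label):
        self.label = label
        self.checked = False
    def check(self):
        self.checked = True
    def read(self):
        return f"{'x' if self.checked else 'o'} | {self.label}"

def create_to_do_list_file(items, label):
    with open(f"{label}.dat", "w") as new_file:
        for n in items:
            new_file.write(f"o|{n}\n")  # newline *after* each entry

def open_create_boxes(file):
    to_do_list = []
    with open(f"{file}.dat", "r") as lines:
        for line in lines:
            line = line.rstrip("\n")
            if not line.strip():
                continue  # ignore empty lines instead of crashing on them
            state, label = line.split("|", 1)  # split once: labels may contain "|"
            box = checkbox(label)
            if state == "x":
                box.check()
            to_do_list.append(box)
    return to_do_list

# Round trip inside a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "filename")
    create_to_do_list_file(["Task", "Call Bob | urgent"], path)
    with open(f"{path}.dat", "a") as extra:
        extra.write("\n")  # simulate a stray blank line at the end
    boxes = open_create_boxes(path)

print([b.read() for b in boxes])  # ['o | Task', 'o | Call Bob | urgent']
```

A line with no `|` at all would still raise a `ValueError` when unpacking, which is arguably what you want for a corrupted file.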
The code I am running so far is as follows
import os
import math
import statistics
def main ():
    infile = open('USPopulation.txt', 'r')
    values = infile.read()
    infile.close()
    index = 0
    while index < len(values):
        values(index) = int(values(index))
        index += 1
    print(values)
main()
The text file contains 41 rows of numbers each entered on a single line like so:
151868
153982
156393
158956
161884
165069
168088
etc.
My task is to create a program that shows the average change in population during the time period, the year with the greatest increase in population during the time period, and the year with the smallest increase in population (from the previous year) during the time period.
The code will print each of the text file's entries on a single line, but upon trying to convert them to int for use with the statistics package I get the following error:
values(index) = int(values(index))
SyntaxError: can't assign to function call
The values(index) = int(values(index)) line was taken from my reading as well as from resources on Stack Overflow.
You could change values = infile.read() to values = list(infile.read()), but that gives you a list of individual characters (digits and '\n's), not a list of numbers.
At the end of every line in the file there is an invisible '\n' that marks the line break, so an easy way to get one string per line is values = values.split('\n') (after values = infile.read()). Be aware that if the file ends with a newline, split('\n') leaves an empty string as the last item, and int('') raises a ValueError; calling values.split() with no argument splits on any whitespace and avoids that.
The while loop you have can easily be replaced with a for loop over range(len(values)).
The values(index) = int(values(index)) line fails because parentheses mean a function call; indexing uses square brackets. In a while loop that is values[index] = int(values[index]), and in a for loop values[i] = int(values[i]). Either way, values becomes a list of integers.
Here is how I would set it up:
import os
import math
import statistics
def main ():
    infile = open('USPopulation.txt', 'r')
    values = infile.read()
    infile.close()
    values = values.split() # Splits on whitespace (line breaks) without leaving an empty string from a trailing newline
    for i in range(len(values)): # loops the length of values and turns each part of values into integers
        values[i] = int(values[i])
    changes = []
    # Use a for loop to get the changes between each number.
    for i in range(len(values)-1): # you put the -1 because values[i+1] would go past the last index on the final pass
        changes.append(values[i+1] - values[i]) # This will get the difference between the current and the next.
    print('The max change :', max(changes), 'The minimal change :', min(changes))
    # changes[i] is the change from values[i] to values[i+1], so changes has one item fewer than values and the same index points at the starting population.
    print('A change of :', max(changes), 'Happened at', values[changes.index(max(changes))]) # changes.index(max(changes)) gets the position of the highest number in changes, and finds what population has the same index (position) as it.
    print('A change of :', min(changes), 'Happened at', values[changes.index(min(changes))]) #pretty much the same as above just with minimum
    # If you wanted to print the second number, you would do values[changes.index(min(changes)) + 1]
main()
If you need any clarification on anything I did in the code, just ask.
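The task also asks for the average change, which the code above does not print. A minimal sketch of that part, assuming the values are already a list of integers: `average_change` and `extreme_changes` are hypothetical helper names, and because the question does not say which year the first row belongs to, `first_year` is an assumed parameter you would set yourself. The sample numbers are the first seven rows shown in the question.

```python
def average_change(values):
    """Mean of the year-over-year differences."""
    changes = [values[i + 1] - values[i] for i in range(len(values) - 1)]
    return sum(changes) / len(changes)

def extreme_changes(values, first_year):
    """Return (year, change) for the largest and the smallest increase.

    first_year is an assumption: the year that values[0] belongs to.
    A change at index i is the increase from values[i] to values[i + 1],
    so it is reported under the later year, first_year + i + 1.
    """
    changes = [values[i + 1] - values[i] for i in range(len(values) - 1)]
    i_max = changes.index(max(changes))
    i_min = changes.index(min(changes))
    return ((first_year + i_max + 1, changes[i_max]),
            (first_year + i_min + 1, changes[i_min]))

sample = [151868, 153982, 156393, 158956, 161884, 165069, 168088]
print(average_change(sample))
print(extreme_changes(sample, first_year=0))  # with 0, the "year" is just the row index
```

Note that the average change is the same as (last value - first value) / (number of values - 1), since the intermediate terms cancel.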
I personally would use numpy for reading a text file.
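In your case, I would do it like this:
import numpy as np
def main ():
    population = np.loadtxt('USPopulation.txt')
    changes = np.diff(population)  # change from each row to the next
    biggest = np.argmax(changes)   # argmax returns an index, not a value
    smallest = np.argmin(changes)
    print(f'average change = {changes.mean()}')
    print(f'largest increase = {changes[biggest]} (between rows {biggest} and {biggest + 1})')
    print(f'smallest increase = {changes[smallest]} (between rows {smallest} and {smallest + 1})')
main()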
I am studying the book "Python for Everybody" by Charles R. Severance, and I have a question about Exercise 2 from Chapter 7.
The task is to go through the mbox-short.txt file and "When you encounter a line that starts with “X-DSPAM-Confidence:” pull apart the line to extract the floating-point number on the line. Count these lines and then compute the total of the spam confidence values from these lines. When you reach the end of the file, print out the average spam confidence."
Here is my way of doing this task:
fname = input('Enter the file name: ')
try:
    fhand = open(fname)
except:
    print('File cannot be opened:', fname)
    exit()
count = 0
values = list()
for line in fhand:
    if line.startswith('X-DSPAM-Confidence:'):
        string = line
        count = count + 1
        colpos = string.find(":")
        portion = string[colpos+1:]
        portion = float(portion)
        values.append(portion)
print('Average spam confidence:', sum(values)/count)
I know this code works because I get the same result as in the book, however, I think this code can be simpler. The reason I think so is because I used a list in this code (declared it and then stored values in it). However, "Lists" is the next topic in the book and when solving this task I didn't know anything about lists and had to google them. I solved this task this way, because this is what I'd do in the R language (which I am already quite familiar with), I'd make a vector in which I'd store the values from my iteration.
So my question is: Can this code be simplified? Can I do the same task without using list? If yes, how can I do it?
You could change the "values" object to a float. The overhead of a list is not really needed in this problem.
values = 0.0
Then in the loop use
values += portion
Otherwise, there really isn't a simpler way, because this problem has several tasks and you must complete all of them to solve it:
Open File
Check For Error
Loop Through Lines
Find certain lines
Total up said lines
Print average
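Following those steps without a list, a sketch could keep just two running numbers. To keep it testable, the loop is wrapped in a hypothetical function `average_spam_confidence` that takes any iterable of lines (an open file works); the sample lines are made up for illustration, not taken from mbox-short.txt.

```python
def average_spam_confidence(lines):
    total = 0.0  # running sum of the confidence values
    count = 0    # how many matching lines we have seen
    for line in lines:
        if line.startswith('X-DSPAM-Confidence:'):
            colpos = line.find(':')
            total += float(line[colpos + 1:])  # float() ignores surrounding whitespace
            count += 1
    if count == 0:
        return None  # avoid dividing by zero when nothing matched
    return total / count

# Made-up sample lines; with a real file you would pass the open file object instead
sample = [
    'From someone@example.com\n',
    'X-DSPAM-Confidence: 0.5\n',
    'Subject: hi\n',
    'X-DSPAM-Confidence: 0.75\n',
]
print('Average spam confidence:', average_spam_confidence(sample))  # Average spam confidence: 0.625
```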
If you can do it in 3 lines of code, great, but that doesn't necessarily make what goes on in the background simpler. It will also probably look ugly.
You could filter the file's lines before the loop; then you can collapse the other variables into one and get the values using a list comprehension. The count is then just the length of that list.
interesting_lines = (line for line in fhand if line.startswith('X-DSPAM-Confidence:'))
values = [float(line[(line.find(":")+1):]) for line in interesting_lines]
count = len(values)
Can I do the same task without using list?
If the output only needs to be an average, yes: you can accumulate the sum and the count in their own variables and never need a list to call sum(values) on.
Note that open(fname) is giving you an iterable collection anyway, and you're looping over the "list of lines" in the file.
List-comprehensions can often replace for-loops that add to a list:
fname = input('Enter the file name: ')
try:
    fhand = open(fname)
except:
    print('File cannot be opened:', fname)
    exit()
values = [float(l[l.find(":")+1:]) for l in fhand if l.startswith('X-DSPAM-Confidence:')]
print('Average spam confidence:', sum(values)/len(values))
The inner part is simply your code combined, so perhaps less readable.
EDIT: Without using lists, it can be done with "reduce":
from functools import reduce
fname = input('Enter the file name: ')
try:
    fhand = open(fname)
except:
    print('File cannot be opened:', fname)
    exit()
# named total rather than sum so the built-in sum() is not shadowed
total, count = reduce(lambda acc, l: (acc[0] + float(l[l.find(":")+1:]), acc[1]+1) if l.startswith('X-DSPAM-Confidence:') else acc, fhand, (0,0))
print('Average spam confidence:', total / count)
Reduce is often called "fold" in other languages, and it basically allows you to iterate over a collection with an "accumulator". Here, I iterate the collection with an accumulator which is a tuple of (sum, count). With each matching line, we add to the sum and increment the count. See the functools.reduce documentation.
All this being said, "simplify" does not necessarily mean as little code as possible, so I would stick with your own code if you're not comfortable with these shorthand notations.
I'm having some trouble here. For my CS assignment, I have to have Python take data from a file on my PC and run the data through my program.
So, this code works fine on http://repl.it/languages/Python, but not when I run it in Python myself. I'm assuming it's because my code has some Python 2 syntax? I can't seem to fix it. Can you help? I also have a second, smaller question: as I said above, I have to add code to my program that takes data from a file and runs it through my program. Here is what I have.
import math
def mean(values):
    average = sum(values)*1.0/len(values)
    return average
def deviation(values):
    length = len(values)
    m = mean(values)
    total_sum = 0
    for i in range(length):
        total_sum += (values[i]-m)**2
    root = total_sum*1.0/length
    return math.sqrt(root)
def median(values):
    if len(values)%2 != 0:
        return sorted(values)[len(values)/2]
    else:
        midavg = (sorted(values)[len(values)/2] + sorted(values)[len(values)/2-1])/2.0
        return midavg
def main():
    x = [15, 17, 40, 16, 9]
    print mean(x)
    print deviation(x)
    print median(x)
main()
How do I specifically have the program take data from the file and run it through my program? The data is just a bunch of numbers, by the way. It's been giving me trouble for some hours now. Thanks if you can help out.
This is what I know about the opening/closing file stuff so far
f = open("filename.txt")
data = f.readlines()
f.close()
Apparently your code is written for Python 2.x:
I'm assuming because my line of code has some Python 2.0 lines of code?
So yes, you do have a problem: In python3.x, print became a function.
Thus, your prints need to be changed:
print mean(x)
print deviation(x)
print median(x)
Becomes
print(mean(x))
print(deviation(x))
print(median(x))
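There is a second Python 3 issue: the / operator now always returns a float, so sorted(values)[len(values)/2] in median raises a TypeError (list indices must be integers). Use // for the indices, and keep /2.0 for averaging the two middle values.
Also, your part about opening and closing files is unclear.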
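For the file part, here is a minimal Python 3 sketch of the whole program, reading one number per line with `readlines()` as in the question's own snippet and skipping blank lines. `load_numbers` is a hypothetical helper, and because the question never names its data file, the demo writes the sample numbers to a temporary `numbers.txt` (also hypothetical) so the block runs on its own.

```python
import math
import os
import tempfile

def mean(values):
    return sum(values) / len(values)  # / is true division in Python 3

def deviation(values):
    m = mean(values)
    total_sum = sum((v - m) ** 2 for v in values)
    return math.sqrt(total_sum / len(values))  # population standard deviation

def median(values):
    ordered = sorted(values)
    mid = len(ordered) // 2  # // keeps the index an int
    if len(ordered) % 2 != 0:
        return ordered[mid]
    return (ordered[mid] + ordered[mid - 1]) / 2.0

def load_numbers(path):
    with open(path) as f:  # the file is closed automatically
        data = f.readlines()
    # Convert each non-blank line to a number
    return [float(line) for line in data if line.strip()]

def main(path):
    x = load_numbers(path)
    print(mean(x))
    print(deviation(x))
    print(median(x))

# Demo with a throwaway file holding the question's sample numbers
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "numbers.txt")
    with open(path, "w") as f:
        f.write("15\n17\n40\n16\n9\n")
    main(path)
```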
What I want to do is write code that opens a file (whose name is written in the code, not entered by the user) and picks a random line from it, whatever it contains (a long line, an IP address, or even a single word), then at the end of the loop puts that line into a string so I can use it in other parts of the code.
I tried using random.choice(lines) but wasn't sure how to continue from there.
After that I tried:
import random
def random_line(afile):
    line = next(afile)
    for num, aline in enumerate(afile):
        if random.randrange(num + 2): continue
        line = aline
    return line
which, for some reason, also didn't work for me.
The last method you posted worked for me. Maybe you are not opening the file correctly. Here is another approach, using random.choice
import random
def random_line(f):
    return random.choice([line for line in f])
f = open("sample.txt", 'r')
print(random_line(f))
f.close()
Edit:
Another way would be (thanks to @zhangxaochen):
def random_line(f):
    return random.choice(f.readlines())
Translating another answer of mine from C:
def random_line(afile):
    count = 0
    kept_line = None
    for line in afile:
        if random.randint(0, count) == 0:
            kept_line = line
        count += 1
    return kept_line
Edit: This appears to do the same thing as random.choice. I wonder if they use the same algorithm?
Edit 2: from the comments and a little experimentation it appears random.choice uses a different algorithm, which will be much more efficient if all of the elements are already in memory. This isn't usually the case for files unless you use readlines. There will be a tradeoff between having to keep the entire file in memory vs. having to calculate n random numbers.
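To tie this back to the question (a hard-coded file, with the chosen line stored in a string), here is a sketch built on the single-pass approach from the last answer, so the whole file never has to be held in memory. `sample.txt` is the file name from the earlier answer, used here as an assumption, and `random_line_from` is a hypothetical wrapper; the demo writes a small temporary file so it runs anywhere, and `rstrip('\n')` drops the trailing newline so the result is just the line's text.

```python
import os
import random
import tempfile

def random_line(afile):
    """Pick one line uniformly at random in a single pass (reservoir sampling)."""
    kept_line = None
    for count, line in enumerate(afile):
        # The (count + 1)-th line replaces the kept one with probability 1/(count + 1)
        if random.randint(0, count) == 0:
            kept_line = line
    return kept_line

def random_line_from(path):
    with open(path) as afile:  # the file is closed automatically afterwards
        line = random_line(afile)
    return None if line is None else line.rstrip('\n')

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.txt')
    with open(path, 'w') as f:
        f.write('a long line of text\n192.168.0.1\nword\n')
    chosen = random_line_from(path)  # a plain str you can reuse elsewhere

print(chosen)
```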