I'm trying to calculate the average numeric grade for students whose section number (row[0]) is 14224. How do I tell my program to ignore any rows with a grade of 'W'?
import sys
import csv

def findnumericgrade(grade):
    if grade == 'A':
        return 4.0
    elif grade == 'B':
        return 3.0
    else:
        return 2.0

def loaddata(filename, course):
    count = 0
    total = 0.0
    with open(filename, 'r') as f:
        lines = csv.reader(f)
        next(lines)
        for row in lines:
            if course in row[0]:
                get_grade = findnumericgrade(row[3])
                total += float(get_grade)
                count += 1
    avg = total / count
    print(f"The {course} average is: {round(avg, 2)}")

loaddata('studentdata.csv', sys.argv[1])
#example of studentdata.csv:
There are certainly a number of ways. The easiest is probably to check for the 'W' string and use the continue statement to skip to the next iteration of the loop.
def loaddata(filename, course):
    count = 0
    total = 0.0
    with open(filename, 'r') as f:
        lines = csv.reader(f)
        next(lines)
        for row in lines:
            if row[3] == 'W':
                continue  # Go to next iteration in loop
            if course in row[0]:
                get_grade = findnumericgrade(row[3])
                total += float(get_grade)
                count += 1
    avg = total / count
    print(f"The {course} average is: {round(avg, 2)}")
You can also do this by adding an and condition to your if statement, so that it also checks that Course_Grade is not 'W'.
def loaddata(filename, course):
    count = 0
    total = 0.0
    with open(filename, 'r') as f:
        lines = csv.reader(f)
        next(lines)
        for row in lines:
            if course in row[0] and row[3] != 'W':
                get_grade = findnumericgrade(row[3])
                total += float(get_grade)
                count += 1
    avg = total / count
    print(f"The {course} average is: {round(avg, 2)}")
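A variation on the same idea, as a rough sketch: keep the grade points in a dict and skip any grade that is not in it, which covers 'W' without a special case. The names GRADE_POINTS and section_average are made up for this illustration, and the C = 2.0 mapping is an assumption based on the else branch in the question.

```python
import csv
import io

# Hypothetical mapping; grades missing from it (e.g. 'W') are skipped.
GRADE_POINTS = {'A': 4.0, 'B': 3.0, 'C': 2.0}

def section_average(f, course):
    """Return the mean grade points for a section, or None if no rows count."""
    rows = csv.reader(f)
    next(rows)  # skip the header row
    points = [GRADE_POINTS[row[3]] for row in rows
              if course in row[0] and row[3] in GRADE_POINTS]
    return sum(points) / len(points) if points else None

sample = io.StringIO(
    "Section_Number,Prof_ID,Student_ID,Course_Grade,Student_Name,Course_ID\n"
    "14224,5,109,B,John Smith,IT1130\n"
    "14224,5,111,W,Kristen Hawkins,IT1130\n"
    "14224,5,112,A,Tom Brady,IT1130\n"
)
print(section_average(sample, '14224'))  # 3.5
```

Returning None when nothing matches also avoids the ZeroDivisionError that total / count raises for a section with no graded rows.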
The above solutions are probably most practical, since this looks like some sort of utility script, but depending on how large you expect your dataset to be, you could use something like pandas. Then you'd have access to all of the data manipulation and analysis tools it offers.
import sys
import pandas as pd

def find_numeric_grade(grade):
    if grade == 'A':
        return 4.0
    elif grade == 'B':
        return 3.0
    else:
        return 2.0

df = pd.read_csv('studentdata.csv')
section_number = int(sys.argv[1])
print(df[(section_number == df['Section_Number']) & (df['Course_Grade'] != 'W')]
      ['Course_Grade'].apply(find_numeric_grade).mean())
*Solutions tested with the following data in studentdata.csv
Section_Number,Prof_ID,Student_ID,Course_Grade,Student_Name,Course_ID
14224,5,109,B,John Smith,IT1130
14224,5,110,B,Jennifer Johnson,IT1130
14224,5,111,W,Kristen Hawkins,IT1130
14224,5,112,A,Tom Brady,IT1130
14224,5,113,C,Cam Newton,IT1130
14224,5,114,C,Tim Tebow,IT1130
14225,5,115,A,Peyton Manning,IT1130
14225,5,116,B,Maria Sharapova,IT1130
14225,5,117,W,Brian McCoy,IT1130
if course in row[0]:
    if row[3] != 'W':
        get_grade = findnumericgrade(row[3])
        total += float(get_grade)
        count += 1
avg = total / count
def list():
list_name = []
list_name_second = []
with open('CoinCount.txt', 'r', encoding='utf-8') as csvfile:
num_lines = 0
for line in csvfile:
num_lines = num_lines + 1
i = 0
while i < num_lines:
for x in volunteers[i].name:
if x not in list_name: # l
f = 0
while f < num_lines:
addition = []
if volunteers[f].true_count == "Y":
addition.append(1)
else:
addition.append(0)
f = f + 1
if f == num_lines:
decimal = sum(addition) / len(addition)
d = decimal * 100
percentage = float("{0:.2f}".format(d))
list_name_second.append({'Name': x , 'percentage': str(percentage)})
list_name.append(x)
i = i + 1
if i == num_lines:
def sort_percentages(list_name_second):
return list_name_second.get('percentage')
print(list_name_second, end='\n\n')
Above is a segment of my code. It essentially means:
If the name on the nth line hasn't been listed already, find the percentage of accurate coin counts for it, add that to a list, and then print the list.
The issue is that when I run this, the program gets stuck in a while loop, repeatedly executing addition.append(1). I'm not sure why, so could you (using the code displayed) let me know how to update the code to make it run as intended? If it helps, the first two lines of the txt file read:
Abena,5p,325.00,Y
Malcolm,1p,3356.00,N
This doesn't matter much, but just in case you need it. I suspect that the reason it is stuck looping on addition.append(1) is that the first line has a "Y" as its true_count.
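One possible direction, sketched under assumptions: since each line appears to hold a name, a coin type, a weight and a Y/N flag, the per-name percentage can be computed in a single pass with two counters, with no nested while loops to get stuck in. The function name accuracy_by_name and the column meanings are guesses from the two sample lines, not taken from the original program.

```python
from collections import defaultdict

def accuracy_by_name(lines):
    """Map each name to the percentage of its rows flagged 'Y'."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Assumed layout: name, coin type, weight, Y/N flag
        name, _coin, _weight, flag = line.split(',')
        total[name] += 1
        if flag == 'Y':
            correct[name] += 1
    return {name: round(100 * correct[name] / total[name], 2) for name in total}

print(accuracy_by_name(["Abena,5p,325.00,Y", "Malcolm,1p,3356.00,N"]))
# {'Abena': 100.0, 'Malcolm': 0.0}
```

With a real file, the open file object from open('CoinCount.txt') could be passed in place of the list, since the function only iterates over lines.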
I want to divide a file into two random halves with Python. I have a small script, but it does not divide the file exactly in two. Any suggestions?
import random

fin = open("test.txt", 'rb')
f1out = open("test1.txt", 'wb')
f2out = open("test2.txt", 'wb')
for line in fin:
    r = random.random()
    if r < 0.5:
        f1out.write(line)
    else:
        f2out.write(line)
fin.close()
f1out.close()
f2out.close()
The notion of randomness means that you will not be able to deterministically rely on the number to produce an equal amount of results below 0.5 and above 0.5.
You could use a counter and check if it is even or odd after shuffling all the lines in a list:
file_lines = [line for line in fin]
random.shuffle(file_lines)
counter = 0
for line in file_lines:
    counter += 1
    if counter % 2 == 0:
        f1out.write(line)
    else:
        f2out.write(line)
You can use this pattern with any number (10 in this example):
counter = 0
for line in file_lines:
    counter += 1
    if counter % 10 == 0:
        f1out.write(line)
    elif counter % 10 == 1:
        f2out.write(line)
    elif counter % 10 == 2:
        f3out.write(line)
    elif counter % 10 == 3:
        f4out.write(line)
    elif counter % 10 == 4:
        f5out.write(line)
    elif counter % 10 == 5:
        f6out.write(line)
    elif counter % 10 == 6:
        f7out.write(line)
    elif counter % 10 == 7:
        f8out.write(line)
    elif counter % 10 == 8:
        f9out.write(line)
    else:
        f10out.write(line)
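The long if/elif chain can be avoided by indexing into a list with the remainder. A small sketch, using in-memory lists instead of file handles so it runs on its own (round_robin is a name made up here, and the counter starts at 0 rather than 1):

```python
import random

def round_robin(lines, n_parts, seed=None):
    """Shuffle lines, then deal them out to n_parts lists in turn."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    parts = [[] for _ in range(n_parts)]
    for counter, line in enumerate(shuffled):
        parts[counter % n_parts].append(line)
    return parts

parts = round_robin(["a\n", "b\n", "c\n", "d\n", "e\n"], 2)
print([len(p) for p in parts])  # [3, 2]
```

With real files, parts would be a list of open file objects and append would become write.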
random will not give you exactly half each time. If you flip a coin 10 times, you don't necessarily get 5 heads and 5 tails.
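To put a number on the coin example: the chance of exactly 5 heads in 10 fair flips follows from the binomial formula and works out to roughly a quarter. A quick check (prob_exact_split is just an illustrative name, and math.comb needs Python 3.8 or later):

```python
from math import comb

def prob_exact_split(n_flips):
    """Probability that n_flips fair coin flips give exactly n_flips/2 heads."""
    if n_flips % 2:
        return 0.0  # an odd number of flips can never split evenly
    return comb(n_flips, n_flips // 2) / 2 ** n_flips

print(prob_exact_split(10))  # 252 / 1024
```

The same reasoning applies to lines in a file: the more lines there are, the less likely an exact 50/50 split becomes.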
One approach would be to use the partitioning method described in Python: Slicing a list into n nearly-equal-length partitions, but shuffling the result beforehand.
import random

N_FILES = 2

# Read and shuffle the input first, so n is known before the outputs are opened.
with open("test.txt", 'rb') as fin:
    lines = fin.readlines()
random.shuffle(lines)
n = len(lines)

# Never open more output files than there are lines.
n_out = min(N_FILES, n)
out = [open("test{}.txt".format(i), 'wb') for i in range(n_out)]
size = n / float(n_out) if n_out else 0.0
partitions = [lines[int(round(size * i)):int(round(size * (i + 1)))] for i in range(n_out)]
for f, part in zip(out, partitions):
    for line in part:
        f.write(line)
for f in out:
    f.close()
The code above will split the input file into N_FILES (defined as a constant at the top) of approximately equal size, but never splitting beyond one line per file. Handling things this way would let you put this into a function that can take a variable number of files to split into without having to alter code for each case.
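As a rough illustration of that last point, the partitioning step could be pulled into a helper along these lines. partition is a hypothetical name, and the sketch works on a list of lines rather than on files so it can be run as is:

```python
import random

def partition(lines, n_parts, seed=None):
    """Shuffle lines and slice them into n_parts nearly equal chunks."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    n_parts = min(n_parts, len(shuffled))  # never more chunks than lines
    if n_parts <= 0:
        return []
    size = len(shuffled) / n_parts
    return [shuffled[int(round(size * i)):int(round(size * (i + 1)))]
            for i in range(n_parts)]

chunks = partition(["l1\n", "l2\n", "l3\n", "l4\n", "l5\n"], 2)
print(sorted(len(c) for c in chunks))  # [2, 3]
```

Each chunk could then be written to its own output file in a loop.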
Here is my code. I need it to read lines of text composed only of y's, a's and n's (y meaning yes, n meaning no, a meaning abstain), and I'm trying to add up the number of yes votes. The text file looks like this:
Aberdeenshire
yyynnnnynynyannnynynanynaanyna
Midlothian
nnnnynyynyanyaanynyanynnnanyna
Berwickshire
nnnnnnnnnnnnnnnnnnnnynnnnnynnnnny
here is my code:
def main():
    file = open("votes.txt")
    lines = file.readlines()
    votes = 0
    count = 0
    count_all = 0
    for m in range(1,len(lines),2):
        line = lines[m]
        for v in line:
            if v == 'a':
                votes += 1
            elif v == 'y':
                count_all += 1
                count += 1
                votes += 1
            else:
                count_all += 1
    print("percentage:" + (str(count/count_all)))
    print("Overall there were ", (count/count_all)," yes votes")

main()
First of all, note that file.readlines() keeps the \n at the end of each line. In your code that character falls into the else block, so it is counted as a no:
>>> with open("votes.txt","r") as f:
...     print(f.readlines())
...
['Aberdeenshire\n',
'yyynnnnynynyannnynynanynaanyna\n',
'Midlothian\n',
'nnnnynyynyanyaanynyanynnnanyna\n',
'Berwickshire\n',
'nnnnnnnnnnnnnnnnnnnnynnnnnynnnnny\n']
So that might explain why you don't get the right numbers...
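A minimal sketch of that fix, keeping the loop from the question but stripping the newline before counting. yes_fraction is a made-up name, and leaving abstentions out of the denominator mirrors what count_all does in the question:

```python
def yes_fraction(lines):
    """Fraction of yes votes among yes and no votes on the odd-indexed lines."""
    yes = 0
    decided = 0
    for m in range(1, len(lines), 2):
        for v in lines[m].strip():  # strip() drops the trailing '\n'
            if v == 'y':
                yes += 1
                decided += 1
            elif v == 'n':
                decided += 1
            # 'a' (abstain) is ignored on purpose
    return yes / decided if decided else 0.0

print(yes_fraction(["Aberdeenshire\n", "yyna\n", "Midlothian\n", "nny\n"]))  # 0.5
```

Testing for 'n' explicitly, instead of using a bare else, means stray characters can no longer be counted as no votes.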
Now, to make the code a bit more efficient, we could look into the count method of str, and maybe also get rid of those \n with a split rather than a readlines:
with open("votes.txt","r") as f:
    full = f.read()
lines = full.split("\n")
votes = 0
a = 0
y = 0
n = 0
for m in range(1,len(lines),2):
    line = lines[m]
    votes += len(line)  # I'm counting n's as well here
    a += line.count("a")
    y += line.count("y")
    n += line.count("n")
print("Overall, there were " + str(100 * y / (y + n)) + "% yes votes.")
Hope that helped!
A more or less Pythonic one-liner, though it doesn't give you the votes for each person/city:
from collections import Counter
l = """Aberdeenshire
yyynnnnynynyannnynynanynaanyna
Midlothian
nnnnynyynyanyaanynyanynnnanyna
Berwickshire
nnnnnnnnnnnnnnnnnnnnynnnnnynnnnny"""
Counter([char for line in l.split('\n')[1::2] for char in line.strip()])
Returns:
Counter({'a': 11, 'n': 60, 'y': 22})
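If per-region totals are wanted, the same Counter idea can be applied to each name/votes pair. This is only a sketch; votes_by_region is a name invented here, and it assumes names and vote strings strictly alternate:

```python
from collections import Counter

def votes_by_region(text):
    """Map each region name to a Counter of its vote characters."""
    lines = [line.strip() for line in text.split('\n') if line.strip()]
    # Even indices hold names, odd indices hold the vote strings.
    return {name: Counter(votes) for name, votes in zip(lines[::2], lines[1::2])}

sample = "Aberdeenshire\nyyna\nMidlothian\nnny\n"
for region, counts in votes_by_region(sample).items():
    print(region, counts['y'], counts['n'], counts['a'])
```

A Counter returns 0 for characters it has never seen, so a region with no abstentions needs no special handling.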
I have a large number of text files (>1000) with the same format for all.
The part of the file I'm interested in looks something like:
# event 9
num: 1
length: 0.000000
otherstuff: 19.9 18.8 17.7
length: 0.000000 176.123456
# event 10
num: 1
length: 0.000000
otherstuff: 1.1 2.2 3.3
length: 0.000000 1201.123456
I need only the second index value of the second instance of the defined variable, in this case length. Is there a pythonic way of doing this (i.e. not sed)?
My code looks like:
with open(wave_cat,'r') as catID:
    for i, cat_line in enumerate(catID):
        if not len(cat_line.strip()) == 0:
            line = cat_line.split()
            #replen = re.sub('length:','length0:','length:')
            if line[0] == '#' and line[1] == 'event':
                num = long(line[2])
            elif line[0] == 'length:':
                Length = float(line[2])
If you can read the entire file into memory, just do a regex against the file contents:
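import glob
import re

# Captures the second value on the second "length:" line of an event.
pat = re.compile(r'^# event.*?^length:.*?^length:\s[\d.]+\s(\d+\.\d+)', re.S | re.M)

for fn in glob.glob('*.txt'):  # placeholder pattern; substitute your own list of files
    with open(fn) as f:
        try:
            nm = pat.findall(f.read())[1]  # index 1 selects the second match in the file
        except IndexError:
            nm = ''
    print(nm)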
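For larger files, use mmap:
import glob
import mmap
import re

nth = 1  # zero-based index of the match to print
# A bytes pattern is needed because mmap exposes the file contents as bytes.
pat = re.compile(rb'^# event.*?^length:.*?^length:\s[\d.]+\s(\d+\.\d+)', re.S | re.M)

for fn in glob.glob('*.txt'):  # placeholder pattern; substitute your own list of files
    with open(fn, 'rb') as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        for i, m in enumerate(pat.finditer(mm)):
            if i == nth:
                print(m.group(1).decode())
                break
        mm.close()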
Use a counter:
with open(wave_cat,'r') as catID:
    ct = 0
    for i, cat_line in enumerate(catID):
        if not len(cat_line.strip()) == 0:
            line = cat_line.split()
            #replen = re.sub('length:','length0:','length:')
            if line[0] == '#' and line[1] == 'event':
                num = int(line[2])  # int replaces Python 2's long
            elif line[0] == 'length:':
                ct += 1
                if ct == 2:
                    Length = float(line[2])
                    ct = 0
You're on the right track. It'll probably be a bit faster deferring the splitting unless you actually need it. Also, if you're scanning lots of files and only want the second length entry, it will save a lot of time to break out of the loop once you've seen it.
length_seen = 0
elements = []
with open(wave_cat,'r') as catID:
    for line in catID:
        line = line.strip()
        if not line:
            continue
        if line.startswith('# event'):
            element = {'num': int(line.split()[2])}
            elements.append(element)
            length_seen = 0
        elif line.startswith('length:'):
            length_seen += 1
            if length_seen == 2:
                element['length'] = float(line.split()[2])
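The early exit mentioned above is not shown in that snippet. A sketch of what it might look like when only the first such value is needed per file (second_length is a hypothetical helper, and it takes any iterable of lines so it can be tried without a file):

```python
def second_length(lines):
    """Return the second number on the second 'length:' line, or None."""
    length_seen = 0
    for line in lines:
        line = line.strip()
        if line.startswith('length:'):
            length_seen += 1
            if length_seen == 2:
                # fields: 'length:', first value, second value
                return float(line.split()[2])  # returning ends the scan early
    return None

sample = [
    "# event 9\n", "num: 1\n", "length: 0.000000\n",
    "otherstuff: 19.9 18.8 17.7\n", "length: 0.000000 176.123456\n",
]
print(second_length(sample))  # 176.123456
```

Passing an open file object works the same way, and the rest of the file is never read once the value is found.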
I am trying to determine the median and mode from a list of numbers in "numbers.txt" file.
I am EXTREMELY new to python and have ZERO coding experience.
This is what I have so far calculating mean, sum, count, max, and min but I have no idea where to go from here.
number_file_name = 'numbers.txt'
number_sum = 0
number_count = 0
number_average = 0
number_maximum = 0
number_minimum = 0
number_range = 0
do_calculation = True
while(do_calculation):
    while (True):
        try:
            # Get the name of a file
            number_file_name = input('Enter a filename. Be sure to include .txt after the file name: ')
            random_number_count = 0
            print('')
            random_number_file = open(number_file_name, "r")
            print ('File Name: ', number_file_name, ':', sep='')
            print('')
            numbers = random_number_file.readlines()
            random_number_file.close
        except:
            print('An error occured trying to read', random_number_file)
        else:
            break
    try:
        number_file = open(number_file_name, "r")
        is_first_number = True
        for number in number_file:
            number = int(number)  # convert the read string to an int
            if (is_first_number):
                number_maximum = number
                number_minimum = number
                is_first_number = False
            number_sum += number
            number_count += 1
            if (number > number_maximum):
                number_maximum = number
            if (number < number_minimum):
                number_minimum = number
        number_average = number_sum / number_count
        number_range = number_maximum - number_minimum
        index = 0
        listnumbers = 0
        while index < len(numbers):
            numbers[index] = int(numbers[index])
            index += 1
        number_file.close()
    except Exception as err:
        print ('An error occurred reading', number_file_name)
        print ('The error is', err)
    else:
        print ('Sum: ', number_sum)
        print ('Count:', number_count)
        print ('Average:', number_average)
        print ('Maximum:', number_maximum)
        print ('Minimum:', number_minimum)
        print ('Range:', number_range)
        print ('Median:', median)
    another_calculation = input("Do you want to enter in another file name? (y/n): ")
    if(another_calculation !="y"):
        do_calculation = False
If you want to find the median and mode of the numbers, you need to keep track of the actual numbers you've encountered so far. You can either create a list holding all the numbers, or a dictionary mapping numbers to how often you've seen those. For now, let's create a (sorted) list from those numbers:
with open("numbers.txt") as f:
    numbers = []
    for line in f:
        numbers.append(int(line))
    numbers.sort()
Or shorter: numbers = sorted(map(int, f))
Now you can use built-in functions to calculate the count, max, min and sum:
count = len(numbers)
max_num = max(numbers)
min_num = min(numbers)
sum_of_nums = sum(numbers)
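Calculating the mode and median can also be done very quickly using the sorted list of numbers (for an even count, this median is the upper of the two middle values rather than their average):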
median = numbers[len(numbers)//2]
mode = max(numbers, key=lambda n: numbers.count(n))
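The standard library's statistics module offers the same two calculations. A short sketch with inline sample numbers (made up for the example, not taken from numbers.txt):

```python
import statistics

numbers = sorted([3, 1, 4, 1, 5, 9, 2, 6])

# median() averages the two middle values when the count is even.
print(statistics.median(numbers))  # 3.5
# mode() returns the most common value.
print(statistics.mode(numbers))    # 1
```

For this list, numbers[len(numbers)//2] would give 4, while statistics.median gives 3.5.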
Maybe there is a reason for it, but why are you avoiding Python libraries? NumPy and SciPy should have everything you need for a task like this.
Have a look at numpy.genfromtxt(), numpy.mean() and scipy.stats.mode().