I want to write code that takes a file of grades and returns the averages: homework average, project average, quiz average, and semester average. The first row of the file is the "legend" that says what kind of grade each column holds. Here's an example of what the grade file would look like:
last, first, hw, hw, project, quiz, hw, hw, hw, quiz, hw, hw, project
#It won't always be in this order, that's what makes this hard!
Cat, Figaro, 57, 58, 71, 93, 56, 86, 90, 99, 55, 99, 88
#Not a real name lol, there would also be A LOT more names and grades
I can't figure out how to make it iterate through the legend row and the grade rows to correctly grade the file. Here's the formula for grading:
semester average = homework average * 0.2 + quiz average * 0.2 + project average * 0.6.
Here's what I have so far. I'm terrible with files so I only know how to call a column.
def start():
    gb_data = open('gb_data.txt','r')
    header = gb_data.readline()
    print(header.strip())
The strip() at the end gets rid of the \n at the end of the line. This prints the first line of the file, but I want to iterate through it and identify the legend, not just print it. Here's what the return file would look like:
Cat, Figaro: hw avg = 71.57, quiz avg = 96.0, proj avg = 79.5, sem avg = 81.21
Please help! This is NOT homework and is simply a project to understand files better, my name is Scarlett btw.
ginkul's answer doesn't handle multiple rows of values,
so I tried to write the code more generally, like below:
def get_header():
    gb_data = open('gb_data.txt','r')
    header = gb_data.readline()
    return header.strip()
def get_content():
    gb_data = open('gb_data.txt','r')
    content = gb_data.readlines()
    del content[0]
    return content
hw_pos = []
project_pos = []
quiz_pos = []
header_list = get_header()
header_list = header_list.split(", ")
header_list_index = enumerate(header_list)
for index, target in header_list_index:
    if target == "hw":
        hw_pos.append(index)
    elif target == "quiz":
        quiz_pos.append(index)
    elif target == "project":
        project_pos.append(index)
content_list = get_content()
avg_dict = {}
for element in content_list:
    element = element.strip().split(", ")
    name = element[0] + ', ' + element[1]
    hw_avg = sum([int(element[i]) for i in hw_pos]) / len(hw_pos)
    project_avg = sum([int(element[i]) for i in project_pos]) / len(project_pos)
    quiz_avg = sum([int(element[i]) for i in quiz_pos]) / len(quiz_pos)
    avg_dict.update({name:(hw_avg, project_avg, quiz_avg)})
for name, avg in avg_dict.items():
    print(name, "hw avg : ", round(avg[0], 2), "project avg : ", round(avg[1], 2), "quiz avg : ", round(avg[2], 2))
To save all of the output to another text file:
# open the file once, then write one line per student
with open("avg.txt", "a") as f:
    for name, avg in avg_dict.items():
        dataline = name + " hw avg : " + str(round(avg[0], 2)) + \
            " project avg : " + str(round(avg[1], 2)) + " quiz avg : " + str(round(avg[2], 2)) + "\n"
        f.write(dataline)
You probably want to have a look at pandas. It's best to give your columns unique column names, though.
import pandas as pd
def get_avg(row):
    # assumes the homework columns were renamed to the unique names hw_1, hw_2, hw_3
    hw_avg = (row['hw_1'] + row['hw_2'] + row['hw_3']) / 3
    # fill in as needed
    quiz_avg = ...
    proj_avg = ...
    return hw_avg * 0.2 + quiz_avg * 0.2 + proj_avg * 0.6
# read your data
df = pd.read_csv('your-file.csv')
# 'apply' a function to each row (axis=1) in a dataframe,
# add all results to a new column called 'semester_avg'
df['semester_avg'] = df.apply(get_avg, axis=1)
EDIT: This code only handles a single row of grades, not multiple rows.
You could try the following code:
def start():
    with open('gb_data.txt', 'r') as f:
        keys = f.readline().strip().split(',')
        values = f.readline().strip().split(',')
    last = values[0]
    first = values[1]
    # keep only the values whose legend entry matches each category
    hw = [int(v) for k, v in zip(keys, values) if 'hw' in k]
    hw_avr = sum(hw) / len(hw)
    project = [int(v) for k, v in zip(keys, values) if 'project' in k]
    project_avr = sum(project) / len(project)
    quiz = [int(v) for k, v in zip(keys, values) if 'quiz' in k]
    quiz_avr = sum(quiz) / len(quiz)
    # weights from the question: hw 0.2, quiz 0.2, project 0.6
    sem_avr = hw_avr * 0.2 + quiz_avr * 0.2 + project_avr * 0.6
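For files with many rows, the same idea of pairing each value with its legend entry can be applied to every line after the header. The following is only a sketch under a few assumptions: fields are separated by commas, the legend uses exactly the labels hw, quiz and project, and every row has at least one grade in each category. The name grade_lines is made up for illustration; it takes the lines as a list so it can be tried without a file.

```python
from collections import defaultdict

# Weights from the formula in the question.
WEIGHTS = {"hw": 0.2, "quiz": 0.2, "project": 0.6}


def grade_lines(lines):
    """Return {"last, first": {"hw": avg, "quiz": avg, "project": avg, "sem": avg}}.

    Hypothetical helper: assumes comma-separated fields and a legend row
    that labels each grade column as hw, quiz or project.
    """
    legend = [field.strip() for field in lines[0].split(",")]
    results = {}
    for line in lines[1:]:
        if not line.strip():
            continue  # skip blank lines
        fields = [field.strip() for field in line.split(",")]
        by_kind = defaultdict(list)
        # Pair each grade with its legend entry; skip the two name columns.
        for kind, value in zip(legend[2:], fields[2:]):
            by_kind[kind].append(int(value))
        averages = {kind: sum(by_kind[kind]) / len(by_kind[kind]) for kind in WEIGHTS}
        sem = sum(averages[kind] * weight for kind, weight in WEIGHTS.items())
        averages["sem"] = sem
        results[fields[0] + ", " + fields[1]] = averages
    return results


sample = [
    "last, first, hw, hw, project, quiz, hw, hw, hw, quiz, hw, hw, project",
    "Cat, Figaro, 57, 58, 71, 93, 56, 86, 90, 99, 55, 99, 88",
]
for name, avg in grade_lines(sample).items():
    print(name, {kind: round(value, 2) for kind, value in avg.items()})
```

To run it on a real file, the lines could be read with something like open('gb_data.txt').read().splitlines() and passed in unchanged.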
Related
Here is my code:
inputFile = open("Employees.txt", "r").read()
inputList = inputFile.split("\n")
fList = []
def listString(s):
    string = ""
    return (string.join(s))
for i in inputList:
    for x in i.split(","):
        fList.append(x)
for y in range (len(fList)):
    **if fList[y] == "90000":
        fList[y] = str(90000 * 1.05) + "\n"
    elif fList[y] == "75000":
        fList[y] = str(75000 * 1.05) + "\n"
    elif fList[y] == "110000":
        fList[y] = str(110000 * 1.05) + "\n"
    else:
        fList[y] = fList[y] + ","**
print(listString(fList))
file = open("Emp_Bonus.txt", "a")
file.write(listString(fList))
Employees.txt contains the following:
Adam Lee,Programmer,90000
Morris Heather,DA,75000
John Lee,PM,110000
I am trying to get the following output:
Adam Lee,Programmer,94500
Morris Heather,DA,78750
John Lee,PM,115500
The part of the code that is in bold is the problem. The code needs to work for any input salaries, not only for the sample input. Each input salary has to be multiplied by 1.05. How should I go about doing this? Thanks!
Another way, without any library: read the lines of the file into a list using readlines() and iterate over them. After splitting each line with split(','), modify only the last part (the salary), then write the new file as required.
multiply, final_result = 1.05, []
with open('Employees.txt', 'r') as f:
    fList = f.readlines()
if fList:
    for line in fList:
        employee_info = line.split(',')
        name = employee_info[0]
        designation = employee_info[1]
        # the salary is the last field; drop the newline before converting
        salary = float(employee_info[2].replace('\n','').strip()) * multiply
        final_result.append(f"{name},{designation},{salary}")
if final_result:
    with open('Emp_Bonus.txt', 'w') as f:
        f.write('\n'.join(final_result))
Output:
Adam Lee,Programmer,94500.0
Morris Heather,DA,78750.0
John Lee,PM,115500.0
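The output above ends in .0 because the salary is converted to float, while the question asks for 94500. Here is a minimal sketch that keeps the salaries as whole numbers, assuming every salary in the file is an integer and the raise is 5%. The helper name apply_raise and its percent parameter are made up for illustration.

```python
def apply_raise(lines, percent=5):
    """Return the lines with the last field (salary) raised by `percent`.

    Hypothetical helper: assumes each non-empty line is name,title,salary
    and that the salary is a whole number.
    """
    result = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines, e.g. a trailing newline
        name, title, salary = line.rsplit(",", 2)
        # Multiply by an integer first and round once, so the result is an int.
        raised = round(int(salary) * (100 + percent) / 100)
        result.append(f"{name},{title},{raised}")
    return result


employees = [
    "Adam Lee,Programmer,90000",
    "Morris Heather,DA,75000",
    "John Lee,PM,115500".replace("115500", "110000"),
]
print("\n".join(apply_raise(employees)))
```

The returned list can be written out with '\n'.join(...) in the same way as in the answer above.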
I would use pandas:
import pandas as pd
df = pd.read_csv("Employees.txt",header=None)
# apply the raise to every salary, not only to the sample values
df[2] = df[2]*1.05
df[2] = df[2].round().astype(int)
df.to_csv("Emp_Bonus.txt",mode="a",header=False,index=False)
I have code that takes a legend of grades and a number of grades and returns averages for them. I have everything right except for the semester average. Here's the formula for finding the semester average:
homework average * 0.2 + quiz average * 0.2 + project average * 0.6.
My code works well with homework averages, quiz averages, and project averages but not with semester average.
Here's what I have written:
def get_header():
    gb_data = open('gb_data.txt','r')
    header = gb_data.readline()
    return header.strip()
def get_content():
    gb_data = open('gb_data.txt','r')
    content = gb_data.readlines()
    del content[0]
    return content
hw_pos = []
project_pos = []
quiz_pos = []
header_list = get_header()
header_list = header_list.split(", ")
header_list_index = enumerate(header_list)
for index, target in header_list_index:
    if target == "hw":
        hw_pos.append(index)
    elif target == "quiz":
        quiz_pos.append(index)
    elif target == "project":
        project_pos.append(index)
content_list = get_content()
avg_dict = {}
for element in content_list:
    element = element.strip().split(", ")
    name = element[0] + ', ' + element[1]
    hw_avg = sum([int(element[i]) for i in hw_pos]) / len(hw_pos)
    quiz_avg = sum([int(element[i]) for i in quiz_pos]) / len(quiz_pos)
    project_avg = sum([int(element[i]) for i in project_pos]) / len(project_pos)
    sem_avg = hw_avg * 0.2 + quiz_avg * 0.2 + project_avg * 0.6
    avg_dict.update({name:(hw_avg, quiz_avg, project_avg, sem_avg)})
f = open('avg.txt', 'w')
for name, avg in avg_dict.items():
    dataline = name + ": hw avg = " + str(round(avg[0], 2)) + ", quiz avg = " + str(round(avg[2], 2)) + ", proj avg = " + str(round(avg[1], 2)) + ", sem avg = " + str(round(avg[2], 2)) + "\n"
    f.write(dataline)
f.close()
Here's an example of an input I put in:
last, first, hw, hw, project, quiz, hw, hw, hw, quiz, hw, hw, project
Cat, Figaro, 57, 58, 71, 93, 56, 86, 90, 99, 55, 99, 88
The top line is the legend so ignore that, my code handles that
Here's what should be given back:
Cat, Figaro: hw avg = 71.57, quiz avg = 96.0, proj avg = 79.5, sem avg = 81.21
Here's what I actually get back:
Cat, Figaro: hw avg = 71.57, quiz avg = 96.0, proj avg = 79.5, sem avg = 96.0
I want to make them match EXACTLY, down to every character. I just need to know how to round it correctly. This is NOT homework and is just a project to understand files better, I'm very close!! My name is Scarlett btw please help!!!
You have a typo in this line:
dataline = name + ": hw avg = " + str(round(avg[0], 2)) + ", quiz avg = " + str(round(avg[2], 2)) + ", proj avg = " + str(round(avg[1], 2)) + ", sem avg = " + str(round(avg[2], 2)) + "\n"
It should be
dataline = name + ": hw avg = " + str(round(avg[0], 2)) + ", quiz avg = " + str(round(avg[1], 2)) + ", proj avg = " + str(round(avg[2], 2)) + ", sem avg = " + str(round(avg[3], 2)) + "\n"
instead. The calculation is fine; only the output was wrong (you printed indices 0, 2, 1, 2 instead of 0, 1, 2, 3). To prevent mistakes like this in the future, maybe take a look at pandas with its column names? pandas is probably overkill in this case, but it is a very powerful tool for table-based calculations.
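Short of pandas, one lightweight way to avoid this kind of index mix-up is to give the tuple fields names. This is a sketch using collections.namedtuple from the standard library. The names Averages and format_line are made up for illustration, and the output format is copied from the expected line in the question.

```python
from collections import namedtuple

# Named fields replace the positional indices avg[0] .. avg[3].
Averages = namedtuple("Averages", ["hw", "quiz", "project", "sem"])


def format_line(name, avg):
    """Build one output line; round(x, 2) prints 96.0 as in the question."""
    return (
        f"{name}: hw avg = {round(avg.hw, 2)}, quiz avg = {round(avg.quiz, 2)}, "
        f"proj avg = {round(avg.project, 2)}, sem avg = {round(avg.sem, 2)}"
    )


# Grades of the sample row, grouped by hand for the demonstration.
hw_avg = (57 + 58 + 56 + 86 + 90 + 55 + 99) / 7
quiz_avg = (93 + 99) / 2
project_avg = (71 + 88) / 2
sem_avg = hw_avg * 0.2 + quiz_avg * 0.2 + project_avg * 0.6
print(format_line("Cat, Figaro", Averages(hw_avg, quiz_avg, project_avg, sem_avg)))
```

Because each value is referred to by name, swapping quiz and project in the output line becomes much harder to do by accident.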
This should work:
from statistics import mean
legend = ['last', 'first', 'hw', 'hw', 'project', 'quiz', 'hw', 'hw', 'hw', 'quiz', 'hw', 'hw', 'project']
grades = ['Cat', 'Figaro', 57, 58, 71, 93, 56, 86, 90, 99, 55, 99, 88]
hw_avg = mean([g for l, g in zip(legend, grades) if l == 'hw'])
quiz_avg = mean([g for l, g in zip(legend, grades) if l == 'quiz'])
project_avg = mean([g for l, g in zip(legend, grades) if l == 'project'])
sem_avg = hw_avg * 0.2 + quiz_avg * 0.2 + project_avg * 0.6
print(f'{grades[0]}, {grades[1]}: hw avg = {hw_avg:.2f}, quiz avg = {quiz_avg:.2f}, project avg = {project_avg:.2f}, sem avg = {sem_avg:.2f}')
It gives me the following:
Cat, Figaro: hw avg = 71.57, quiz avg = 96.00, project avg = 79.50, sem avg = 81.21
Note that I'm just using statistics.mean as it makes the code cleaner. But you could do the same with a method like yours.
Hello everyone.
I have some problems calculating GC skews in Python.
My two major inputs are a FASTA file and a BED file.
The BED file has the columns gn (0), gene_type (1), gene name (2), chromosome (3), strand (4), num (5), start (6). (These numbers are the Python index numbers.) I am trying to use some functions that calculate the GC skew of the sense and antisense strands around the start site of each gene. The window is 100 bp, and these are the functions.
import re
import sys
import os
# opening bed file
content= []
with open("gene_info.full.tsv") as new :
    for line in new :
        content.append(line.strip().split())
content = content[1:]
def fasta2dict(fil):
    dic = {}
    scaf = ''
    seq = []
    for line in open(fil):
        if line.startswith(">") and scaf == '':
            scaf = line.split(' ')[0].lstrip(">").replace("\n", "")
        elif line.startswith(">") and scaf != '':
            dic[scaf] = ''.join(seq)
            scaf = line.split(' ')[0].lstrip(">").replace("\n", "")
            seq = []
        else:
            seq.append(line.rstrip())
    dic[scaf] = ''.join(seq)
    return dic
dic_file = fasta2dict("full.fa")
# functions for gc skew
def GC_skew_up(strand, loc, seq, window = 100) : # need -1 for index
    values_up = []
    loc = loc - 1
    if strand == "+" :
        sp_up = seq[loc - window : loc]
        g_up = sp_up.count('G') + sp_up.count('g')
        c_up = sp_up.count('C') + sp_up.count('c')
        try :
            skew_up = (g_up - c_up) / float(g_up + c_up)
        except ZeroDivisionError:
            skew_up = 0.0
        values_up.append(skew_up)
    elif strand == "-" :
        sp_up = seq[loc : loc + window]
        g_up = sp_up.count('G') + sp_up.count('g')
        c_up = sp_up.count('C') + sp_up.count('c')
        try :
            skew_up = (c_up - g_up) / float(g_up + c_up)
        except ZeroDivisionError:
            skew_up = 0.0
        values_up.append(skew_up)
    return values_up
def GC_skew_dw(strand, loc, seq, window = 100) :
    values_dw = []
    loc = loc - 1
    if strand == "+" :
        sp_dw = seq[loc : loc + window]
        g_dw = sp_dw.count('G') + sp_dw.count('g')
        c_dw = sp_dw.count('C') + sp_dw.count('c')
        try :
            skew_dw = (g_dw - c_dw) / float(g_dw + c_dw)
        except ZeroDivisionError:
            skew_dw = 0.0
        values_dw.append(skew_dw)
    elif strand == "-" :
        sp_dw = seq[loc - window : loc]
        g_dw = sp_dw.count('G') + sp_dw.count('g')
        c_dw = sp_dw.count('C') + sp_dw.count('c')
        try :
            skew_dw = (c_dw - g_dw) / float(g_dw + c_dw)
        except ZeroDivisionError:
            skew_dw = 0.0
        values_dw.append(skew_dw)
    return values_dw
As I said, I want to calculate the GC skew for a 100 bp window on each side of the start site of each gene.
So I wrote code that gets the chromosome name from the BED file and the sequence data from the FASTA file.
Then, using the gene name and strand information, I expected the code to find the correct start site and calculate the GC skew for the 100 bp window.
However, when I run this code, the GC skew for the - strand is wrong, while the + strand is correct. (I have known-correct GC skew data that I compared against.)
The - strand values differ from the correct data, but I don't know what the problem is.
Could anyone tell me what is wrong with this code?
Thanks in advance!
window = 100
gname = []
up = []
dw = []
for match in content :
    seq_chr = dic_file[str(match[3])]
    if match[4] == "+" :
        strand = match[4]
        new = int(match[6])
        sen_up = GC_skew_up(strand, new, seq_chr, window = 100)
        sen_dw = GC_skew_dw(strand, new, seq_chr, window = 100)
        gname.append(match[2])
        up.append(str(sen_up[0]))
        dw.append(str(sen_dw[0]))
    if match[4] == "-" :
        strand = match[4]
        new = int(match[6])
        an_up = GC_skew_up(strand, new, seq_chr, window = 100)
        an_dw = GC_skew_dw(strand, new, seq_chr, window = 100)
        gname.append(match[2])
        up.append(str(an_up[0]))
        dw.append(str(an_dw[0]))
tot = zip(gname, up, dw)
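One thing that may be worth checking for the - strand is whether its two windows are an exact mirror image of the + strand windows. In the functions above, after loc = loc - 1, the + strand upstream window seq[loc - window : loc] excludes the start base, while the - strand upstream window seq[loc : loc + window] includes it. The sketch below is not a confirmed fix, only one way to make the two strands symmetric. It assumes that start is the 1-based coordinate of the first base of the gene on its own strand. The helper names reverse_complement, gc_skew and flanking_skews are made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_skew(seq):
    """(G - C) / (G + C), or 0.0 when the window holds no G or C."""
    upper = seq.upper()
    g, c = upper.count("G"), upper.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def flanking_skews(strand, start, seq, window=100):
    """Return (upstream skew, downstream skew) around a gene start.

    Assumption: `start` is 1-based and marks the first base of the gene
    on its own strand; the downstream window includes that base.
    """
    i = start - 1  # 0-based index of the start base
    if strand == "+":
        up = seq[max(0, i - window):i]
        dw = seq[i:i + window]
    else:
        # Mirror image of the + case, read on the reverse strand.
        up = reverse_complement(seq[i + 1:i + 1 + window])
        dw = reverse_complement(seq[max(0, i + 1 - window):i + 1])
    return gc_skew(up), gc_skew(dw)


genome = "ATGGGCCATTGCGCGGATTACA"
print(flanking_skews("+", 10, genome, window=5))
print(flanking_skews("-", 13, genome, window=5))
```

The skew of a reverse-complemented window equals (C - G) / (G + C) on the forward strand, which is the sign flip already used in the question, so only the window boundaries differ from the original functions.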
I have a program that operates on a csv file to create output that looks like this:
724, 2
724, 1
725, 3
725, 3
726, 1
726, 0
I would like to modify the script with some simple math operations such that it would render the output:
724, 1.5
725, 3
726, 0.5
The script I'm currently using is here:
lines=open("1.txt",'r').read().splitlines()
for l in lines:
    data = l.split('"Overall evaluation:')
    if len(data) == 2:
        print(data[0] + ", " + data[1])
How could I add a simple averaging and slicing operation to that pipeline?
I guess I need to create some temporary variable, but it should be outside the loop that iterates over lines?
Maybe something like this:
lines=open("EasyChairData.csv",'r').read().splitlines()
for l in lines:
    data = l.split('"Overall evaluation:')
    submission_number_repo = data[0]
    if len(data) == 2:
        print(data[0] + ", " + data[1])
        if submission_number_repo != data[0]:
            submission_number_repo = data[0]
EDIT
The function is just a simple average
You can use a dictionary that maps each key to its running total and count, and then print the averages:
totals = {}  # key -> (sum of values, number of values)
lines=open("1.txt",'r').read().splitlines()
for l in lines:
    data = l.split('"Overall evaluation:')
    if len(data) == 2:
        if data[0] not in totals:
            totals[data[0]] = (0,0)
        totals[data[0]] = (totals[data[0]][0]+int(data[1]) , totals[data[0]][1]+1)
for x, y in totals.items():
    print(str(x) + ", " + str(y[0]/y[1]))
I would just store a list of values for each key, then take the average once the file has been read.
lines=open("1.txt",'r').read().splitlines()
results = {}
for l in lines:
    data = l.split('"Overall evaluation:')
    if len(data) == 2:
        # convert to a number so the values can be summed later
        if data[0] in results:
            results[data[0]].append(float(data[1]))
        else:
            results[data[0]] = [float(data[1])]
for k,v in results.items():
    print("{} , {}".format(k, sum(v)/len(v) ))
A simple way is to keep a state that stores the current number, the current sum and the number of items, and to print it only when the current number changes (do not forget to print the last state!). The code could be:
lines=open("1.txt",'r') # .read().splitlines() is unnecessary and only forces a full load into memory
state = [None]
for l in lines:
    data = l.split('"Overall evaluation:')
    if len(data) == 2:
        if data[0] != state[0]:
            if state[0] is not None:
                average = state[1]/state[2]
                print(state[0] + ", " + str(average))
            state = [data[0], 0., 0]
        state[1] += float(data[1])
        state[2] += 1
# print the last group, which no key change has flushed yet
if state[0] is not None:
    average = state[1]/state[2]
    print(state[0] + ", " + str(average))
(Edited to avoid storing of values)
I love defaultdict:
from collections import defaultdict
average = defaultdict(lambda: (0,0))
with open("1.txt") as input:
    for line in input.readlines():
        data = line.split('"Overall evaluation:')
        if len(data) != 2:
            continue
        key = data[0].strip()
        val = float(data[1])
        average[key] = (val+average[key][0], average[key][1]+1)
for k in sorted(average):
    v = average[k]
    print("{},{}".format(k, v[0]/v[1]))
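In Python 3 the division in the answers above yields 3.0 rather than the 3 shown in the requested output. This is a small sketch of the grouping step with that formatting, assuming the (key, value) pairs have already been extracted from each line. The name average_by_key is made up for illustration, and the 'g' format is one way to drop a trailing .0.

```python
from collections import defaultdict


def average_by_key(pairs):
    """Return {key: average} for an iterable of (key, value) pairs."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for key, value in pairs:
        totals[key][0] += float(value)
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


pairs = [("724", "2"), ("724", "1"), ("725", "3"), ("725", "3"), ("726", "1"), ("726", "0")]
for key, avg in average_by_key(pairs).items():
    # The 'g' format prints 3.0 as 3 and keeps 1.5 as 1.5.
    print(f"{key}, {avg:g}")
```

Unlike the state-based variant, this does not depend on equal keys being adjacent in the file, at the cost of keeping one small entry per key in memory.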
I am building a high-score screen in pygame and would like to know how to output the names and scores to the screen.
Right now I am able to read the content from the file and output it on screen. However, the data is all displayed together on one line.
I tried to write "\n", but it doesn't work.
I am new to pygame, can someone help me out?
My code
f = open("score.txt",'r')
for line in f:
    column = line.split("\t")
    names = column[0]
    scores = int(column[2])
    scoreArray.append(scores)
    nameArray.append(names)
data = list(zip(scoreArray,nameArray))
l = heapsort(data)
for i in l:
    # output on screen
    score = smallfont.render("\n "+ str(i[0]), True, Color.black)
    nameRect = diskSurf.get_rect()
    nameRect.midtop = (width / 2, height / 2)
    screen.blit(score, nameRect)
    score = smallfont.render("\n "+ str(i[1]), True, Color.black)
    nameRect = diskSurf.get_rect()
    nameRect.midtop = (width / 2, height / 2)
    screen.blit(score, nameRect)
    print i[1] , "\t", i[0]  # in the command line
The output I wanted is
Name Score
abc 2
cde 5
ffd 10
but in the output I have now, the entries are drawn one on top of another on the same line.
Edited code:
for i in l, range (0,520, 10):
    screen.blit(smallfont.render(str(i[0]), True, Color.gray),(200,190))
    screen.blit(smallfont.render(str(i[1]), True, Color.gray),(500,190))
    print i[1] , "\t", i[0]
This code prints the values like this:
Name Score
(2,'abc')
Disclaimer: I have not tested this!
with open("score.txt",'r') as f:
    name_scores = []
    for line in f:
        row = line.split("\t")
        player_name = row[0]
        score = int(row[2])
        name_scores.append((player_name, score))
name_scores.sort(key=lambda name_score: name_score[1], reverse=True)
for i, (player_name, score) in enumerate(name_scores):
    x_name = 200
    x_score = 500
    # each row gets its own y position, 10 pixels below the previous one
    y = 10 * i
    name_pos = (x_name, y)
    score_pos = (x_score, y)
    name_sprite = smallfont.render(player_name, True, Color.gray)
    # render() expects a string, so convert the integer score
    score_sprite = smallfont.render(str(score), True, Color.gray)
    screen.blit(name_sprite, name_pos)
    screen.blit(score_sprite, score_pos)
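The core of the fix is that every row gets its own y coordinate. In the question all rows were blitted at the same position, and as far as I know render() draws a single line of text and does not treat "\n" as a line break. The position arithmetic can be sketched without pygame. The name layout_rows and its default values are made up for illustration; in a real program the line height could come from smallfont.get_linesize().

```python
def layout_rows(rows, x_name=200, x_score=500, top=190, line_height=24):
    """Return (name text, name position, score text, score position) per row.

    Hypothetical helper: `rows` is a list of (name, score) pairs that is
    already sorted in display order.
    """
    placed = []
    for i, (name, score) in enumerate(rows):
        y = top + i * line_height  # move one line down for every row
        placed.append((str(name), (x_name, y), str(score), (x_score, y)))
    return placed


for name_text, name_pos, score_text, score_pos in layout_rows([("ffd", 10), ("cde", 5), ("abc", 2)]):
    # In pygame these would become two render() + blit() calls per row.
    print(name_text, name_pos, score_text, score_pos)
```

Keeping the layout in a plain function like this also makes it easy to check the positions before any drawing happens.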