Looping through values in file data - python

Write a script high_scores.py that will read in a CSV file of users' scores and display the
highest score for each person. The file you will read in is named scores.csv. You should store the high scores as values
in a dictionary with the associated names as dictionary keys. This way, as you read in
each row of data, if the name already has a score associated with it in the dictionary, you
can compare these two scores and decide whether or not to replace the "current" high
score in the dictionary.
Use the sorted() function on the dictionary's keys in order to display an ordered list of
high scores, which should match this output:
Empiro 23
L33tH4x 42
LLCoolDave 27
MaxxT 25
Misha46 25
O_O 22
johnsmith 30
red 12
tom123 26
scores.csv :
LLCoolDave,23
LLCoolDave,27
red,12
LLCoolDave,26
tom123,26
O_O,7
Misha46,24
O_O,14
Empiro,18
Empiro,18
MaxxT,25
L33tH4x,42
Misha46,25
johnsmith,30
Empiro,23
O_O,22
MaxxT,25
Misha46,24
I am stuck on how to check whether I need to replace the score stored for a given name. This is what I have so far:
import csv
dic = {}
with open("scores.csv", "r") as my_file:
    my_file_reader = csv.reader(my_file)
    for i in my_file_reader:
        dic[i[0]] = i[1]

If you run your code on the CSV, you'll see that LLCoolDave's score is 26 instead of 27. This is because you overwrite the dictionary entry every time a row is read, so you keep the most recent score for each name, not the highest. Note also that csv.reader yields strings, so the scores must be converted with int() before comparing them; as strings, "7" compares greater than "22". To fix this, you can try something like:
import csv
dic = {}
with open("scores.csv", "r") as my_file:
    my_file_reader = csv.reader(my_file)
    for row in my_file_reader:
        name, score = row[0], int(row[1])  # convert so scores compare numerically
        if name in dic:
            dic[name] = max(dic[name], score)
        else:
            dic[name] = score
for name in sorted(dic):
    print(name, dic[name])
Essentially, we first check whether an entry already exists for the current user. If it does, their best score is the maximum of the new score and the previous best. Otherwise, their best score is simply the new score. Looping over sorted(dic) then prints the names in the order the exercise expects.

Related

How can I get average of value in list of lists and add it to a dictionary

I am looking for a way to get the average value of a list of records
# value of ids
id=xyz.domain.com response_time=10
id=xyz.domain.biz response_time=20
id=xyz.domain.com response_time=20
id=xyz.domain.co response_time=10
id=abc.domain.com response_time=100
id=xyz.domain.com response_time=10
id=xyz.domain.com response_time=10
and display some info in the following way:
xyz.domain.com
count = 4
avg_response_time = 12.5
mode_response_time = 10
median_response_time = 10
My approach was to get unique ids in the list and get the following info
get a count of them,
average of the response times
#lets say we read the values from a file and store it in logs list
content = open("logs.txt", 'r')
logs = []
logs_dict = {}
for line in content:
    logs.append(line)
for log in logs:
    log = log.split(' ')
    id = log[0].split('=')[1]
    response_time = log[1].split('=')[1]
    # get count using dictionary
    if id in logs_dict:
        logs_dict[id] += 1
    else:
        logs_dict[id] = 1
To get other values like average, median I think we need a list of values in the dictionary mapped to the ID.
How can we solve this efficiently and any tips on doing this?
Thank you in advance
You could store the results of your log calls in a dictionary mapping the domain names to a list of ping values. If you use a defaultdict, you can simply append each ping without having to check for existence of the dictionary keys.
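from collections import defaultdict
from statistics import mean
log_dict = defaultdict(list)
for line in open("logs.txt", "r"):
    if not line.startswith("id="):
        continue  # skip blank lines and comments
    # split "id=... response_time=..." into its two values, as in the question
    id_field, time_field = line.split()
    site_id = id_field.split('=')[1]
    response_time = int(time_field.split('=')[1])
    log_dict[site_id].append(response_time)
# calculate average
mean_response_times = {site_id: mean(pings) for site_id, pings in log_dict.items()}
num_responses = {site_id: len(pings) for site_id, pings in log_dict.items()}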
Note also my for loop here. You can loop over a file handle to return the contents line by line.
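The question also asks for the mode and median. A minimal sketch of how the same grouping could produce all four values, assuming the "id=... response_time=..." line format from the question; the summarize function and the key names are illustrative, not part of the original answer:

```python
from collections import defaultdict
from statistics import mean, median, mode

def summarize(lines):
    """Group response times by id and compute summary statistics per id."""
    times_by_id = defaultdict(list)
    for line in lines:
        if not line.startswith("id="):
            continue  # skip blank lines and comments
        id_field, time_field = line.split()
        site_id = id_field.split("=")[1]
        times_by_id[site_id].append(int(time_field.split("=")[1]))
    return {
        site_id: {
            "count": len(times),
            "avg_response_time": mean(times),
            "mode_response_time": mode(times),
            "median_response_time": median(times),
        }
        for site_id, times in times_by_id.items()
    }

# Sample records copied from the question.
sample = [
    "id=xyz.domain.com response_time=10",
    "id=xyz.domain.biz response_time=20",
    "id=xyz.domain.com response_time=20",
    "id=xyz.domain.co response_time=10",
    "id=abc.domain.com response_time=100",
    "id=xyz.domain.com response_time=10",
    "id=xyz.domain.com response_time=10",
]
stats = summarize(sample)
for key, value in stats["xyz.domain.com"].items():
    print(key, "=", value)
```

Because summarize accepts any iterable of lines, an open file handle can be passed in directly instead of the sample list.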

Parsing CSV in Python and writing to another CSV: yes if choice present, no if choice absent

Input (new.csv:)
student Jack
Choice Phy
Choice Chem
Choice Maths
Choice Biology
student Jill
Choice Phy
Choice Biology
Choice Maths
Expected Output (out.csv)
Student Phy Chem Maths Biology
Jack Yes Yes Yes Yes
Jill Yes No Yes Yes
I am parsing new.csv and writing the result to out.csv. For each student name, I write Yes if a subject is among that student's choices and No if it is not (the subjects become the new header in out.csv).
Here I have used nested ifs to get the desired output. Please help me with a better, more Pythonic way to write this.
I am a newbie to Python and eager to learn a better way of coding.
P.S.: The choices of subjects are not always in the same order.
import csv
la =[]
l2=[]
with open("new.csv","r",newline='\n') as k:
    k=csv.reader(k, delimiter=',', quotechar='_', quoting=csv.QUOTE_ALL)
    counter = 0
    for col in k :
        # number of rows in csv is 600
        if counter<=600:
            if col[0] =='student':
                la.append("\n "+col[1])
                a=next(k)
                if a[1] == 'Phy':
                    la.append('yes')
                    a = next(k)
                else:
                    la.append('no')
                if a[1] == 'Chem':
                    la.append('yes')
                    a = next(k)
                else:
                    la.append('no')
                if a[1] == 'Maths':
                    la.append('yes')
                    a = next(k)
                else:
                    la.append('no')
                if a[1] == 'Biology':
                    la.append('yes')
                    a = next(k)
                    counter += 1
                else:
                    la.append('no')
                    counter += 1
l2=",".join(la)
with open("out.csv","w") as w:
    w.writelines(l2)
IMHO, it is time to learn how to debug simple programs. Some IDEs come with nice debuggers, but you can still use the good old pdb or simply add print traces in your code to easily understand what happens.
Here, the first and most evident problem is this line:
tot = sum(1 for col in k)
It is pretty useless, because for col in k would be enough, but it also consumes the whole k iterator, so the next line, for col in k:, tries to read from an iterator that has already reached its end, and the loop stops immediately.
That is not all:
the first line contains Student with an upper case S while you test for student with a lower case s: they are different strings... The same case problem exists in all the other comparisons.
when you find student, you set a to the line following it... and never change it. So even if you fix your case errors, you will consistently use that single line for the student!
If you are a beginner, the rule is Keep It Simple, Stupid. So start from something you can control and then add other features one at a time:
read the input file with the csv module and just print the list for every row. Do not go further until this gives what you want! That would have stopped you from making the tot = sum(1 for col in k) error...
identify every student. Just print the name first, then store the names in a list and print the list after the loop
identify the subjects. Just print them first, then feed a dictionary with the subjects
wonder how you can get that at the end of the loop...
just realize that you could store the student name in that dictionary, and put the full dictionary in the list (feel free to ask a new question if you are stuck there...)
print the list of dictionaries at the end of the loop
build one row per student that could feed the csv writer, or, as you already have a list of dicts, consider using a DictWriter.
Good luck in practicing Python!
Here is a possible way for the read part:
import csv
la = {} # use a dict to use the student name as index
with open("new.csv","r",newline='\n') as k:
    k=csv.reader(k, delimiter=',', quotechar='_', quoting=csv.QUOTE_ALL)
    # counter = 0 # pretty useless...
    for col in k :
        if col[0] =='student':
            l2 = set() # initialize a set to store subjects
            la[col[1]] = l2 # reference it in la indexed by the student's name
        else: # it should be a subject line
            l2.add(col[1]) # note the subject
# Ok la is a dict with student names as keys, and a set containing subjects for that student as value
print(la)
For the write part, you should:
build a union of all sets to get all the possible subjects (unless you already know them)
for each item (name, subjects) from la, build a list storing yes or no for each of the possible subject
write that list to the output csv file
...left as an exercise...
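For readers who want to check their attempt, the three steps of the write part could be sketched as follows. This is one possible shape, not the answerer's own solution: build_rows is a hypothetical helper, la is assumed to be the dict of sets produced by the read part, and the sample data is copied from the question.

```python
import csv
import io

def build_rows(la, subjects=None):
    """Return a header row plus one Yes/No row per student."""
    if subjects is None:
        # The union of every student's set gives all possible subjects.
        subjects = sorted(set().union(*la.values()))
    rows = [["Student"] + list(subjects)]
    for name, chosen in la.items():
        rows.append([name] + ["Yes" if s in chosen else "No" for s in subjects])
    return rows

# The structure the read part would produce for the sample input.
la = {
    "Jack": {"Phy", "Chem", "Maths", "Biology"},
    "Jill": {"Phy", "Biology", "Maths"},
}
rows = build_rows(la, subjects=["Phy", "Chem", "Maths", "Biology"])

# Written to an in-memory buffer here; a file opened with
# open("out.csv", "w", newline="") could be passed to csv.writer instead.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

Passing the subject list explicitly keeps the column order from the expected output; when it is omitted, the sketch falls back to the sorted union of all sets.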

Python algorithm error when trying to find the next largest value

I've written an algorithm that scans through a file of IDs and compares each value with an integer i (I convert the integer to a string for the comparison, and I trim the trailing "\n" from the line). The algorithm makes this comparison for each line in the file (each ID). If they are equal, the algorithm increases i by 1 and uses recursion with the new value of i. If they are not equal, it compares i to the next line in the file. It does this until it has a value for i that isn't in the file, then returns that value for use as the ID of the next record.
My issue is that I have a file of IDs that lists 1,3,2, because I removed a record with ID 2 and then created a new record. This shows the algorithm working correctly, as it gave the new record the ID of 2, which had previously been removed. However, when I then create another record, the next ID is 3, so my ID list reads 1,3,2,3 instead of 1,3,2,4. Below is my algorithm, with the results of the print() command. I can see where it's going wrong but can't work out why. Any ideas?
Algorithm:
def _getAvailableID(iD):
    i = iD
    f = open(IDFileName,"r")
    lines = f.readlines()
    for line in lines:
        print("%s,%s,%s"%("i=" + str(i), "ID=" + line[:-1], (str(i) == line[:-1])))
        if str(i) == line[:-1]:
            i += 1
            f.close()
            _getAvailableID(i)
    return str(i)
Output:
(The output for when the algorithm was run for finding an appropriate ID for the record that should have ID of 4):
i=1,ID=1,True
i=2,ID=1,False
i=2,ID=3,False
i=2,ID=2,True
i=3,ID=1,False
i=3,ID=3,True
i=4,ID=1,False
i=4,ID=3,False
i=4,ID=2,False
i=4,ID=2,False
i=2,ID=3,False
i=2,ID=2,True
i=3,ID=1,False
i=3,ID=3,True
i=4,ID=1,False
i=4,ID=3,False
i=4,ID=2,False
i=4,ID=2,False
I think your program is failing because you need to change:
_getAvailableID(i)
to
return _getAvailableID(i)
(At the moment the recursive function finds the correct answer which is discarded.)
However, it would probably be better to simply put all the ids you have seen into a set to make the program more efficient.
e.g. in pseudocode:
S = set()
loop over all items and S.add(int(line.rstrip()))
i = 0
while i in S:
    i += 1
return i
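The pseudocode above might look like this as runnable Python. It is a sketch under a few assumptions: next_available_id is a hypothetical name, the IDs are passed in as an iterable of lines (an open file works), and the search starts at 1 rather than 0 because the question's output starts from i=1.

```python
def next_available_id(lines, start=1):
    """Return the smallest integer >= start that is not among the IDs in lines."""
    # int() tolerates the trailing newline on each line; blank lines are skipped.
    seen = {int(line) for line in lines if line.strip()}
    candidate = start
    while candidate in seen:
        candidate += 1
    return candidate

print(next_available_id(["1\n", "3\n", "2\n"]))  # prints 4
print(next_available_id(["1\n", "3\n"]))         # prints 2, reusing the gap
```

The file is read once and each membership test on the set is constant time on average, so there is no need for recursion or for rescanning the file after every increment.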
In case you are simply looking for the max ID in the file and then want to return the next available value:
def _getAvailableID(IDFileName):
    iD = 0
    with open(IDFileName,"r") as f:
        for line in f:
            value = int(line)  # int() ignores the trailing newline
            print("ID=%s, line=%s" % (iD, value))
            if value > iD:  # compare as numbers, not as strings
                iD = value
    return str(iD + 1)
print(_getAvailableID("IDs.txt"))
with an input file containing
1
3
2
it outputs
ID=0, line=1
ID=1, line=3
ID=3, line=2
4
However, we can solve it in a more Pythonic way:
def _getAvailableID(IDFileName):
    with open(IDFileName,"r") as f:
        mx_id = max(f, key=int)
    return int(mx_id)+1

Highscores using python / Saving 10 highscores and ordering them

I am making a game on python and tkinter, and I would like to save the highscores of the game. I have an entry which takes the name of the player and assign it to a global (Name1) and a global (Score1) in which the score is saved.
The thing is that I need a file that stores the best 10 highscores, obviously showing them from the biggest score to the lowest.
My question is: How do I make that?
How can I save the Name and the Score to the file?
How can I save the scores with an order? (Including the name associated to the score)
I am really confused using the methods of .readline() and .write() and .seek() and all of those.
EDIT:
I need something like this:
The globals are Name1, Which lets say is "John" And Score1, Which at the end of the game will have an int object.
So after a few games, John achieved 3000 on the score, and then 3500, and then 2000.
The highscores file should look like:
John 3500
John 3000
John 2000
And if Mike comes out and play and achieve a 3200 score, the file should look like this:
John 3500
Mike 3200
John 3000
John 2000
The maximum number of entries in the high score table must be ten. So if there are already ten high scores saved and a bigger one comes along, the lowest score must be either ignored or deleted. (I need the file because I need to display the scores in a Tkinter window.)
PS: Excuse my weird English! Thank you!
user2109788's answer is basically all you need, and pickle or shelve is the best way of serialising your data given that the file doesn't need to be human readable.
Feeling generous today, the following code should be readily adaptable to your specific situation with tkinter.
To keep only 10 values in the high score table, I just add the new score to the list, sort it, and retain the first 10 values by slicing the sorted list. For 10 values this is fine, but it would be a terrible idea if your list were very large.
from operator import itemgetter
import pickle
# pickle
high_scores = [
    ('Liz', 1800),
    ('Desi', 5000),
    ('Mike', 3200),
    ('John', 2000),
    ('Gabi', 3150),
    ('John', 3500),
    ('Gabi', 3100),
    ('John', 3000),
    ('Liz', 2800),
    ('Desi', 2800),
]
high_scores.append(('Dave', 3300))
high_scores = sorted(high_scores, key=itemgetter(1), reverse=True)[:10]
# pickle writes bytes, so the file must be opened in binary mode
with open('highscores.txt', 'wb') as f:
    pickle.dump(high_scores, f)
# unpickle
high_scores = []
with open('highscores.txt', 'rb') as f:
    high_scores = pickle.load(f)
You can keep a list of tuples in your script to store the high scores and sort it before displaying it to the user. You can use shelve or pickle to store Python variables (here, the high scores).
List of tuple will look like:
[(Name1, Score1), (Name2, Score2),....,(Name10, score10)]
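Putting the two answers together, a minimal sketch of how a game might update and persist the table after each round. The function names (add_score, save_scores, load_scores) and the limit parameter are illustrative choices, not from either answer; the sample scores are the ones from the question.

```python
import pickle
from operator import itemgetter

def add_score(high_scores, name, score, limit=10):
    """Return a new list with the score added, sorted descending, cut to `limit`."""
    updated = high_scores + [(name, score)]
    return sorted(updated, key=itemgetter(1), reverse=True)[:limit]

def save_scores(high_scores, path):
    """Persist the list of (name, score) tuples with pickle (binary mode)."""
    with open(path, "wb") as f:
        pickle.dump(high_scores, f)

def load_scores(path):
    """Load the saved list, or start with an empty table on the first run."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return []

# Replay the sequence of games described in the question.
scores = []
for name, score in [("John", 3000), ("John", 3500), ("John", 2000), ("Mike", 3200)]:
    scores = add_score(scores, name, score)
for name, score in scores:
    print(name, score)
```

In the game, the flow would be load_scores at startup, add_score with Name1 and Score1 when a round ends, then save_scores; the returned list is already in display order for the Tkinter window.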

How to call in a specific csv field value in python

I am so new to python (a week in) so I hope I ask this question properly.
I have imported a grade sheet in csv format into python 2.7. The first column is the name of the student and the column titles are the name of the assignments. So the data looks something like this:
Name Test1 Test2 Test3
Robin 89 78 100
...
Rick 72 100 98
I want to be able to do (or have someone else do) 3 things just by typing in the name of the person and the assignment.
1. Get the score for that person for that assignment
2. Get the average score for that assignment
3. Get that persons average score
But for some reason I get lost at figuring how to get python to recognize the field I am trying to call in. So far this is what I have (so far the only part that works is calling in file):
data = csv.DictReader(open("C:\file.csv"))
for row in data:
    print row
def grade()
    student= input ("Enter a student name: ")
    assignment= input("Enter a assignment: ")
    for row in data:
        task_grade= data.get(int(row["student"], int(row["assignment"])) # specific grade
        task_total= sum(int(row['assignment'])) #assignment total
        student_total= #student assignments total-- no clue how to do this
        task_average= task_total/11
        average_score= student_total/9
You can access the individual "columns" of your csv this way:
import csv
def parse_csv():
    grade_averages = {}
    with open('data.csv', 'r') as csv_file:
        r = csv.reader(csv_file)
        for row in r:
            # skip the header row
            if row[0].startswith('Name'):
                continue
            grades = [int(column.strip()) for column in row[1:]]
            # float() keeps the division exact on Python 2 as well
            grade_averages[row[0]] = sum(grades) / float(len(grades))
    return grade_averages
def get_grade(student_name):
    grade_averages = parse_csv()
    return grade_averages[student_name]
print("Rick: %s" % get_grade('Rick'))
print("Robin: %s" % get_grade('Robin'))
What you are trying to do is not meant for Python because you have keys and values. However...
If you know that your columns are always the same, no need to use keywords, you can use positions:
Here is the easy, inefficient* way to do 1 and 3:
students_name = ...
number = ...
# raw string, so that "\f" is not read as a form-feed escape
for line in open(r"C:\file.csv"):
    items = line.split()
    num_assignments = len(items) - 1
    name = items[0]
    if name == students_name:
        print("assignment score: {0}".format(items[number]))
        asum = 0
        for k in range(0, num_assignments):
            asum += int(items[k + 1])  # fields are strings until converted
        print("their average: {0}".format(asum / float(num_assignments)))
To do 2, you should precompute the averages and return them, because the average for each assignment is the same for every user query.
*I say inefficient because you search the text file again each time a name is entered. To do it properly, you should probably build a dictionary of all names and their information. But that solution is more complicated, and you are only a week in! Moreover, it's longer, and you should give it a try yourself. Look up dict.
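For comparison, the dictionary-based approach mentioned above could be sketched like this. It is written for Python 3 (the question uses 2.7), assumes a comma-separated file whose first column is headed Name, and uses an in-memory sample in place of the real file; all function names are illustrative.

```python
import csv
import io

def load_grades(csv_file):
    """Map each student name to a dict of {assignment: score}."""
    grades = {}
    for row in csv.DictReader(csv_file):
        name = row.pop("Name")
        grades[name] = {assignment: int(score) for assignment, score in row.items()}
    return grades

def get_score(grades, student, assignment):
    """1. The score for one student on one assignment."""
    return grades[student][assignment]

def assignment_average(grades, assignment):
    """2. The average score across all students for one assignment."""
    scores = [by_assignment[assignment] for by_assignment in grades.values()]
    return sum(scores) / len(scores)

def student_average(grades, student):
    """3. One student's average across all assignments."""
    scores = grades[student].values()
    return sum(scores) / len(scores)

# Stand-in for the real file, using the two rows shown in the question.
sample = io.StringIO("Name,Test1,Test2,Test3\nRobin,89,78,100\nRick,72,100,98\n")
grades = load_grades(sample)
print(get_score(grades, "Robin", "Test2"))
print(assignment_average(grades, "Test1"))
print(student_average(grades, "Rick"))
```

The file is read once into grades, so each later query is a dictionary lookup and no divisor such as 11 or 9 has to be hard-coded.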
I believe the reason you are not seeing the field the second time around is because the iterator returned by csv.DictReader() is a one-time iterator. That is to say, once you've reached the last row of the csv file, it will not reset to the first position.
So, by doing this:
data = csv.DictReader(open("C:\file.csv"))
for row in data:
    print row
you exhaust the iterator before grade() ever loops over it. Try commenting those lines out and see if that helps.
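A small demonstration of that one-time behaviour, written for Python 3 and using an in-memory sample instead of the question's file (the sample contents are made up for illustration):

```python
import csv
import io

sample = io.StringIO("Name,Test1\nRobin,89\nRick,72\n")
data = csv.DictReader(sample)

first_pass = [row["Name"] for row in data]
second_pass = [row["Name"] for row in data]  # the reader is already exhausted
print(first_pass)   # ['Robin', 'Rick']
print(second_pass)  # []

# Materialising the rows once lets them be looped over as often as needed.
sample.seek(0)
rows = list(csv.DictReader(sample))
print(len(rows))
```

Storing list(data) once, as in the last lines, is a simple alternative to commenting out the first loop when the rows need to be visited more than once.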
