Sort a CSV file to read in Python Program

I'm trying to create a leaderboard in Python, where a player gets a score from playing a game and the score is written to a .csv file. I then need to read from this leaderboard, sorted from largest at the top to smallest at the bottom.
Is the sorting something that should be done when the values are written to the file, when I read the file, or somewhere in between?
My code:
writefile=open("leaderboard.csv","a")
writefile.write(name+", "+points)
writefile.close()
readfile=open("leaderboard.csv","r")
I'm hoping to display the top 5 scores and the accompanying names.
It is this point that I have hit a brick wall. Thanks for any help.
Edit: getting the error 'list index out of range'
import csv
name = 'Test'
score = 3
with open('scores.csv', 'a') as f:
    writer = csv.writer(f)
    writer.writerow([name, score])
with open('scores.csv') as f:
    reader = csv.reader(f)
    scores = sorted(reader, key=lambda row: (float(row[1]), row[0]))
top5 = scores[-5:]
csv file:
test1 3
test2 3
test3 3

Python has a csv module in the standard library. It's very simple to use:
import csv
with open('scores.csv', 'a') as f:
    writer = csv.writer(f)
    writer.writerow([name, score])
with open('scores.csv') as f:
    reader = csv.reader(f)
    scores = sorted(reader, key=lambda row: (float(row[1]), row[0]))
top5 = scores[-5:]
Is the sorting something that should be done when the values are written to the file, when i read the file, or somewhere in between?
Both approaches have their benefits. If you sort when you read, you can simply append to the file, which makes writing new scores faster. Reading takes longer, because you have to sort before you can find the highest scores, but a local leaderboard file is unlikely to hold more than a few thousand lines, so sorting when reading will likely be fine.
If you sort while writing, you have to rewrite the entire file every time a new score is added. However, it is also easier to clean up the leaderboard file this way: you can simply drop old or low scores that you no longer care about while writing.
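A minimal sketch of the sort-on-read approach, assuming the file holds name,score rows as written above. The helper name top_scores and its parameters are illustrative, not part of the original answer. Blank or short rows are skipped, since a row with fewer than two fields makes row[1] raise 'list index out of range'.

```python
import csv
import heapq


def top_scores(path, n=5):
    """Return the n highest (name, score) pairs from a name,score CSV file."""
    entries = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            # Skip blank or incomplete rows; indexing row[1] on them would
            # raise IndexError ("list index out of range").
            if len(row) < 2:
                continue
            entries.append((row[0].strip(), float(row[1])))
    # nlargest returns the results already ordered from highest to lowest.
    return heapq.nlargest(n, entries, key=lambda entry: entry[1])
```

Calling top_scores('scores.csv') would then give at most five pairs, highest score first, ready to print.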

Try this:
import pandas as pd
df = pd.read_csv('myfullfilepath.csv', sep=',', names=['name', 'score'])
df = df.sort_values(['score'], ascending=False)
first_ten = df.head(10)
first_ten.to_csv('myfullpath.csv', index=False)
I named the columns like that, following the structure that you suggested.

Related

Reading, formatting, sorting, and saving a csv file without pandas

I am given some sample data in a file --> transactions1.csv
I need to code a function that will do the following without using pandas:
Open the file
Sort the index column in ascending order
Save the updated data back to the same file
This is the code I currently have
import csv
def sort_index():
    read_file=open("transactions1.csv","r")
    r=csv.reader(read_file)
    lines=list(r)
    sorted_lines=sorted(lines[1:], key=lambda row: row[0])
    read_file.close()
    with open('transactions1.csv','w',newline='') as file_writer:
        header=['Index','Day','Month','Year','Category','Amount','Notes'] #-
        writer=csv.DictWriter(file_writer, fieldnames=header)
        writer=csv.writer(file_writer)
        writer.writerows(sorted_lines)
    return False
sort_index()
Current output:
1,2,Dec,2021,Transport,5,MRT cost
10,19,May,2020,Transport,25,taxi fare
2,5,May,2021,investment,7,bill
3,2,Aug,2020,Bills,8,small bill
4,15,Feb,2021,Bills,40,phone bill
5,14,Oct,2021,Shopping,100,shopping 20
6,27,May,2021,Others,20,other spend
7,19,Nov,2019,Investment,1000,new invest
8,28,Mar,2020,Food,4,drink
9,18,Nov,2019,Shopping,15,clothes
The code doesn't seem to work because index 10 appears right after index 1. And the headers are missing.
Expected Output:
Index,Day,Month,Year,Category,Amount,Notes
1,2,Dec,2021,Transport,5,MRT cost
2,5,May,2021,investment,7,bill
3,2,Aug,2020,Bills,8,small bill
4,15,Feb,2021,Bills,40,phone bill
5,14,Oct,2021,Shopping,100,shopping 20
6,27,May,2021,Others,20,other spend
7,19,Nov,2019,Investment,1000,new invest
8,28,Mar,2020,Food,4,drink
9,18,Nov,2019,Shopping,15,clothes
10,19,May,2020,Transport,25,taxi fare

Several improvements can be made to the current function.
There is no need to call list on the reader object; we can sort it directly, as it is iterable.
We can extract the headers by calling next on the reader object, and then write them back out before the sorted rows.
Convert the index to int in the sort key, so that the rows sort numerically (10 after 9) rather than as text (10 after 1).
It is more maintainable to define a named key function instead of a lambda; that way, if something changes, only the function definition needs to change.
Use a with statement to automatically handle closing the file object.
import csv

def _key(row):
    # Compare the index column as a number, not as text.
    return int(row[0])

def sort_index():
    with open("transactions1.csv", "r") as fin:
        reader = csv.reader(fin)
        header = next(reader)  # keep the header out of the sort
        rows = sorted(reader, key=_key)
    with open("transactions1.csv", "w", newline="") as fout:
        writer = csv.writer(fout)
        writer.writerow(header)
        writer.writerows(rows)
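The int conversion in _key is what fixes the ordering: csv.reader yields every field as a string, and strings compare character by character. A small illustration with made-up index values:

```python
indexes = ["1", "10", "2", "9"]

# Strings are compared character by character, so "10" sorts before "2".
as_text = sorted(indexes)

# Converting in the key function compares the numeric values instead.
as_numbers = sorted(indexes, key=int)

print(as_text)     # ['1', '10', '2', '9']
print(as_numbers)  # ['1', '2', '9', '10']
```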

Sort CSV File by Numerical Values

For a homework task I have to make a program that sorts scores like a leaderboard, and I can't figure out how to sort them in descending order.
I will send the rest of my code if that helps; any help would be appreciated.
Forgot to mention, the CSV file looks like this:
NAME, SCORE
I have seen many questions on here about this, and none of the answers seem to work with mine.
with open('names.csv', 'a', newline='') as names:
    csvWriter = csv.writer(names)
    csvWriter.writerow([name, int(score)])
with open('names.csv', 'r') as names:
    csvReader = csv.reader(names)
    csv1 = csv.reader(names, delimiter='.')
    sort = sorted(csv1, key=operator.itemgetter(0))
    csv_writer = csv.writer(names, delimiter='.')
    for name in csvReader:
        print(' '.join(name))
no error message or results, just an exit code

See here for reading a csv into dataframe
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
and here for sorting it
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html

There are a few problems with your code here; I'll try to break it down. I don't recommend you jump to pandas just yet. I'm going to assume this is actually writing data, but you should check "names.csv" to make sure it isn't empty. It looks like this would only write one row. Opening the file with newline='' is fine, by the way: it is what the csv module documentation recommends for files passed to csv.writer.
with open('names.csv', 'a', newline='') as names:
    csvWriter = csv.writer(names)
    csvWriter.writerow([name, int(score)])
Usually you would have a loop inside a statement like this:
my_list = [("Bob", "2"), ("Alice", "1")]
with open('names.csv', 'a', newline='') as names:
    csvWriter = csv.writer(names)
    for name, score in my_list:
        csvWriter.writerow([name, int(score)])
You make two csv readers here; I'm not sure why. The second one uses "." as its delimiter. You haven't specified a delimiter when you wrote the csv, so it will have the standard delimiter, which is a comma (","). csvReader uses this by default, but csv1 will attempt to split on periods and not do so well.
with open('names.csv', 'r') as names:
    csvReader = csv.reader(names)
    csv1 = csv.reader(names, delimiter='.')
You shouldn't use "sort" as a variable name, since it's the name of a common method, but that won't actually break anything here. The real problem is that you create a new sorted version of csv1 but run your for loop on csvReader, which hasn't changed since you set it above. Because both readers share the same file object and sorted() has already consumed every line through csv1, csvReader has nothing left to yield, which is why nothing is printed. You also create csv_writer on names, which is open for reading, not writing, but you don't use it anyway.
itemgetter is looking at index 0, which would be the names from your writer; if you want to sort by score, use 1.
sort = sorted(csv1, key=operator.itemgetter(0))
csv_writer = csv.writer(names, delimiter='.')
for name in csvReader:
    print(' '.join(name))
I think this might be more what you're going for, with my example list.
#!/usr/bin/env python3
import csv
import operator
my_list = [("Bob", "2"), ("Alice", "1")]
with open('names.csv', 'a', newline='') as names:
    csvWriter = csv.writer(names)
    for name, score in my_list:
        csvWriter.writerow([name, int(score)])
with open('names.csv', 'r', newline='') as names:
    csvReader = csv.reader(names)
    sorted_reader = sorted(csvReader, key=operator.itemgetter(1))
    for name, score in sorted_reader:
        print(name, score)
Output:
Alice 1
Bob 2
Since you want it to look "like a leaderboard", you probably want a different sort, though: reverse it and use the integer values of the score column (so that 100 will appear above 2, for example).
sorted_reader = sorted(csvReader, key=lambda row: int(row[1]), reverse=True)
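For a leaderboard you may also want a predictable order when scores tie. A sketch, assuming name,score rows like the ones above; the sample rows and the choice to break ties alphabetically are illustrative assumptions:

```python
rows = [["Bob", "2"], ["Alice", "100"], ["Cara", "2"]]

# Negating the score sorts highest first, while names stay in ascending
# order among players with the same score.
leaderboard = sorted(rows, key=lambda row: (-int(row[1]), row[0]))

for name, score in leaderboard:
    print(name, score)
# Alice 100
# Bob 2
# Cara 2
```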

Breaking up columns in a simple csv file

My csv file looks like this:
Test Number,Score
1,100 2,40 3,80 4,90.
I have been trying to figure out how to write code that ignores the header and the first column and focuses on the scores, because the assignment was to find the average of the test scores and print it out as a float (for those particular numbers the output should be 77.5). I've looked online and found pieces that I think would work, but I'm getting errors every time. We're learning about read, readlines, split, rstrip and \n, if that helps! I'm sure the answer is so simple, but I'm new to coding and I have no idea what I'm doing. Thank you!
def calculateTestAverage(fileName):
    myFile = open(fileName, "r")
    column = myFile.readline().rstrip("\n")
    for column in myFile:
        scoreColumn = column.split(",")
        print(scoreColumn[1])
This is my code so far. My professor wanted us to define a function and go from there, using the stuff we learned in lecture. I'm stuck because it's printing out all the scores I need on separate lines, yet I am not able to sum them without getting an error. Thanks for all your help; I don't think I would be able to use any of the suggestions, because we never went over them. If anyone has an idea of how to take those test scores that printed out vertically as a column and sum them, that would help me a ton!

You can use the csv library. This code should do the work:
import csv
with open('csvfile.txt', 'r') as f:
    reader = csv.reader(f, delimiter=' ')
    next(reader)  # this line lets you skip the header line
    for row_number, row in enumerate(reader, start=1):
        total_score = 0
        for element in row:
            test_number, score = element.split(',')
            total_score += float(score)  # convert the text to a number before adding
        average_score = total_score / len(row)
        print("Average score for row #%d is: %.1f" % (row_number, average_score))
The output should look like this:
Average score for row #1 is: 77.5

I always approach this with a pandas data frame, specifically the read_csv() function. You don’t need to ignore the header; just state that it is in row 0 (for example), and do the same for the row labels.
So for example:
import pandas as pd
import numpy as np
df = pd.read_csv("filename", header=0, index_col=0)
scores = df.values
print(np.average(scores))

I will break it down for you.
Since you're dealing with .csv files, I recommend using the csv library. You can import it with:
import csv
Now we need to open() the file. One common way is to use with:
with open('test.csv') as file:
Which is a context manager that avoids having to close the file at the end. The other option is to open and close normally:
file = open('test.csv')
# Do your stuff here
file.close()
Now you need to wrap the opened file with csv.reader(), which allows you to read .csv files and do things with them:
csv_reader = csv.reader(file)
To skip the headers, you can use next():
next(csv_reader)
Now for the average calculation part. One simple way is to have two variables, score_sum and total, and add each score to the first while counting the rows in the second. Here is an example snippet:
score_sum = 0
total = 0
for number, score in csv_reader:
    score_sum += int(score)
    total += 1
Here's how to do it with indexing instead:
score_sum = 0
total = 0
for line in csv_reader:
    score_sum += int(line[1])
    total += 1
Now that we have our score and totals calculated, getting the average is simply:
score_sum / total
The above code combined will then result in an average of 77.5.
Of course, this all assumes that your .csv file is actually in this format:
Test Number,Score
1,100
2,40
3,80
4,90
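Put together, the steps above might look like the sketch below. The function name average_score is illustrative, and the file is assumed to have a header line followed by one test-number,score pair per line.

```python
import csv


def average_score(path):
    """Return the mean of the second column, skipping the header row."""
    score_sum = 0
    total = 0
    with open(path, newline="") as file:
        csv_reader = csv.reader(file)
        next(csv_reader)  # skip the "Test Number,Score" header
        for line in csv_reader:
            if len(line) < 2:
                continue  # ignore blank lines
            score_sum += int(line[1])
            total += 1
    return score_sum / total
```

Note that this sketch raises ZeroDivisionError if the file has no score rows, so a real program may want to check total first.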

manipulating a csv file and writing its output to a new csv file in python

I have a simple file named saleem.csv which contains the following lines of csv information:
File,Run,Module,Name,,,,,
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithInterference,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithoutInterference,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithInterferencePartial,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithoutInterferencePartial,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithInterferenceDropped,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithoutInterferenceDropped,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].appl,broadcast queued,3,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].appl,replies sent,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].appl,replies received,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,nominal,1.188e+07,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,total,1232.22,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,lifetime,-1,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,Mean power consumption,55.7565,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,num devices,1,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,physical layer,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,device total (mWs),1232.22,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,account,0,1,2,3,4
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,energy (mWs),0,207.519,1024.7,0,0
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,time (s),0,3.83442,18.2656,0,
I want to skip the first line, read this file and write only column[2] and column[4] to a new csv file named out.csv. I have written the following script to do the job.
import csv
with open('saleem.csv') as f:
    readcsv = csv.reader(f)
    for row in readcsv:
        dele = (row[2], row[4])
        print(dele)
with open('out.csv', 'w+') as j:
    writecsv = csv.writer(j)
    #for row in dele:
    for row in dele:
        writecsv.writerows(dele)
f.close()
j.close()
This produces the following output:
M,y,N,e,t,w,o,r,k,.,n,o,d,e,[,4,],.,b,a,t,t,e,r,y,S,t,a,t,s
0
M,y,N,e,t,w,o,r,k,.,n,o,d,e,[,4,],.,b,a,t,t,e,r,y,S,t,a,t,s
0
Please help me. Sorry for the earlier mistake; I mistakenly wrote row.

Edited to reflect revised question
Some problems I can see:
P1: writerows(...)
for row in dele:
    writecsv.writerows(dele)
writerows takes a list of rows to write to the csv file. So it shouldn't be inside a loop where you iterate over all rows and attempt to write them individually.
P2: overwriting
for row in readcsv:
    dele = (row[2], row[4])
You are continuously overwriting dele, so you aren't going to be keeping track of row[2] and row[4] from every row.
What you could do instead:
import csv

dele = []
with open('saleem.csv') as f:
    readcsv = csv.reader(f)
    next(readcsv)  # skip the header line
    for row in readcsv:
        dele.append([row[2], row[4]])
        print([row[2], row[4]])
with open('out.csv', 'w', newline='') as j:
    writecsv = csv.writer(j)
    writecsv.writerows(dele)  # one call writes every collected row
This produced output:
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].appl,3
MyNetwork.node[0].appl,0
MyNetwork.node[0].appl,0
MyNetwork.node[0].batteryStats,1.188e+07
MyNetwork.node[0].batteryStats,1232.22
MyNetwork.node[0].batteryStats,-1
MyNetwork.node[0].batteryStats,55.7565
MyNetwork.node[0].batteryStats,1
MyNetwork.node[0].batteryStats,0
MyNetwork.node[0].batteryStats,1232.22
MyNetwork.node[0].batteryStats,0
MyNetwork.node[0].batteryStats,0
MyNetwork.node[0].batteryStats,0
Also, unrelated to your issue at hand, the following code is unnecessary:
f.close()
j.close()
The reason why with open(...): syntax is so widely used, is because it handles gracefully closing the file for you. You don't need to separately close it yourself. As soon as the with block ends, the file will be closed.
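Because both files can be open at once, the rows can also be copied in a single pass without building the intermediate list. A sketch under the same assumptions (columns 2 and 4, header skipped); the function name copy_columns is hypothetical:

```python
import csv


def copy_columns(src, dst, columns=(2, 4)):
    """Write the selected columns of src to dst, skipping the header row."""
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        next(reader, None)  # skip the header; None avoids an error on an empty file
        for row in reader:
            writer.writerow([row[i] for i in columns])
```

For the question's files this would be called as copy_columns('saleem.csv', 'out.csv').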

I would suggest using the pandas library.
It makes working with csv files very easy.
import pandas as pd #standard convention for importing pandas
# reads the csv file into a pandas dataframe
dataframe = pd.read_csv('saleem.csv')
# make a new dataframe with just columns 2 and 4
print_dataframe = dataframe.iloc[:,[2,4]]
# output the csv file, but don't include the index numbers or header, just the data
print_dataframe.to_csv('out.csv', index=False, header=False)
If you use IPython or Jupyter Notebook, you can type
dataframe.head()
to see the first few values of the dataframe. There is a lot more you can do with the library that might be worth learning, but in general it is a great way to read in, filter, and process csv data.

Sorting a table in python

I am creating a league table for a 6 a side football league and I am attempting to sort it by the points column and then display it in easygui. The code I have so far is this:
data = csv.reader(open('table.csv'), delimiter = ',')
sortedlist = sorted(data, key=operator.itemgetter(7))
with open("Newtable.csv", "wb") as f:
    fileWriter = csv.writer(f, delimiter=',')
    for row in sortedlist:
        fileWriter.writerow(row)
        os.remove("table.csv")
        os.rename("Newtable.csv", "table.csv")
        os.close
The number 7 relates to the points column in my csv file. I have a problem with Newtable only containing the teams information that has the highest points and the table.csv is apparently being used by another process and so cannot be removed.
If anyone has any suggestions on how to fix this it would be appreciated.

If the indentation in your post is actually the indentation in your script (and not a copy-paste error), then the problem is obvious:
os.rename() is executed during the for loop (which means that it's called once per line in the CSV file!), at a point in time where Newtable.csv is still open (not by a different process but by your script itself), so the operation fails.
You don't need to close f, by the way - the with statement takes care of that for you. What you do need to close is data - that file is also still open when the call occurs.
Finally, since a csv object contains strings, and strings are sorted alphabetically, not numerically (so "10" comes before "2"), you need to sort according to the numerical value of the string, not the string itself.
You probably want to do something like
import csv
import os

# On Python 3, open csv files in text mode with newline='' rather than 'rb'/'wb'.
with open('table.csv', newline='') as infile:
    data = csv.reader(infile, delimiter=',')
    # next(data) reads the header before sorting the rest
    sortedlist = [next(data)] + sorted(data, key=lambda x: int(x[7]))  # or float?
with open("Newtable.csv", "w", newline='') as f:
    fileWriter = csv.writer(f, delimiter=',')
    fileWriter.writerows(sortedlist)  # No for loop needed :)
# Both files are closed by now, so the swap can proceed.
os.remove("table.csv")
os.rename("Newtable.csv", "table.csv")
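On Python 3.3 and later, os.replace overwrites the destination in one call, so the separate os.remove step can be dropped. A sketch, assuming the points are integers in column 7 and that the highest score should come first; the function name sort_table and the temporary file name are illustrative:

```python
import csv
import os


def sort_table(path, column=7):
    """Rewrite the CSV at path with data rows sorted by column, highest first."""
    with open(path, newline="") as infile:
        reader = csv.reader(infile)
        header = next(reader)
        rows = sorted(reader, key=lambda row: int(row[column]), reverse=True)
    # The input file is closed here, so this script no longer holds it open.
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", newline="") as outfile:
        writer = csv.writer(outfile)
        writer.writerow(header)
        writer.writerows(rows)
    # Swap the sorted copy into place, overwriting the original.
    os.replace(tmp_path, path)
```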

I'd suggest using pandas:
Assuming an input file like this:
team,points
team1, 5
team2, 6
team3, 2
You could do:
import pandas as pd
a = pd.read_csv('table.csv')
b = a.sort_values('points', ascending=False)
b.to_csv('table.csv',index=False)
