Sorting a table in python

I am creating a league table for a 6 a side football league and I am attempting to sort it by the points column and then display it in easygui. The code I have so far is this:
data = csv.reader(open('table.csv'), delimiter = ',')
sortedlist = sorted(data, key=operator.itemgetter(7))
with open("Newtable.csv", "wb") as f:
    fileWriter = csv.writer(f, delimiter=',')
    for row in sortedlist:
        fileWriter.writerow(row)
        os.remove("table.csv")
        os.rename("Newtable.csv", "table.csv")
        os.close
The number 7 relates to the points column in my csv file. I have two problems: Newtable.csv only contains the information for the team with the highest points, and table.csv is apparently being used by another process, so it cannot be removed.
If anyone has any suggestions on how to fix this it would be appreciated.

If the indentation in your post is actually the indentation in your script (and not a copy-paste error), then the problem is obvious:
os.rename() is executed during the for loop (which means that it's called once per line in the CSV file!), at a point in time where Newtable.csv is still open (not by a different process but by your script itself), so the operation fails.
You don't need to close f, by the way - the with statement takes care of that for you. What you do need to close is data - that file is also still open when the call occurs.
Finally, since a csv object contains strings, and strings are sorted alphabetically, not numerically (so "10" comes before "2"), you need to sort according to the numerical value of the string, not the string itself.
You probably want to do something like
import csv
import os
# 'rb'/'wb' are the Python 2 csv modes; on Python 3 use 'r'/'w' with newline=''
with open('table.csv', 'rb') as infile:
    data = csv.reader(infile, delimiter=',')
    # next(data) reads the header before sorting the rest
    sortedlist = [next(data)] + sorted(data, key=lambda x: int(x[7]))  # or float?
with open("Newtable.csv", "wb") as f:
    fileWriter = csv.writer(f, delimiter=',')
    fileWriter.writerows(sortedlist)  # No for loop needed :)
os.remove("table.csv")
os.rename("Newtable.csv", "table.csv")
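To see why the sort key matters, here is a small sketch with made-up rows (the team names and points are hypothetical, not taken from the question's file) that compares the default string ordering with a numeric key:

```python
# Hypothetical rows: team name in column 0, points (as text) in column 1.
rows = [["Rovers", "10"], ["United", "2"], ["City", "9"]]

# Plain string comparison: "10" sorts before "2" because "1" < "2".
by_text = sorted(rows, key=lambda row: row[1])

# Converting to int gives the ordering a league table needs.
by_number = sorted(rows, key=lambda row: int(row[1]))

print([row[1] for row in by_text])
print([row[1] for row in by_number])
```

Passing reverse=True to the second call would put the team with the most points first.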

I'd suggest using pandas:
Assuming an input file like this:
team,points
team1, 5
team2, 6
team3, 2
You could do:
import pandas as pd
a = pd.read_csv('table.csv')
# sort_values replaces the old DataFrame.sort, which newer pandas no longer has
b = a.sort_values('points', ascending=False)
b.to_csv('table.csv', index=False)

Related

Reading, formatting, sorting, and saving a csv file without pandas

I am given some sample data in a file, transactions1.csv.
I need to write a function that does the following without using pandas:
Open the file
Sort the index column in ascending order
Save the updated data back to the same file
This is the code I currently have
import csv
def sort_index():
    read_file=open("transactions1.csv","r")
    r=csv.reader(read_file)
    lines=list(r)
    sorted_lines=sorted(lines[1:], key=lambda row: row[0])
    read_file.close()
    with open('transactions1.csv','w',newline='') as file_writer:
        header=['Index','Day','Month','Year','Category','Amount','Notes'] #-
        writer=csv.DictWriter(file_writer, fieldnames=header)
        writer=csv.writer(file_writer)
        writer.writerows(sorted_lines)
    return False
sort_index()
Current output:
1,2,Dec,2021,Transport,5,MRT cost
10,19,May,2020,Transport,25,taxi fare
2,5,May,2021,investment,7,bill
3,2,Aug,2020,Bills,8,small bill
4,15,Feb,2021,Bills,40,phone bill
5,14,Oct,2021,Shopping,100,shopping 20
6,27,May,2021,Others,20,other spend
7,19,Nov,2019,Investment,1000,new invest
8,28,Mar,2020,Food,4,drink
9,18,Nov,2019,Shopping,15,clothes
The code doesn't seem to work because index 10 appears right after index 1. And the headers are missing.
Expected Output:
Index,Day,Month,Year,Category,Amount,Notes
1,2,Dec,2021,Transport,5,MRT cost
2,5,May,2021,investment,7,bill
3,2,Aug,2020,Bills,8,small bill
4,15,Feb,2021,Bills,40,phone bill
5,14,Oct,2021,Shopping,100,shopping 20
6,27,May,2021,Others,20,other spend
7,19,Nov,2019,Investment,1000,new invest
8,28,Mar,2020,Food,4,drink
9,18,Nov,2019,Shopping,15,clothes
10,19,May,2020,Transport,25,taxi fare
Several improvements can be made to the current function.
There is no need to call list on the reader object; it is iterable, so we can sort it directly.
We can extract the header by calling next on the reader object before sorting, and write it back first.
The index is read as a string, so the key function must convert it to int; otherwise "10" sorts before "2". It is also more maintainable to define a named key function instead of a lambda, so that if something changes only the function definition needs to change.
Use a with statement to automatically handle closing the file object.
import csv
def _key(row):
    # compare the index column as a number, not as text
    return int(row[0])
def sort_index():
    with open("transactions1.csv", "r") as fin:
        reader = csv.reader(fin)
        header = next(reader)  # keep the header out of the sort
        rows = sorted(reader, key=_key)
    with open("transactions1.csv", "w", newline="") as fout:
        writer = csv.writer(fout)
        writer.writerow(header)
        writer.writerows(rows)

Removing the end of line character from a read csv file

I tried several times to use strip() but I can't get it to work.
I removed that piece from this snippet, but every time I tried it I either got
an error or it did nothing. The sort is fine; I just want to strip the newline before writing to the new file.
import sys, csv, operator
data = csv.reader(open('tickets.csv'),delimiter=',')
sortedlist = sorted(data, key=operator.itemgetter(6))
# 0 specifies according to first column we want to sort
#now write the sort result into new CSV file
with open("newfiles.csv", "w") as f:
    #writablefile = csv.writer(f)
    fileWriter = csv.writer(f, delimiter=',')
    for row in sortedlist:
        #print(row)
        lst = (row)
        fileWriter.writerow(lst)
You need to add newline='' to your open() when writing a CSV file. This is explained in the documentation. Without it, your file can end up having a blank line per row.
import sys, csv, operator
data = csv.reader(open('tickets.csv'), delimiter=',')
header = next(data)
sortedlist = sorted(data, key=operator.itemgetter(6))
# 6 is the index of the column we want to sort by
# now write the sort result into a new CSV file
with open("newfiles.csv", "w", newline="") as f:
    fileWriter = csv.writer(f)
    fileWriter.writerow(header)  # keep the header at the top
    fileWriter.writerows(sortedlist)
You also need to read the header row before loading everything for sorting, so that it is not sorted along with the data. It can then be written separately at the top of your sorted output CSV.
If your tickets.csv file contains blank lines, you would also need to remove these. For example:
for row in sortedlist:
    if row:
        fileWriter.writerow(row)
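As a rough illustration of both points, the sketch below uses in-memory text instead of tickets.csv (the sample data and column names are made up). It assumes the sort column holds integers. csv.reader yields an empty list for a blank line, and csv.writer ends each row with "\r\n" on its own, which is why the output file should be opened with newline="".

```python
import csv
import io

# Made-up stand-in for tickets.csv, including one blank line.
raw = "id,priority\n3,low\n\n1,high\n"

reader = csv.reader(io.StringIO(raw))
header = next(reader)
# A blank line comes back as an empty list, so "if row" filters it out.
rows = sorted((row for row in reader if row), key=lambda row: int(row[0]))

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(header)
writer.writerows(rows)

# csv.writer terminates rows with "\r\n" itself; a real file opened
# without newline="" may translate the "\n" a second time on Windows.
result = out.getvalue()
print(repr(result))
```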

Sort CSV File by Numerical Values

For a homework task I have to make a program that sorts scores like a leaderboard, and I can't figure out how to sort it in descending order.
I will send the rest of my code if that helps; any help would be appreciated.
I forgot to mention that the CSV file looks like this:
NAME, SCORE
I have seen many questions on here about this, and none of the answers seem to work with mine.
with open('names.csv', 'a', newline='') as names:
    csvWriter = csv.writer(names)
    csvWriter.writerow([name, int(score)])
with open('names.csv', 'r') as names:
    csvReader = csv.reader(names)
    csv1 = csv.reader(names, delimiter='.')
    sort = sorted(csv1, key=operator.itemgetter(0))
    csv_writer = csv.writer(names, delimiter='.')
    for name in csvReader:
        print(' '.join(name))
no error message or results, just an exit code
See here for reading a csv into dataframe
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
and here for sorting it
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html
There are a few problems with your code here; I'll try to break them down. I don't recommend you jump to pandas just yet. I'm going to assume this is actually writing data, but you should check "names.csv" to make sure it isn't empty. It looks like this would only write one row. Passing newline='' is fine, by the way: the csv documentation recommends it for files handed to csv.writer.
with open('names.csv', 'a', newline='') as names:
    csvWriter = csv.writer(names)
    csvWriter.writerow([name, int(score)])
Usually you would have a loop inside a statement like this:
my_list = [("Bob", "2"), ("Alice", "1")]
with open('names.csv', 'a', newline='') as names:
    csvWriter = csv.writer(names)
    for name, score in my_list:
        csvWriter.writerow([name, int(score)])
You make two csv readers here; I'm not sure why. The second one uses "." as its delimiter. You didn't specify a delimiter when you wrote the CSV, so the file uses the default delimiter, which is a comma. csvReader uses that by default, but csv1 will try to split on periods, which won't work well.
with open('names.csv', 'r') as names:
    csvReader = csv.reader(names)
    csv1 = csv.reader(names, delimiter='.')
You shouldn't use "sort" as a variable name, since it's also the name of a common method, but that won't actually break anything here. You create a new sorted version of csv1, but you run your for loop on csvReader, not on the sorted result. Both readers share the same file object, so once sorted() has consumed csv1 the file is exhausted and the loop over csvReader prints nothing. You also create csv_writer with names, which is open for reading, not writing, but you don't use it anyway.
itemgetter is looking at index 0, which would be the names from your writer; if you want to sort by score, use 1.
    sort = sorted(csv1, key=operator.itemgetter(0))
    csv_writer = csv.writer(names, delimiter='.')
    for name in csvReader:
        print(' '.join(name))
I think this might be more what you're going for, using my example list.
#!/usr/bin/env python3
import csv
import operator
my_list = [("Bob", "2"), ("Alice", "1")]
with open('names.csv', 'a', newline='') as names:
    csvWriter = csv.writer(names)
    for name, score in my_list:
        csvWriter.writerow([name, int(score)])
with open('names.csv', 'r') as names:
    csvReader = csv.reader(names)
    # sort by the score column (index 1); values are still compared as strings here
    sorted_reader = sorted(csvReader, key=operator.itemgetter(1))
for name, score in sorted_reader:
    print(name, score)
Output:
Alice 1
Bob 2
Since you want it to look "like a leaderboard", you probably want a different sort, though: reverse it and use the integer values of the score column (so that 100 appears above 2, for example).
sorted_reader = sorted(csvReader, key=lambda row: int(row[1]), reverse=True)
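A minimal sketch of that last line in context, using in-memory text as a stand-in for names.csv (the names and scores are made up, and the file is assumed to have no header row):

```python
import csv
import io

# Made-up contents standing in for names.csv (NAME,SCORE per row).
raw = "Bob,2\nAlice,100\nCara,15\n"

reader = csv.reader(io.StringIO(raw))
# int() makes 100 outrank 15 and 2; reverse=True puts the best score first.
leaderboard = sorted(reader, key=lambda row: int(row[1]), reverse=True)

for name, score in leaderboard:
    print(name, score)
```

Without the int() conversion the same reverse sort would compare "2", "15" and "100" as text and put Bob at the top.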

Sort a CSV file to read in Python Program

I'm trying to create a leaderboard in python, where a player will get a score from playing a game, which will write to a .csv file. I then need to read from this leaderboard, sorted from largest at the top to smallest at the bottom.
Is the sorting something that should be done when the values are written to the file, when I read the file, or somewhere in between?
my code:
writefile=open("leaderboard.csv","a")
writefile.write(name+", "points)
writefile.close()
readfile=open("leaderboard.csv","r")
I'm hoping to display the top 5 scores and the accompanying names.
It is this point that I have hit a brick wall. Thanks for any help.
Edit: getting the error 'list index out of range'
import csv
name = 'Test'
score = 3
with open('scores.csv', 'a') as f:
    writer = csv.writer(f)
    writer.writerow([name, score])
with open('scores.csv') as f:
    reader = csv.reader(f)
    scores = sorted(reader, key=lambda row: (float(row[1]), row[0]))
top5 = scores[-5:]
csv file:
test1 3
test2 3
test3 3
Python has a csv module in the standard library. It's very simple to use:
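import csv
# name and score are the values produced by your game
with open('scores.csv', 'a', newline='') as f:
    writer = csv.writer(f)
    writer.writerow([name, score])
with open('scores.csv', newline='') as f:
    reader = csv.reader(f)
    # sort by score, then by name to break ties
    scores = sorted(reader, key=lambda row: (float(row[1]), row[0]))
top5 = scores[-5:]  # the five highest scores, in ascending order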
Is the sorting something that should be done when the values are written to the file, when i read the file, or somewhere in between?
Both approaches have their benefits. If you sort when you read, you can simply append to the file, which makes writing new scores faster. Reading takes longer because you have to sort before you can find the highest score, but for a local leaderboard file this is unlikely to be a problem: you are unlikely to have more than a few thousand lines, so sorting when reading will likely be fine.
If you sort while writing, you have to rewrite the entire file every time a new score is added. However, it is also easier to clean up the leaderboard file that way: you can simply drop old or low scores that you don't care about anymore while writing.
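The two strategies can be sketched on plain lists, leaving the file handling out. The function names top_scores and add_score and the sample scores are hypothetical, chosen only for this illustration:

```python
def top_scores(rows, n=5):
    """Sort on read: rows stay in arrival order and are sorted for display."""
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]


def add_score(rows, name, score, keep=5):
    """Sort on write: return a new list that stays sorted and trimmed."""
    updated = sorted(rows + [(name, score)], key=lambda row: row[1], reverse=True)
    return updated[:keep]


# Hypothetical scores as (name, points) tuples, in the order they arrived.
arrival_order = [("ann", 3), ("bob", 9), ("cy", 1), ("dee", 7), ("eli", 5), ("fay", 8)]

# Strategy 1: keep everything, sort only when displaying.
print(top_scores(arrival_order))

# Strategy 2: keep the board sorted and drop low scores on every write.
board = []
for name, score in arrival_order:
    board = add_score(board, name, score)
print(board)
```

Both print the same top five here; the difference is that the second approach never stores more than five entries, at the cost of rewriting the whole board on each new score.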
Try this:
import pandas as pd
df = pd.read_csv('myfullfilepath.csv', sep=',', names=['name', 'score'])
df = df.sort_values(['score'], ascending=False)
first_ten = df.head(10)
first_ten.to_csv('myfullpath.csv', index=False)
I named the columns like that, following the structure that you suggested.

Optimize Python CSV Reader Performance

My following code works correctly, but far too slowly. I would greatly appreciate any help you can provide:
import gf
import csv
cic = gf.ct
cii = gf.cit
li = gf.lt
oc = "Output.csv"
with open(cic, "rb") as input1:
    reader = csv.DictReader(cie,gf.ctih)
    with open(oc,"wb") as outfile:
        writer = csv.DictWriter(outfile,gf.ctoh)
        writer.writerow(dict((h,h) for h in gf.ctoh))
        next(reader)
        for ci in reader:
            row = {}
            row["ci"] = ci["id"]
            row["cyf"] = ci["yf"]
            with open(cii,"rb") as ciif:
                reader2 = csv.DictReader(ciif,gf.citih)
                next(reader2)
                with open(li, "rb") as lif:
                    reader3 = csv.DictReader(lif,gf.lih)
                    next(reader3)
                    for cii in reader2:
                        if ci["id"] == cii["id"]:
                            row["ci"] = cii["ca"]
                    for li in reader3:
                        if ci["id"] == li["en_id"]:
                            row["cc"] = li["c"]
            writer.writerow(row)
The reason I open reader2 and reader3 for every row in reader is because reader objects iterate through once and then are done. But there has to be a much more efficient way of doing this and I would greatly appreciate any help you can provide!
If it helps, the intuition behind this code is the following: From Input file 1, grab two cells; see if input file 2 has the same Primary Key as in input file 1, if so, grab a cell from input file 2 and save it with the two other saved cells; see if input file 3 has the same primary key as in input file 1, if so, grab a cell from inputfile3 and save it. Then output these four values. That is, I'm grabbing meta-data from normalized tables and I'm trying to denormalize it. There must be a way of doing this very efficiently in Python. One problem with the current code is that I iterate through reader objects until I find the relevant ID, when there must be a simpler way of searching for a given ID in a reader object...
For one, if this really does live in a relational database, why not just do a big join with some carefully phrased selects?
If I were doing this, I would use pandas.DataFrame and merge the 3 tables together, then I would iterate over each row and use suitable logic to transform the resulting "join"ed datasets into a single final result.
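If neither a database join nor pandas is an option, one possible alternative (not what either answer above proposes) is to read each lookup file once into a dictionary keyed by the ID and then do a single pass over the main file. The sketch below is only an assumption about the data layout: it uses in-memory text instead of the real files, borrows the column names id, yf, ca, en_id and c from the question, and the helper name index_by is hypothetical.

```python
import csv
import io

# Made-up stand-ins for the three input files.
main_csv = "id,yf\n1,2001\n2,2005\n"
second_csv = "id,ca\n1,alpha\n"
third_csv = "en_id,c\n2,red\n"


def index_by(text, key, value):
    """Read a CSV once and map each key-column value to one value column."""
    return {row[key]: row[value] for row in csv.DictReader(io.StringIO(text))}


# Each lookup file is read exactly once, not once per row of the main file.
ca_by_id = index_by(second_csv, "id", "ca")
c_by_id = index_by(third_csv, "en_id", "c")

joined = []
for ci in csv.DictReader(io.StringIO(main_csv)):
    joined.append({
        "id": ci["id"],
        "yf": ci["yf"],
        # .get() returns None when the other file has no matching ID.
        "ca": ca_by_id.get(ci["id"]),
        "c": c_by_id.get(ci["id"]),
    })

print(joined)
```

If an ID appears more than once in a lookup file, the last row wins, which matches what the nested loops in the question end up doing.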
