Sort CSV File by Numerical Values - python

For a homework task I have to write a program that sorts scores like a leaderboard, and I can't figure out how to sort them in descending order.
I can post the rest of my code if that helps. Any help would be appreciated.
I forgot to mention that the CSV file looks like this:
NAME, SCORE
I have seen many questions on here about this, and none of the answers seem to work with my code.
with open('names.csv', 'a', newline='') as names:
    csvWriter = csv.writer(names)
    csvWriter.writerow([name, int(score)])
with open('names.csv', 'r') as names:
    csvReader = csv.reader(names)
    csv1 = csv.reader(names, delimiter='.')
    sort = sorted(csv1, key=operator.itemgetter(0))
    csv_writer = csv.writer(names, delimiter='.')
    for name in csvReader:
        print(' '.join(name))
I get no error message and no results, just an exit code.

See here for reading a csv into dataframe
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
and here for sorting it
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html

There are a few problems with your code; I'll try to break them down. I don't recommend you jump to pandas just yet. I'm going to assume this part actually writes data, but you should check "names.csv" to make sure it isn't empty. It looks like this would only write one row. Passing newline='' is fine, by the way: the csv module documentation recommends it for files handed to csv.writer.
with open('names.csv', 'a', newline='') as names:
    csvWriter = csv.writer(names)
    csvWriter.writerow([name, int(score)])
Usually you would have a loop inside a statement like this:
my_list = [("Bob", "2"), ("Alice", "1")]
with open('names.csv', 'a', newline='') as names:
    csvWriter = csv.writer(names)
    for name, score in my_list:
        csvWriter.writerow([name, int(score)])
You create two csv readers here, and I'm not sure why. The second one uses "." as its delimiter. You didn't specify a delimiter when you wrote the CSV, so the file has the standard delimiter, which is a comma ",". csvReader uses that by default, but csv1 will attempt to split on periods and won't do well.
with open('names.csv', 'r') as names:
    csvReader = csv.reader(names)
    csv1 = csv.reader(names, delimiter='.')
You shouldn't use "sort" as a variable name, since it's the name of a common method, but that won't actually break anything here. You create a new sorted version of csv1, but you run your for loop on csvReader, not on the sorted result. Both readers also share the same open file, so sorted(csv1) consumes every line and leaves csvReader nothing to print. You also create csv_writer with names, which is open for reading, not writing, but you don't use it anyway.
itemgetter is looking at index 0, which holds the names from your writer. If you want to sort by score, use 1.
    sort = sorted(csv1, key=operator.itemgetter(0))
    csv_writer = csv.writer(names, delimiter='.')
    for name in csvReader:
        print(' '.join(name))
I think this might be closer to what you're going for, using my example list.
#!/usr/bin/env python3
import csv
import operator
my_list = [("Bob", "2"), ("Alice", "1")]
# newline='' is what the csv docs recommend for files used with csv.writer/csv.reader
with open('names.csv', 'a', newline='') as names:
    csvWriter = csv.writer(names)
    for name, score in my_list:
        csvWriter.writerow([name, int(score)])
with open('names.csv', 'r', newline='') as names:
    csvReader = csv.reader(names)
    sorted_reader = sorted(csvReader, key=operator.itemgetter(1))
    for name, score in sorted_reader:
        print(name, score)
Output:
Alice 1
Bob 2
Since you want it to look "like a leaderboard", you probably want a different sort, though. You want to reverse it and use the integer values of the score column (so that 100 appears above 2, for example).
sorted_reader = sorted(csvReader, key=lambda row: int(row[1]), reverse=True)
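Putting those pieces together, a minimal sketch of the read-and-sort step might look like the following. It assumes each data row is `name,score`; `read_leaderboard` is a hypothetical helper name, the sample data is made up, and rows that are blank or have a non-numeric score (such as a `NAME, SCORE` header) are skipped rather than raising.

```python
import csv
import io


def read_leaderboard(lines):
    """Return (name, score) tuples sorted by score, highest first.

    `lines` is any iterable of CSV lines, e.g. an open file object.
    """
    rows = []
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or incomplete rows
        try:
            score = int(row[1])
        except ValueError:
            continue  # skip rows whose score is not a number, e.g. a header
        rows.append((row[0], score))
    return sorted(rows, key=lambda r: r[1], reverse=True)


# In-memory stand-in for open('names.csv', newline='')
sample = io.StringIO("NAME, SCORE\nBob,2\nAlice,1\nCarol,100\n")
leaderboard = read_leaderboard(sample)
for name, score in leaderboard:
    print(name, score)
```

Converting the score with int() before sorting is what puts 100 above 2; sorting the raw strings would compare them character by character.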

Related

how to select a specific column of a csv file in python

I am a beginner in Python and would like your opinion.
I wrote this code, which reads the only column in a file on my PC and puts it in a list.
I am having difficulty understanding how to modify the same code for a file that has multiple columns, so that it selects only the column I am interested in.
Can you help me?
list = []
with open(r'C:\Users\Desktop\mydoc.csv') as file:
    for line in file:
        item = int(line)
        list.append(item)
results = []
for i in range(0,1086):
    a = list[i-1]
    b = list[i]
    c = list[i+1]
    results.append(b)
print(results)

You can use pandas.read_csv() method very simply like this:
import pandas as pd
my_data_frame = pd.read_csv('path/to/your/data')
results = my_data_frame['name_of_your_wanted_column'].values.tolist()

A useful module for the kind of work you are doing is the imaginatively named csv module.
Many CSV files have a "header" line at the top, which by convention labels the columns of the file. Assuming you can insert a line at the top of your CSV file with comma-delimited field names, you could replace your program with something like:
import csv
with open(r'C:\Users\Desktop\mydoc.csv') as myfile:
    csv_reader = csv.DictReader(myfile)
    for row in csv_reader:
        print(row['column_name_of_interest'])
The above prints every value in the 'column_name_of_interest' column to the terminal, once you edit that name to match your particular file.
It's normal to work with lots of columns at once, so packing a whole row into a single dictionary, addressable by column name, can be very convenient later on.
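To get the column into a list of integers, as the original program did, the DictReader approach could be extended roughly as follows. This is only a sketch: `read_column` is a hypothetical helper, and the column names `id`, `value` and `label` are made up for the example.

```python
import csv
import io


def read_column(lines, column, convert=str):
    """Collect one named column from CSV lines that start with a header row."""
    reader = csv.DictReader(lines)
    return [convert(row[column]) for row in reader]


# In-memory stand-in for a file whose first line is the header
sample = io.StringIO("id,value,label\n1,10,a\n2,20,b\n3,30,c\n")
values = read_column(sample, "value", convert=int)
print(values)
```

With a real file, you would pass the object returned by open(path, newline='') instead of the StringIO stand-in.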

For a pure Python implementation, you can use the csv module from the standard library.
data.csv
Project1,folder1/file1,data
Project1,folder1/file2,data
Project1,folder1/file3,data
Project1,folder1/file4,data
Project1,folder2/file11,data
Project1,folder2/file42a,data
Project1,folder2/file42b,data
Project1,folder2/file42c,data
Project1,folder2/file42d,data
Project1,folder3/filec,data
Project1,folder3/fileb,data
Project1,folder3/filea,data
Your Python program should read it line by line:
import csv
a = []
with open('data.csv') as csv_file:
    reader = csv.reader(csv_file, delimiter=',')
    for row in reader:
        print(row)
        # ['Project1', 'folder1/file1', 'data']
If you print each row, you will see it is a list like this:
['Project1', 'folder1/file1', 'data']
To collect all the elements of column 1 (the second column, since indexes start at 0), append that element to the list inside the loop:
a.append(row[1])
The list a will then look like this:
['folder1/file1', 'folder1/file2', 'folder1/file3', 'folder1/file4', 'folder2/file11', 'folder2/file42a', 'folder2/file42b', 'folder2/file42c', 'folder2/file42d', 'folder3/filec', 'folder3/fileb', 'folder3/filea']
Here is the complete code:
import csv
a = []
with open('data.csv') as csv_file:
    reader = csv.reader(csv_file, delimiter=',')
    for row in reader:
        a.append(row[1])

How to delete lines from csv file using python?

I have a CSV file. It contains class names and types of code smell, and for each class I calculated the number of occurrences of each code smell. The final count for a class is on its last line, so many class names are repeated.
I need just the last line for each class name.
This is part of my CSV file, because it's too long to post in full:
NameOfClass,LazyClass,ComplexClass,LongParameterList,FeatureEnvy,LongMethod,BlobClass,MessageChain,RefusedBequest,SpaghettiCode,SpeculativeGenerality
com.nirhart.shortrain.MainActivity,NaN,NaN,NaN,NaN,NaN,NaN,1,NaN,NaN,NaN
com.nirhart.shortrain.path.PathParser,NaN,1,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.path.PathParser,NaN,1,NaN,1,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.path.PathParser,NaN,1,1,1,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.path.PathParser,NaN,1,2,1,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.path.PathParser,NaN,1,2,1,1,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.path.PathPoint,1,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.path.PathPoint,1,NaN,1,NaN,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.path.TrainPath,NaN,NaN,NaN,1,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.rail.RailActionActivity,NaN,NaN,NaN,1,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.rail.RailActionActivity,NaN,NaN,NaN,1,1,NaN,NaN,NaN,NaN,NaN

To filter out the last entry for groups of NameOfClass, you can make use of Python's groupby() function to return lists of rows with the same NameOfClass. The last entry from each can then be written to a file.
from itertools import groupby
import csv
with open('data_in.csv', newline='') as f_input, open('data_out.csv', 'w', newline='') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)
    for key, rows in groupby(csv_input, key=lambda x: x[0]):
        csv_output.writerow(list(rows)[-1])
For the data you have given, this would give you the following output:
NameOfClass,LazyClass,ComplexClass,LongParameterList,FeatureEnvy,LongMethod,BlobClass,MessageChain,RefusedBequest,SpaghettiCode,SpeculativeGenerality
com.nirhart.shortrain.MainActivity,NaN,NaN,NaN,NaN,NaN,NaN,1,NaN,NaN,NaN
com.nirhart.shortrain.path.PathParser,NaN,1,2,1,1,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.path.PathPoint,1,NaN,1,NaN,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.path.TrainPath,NaN,NaN,NaN,1,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.rail.RailActionActivity,NaN,NaN,NaN,1,1,NaN,NaN,NaN,NaN,NaN
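One caveat: itertools.groupby only groups consecutive rows with the same key, so the approach above relies on all rows for a class being adjacent, as they are in the sample. If that isn't guaranteed, a dictionary keyed on the class name is one possible alternative. The sketch below assumes the first line is a header; `last_row_per_key` is a hypothetical helper and the short sample data is made up.

```python
import csv
import io


def last_row_per_key(lines):
    """Return the header and, for each first-column value, the last row seen."""
    reader = csv.reader(lines)
    header = next(reader)
    latest = {}
    for row in reader:
        if row:  # ignore blank lines
            latest[row[0]] = row  # later rows overwrite earlier ones
    return header, list(latest.values())


# The rows for class A are deliberately not adjacent here
sample = io.StringIO("NameOfClass,LazyClass\nA,NaN\nB,1\nA,2\n")
header, rows = last_row_per_key(sample)
print(header)
print(rows)
```

Because dictionaries keep insertion order (Python 3.7+), the classes come out in the order they first appeared in the file.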

To get just the unique class names (ignoring repeated rows, not deleting them), you can do this:
import csv
with open('my_file.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    classNames = set(row[0] for row in reader)
print(classNames)
# {'com.nirhart.shortrain.MainActivity', 'com.nirhart.shortrain.path.PathParser', 'com.nirhart.shortrain.path.PathPoint', ...}
This is just using the csv module to open a file, getting the first value in each row, and then taking only the unique values of those. You can then manipulate the resulting set of strings (you might want to cast it back to a list via list(classNames)) however you need to.

If you intend to later process the data in pandas, filtering duplicates is trivial:
import pandas as pd
df = pd.read_csv('file.csv')
df = df.loc[~df.NameOfClass.duplicated(keep='last')]
If you just want to build a new csv file with only the expected lines, pandas is overkill and the csv module is enough:
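import csv
with open('file.csv', newline='') as fdin, open('new_file.csv', 'w', newline='') as fdout:
    rd = csv.reader(fdin)
    wr = csv.writer(fdout)
    wr.writerow(next(rd))  # copy the header line
    old = None
    for row in rd:
        # a new class name means `old` was the last row of the previous class
        if old is not None and old[0] != row[0]:
            wr.writerow(old)
        old = row
    if old is not None:  # write the last class, unless there were no data rows
        wr.writerow(old)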

Sort a CSV file to read in Python Program

I'm trying to create a leaderboard in python, where a player will get a score from playing a game, which will write to a .csv file. I then need to read from this leaderboard, sorted from largest at the top to smallest at the bottom.
Is the sorting something that should be done when the values are written to the file, when I read the file, or somewhere in between?
my code:
writefile=open("leaderboard.csv","a")
writefile.write(name+", "+points)
writefile.close()
readfile=open("leaderboard.csv","r")
I'm hoping to display the top 5 scores and the accompanying names.
It is this point that I have hit a brick wall. Thanks for any help.
Edit: getting the error 'list index out of range'
import csv
name = 'Test'
score = 3
with open('scores.csv', 'a') as f:
    writer = csv.writer(f)
    writer.writerow([name, score])
with open('scores.csv') as f:
    reader = csv.reader(f)
    scores = sorted(reader, key=lambda row: (float(row[1]), row[0]))
top5 = scores[-5:]
csv file:
test1 3
test2 3
test3 3

Python has a csv module in the standard library. It's very simple to use:
import csv
with open('scores.csv', 'a') as f:
    writer = csv.writer(f)
    writer.writerow([name, score])
with open('scores.csv') as f:
    reader = csv.reader(f)
    scores = sorted(reader, key=lambda row: (float(row[1]), row[0]))
top5 = scores[-5:]
You asked: "Is the sorting something that should be done when the values are written to the file, when I read the file, or somewhere in between?"
Both approaches have their benefits. If you sort when you read, you can simply append to the file, which makes writing new scores faster. Reading takes longer, because you have to sort before you can find the highest score, but for a local leaderboard file this is unlikely to be a problem: you are unlikely to have more than a few thousand lines, so sorting when reading will probably be fine.
If you sort while writing, you have to rewrite the entire file every time a new score is added. However, it's also easier to clean up the leaderboard file that way: you can simply drop old or low scores that you no longer care about while writing.
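As for the 'list index out of range' error in the question's edit: that is what row[1] raises for any row with fewer than two fields, such as a blank line or a line like `test1 3` that contains no comma. A minimal sketch of the sort-on-read approach that tolerates such rows could look like this. `top_scores` is a hypothetical helper, the sample data is made up, and heapq.nlargest is used here instead of sorting the whole list and slicing.

```python
import csv
import heapq
import io


def top_scores(lines, n=5):
    """Return the n highest (name, score) pairs, best first."""
    rows = []
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # blank line or no comma: row[1] would raise IndexError
        try:
            rows.append((row[0], float(row[1])))
        except ValueError:
            continue  # score field is not a number
    return heapq.nlargest(n, rows, key=lambda r: r[1])


# In-memory stand-in for open('scores.csv', newline='');
# it includes a blank line and a line without a comma
sample = io.StringIO("ann,3\n\nbob,7\ntest1 3\ncat,5\n")
top = top_scores(sample, n=2)
print(top)
```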

Try this:
import pandas as pd
df = pd.read_csv('myfullfilepath.csv', sep=',', names=['name', 'score'])
df = df.sort_values(['score'], ascending=False)
first_ten = df.head(10)
first_ten.to_csv('myfullpath.csv', index=False)
I named the columns that way, following the structure that you suggested.

Sorting a table in python

I am creating a league table for a 6 a side football league and I am attempting to sort it by the points column and then display it in easygui. The code I have so far is this:
data = csv.reader(open('table.csv'), delimiter = ',')
sortedlist = sorted(data, key=operator.itemgetter(7))
with open("Newtable.csv", "wb") as f:
    fileWriter = csv.writer(f, delimiter=',')
    for row in sortedlist:
        fileWriter.writerow(row)
        os.remove("table.csv")
        os.rename("Newtable.csv", "table.csv")
        os.close
The number 7 refers to the points column in my csv file. My problem is that Newtable.csv ends up containing only the information for the team with the highest points, and table.csv is apparently being used by another process and so cannot be removed.
If anyone has any suggestions on how to fix this it would be appreciated.

If the indentation in your post is actually the indentation in your script (and not a copy-paste error), then the problem is obvious:
os.rename() is executed during the for loop (which means that it's called once per line in the CSV file!), at a point in time where Newtable.csv is still open (not by a different process but by your script itself), so the operation fails.
You don't need to close f, by the way - the with statement takes care of that for you. What you do need to close is data - that file is also still open when the call occurs.
Finally, since a csv object contains strings, and strings are sorted alphabetically, not numerically (so "10" comes before "2"), you need to sort according to the numerical value of the string, not the string itself.
You probably want to do something like
import csv
import os
# Python 3: open CSV files in text mode with newline='' (Python 2 used 'rb'/'wb')
with open('table.csv', newline='') as infile:
    data = csv.reader(infile, delimiter=',')
    # next(data) reads the header before sorting the rest
    sortedlist = [next(data)] + sorted(data, key=lambda x: int(x[7]))  # or float?
with open("Newtable.csv", "w", newline='') as f:
    fileWriter = csv.writer(f, delimiter=',')
    fileWriter.writerows(sortedlist)  # No for loop needed :)
os.remove("table.csv")
os.rename("Newtable.csv", "table.csv")
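The same read, sort, write, swap sequence can be wrapped in a function. The sketch below is one way to do it, not the only one: `sort_csv_in_place` is a hypothetical helper, it assumes the first row is a header and the chosen column holds integers, it sorts highest first (which is probably what a league table wants), and it uses os.replace to swap the files in one step instead of os.remove followed by os.rename. The demo runs on a throwaway file with made-up data.

```python
import csv
import os
import tempfile


def sort_csv_in_place(path, column, reverse=True):
    """Sort the data rows of a CSV file by an integer column, header first."""
    with open(path, newline='') as infile:
        reader = csv.reader(infile)
        header = next(reader)
        rows = sorted(reader, key=lambda r: int(r[column]), reverse=reverse)
    tmp_path = path + '.tmp'
    with open(tmp_path, 'w', newline='') as outfile:
        writer = csv.writer(outfile)
        writer.writerow(header)
        writer.writerows(rows)
    # Both files are closed at this point, so the swap cannot hit an open handle
    os.replace(tmp_path, path)


# Demo on a throwaway file
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'table.csv')
with open(demo_path, 'w', newline='') as f:
    f.write('team,points\r\nteam1,5\r\nteam2,10\r\nteam3,2\r\n')
sort_csv_in_place(demo_path, column=1)
with open(demo_path, newline='') as f:
    result = list(csv.reader(f))
print(result)
```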

I'd suggest using pandas:
Assuming an input file like this:
team,points
team1, 5
team2, 6
team3, 2
You could do:
import pandas as pd
a = pd.read_csv('table.csv')
# DataFrame.sort() was removed from pandas; sort_values() is the current method
b = a.sort_values('points', ascending=False)
b.to_csv('table.csv', index=False)

Write csv with each list as column

I have a dictionary that holds a two level nested list for each key that looks like the following:
OrderedDict([(0,[['a','b','c','d'],['e','f','g','h']]), (1,[['i','j','k','l'],['m','n','o','p']])])
I would like to write each nested list to a csv as a column:
a,b,c,d e,f,g,h
i,j,k,l m,n,o,p
the output I am getting from my current code is:
['a','b','c','d'] ['e','f','g','h']
['i','j','k','l'] ['m','n','o','p']
The columns are correct but I would like to remove the brackets [ ] and quotes ' '
a,b,c,d e,f,g,h
i,j,k,l m,n,o,p
CODE:
with open('test.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file, quotechar='"', quoting=csv.QUOTE_ALL)
    for record in my_dict.values():
        writer.writerow(record)
Any help would be appreciated!

This should work:
with open('test.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file, quotechar='"', quoting=csv.QUOTE_ALL)
    for record in my_dict.values():
        final = [",".join(index) for index in record]
        writer.writerow(final)

I am having trouble visualizing how your record looks because it seems like each value should be a nested list, meaning each column should contain both lists. If it were really writing one value from .values() at a time, it seems like you would get one column with both lists in it.
At any rate, what you want to achieve can probably be done with the .join() method, applied to each inner list:
for record in my_dict.values():
    writer.writerow([','.join(column) for column in record])
But the thing is, you're making a CSV (comma separated values) file, so each comma may be interpreted as a field-delimiter.
Question: Have you considered using a different delimiter when you instantiate your csv.writer?
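To illustrate the delimiter point, here is a minimal sketch that joins each inner list with commas and separates the resulting columns with a tab. The tab delimiter is an assumption about what the output should look like, and an in-memory buffer stands in for the real file.

```python
import csv
import io
from collections import OrderedDict

my_dict = OrderedDict([
    (0, [['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h']]),
    (1, [['i', 'j', 'k', 'l'], ['m', 'n', 'o', 'p']]),
])

buffer = io.StringIO()  # stand-in for open('test.csv', 'w', newline='')
# With a tab as the delimiter, the commas inside each field need no quoting
writer = csv.writer(buffer, delimiter='\t')
for record in my_dict.values():
    writer.writerow([','.join(inner) for inner in record])

output = buffer.getvalue()
print(output)
```

With the default comma delimiter, the writer would have to wrap each joined field in quotes to keep it as a single column.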
