How to sort a CSV file alphabetically? - python

This is my code that adds the data to the CSV file known as studentScores.csv
myfile = open("studentScores.csv", "a+")
newRecord = Score, Name, Gender, FormGroup, Percentage
myfile.write(str(newRecord))
myfile.write("\n")
myfile.close()
As a part of my task, I need to alphabetise the data in the CSV, I have searched, and searched for a solution, but I am unable to find a working solution for me. I am pretty new to Python, so the simplest solution will be appreciated.

import csv
from operator import itemgetter

with open('studentScores.csv', 'r', newline='') as f:
    data = [line for line in csv.reader(f)]

newRecord = [Score, Name, Gender, FormGroup, Percentage]  # values defined elsewhere in your program
data.append(newRecord)
data.sort(key=itemgetter(1))  # 1 being the column number to sort by

with open('studentScores.csv', 'w', newline='') as f:
    csv.writer(f).writerows(data)
First of all, this uses functions from the csv module to properly parse and create CSV syntax. Secondly, it reads all existing entries into data, appends the new record, sorts all records, and then dumps them back to the file.
If you're using a header row in your CSV file to give the columns names, look at DictReader and DictWriter, which would allow you to handle columns by name instead of number (e.g. in the sorting step).
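A minimal sketch of that DictReader/DictWriter variant, assuming a header row with a "Name" column (the sample data here is made up so the example is runnable on its own):

```python
import csv

# Create a small sample file with a header row (illustrative data only).
with open('studentScores.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['Name', 'Gender', 'FormGroup', 'Percentage'])
    writer.writeheader()
    writer.writerows([
        {'Name': 'Zoe', 'Gender': 'F', 'FormGroup': '7A', 'Percentage': '81'},
        {'Name': 'Alan', 'Gender': 'M', 'FormGroup': '7B', 'Percentage': '64'},
    ])

# Read the rows back, sort by the "Name" column, and rewrite the file.
with open('studentScores.csv', 'r', newline='') as f:
    reader = csv.DictReader(f)
    fieldnames = reader.fieldnames
    rows = sorted(reader, key=lambda row: row['Name'])

with open('studentScores.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
```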

Related

How to write in a specific cell in a CSV file?

I have an assignment in which I need to input random grades of different students in a csv file using Python 3, and get the average of each student (the average part, and how to get the random grades, I know how to do). The thing is that I don't know how to write the grades in those specific columns and rows (the highlighted ones).
The highlighted area is the space in which I need to write the random grades:
Is there any way that this can be done? I'm fairly new to programming and Python 3, and as far as I've read, specific cells can't be changed by normal means.
The csv module doesn't have functions to modify specific cells.
You can read the rows from the original file, append the grades, and write the modified rows to a new file:
import random
import csv

inputFile = open('grades.csv', 'r')
outputFile = open('grades_out.csv', 'w')
reader = csv.reader(inputFile)
writer = csv.writer(outputFile)
for row in reader:
    grades = row.copy()
    for i in range(5):
        grades.append(random.randint(1, 5))
    writer.writerow(grades)
inputFile.close()
outputFile.close()
Then you can delete the original file and rename the new one (it is not a good idea to read the whole original file into a variable, close it, open it again in write mode, and then write the data back, because the file can be big).
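The delete-and-rename step can also be done from Python. A small sketch (the stand-in file content is made up so the example runs on its own; the filenames match the code above):

```python
import os

# Stand-in for the output file written by the code above (illustrative content).
with open('grades_out.csv', 'w') as f:
    f.write('Alice,3,4,2,5,1\n')

# Replace the original file with the rewritten one in a single step;
# os.replace overwrites the destination if it already exists.
os.replace('grades_out.csv', 'grades.csv')
```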

Grab values from seperate csv file and replace the values of columns in a pipe delimited file

Trying to whip this out in python. Long story short, I have a csv file that contains column data I need to inject into another file that is pipe delimited. My understanding is that Python can't replace values in place, so I have to re-write the whole file with the new values.
data file(csv):
value1,value2,iwantthisvalue3
source file(txt, | delimited)
value1|value2|iwanttoreplacethisvalue3|value4|value5|etc
fixed file(txt, | delimited)
samevalue1|samevalue2| replacedvalue3|value4|value5|etc
I can't figure out how to accomplish this. This is my latest attempt (broken code):
import re
import csv

result = []
row = []
with open("C:\data\generatedfixed.csv", "r") as data_file:
    for line in data_file:
        fields = line.split(',')
        result.append(fields[2])
with open("C:\data\data.txt", "r") as source_file, with open("C:\data\data_fixed.txt", "w") as fixed_file:
    for line in source_file:
        fields = line.split('|')
        n = 0
        for value in result:
            fields[2] = result[n]
            n = n + 1
        row.append(line)
    for value in row
        fixed_file.write(row)
I would highly suggest you use the pandas package here; it makes handling tabular data very easy and would help you a lot in this case. Once you have installed pandas, import it with:
import pandas as pd
To read the files simply use:
data_file = pd.read_csv(r"C:\data\generatedfixed.csv")
source_file = pd.read_csv(r"C:\data\data.txt", delimiter="|")
and after that, manipulating these two files is easy. I'm not exactly sure how many values or which ones you want to replace, but if both the "iwantthisvalue3" and "iwanttoreplacethisvalue3" columns have the same length, then this should do the trick:
source_file['iwanttoreplacethisvalue3'] = data_file['iwantthisvalue3']
Now all you need to do is save the dataframe (the table that we just updated) to a file. Since you want to save it to a .txt file with "|" as the delimiter, this is the line to do that (you can customize how it is saved in a lot of ways):
source_file.to_csv(r"C:\data\data_fixed.txt", sep='|', index=False)
Let me know if everything works and whether this helped you. I would also encourage you to read up (or watch some videos) on pandas if you're planning to work with tabular data; it is an awesome library with great documentation and functionality.
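Putting those pieces together, here is a runnable sketch using small sample files in the current directory (the file contents and column names are made up to mirror the example above, and it assumes both files list rows in the same order):

```python
import pandas as pd

# Build small sample files (illustrative data only).
with open('data.csv', 'w') as f:
    f.write('col1,col2,iwantthisvalue3\nvalue1,value2,newvalue3\n')
with open('source.txt', 'w') as f:
    f.write('col1|col2|iwanttoreplacethisvalue3|col4\nvalue1|value2|oldvalue3|value4\n')

data_file = pd.read_csv('data.csv')
source_file = pd.read_csv('source.txt', delimiter='|')

# Overwrite the third column positionally, row by row.
source_file['iwanttoreplacethisvalue3'] = data_file['iwantthisvalue3']

# Write the result back out, pipe-delimited.
source_file.to_csv('source_fixed.txt', sep='|', index=False)
```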

How can I open a csv file in python, and read one line at a time, without loading the whole csv file in memory?

I have a csv file whose size would not fit in the memory of my machine. So I want to open the csv file and then read its rows one at a time. I basically want to make a python generator that yields single rows from the csv.
Thanks in advance! :)
with open(filename, "r") as file:
    for line in file:
        doanything()
Python is lazy whenever possible. File objects are iterators: they do not load the entire file, but produce one line at a time.
Solution:
You can use the chunksize parameter available in the pandas read_csv function:
import pandas as pd

chunksize = 10 ** 6
for chunk in pd.read_csv(filename, chunksize=chunksize):
    print(type(chunk))
    # CODE HERE
Set chunksize to 1 and each chunk will be a single row, which should take care of your problem statement.
My personal preference for doing this is csv.DictReader.
You set it up as an object with the file and any parameters, and then to access the file one row at a time you just iterate over it (or call next on it); each row comes back as a dictionary of the named field key/value pairs in your csv file.
e.g.
import csv

csvfile = open('names.csv')
my_reader = csv.DictReader(csvfile)
first_row = next(my_reader)
for row in my_reader:
    print([(k, v) for k, v in row.items()])
csvfile.close()
See the linked docs for parameter usage etc - it's fairly straightforward.
python generator that yields single rows from the csv.
This sounds like you want csv.reader from the built-in csv module. You will get one list for each line in the file.
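A minimal generator sketch along those lines (the filename and sample data are illustrative):

```python
import csv

def csv_rows(filename):
    # Yield one parsed row (a list of strings) at a time;
    # only a single line is held in memory per iteration.
    with open(filename, 'r', newline='') as f:
        for row in csv.reader(f):
            yield row

# Illustrative usage with a small sample file.
with open('sample.csv', 'w', newline='') as f:
    f.write('a,b,c\n1,2,3\n')

rows = csv_rows('sample.csv')
```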

How to read, edit, merge and save all csv files from one folder?

I'm new to Python. I'm trying to read all .csv files from one folder; I must add the third column (Dataset 1) from all files to a new .csv file (or Excel file). I have no problem working with one file and editing it (reading, cutting rows and columns, adding columns and making simple statistics).
This is an example of one of my CSV files Imgur
and I have more than 2000 of them!!! each one with 1123 rows
This should be fairly easy with something like the csv library, if you don't want to get into learning dataframes.
import os
import csv

new_data = []
for filename in os.listdir('./csv_dir'):
    if filename.endswith('.csv'):
        with open('./csv_dir/' + filename, mode='r') as curr_file:
            reader = csv.reader(curr_file, delimiter=',')
            for row in reader:
                new_data.append(row[2])  # Or whichever column you need

with open('./out_dir/output.txt', mode='w') as out_file:
    for row in new_data:
        out_file.write('{}\n'.format(row))
Your new_data will contain 2000 * 1123 values, one per row of each file.
This may not be the most efficient way to do it, but it'll get the job done and grab each CSV. You'll need to make sure the CSV files have the correct structure, or add checks in the code that validate the columns before appending to new_data.
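One such check might look like this, a sketch that skips rows too short to have a third column (the filename, sample data, and minimum column count are assumptions):

```python
import csv

def third_column(path, min_columns=3):
    # Yield the third column of each row, skipping rows that are too short.
    with open(path, newline='') as f:
        for row in csv.reader(f):
            if len(row) >= min_columns:
                yield row[2]

# Illustrative usage with a small sample file containing one short row.
with open('sample.csv', 'w', newline='') as f:
    f.write('a,b,c\nshort,row\nx,y,z\n')

values = list(third_column('sample.csv'))
```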
Maybe try
csv_file = csv.reader(open(path, "r"), delimiter=",")
csv_file1 = csv.reader(open(path, "r"), delimiter=",")
csv_file2 = csv.reader(open(path, "r"), delimiter=",")
and then read like
for row in csv_file:
    # your code here
for row in csv_file1:
    # your code here
for row in csv_file2:
    # your code here

python:compare two large csv files by two reference columns and update another column

I have a quite large csv file, about 400000 lines, like:
54.10,14.20,34.11
52.10,22.20,22.11
49.20,17.30,29.11
48.40,22.50,58.11
51.30,19.40,13.11
and a second one of about 250000 lines with updated data for the third column; the first and second columns are the reference keys for the update:
52.10,22.20,22.15
49.20,17.30,29.15
48.40,22.50,58.15
I would like to build third file like:
54.10,14.20,34.11
52.10,22.20,22.15
49.20,17.30,29.15
48.40,22.50,58.15
51.30,19.40,13.11
It has to contain all data from the first file, except that in matching lines the value of the third column is taken from the second file.
Suggest you look at the pandas merge functions. You should be able to do what you want. Pandas will also handle reading the data from CSV (create a dataframe that you will merge).
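A sketch of that merge approach, using small stand-in files and made-up column names (the files have no header row, and dtype=str keeps the numbers exactly as written):

```python
import pandas as pd

# Small stand-ins for the two files (no header row; illustrative data).
with open('file1.csv', 'w') as f:
    f.write('54.10,14.20,34.11\n52.10,22.20,22.11\n')
with open('file2.csv', 'w') as f:
    f.write('52.10,22.20,22.15\n')

cols = ['a', 'b', 'c']  # made-up names for the three columns
df1 = pd.read_csv('file1.csv', header=None, names=cols, dtype=str)
df2 = pd.read_csv('file2.csv', header=None, names=cols, dtype=str)

# Left-merge on the first two columns, then take the updated third
# column wherever a match was found.
merged = df1.merge(df2, on=['a', 'b'], how='left', suffixes=('', '_new'))
merged['c'] = merged['c_new'].fillna(merged['c'])
merged = merged.drop(columns='c_new')

merged.to_csv('output.csv', header=False, index=False)
```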
A stdlib solution with just the csv module; the second file is read into memory (into a dictionary):
import csv

with open('file2.csv', 'r', newline='') as updates_fh:
    updates = {tuple(r[:2]): r for r in csv.reader(updates_fh)}

with open('file1.csv', 'r', newline='') as infh, open('output.csv', 'w', newline='') as outfh:
    writer = csv.writer(outfh)
    writer.writerows(updates.get(tuple(r[:2]), r) for r in csv.reader(infh))
The first with statement opens the second file and builds a dictionary keyed on the first two columns. It is assumed that these are unique in the file.
The second block then opens the first file for reading and the output file for writing, and writes each row from the input file to the output file, replacing any row present in the updates dictionary with the updated version.