Python 3.6: rename column header using DictWriter - python

My Python script currently uses DictWriter to read a CSV file, rearrange the columns, and write to a new output CSV file. The input CSV file has the following columns:
A;B;C;D
which are rearranged to:
B;C;A;D
Additionally, I would like to rename one of the headers. I have already tried two approaches:
1.) Create a new writer object and use its writerow method. However, this simply puts all the given field names into the very first column:
newHeader = csv.writer(outfile)
newFN = ['B', 'C', 'Renamed', 'D']
newHeader.writerow(newFN)
The output is:
B,C,Renamed,D;;;
2.) Using the existing DictWriter object I define a new list of column headers and iterate over it:
newHeader = ['B', 'C', 'Renamed', 'D']
writer.writerow(dict((fn, fn) for fn in newHeader))
This time however, the renamed column header remains empty in the output CSV.

You can use a dictionary to rename columns and csv.writer to write values from reordered OrderedDict objects:
from io import StringIO
from collections import OrderedDict
import csv
mystr = StringIO("""A;B;C;D
1;2;3;4
5;6;7;8""")
order = ['B', 'C', 'A', 'D']
# define renamed columns via dictionary
renamer = {'C': 'C2'}
# define column names after renaming
new_cols = [renamer.get(x, x) for x in order]
# replace mystr with open(r'file.csv', 'r')
with mystr as fin, open(r'C:\temp\out.csv', 'w', newline='') as fout:
    # define reader / writer objects
    reader = csv.DictReader(fin, delimiter=';')
    writer = csv.writer(fout, delimiter=';')
    # write new header
    writer.writerow(new_cols)
    # iterate reader and write row
    for item in reader:
        writer.writerow([item[k] for k in order])
Result:
B;C2;A;D
2;3;1;4
6;7;5;8
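If you would rather stay with DictWriter, as in the question, here is a minimal sketch of the same idea. It assumes the column to rename is A and that the new name is 'Renamed' (both are only guesses based on the question), and it uses in-memory StringIO objects instead of real files. The key point is that the keys of every row dict have to match the writer's fieldnames, so both the header and the rows use the new names:

```python
import csv
from io import StringIO

# In-memory stand-ins for the real input and output files (assumption).
source = StringIO("A;B;C;D\n1;2;3;4\n5;6;7;8\n")
target = StringIO()

order = ['B', 'C', 'A', 'D']    # desired column order, using the old names
renamer = {'A': 'Renamed'}      # old name -> new name (hypothetical choice)
new_cols = [renamer.get(name, name) for name in order]

reader = csv.DictReader(source, delimiter=';')
writer = csv.DictWriter(target, fieldnames=new_cols, delimiter=';',
                        lineterminator='\n')

# The header is written from fieldnames, so it already uses the new names.
writer.writeheader()
for row in reader:
    # Re-key each row so its keys match the writer's fieldnames.
    writer.writerow({renamer.get(key, key): row[key] for key in order})

output = target.getvalue()
print(output)
```

With real files, source and target would be replaced by open(..., newline='') handles, and the lineterminator argument can be dropped.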

Related

Re-ordering columns in a csv but with Dictionaries broken

My code is breaking because I am trying to reorganize columns while also ignoring other columns in the output csv file.
Input csv file:
book1.csv
A,B,C,D,E,F
a1,b1,c1,d1,e1,F1
a1,b1,c1,d1,e1,F1
a1,b1,c1,d1,e1,
a1,b1,c1,d1,e1,F1
a1,b1,c1,d1,e1,
My code:
import csv
order_of_headers_should_be = ['A', 'C', 'D', 'E', 'B']
dictionary = {'A':'X1','B':'Y1','C':'U1','D':'T1','E':'K1'}
new_headers = [dictionary[old_header] for old_header in order_of_headers_should_be]
with open('Book1.csv', 'r') as infile, open('reordered.csv', 'a') as outfile:
    # output dict needs a list for new column ordering
    writer = csv.DictWriter(outfile, fieldnames = new_headers)
    # reorder the header first
    writer.writeheader()
    for row in csv.DictReader(infile):
        new_row = {dictionary[old_header]: row[old_header] for old_header in row}
        writer.writerow(new_row)
My current output is only the headers (although they are in the correct order):
X1,U1,T1,K1,Y1
and I get a KeyError: 'F'.
I need it to also output the data rows, so that it looks like this:
reordered.csv
X1,U1,T1,K1,Y1
a1,c1,d1,e1,b1
a1,c1,d1,e1,b1
a1,c1,d1,e1,b1
a1,c1,d1,e1,b1
a1,c1,d1,e1,b1
When old_header is F, dictionary[old_header] raises a KeyError, so the for row loop stops and you won't get any data rows in the output file.
Add a check for this to the dictionary comprehension:
new_row = {dictionary[old_header]: value for old_header, value in row.items() if old_header in dictionary}
You could also loop through dictionary instead of row:
new_row = {new_header: row[old_header] for old_header, new_header in dictionary.items()}
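For reference, here is a small self-contained sketch of the first variant (filtering on row.items()). It uses StringIO objects in place of Book1.csv and reordered.csv, which is only an assumption made so the snippet can run anywhere; the column mapping is the one from the question:

```python
import csv
from io import StringIO

# In-memory stand-ins for Book1.csv and reordered.csv (assumption).
infile = StringIO("A,B,C,D,E,F\n"
                  "a1,b1,c1,d1,e1,F1\n"
                  "a1,b1,c1,d1,e1,\n")
outfile = StringIO()

order_of_headers_should_be = ['A', 'C', 'D', 'E', 'B']
dictionary = {'A': 'X1', 'B': 'Y1', 'C': 'U1', 'D': 'T1', 'E': 'K1'}
new_headers = [dictionary[old] for old in order_of_headers_should_be]

writer = csv.DictWriter(outfile, fieldnames=new_headers, lineterminator='\n')
writer.writeheader()
for row in csv.DictReader(infile):
    # Columns that are not in the mapping (here: F) are skipped,
    # so dictionary[old] can no longer raise a KeyError.
    new_row = {dictionary[old]: value
               for old, value in row.items() if old in dictionary}
    writer.writerow(new_row)

result = outfile.getvalue()
print(result)
```

Because DictWriter orders the values by its fieldnames, the order of keys inside new_row does not matter.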
Here is a simpler way using pandas to do the heavy lifting.
import pandas as pd
# Read CSV file into DataFrame df
df = pd.read_csv('Book1.csv')
# delete F column
df = df.drop('F', axis=1)
# rename columns
df.columns = ['X1', 'Y1', 'U1', 'T1', 'K1']
# write to file in desired order
df.to_csv('book_out.csv', index=False,
          columns=['X1', 'U1', 'T1', 'K1', 'Y1'])

Re-ordering columns in a csv but with Dictionaries

I need to re-order columns in a csv but I'll need to call each column from a dictionary.
EXAMPLE:
Sample input csv File:
$ cat file.csv
A,B,C,D,E
a1,b1,c1,d1,e1
a2,b2,c2,d2,e2
Code
import csv
with open('file.csv', 'r') as infile, open('reordered.csv', 'a') as outfile:
    order_of_headers_should_be = ['A', 'C', 'D', 'E', 'B']
    dictionary = {'A':'X1','B':'Y1','C':'U1','D':'T1','E':'K1'}
    writer = csv.DictWriter(outfile)
    # reorder the header first
    writer.writeheader()
    for row in csv.DictReader(infile):
        # writes the reordered rows to the new file
        writer.writerow(row)
The Output csv file needs to look like this:
$ cat reordered.csv
X1,U1,T1,K1,Y1
a1,c1,d1,e1,b1
a2,c2,d2,e2,b2
I am trying to use a variable to look up each column name in the dictionary.
You can do this by permuting the keys when you are about to write the row like so:
for row in csv.DictReader(infile):
    # writes the reordered rows to the new file
    writer.writerow({dictionary[i]: row[i] for i in row})
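Note the use of a dictionary comprehension. csv.DictWriter also requires a fieldnames argument; to get the renamed headers in the desired order, pass fieldnames=[dictionary[h] for h in order_of_headers_should_be] when creating the writer.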

How to edit a CSV file row by row in Python without using Pandas

I have a CSV file and when I read it by importing the CSV library I get as the output:
['exam', 'id_student', 'grade']
['maths', '573834', '7']
['biology', '573834', '8']
['biology', '578833', '4']
['english', '581775', '7']
# goes on...
I need to edit it by creating a 4th column called 'Passed' with two possible values: True or False depending on whether the grade of the row is >= 7 (True) or not (False), and then count how many times each student passed an exam.
If it's not possible to edit the CSV file that way, I would need to just read the CSV file and then create a dictionary of lists with the following output:
dict = {'id_student':[573834, 578833, 581775], 'passed_count': [2,0,1]}
# goes on...
Thanks
Try importing the CSV as a pandas DataFrame:
import pandas as pd
data=pd.read_csv('data.csv')
And then use:
data['Passed']=(data['grade']>=7).astype(bool)
And then save dataframe to csv as:
data.to_csv('final.csv',index=False)
It is totally possible to "edit" CSV.
Assuming you have a file students.csv with the following content:
exam,id_student,grade
maths,573834,7
biology,573834,8
biology,578833,4
english,581775,7
Iterate over input rows, augment the field list of each row with an additional item, and save it back to another CSV:
import csv
with open('students.csv', 'r', newline='') as source, open('result.csv', 'w', newline='') as result:
    csvreader = csv.reader(source)
    csvwriter = csv.writer(result)
    # Deal with the header
    header = next(csvreader)
    header.append('Passed')
    csvwriter.writerow(header)
    # Process data rows
    for row in csvreader:
        row.append(str(int(row[2]) >= 7))
        csvwriter.writerow(row)
Now result.csv has the content you need.
If you need to replace the original content, use os.remove() and os.rename() to do that:
import os
os.remove('students.csv')
os.rename('result.csv', 'students.csv')
Counting can be done independently; you don't need to modify the CSV for that:
import csv
from collections import defaultdict
with open('students.csv', 'r', newline='') as source:
    csvreader = csv.reader(source)
    next(csvreader) # Skip header
    stats = defaultdict(int)
    for row in csvreader:
        if int(row[2]) >= 7:
            stats[row[1]] += 1
print(stats)
You can include counting into the code above and have both pieces in one place. defaultdict (stats) has the same interface as dict if you need to access that.
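A rough sketch of how the two snippets above could be combined into one pass, producing both the extended CSV and the dictionary of lists from the question. StringIO objects stand in for students.csv and result.csv, and the names passed_count and summary are made up for this example. Unlike the defaultdict version, a student who never passed is still listed, with a count of 0:

```python
import csv
from io import StringIO

# In-memory stand-ins for students.csv and result.csv (assumption).
source = StringIO("exam,id_student,grade\n"
                  "maths,573834,7\n"
                  "biology,573834,8\n"
                  "biology,578833,4\n"
                  "english,581775,7\n")
result = StringIO()

csvreader = csv.reader(source)
csvwriter = csv.writer(result, lineterminator='\n')

# Copy the header and add the new column name.
header = next(csvreader)
csvwriter.writerow(header + ['Passed'])

# A plain dict keeps students in order of first appearance
# (guaranteed from Python 3.7 on).
passed_count = {}
for exam, id_student, grade in csvreader:
    passed = int(grade) >= 7
    csvwriter.writerow([exam, id_student, grade, passed])
    # Register every student, so those who never pass end up with 0.
    passed_count[id_student] = passed_count.get(id_student, 0) + int(passed)

summary = {
    'id_student': [int(student) for student in passed_count],
    'passed_count': list(passed_count.values()),
}
print(summary)
```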

Python: writing elements from a list into CSV file without brackets or quotes

I am struggling with the elements of a list written to a CSV file.
I wrote a Python script that selects some 25 human languages from a CSV file and puts them into a list. The idea is to randomly create content that can result in a fake CV for machine learning:
import csv
import random
# languages
all_langs = []
# populate languages from a central language csv
with open('sprachen.csv', newline='', encoding="utf-8") as csvfile:
    file_reader = csv.reader(csvfile, delimiter=',', quotechar='|')
    for row in file_reader:
        if row[0] == "Sprache":
            continue
        else:
            all_langs.append(format(row[0]))
After that, I create a new empty list and fill it with one to three randomly chosen languages:
lang_skills = []
ran_number = random.randint(1,3)
for i in range(0,ran_number):
    lang_skills.append(str(all_langs[random.randint(0,len(all_langs)-1)]))
When I print the lang_skills it looks like this:
['Urdu', 'Ukrainian', 'Swedish']
In the last step, I want to write the lang_skills into a new CSV file.
with open('liste.csv', 'a+', newline='', encoding="utf-8") as csvfile:
    writer = csv.writer(csvfile, delimiter=';',quotechar='"', quoting=csv.QUOTE_ALL)
    writer.writerow(lang_skills)
However, the CSV looks like this:
Language
"Urdu";"Ukrainian";"Swedish"
...
How can I write the output like this:
Language
"Urdu, Ukrainian, Swedish"; ...
...
You can convert your list into a pandas DataFrame; to create a CSV you need a column name and your list of values:
import pandas as pd
my_list = ['a', 'b', 'c']
df = pd.DataFrame({'col': my_list})
df.to_csv("path/csv_name.csv")
You are trying to write a string to the CSV, not a list; you want to write the string "Urdu, Ukrainian, Swedish" to the file, so you need to produce that string beforehand. Try joining the languages with ", ".join(lang_skills).
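A minimal sketch of that suggestion, keeping the writer settings from the question. It writes to a StringIO buffer instead of liste.csv, which is only an assumption made so the result can be inspected directly:

```python
import csv
from io import StringIO

lang_skills = ['Urdu', 'Ukrainian', 'Swedish']

# In-memory stand-in for liste.csv (assumption).
buffer = StringIO()
writer = csv.writer(buffer, delimiter=';', quotechar='"',
                    quoting=csv.QUOTE_ALL, lineterminator='\n')

# Join the languages into ONE string and wrap it in a one-element list,
# so the whole string ends up in a single CSV field.
writer.writerow([', '.join(lang_skills)])

content = buffer.getvalue()
print(content)
```

Passing the joined string directly to writerow (without the surrounding list) would not work as intended, because writerow iterates over its argument and would treat every character as a separate field.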

Python - re-ordering columns in a csv

I have a bunch of csv files with the same columns but in different order. We are trying to upload them with SQL*Plus, but we need the columns in a fixed order.
Example
required order: A B C D E F
csv file: A C D E B (sometimes a column is not in the csv because it is not available)
Is it achievable with Python? We are using Access+Macros to do it, but it is too time consuming.
PS. Sorry if anyone gets upset about my English skills.
You can use the csv module to read, reorder, and then write your file.
Sample File:
$ cat file.csv
A,B,C,D,E
a1,b1,c1,d1,e1
a2,b2,c2,d2,e2
Code
import csv
with open('file.csv', 'r') as infile, open('reordered.csv', 'a') as outfile:
    # output dict needs a list for new column ordering
    fieldnames = ['A', 'C', 'D', 'E', 'B']
    writer = csv.DictWriter(outfile, fieldnames=fieldnames)
    # reorder the header first
    writer.writeheader()
    for row in csv.DictReader(infile):
        # writes the reordered rows to the new file
        writer.writerow(row)
output
$ cat reordered.csv
A,C,D,E,B
a1,c1,d1,e1,b1
a2,c2,d2,e2,b2
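The question also mentions that a column is sometimes missing from the input. A possible sketch for that case, assuming the missing column should simply be left empty in the output: DictWriter fills any key that is absent from a row with its restval argument (an empty string by default). StringIO objects replace the real files here:

```python
import csv
from io import StringIO

# Fixed order expected by the import, taken from the question.
required_order = ['A', 'B', 'C', 'D', 'E', 'F']

# In-memory stand-in for an input file that lacks column F (assumption).
infile = StringIO("A,C,D,E,B\n"
                  "a1,c1,d1,e1,b1\n"
                  "a2,c2,d2,e2,b2\n")
outfile = StringIO()

# restval is the value written for fieldnames that a row does not contain.
writer = csv.DictWriter(outfile, fieldnames=required_order,
                        restval='', lineterminator='\n')
writer.writeheader()
for row in csv.DictReader(infile):
    writer.writerow(row)

reordered = outfile.getvalue()
print(reordered)
```

If the input may also contain columns that are not in required_order, passing extrasaction='ignore' to DictWriter drops them instead of raising a ValueError.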
Another way to tackle this problem is to use the pandas library, which can be easily installed using pip. Basically, you load the csv file into a pandas DataFrame, reorder the columns, and save it back to a csv file. For example, if your sample.csv looks like this:
A,C,B,E,D
a1,b1,c1,d1,e1
a2,b2,c2,d2,e2
Here is a snippet to solve the problem.
import pandas as pd
df = pd.read_csv('/path/to/sample.csv')
df_reorder = df[['A', 'B', 'C', 'D', 'E']] # rearrange column here
df_reorder.to_csv('/path/to/sample_reorder.csv', index=False)
csv_in = open("<filename>.csv", "r")
csv_out = open("<filename>_reordered.csv", "w")  # must differ from the input file, or it is truncated while being read
for line in csv_in:
    field_list = line.rstrip('\n').split(',')  # strip the newline, then split the line at commas
    output_line = ','.join([field_list[0],     # rejoin with commas, new order
                            field_list[2],
                            field_list[3],
                            field_list[4],
                            field_list[1]])
    csv_out.write(output_line + '\n')
csv_in.close()
csv_out.close()
You can use something similar to this to change the order, replacing ';' with ',' in your case.
Because you said you need to process multiple .csv files, you could use the glob module to get a list of your files:
import glob
for file_name in glob.glob('<Insert-your-file-filter-here>*.csv'):
    # Do the work here
    pass
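One way the "work" inside that loop might look, as a sketch only: it combines the glob loop with the DictWriter approach and writes the reordered copies into a separate directory. The helper names reorder_file and reorder_all, the directory layout, and the target order are assumptions for this example:

```python
import csv
import glob
import os
import tempfile

# Target column order (assumed; same as in the DictWriter answer).
FIELDNAMES = ['A', 'C', 'D', 'E', 'B']


def reorder_file(src_path, dst_path, fieldnames=FIELDNAMES):
    """Copy one CSV file to dst_path with its columns in the given order."""
    with open(src_path, 'r', newline='') as infile, \
            open(dst_path, 'w', newline='') as outfile:
        writer = csv.DictWriter(outfile, fieldnames=fieldnames)
        writer.writeheader()
        for row in csv.DictReader(infile):
            writer.writerow(row)


def reorder_all(in_dir, out_dir, fieldnames=FIELDNAMES):
    """Reorder every *.csv file in in_dir and write the results to out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for src_path in sorted(glob.glob(os.path.join(in_dir, '*.csv'))):
        dst_path = os.path.join(out_dir, os.path.basename(src_path))
        reorder_file(src_path, dst_path, fieldnames)
        written.append(dst_path)
    return written


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    in_dir = os.path.join(tmp, 'in')
    os.makedirs(in_dir)
    with open(os.path.join(in_dir, 'file.csv'), 'w', newline='') as f:
        f.write('A,B,C,D,E\na1,b1,c1,d1,e1\n')
    written = reorder_all(in_dir, os.path.join(tmp, 'out'))
    with open(written[0], newline='') as f:
        demo_output = f.read()

print(demo_output)
```

Writing to a separate output directory avoids overwriting a file while it is still being read.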
The csv module allows you to read csv files with their values associated to their column names. This in turn allows you to arbitrarily rearrange columns, without having to explicitly permute lists.
for row in csv.DictReader(open("foo.csv")):
    print(row["b"], row["a"])
2 1
22 21
Given the file foo.csv:
a,b,d,e,f
1,2,3,4,5
21,22,23,24,25
