Re-ordering columns in a csv but with Dictionaries broken - python

My code is breaking because I am trying to reorder columns while also leaving some columns out of the output csv file.
Input csv file:
book1.csv
A,B,C,D,E,F
a1,b1,c1,d1,e1,F1
a1,b1,c1,d1,e1,F1
a1,b1,c1,d1,e1,
a1,b1,c1,d1,e1,F1
a1,b1,c1,d1,e1,
My code:
import csv
order_of_headers_should_be = ['A', 'C', 'D', 'E', 'B']
dictionary = {'A':'X1','B':'Y1','C':'U1','D':'T1','E':'K1'}
new_headers = [dictionary[old_header] for old_header in order_of_headers_should_be]
with open('Book1.csv', 'r') as infile, open('reordered.csv', 'a') as outfile:
    # output dict needs a list for new column ordering
    writer = csv.DictWriter(outfile, fieldnames = new_headers)
    # reorder the header first
    writer.writeheader()
    for row in csv.DictReader(infile):
        new_row = {dictionary[old_header]: row[old_header] for old_header in row}
        writer.writerow(new_row)
My current output is only the headers (but they are in the correct order):
X1,U1,T1,K1,Y1
I am getting a KeyError: 'F'
But I need it to also output the data rows, so it will look like this:
reordered.csv
X1,U1,T1,K1,Y1
a1,c1,d1,e1,b1
a1,c1,d1,e1,b1
a1,c1,d1,e1,b1
a1,c1,d1,e1,b1
a1,c1,d1,e1,b1

When old_header is F, dictionary[old_header] raises a KeyError, so the for row loop stops on the first row and no data rows reach the output file.
Add a check for this to the dictionary comprehension:
new_row = {dictionary[old_header]: value for old_header, value in row.items() if old_header in dictionary}
You could also loop through dictionary instead of row:
new_row = {new_header: row[old_header] for old_header, new_header in dictionary.items()}
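A third option, not mentioned in the answer above, is to let csv.DictWriter discard unknown keys itself through its extrasaction='ignore' parameter. The following is only a minimal sketch: it uses in-memory io.StringIO buffers in place of Book1.csv and reordered.csv, and the names order, mapping, sample and out are chosen here for illustration.

```python
import csv
import io

order = ['A', 'C', 'D', 'E', 'B']
mapping = {'A': 'X1', 'B': 'Y1', 'C': 'U1', 'D': 'T1', 'E': 'K1'}

# Stand-ins for the input and output files (assumption for this sketch).
sample = io.StringIO("A,B,C,D,E,F\na1,b1,c1,d1,e1,F1\na1,b1,c1,d1,e1,\n")
out = io.StringIO()

writer = csv.DictWriter(
    out,
    fieldnames=[mapping[h] for h in order],
    extrasaction='ignore',   # silently drop keys that are not in fieldnames
    lineterminator='\n',
)
writer.writeheader()
for row in csv.DictReader(sample):
    # Rename known headers; unknown ones (such as F) keep their old name
    # and are then dropped by extrasaction='ignore'.
    writer.writerow({mapping.get(k, k): v for k, v in row.items()})

result = out.getvalue()
print(result)
```

With real files, open the output with newline='' so the csv module controls line endings.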

Here is a simpler way using pandas to do the heavy lifting.
import pandas as pd
# Read CSV file into DataFrame df
df = pd.read_csv('Book1.csv')
# delete F column
df = df.drop('F', axis=1)
# rename columns
df.columns = ['X1', 'Y1', 'U1', 'T1', 'K1']
# write to file in desired order
df.to_csv('book_out.csv', index=False,
          columns=['X1', 'U1', 'T1', 'K1', 'Y1'])

Related

Re-ordering columns in a csv but with Dictionaries

I need to re-order columns in a csv but I'll need to call each column from a dictionary.
EXAMPLE:
Sample input csv File:
$ cat file.csv
A,B,C,D,E
a1,b1,c1,d1,e1
a2,b2,c2,d2,e2
Code
import csv
with open('file.csv', 'r') as infile, open('reordered.csv', 'a') as outfile:
    order_of_headers_should_be = ['A', 'C', 'D', 'E', 'B']
    dictionary = {'A':'X1','B':'Y1','C':'U1','D':'T1','E':'K1'}
    writer = csv.DictWriter(outfile)
    # reorder the header first
    writer.writeheader()
    for row in csv.DictReader(infile):
        # writes the reordered rows to the new file
        writer.writerow(row)
The Output csv file needs to look like this:
$ cat reordered.csv
X1,U1,T1,K1,Y1
a1,c1,d1,e1,b1
a2,c2,d2,e2,b2
I am trying to make a variable that looks each header up in the dictionary.
You can do this by permuting the keys when you are about to write the row like so:
for row in csv.DictReader(infile):
    # writes the reordered rows to the new file
    writer.writerow({dictionary[i]: row[i] for i in row})
Note the use of a dictionary comprehension.
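Note also that csv.DictWriter requires a fieldnames argument, which the question's code omits. A minimal end-to-end sketch, assuming in-memory io.StringIO buffers stand in for file.csv and reordered.csv (the names fieldnames and reordered are introduced here for illustration):

```python
import csv
import io

order_of_headers_should_be = ['A', 'C', 'D', 'E', 'B']
dictionary = {'A': 'X1', 'B': 'Y1', 'C': 'U1', 'D': 'T1', 'E': 'K1'}

# Stand-ins for file.csv and reordered.csv (assumption for this sketch).
infile = io.StringIO("A,B,C,D,E\na1,b1,c1,d1,e1\na2,b2,c2,d2,e2\n")
outfile = io.StringIO()

# The writer's column order comes from the renamed headers, in the desired order.
fieldnames = [dictionary[h] for h in order_of_headers_should_be]
writer = csv.DictWriter(outfile, fieldnames=fieldnames, lineterminator='\n')
writer.writeheader()
for row in csv.DictReader(infile):
    # Re-key each row by its new header; DictWriter handles the ordering.
    writer.writerow({dictionary[i]: row[i] for i in row})

reordered = outfile.getvalue()
print(reordered)
```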

Import csv with inconsistent count of columns per row with original header use pandas

How can I read a csv of this type and keep the original column names? Maybe by adding some generic column names to the end of the header, depending on the maximum number of columns in the body of the csv...
a,b,c
1,2,3
1,2,3,
1,2,3,4
Simple read_csv does not work:
tempfile = pd.read_csv(path
                       ,index_col=None
                       ,sep=','
                       ,header=0
                       ,error_bad_lines=False
                       ,encoding = 'unicode_escape'
                       ,warn_bad_lines=True
                       )
b'Skipping line 3: expected 3 fields, saw 4\nSkipping line 4: expected 3 fields, saw 4\n'
I need that type of result:
a,b,c,x1
1,2,3,NA
1,2,3,NA
1,2,3,4
One approach would be to first read just the header row in and then pass these column names with your extra generic names as a parameter to pandas. For example:
import pandas as pd
import csv
filename = "input.csv"
with open(filename, newline="") as f_input:
    header = next(csv.reader(f_input))
header += [f'x{n}' for n in range(1, 10)]
tempfile = pd.read_csv(filename,
                       index_col=None,
                       sep=',',
                       skiprows=1,
                       names=header,
                       # pandas >= 1.3; older versions used
                       # error_bad_lines=False, warn_bad_lines=True
                       on_bad_lines='warn',
                       encoding='unicode_escape',
                       )
skiprows=1 tells pandas to jump over the header and names holds the full list of column headers to use.
The header would then contain:
['a', 'b', 'c', 'x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7', 'x8', 'x9']
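If only as many generic names as the widest row needs are wanted (just x1 in the question's sample), one possibility is to scan the file once with the csv module first. This is a rough sketch rather than part of the answer above: build_header and its prefix parameter are hypothetical names, and it assumes the generic names do not collide with existing headers.

```python
import csv
import io

def build_header(lines, prefix="x"):
    """Return the original header padded with generic names up to the widest row."""
    reader = csv.reader(lines)
    header = next(reader)
    # Widest data row; fall back to the header width if there are no data rows.
    widest = max((len(row) for row in reader), default=len(header))
    extra = max(widest - len(header), 0)
    return header + [f"{prefix}{n}" for n in range(1, extra + 1)]

# In-memory stand-in for the question's file (assumption for this sketch).
sample = io.StringIO("a,b,c\n1,2,3\n1,2,3,\n1,2,3,4\n")
full_header = build_header(sample)
print(full_header)
```

The resulting list can then be passed to pd.read_csv as names= together with skiprows=1, as in the code above.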

How to change read_csv handling of empty values

When loading a header with missing values, pandas' read_csv creates a name like Unnamed: 0_level_1. How can I replace these with empty strings?
import pandas as pd
file = """A,B,C,C
,,C1,C2
1,2,3,4
5,6,7,8
"""
with open('test.csv', 'w') as f:
    f.write(file)
df = pd.read_csv('test.csv', header=[0, 1])
print(df.columns)
You can use the built-in rename, something like:
data.rename(columns={0: 'whatever you want'}, inplace=True)
More info: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rename.html
Alternatively, you can supply your own column names when reading the CSV:
uruser_cols = ['A', 'B', 'C', 'D', 'E']
users = pd.read_csv('C:/2/mlotek.csv', header=None, names=uruser_cols)
print(users.head())

Python 3.6: rename column header using DictWriter

My Python script currently uses DictReader and DictWriter to read a csv file, re-arrange the columns and write to a new output csv file. The input CSV file has the following columns:
A;B;C;D
which will be transferred to:
B;C;A;D
Additionally, I would like to rename one of the headers. I have already tried two approaches:
1.) Create a new writer object with csv.writer and write the header with it. However, this simply puts all the given fieldnames in the very first column:
newHeader = csv.writer(outfile)
newFN = ['B', 'C', 'Renamed', 'D']
newHeader.writerow(newFN)
the output is:
B,C,Renamed,D;;;
2.) Using the existing DictWriter object I define a new list of column headers and iterate over it:
newHeader = ['B', 'C', 'Renamed', 'D']
writer.writerow(dict((fn, fn) for fn in newHeader))
This time however, the renamed column header remains empty in the output CSV.
You can use a dictionary to rename columns and csv.writer to write values from reordered OrderedDict objects:
from io import StringIO
from collections import OrderedDict
import csv
mystr = StringIO("""A;B;C;D
1;2;3;4
5;6;7;8""")
order = ['B', 'C', 'A', 'D']
# define renamed columns via dictionary
renamer = {'C': 'C2'}
# define column names after renaming
new_cols = [renamer.get(x, x) for x in order]
# to read from a file, replace mystr with open(r'file.csv', 'r')
with mystr as fin, open(r'C:\temp\out.csv', 'w', newline='') as fout:
    # define reader / writer objects
    reader = csv.DictReader(fin, delimiter=';')
    writer = csv.writer(fout, delimiter=';')
    # write new header
    writer.writerow(new_cols)
    # iterate reader and write row
    for item in reader:
        writer.writerow([item[k] for k in order])
Result:
B;C2;A;D
2;3;1;4
6;7;5;8
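Since the question asks about DictWriter specifically, the same result may also be reachable without switching to csv.writer: keep the old names as fieldnames and write the header row by hand, mapping each old name to its new label. A minimal sketch, assuming column A is the one to be renamed to 'Renamed' (as in the question's attempts) and using io.StringIO buffers in place of real files:

```python
import csv
import io

order = ['B', 'C', 'A', 'D']
renamer = {'A': 'Renamed'}   # old name -> new header label

# In-memory stand-ins for the input and output files (assumption for this sketch).
fin = io.StringIO("A;B;C;D\n1;2;3;4\n5;6;7;8\n")
fout = io.StringIO()

reader = csv.DictReader(fin, delimiter=';')
# fieldnames are the OLD names, so data rows can be written unchanged.
writer = csv.DictWriter(fout, fieldnames=order, delimiter=';', lineterminator='\n')
# Instead of writeheader(), write a row whose values are the new labels.
writer.writerow({old: renamer.get(old, old) for old in order})
for row in reader:
    writer.writerow(row)

output = fout.getvalue()
print(output)
```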

Mapping CSV Header using a Dictionary

I have a reference file that looks like this:
Experiment,Array,Drug
8983,Genechip,Famotidine
8878,Microarray,Dicyclomine
8988,Genechip,Etidronate
8981,Microarray,Flunarizine
I successfully created a dictionary mapping the Experiment numbers to the Drug name using the following:
reader = csv.reader(open('C:\Users\Troy\Documents\ExPSRef.txt'))
#Configure dictionary
result = {}
for row in reader:
    key = row[0]
    result[key] = row[2]
di = result
I want to map this dictionary to the header of another file which consists of the experiment number. It currently looks like this:
Gene,8988,8981,8878,8983
Vcp,0.011,-0.018,-0.032,-0.034
Ube2d2,0.034,0.225,-0.402,0.418
Becn1,0.145,-0.108,-0.421,-0.048
Lypla2,-0.146,-0.026,-0.101,-0.011
But it should look like this:
Gene,Etidronate,Flunarizine,Dicyclomine,Famotidine
Vcp,0.011,-0.018,-0.032,-0.034
Ube2d2,0.034,0.225,-0.402,0.418
Becn1,0.145,-0.108,-0.421,-0.048
Lypla2,-0.146,-0.026,-0.101,-0.011
I tried using:
import csv
import pandas as pd
reader = csv.reader(open('C:\Users\Troy\Documents\ExPSRef.txt'))
result = {}
for row in reader:
    key = row[0]
    result[key] = row[2]
di = result
df = pd.read_csv('C:\Users\Troy\Documents\ExPS2.txt')
df['row[0]'].replace(di, inplace=True)
but it returned a KeyError: 'row[0]'.
I tried the following as well, even transposing in order to merge:
import pandas as pd
df1 = pd.read_csv('C:\Users\Troy\Documents\ExPS2.txt',).transpose()
df2 = pd.read_csv('C:\Users\Troy\Documents\ExPSRef.txt', delimiter=',', engine='python')
df3 = df1.merge(df2)
df4 = df3.set_index('Drug').drop(['Experiment', 'Array'], axis=1)
df4.index.name = 'Drug'
print df4
and this time received MergeError('No common columns to perform merge on').
Is there a simpler way to map my dictionary to the header that would work?
One thing to keep in mind is that the keys of the mapper dictionary and the header values they are mapped onto must be of the same data type.
Here, one is a string and the other is of integer type. So, while reading the reference DF, we stop pandas from inferring the dtype by setting it to str.
df1 = pd.read_csv(r'C:\Users\Troy\Documents\ExPS2.txt') # Original
df2 = pd.read_csv(r'C:\Users\Troy\Documents\ExPSRef.txt', dtype=str) # Reference
Convert the columns of the original DF to their Series representation and then replace the old values, which were Experiment numbers, with the new Drug names retrieved from the reference DF.
df1.columns = df1.columns.to_series().replace(df2.set_index('Experiment').Drug)
df1
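The type mismatch described above can be illustrated without pandas. In this small sketch the names header, int_keys and str_keys are made up for the example; it only shows that a lookup keyed by integers never matches header labels that were read as strings:

```python
# Column labels read from a CSV header arrive as strings.
header = ['Gene', '8988', '8981']

# A mapping keyed by integers, as type inference would produce for the reference file.
int_keys = {8988: 'Etidronate', 8981: 'Flunarizine'}
# The same mapping with string keys.
str_keys = {str(k): v for k, v in int_keys.items()}

# dict.get(h, h) keeps the original label when no key matches.
missed = [int_keys.get(h, h) for h in header]
mapped = [str_keys.get(h, h) for h in header]

print(missed)
print(mapped)
```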
I used csv for the whole script. This fixes the header you wanted and saves into a new file. The new filename can be replaced with the same one if that's what you prefer. This program is written with python3.
import csv
with open('sample.txt', 'r', newline='') as ref:
    reader = csv.reader(ref)
    # skip header line
    next(reader)
    # make dictionary: experiment number -> drug name
    di = {row[0]: row[2] for row in reader}
data = []
with open('sample1.txt', 'r', newline='') as df:
    reader = csv.reader(df)
    header = next(reader)
    # map experiment numbers to drug names; unmapped headers (e.g. Gene) stay as-is
    new_header = [di.get(name, name) for name in header]
    data = list(reader)
# used to make new file, can also replace with the same file name
with open('new_sample1.txt', 'w', newline='') as df_new:
    writer = csv.writer(df_new)
    writer.writerow(new_header)
    writer.writerows(data)
