I need to reorder the columns in a CSV file, and I also need to rename each column using a dictionary that maps the old header to the new one.
EXAMPLE:
Sample input CSV file:
$ cat file.csv
A,B,C,D,E
a1,b1,c1,d1,e1
a2,b2,c2,d2,e2
Code
import csv
with open('file.csv', 'r', newline='') as infile, open('reordered.csv', 'w', newline='') as outfile:
    order_of_headers_should_be = ['A', 'C', 'D', 'E', 'B']
    dictionary = {'A':'X1','B':'Y1','C':'U1','D':'T1','E':'K1'}
    writer = csv.DictWriter(outfile, fieldnames=order_of_headers_should_be)
    # reorder the header first
    writer.writeheader()
    for row in csv.DictReader(infile):
        # writes the reordered rows to the new file
        writer.writerow(row)
The output CSV file needs to look like this:
$ cat reordered.csv
X1,U1,T1,K1,Y1
a1,c1,d1,e1,b1
a2,c2,d2,e2,b2
I'm trying to work out how to use the dictionary to rename each column.
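You can do this by renaming the keys of each row just before you write it. For that to work, the writer's fieldnames must be the renamed headers, in the new order:
new_fieldnames = [dictionary[h] for h in order_of_headers_should_be]
writer = csv.DictWriter(outfile, fieldnames=new_fieldnames)
writer.writeheader()
for row in csv.DictReader(infile):
    # rename each key through the dictionary before writing the row
    writer.writerow({dictionary[i]: row[i] for i in row})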
Note the use of a dictionary comprehension.
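For reference, here is a complete, self-contained sketch of the reorder-and-rename approach. It reads an in-memory io.StringIO copy of the sample file instead of file.csv and writes to another StringIO, purely so the example runs on its own; the helper name reorder_and_rename is hypothetical.

```python
import csv
import io

order_of_headers_should_be = ['A', 'C', 'D', 'E', 'B']
dictionary = {'A': 'X1', 'B': 'Y1', 'C': 'U1', 'D': 'T1', 'E': 'K1'}

def reorder_and_rename(infile, outfile):
    """Hypothetical helper: copy CSV rows, reordering and renaming columns."""
    # The writer's fieldnames are the *renamed* headers, in the new order.
    new_fieldnames = [dictionary[h] for h in order_of_headers_should_be]
    writer = csv.DictWriter(outfile, fieldnames=new_fieldnames)
    writer.writeheader()
    for row in csv.DictReader(infile):
        # Rename every key; DictWriter places the values in fieldnames order.
        writer.writerow({dictionary[k]: v for k, v in row.items()})

# In-memory stand-ins for file.csv and reordered.csv.
infile = io.StringIO("A,B,C,D,E\na1,b1,c1,d1,e1\na2,b2,c2,d2,e2\n")
outfile = io.StringIO()
reorder_and_rename(infile, outfile)
print(outfile.getvalue())
```

With real files, pass open('file.csv', newline='') and open('reordered.csv', 'w', newline='') instead of the StringIO objects.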
Related
I have a CSV file, and when I read it with the csv module I get this output:
['exam', 'id_student', 'grade']
['maths', '573834', '7']
['biology', '573834', '8']
['biology', '578833', '4']
['english', '581775', '7']
# goes on...
I need to edit it by adding a 4th column called 'Passed' whose value is True if the row's grade is >= 7 and False otherwise, and then count how many exams each student passed.
If it's not possible to edit the CSV file that way, I would need to just read the CSV file and then create a dictionary of lists with the following output:
dict = {'id_student':[573834, 578833, 581775], 'passed_count': [2,0,1]}
# goes on...
Thanks
Try importing the CSV file into a pandas DataFrame:
import pandas as pd
data=pd.read_csv('data.csv')
And then use:
data['Passed'] = data['grade'] >= 7  # the comparison already yields a boolean column
And then save the DataFrame to CSV:
data.to_csv('final.csv',index=False)
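The pandas approach can also produce the per-student pass counts from the question. A minimal sketch, assuming the columns are named exam, id_student and grade as in the sample; it reads an in-memory copy of the sample rows so it runs on its own.

```python
import io
import pandas as pd

# In-memory stand-in for data.csv, using the sample rows from the question.
csv_text = """exam,id_student,grade
maths,573834,7
biology,573834,8
biology,578833,4
english,581775,7
"""
data = pd.read_csv(io.StringIO(csv_text))
data['Passed'] = data['grade'] >= 7

# Summing a boolean column counts its True values; groupby gives one count per student.
passed_count = data.groupby('id_student')['Passed'].sum().astype(int)

result = {
    'id_student': passed_count.index.tolist(),
    'passed_count': passed_count.tolist(),
}
print(result)
```

Students with no passed exams still appear (with 0), because every student has at least one row in the group.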
It is entirely possible to "edit" a CSV file.
Assuming you have a file students.csv with the following content:
exam,id_student,grade
maths,573834,7
biology,573834,8
biology,578833,4
english,581775,7
Iterate over input rows, augment the field list of each row with an additional item, and save it back to another CSV:
import csv
with open('students.csv', 'r', newline='') as source, open('result.csv', 'w', newline='') as result:
    csvreader = csv.reader(source)
    csvwriter = csv.writer(result)
    # Deal with the header
    header = next(csvreader)
    header.append('Passed')
    csvwriter.writerow(header)
    # Process data rows
    for row in csvreader:
        row.append(str(int(row[2]) >= 7))
        csvwriter.writerow(row)
Now result.csv has the content you need.
If you need to replace the original content, use os.remove() and os.rename() to do that:
import os
os.remove('students.csv')
os.rename('result.csv', 'students.csv')
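Alternatively, os.replace() does both steps in one call: it renames the source to the destination, overwriting the destination if it exists. A minimal sketch; the temporary directory and file contents exist only to make the example runnable.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, 'students.csv')
    result = os.path.join(tmp, 'result.csv')
    with open(original, 'w') as f:
        f.write('old content\n')
    with open(result, 'w') as f:
        f.write('new content\n')

    # Replaces students.csv with result.csv in a single call.
    os.replace(result, original)

    with open(original) as f:
        final_content = f.read()
    remaining = sorted(os.listdir(tmp))

print(final_content, remaining)
```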
Counting is a separate task; you don't need to modify the CSV for it:
import csv
from collections import defaultdict
with open('students.csv', 'r', newline='') as source:
    csvreader = csv.reader(source)
    next(csvreader)  # Skip header
    stats = defaultdict(int)
    for row in csvreader:
        # adding 0 for a failed exam still creates the key, so students with no passes get 0
        stats[row[1]] += 1 if int(row[2]) >= 7 else 0
print(stats)
You can include the counting in the code above and do both in a single pass. defaultdict is a subclass of dict, so stats can be used like any other dictionary.
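Combining both steps might look like the sketch below: one pass writes the extra Passed column and counts passes, and the counts are then reshaped into the dictionary of lists from the question. The helper name add_passed_and_count and its threshold parameter are hypothetical, the code assumes exactly three columns per row, and it uses in-memory StringIO objects instead of students.csv and result.csv.

```python
import csv
import io
from collections import defaultdict

def add_passed_and_count(source, result, threshold=7):
    """Hypothetical helper: write rows with a 'Passed' column, return passes per student."""
    csvreader = csv.reader(source)
    csvwriter = csv.writer(result)
    header = next(csvreader)
    csvwriter.writerow(header + ['Passed'])
    stats = defaultdict(int)
    for exam, id_student, grade in csvreader:  # assumes exactly three columns
        passed = int(grade) >= threshold
        csvwriter.writerow([exam, id_student, grade, str(passed)])
        stats[id_student] += int(passed)  # adds 0 for a fail, so every student gets a key
    return stats

source = io.StringIO("exam,id_student,grade\nmaths,573834,7\nbiology,573834,8\n"
                     "biology,578833,4\nenglish,581775,7\n")
result = io.StringIO()
stats = add_passed_and_count(source, result)

# Reshape into the dictionary of lists asked for in the question.
summary = {'id_student': [int(s) for s in stats],
           'passed_count': list(stats.values())}
print(summary)
```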
My Python script currently uses DictWriter to rearrange the columns of a CSV file and write them to a new output CSV file. The input CSV file has the following columns:
A;B;C;D
which will be transferred to:
B;C;A;D
Additionally, I would like to rename one of the headers. I have already tried two approaches:
1.) Create a new writer object and use its `writerow` method. However, this simply puts all the given field names into the very first column:
newHeader = csv.writer(outfile)
newFN = ['B', 'C', 'Renamed', 'D']
newHeader.writerow(newFN)
the output is:
B,C,Renamed,D;;;
2.) Using the existing DictWriter object I define a new list of column headers and iterate over it:
newHeader = ['B', 'C', 'Renamed', 'D']
writer.writerow(dict((fn, fn) for fn in newHeader))
This time, however, the renamed column header remains empty in the output CSV.
You can use a dictionary to rename columns, and a csv.writer to write the values of each DictReader row in the new order:
from io import StringIO
import csv
mystr = StringIO("""A;B;C;D
1;2;3;4
5;6;7;8""")
order = ['B', 'C', 'A', 'D']
# define renamed columns via dictionary
renamer = {'C': 'C2'}
# define column names after renaming
new_cols = [renamer.get(x, x) for x in order]
# replace mystr with open(r'file.csv', 'r', newline='')
with mystr as fin, open(r'C:\temp\out.csv', 'w', newline='') as fout:
    # define reader / writer objects
    reader = csv.DictReader(fin, delimiter=';')
    writer = csv.writer(fout, delimiter=';')
    # write new header
    writer.writerow(new_cols)
    # iterate reader and write row
    for item in reader:
        writer.writerow([item[k] for k in order])
Result:
B;C2;A;D
2;3;1;4
6;7;5;8
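If you would rather keep a DictWriter, the same idea works as long as its fieldnames are the renamed headers and each row's keys are renamed before writing. This sketch redefines order and renamer from above so it runs on its own, and uses in-memory StringIO objects instead of real files.

```python
import csv
import io

order = ['B', 'C', 'A', 'D']
renamer = {'C': 'C2'}
new_cols = [renamer.get(x, x) for x in order]

fin = io.StringIO("A;B;C;D\n1;2;3;4\n5;6;7;8\n")
fout = io.StringIO()

reader = csv.DictReader(fin, delimiter=';')
# The DictWriter only knows the new (renamed) column names.
writer = csv.DictWriter(fout, fieldnames=new_cols, delimiter=';', lineterminator='\n')
writer.writeheader()
for row in reader:
    # Rename the keys first; the writer then orders the values by new_cols.
    writer.writerow({renamer.get(k, k): v for k, v in row.items()})

print(fout.getvalue())
```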
Input file:
$ cat test.csv
company,spread,cat1,cat2,cat3
A,XYZ,32,67,0
B,XYZ,43,0,432
C,XYZ,32,76,32
D,XYZ,454,87,43
E,XYZ,0,0,65
F,XYZ,0,0,7
Expected CSV output (sum the columns cat1, cat2 and cat3 and append the totals as a new row):
$ cat test.csv
company,spread,cat1,cat2,cat3
A,XYZ,32,67,0
B,XYZ,43,0,432
C,XYZ,32,76,32
D,XYZ,454,87,43
E,XYZ,0,0,65
F,XYZ,0,0,7
,,561,230,579
Code:
import csv
all_keys = ['cat1', 'cat2', 'cat3']
default_values = {i: 0 for i in all_keys}
def read_csv():
    with open('test.csv', 'r') as f:
        reader = csv.DictReader(f)
        yield from reader
for row in read_csv():
    for i in all_keys:
        default_values[i] += int(row[i])
with open('test.csv', 'a') as w:
    writer = csv.DictWriter(w, fieldnames=all_keys)
    writer.writerow(default_values)
Actual Output:
$ cat test.csv
company,spread,cat1,cat2,cat3
A,XYZ,32,67,0
B,XYZ,43,0,432
C,XYZ,32,76,32
D,XYZ,454,87,43
E,XYZ,0,0,65
F,XYZ,0,0,7
561,230,579
Question:
csv.DictWriter is not appending the row with the correct column alignment. I understand that the file has 5 columns and I am providing values for only 3 of them, but I thought that, since this is a DictWriter, it would write each value only under the matching column header. If I open the actual output CSV, it is clearly visible that the columns are not aligned.
You should include the names of the first two columns in fieldnames:
with open('test.csv', 'a') as w:
    writer = csv.DictWriter(w, fieldnames=['company', 'spread']+all_keys)
    writer.writerow(default_values)
Blank values will be written to the first two columns if the keys are not available in the dictionary.
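Alternatively, you can take fieldnames from the file's own header and set restval, the value DictWriter writes for any field name missing from the row dictionary:
with open('test.csv', 'r', newline='') as f:
    header = next(csv.reader(f))
with open('test.csv', 'a', newline='') as w:
    writer = csv.DictWriter(w, fieldnames=header, restval='')
    writer.writerow(default_values)
So you don't have to list the missing keys yourself: every column in the header that is missing from default_values is filled with restval. Note that restval only applies to names that appear in fieldnames, so fieldnames=all_keys alone would still produce the misaligned row. https://docs.python.org/3/library/csv.html#csv.DictWriter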
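Putting the pieces together, a self-contained sketch that sums the columns and appends an aligned totals row might look like this. The helper name append_totals is hypothetical, the sample file is recreated in a temporary directory only so the example runs, and it assumes the existing file ends with a newline (otherwise the appended row would be glued onto the last line).

```python
import csv
import os
import tempfile

all_keys = ['cat1', 'cat2', 'cat3']

def append_totals(path):
    """Hypothetical helper: append a row with the sums of all_keys, aligned to the header."""
    totals = {k: 0 for k in all_keys}
    with open(path, newline='') as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames  # reads the header line
        for row in reader:
            for k in all_keys:
                totals[k] += int(row[k])
    with open(path, 'a', newline='') as w:
        # Columns missing from totals ('company', 'spread') get restval ('').
        writer = csv.DictWriter(w, fieldnames=header, restval='')
        writer.writerow(totals)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test.csv')
    with open(path, 'w', newline='') as f:
        f.write("company,spread,cat1,cat2,cat3\n"
                "A,XYZ,32,67,0\nB,XYZ,43,0,432\nC,XYZ,32,76,32\n"
                "D,XYZ,454,87,43\nE,XYZ,0,0,65\nF,XYZ,0,0,7\n")
    append_totals(path)
    with open(path, newline='') as f:
        last_line = f.read().splitlines()[-1]

print(last_line)
```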
I have 10 lists of data in different CSV files; each file has one column. I want to open each CSV file one by one and write its contents to a file called "file.csv", so that the data from the second file is saved below the data from the first file.
example:
list1=[['a'], ['b'], ['c'], ['d']]
list2=[['e'], ['f'], ['g']]
file.csv=
a
b
c
d
e
f
g
I have the following code. When I set the index to csv_list[1], it copies that file's data to file.csv. But when I change the index to csv_list[2] to append the new list to the file, it deletes the previous data and writes only the new list.
How can I add them all to the same file with the following code?
import csv
import os
csv_list = os.listdir("folder1")
pathname = os.path.join("folder1", csv_list[1])
with open(pathname, encoding='utf8') as f:
    reader = csv.reader(f)
    data = list(reader)
print(data)
with open("file.csv", "w") as resultFile:
    wr = csv.writer(resultFile, dialect='excel')
    wr.writerows(data)
You could do something along the lines of:
if os.path.isfile("file.csv"):
    write_or_append = "a"  # append if csv file already exists
else:
    write_or_append = "w"  # write otherwise
with open("file.csv", write_or_append, newline="") as resultFile:
    wr = csv.writer(resultFile, dialect="excel")
    wr.writerows(data)
Mode "a" also creates the file if it does not exist yet, so the check mainly makes the intent explicit.
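Instead of reopening file.csv once per input file, you can also open it a single time and loop over every file in the folder. A minimal sketch; the helper name concat_csv_folder is hypothetical, the output file must live outside the folder being read, and files are processed in alphabetical order (note that list10.csv sorts between list1.csv and list2.csv, so zero-pad the names or sort differently if order matters). The demo recreates the question's two example lists in a temporary directory.

```python
import csv
import os
import tempfile

def concat_csv_folder(folder, out_path):
    """Hypothetical helper: copy the rows of every .csv file in folder into out_path."""
    with open(out_path, 'w', newline='', encoding='utf8') as result_file:
        wr = csv.writer(result_file, dialect='excel')
        # Sorted so the files are appended in a predictable (alphabetical) order.
        for name in sorted(os.listdir(folder)):
            if not name.endswith('.csv'):
                continue
            with open(os.path.join(folder, name), newline='', encoding='utf8') as f:
                wr.writerows(csv.reader(f))

with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, 'folder1')
    os.mkdir(folder)
    for name, rows in [('list1.csv', 'a\nb\nc\nd\n'), ('list2.csv', 'e\nf\ng\n')]:
        with open(os.path.join(folder, name), 'w', encoding='utf8') as f:
            f.write(rows)
    out_path = os.path.join(tmp, 'file.csv')  # outside folder1, so it is not read back in
    concat_csv_folder(folder, out_path)
    with open(out_path, encoding='utf8') as f:
        combined = f.read().split()

print(combined)
```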
I have a bunch of CSV files with the same columns, but in different orders. We are trying to load them with SQL*Plus, but we need the columns in a fixed order.
Example
required order: A B C D E F
csv file: A C D E B (sometimes a column is not in the csv because it is not available)
Is this achievable with Python? We are currently using Access and macros to do it, but it is too time-consuming.
PS: Sorry if my English is not very good.
You can use the csv module to read, reorder, and then write your file.
Sample File:
$ cat file.csv
A,B,C,D,E
a1,b1,c1,d1,e1
a2,b2,c2,d2,e2
Code
import csv
with open('file.csv', 'r', newline='') as infile, open('reordered.csv', 'w', newline='') as outfile:
    # output dict needs a list for new column ordering
    fieldnames = ['A', 'C', 'D', 'E', 'B']
    writer = csv.DictWriter(outfile, fieldnames=fieldnames)
    # reorder the header first
    writer.writeheader()
    for row in csv.DictReader(infile):
        # writes the reordered rows to the new file
        writer.writerow(row)
output
$ cat reordered.csv
A,C,D,E,B
a1,c1,d1,e1,b1
a2,c2,d2,e2,b2
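The question also mentions that a column is sometimes missing. DictWriter handles that through restval, which fills any field name in fieldnames that a row does not have; extrasaction='ignore' additionally drops columns that are not in the required order (an assumption about what you want). A minimal sketch using the required order A to F and an in-memory input that lacks F; the helper name reorder is hypothetical.

```python
import csv
import io

REQUIRED_ORDER = ['A', 'B', 'C', 'D', 'E', 'F']

def reorder(infile, outfile):
    """Hypothetical helper: write infile's rows to outfile with columns in REQUIRED_ORDER."""
    writer = csv.DictWriter(outfile, fieldnames=REQUIRED_ORDER,
                            restval='', extrasaction='ignore')
    writer.writeheader()
    for row in csv.DictReader(infile):
        writer.writerow(row)  # a missing column such as F is written as ''

infile = io.StringIO("A,C,D,E,B\na1,c1,d1,e1,b1\n")
outfile = io.StringIO()
reorder(infile, outfile)
print(outfile.getvalue())
```

For many files, you could call it in a loop such as `for name in glob.glob('*.csv'):`, opening each input with newline='' and writing to a separate output file so the original is not truncated while it is being read.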
One way to tackle this problem is to use the pandas library, which can easily be installed with pip. You load the CSV file into a pandas DataFrame, reorder the columns, and save it back to a CSV file. For example, if your sample.csv looks like this:
A,C,B,E,D
a1,b1,c1,d1,e1
a2,b2,c2,d2,e2
Here is a snippet to solve the problem.
import pandas as pd
df = pd.read_csv('/path/to/sample.csv')
df_reorder = df[['A', 'B', 'C', 'D', 'E']] # rearrange column here
df_reorder.to_csv('/path/to/sample_reorder.csv', index=False)
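Selecting columns with df[[...]] raises a KeyError if one of them is missing from a file. DataFrame.reindex(columns=...) instead puts the columns in the given order and adds any missing ones filled with NaN, which to_csv writes as empty fields by default. A minimal sketch using the sample above plus a hypothetical extra column F, read from an in-memory string rather than /path/to/sample.csv.

```python
import io
import pandas as pd

required = ['A', 'B', 'C', 'D', 'E', 'F']
df = pd.read_csv(io.StringIO("A,C,B,E,D\na1,b1,c1,d1,e1\na2,b2,c2,d2,e2\n"))

# reindex orders the columns as required and adds the missing F column (as NaN).
df_reorder = df.reindex(columns=required)
csv_text = df_reorder.to_csv(index=False)  # no path given, so the CSV is returned as a string
print(csv_text)
```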
csv_in = open("<filename>.csv", "r")
csv_out = open("<output-filename>.csv", "w")  # must differ from the input: "w" truncates the file
for line in csv_in:
    field_list = line.rstrip('\n').split(',')  # drop the newline, split the line at commas
    output_line = ','.join([field_list[0],  # rejoin with commas, new order
                            field_list[2],
                            field_list[3],
                            field_list[4],
                            field_list[1]])
    csv_out.write(output_line + '\n')
csv_in.close()
csv_out.close()
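You can use something similar to this to change the order; adjust the delimiter if your files use ';' instead of ','. Note that a plain split(',') breaks on quoted fields that contain commas, which the csv module handles correctly.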
Because you said you need to process multiple .csv files, you could use the glob module to get a list of your files:
import glob
for file_name in glob.glob('<Insert-your-file-filter-here>*.csv'):
    pass  # do the work here
csv.DictReader lets you read CSV files with each value associated with its column name. This in turn allows you to rearrange columns arbitrarily, without having to explicitly permute lists.
import csv
with open("foo.csv", newline="") as f:
    for row in csv.DictReader(f):
        print(row["b"], row["a"])
2 1
22 21
Given the file foo.csv:
a,b,d,e,f
1,2,3,4,5
21,22,23,24,25