Referring to CSV header string to adjust column format - python

I'm somewhat new to Python and CSV processing, and I couldn't find a solution for what I'm looking for. When I open a specific CSV file in Excel, I have a column called "rate" that is in percent. I'm dividing all the values in this column by 100. At the moment I refer to this column by index, with row[6] = percentToFloat(row[6]). My question is whether it's possible to address the column by its header name rather than by its column number.
with open(input) as inFile:
    reader = csv.reader(inFile)
    next(reader)  # skip the header row
    with open(output, 'w') as outFile:
        writer = csv.writer(outFile)
        for row in reader:
            if len(row) > 1:  # skips empty rows
                row[6] = percentToFloat(row[6])
                writer.writerow(row)

You could use a DataFrame from pandas:
import pandas as pd
df = pd.read_csv('data.csv', header=0)  # header=0: the first row holds the column names
print(df)
print(df.rate)
print(df.rate / 100.0)

Use csv.DictReader:
reader = csv.DictReader(inFile)
Now you can use row['column_name'] instead of row[6] in your code.
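For a complete picture, here is a minimal sketch of the DictReader approach, paired with csv.DictWriter so the header survives in the output. The function name convert_rate and the sample data are made up for illustration, and it assumes the "rate" values are plain numbers without a % sign (the original percentToFloat helper is not shown, so a simple float division stands in for it).

```python
import csv
import io

def convert_rate(in_file, out_file, column="rate"):
    """Copy a CSV, dividing the named percent column by 100."""
    reader = csv.DictReader(in_file)  # reads the header row and uses it as the keys
    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames,
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Assumption: the value is a plain number such as "12.5", with no % sign.
        row[column] = float(row[column]) / 100
        writer.writerow(row)

# Hypothetical sample data; real code would pass open file objects instead.
source = io.StringIO("name,rate\nA,50\nB,12.5\n")
target = io.StringIO()
convert_rate(source, target)
print(target.getvalue())
```

Because the column is looked up by name, the code keeps working if the column moves to a different position in the file.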

Use csv.DictReader instead of csv.reader.

with open(input) as inFile:
    reader = csv.DictReader(inFile)  # consumes the header row itself
    with open(output, 'w') as outFile:
        writer = csv.DictWriter(outFile, fieldnames=reader.fieldnames)
        writer.writeheader()  # omit this line to leave the header out of the output
        for row in reader:  # DictReader already skips empty rows
            row['rate'] = percentToFloat(row['rate'])
            writer.writerow(row)

Related

Compare two CSV files and write difference in the same file as an extra column in python

Hey intelligent community,
I need a little help because I think I can't see the wood for the trees.
I have two CSV files that look like this:
Name,Number
AAC;2.2.3
AAF;2.4.4
ZCX;3.5.2
Name,Number
AAC;2.2.3
AAF;2.4.4
ZCX;3.5.5
I would like to compare both files and then write any changes like this:
Name,Number,Changes
AAC;2.2.3
AAF;2.4.4
ZCX;3.5.5;change: 3.5.2
So on every line where the number differs, I want to add this as a new column at the end of the line.
The files are formatted the same way but sometimes contain a new row, which is why I think I have to map the keys.
I got this far, but now I'm lost in my thoughts:
# Python 3.10.9
import csv
# Read the first CSV and build the mapping
with open('test1.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    rows = list(reader)
    file1_dict = {row[1]: row[0] for row in rows}
# Read the second CSV and build the mapping
with open('test2.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    rows = list(reader)
    file2_dict = {row[1]: row[0] for row in rows}
# Compare the keys and find the differences
for k in file1_dict:
    if file1_dict[k] != file2_dict[k]:
        file1_dict[k] = file2_dict[k]
        for row in rows:
            if row[1] == k:
                row.append(file2_dict[k])
# Write the CSV (not sure how to add the word "change:")
with open('test1.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(rows)
If I try this, I don't get a new column; it just "updates" the CSV file with the same columns.
For example, this code gives me the differing rows, but I'm not able to add them to the existing file and row.
with open('test1.csv') as fin1:
    with open('test2.csv') as fin2:
        read1 = csv.reader(fin1)
        read2 = csv.reader(fin2)
        diff_rows = (row1 for row1, row2 in zip(read1, read2) if row1 != row2)
        with open('test3.csv', 'w') as fout:
            writer = csv.writer(fout)
            writer.writerows(diff_rows)
Does anyone have any tips or help for my problem? I have read many answers on here but can't figure it out.
Thanks a lot.
@bigkeefer
Thanks for your answer. I tried to change it to use the ; delimiter, but it gives a "list index out of range" error.
with open('test3.csv', 'r') as file1:
    reader = csv.reader(file1, delimiter=';')
    rows = list(reader)[1:]
    file1_dict = {row[0]: row[1] for row in rows}
with open('test4.csv', 'r') as file2:
    reader = csv.reader(file2, delimiter=';')
    rows = list(reader)[1:]
    file2_dict = {row[0]: row[1] for row in rows}
new_file = ["Name;Number;Changes\n"]
with open('output.csv', 'w') as nf:
    for key, value in file1_dict.items():
        if value != file2_dict[key]:
            new_file.append(f"{key};{file2_dict[key]};change: {value}\n")
        else:
            new_file.append(f"{key};{value}\n")
    nf.writelines(new_file)
You will need to adapt this to overwrite your first file etcetera, as you mentioned above, but I've left it like this for your testing purposes. Hopefully this will help you in some way.
I've assumed you've actually got the headers above in each file. If not, remove the slicing on the list creations, and change the new_file variable assignment to an empty list ([]).
with open('f1.csv', 'r') as file1:
    reader = csv.reader(file1, delimiter=";")
    rows = list(reader)[1:]
    file1_dict = {row[0]: row[1] for row in rows if row}
with open('f2.csv', 'r') as file2:
    reader = csv.reader(file2, delimiter=";")
    rows = list(reader)[1:]
    file2_dict = {row[0]: row[1] for row in rows if row}
new_file = ["Name,Number,Changes\n"]
for key, value in file1_dict.items():
    if value != file2_dict[key]:
        new_file.append(f"{key};{file2_dict[key]};change: {value}\n")
    else:
        new_file.append(f"{key};{value}\n")
with open('new.csv', 'w') as nf:
    nf.writelines(new_file)
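Since the question mentions that one file sometimes has an extra row, a direct file2_dict[key] lookup can raise a KeyError. Here is a rough sketch of one way around that, using dict.get and the csv module for writing. The helper names read_pairs and diff_rows are invented for this example, the sample data (including the NEW row and the ;-separated header) is an assumption, and it treats the second file as the source of the current values.

```python
import csv
import io

def read_pairs(f, delimiter=";"):
    """Return {name: number} from a two-column CSV, skipping the header and blank rows."""
    reader = csv.reader(f, delimiter=delimiter)
    next(reader, None)  # skip the header row
    return {row[0]: row[1] for row in reader if len(row) >= 2}

def diff_rows(old, new):
    """Build output rows from the new mapping, noting the old value where it changed."""
    rows = []
    for name, number in new.items():
        previous = old.get(name)  # None when the name only exists in the new file
        if previous is not None and previous != number:
            rows.append([name, number, f"change: {previous}"])
        else:
            rows.append([name, number])
    return rows

# Hypothetical in-memory stand-ins for the two files.
old = read_pairs(io.StringIO("Name;Number\nAAC;2.2.3\nAAF;2.4.4\nZCX;3.5.2\n"))
new = read_pairs(io.StringIO("Name;Number\nAAC;2.2.3\nAAF;2.4.4\nZCX;3.5.5\nNEW;1.0.0\n"))

out = io.StringIO()
writer = csv.writer(out, delimiter=";", lineterminator="\n")
writer.writerow(["Name", "Number", "Changes"])
writer.writerows(diff_rows(old, new))
print(out.getvalue())
```

Rows that exist only in the first file are dropped in this sketch; if they should be kept, iterate over the union of both key sets instead.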

Python CSV, Combining multiple columns into one column using CSV

I've been trying to figure out a way to combine all the columns in a CSV I have into one column.
import csv
with open('test.csv') as f:
    reader = csv.reader(f)
    with open('output.csv', 'w') as g:
        writer = csv.writer(g)
        for row in reader:
            new_row = [' '.join([row[0], row[1]])] + row[2:]
            writer.writerow(new_row)
This worked to combine the first two columns, but I've been having trouble trying to loop it and get the rest of the columns into just one.
You can just pass row to .join, because it is already a list of strings.
import csv
with open('test.csv') as f:
    reader = csv.reader(f)
    with open('output.csv', 'w') as g:
        writer = csv.writer(g)
        for row in reader:
            new_row = [' '.join(row)]  # <---- CHANGED HERE
            writer.writerow(new_row)

Read and export in CSV

I have a CSV file with data separated by " ; ".
There is no problem reading the file, but I want to export the data from the first CSV to ANOTHER CSV and add one new column to the new file.
import csv
with open('1.csv','r') as csvinput:
    with open('agg.csv', 'w') as csvoutput:
        writer = csv.writer(csvoutput)
        reader = csv.reader(csvinput,lineterminator=';')
        all = []
        row = next(reader)
        row.append('Movimiento')
        all.append(row)
        for row in reader:
            row.append(row[0])
            all.append(row)
        writer.writerows(all)
Why does this happen?
You have to create a csv writer object and use it to write the required CSV file. If you can tell me a little more about the column problem, I will be able to help better with that.
csvwriter = csv.writer(csvoutput, delimiter=',')  # pass the open file object, not the file name
If you can use the pandas library, try the following code.
import pandas as pd
data = pd.read_csv('<file_name>', sep=';')
# add new column
data['new_column_name'] = list_of_values  # placeholder: one value per row
data.to_csv('<to_filename>', index=False)
Try this:
with open('1.csv','r') as csvinput:
    with open('agg.csv', 'w') as csvoutput:
        writer = csv.writer(csvoutput, delimiter=',')
        reader = csv.reader(csvinput, delimiter=';')  # the input is separated by ';'
        all = []
        row = next(reader)
        row.append('Movimiento')
        all.append(row)
        for row in reader:
            row.append(row[0])
            all.append(row)
        writer.writerows(all)
The data has ; as separator, so you probably want to specify that instead of lineterminator:
writer = csv.writer(csvoutput)
reader = csv.reader(csvinput, delimiter=';')
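Putting the delimiter fix together, a small self-contained sketch might look like the following. The function name add_column and the sample rows are invented for illustration, and copying the first field into the new column simply mirrors what the question's code does. The reader ignores lineterminator when parsing, which is why delimiter is the setting that matters here.

```python
import csv
import io

def add_column(in_file, out_file, header="Movimiento"):
    """Copy a ';'-separated CSV to a ','-separated one, appending one column."""
    reader = csv.reader(in_file, delimiter=";")
    writer = csv.writer(out_file, delimiter=",", lineterminator="\n")
    first = next(reader, None)  # header row, or None for an empty file
    if first is None:
        return
    writer.writerow(first + [header])
    for row in reader:
        if row:  # skip blank lines
            # As in the question, the new column repeats the first field.
            writer.writerow(row + [row[0]])

# Hypothetical sample data standing in for 1.csv.
source = io.StringIO("id;amount\n1;10\n2;20\n")
target = io.StringIO()
add_column(source, target)
print(target.getvalue())
```

Writing each row as it is read also avoids building the whole file in a list first, which helps with large inputs.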

How to write a matrix (read from one file) to another csv file in python with a specific output format

I have a csv file which has data in matrix format a sample of which is shown below:
index,col1,col2,col3,col4,col5,col6
col1_1,1,0.005744233,0.013118052,-0.003772589,0.004284689
col2_1,0.005744233,1,-0.013269414,-0.007132092,0.013950261
col3_1,0.013118052,-0.013269414,1,-0.014029249,-0.00199437
col4_1,-0.003772589,-0.007132092,-0.014029249,1,0.022569309
col5_1,0.004284689,0.013950261,-0.00199437,0.022569309,1
Now I want to read the data in this file and write it to another CSV file, but the format I need is this:
col1_1,value,col1
col1_1,value,col2
col1_1,value,col3
.
.
.
col2_1,value,col1
col2_1,value,col2
.
.
.
So basically the first element is the row label from the first column, followed by the value, and then the column name from the first row.
I wrote this code but it just writes in the wrong format:
reader = csv.reader(open(IN_FILE, "r"), delimiter=',')
writer = csv.writer(open(OUT_FILE, "w"), delimiter=',')
with open(IN_FILE) as infile:
    with open(OUT_FILE, "w") as outfile:
        reader = csv.reader(infile, delimiter=",")
        writer = csv.writer(outfile, delimiter=",")
        writer.writerow(next(reader))
        for line in reader:
            writer.writerow([line[0],line[1]])
How can I do this in python?
Try this:
with open(IN_FILE) as infile:
    with open(OUT_FILE, "w") as outfile:
        reader = csv.reader(infile, delimiter=",")
        writer = csv.writer(outfile, delimiter=",")
        first_row = None
        for line in reader:
            if first_row is None:
                first_row = line  # remember the header row
            else:
                # one output row per column: row label, value, column name
                for index, col in enumerate(first_row[1:]):
                    writer.writerow([line[0],line[index + 1],col])
This seems to work, although your test data appears to be missing a 'col6'.
The problem with your initial code was that it wasn't looping through each column of the rows.
If your file includes the column and row indices like I assume, this should do it.
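old_data = list(reader)  # a csv reader has no len(), so load the rows first
new_data = []
for row in range(1, len(old_data)):  # start at 1 to skip the header row
    for col in range(1, len(old_data[row])):  # start at 1 to skip the label column
        new_data.append([old_data[row][0],old_data[row][col],old_data[0][col]])
writer.writerows(new_data)
csv_file.close()  # csv_file: the output file object the writer wraps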
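As a self-contained illustration of the same reshaping, here is a short sketch that pairs each value with its column name using zip. The function name matrix_to_long and the two-column sample are made up for this example. Because zip stops at the shorter sequence, a header with one name too many (like the 'col6' in the question's sample) is tolerated in this sketch.

```python
import csv
import io

def matrix_to_long(in_file, out_file):
    """Write one (row label, value, column name) line per matrix cell."""
    reader = csv.reader(in_file)
    writer = csv.writer(out_file, lineterminator="\n")
    header = next(reader)
    column_names = header[1:]  # drop the label of the index column
    for row in reader:
        if not row:
            continue  # skip blank lines
        label, values = row[0], row[1:]
        # zip stops at the shorter of the two, so extra header names are ignored.
        for value, column in zip(values, column_names):
            writer.writerow([label, value, column])

# Hypothetical 2x2 sample in the same layout as the question's matrix.
source = io.StringIO("index,col1,col2\ncol1_1,1,0.5\ncol2_1,0.5,1\n")
target = io.StringIO()
matrix_to_long(source, target)
print(target.getvalue())
```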

How to read a column without header from csv and save the output in a txt file using Python?

I have a file "TAB.csv" with many columns. I would like to select one column (the one at index 3) from the CSV file without its header, then create a new text file "NEW.txt" and write that column to it (again without the header).
The code below reads that column, but with the header. How can I omit the header and save the column in a new text file?
import csv
with open('TAB.csv','rb') as f:
    reader = csv.reader(f)
    for row in reader:
        print row[3]
This is the solution @tmrlvi was talking about: it skips the first row (the header) via the next function:
import csv
with open('TAB.csv','rb') as input_file:
    reader = csv.reader(input_file)
    output_file = open('output.csv','w')
    next(reader, None)
    for row in reader:
        row_str = row[3]
        output_file.write(row_str + '\n')
    output_file.close()
Try this:
import csv
with open('TAB.csv', 'rb') as f, open('out.txt', 'wb') as g:
    reader = csv.reader(f)
    next(reader)  # skip header
    g.writelines(row[3] + '\n' for row in reader)
enumerate is a nice function that yields (index, item) tuples. It lets you see the index while running over an iterator.
import csv
with open('NEW.txt','wb') as outfile:
    with open('TAB.csv','rb') as f:
        reader = csv.reader(f)
        for index, row in enumerate(reader):
            if index > 0:
                outfile.write(row[3])
                outfile.write("\n")
Another solution would be to read one line from the file (in order to skip the header).
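The snippets in this thread are written for Python 2 (binary file modes, the print statement). For Python 3, a minimal sketch of the "read one line to skip the header" idea could look like this. The function name write_column and the sample data are invented for illustration; with real files in Python 3 you would open the CSV in text mode with newline='' rather than 'rb'.

```python
import csv
import io

def write_column(in_file, out_file, index=3):
    """Write one column of a CSV to a text file, one value per line, without the header."""
    reader = csv.reader(in_file)
    next(reader, None)  # read and discard the header row
    for row in reader:
        if len(row) > index:  # ignore blank or short rows
            out_file.write(row[index] + "\n")

# Hypothetical sample data standing in for TAB.csv.
source = io.StringIO("a,b,c,d\n1,2,3,4\n5,6,7,8\n")
target = io.StringIO()
write_column(source, target)
print(target.getvalue())
```

Passing None as the second argument to next means an empty input file produces an empty output instead of raising StopIteration.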
It's an old question, but I would like to add an answer using the pandas library. For tasks like this it is often simpler to use pandas than to write your own code. A simple version with pandas looks like this:
import pandas as pd
df = pd.read_csv('TAB.csv')  # the first row is read as the header, so it is not part of the data
df.iloc[:, 3].to_csv('NEW.txt', index=False, header=False)  # write only the column at index 3
