Add a new column to a csv file in python - python

I am trying to add a column to a csv file that combines strings from two other columns. Whenever I try this I either get an output csv with only the new column or an output with all of the original data and not the new column.
This is what I have so far:
with open(filename) as csvin:
    readfile = csv.reader(csvin, delimiter=',')
    with open(output, 'w') as csvout:
        writefile = csv.writer(csvout, delimiter=',', lineterminator='\n')
        for row in readfile:
            result = [str(row[10]) + ' ' + str(row[11])]
            writefile.writerow(result)
Any help would be appreciated.

I have no input to test with, but try this. Your current approach writes only the new value and leaves out the data that already exists in each input row. extend takes the list that represents the row and appends another item to it, which is equivalent to adding a column.
import csv
with open(filename) as csvin:
    readfile = csv.reader(csvin, delimiter=',')
    with open(output, 'w') as csvout:
        writefile = csv.writer(csvout, delimiter=',', lineterminator='\n')
        for row in readfile:
            row.extend([str(row[10]) + ' ' + str(row[11])])
            writefile.writerow(row)

I assume that glayne wants to combine column 10 and 11 into one.
In my approach, I concentrate on how to transform a single row first:
def transform_row(input_row):
    output_row = input_row[:]
    output_row[10:12] = [' '.join(output_row[10:12])]
    return output_row
Once tested to make sure that it works, I can move on to replace all rows:
# 'wb' is the Python 2 idiom; on Python 3 use open('out.csv', 'w', newline='')
with open('data.csv') as inf, open('out.csv', 'wb') as outf:
    reader = csv.reader(inf)
    writer = csv.writer(outf)
    writer.writerows(transform_row(row) for row in reader)
Note that I use the writerows() method to write multiple rows in one statement.

The snippet below concatenates the strings in columns 10 and 11 of each row (without a separator) and appends the result to the end of that row:
import csv
input = 'test.csv'
output = 'output.csv'
# 'rb'/'wb' are the Python 2 idiom; on Python 3 open both files in text mode with newline=''
with open(input, 'rb') as csvin:
    readfile = csv.reader(csvin, delimiter=',')
    with open(output, 'wb') as csvout:
        writefile = csv.writer(csvout, delimiter=',', lineterminator='\n')
        for row in readfile:
            result = row + [row[10]+row[11]]
            writefile.writerow(result)
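
For readers on Python 3, the approaches above can be sketched as one small function. This is only an illustration under assumptions: the function name add_combined_column, the column positions, the header name, and the sample data are all made up here, and the input is assumed to start with a header row. io.StringIO stands in for real files, which would be opened with open(path, newline='').

```python
import csv
import io

def add_combined_column(infile, outfile, first, second, name="combined"):
    """Copy every row and append one column that joins two existing columns."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator="\n")
    header = next(reader, None)
    if header is None:  # empty input: nothing to write
        return
    writer.writerow(header + [name])
    for row in reader:
        # Keep the original fields and add the joined value at the end.
        writer.writerow(row + [row[first] + " " + row[second]])

# In-memory stand-ins for files opened with open(path, newline="").
source = io.StringIO("id,first,last\n1,Ada,Lovelace\n2,Alan,Turing\n")
target = io.StringIO()
add_combined_column(source, target, 1, 2, name="full_name")
print(target.getvalue())
```

Building a new list with row + [...] leaves the row that was read untouched, which makes the transformation easy to test on a single row first.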

Related

Compare two CSV files and write difference in the same file as an extra column in python

Hey intelligent community,
I need a little bit of help because I think I can't see the wood for the trees.
I have two CSV files that look like this:
Name,Number
AAC;2.2.3
AAF;2.4.4
ZCX;3.5.2
Name,Number
AAC;2.2.3
AAF;2.4.4
ZCX;3.5.5
I would like to compare both files and then write any changes like this:
Name,Number,Changes
AAC;2.2.3
AAF;2.4.4
ZCX;5.5.5;change: 3.5.2
So on every line where there is a difference in the number, I want to add it as a new column at the end of the line.
The files are formatted the same way but sometimes have a new row, which is why I think I have to map the keys.
I have come this far, but now I am lost in my thoughts:
Python 3.10.9
import csv
# Read the first CSV and build the mapping
with open('test1.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    rows = list(reader)
    file1_dict = {row[1]: row[0] for row in rows}
# Read the second CSV and build the mapping
with open('test2.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    rows = list(reader)
    file2_dict = {row[1]: row[0] for row in rows}
# Compare the keys and find the differences
for k in file1_dict:
    if file1_dict[k] != file2_dict[k]:
        file1_dict[k] = file2_dict[k]
        for row in rows:
            if row[1] == k:
                row.append(file2_dict[k])
# Write the CSV (not sure how to add the word "change:")
with open('test1.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(rows)
If I try this, I don't get a new column; it just "updates" the CSV file with the same columns.
For example, the following code gives me the differing rows, but I am not able to add them to the existing file and row.
with open('test1.csv') as fin1:
    with open('test2.csv') as fin2:
        read1 = csv.reader(fin1)
        read2 = csv.reader(fin2)
        diff_rows = (row1 for row1, row2 in zip(read1, read2) if row1 != row2)
        with open('test3.csv', 'w') as fout:
            writer = csv.writer(fout)
            writer.writerows(diff_rows)
Does anyone have any tips or help for my problem? I have read many answers here but can't figure it out.
Thanks a lot.
@bigkeefer
Thanks for your answer. I tried to change it for the ; delimiter, but it gives a "list index out of range" error.
with open('test3.csv', 'r') as file1:
    reader = csv.reader(file1, delimiter=';')
    rows = list(reader)[1:]
    file1_dict = {row[0]: row[1] for row in rows}
with open('test4.csv', 'r') as file2:
    reader = csv.reader(file2, delimiter=';')
    rows = list(reader)[1:]
    file2_dict = {row[0]: row[1] for row in rows}
new_file = ["Name;Number;Changes\n"]
with open('output.csv', 'w') as nf:
    for key, value in file1_dict.items():
        if value != file2_dict[key]:
            new_file.append(f"{key};{file2_dict[key]};change: {value}\n")
        else:
            new_file.append(f"{key};{value}\n")
    nf.writelines(new_file)
You will need to adapt this to overwrite your first file and so on, as you mentioned above, but I've left it like this for your testing purposes. Hopefully this helps.
I've assumed that each file actually contains the header row shown above. If not, remove the slicing on the list creations and change the new_file assignment to an empty list ([]). The if row filter in the dictionary comprehensions skips blank lines, which would otherwise raise the "list index out of range" error.
with open('f1.csv', 'r') as file1:
    reader = csv.reader(file1, delimiter=";")
    rows = list(reader)[1:]
    file1_dict = {row[0]: row[1] for row in rows if row}
with open('f2.csv', 'r') as file2:
    reader = csv.reader(file2, delimiter=";")
    rows = list(reader)[1:]
    file2_dict = {row[0]: row[1] for row in rows if row}
new_file = ["Name,Number,Changes\n"]
for key, value in file1_dict.items():
    if value != file2_dict[key]:
        new_file.append(f"{key};{file2_dict[key]};change: {value}\n")
    else:
        new_file.append(f"{key};{value}\n")
with open('new.csv', 'w') as nf:
    nf.writelines(new_file)
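
The question mentions that the second file sometimes has a new row, in which case a plain file2_dict[key] lookup keyed on the first file would miss it. One way to handle that, sketched below, is to iterate over the second file and look up the old value with dict.get. The helper names read_pairs and diff_rows, the "new" marker for added rows, and the extra ZZZ sample row are assumptions made for this illustration; rows that exist only in the first file are ignored here.

```python
import csv
import io

def read_pairs(f):
    """Map Name -> Number, skipping the header line and blank or short rows."""
    reader = csv.reader(f, delimiter=";")
    next(reader, None)  # skip the header line
    return {row[0]: row[1] for row in reader if len(row) >= 2}

def diff_rows(old, new):
    """Yield one output row per entry of the newer file."""
    for name, number in new.items():
        previous = old.get(name)  # None when the row is new
        if previous is None:
            yield [name, number, "new"]
        elif previous != number:
            yield [name, number, "change: " + previous]
        else:
            yield [name, number]

# In-memory stand-ins for the two input files.
old = read_pairs(io.StringIO("Name;Number\nAAC;2.2.3\nAAF;2.4.4\nZCX;3.5.2\n"))
new = read_pairs(io.StringIO("Name;Number\nAAC;2.2.3\nAAF;2.4.4\nZCX;3.5.5\nZZZ;1.0.0\n"))

out = io.StringIO()
writer = csv.writer(out, delimiter=";", lineterminator="\n")
writer.writerow(["Name", "Number", "Changes"])
writer.writerows(diff_rows(old, new))
print(out.getvalue())
```

Letting csv.writer build the lines keeps the delimiter consistent between the header and the data rows.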

Python CSV, Combining multiple columns into one column using CSV

I've been trying to figure out a way to combine all the columns in a CSV file I have into one column.
import csv
with open('test.csv') as f:
    reader = csv.reader(f)
    with open('output.csv', 'w') as g:
        writer = csv.writer(g)
        for row in reader:
            new_row = [' '.join([row[0], row[1]])] + row[2:]
            writer.writerow(new_row)
This worked to combine the first two columns, but I've been having trouble trying to loop it and get the rest of the columns into just one.
You should just pass row to .join, because it is already a list of strings.
import csv
with open('test.csv') as f:
    reader = csv.reader(f)
    with open('output.csv', 'w') as g:
        writer = csv.writer(g)
        for row in reader:
            new_row = [' '.join(row)] # <---- CHANGED HERE
            writer.writerow(new_row)

Create subset of large CSV file and write to new CSV file

I would like to create a subset of a large CSV file using the rows that have "DOT" in the 4th column, and output it to a new file.
This is the code I currently have:
import csv
outfile = open('DOT.csv','w')
with open('Service_Requests_2015_-_Present.csv', newline='', encoding='utf-8') as f:
    reader = csv.reader(f)
    for row in reader:
        if row[3] == "DOT":
            outfile.write(row)
outfile.close()
The error is:
outfile.write(row)
TypeError: must be str, not list
How can I manipulate row so that I can simply call write(row)? If that is not possible, what is the easiest way?
You can combine your two open statements, as the with statement accepts multiple arguments, like this:
import csv
infile = 'Service_Requests_2015_-_Present.csv'
outfile = 'DOT.csv'
with open(infile, encoding='utf-8') as f, open(outfile, 'w') as o:
    reader = csv.reader(f)
    writer = csv.writer(o, delimiter=',') # adjust as necessary
    for row in reader:
        if row[3] == "DOT":
            writer.writerow(row)
# no need for close statements
print('Done')
Make your outfile a csv.writer and use writerow instead of write.
outcsv = csv.writer(outfile, ...other_options...)
...
outcsv.writerow(row)
That is how I would do it... OR
outfile.write(",".join(row)) # comma delimited here...
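In the code above you are passing a list to the file object's write method, which only accepts strings; that is what raises "TypeError: must be str, not list". You can convert the list to a string before writing it, for example outfile.write(str(row)), although that writes the Python list representation (brackets and quotes included) rather than a CSV line.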
or
import csv
def csv_writer(input_path, out_path):
    # Text mode with newline='' is what the csv module expects on Python 3
    with open(out_path, 'a', newline='') as outfile:
        writer = csv.writer(outfile)
        with open(input_path, newline='', encoding='utf-8') as f:
            reader = csv.reader(f)
            for row in reader:
                if row[3] == "DOT":
                    writer.writerow(row)
    # the with block closes outfile, so no explicit close() is needed
csv_writer(input_path, out_path)  # input_path and out_path stand for your own file paths
[This code is for Python 3. In Python 2.7, the open function does not take a newline argument, hence the TypeError.]
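
The answers above offer both ",".join(row) and csv.writer. The two differ once a field itself contains a comma, which the following sketch tries to show. The sample row is invented for illustration and is not taken from the real service-request data.

```python
import csv
import io

# Hypothetical row whose third field contains a comma.
row = ["101", "Street Light Out", "Broadway, Manhattan", "DOT"]

# Plain join: the embedded comma becomes indistinguishable from a delimiter.
joined = ",".join(row)

# csv.writer quotes the field so that it survives a round trip.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerow(row)
written = buffer.getvalue()

fields_from_join = next(csv.reader([joined]))
fields_from_writer = next(csv.reader([written]))

print(len(fields_from_join))      # one field more than the original row
print(fields_from_writer == row)  # the quoted line parses back to the same row
```

For data without embedded commas, quotes, or newlines both approaches give the same line, so the join only looks equivalent on simple input.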

How to write a matrix (read from one file) to another csv file in python with a specific output format

I have a csv file which has data in matrix format a sample of which is shown below:
index,col1,col2,col3,col4,col5,col6
col1_1,1,0.005744233,0.013118052,-0.003772589,0.004284689
col2_1,0.005744233,1,-0.013269414,-0.007132092,0.013950261
col3_1,0.013118052,-0.013269414,1,-0.014029249,-0.00199437
col4_1,-0.003772589,-0.007132092,-0.014029249,1,0.022569309
col5_1,0.004284689,0.013950261,-0.00199437,0.022569309,1
Now I want to read the data in this file and write it to another CSV file, but the format I need is this:
col1_1,value,col1
col1_1,value,col2
col1_1,value,col3
.
.
.
col2_1,value,col1
col2_1,value,col2
.
.
.
So basically each output line starts with the row label from the first column, followed by the value, followed by the column name from the first row.
I wrote this code but it just writes in the wrong format:
reader = csv.reader(open(IN_FILE, "r"), delimiter=',')
writer = csv.writer(open(OUT_FILE, "w"), delimiter=',')
with open(IN_FILE) as infile:
    with open(OUT_FILE, "w") as outfile:
        reader = csv.reader(infile, delimiter=",")
        writer = csv.writer(outfile, delimiter=",")
        writer.writerow(next(reader))
        for line in reader:
            writer.writerow([line[0],line[1]])
How can I do this in python?
Try this:
# The reader and writer are created once, inside the with blocks, so both files get closed
with open(IN_FILE) as infile:
    with open(OUT_FILE, "w") as outfile:
        reader = csv.reader(infile, delimiter=",")
        writer = csv.writer(outfile, delimiter=",")
        first_row = None
        for line in reader:
            if first_row is None:
                first_row = line
            else:
                for index, col in enumerate(first_row[1:]):
                    writer.writerow([line[0],line[index + 1],col])
This seems to work, although your test data appears to be missing the values for 'col6'.
The problem with your initial code was that it wasn't looping through each column of the rows.
If your file includes the column and row labels as I assume, this should do it:
old_data = list(reader)  # materialize the rows so they can be indexed
new_data = []
for row in range(len(old_data)):
    for col in range(len(old_data[row])):
        # skip the header row and the label column
        if (not row == 0 and not col == 0):
            new_data.append([old_data[row][0],old_data[row][col],old_data[0][col]])
writer.writerows(new_data)
csv_file.close()
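
The same reshaping can also be sketched by pairing each value with its column name through zip, which stops at the shorter of the two sequences and therefore tolerates a header that lists more names than a row has values. The function name matrix_to_long and the two-column sample matrix are assumptions for this illustration.

```python
import csv
import io

def matrix_to_long(infile, outfile):
    """Write one (row label, value, column name) line per matrix cell."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator="\n")
    header = next(reader)
    col_names = header[1:]  # drop the name of the label column
    for row in reader:
        # zip stops at the shorter sequence, so extra header names are ignored.
        for value, col_name in zip(row[1:], col_names):
            writer.writerow([row[0], value, col_name])

# In-memory stand-ins for IN_FILE and OUT_FILE.
source = io.StringIO("index,col1,col2\nrow_a,1,0.5\nrow_b,0.5,1\n")
target = io.StringIO()
matrix_to_long(source, target)
print(target.getvalue())
```

Because rows are processed one at a time, the whole matrix never has to be held in memory.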

Referring to CSV header string to adjust column format

I'm somewhat new to Python and CSV processing, and I couldn't find a solution for what I'm looking for. When I open a specific CSV file in Excel, I have a column called "rate" that is in percent. I'm dividing all the values in this column by 100. As of now I'm referring to this column by calling row[6] = percentToFloat(row[6]). My question is whether it's possible to address the column by its header name rather than by its column number.
with open(input) as inFile:
    reader = csv.reader(inFile)
    reader.next()
    with open(output, 'w') as outFile:
        writer = csv.writer(outFile)
        for row in reader:
            if len(row)>1: #skips empty rows
                row[6] = percentToFloat(row[6])
                writer.writerow(row)
You could use data frames from pandas:
import pandas as pd
# read_csv treats the first line as the header by default; header=True is not a valid argument
df = pd.read_csv('data.csv')
print(df)
print(df.rate)
print(df.rate/100.0)
Use csv.DictReader :
reader = csv.DictReader(inFile)
Now you can use row['column_name'] instead of row[6] in your code.
Use csv.DictReader instead of csv.reader.
with open(input) as inFile:
    reader = csv.DictReader(inFile)  # reads the header row itself, so no next() call is needed
    with open(output, 'w') as outFile:
        writer = csv.DictWriter(outFile, fieldnames=reader.fieldnames)
        # call writer.writeheader() here if the output should keep the header row
        for row in reader:
            # each row is a dict keyed by header name
            row['rate'] = percentToFloat(row['rate'])
            writer.writerow(row)
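
A self-contained sketch of the DictReader and DictWriter round trip follows. The helper percent_to_float is a hypothetical stand-in for the asker's percentToFloat, whose implementation is not shown in the question, and the sample data is invented; io.StringIO replaces the real input and output files.

```python
import csv
import io

def percent_to_float(text):
    """Hypothetical stand-in for the asker's percentToFloat helper."""
    return float(text.strip().rstrip("%")) / 100

# In-memory stand-ins for the input and output files.
source = io.StringIO("name,rate\nalpha,50%\nbeta,12.5\n")
target = io.StringIO()

reader = csv.DictReader(source)  # the first line becomes reader.fieldnames
writer = csv.DictWriter(target, fieldnames=reader.fieldnames, lineterminator="\n")
writer.writeheader()
for row in reader:
    # Address the column by its header name instead of its position.
    row["rate"] = percent_to_float(row["rate"])
    writer.writerow(row)

print(target.getvalue())
```

Addressing the column by name keeps the code working if columns are later reordered in the input file.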
