Combine two rows into one in a csv file with Python - python

I am trying to combine multiple rows of a CSV file into one. I could easily do it in Excel, but I want to do this for hundreds of files, so I need to do it in code. I have tried storing the rows in arrays, but that doesn't seem to work. I am using Python.
So let's say I have a CSV file:
1,2,3
4,5,6
7,8,9
All I want is a CSV file like this:
1,2,3,4,5,6,7,8,9
The code I have tried is this:
fin = open("C:\\1.csv", 'r+')
fout = open("C:\\2.csv",'w')
for line in fin.xreadlines():
    new = line.replace(',', ' ', 1)
    fout.write (new)
fin.close()
fout.close()
Could you please help?

You should be using the csv module for this, as splitting CSV manually on commas is very error-prone (a single column can contain a string with commas, which you would incorrectly split into multiple columns). The csv module represents each row as a list of values.
import csv

def return_contents(file_name):
    with open(file_name) as infile:
        reader = csv.reader(infile)
        return list(reader)

data1 = return_contents('csv1.csv')
data2 = return_contents('csv2.csv')
print(data1)
print(data2)

combined = []
for row in data1:
    combined.extend(row)
for row in data2:
    combined.extend(row)

with open('csv_out.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(combined)
That code gives you the basis of the approach but it would be ugly to extend this for hundreds of files. Instead, you probably want os.listdir to pull all the files in a single directory, one by one, and add them to your output. This is the reason that I packed the reading code into the return_contents function; we can repeat the same process millions of times on different files with only one set of code to do the actual reading. Something like this:
import csv
import os

def return_contents(file_name):
    with open(file_name) as infile:
        reader = csv.reader(infile)
        return list(reader)

all_files = os.listdir('my_csvs')
combined_output = []
for file in all_files:
    data = return_contents('my_csvs/{}'.format(file))
    for row in data:
        combined_output.extend(row)

with open('csv_out.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(combined_output)
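As a small illustration of the quoting point made at the start of this answer, the sketch below compares a naive split on commas with csv.reader on a made-up line that contains a quoted comma, and then flattens the parsed rows into one long row. The sample text is invented for the demonstration, and itertools.chain.from_iterable is used here only as an alternative to the extend loop; it is not part of the answer's code.

```python
import csv
import io
from itertools import chain

# Hypothetical sample data: the second field of the first row contains a comma.
sample = 'a,"hello, world",c\n1,2,3\n'

# Naive splitting cuts the quoted field into two pieces.
naive = sample.splitlines()[0].split(',')

# csv.reader respects the quotes and keeps the field intact.
rows = list(csv.reader(io.StringIO(sample)))

# Flatten all rows into a single row, as the answer does with extend().
combined = list(chain.from_iterable(rows))

print(naive)     # four pieces, two of them fragments of one field
print(rows[0])   # three fields
print(combined)  # every field of every row, in reading order
```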

If you are dealing specifically with the CSV file format, I recommend using the csv package for the file operations. If you also use the with...as statement, you don't need to worry about closing the file. You just need to define PATH, and the program will iterate over all the .csv files in that folder.
Here is what you can do:
import csv
import os

PATH = "your folder path"

def order_list():
    data_list = []
    for filename in os.listdir(PATH):
        if filename.endswith(".csv"):
            # Open the file found by listdir, not a hard-coded name.
            with open(os.path.join(PATH, filename), newline='') as csvfile:
                # QUOTE_NONNUMERIC converts every unquoted field to float.
                read_csv = csv.reader(csvfile, delimiter=',', quoting=csv.QUOTE_NONNUMERIC)
                for row in read_csv:
                    data_list.extend(row)
    print(data_list)

if __name__ == '__main__':
    order_list()

Store your data in a pandas DataFrame:
import pandas as pd
df = pd.read_csv('file.csv')
Store the modified dataframe in a new one:
# Replace 'Column_Name' with the name of your column; astype(str) lets join() handle numeric values.
df_2 = df.groupby('Column_Name').agg(lambda x: ' '.join(x.astype(str))).reset_index()
Write the new dataframe to a new CSV file:
df_2.to_csv("file_modified.csv")

You could do it also like this:
fIn = open("test.csv", "r")
fOut = open("output.csv", "w")
fOut.write(",".join([line for line in fIn]).replace("\n",""))
fIn.close()
fOut.close()
If you now want to run it on multiple files, you can run it as a script with arguments:
import sys
fIn = open(sys.argv[1], "r")
fOut = open(sys.argv[2], "w")
fOut.write(",".join([line for line in fIn]).replace("\n",""))
fIn.close()
fOut.close()
Assuming you are on a Linux system and the script is called csvOnliner.py, you could call it with:
for i in *.csv; do python csvOnliner.py "$i" "changed_$i"; done
On Windows you could do it like this:
FOR %i IN (*.csv) DO csvOnliner.py %i changed_%i
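If you would rather not depend on a shell loop, the same batch run can be sketched in Python itself, which behaves the same on Linux and Windows. This is only an illustration: flatten_csv and flatten_folder are hypothetical helper names, the "changed_" prefix is borrowed from the commands above, and the csv module is used instead of string joining so quoted fields survive.

```python
import csv
from pathlib import Path


def flatten_csv(src, dst):
    """Write every field of src, in reading order, as a single row of dst."""
    with open(src, newline='') as fin:
        fields = [field for row in csv.reader(fin) for field in row]
    with open(dst, 'w', newline='') as fout:
        csv.writer(fout).writerow(fields)


def flatten_folder(folder, prefix='changed_'):
    """Apply flatten_csv to every .csv file in folder and return the output paths."""
    folder = Path(folder)
    # Snapshot the inputs first and skip earlier outputs, so results are not re-processed.
    sources = [p for p in sorted(folder.glob('*.csv'))
               if not p.name.startswith(prefix)]
    outputs = []
    for src in sources:
        dst = folder / (prefix + src.name)
        flatten_csv(src, dst)
        outputs.append(dst)
    return outputs
```

Calling flatten_folder('.') would then play the role of the shell loops above.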

Related

How to add data to existing rows of a CSV file? [duplicate]

I have already an existing CSV file that I am accessing and I want to append the data to the first row, but it writes data at the end of the file.
What I am getting:
But I want the data to append like this:
Code I have done so far:
import csv
with open('explanation.csv' , 'a', newline="") as file:
    myFile = csv.writer(file)
    myFile.writerow(["1"])
What you actually want to do is replace data in an existing CSV file with new values; however, to update a CSV file you must rewrite the whole thing.
One way to do that is to read the whole file into memory, update the data, and then use it to overwrite the existing file. Alternatively, you could process the file one row at a time, store the results in a temporary file, and replace the original with the temporary file once every row has been updated.
The code to do the latter is shown below:
import csv
import os
from pathlib import Path
from tempfile import NamedTemporaryFile

filepath = Path('explanation.csv')  # CSV file to update.

with open(filepath, 'r', newline='') as csv_file, \
     NamedTemporaryFile('w', newline='', dir=filepath.parent, delete=False) as tmp_file:
    reader = csv.reader(csv_file)
    writer = csv.writer(tmp_file)
    # Replace value in the first column of the first 5 rows.
    for data_value in range(1, 6):
        row = next(reader)
        row[0] = data_value
        writer.writerow(row)
    writer.writerows(reader)  # Copy remaining rows of original file.

# Replace original file with updated version.
os.replace(tmp_file.name, filepath)
print('CSV file updated')
You could read in the entire file, append your rows in memory, and then write the entire file back:
import csv

def append(fname, data):
    with open(fname, newline='') as f:
        reader = csv.reader(f)
        data = list(reader) + list(data)  # existing rows followed by the new rows
    with open(fname, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerows(data)
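The read-modify-rewrite idea can also be aimed directly at what the question asks for, which is adding values to the end of the first row rather than adding new rows. The following is a minimal sketch under that reading of the question; extend_first_row is a hypothetical helper name, and it loads the whole file into memory, so it suits small files.

```python
import csv


def extend_first_row(path, extra_values):
    """Append extra_values to the end of the first row of the CSV at path."""
    with open(path, newline='') as f:
        rows = list(csv.reader(f))
    if rows:
        rows[0].extend(extra_values)
    else:
        # Empty file: the new values simply become the first row.
        rows.append(list(extra_values))
    # Rewrite the whole file; a CSV row cannot be lengthened in place.
    with open(path, 'w', newline='') as f:
        csv.writer(f).writerows(rows)
```

For example, extend_first_row('explanation.csv', ['1']) would add a field containing 1 to the first line and leave every other line unchanged.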

How to read two csv files and to concatenate them?

First, I need to import two csv files.
Then I need to remove header in both files.
After that, I would like to take one column from both files and to concatenate them.
I have tried to open files, but I'm not sure how to concatenate.
Can anyone give advice how to proceed?
import csv
x = []
chamber_temperature = []
with open(r"C:\Users\mm02058\Documents\test.txt", 'r') as file:
    reader = csv.reader(file, delimiter='\t')
    with open(r"C:\Users\mm02058\Documents\test.txt", 'r') as file1:
        reader_1 = csv.reader(file1, delimiter='\t')
    for row in (reader):
        x.append(row[0])
        chamber_temperature.append(row[1])
    for row in (reader_1):
        x.append(row[0])
        chamber_temperature.append(row[1])
The immediate bug is that you are trying to read from reader_1 outside the with block, which means Python has already closed the file.
But the nesting of the with statements is confusing and misleading anyway. Here is a generalization which should let you add more files easily.
import csv
x = []
chamber_temperature = []
for filename in (r"C:\Users\mm02058\Documents\test.txt",
                 r"C:\Users\mm02058\Documents\test.txt"):
    with open(filename, 'r') as file:
        for idx, row in enumerate(csv.reader(file, delimiter='\t')):
            if idx == 0:
                continue  # skip header line
            x.append(row[0])
            chamber_temperature.append(row[1])
Because of how you have structured your code, the context manager for file1 closes the file before the for loop gets to use it.
Use a single with statement to open both files, e.g.
with open('file1', 'r') as file1, open('file2', 'r') as file2:
    # Your code in here
for row in (reader_1):
    x.append(row[0])
    chamber_temperature.append(row[1])
You are getting this error because you have placed this block outside the second with block, so the file has already been closed by the time it runs.
You can either open both files at once like this:
with open('file1', 'r') as file1, open('file2', 'r') as file2:
    # Your code in here
or you can use pandas to open and concatenate the CSV files:
import pandas as pd
data = pd.read_csv(r'file.csv', header=None)
and then concatenate the resulting dataframes (see the pandas documentation on concatenating dataframes).
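If you would rather stay with the standard library, the three steps from the question (read each file, drop its header, collect one column) can be sketched as below. The helper names read_column and concat_columns are made up for this illustration, and the tab delimiter is taken from the question's code.

```python
import csv


def read_column(path, index, delimiter='\t'):
    """Return one column of a delimited file as a list, skipping the header row."""
    with open(path, newline='') as f:
        reader = csv.reader(f, delimiter=delimiter)
        next(reader, None)  # discard the header; the default avoids StopIteration on an empty file
        # The file is still open here, so the reader can be consumed safely.
        return [row[index] for row in reader if row]


def concat_columns(paths, index, delimiter='\t'):
    """Concatenate the same column from several files, in the order given."""
    values = []
    for path in paths:
        values.extend(read_column(path, index, delimiter))
    return values
```

Because each file is read completely inside its own with block, the closed-file error from the question cannot occur here.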

Merging CSV files with a single column into one CSV file with 14 columns

I currently have 14 CSV files, each containing one column of data for a day (14 because it goes back 2 weeks)
What I want to do is make one CSV file containing the data from all 14 of these CSVs
eg. if each CSV contains this:
1
2
3
4
I would want the outcome to be a csv file with
1,1,1,1,1,1,1,1,1,1,1,1,1,1,
2,2,2,2,2,2,2,2,2,2,2,2,2,2,
3,3,3,3,3,3,3,3,3,3,3,3,3,3,
4,4,4,4,4,4,4,4,4,4,4,4,4,4,
( the actual CSVs have 288 Rows)
I'm currently using some code I found in another question. It worked fine for 2 or 3 CSVs, but when I added more it didn't process anything beyond the first 3, and the code now looks extremely messy.
Apologies for the large chunk of code, but this is what I have so far.
def csvappend():
    with open('C:\dev\OTQtxt\\result1.csv', 'rb') as csv1:
        with open('C:\dev\OTQtxt\\result2.csv', 'rb') as csv2:
            with open('C:\dev\OTQtxt\\result3.csv', 'rb') as csv3:
                with open('C:\dev\OTQtxt\\result4.csv', 'rb') as csv4:
                    with open('C:\dev\OTQtxt\\result5.csv', 'rb') as csv5:
                        with open('C:\dev\OTQtxt\\result6.csv', 'rb') as csv6:
                            with open('C:\dev\OTQtxt\\result7.csv', 'rb') as csv7:
                                with open('C:\dev\OTQtxt\\result8.csv', 'rb') as csv8:
                                    with open('C:\dev\OTQtxt\\result9.csv', 'rb') as csv9:
                                        with open('C:\dev\OTQtxt\\result10.csv', 'rb') as csv10:
                                            with open('C:\dev\OTQtxt\\result11.csv', 'rb') as csv11:
                                                with open('C:\dev\OTQtxt\\result12.csv', 'rb') as csv12:
                                                    with open('C:\dev\OTQtxt\\result13.csv', 'rb') as csv13:
                                                        with open('C:\dev\OTQtxt\\result14.csv', 'rb') as csv14:
                                                            reader1 = csv.reader(csv1, delimiter=',')
                                                            reader2 = csv.reader(csv2, delimiter=',')
                                                            reader3 = csv.reader(csv3, delimiter=',')
                                                            reader4 = csv.reader(csv4, delimiter=',')
                                                            reader5 = csv.reader(csv5, delimiter=',')
                                                            reader6 = csv.reader(csv6, delimiter=',')
                                                            reader7 = csv.reader(csv7, delimiter=',')
                                                            reader8 = csv.reader(csv8, delimiter=',')
                                                            reader9 = csv.reader(csv9, delimiter=',')
                                                            reader10 = csv.reader(csv10, delimiter=',')
                                                            reader11 = csv.reader(csv11, delimiter=',')
                                                            reader12 = csv.reader(csv12, delimiter=',')
                                                            reader13 = csv.reader(csv13, delimiter=',')
                                                            reader14 = csv.reader(csv14, delimiter=',')
                                                            all = []
                                                            for row1, row2, row3, row4, row5, row6, row7, row8, row9, \
                                                                    row10, row11, row12, row13, row14 in zip(reader1, \
                                                                    reader2, reader3,\
                                                                    reader4, reader5, \
                                                                    reader7, reader8,\
                                                                    reader9, reader10, \
                                                                    reader11, reader12,\
                                                                    reader13,reader14):
                                                                row14.append(row1[0])
                                                                row14.append(row2[0])
                                                                row14.append(row3[0])
                                                                row14.append(row4[0])
                                                                row14.append(row5[0])
                                                                row14.append(row6[0])
                                                                row14.append(row7[0])
                                                                row14.append(row8[0])
                                                                row14.append(row9[0])
                                                                row14.append(row10[0])
                                                                row14.append(row11[0])
                                                                row14.append(row12[0])
                                                                row14.append(row13[0])
                                                                all.append(row14)
    with open('C:\dev\OTQtxt\TODAY.csv', 'wb') as output:
        writer = csv.writer(output, delimiter=',')
        writer.writerows(all)
I think some of my indenting has been messed up when copying it in, but you should get the idea. I don't expect you to read through all of that; it's very repetitive.
I have seen a few similar/related questions recommending Unix tools. In case anybody was going to suggest that, I'd better tell you this will be running on Windows.
If anybody has any ideas for how I could clean this up and actually get it working, I'd be hugely grateful!
Creating files:
xxxx#xxxx:/tmp/files$ for i in {1..15}; do echo -e "1\n2\n3\n4" > "my_csv_$i.csv"; done
xxxx#xxxx:/tmp/files$ more my_csv_1.csv
1
2
3
4
xxxx#xxxx:/tmp/files$ ls
my_csv_10.csv my_csv_11.csv my_csv_12.csv my_csv_13.csv my_csv_14.csv my_csv_15.csv my_csv_1.csv my_csv_2.csv my_csv_3.csv my_csv_4.csv my_csv_5.csv my_csv_6.csv my_csv_7.csv my_csv_8.csv my_csv_9.csv
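Using itertools.zip_longest (izip_longest in Python 2):
import os
from itertools import zip_longest

with open('result.csv', 'w') as f_obj:
    rows = []
    files = os.listdir('.')
    for f in files:
        with open(f) as f_in:  # close each input file after reading it
            rows.append(f_in.readlines())
    # Each entry in rows holds the lines of one file; zip_longest transposes them.
    for row in zip_longest(*rows):
        f_obj.write(','.join([field.strip() for field in row if field is not None])+'\n')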
Output:
xxxxx#xxxx:/tmp/files$ more result.csv
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
2,2,2,2,2,2,2,2,2,2,2,2,2,2,2
3,3,3,3,3,3,3,3,3,3,3,3,3,3,3
4,4,4,4,4,4,4,4,4,4,4,4,4,4,4
That's not the best solution, since it loads all your data into memory, but it should give you an idea of how to do this. By the way, if all your data is numeric, I would stay with numpy and work with multidimensional arrays.
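To address the memory concern, one possible variation keeps every input file open and pulls a single line from each per step, so only one output row is held at a time. This is a sketch, not the answer's code: merge_columns is a hypothetical name, contextlib.ExitStack is used to close all the files reliably, and short files are padded with empty fields rather than dropped, which differs slightly from the version above.

```python
import csv
from contextlib import ExitStack
from itertools import zip_longest


def merge_columns(paths, out_path):
    """Write line i of every input file as row i of out_path, one column per file."""
    with ExitStack() as stack:
        # ExitStack closes every file opened here when the block exits.
        inputs = [stack.enter_context(open(p, newline='')) for p in paths]
        out = stack.enter_context(open(out_path, 'w', newline=''))
        writer = csv.writer(out)
        # zip_longest reads one line from each file per step; exhausted files yield ''.
        for lines in zip_longest(*inputs, fillvalue=''):
            writer.writerow([line.strip() for line in lines])
```

With 14 files of 288 lines this makes little practical difference, but the same function would cope with much longer inputs.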
You can use this; the file names could also be generated in a loop:
import numpy as np

filenames = ['file1', 'file2', 'file3']  # all the files to be read in
data = []  # one list of lines per file
for filename in filenames:
    with open(filename, 'r') as fp:
        data.append(fp.readlines())  # append a list of all lines in the current file
data = np.array(data).T  # transpose the list of lists using numpy (files must have equal length)
data_string = '\n'.join([','.join([k.strip() for k in j]) for j in data.tolist()])  # join inner elements with ',' and rows with '\n'
with open('newfile', 'w') as fp:
    fp.write(data_string)
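Have just tested:
import csv
import glob
import os

folder = "C:\\dev\\OTQtxt"
out_path = os.path.join(folder, "one.csv")
# Skip one.csv itself so a previous run's output is not merged back in.
files = [f for f in glob.glob(os.path.join(folder, "*.csv")) if f != out_path]
rows = []
for file in files:
    with open(file, 'r') as f:
        rows.append(f.read().splitlines())
with open(out_path, 'w', newline='') as oneFile:
    writer = csv.writer(oneFile)
    for row in rows:
        writer.writerow(row)  # one output row per input file
This produces a file one.csv in that directory containing the data from all the merged *.csv files, one row per input file.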
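One detail worth keeping in mind when the files are collected with os.listdir or glob: the order in which names come back is not guaranteed, and a plain string sort places result10.csv before result2.csv. Since the question wants one column per day, the column order probably matters. A small sketch of sorting by the number in the file name follows; numeric_suffix is a hypothetical helper, and the names are modelled on the question's files.

```python
import re


def numeric_suffix(name):
    """Sort key: the last run of digits in the file name, or -1 if there is none."""
    numbers = re.findall(r'\d+', name)
    return int(numbers[-1]) if numbers else -1


names = ['result10.csv', 'result2.csv', 'result1.csv', 'result14.csv']

# Plain string order puts result10 and result14 ahead of result2.
print(sorted(names))

# Sorting on the numeric part gives result1, result2, result10, result14.
print(sorted(names, key=numeric_suffix))
```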

Python Read Text File Column by Column

So I have a text file that looks like this:
1,989785345,"something 1",,234.34,254.123
2,234823423,"something 2",,224.4,254.123
3,732847233,"something 3",,266.2,254.123
4,876234234,"something 4",,34.4,254.123
...
I'm running this code right here:
file = open("file.txt", 'r')
readFile = file.readline()
lineID = readFile.split(",")
print(lineID[1])
This lets me break up the content in my text file by "," but what I want to do is separate it into columns because I have a massive number of IDs and other things in each line. How would I go about splitting the text file into columns and call each individual row in the column one by one?
You have a CSV file; use the csv module to read it:
import csv
with open('file.txt', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        print(row[1])  # each row is a list of column values
This still gives you data by row, but with the zip() function you can transpose this to columns instead:
import csv
with open('file.txt', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for column in zip(*reader):
        print(column)  # a tuple holding every value of one column
Do be careful with the latter; the whole file will be read into memory in one go, and a large CSV file could eat up all your available memory in the process.
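If only one or two columns are needed, the memory issue can be sidestepped by reading the file row by row and yielding just the wanted field. The generator below is a sketch of that idea; iter_column is a hypothetical name, and the index is zero-based, so index 1 is the ID column in the sample data from the question.

```python
import csv


def iter_column(path, index):
    """Yield the value at position index from each row, one row at a time."""
    with open(path, newline='') as csvfile:
        for row in csv.reader(csvfile):
            if len(row) > index:  # skip blank or short rows
                yield row[index]
```

A loop such as for line_id in iter_column('file.txt', 1) then visits every ID in turn without ever holding the whole file in memory.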

Python: appending/merging multiple csv files respecting headers and write to csv

[Using Python3] I'm very new to (Python) programming but nonetheless am writing a script that scans a folder for certain csv files, then I want to read them all and append them and write them into another csv file.
In between, only the rows whose value in a certain column matches a set of criteria should be kept.
All csv files have the same columns, and would look somewhere like this:
header1 header2 header3 header4 ...
string float string float ...
string float string float ...
string float string float ...
string float string float ...
... ... ... ... ...
The code I'm working with right now is the following (below), however it just keeps on overwriting the data from the previous file. That does make sense to me, I just cannot figure out how to get it working though.
Code:
import csv
import datetime
import sys
import glob
import itertools
from collections import defaultdict

# Raw data files have the format like '2013-06-04'. To be able to use this script during the whole of 2013, the glob is set to search for the pattern '2013-*.csv'
files = [f for f in glob.glob('2013-*.csv')]

# Output file looks like '20130620-filtered.csv'
outfile = '{:%Y%m%d}-filtered.csv'.format(datetime.datetime.now())

# List of 'Header4' values to be filtered for writing output
header4 = ['string1', 'string2', 'string3', 'string4']

for f in files:
    with open(f, 'r') as f_in:
        dict_reader = csv.DictReader(f_in)
        with open(outfile, 'w') as f_out:
            dict_writer = csv.DictWriter(f_out, lineterminator='\n', fieldnames=dict_reader.fieldnames)
            dict_writer.writeheader()
            for row in dict_reader:
                if row['Campaign'] in campaign_names:
                    dict_writer.writerow(row)
I also tried something like readers = list(itertools.chain(*map(lambda f: csv.DictReader(open(f)), files))), and trying to iterate over the readers however then I cannot figure out how to work with the headers. (I get the error that itertools.chain() does not have the fieldnames attribute).
Any help is very much appreciated!
You keep re-opening the output file in 'w' mode, which overwrites it each time.
Open outfile once, before your loop starts. For the first file you read, write the header and the rows. For the rest of the files, just write the rows.
Something like
with open(outfile, 'w') as f_out:
    dict_writer = None
    for f in files:
        with open(f, 'r') as f_in:
            dict_reader = csv.DictReader(f_in)
            if not dict_writer:
                # First input file: create the writer and emit the header once.
                dict_writer = csv.DictWriter(f_out, lineterminator='\n', fieldnames=dict_reader.fieldnames)
                dict_writer.writeheader()
            for row in dict_reader:
                if row['Campaign'] in campaign_names:
                    dict_writer.writerow(row)
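For reference, the same approach can be wrapped into a self-contained function. This is a sketch under the question's stated assumption that every input file is non-empty and shares the same header; merge_filtered and its parameter names are hypothetical, with column and allowed standing in for 'Campaign' and campaign_names.

```python
import csv


def merge_filtered(files, outfile, column, allowed):
    """Copy rows whose `column` value is in `allowed` from all files into outfile.

    The header is taken from the first input file and written exactly once.
    Returns the number of data rows written.
    """
    written = 0
    writer = None
    # The output file is opened a single time, outside the loop over inputs.
    with open(outfile, 'w', newline='') as f_out:
        for name in files:
            with open(name, newline='') as f_in:
                reader = csv.DictReader(f_in)
                if writer is None:
                    writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames)
                    writer.writeheader()
                for row in reader:
                    if row[column] in allowed:
                        writer.writerow(row)
                        written += 1
    return written
```

Passing allowed as a set keeps the membership test fast when the list of accepted values grows.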
