Writing and appending multiple csv data into new csv using python

I have a directory that contains multiple csv files. Currently I am able to read all the files sequentially using a for loop and display their contents.
I need to write the contents of all the csv files sequentially into a new csv file, but I am missing something: my new csv has no data in it.
This is what I am doing:
import os
import csv
path = r'C:\Users\hu170f\Documents\WORK\MAAP_FILE_DB_REQ\\'
fileNames = os.listdir(path)
for f in fileNames:
    file = open(path+f)
    csvreader = csv.reader(file)
    rows = []
    for row in csvreader:
        rows.append(row)
    for i in rows:
        print(i)
        #OFile = open('C:\Users\hu170f\Documents\WORK\MAAP_FILE_DB_REQ\ALL_DATA.csv','w')
        writer = csv.writer(open('C:\Users\hu170f\Documents\WORK\MAAP_FILE_DB_REQ\ALL_DATA.csv', 'wb'))
        #for row in csvreader:
        # row1 = csvreader.next()
        writer.writerow(i)

You are reopening the output file for every row you try to write, and each time it is overwritten.
Using the 'w' argument for the open function will overwrite existing files.
The argument you need to use when you want to append to files (or create new files if they do not exist) is 'a'.
See Python File Open for more information about Python file modes.
import os
import csv

path = r'C:\Users\hu170f\Documents\WORK\MAAP_FILE_DB_REQ'
output_path = os.path.join(path, 'ALL_DATA.csv')
fileNames = os.listdir(path)

# Open the output once, in append mode, before looping over the inputs.
with open(output_path, 'a', newline='') as output:
    writer = csv.writer(output)
    for f in fileNames:
        if f == 'ALL_DATA.csv':
            continue  # do not read the output file back into itself
        with open(os.path.join(path, f), 'r', newline='') as file:
            csvreader = csv.reader(file)
            for row in csvreader:
                print(row)
                writer.writerow(row)
If the csv files have the same columns and formats you could also simply copy the first file and append the others, excluding their headers.
import os
import shutil
path = r'C:\Users\hu170f\Documents\WORK\MAAP_FILE_DB_REQ'
fileNames = os.listdir(path)
output = r'C:\Users\hu170f\Documents\WORK\MAAP_FILE_DB_REQ\ALL_DATA.csv'
# Copy the first file:
shutil.copyfile(os.path.join(path,fileNames[0]), output)
# Append the remaining file contents, excluding each first line
with open(output, 'a') as out:
    for file in fileNames[1:]:
        with open(os.path.join(path, file), 'r') as in_:
            out.write(''.join(in_.readlines()[1:]))

Related

Match a set of data in a csv file with multiple csv files

I would like to take a set of data (a single csv file) and search for matching data in multiple csv files.
The algorithm is as follows:
Take the data from a single csv file.
Add the path for the multiple csv files.
Open each file and read every row.
If the data from the single csv matches, print out the entire row.
from csv import reader
import pandas as pd
import glob
import os

# Get the name from csv
a_list = ["Name"]
a = pd.read_csv(r"C:\Users\Master.csv", usecols=a_list)
a.reset_index(inplace=True)

# Path for multiple csv
path = r"C:\Users\DeliveryReport"
data_file = glob.glob(os.path.join(path, "part*.csv"))

# Get the entire row if the first column is the same as a name from the csv
for filename in data_file:
    with open(filename, 'r', encoding="utf8") as read_obj:
        print(filename)
        csv_reader = reader(read_obj)
        header = next(csv_reader)
        if header is not None:
            for row in csv_reader:
                for i in range(len(a)):
                    if row[0] == a.iloc[i, 1]:
                        print(row)
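The four steps listed in the question could also be sketched with the standard csv module alone. This is a minimal sketch, not a tested replacement for the code above: the function name `find_matching_rows` and its parameters are hypothetical, and it assumes every file has one header row and that the name to match sits in the first column.

```python
import csv


def find_matching_rows(names, csv_paths, encoding="utf8"):
    """Return (path, row) pairs whose first column is one of `names`."""
    wanted = set(names)  # set membership avoids rescanning the names per row
    matches = []
    for path in csv_paths:
        with open(path, "r", encoding=encoding, newline="") as handle:
            reader = csv.reader(handle)
            next(reader, None)  # skip the header row (assumed to exist)
            for row in reader:
                if row and row[0] in wanted:
                    matches.append((path, row))
    return matches
```

The names could come from the "Name" column of the master file and the paths from glob.glob, as in the question. Using a set replaces the inner loop over every name for every row.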

How to add data to existing rows of a CSV file? [duplicate]

I already have an existing CSV file that I am accessing, and I want to append the data to the first row, but my code writes the data at the end of the file.
What I am getting:
But I want the data to append like this:
Code I have done so far:
import csv
with open('explanation.csv' , 'a', newline="") as file:
    myFile = csv.writer(file)
    myFile.writerow(["1"])

What you're actually wanting to do is replace data in an existing CSV file with new values, however in order to update a CSV file you must rewrite the whole thing.
One way to do that is by reading the whole thing into memory, updating the data, and then use it to overwrite the existing file. Alternatively you could process the file a row-at-a-time and store the results in a temporary file, then replace the original with the temporary file when finished updating them all.
The code to do the latter is shown below:
import csv
import os
from pathlib import Path
from tempfile import NamedTemporaryFile

filepath = Path('explanation.csv') # CSV file to update.

with open(filepath, 'r', newline='') as csv_file, \
     NamedTemporaryFile('w', newline='', dir=filepath.parent, delete=False) as tmp_file:
    reader = csv.reader(csv_file)
    writer = csv.writer(tmp_file)
    # Replace value in the first column of the first 5 rows.
    for data_value in range(1, 6):
        row = next(reader)
        row[0] = data_value
        writer.writerow(row)
    writer.writerows(reader) # Copy remaining rows of original file.

# Replace original file with updated version.
os.replace(tmp_file.name, filepath)
print('CSV file updated')
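The first approach mentioned in this answer (read everything into memory, update it, then overwrite the file) might look roughly like the sketch below. The function name `replace_first_column` and its parameters are hypothetical, and the sketch assumes the file is small enough to fit in memory.

```python
import csv


def replace_first_column(csv_path, new_values):
    """Overwrite column 0 of the first len(new_values) rows of csv_path."""
    # Read every row into memory first.
    with open(csv_path, "r", newline="") as csv_file:
        rows = list(csv.reader(csv_file))

    # Update the rows in memory; zip stops at the shorter sequence.
    for row, value in zip(rows, new_values):
        if row:
            row[0] = value
        else:
            row.append(value)  # an empty row just receives the value

    # Rewrite the whole file with the updated rows.
    with open(csv_path, "w", newline="") as csv_file:
        csv.writer(csv_file).writerows(rows)
```

Calling `replace_first_column('explanation.csv', range(1, 6))` would mirror the temporary-file version. The difference is that a crash during the final write can leave a truncated file, which the temporary-file approach avoids.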

You could read in the entire file, append your rows in memory, and then write the entire file:
import csv

def append(fname, data):
    with open(fname, newline='') as f:
        reader = csv.reader(f)
        data = list(reader) + list(data)
    with open(fname, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerows(data)

Combine two rows into one in a csv file with Python

I am trying to combine multiple rows of a csv file into one. I could easily do it in Excel, but I want to do this for hundreds of files, so I need to do it in code. I have tried to store the rows in arrays, but it doesn't seem to work. I am using Python to do it.
So let's say I have a csv file:
1,2,3
4,5,6
7,8,9
All I want to do is to have a csv file as this;
1,2,3,4,5,6,7,8,9
The code I have tried is this;
fin = open("C:\\1.csv", 'r+')
fout = open("C:\\2.csv",'w')
for line in fin.xreadlines():
    new = line.replace(',', ' ', 1)
    fout.write (new)
fin.close()
fout.close()
Could you please help?

You should be using the csv module for this as splitting CSV manually on commas is very error-prone (single columns can contain strings with commas, but you would incorrectly end up splitting this into multiple columns). The CSV module uses lists of values to represent single rows.
import csv

def return_contents(file_name):
    with open(file_name) as infile:
        reader = csv.reader(infile)
        return list(reader)

data1 = return_contents('csv1.csv')
data2 = return_contents('csv2.csv')
print(data1)
print(data2)
combined = []
for row in data1:
    combined.extend(row)
for row in data2:
    combined.extend(row)
with open('csv_out.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(combined)
That code gives you the basis of the approach but it would be ugly to extend this for hundreds of files. Instead, you probably want os.listdir to pull all the files in a single directory, one by one, and add them to your output. This is the reason that I packed the reading code into the return_contents function; we can repeat the same process millions of times on different files with only one set of code to do the actual reading. Something like this:
import csv
import os

def return_contents(file_name):
    with open(file_name) as infile:
        reader = csv.reader(infile)
        return list(reader)

all_files = os.listdir('my_csvs')
combined_output = []
for file in all_files:
    data = return_contents('my_csvs/{}'.format(file))
    for row in data:
        combined_output.extend(row)
with open('csv_out.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(combined_output)

If you are dealing specifically with the csv file format, I recommend using the csv package for the file operations. If you also use the with...as statement, you don't need to worry about closing the file. You just need to define PATH, and the program will iterate over all .csv files in it.
Here is what you can do:
import csv
import os

PATH = "your folder path"

def order_list():
    data_list = []
    for filename in os.listdir(PATH):
        if filename.endswith(".csv"):
            # open the file found by listdir, not a fixed name
            with open(os.path.join(PATH, filename), newline='') as csvfile:
                # QUOTE_NONNUMERIC converts every unquoted field to float
                read_csv = csv.reader(csvfile, delimiter=',', quoting=csv.QUOTE_NONNUMERIC)
                for row in read_csv:
                    data_list.extend(row)
    print(data_list)

if __name__ == '__main__':
    order_list()

Store your data in a pandas DataFrame:
import pandas as pd
df = pd.read_csv('file.csv')
Store the modified DataFrame in a new one:
df_2 = df.groupby('Column_Name').agg(lambda x: ' '.join(x)).reset_index() # use the name of your column
Write the new DataFrame to a csv file:
df_2.to_csv("file_modified.csv")

You could do it also like this:
fIn = open("test.csv", "r")
fOut = open("output.csv", "w")
fOut.write(",".join([line for line in fIn]).replace("\n",""))
fIn.close()
fOut.close()
If you now want to run it on multiple files, you can run it as a script with arguments:
import sys
fIn = open(sys.argv[1], "r")
fOut = open(sys.argv[2], "w")
fOut.write(",".join([line for line in fIn]).replace("\n",""))
fIn.close()
fOut.close()
Assuming you are on a Linux system and the script is called csvOnliner.py, you could call it with:
for i in *.csv; do python csvOnliner.py "$i" "changed_$i"; done
On Windows you could do it like this:
FOR %i IN (*.csv) DO csvOnliner.py %i changed_%i

concatenate all files into a directory with python [duplicate]

I have 200 separate csv files named from SH (1) to SH (200). I want to merge them into a single csv file. How can I do it?

As ghostdog74 said, but this time with headers:
with open("out.csv", "ab") as fout:
    # first file:
    with open("sh1.csv", "rb") as f:
        fout.writelines(f)
    # now the rest:
    for num in range(2, 201):
        with open("sh"+str(num)+".csv", "rb") as f:
            next(f) # skip the header, portably
            fout.writelines(f)

Why can't you just sed 1d sh*.csv > merged.csv?
Sometimes you don't even have to use python!

Use accepted StackOverflow answer to create a list of csv files that you want to append and then run this code:
import pandas as pd
combined_csv = pd.concat( [ pd.read_csv(f) for f in filenames ] )
And if you want to export it to a single csv file, use this:
combined_csv.to_csv( "combined_csv.csv", index=False )

fout=open("out.csv","a")
for num in range(1,201):
    for line in open("sh"+str(num)+".csv"):
        fout.write(line)
fout.close()

I'm just going to throw another code example into the basket:
from glob import glob
with open('singleDataFile.csv', 'a') as singleFile:
    for csvFile in glob('*.csv'):
        for line in open(csvFile, 'r'):
            singleFile.write(line)

It depends what you mean by "merging" -- do they have the same columns? Do they have headers? For example, if they all have the same columns, and no headers, simple concatenation is sufficient (open the destination file for writing, loop over the sources opening each for reading, use shutil.copyfileobj from the open-for-reading source into the open-for-writing destination, close the source, keep looping -- use the with statement to do the closing on your behalf). If they have the same columns, but also headers, you'll need a readline on each source file except the first, after you open it for reading before you copy it into the destination, to skip the headers line.
If the CSV files don't all have the same columns then you need to define in what sense you're "merging" them (like a SQL JOIN? or "horizontally" if they all have the same number of lines? etc, etc) -- it's hard for us to guess what you mean in that case.
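The same-columns case described above can be sketched with shutil.copyfileobj as follows. The function name `concatenate_csv` and the `has_header` flag are hypothetical, and the sketch assumes every source file ends with a newline, since otherwise the last row of one file would run into the first row of the next.

```python
import shutil


def concatenate_csv(destination, sources, has_header=True):
    """Concatenate `sources` into `destination`, keeping one header line."""
    with open(destination, "w", newline="") as out:
        for index, source in enumerate(sources):
            with open(source, "r", newline="") as src:
                if has_header and index > 0:
                    src.readline()  # skip the repeated header line
                # copy the rest of the source in chunks
                shutil.copyfileobj(src, out)
```

With `has_header=False` this reduces to plain concatenation. The destination should not be one of the sources.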

Quite easy to combine all files in a directory and merge them
import glob
import csv

# Open result file
with open('output.txt', 'w', newline='') as fout:
    wout = csv.writer(fout, delimiter=',')
    interesting_files = glob.glob("*.csv")
    h = True
    for filename in interesting_files:
        print('Processing', filename)
        # Open and process file
        with open(filename, 'r', newline='') as fin:
            if h:
                h = False
            else:
                next(fin)  # skip header
            for line in csv.reader(fin, delimiter=','):
                wout.writerow(line)

A slight change to the code above as it does not actually work correctly.
It should be as follows...
from glob import glob
with open('main.csv', 'a') as singleFile:
    for csv in glob('*.csv'):
        if csv == 'main.csv':
            pass
        else:
            for line in open(csv, 'r'):
                singleFile.write(line)

If you are working on linux/mac you can do this.
from subprocess import call
script="cat *.csv>merge.csv"
call(script,shell=True)

If the merged CSV is going to be used in Python then just use glob to get a list of the files to pass to fileinput.input() via the files argument, then use the csv module to read it all in one go.
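A minimal sketch of that suggestion follows. The function name `read_all_rows` is hypothetical. Note that the header row of every file shows up as an ordinary row in the result, so the caller has to filter those out if the files have headers.

```python
import csv
import fileinput
import glob


def read_all_rows(pattern):
    """Read every row of every file matching `pattern` as one CSV stream."""
    files = sorted(glob.glob(pattern))
    if not files:
        return []  # with an empty list, fileinput would read stdin instead
    with fileinput.input(files=files) as stream:
        # csv.reader accepts any iterable of lines, including fileinput
        return list(csv.reader(stream))
```

For example, `read_all_rows("sh*.csv")` would return the rows of all matching files in sorted file-name order.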

OR, you could just do
cat sh*.csv > merged.csv

You can simply use the in-built csv library. This solution will work even if some of your CSV files have slightly different column names or headers, unlike the other top-voted answers.
import csv
import glob

filenames = [i for i in glob.glob("SH*.csv")]
header_keys = []
merged_rows = []
for filename in filenames:
    with open(filename, newline="") as f:
        reader = csv.DictReader(f)
        merged_rows.extend(list(reader))
        header_keys.extend([key for key in reader.fieldnames if key not in header_keys])
with open("combined.csv", "w", newline="") as f:
    w = csv.DictWriter(f, fieldnames=header_keys)
    w.writeheader()
    w.writerows(merged_rows)
The merged file will contain all possible columns (header_keys) that can be found in the files. Any absent columns in a file would be rendered as blank / empty (but preserving rest of the file's data).
Note:
This won't work if your CSV files have no headers. In that case you can still use the csv library, but instead of using DictReader & DictWriter, you'll have to work with the basic reader & writer.
This may run into issues when you are dealing with massive data, since the entire content is stored in memory (the merged_rows list).
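One way to ease the memory concern is to read the files twice: once to collect the headers, and once to stream the rows to the output. The sketch below is one possible variation, not part of the original answer. The function name `merge_csv_streaming` is hypothetical, and it assumes every file has a header row and that no row has more fields than its header.

```python
import csv


def merge_csv_streaming(filenames, output_path):
    """Merge CSVs with differing headers without holding all rows in memory."""
    # Pass 1: read only the header of each file to build the column list.
    header_keys = []
    for filename in filenames:
        with open(filename, newline="") as f:
            for key in csv.DictReader(f).fieldnames or []:
                if key not in header_keys:
                    header_keys.append(key)

    # Pass 2: stream rows from each file straight into the output.
    with open(output_path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=header_keys, restval="")
        writer.writeheader()
        for filename in filenames:
            with open(filename, newline="") as f:
                writer.writerows(csv.DictReader(f))
```

The output path should not match any of the input names, otherwise the second pass would read the file it is writing.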

Building on the solution by @Adders, later improved by @varun, I made a small improvement too, so that the merged CSV keeps only the main header:
from glob import glob

filename = 'main.csv'
with open(filename, 'a') as singleFile:
    first_csv = True
    for csv in glob('*.csv'):
        if csv == filename:
            pass
        else:
            header = True
            for line in open(csv, 'r'):
                if first_csv and header:
                    singleFile.write(line)
                    first_csv = False
                    header = False
                elif header:
                    header = False
                else:
                    singleFile.write(line)
Best regards!!!

You could import csv then loop through all the CSV files reading them into a list. Then write the list back out to disk.
import csv
rows = []
for f in (file1, file2, ...):
    reader = csv.reader(open(f, newline=""))
    for row in reader:
        rows.append(row)
writer = csv.writer(open("some.csv", "w", newline=""))
writer.writerows(rows)
The above is not very robust, as it has no error handling, nor does it close any open files.
This should work whether the individual files have one or more rows of CSV data in them. Also, I did not run this code, but it should give you an idea of what to do.

I modified what @wisty said so that it works with Python 3.x, for those of you who have encoding problems. I also use the os module to avoid hard-coding.
import os

def merge_all():
    os.chdir(r'C:\python\data')
    fout = open("merged_files.csv", "ab")
    # first file:
    for line in open("file_1.csv", 'rb'):
        fout.write(line)
    # now the rest:
    # listdir() also counts merged_files.csv, so this assumes the folder
    # holds only file_1.csv .. file_N.csv plus the merged file
    number_files = len(os.listdir())
    for num in range(2, number_files):
        f = open("file_" + str(num) + ".csv", 'rb')
        next(f)  # skip the header
        for line in f:
            fout.write(line)
        f.close()
    fout.close()

Here is a script:
Concatenating csv files named SH1.csv to SH200.csv
Keeping the headers
import glob
import re

# Looking for filenames like 'SH1.csv' ... 'SH200.csv'
pattern = re.compile(r"^SH([1-9]|[1-9][0-9]|1[0-9][0-9]|200)\.csv$")
file_parts = [name for name in glob.glob('*.csv') if pattern.match(name)]
with open("file_merged.csv","wb") as file_merged:
    for (i, name) in enumerate(file_parts):
        with open(name, "rb") as file_part:
            if i != 0:
                next(file_part) # skip headers if not first file
            file_merged.write(file_part.read())

Updating wisty's answer for python3
fout=open("out.csv","a")
# first file:
for line in open("sh1.csv"):
    fout.write(line)
# now the rest:
for num in range(2,201):
    f = open("sh"+str(num)+".csv")
    next(f) # skip the header
    for line in f:
        fout.write(line)
    f.close() # not really needed
fout.close()

Let's say you have 2 csv files like these:
csv1.csv:
id,name
1,Armin
2,Sven
csv2.csv:
id,place,year
1,Reykjavik,2017
2,Amsterdam,2018
3,Berlin,2019
and you want the result to be like this csv3.csv:
id,name,place,year
1,Armin,Reykjavik,2017
2,Sven,Amsterdam,2018
3,,Berlin,2019
Then you can use the following snippet to do that:
import pandas as pd

# the file names
f1 = "csv1.csv"
f2 = "csv2.csv"
out_f = "csv3.csv"

# read the files
df1 = pd.read_csv(f1)
df2 = pd.read_csv(f2)

# get the keys
keys1 = list(df1)
keys2 = list(df2)

# merge both files
for idx, row in df2.iterrows():
    data = df1[df1['id'] == row['id']]
    # if row with such id does not exist, add the whole row
    if data.empty:
        next_idx = len(df1)
        for key in keys2:
            df1.at[next_idx, key] = df2.at[idx, key]
    # if row with such id exists, add only the missing keys with their values
    else:
        i = int(data.index[0])
        for key in keys2:
            if key not in keys1:
                df1.at[i, key] = df2.at[idx, key]

# save the merged files
df1.to_csv(out_f, index=False, encoding='utf-8')
With the help of a loop you can achieve the same result for multiple files as it is in your case (200 csv files).
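For many files, the same id-based merge could be sketched with the standard csv module instead of pandas. This is only an illustration under stated assumptions: the function name `merge_on_key` is hypothetical, every file is assumed to have a header that contains the key column, and when two files supply the same column for the same id the first value seen wins.

```python
import csv


def merge_on_key(paths, out_path, key="id"):
    """Outer-join several CSV files on `key`, keeping first-seen row order."""
    columns = [key]
    merged = {}  # key value -> combined row dict (dicts keep insertion order)
    for path in paths:
        with open(path, newline="") as f:
            reader = csv.DictReader(f)
            for name in reader.fieldnames or []:
                if name not in columns:
                    columns.append(name)
            for row in reader:
                target = merged.setdefault(row[key], {})
                for name, value in row.items():
                    # keep the first value seen for a column
                    target.setdefault(name, value)

    with open(out_path, "w", newline="") as f:
        # restval fills columns that a given id never received
        writer = csv.DictWriter(f, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(merged.values())
```

Called as `merge_on_key(["csv1.csv", "csv2.csv"], "csv3.csv")` on the two example files, this writes the csv3.csv content shown earlier. All values stay strings, so ids are not converted to numbers.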

If the files aren't numbered in order, take the hassle-free approach below:
Python 3.6 on windows machine:
import pandas as pd
from glob import glob

interesting_files = glob("C:/temp/*.csv") # it grabs all the csv files from the directory you mention here
df_list = []
for filename in sorted(interesting_files):
    df_list.append(pd.read_csv(filename))
full_df = pd.concat(df_list)
# save the final file in same/different directory:
full_df.to_csv("C:/temp/merged_pandas.csv", index=False)

An easy-to-use function:
def csv_merge(destination_path, *source_paths):
    '''
    Merges all csv files on source_paths to destination_path.
    :param destination_path: Path of a single csv file, doesn't need to exist
    :param source_paths: Paths of csv files to be merged into, needs to exist
    :return: None
    '''
    with open(destination_path, "a") as dest_file:
        # Copy the first file completely, header included.
        with open(source_paths[0]) as src_file:
            for src_line in src_file:
                dest_file.write(src_line)
        # Copy the remaining files without their header line.
        for source_path in source_paths[1:]:
            with open(source_path) as src_file:
                next(src_file, None)  # skip the header
                for src_line in src_file:
                    dest_file.write(src_line)

import pandas as pd
import os

folder = "e:\\data science\\kaggle assign\\monthly sales\\Pandas-Data-Science-Tasks-master\\SalesAnalysis\\Sales_Data"
df = pd.read_csv(os.path.join(folder, "Sales_April_2019.csv"))
files = [file for file in os.listdir(folder)]
for file in files:
    print(file)
all_data = pd.DataFrame()
for file in files:
    df = pd.read_csv(os.path.join(folder, file))
    all_data = pd.concat([all_data, df])
all_data.head()

I did it by implementing a function that expects an output file and the paths of the input files.
The function copies the content of the first file into the output file and then does the same for the rest of the input files, but without the header line.
def concat_files_with_header(output_file, *paths):
    for i, path in enumerate(paths):
        with open(path) as input_file:
            if i > 0:
                next(input_file) # Skip header
            output_file.writelines(input_file)
Usage example of the function:
if __name__ == "__main__":
    paths = [f"sh{i}.csv" for i in range(1, 201)]
    with open("output.csv", "w") as output_file:
        concat_files_with_header(output_file, *paths)

Showing duplicates in columns of csv files

I am trying to read a particular column ("Labels") of any .csv file in a path. Then I want to print each duplicate and the number of times that duplicate appeared.
import os
import csv
from collections import Counter
items = []
directory = os.path.join("c:\\","Users\Bob\Desktop\CSVs")
for root,dirs,files in os.walk(directory):
    for file in files:
        if file.endswith(".csv"):
            with open(file) as csvFile:
                reader = csv.DictReader(file)
                for row in reader:
                    items.append(row["Labels"])
                    print(row)
counted = dict(Counter(items))
print(counted)
I get the following error
File "C:/Users/Bob/Desktop/CSVs/Dupe Check.py", line 14, in <module>
items.append(row["Labels"])
KeyError: 'Labels'
The labels column is always the second column of the csv files.

The problem is that you're passing in the file name and not the file object; therefore, it couldn't find the key "Labels".
with open(file) as csvFile:
    reader = csv.DictReader(file)
Try replacing file with csvFile instead.
with open(file) as csvFile:
    reader = csv.DictReader(csvFile)
If you printed out the variable reader, you'll have a better understanding.
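Putting the fix together with the original goal (print each duplicate and how often it appears), a sketch could look like the one below. The function name `count_duplicate_labels` is hypothetical. It also joins each file name with the folder reported by os.walk, which the question's code did not do and which matters when the script runs from a different working directory.

```python
import csv
import os
from collections import Counter


def count_duplicate_labels(directory, column="Labels"):
    """Count values of `column` across all .csv files; keep those seen more than once."""
    counts = Counter()
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if not name.endswith(".csv"):
                continue
            # os.walk yields bare file names, so join them with their folder.
            with open(os.path.join(root, name), newline="") as csv_file:
                for row in csv.DictReader(csv_file):
                    if row.get(column) is not None:
                        counts[row[column]] += 1
    return {label: n for label, n in counts.items() if n > 1}
```

Printing the returned dictionary shows each repeated label with its count. Files without the column are skipped silently rather than raising KeyError.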
