Using Python to combine csv files with headers on different rows - python

I'm trying to combine a bunch of csv files. Each csv file has a different number of columns. This is not a problem, I can easily loop through the files and pull in all the column headers, pasting them into an empty file to use as a base.
The problem I'm having is that the column headers are on different rows in each file.
For example:
Table1
Random Text
!,Header1,Header2,Header3
*,123,124,5235
*,124,15,23624
*,135,677,234
Table2
Random Text
Random Text
!,Header1,Header2,Header4
*,124,2156,7478
*,126,12357,547
*,237,12,267
Output:
Table,Header1,Header2,Header3,Header4
Table1,123,124,5235
Table1,124,15,23624
Table1,135,677,234
Table2,124,2156,7478
Table2,126,12357,547
Table2,237,12,267
My existing code looks something like this:
import csv
import glob
files = glob.glob(r'//Directory/*.csv')
#This block goes through each file and works out which variables exist
variablelist=[]
for f in files:
    with open(f,'r') as csvfile:
        read_rows = csv.reader(csvfile)
        for row in read_rows:
            if row[0]!="*": #The last row with no * in column 1 is the header row
                rowlist = row
        variablelist.extend(x for x in rowlist if x not in variablelist)
variablelist.sort()
I use the fact that the header row is the last row without a * in the first column. I work out which row the headers are on and then store the header names in a list - combining the same list from all files.
I then try and combine the files together using this code that I found by searching this website:
with open("out.csv", "w", newline="") as f_out: # Comment 2 below
    writer = csv.DictWriter(f_out, fieldnames=variablelist)
    for f in files:
        with open(f, "r", newline="") as f_in:
            reader = csv.DictReader(f_in) # Uses the field names in this file
            for line in reader:
                # Comment 3 below
                writer.writerow(line)
The problem is, I don't know how to deal with the headers being on different lines. I tried using code to find the header row number, but I don't know how to work it into the code above. (Can DictReader skip a dynamic number of rows before finding the headers?)
with open(f,'r') as csvfile:
    read_rows = csv.reader(csvfile)
    header_row_number = 0
    for row in read_rows:
        if row[0]!="*":
            header_row_number=read_rows.line_num
Any help would be much appreciated
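One way to approach this, sketched under a few assumptions taken from the example above: the table name is the first line of each file, the header is the last row whose first cell is not *, and every later row is data. The names parse_table and combine_tables are made up for this illustration. Instead of asking DictReader to skip rows, it reads each file with csv.reader, locates the header, and builds the dictionaries by hand:

```python
import csv
import io


def parse_table(lines):
    """Split one file into (header, records).

    Assumes (from the example in the question) that the first line holds
    the table name and that the header is the last row whose first cell
    is not "*".
    """
    rows = [row for row in csv.reader(lines) if row]  # drop blank lines
    table_name = rows[0][0]
    header_index = max(i for i, row in enumerate(rows) if row[0] != "*")
    header = rows[header_index][1:]  # drop the leading "!" marker
    records = []
    for row in rows[header_index + 1:]:
        record = dict(zip(header, row[1:]))  # drop the leading "*" marker
        record["Table"] = table_name
        records.append(record)
    return header, records


def combine_tables(sources, out_file):
    """Write all records to out_file using the union of all headers."""
    parsed = [parse_table(source) for source in sources]
    fieldnames = ["Table"]
    for header, _ in parsed:
        for name in header:
            if name not in fieldnames:
                fieldnames.append(name)
    # restval fills the columns that a given file does not have
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, restval="")
    writer.writeheader()
    for _, records in parsed:
        writer.writerows(records)


# In-memory stand-ins for two of the files from the question
table1 = "Table1\nRandom Text\n!,Header1,Header2,Header3\n*,123,124,5235\n"
table2 = "Table2\nRandom Text\nRandom Text\n!,Header1,Header2,Header4\n*,124,2156,7478\n"
combined = io.StringIO()
combine_tables([io.StringIO(table1), io.StringIO(table2)], combined)
print(combined.getvalue())
```

With real files, file objects opened with newline="" would take the place of the StringIO objects. Columns that a file does not have are left empty, so a Table1 row ends with a blank Header4 cell.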

Related

Python If Statement and Lists

I'm fairly new to Python and am looking for some help. What I would like to do is read a csv file and then use a for loop with an if statement to locate all rows of that data that contain a value, and print them out with a header and some formatting using f-strings.
The issue I seem to have is that when finding the data using the if statement, I'm unsure what I can output the data to so that it can then be printed out (the search output could contain multiple rows and columns):
with open(r'data.csv', 'r') as csv_file:
    # loop through the csv file using for loop
    for row in csv_file:
        # search each row of data for the input from the user
        if panel_number in row:
            ??
Use the csv module. Then in your if statement you can append the row to a list of matches
import csv
matched_rows = []
with open(r'data.csv', 'r') as file:
    file.readline() # skip over header line -- remove this if there's no header
    csv_file = csv.reader(file)
    for row in csv_file:
        # search each row of data for the input from the user
        if row[0] == panel_number:
            matched_rows.append(row)
print(matched_rows)
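Since the question also asks about printing the matches with a header and f-string formatting, here is a minimal sketch of that step. The column names, the sample data and the helper names find_rows and format_table are invented for the example; it assumes the first line of the file is the header and the panel number sits in the first column.

```python
import csv
import io


def find_rows(lines, panel_number):
    """Return (header, matching rows); assumes the first line is the header."""
    reader = csv.reader(lines)
    header = next(reader)
    matches = [row for row in reader if row and row[0] == panel_number]
    return header, matches


def format_table(header, rows, width=12):
    """Left-align every cell in a fixed-width column using f-strings."""
    table = [header] + rows
    return "\n".join(
        "".join(f"{cell:<{width}}" for cell in row) for row in table
    )


# Made-up sample data standing in for data.csv
sample = "panel,site,status\nP1,North,ok\nP2,South,fault\nP1,East,ok\n"
header, matched_rows = find_rows(io.StringIO(sample), "P1")
print(format_table(header, matched_rows))
```

Collecting the matches in a list first keeps the search and the printing separate, so the formatting can be changed without touching the loop over the file.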

copy all except first row into new csv file python

I generate several csv files each day, and I'm trying to write a Python script that will open each csv file, one by one, rewrite the csv to a new location and fill in the mission information that is in the filename.
I've been able to cobble together separate Python scripts that can do most of it, but I'm hitting brick walls.
Each CSV has a standard filename:
"Number_Name_Location.csv"
Row 1 has a header that labels each column (Name, Location, Date, etc.), and each .csv can have any number of pre-filled rows.
What I'm trying to automate is the following steps:
Open the first .csv in the “/ToParse” folder
Fill in the remaining details for all rows, which are gathered from the filename itself.
Column A: Number
Column B: Date of Parsing
Column K: Name
Column L: Location
Write a new csv in the folder “/Parsed” w
Continue to parse the next .csv until no .csv files are left in the "/ToParse" folder
Move all original parsed files to /Parsed/Original
Below is the cobbled-together code, which is overwriting the header row.
How can I adjust the code to ignore the first row in the csv it's opening and just copy from row 1 onward?
import glob
import csv
from datetime import date
with open('Parsed/output.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    for filename in glob.glob('*.csv'):
        with open(filename, newline='') as f_input:
            csv_input = csv.reader(f_input)
            filename_parts = filename.split("_")
            #print(filename_parts[0])
            for row in csv_input:
                row[0] = filename_parts[0]
                row[1] = date.today()
                row[10] = filename_parts[1]
                row[11] = filename_parts[2]
                csv_output.writerow(row)
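The open(filename, newline='') is fine: the csv module documentation recommends opening files with newline='' before passing them to csv.reader or csv.writer.
You can skip the first line by manually calling f_input.readline() before the for loop. This will read (and so skip) the first line, and the for loop will start on the next line. Calling next(csv_input) before the loop does the same thing at the reader level.
Note that for row in csv_input[1:] does not work: a csv.reader is an iterator, not a list, so it cannot be sliced. ([1:] means skip the first item of a sequence such as a list.) To skip items of an iterator, use itertools.islice(csv_input, 1, None) instead.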
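A minimal sketch of the header-skipping step, assuming each file has exactly one header row and a filename of the form Number_Name_Location.csv with no extra underscores. The helper name fill_rows and the sample data are hypothetical. It calls next() on the reader to consume the header, and strips the .csv extension before splitting so that the location does not come out as "Location.csv":

```python
import csv
import io
import os
from datetime import date


def fill_rows(filename, lines, today):
    """Return (header, rows) with the filename details filled into each row."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    number, name, location = stem.split("_")
    reader = csv.reader(lines)
    header = next(reader, None)  # consume the header so it is not overwritten
    rows = []
    for row in reader:
        row[0] = number              # column A
        row[1] = today.isoformat()   # column B
        row[10] = name               # column K
        row[11] = location           # column L
        rows.append(row)
    return header, rows


# Made-up 12-column file: one header row and one data row
sample = ",".join(f"col{i}" for i in range(12)) + "\n" + ",".join(["x"] * 12) + "\n"
header, rows = fill_rows("42_Alice_Paris.csv", io.StringIO(sample), date(2024, 1, 2))
print(header)
print(rows)
```

When combining several files into one output, the returned header would be written once (for the first file only) and the rows for every file.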

Joining the columns of one CSV to another CSV

So I'm trying to combine column values from one csv to another while saving it into a final csv file. But I want to iterate through all the rows adding the column values of each row to each row of the original csv.
In other words say csv1 has 3 rows.
Row 1: Frog,Rat,Duck
Row 2: Cat,Dog,Cow
Row 3: Moose,Fox,Zebra
And I want to combine 2 more column values from csv2 to each of those rows.
Row 1: Chicken,Pig
Row 2:
Row 3: Bear,Boar
So csv3 would end up looking like.
Row 1: Frog,Rat,Duck,Chicken,Pig
Row 2: Moose,Fox,Zebra,Bear,Boar
But at the same time if there's a row in csv2 that has no values at all I don't want it to copy the row from csv1. In other words that row will not exist at all in the final csv file. I prefer not to use pandas as I have just been using the csv module thus far throughout my code but any method is appreciated.
So far I have come across this method which works if there's only one single row. But when there's more than that it just adds random lines and appends the values all over the place. And it combines both of the columns into one string while adding an extra blank line at the end of the csv for some odd reason.
import csv
f1 = open ("2.csv","r", encoding='utf-8')
with open("3.csv","w", encoding='utf-8', newline='') as f:
    writer = csv.writer(f)
    with open("1.csv","r", encoding='utf-8') as csvfile:
        reader = csv.reader(csvfile, delimiter=",")
        for row in reader:
            row[6] = f1.readline()
            writer.writerow(row)
f1.close()
Using the same example csvs above the results given are.
Frog,Rat,Duck,Chicken,Pig
Cat,Dog,Cow
Moose,Fox,Zebra,Bear,Boar
You can zip together the two files and then iterate through each row. Then you can concatenate the two lists and write the result to a file.
To check if there is an empty row we can compare the set of the row to the set of an empty string.
import csv
EMPTY_ROW = {""}
with open("1.csv", "r", newline="") as first_file, open("2.csv", "r", newline="") as second_file, open("3.csv", "w", newline="") as out_file:
    first_file_reader = csv.reader(first_file)
    second_file_reader = csv.reader(second_file)
    out_file_writer = csv.writer(out_file)
    # The iterator will stop when the shortest file is finished
    for row_1, row_2 in zip(first_file_reader, second_file_reader):
        # Check if the second row is empty, skipping if it is
        if not row_2 or set(row_2) == EMPTY_ROW:
            continue
        out_file_writer.writerow(row_1 + row_2)

Python - Printing individual cells from an Excel spreadsheet in CSV format

I have an excel spreadsheet saved as a CSV file, but cannot find a way to call individual values from cells into Python using the CSV module. Any help would be greatly appreciated
There is also a Python library capable of reading xls data. Have a look at python-xlrd.
For writing xls data, you can use python-xlwt.
The csv module provides readers that iterate over the rows of a csv file; the rows are lists of strings. One way to get access to individual cells would be to read the entire file in as a list of lists:
import csv
with open('test.csv', 'r') as f:
    reader = csv.reader(f)
    the_whole_file = list(reader)
Then access the individual cells by indexing into the_whole_file. The first index is the row and the second index is the column - both are zero based. To access the cell at the second row, fourth column:
row = 1
column = 3
cell_R1_C3 = the_whole_file[row][column]
print(cell_R1_C3)
If you have the Excel file as a CSV, you can use csv.reader:
import csv
myFilePath = "/Path/To/Your/File"
with open(myFilePath, newline='') as csvfile:
    reader = csv.reader( csvfile, delimiter=',' )
    for row in reader:
        # 'row' has all the cells (thanks to wwii for the fix!). Get the first 4 columns
        a, b, c, d = row[:4]

Python- Import Multiple Files to a single .csv file

I have 125 data files containing two columns and 21 rows of data and I'd like to import them into a single .csv file (as 125 pairs of columns and only 21 rows).
This is what my data files look like:
I am fairly new to python but I have come up with the following code:
import glob
Results = glob.glob('./*.data')
fout='c:/Results/res.csv'
fout=open ("res.csv", 'w')
for file in Results:
    g = open( file, "r" )
    fout.write(g.read())
    g.close()
fout.close()
The problem with the above code is that all the data are copied into only two columns with 125*21 rows.
Any help is very much appreciated!
This should work:
import glob
files = [open(f) for f in glob.glob('./*.data')] #Make list of open files
fout = open("res.csv", 'w')
for row in range(21):
    for f in files:
        fout.write( f.readline().strip() ) # strip removes trailing newline
        fout.write(',')
    fout.write('\n')
fout.close()
for f in files:
    f.close() # close the input files as well
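Note that this method will probably fail if you try a large number of files, because they are all open at the same time and the operating system limits how many files a process may have open; I believe a common default limit is 256.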
You may want to try the python CSV module (http://docs.python.org/library/csv.html), which provides very useful methods for reading and writing CSV files. Since you stated that you want only 21 rows with 250 columns of data, I would suggest creating 21 python lists as your rows and then appending data to each row as you loop through your files.
Something like:
import csv
rows = []
for i in range(0,21):
    row = []
    rows.append(row)
# Not sure of the structure of your input files or how they are delimited, but for each one, as you have it open and iterate through its rows, you would want to append the values in each row to the end of the corresponding list contained within the rows list.
# Then, write each row to the new csv:
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter=',')
    for row in rows:
        writer.writerow(row)
(Sorry, I cannot add comments, yet.)
[Edited later, the following statement is wrong!!!] "davesnitty's loop that generates the rows can be replaced by rows = [[]] * 21." It is wrong because this would create a list of 21 references to one and the same empty list, so appending to any row would change all of them.
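A small illustration of that pitfall, using 3 rows instead of 21 to keep the output short (the variable names are only for this example):

```python
# Multiplying a list copies references, not the inner list itself
shared = [[]] * 3
shared[0].append("x")

# A list comprehension evaluates [] once per element
independent = [[] for _ in range(3)]
independent[0].append("x")

print(shared)       # [['x'], ['x'], ['x']]
print(independent)  # [['x'], [], []]
```

So rows = [[] for _ in range(21)] is the safe one-line replacement for the loop.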
My +1 for using the standard csv module. But the files should always be closed, especially when you open that many of them. Also, there is a bug. The row read from the file via the -- even though you only write the result here. The solution is actually missing. Basically, the row read from the file should be appended to the sublist related to the line number. The line number should be obtained via enumerate(reader) where reader is csv.reader(fin, ...).
[Added later] Try the following code; fix the paths for your purpose:
import csv
import glob
import os
datapath = './data'
resultpath = './result'
if not os.path.isdir(resultpath):
    os.makedirs(resultpath)
# Initialize the empty rows. It does not check how many rows are
# in the file.
rows = []
# Read data from the files to the above matrix.
for fname in glob.glob(os.path.join(datapath, '*.data')):
    with open(fname, newline='') as f:
        reader = csv.reader(f)
        for n, row in enumerate(reader):
            if len(rows) < n+1:
                rows.append([]) # add another row
            rows[n].extend(row) # append the elements from the file
# Write the data from memory to the result file.
fname = os.path.join(resultpath, 'result.csv')
with open(fname, 'w', newline='') as f:
    writer = csv.writer(f)
    for row in rows:
        writer.writerow(row)
