Copy all except the first row into a new CSV file in Python

I generate several csv files each day, and I'm trying to write a Python script that opens each csv file one by one, rewrites it to a new location, and fills in the missing information that is in the filename.
I've been able to cobble together separate python scripts that can do most of it, but I'm hitting brick walls.
Each CSV has a standard filename:
"Number_Name_Location.csv"
Row 1 is a header that labels each column (Name, Location, Date, etc.), and each .csv can have any number of pre-filled rows.
What I'm trying to automate is the following:
1. Open the first .csv in the "/ToParse" folder.
2. Fill in the remaining details for all rows, gathered from the filename itself:
   - Column A: Number
   - Column B: Date of parsing
   - Column K: Name
   - Column L: Location
3. Write a new csv in the "/Parsed" folder.
4. Continue with the next .csv until no .csv files are left in the "/ToParse" folder.
5. Move all original parsed files to /Parsed/Original.
Below is the cobbled-together code. It overwrites the header row.
How can I adjust it to ignore the first (header) row of the csv it's opening and copy only the rows after it?
import glob
import csv
from datetime import date

with open('Parsed/output.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    for filename in glob.glob('*.csv'):
        with open(filename, newline='') as f_input:
            csv_input = csv.reader(f_input)
            filename_parts = filename.split("_")
            #print(filename_parts[0])
            for row in csv_input:
                row[0] = filename_parts[0]
                row[1] = date.today()
                row[10] = filename_parts[1]
                row[11] = filename_parts[2]
                csv_output.writerow(row)

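The newline='' in open(filename, newline='') is correct: the csv module documentation recommends opening CSV files with newline='' so that the reader and writer handle line endings themselves.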
You can skip the first line by calling next(csv_input) before the for loop (f_input.readline() also works in simple cases). This reads, and so skips, the first line, and the for loop starts on the next one.
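A minimal sketch of the next() approach, using an in-memory file (io.StringIO) and a hypothetical data_rows helper in place of a real CSV on disk:

```python
import csv
import io

def data_rows(text):
    """Return every CSV row except the first (header) row."""
    reader = csv.reader(io.StringIO(text, newline=""))
    # Consume the header; the None default avoids StopIteration on an empty file.
    next(reader, None)
    return list(reader)

print(data_rows("Number,Date,Name\n1,,x\n2,,y\n"))
```

next(reader) is usually safer than f_input.readline(), because a quoted header field may span more than one physical line and only the csv reader knows where the first record really ends.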
Slicing with for row in csv_input[1:] will not work, because a csv.reader is an iterator and does not support indexing or slicing ([1:] only works on sequences such as lists). itertools.islice(csv_input, 1, None) gives the same "skip the first item" effect for any iterator.
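Putting this together for the whole workflow in the question, here is a minimal sketch. It assumes the folder names ToParse, Parsed and Parsed/Original relative to the working directory, that each output file keeps the input's name, that the header row is copied through unchanged, and that filenames always have exactly the Number_Name_Location.csv shape; parse_folder is a hypothetical helper name.

```python
import csv
import glob
import os
import shutil
from datetime import date

def parse_folder(src_dir="ToParse", out_dir="Parsed"):
    """Fill columns A, B, K and L from each filename, then move the original.

    Assumes names like Number_Name_Location.csv and a header in row 1.
    """
    original_dir = os.path.join(out_dir, "Original")
    os.makedirs(original_dir, exist_ok=True)
    today = date.today().isoformat()
    for path in sorted(glob.glob(os.path.join(src_dir, "*.csv"))):
        base = os.path.basename(path)
        # splitext drops ".csv", so the last part is "Location", not "Location.csv".
        number, name, location = os.path.splitext(base)[0].split("_", 2)
        with open(path, newline="") as f_in, \
                open(os.path.join(out_dir, base), "w", newline="") as f_out:
            reader = csv.reader(f_in)
            writer = csv.writer(f_out)
            header = next(reader, None)  # read the header row ...
            if header is not None:
                writer.writerow(header)  # ... and copy it through unchanged
            for row in reader:
                row += [""] * (12 - len(row))  # pad short rows up to column L
                row[0] = number      # Column A
                row[1] = today       # Column B: date of parsing
                row[10] = name       # Column K
                row[11] = location   # Column L
                writer.writerow(row)
        # Both files are closed here, so the original can be moved safely.
        shutil.move(path, os.path.join(original_dir, base))
```

Writing one output file per input, rather than the single Parsed/output.csv in the question's code, matches the step of writing a new csv for each file.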

Related

Python If Statement and Lists

I'm fairly new to Python and am looking for some help. What I would like to do is read a csv file, then use a for loop with an if statement to locate the rows that contain a value, and print them out with a header and some formatting using f-strings.
The issue I have is that, once the if statement finds the data, I'm unsure what to store it in so that I can print it afterwards (the search output could contain multiple rows and columns):
with open(r'data.csv', 'r') as csv_file:
    # loop through the csv file using for loop
    for row in csv_file:
        # search each row of data for the input from the user
        if panel_number in row:
            ??
Use the csv module. Then, in your if statement, you can append the row to a list of matches:
import csv

matched_rows = []
with open(r'data.csv', 'r', newline='') as file:
    file.readline()  # skip over header line -- remove this if there's no header
    csv_file = csv.reader(file)
    for row in csv_file:
        # search each row of data for the input from the user
        # (row[0] is a string, so panel_number must be a string too)
        if row[0] == panel_number:
            matched_rows.append(row)
print(matched_rows)
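To also print the matches with a header and f-string formatting, as the question asks, here is a small sketch. It reads from an in-memory string instead of data.csv, and the column names, sample rows and the print_matches helper are all made up for illustration; the 10-character column width is arbitrary.

```python
import csv
import io

def print_matches(text, panel_number):
    """Print the header, then every row whose first column equals panel_number."""
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader)
    matches = [row for row in reader if row[0] == panel_number]
    # Left-align each cell in a 10-character field using an f-string format spec.
    print(" | ".join(f"{name:<10}" for name in header))
    for row in matches:
        print(" | ".join(f"{value:<10}" for value in row))
    return matches

# Hypothetical sample data standing in for data.csv.
sample = "Panel,Voltage,Status\nP1,230,ok\nP2,229,fault\nP1,231,ok\n"
print_matches(sample, "P1")
```

For a real file, replace io.StringIO(...) with open('data.csv', newline=''). Collecting the rows in a list first also lets you report when nothing matched.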

Using Python to combine csv files with headers on different rows

I'm trying to combine a bunch of csv files. Each csv file has a different number of columns. This is not a problem: I can easily loop through the files, pull in all the column headers, and paste them into an empty file to use as a base.
The problem I'm having is that the column headers are on different rows in each file.
For example:
Table1
Random Text
!,Header1,Header2,Header3
*,123,124,5235
*,124,15,23624
*,135,677,234
Table2
Random Text
Random Text
!,Header1,Header2,Header4
*,124,2156,7478
*,126,12357,547
*,237,12,267
Output:
Table,Header1,Header2,Header3,Header4
Table1,123,124,5235,
Table1,124,15,23624,
Table1,135,677,234,
Table2,124,2156,,7478
Table2,126,12357,,547
Table2,237,12,,267
My existing code looks something like this:
import csv
import glob

files = glob.glob(r'//Directory/*.csv')

# This block goes through each file and works out which variables exist
variablelist = []
for f in files:
    with open(f, 'r') as csvfile:
        read_rows = csv.reader(csvfile)
        for row in read_rows:
            if row[0] != "*":  # The last row with no * in column 1 is the header row
                rowlist = row
        variablelist.extend(x for x in rowlist if x not in variablelist)
variablelist.sort()
I use the fact that the header row is the last row without a * in the first column. I work out which row the headers are on and then store the header names in a list, combining the lists from all files.
I then try to combine the files using this code, which I found by searching this website:
with open("out.csv", "w", newline="") as f_out:
    writer = csv.DictWriter(f_out, fieldnames=variablelist)
    for f in files:
        with open(f, "r", newline="") as f_in:
            reader = csv.DictReader(f_in)  # Uses the field names in this file
            for line in reader:
                writer.writerow(line)
The problem is that I don't know how to deal with the headers being on different lines. I tried writing code to find the header row number, but I don't know how to work it into the code above. (Can DictReader skip a dynamic number of rows before finding the headers?)
with open(f, 'r') as csvfile:
    read_rows = csv.reader(csvfile)
    header_row_number = 0
    for row in read_rows:
        if row[0] != "*":
            header_row_number = read_rows.line_num
Any help would be much appreciated
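One way around this is not to let DictReader find the header at all: read each file with a plain csv.reader, locate the header row with the same rule as in the question (the last row whose first cell is not "*"), and zip the header with each data row yourself. (csv.DictReader also accepts an explicit fieldnames= argument if you prefer to keep it.) The sketch below assumes the marker column ("!" / "*") should be dropped and that the caller supplies the table name; read_table and combine are hypothetical names, and the sample data is the question's, shortened.

```python
import csv
import io

def read_table(lines):
    """Return (header, data_rows) for one file, dropping the marker column.

    Assumes data rows start with "*" and the header is the last row whose
    first cell is not "*", as described in the question.
    """
    rows = [row for row in csv.reader(lines) if row]  # skip blank lines
    header_index = max(i for i, row in enumerate(rows) if row[0] != "*")
    header = rows[header_index][1:]
    data = [row[1:] for row in rows[header_index + 1:] if row[0] == "*"]
    return header, data

def combine(tables, out):
    """tables maps a table name to an iterable of CSV lines."""
    parsed = {name: read_table(lines) for name, lines in tables.items()}
    fields = sorted({h for header, _ in parsed.values() for h in header})
    # restval="" fills columns a table doesn't have (e.g. Header4 for Table1).
    writer = csv.DictWriter(out, fieldnames=["Table"] + fields, restval="")
    writer.writeheader()
    for name, (header, data) in parsed.items():
        for row in data:
            writer.writerow({"Table": name, **dict(zip(header, row))})

t1 = "Random Text\n!,Header1,Header2,Header3\n*,123,124,5235\n"
t2 = "Random Text\nRandom Text\n!,Header1,Header2,Header4\n*,124,2156,7478\n"
buf = io.StringIO()
combine({"Table1": io.StringIO(t1), "Table2": io.StringIO(t2)}, buf)
print(buf.getvalue())
```

With real files, you could pass open(path, newline='') objects keyed by each file's base name, and close them afterwards.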

Python: Can't overwrite the first column of a .csv file with a new timestamp

I have a .csv file (see image):
In the image there is a time column with datetime strings. I have a program that reads this column and keeps only the time, H:M:S. Besides reading the column, I am also trying to replace the time column with only the H:M:S timestamp and write the result to a new .csv, with the following code.
CODE:
import csv
import datetime as dt
import os

File = 'C:/Users/Alan Cedeno/Desktop/Test_Folder/HiSAM1_data_160215_164858.csv'
root, ext = os.path.splitext(File)
output = root + '-new.csv'
with open(File, 'r') as csvinput, open(output, 'w') as csvoutput:
    writer = csv.writer(csvoutput, lineterminator='\n')
    reader = csv.reader(csvinput)

    all = []
    row = next(reader)

    for line in reader:
        row.append(dt.datetime.strptime(line[0], '%m/%d/%Y %H:%M:%S').time())
        all.append(row)

    for row in reader:
        row.append(row[0])
        all.append(row)

    writer.writerows(all)
The program runs: it takes the datetime strings and writes the H:M:S timestamps to a new .csv file. However, instead of replacing only the time column, it changed every column, producing an output file that looks like this (see the second image):
At this point I don't really know how to make the new output file look like the file in the first image, with the H:M:S format in the first column ONLY, not scrambled like in the second image. Any suggestions?
In the additional screenshot, see column K: it should be column A of the first image, and columns B, C, D, E, F, G, I, and J should stay the same as in image 1.
Download link for the .csv file: http://www.speedyshare.com/z2jwq/HiSAM1-data-160215-164858.csv
The main problem with your code is that you keep appending the time from every line of the csv to the first row (and appending that same row to all again and again), which produces the output shown in the second image.
The idea is to handle each line separately and modify only its first element. If you want to keep the column labels, also write the first line out unchanged. With those changes, the code looks like this:
import csv
import datetime as dt
import os

File = 'C:/Users/Alan Cedeno/Desktop/Test_Folder/HiSAM1_data_160215_164858.csv'
root, ext = os.path.splitext(File)
output = root + '-new.csv'
with open(File, 'r') as csvinput, open(output, 'w') as csvoutput:
    writer = csv.writer(csvoutput, lineterminator='\n')
    reader = csv.reader(csvinput)

    rows = [next(reader)]
    for line in reader:
        line[0] = str(dt.datetime.strptime(line[0], '%m/%d/%Y %H:%M:%S').time())
        rows.append(line)

    writer.writerows(rows)
Note that the list rows holds the modified lines from csvinput.
I tested this with the first line from the question duplicated.
With some simplified data:
#!python3
import csv
import datetime as dt
import os

File = 'data.csv'
root, ext = os.path.splitext(File)
output = root + '-new.csv'

# csv module documents opening with `newline=''` mode in Python 3.
with open(File, 'r', newline='') as csvinput, open(output, 'w', newline='') as csvoutput:
    writer = csv.writer(csvoutput)
    reader = csv.reader(csvinput)

    # Copy the header
    row = next(reader)
    writer.writerow(row)

    # Edit the first column of each row.
    for row in reader:
        row[0] = dt.datetime.strptime(row[0], '%m/%d/%Y %H:%M:%S').time()
        writer.writerow(row)
Input:
Time,0.3(/L),0.5(/L)
02/15/2016 13:44:01,88452,16563
02/15/2016 13:44:02,88296,16282
Output:
Time,0.3(/L),0.5(/L)
13:44:01,88452,16563
13:44:02,88296,16282
If you are actually on Python 2, the csv module documentation says to use binary mode. Replace the with line with:
with open(File,'rb') as csvinput,open(output, 'wb') as csvoutput:
You cannot overwrite a single row in the CSV file. You'll have to write all the rows you want to a new file and then rename it back to the original file name.
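A minimal sketch of that write-then-rename approach for the timestamp question, assuming Python 3 and the same month/day/year input format as above: it writes to a temporary file in the same directory and then uses os.replace to swap it in for the original. keep_time_only is a hypothetical name.

```python
import csv
import datetime as dt
import os
import tempfile

def keep_time_only(path, fmt='%m/%d/%Y %H:%M:%S'):
    """Rewrite column 0 of `path` as HH:MM:SS, replacing the original file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(suffix='.csv', dir=directory)
    try:
        # Open the temp file first so its descriptor is closed even if opening
        # the source fails.
        with os.fdopen(fd, 'w', newline='') as dst, open(path, newline='') as src:
            reader = csv.reader(src)
            writer = csv.writer(dst)
            writer.writerow(next(reader))  # header unchanged
            for row in reader:
                row[0] = dt.datetime.strptime(row[0], fmt).strftime('%H:%M:%S')
                writer.writerow(row)
        os.replace(tmp_path, path)  # swap the finished file in for the original
    except BaseException:
        os.remove(tmp_path)
        raise
```

Keeping the temporary file in the same directory matters: os.replace can fail when source and destination are on different filesystems, and on POSIX a successful rename is atomic, so readers never see a half-written file.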
Your pattern of usage may fit a database better than a CSV file. Look into the sqlite3 module for a lightweight database.

Delete rows by date and add a file-name column for multiple CSV files

I have multiple comma-delimited csv files with recorded water pipe pressure sensor data, already sorted by date from oldest to newest. In all original files, the first column always contains dates formatted as YYYYMMDD. I have looked at similar discussion threads but couldn't find what I need. I'd like:
1. A Python script that adds a new column titled "Pipe" to every csv file in the directory, where each row of the new column contains the file name without its extension.
2. The option of specifying a cutoff date as YYYYMMDD in order to delete rows from the original input file. For example, if a file has dates 20140101 to 20140630, I would like to cut out the rows whose date is < 20140401.
3. The option to either overwrite the original files after making these modifications or save each file to a different directory, with the same file names as the originals.
Input: PipeRed.csv; Headers: Date,Pressure1,Pressure2,Temperature1,Temperature2 etc,
Output: PipeRed.csv; Headers: Pipe,Date,Pressure1,Pressure2,Temperature1, Temperature2,etc,
I have found some code and modified it a little, but it doesn't delete rows as described above, and it adds the file name column last rather than first.
import csv
import sys
import glob
import re

for filename in glob.glob(sys.argv[1]):
#def process_file(filename):
    # Read the contents of the file into a list of lines.
    f = open(filename, 'r')
    contents = f.readlines()
    f.close()

    # Use a CSV reader to parse the contents.
    reader = csv.reader(contents)

    # Open the output and create a CSV writer for it.
    f = open(filename, 'wb')
    writer = csv.writer(f)

    # Process the header.
    writer = csv.writer(f)
    writer.writerow( ('Date','Pressure1','Pressure2','Pressure3','Pressure4','Pipe') )
    header = reader.next()
    header.append(filename.replace('.csv',""))
    writer.writerow(header)

    # Process each row of the body.
    for row in reader:
        row.append(filename.replace('.csv',""))
        writer.writerow(row)

    # Close the file and we're done.
    f.close()
This function should be very close to what you want. I've tested it in both Python 2.7.9 and 3.4.2. The initial version I posted had some problems because, as I mentioned then, it was untested. I'm not sure whether you're using Python 2 or 3, but this works properly in either.
Another change from the previous version is that the optional keyword date argument has been renamed from cutoff_date to start_date to better reflect what it is. A cutoff date usually means the last date on which it is possible to do something, which is the opposite of the way you used it in your question. Also note that any date provided should be a string, i.e. start_date='20140401', not an integer.
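The reason a string works here is that fixed-width YYYYMMDD strings sort in the same order as the dates they represent, so row[0] >= start_date is a correct date filter. A quick illustration with made-up rows:

```python
# Hypothetical rows: column 0 is a YYYYMMDD date string, as in the question.
rows = [["20140331", "1.2"], ["20140401", "1.5"], ["20140630", "1.1"]]
start_date = "20140401"

# Same-length digit strings compare character by character, which matches
# chronological order for YYYYMMDD dates.
kept = [row for row in rows if row[0] >= start_date]
print(kept)

# In Python 3, comparing a str column value with an int start date fails.
try:
    too_early = rows[0][0] >= 20140401
except TypeError as exc:
    print("TypeError:", exc)
```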
One enhancement is that it will now create the output directory if one is specified but doesn't already exist.
import csv
import os
import sys

def open_csv(filename, mode='r'):
    """ Open a csv file in proper mode depending on Python version. """
    return (open(filename, mode=mode+'b') if sys.version_info[0] == 2 else
            open(filename, mode=mode, newline=''))

def process_file(filename, start_date=None, new_dir=None):
    # Read the entire contents of the file into memory skipping rows before
    # any start_date given (assuming row[0] is a date column).
    with open_csv(filename, 'r') as f:
        reader = csv.reader(f)
        header = next(reader)  # Save first row.
        contents = [row for row in reader if not start_date or row[0] >= start_date]

    # Create different output file path if new_dir was specified.
    basename = os.path.basename(filename)  # Remove dir name from filename.
    output_filename = os.path.join(new_dir, basename) if new_dir else filename
    if new_dir and not os.path.isdir(new_dir):  # Create directory if necessary.
        os.makedirs(new_dir)

    # Open the output file and create a CSV writer for it.
    with open_csv(output_filename, 'w') as f:
        writer = csv.writer(f)

        # Add name of new column to header.
        header = ['Pipe'] + header  # Prepend new column name.
        writer.writerow(header)

        # Data for new column is the base filename without extension.
        new_column = [os.path.splitext(basename)[0]]

        # Process each row of the body by prepending data for new column to it.
        writer.writerows(new_column + row for row in contents)
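The function above handles one file at a time, while the question asks for every file in a directory. If you only need Python 3, a compact sketch that loops over every *.csv in a directory could look like this; add_pipe_column is a hypothetical name, and it assumes column 0 holds YYYYMMDD strings and that blank lines can be dropped.

```python
import csv
import glob
import os

def add_pipe_column(src_dir, start_date=None, out_dir=None):
    """Process every *.csv in src_dir (Python 3 only).

    Prepends a "Pipe" column holding the file name without extension and
    drops rows whose YYYYMMDD date (column 0) is before start_date.
    Writes to out_dir if given, otherwise overwrites each file in place.
    """
    if out_dir:
        os.makedirs(out_dir, exist_ok=True)
    for path in glob.glob(os.path.join(src_dir, "*.csv")):
        with open(path, newline="") as f:
            reader = csv.reader(f)
            header = next(reader)
            rows = [r for r in reader
                    if r and (start_date is None or r[0] >= start_date)]
        pipe = os.path.splitext(os.path.basename(path))[0]
        target = os.path.join(out_dir, os.path.basename(path)) if out_dir else path
        # The input is fully read and closed, so overwriting in place is safe.
        with open(target, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["Pipe"] + header)
            writer.writerows([pipe] + r for r in rows)
```

For example, add_pipe_column("data", start_date="20140401", out_dir="trimmed") would write trimmed copies and leave the originals untouched.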

add a new column to an existing csv file

I have a csv file with 5 columns and I want to add data in a 6th column. The data I have is in an array.
Right now, the code that I have will insert the data I would want in the 6th column only AFTER all the data that already exists in the csv file.
For instance I have:
wind, site, date, time, value
10, 01, 01-01-2013, 00:00, 5.1
89.6 ---> this is the value I want to add in a 6th column but it puts it after all the data from the csv file
Here is the code I am using:
import csv

csvfile = 'filename'
with open(csvfile, 'a') as output:
    writer = csv.writer(output, lineterminator='\n')
    for val in data:
        writer.writerow([val])
I thought using 'a' would append the data in a new column, but instead it just puts it after ('under') all the other data... I don't know what to do!
Appending writes data to the end of a file, not to the end of each row.
Instead, create a new file and append the new value to each row.
import csv

csvfile = 'filename'
with open(csvfile, 'r', newline='') as fin, open('new_' + csvfile, 'w', newline='') as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout, lineterminator='\n')
    if you_have_headers:
        writer.writerow(next(reader) + [new_heading])
    for row, val in zip(reader, data):
        writer.writerow(row + [val])
On Python 2.x, remove the newline='' arguments and change the filemodes from 'r' and 'w' to 'rb' and 'wb', respectively.
Once you are sure this is working correctly, you can replace the original file with the new one:
import os
os.remove(csvfile) # not needed on unix
os.rename('new_'+csvfile, csvfile)
The csv module does not support writing or appending a column. The only thing you can do is read from one file, append the 6th-column data, and write to another file, as shown below:
with open('in.txt') as fin, open('out.txt', 'w') as fout:
    for line, value in zip(fin, data):
        # Strip the line ending first so the last line works even without one.
        fout.write(line.rstrip('\n') + ', ' + str(value) + '\n')
Here data is a list of ints.
I tested this code in Python, and it runs fine.
Suppose we have a CSV file, data.csv, with the following contents:
#data.csv
1,Joi,Python
2,Mark,Laravel
3,Elon,Wordpress
4,Emily,PHP
5,Sam,HTML
Now we want to add a column to this CSV file in which every entry has the same value, i.e. Something text.
Example
from csv import writer
from csv import reader

new_column_text = 'Something text'

with open('data.csv', 'r', newline='') as read_object, \
        open('data_output.csv', 'w', newline='') as write_object:
    csv_reader = reader(read_object)
    csv_writer = writer(write_object)
    for row in csv_reader:
        row.append(new_column_text)
        csv_writer.writerow(row)
Output
#data_output.csv
1,Joi,Python,Something text
2,Mark,Laravel,Something text
3,Elon,Wordpress,Something text
4,Emily,PHP,Something text
5,Sam,HTML,Something text
The append mode of opening files is meant to add data to the end of a file. What you need instead is random access to your file, which you get with the seek() method. Be aware that writing after seek() overwrites the bytes already at that position; it does not insert new data into the middle of the file.
You can see an example here:
http://www.tutorialspoint.com/python/file_seek.htm
or read the Python docs on it here (which aren't terribly useful): https://docs.python.org/2.4/lib/bltin-file-objects.html
If you want to add to the end of a column, you may want to open the file, read a line to figure out its length, and then seek to the end.
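A quick check with a throwaway temporary file shows what seek() followed by write() does in the middle of a file: the new bytes replace existing ones instead of being inserted. The two-row sample content is made up for illustration.

```python
import os
import tempfile

# Write a tiny two-row file in binary mode.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as f:
    f.write(b"1,a\n2,b\n")

with open(path, "r+b") as f:
    f.seek(3)        # position of the first "\n"
    f.write(b",X")   # overwrites "\n2" instead of inserting before it

with open(path, "rb") as f:
    result = f.read()
os.remove(path)
print(result)
```

The first row's line break and the start of the second row are overwritten, which is why the other answers read the whole file and write a new one; seek() is only safe for adding data at the very end of the file.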
