Copying Headers from CSV in Python not Working-- Delimiter Issue - python

I'm trying to copy headers from one CSV file to another in Python, but the header row is being split into individual characters, one character per column. I'm not sure why.
I've read through StackOverflow but couldn't find a question/solution to this problem.
first.csv file data
Date,Data
1/2/2019,a
12/1/2018,b
11/3/2018,c
Python Code
import csv
from datetime import datetime, timedelta
date_ = datetime.strftime(datetime.now(),'%Y_%m_%d')
with open('first.csv', 'r') as full_file, open('second.csv' + '_' + date_ + '.csv', 'w') as past_10_days:
    writer = csv.writer(past_10_days)
    writer.writerow(next(full_file)) #copy headers over from original file
    for row in csv.reader(full_file): #run through remaining rows
        if datetime.strptime(row[0],'%m/%d/%Y') > datetime.now() - timedelta(days=10): #write rows where timestamp is greater than today - 10
            writer.writerow(row)
Result I get:
D,a,t,e,D,a,t,a
1/2/2019,a
I'd like the result to just be
Date,Data
1/2/2019,a
Am I just missing setting an option? This is Python 3+
Thanks!

Change
writer.writerow(next(full_file))
To
writer.writerow(next(csv.reader(full_file)))
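Your code reads the header from full_file as a plain text line, not as a parsed CSV row. next(full_file) returns a string, and writerow() iterates over whatever it is given, so each character of that string is written as its own field.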
Ideally, as roganjosh pointed out, you should simply define the reader once, so the code should look like this:
reader = csv.reader(full_file)
writer.writerow(next(reader))
for row in reader:
    if datetime.strptime(row[0],'%m/%d/%Y') > datetime.now() - timedelta(days=10):
        writer.writerow(row)
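For reference, here is a minimal, self-contained sketch of the whole script with the single-reader fix applied. The function name copy_recent_rows and its days/now parameters are made up for illustration (they are not in the original post), and opening both files with newline='' follows the recommendation in the csv module docs rather than anything the question used.

```python
import csv
from datetime import datetime, timedelta


def copy_recent_rows(src_path, dst_path, days=10, now=None):
    """Copy the header plus the rows whose first-column date is newer than now - days."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=days)
    # newline='' is what the csv docs recommend for files handed to reader/writer.
    with open(src_path, newline='') as src, open(dst_path, 'w', newline='') as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        # The header comes from the reader, so it is a list of fields, not a string.
        writer.writerow(next(reader))
        for row in reader:
            if datetime.strptime(row[0], '%m/%d/%Y') > cutoff:
                writer.writerow(row)
```

Passing now explicitly is only there to make the cutoff reproducible; calling copy_recent_rows('first.csv', 'second.csv') uses the current time, as the question does.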

Related

How to convert .dat to .csv using python? the data is being expressed in one column

Hi, I'm trying to convert a .dat file to a .csv file, but I have a problem with it.
The .dat file has these column names:
region GPS name ID stop1 stop2 stopname1 stopname2 time1 time2 stopgps1 stopgps2
The delimiter is a tab.
I want to convert the .dat file to a .csv file, but the data keeps coming out in one column.
I tried the following code:
import pandas as pd
with open('file.dat', 'r') as f:
    df = pd.DataFrame([l.rstrip() for l in f.read().split()])
and
import csv
with open('file.dat', 'r') as input_file:
    lines = input_file.readlines()
    newLines = []
    for line in lines:
        newLine = line.strip('\t').split()
        newLines.append(newLine)
with open('file.csv', 'w') as output_file:
    file_writer = csv.writer(output_file)
    file_writer.writerows(newLines)
But all the data ends up in one column.
(I expect 15 columns and 80,000 rows, but I get 1 column and 1,200,000 rows.)
I want to convert this into a csv file with the original data structure.
Where is the mistake?
Please help me... It's my first time dealing with data in Python.
If you're already using pandas, you can just use pd.read_csv() with another delimiter:
df = pd.read_csv("file.dat", sep="\t")
df.to_csv("file.csv", index=False)  # index=False keeps pandas' row index out of the output
See also the documentation for read_csv and to_csv
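If pandas is not available, roughly the same conversion can be sketched with the standard csv module by telling the reader that the delimiter is a tab. The function name dat_to_csv is made up for illustration, and this assumes every field in the .dat file really is separated by a single tab.

```python
import csv


def dat_to_csv(src_path, dst_path):
    """Rewrite a tab-delimited file as a comma-delimited one, row by row."""
    with open(src_path, newline='') as src, open(dst_path, 'w', newline='') as dst:
        reader = csv.reader(src, delimiter='\t')  # split on tabs only
        writer = csv.writer(dst)                  # default delimiter is a comma
        writer.writerows(reader)
```

Because rows are streamed from the reader straight into the writer, the whole file never has to sit in memory, and fields that contain spaces stay in one column.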

How do you add a header to an excel csv file using python

So I'm trying to add a header to a csv file dynamically. My current code looks like the following:
import csv
from datetime import datetime
import pandas as pd
rows = []
with open(r'Test_Timestamp.csv', 'r', newline='') as file:
    with open(r'Test_Timestamp_Result.csv', 'w', newline='') as file2:
        reader = csv.reader(file, delimiter=',')
        for row in reader:
            rows.append(row)
        file_write = csv.writer(file2)
        for val in rows:
            current_date_time = datetime.now().replace(hour=0, minute=0, second=0, microsecond=0)
            val.insert(0, current_date_time)
            file_write.writerow(val)
Currently this inserts a new timestamp in column A, which is exactly what I want: everything else is pushed one column to the right, since I'll be working with csv files that have varying numbers of columns.
What I'm having trouble with is adding a column header. Currently a timestamp is written next to the header row as well. I want the new column's header to be named Execution_Date.
I have looked at pandas as a solution, but the examples in the documentation all seem to use a predetermined set of column headers. I've tried inserting a column header with df.insert(0, "Execution_Date", current_date_time), but that gives me an error.
I know I'm fairly close to doing this but I'm running into errors. Is there a way to do this dynamically so it automatically does this with various different csv files and number of different columns in each csv file, etc.? The current output looks like:
What I want the final result to look like is:
Any help with this would be greatly appreciated! I'm going to continue to see if I can solve this in the meantime, but I'm at a wall with how to proceed.
If the end result just needs to be something Excel can read, such as a csv, you can likely bypass pandas altogether:
Edit: adding support for existing titles
Given a simple csv like:
Title,Other
Geeks1,foo
Geeks2,bar
Then you might use:
import contextlib
import csv
from datetime import datetime
with contextlib.ExitStack() as stack:
    # enter_context() registers each file with the stack so it is closed on exit
    file_in = stack.enter_context(open('Test_Timestamp.csv', "r", encoding="utf-8"))
    file_out = stack.enter_context(open('Test_Timestamp_Result.csv', "w", encoding="utf-8", newline=""))
    reader = csv.reader(file_in, delimiter=',')
    writer = csv.writer(file_out)
    writer.writerow(["Execution_Date"] + next(reader))
    writer.writerows(
        [datetime.now().replace(hour=0, minute=0, second=0, microsecond=0)] + row
        for row in reader
    )
to give you a file like:
Execution_Date,Title,Other
2022-02-11 00:00:00,Geeks1,foo
2022-02-11 00:00:00,Geeks2,bar
One way to do this is to utilize to_csv().
Example:
# importing python package
import pandas as pd
# read contents of csv file
file = pd.read_csv("gfg.csv")
print("\nOriginal file:")
print(file)
# adding header
headerList = ['id', 'name', 'profession']
# converting data frame to csv
file.to_csv("gfg2.csv", header=headerList, index=False)
# display modified csv file
file2 = pd.read_csv("gfg2.csv")
print('\nModified file:')
print(file2)

Parsing column from CSV and replace a value in text file with the new value

I have one CSV file, and I want to extract the first column of it. My CSV file is like this:
Device ID;SysName;Entry address(es);IPv4 address;Platform;Interface;Port ID (outgoing port);Holdtime
PE1-PCS-RANCAGUA;;;192.168.203.153;cisco CISCO7606 Capabilities Router Switch IGMP;TenGigE0/5/0/1;TenGigabitEthernet3/3;128 sec
P2-CORE-VALPO.cisco.com;P2-CORE-VALPO.cisco.com;;200.72.146.220;cisco CRS Capabilities Router;TenGigE0/5/0/0;TenGigE0/5/0/4;128 sec
PE2-CONCE;;;172.31.232.42;Cisco 7204VXR Capabilities Router;GigabitEthernet0/0/0/14;GigabitEthernet0/3;153 sec
P1-CORE-CRS-CNT.entel.cl;P1-CORE-CRS-CNT.entel.cl;;200.72.146.49;cisco CRS Capabilities Router;TenGigE0/5/0/0;TenGigE0/1/0/6;164 sec
For that purpose I use the following code that I saw here:
import csv
makes = []
with open('csvoutput/topologia.csv', 'rb') as f:
    reader = csv.reader(f)
    # next(reader) # Ignore first row
    for row in reader:
        makes.append(row[0])
print makes
Then, for each value in the first column, I want to replace a particular value in a text file with it and save the result as a new file.
Original textfile:
PLANNED.IMPACTO_ID = IMPACTO.ID AND
PLANNED.ESTADOS_ID = ESTADOS_PLANNED.ID AND
TP_CLASIFICACION.ID = TP_DATA.ID_TP_CLASIFICACION AND
TP_DATA.PLANNED_ID = PLANNED.ID AND
PLANNED.FECHA_FIN >= CURDATE() - INTERVAL 1 DAY AND
PLANNED.DESCRIPCION LIKE '%P1-CORE-CHILLAN%';
Expected output:
PLANNED.IMPACTO_ID = IMPACTO.ID AND
PLANNED.ESTADOS_ID = ESTADOS_PLANNED.ID AND
TP_CLASIFICACION.ID = TP_DATA.ID_TP_CLASIFICACION AND
TP_DATA.PLANNED_ID = PLANNED.ID AND
PLANNED.FECHA_FIN >= CURDATE() - INTERVAL 1 DAY AND
PLANNED.DESCRIPCION LIKE 'FIRST_COLUMN_VALUE';
And so on for every value in the first column, and save it as a separate file.
How can I do this? Thank you very much for your help.
You could just read the file, apply changes, and write the file back again. There is no efficient way to edit a file in place (inserting characters in the middle means rewriting everything after them), so in practice you can only rewrite it.
If your file is going to be big, you should not keep the whole table in memory.
import csv
makes = []
with open('csvoutput/topologia.csv', 'rb') as f:
    reader = csv.reader(f)
    for row in reader:
        makes.append(row)
# Apply changes in makes
with open('csvoutput/topologia.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(makes)
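The snippet above is Python 2 and does not do the substitution itself. As a rough Python 3 sketch of what the question describes: read the first column (the sample data is semicolon-delimited, so the reader needs delimiter=';'), then write one copy of the query text per value. The helper names first_column and write_queries, the placeholder argument, and the output file pattern are all made up for illustration.

```python
import csv


def first_column(csv_path, delimiter=';', skip_header=True):
    """Return the first field of every non-empty row in a delimited file."""
    with open(csv_path, newline='') as f:
        reader = csv.reader(f, delimiter=delimiter)
        if skip_header:
            next(reader, None)  # drop the "Device ID;SysName;..." line
        return [row[0] for row in reader if row]


def write_queries(template_path, values, placeholder, out_pattern):
    """Write one copy of the template per value, with `placeholder` replaced by it."""
    with open(template_path) as f:
        template = f.read()
    written = []
    for value in values:
        out_path = out_pattern.format(value)  # e.g. 'query_{}.txt' (hypothetical)
        with open(out_path, 'w') as out:
            out.write(template.replace(placeholder, value))
        written.append(out_path)
    return written
```

A call might look like write_queries('query.txt', first_column('csvoutput/topologia.csv'), '%P1-CORE-CHILLAN%', 'query_{}.txt'), where query.txt is an assumed name for the original text file.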

Python- Can't overwrite first column of .csv file with new time stamp

I have a .csv file (see image):
The image shows a time column with datetime strings. My program reads this column and extracts only the time (H:M:S). I am also trying to overwrite the time column with just the H:M:S time stamp and write the result to a new .csv file, using the following code.
CODE:
import csv
import datetime as dt
import os
File = 'C:/Users/Alan Cedeno/Desktop/Test_Folder/HiSAM1_data_160215_164858.csv'
root, ext = os.path.splitext(File)
output = root + '-new.csv'
with open(File,'r') as csvinput,open(output, 'w') as csvoutput:
    writer = csv.writer(csvoutput, lineterminator='\n')
    reader = csv.reader(csvinput)
    all = []
    row = next(reader)
    for line in reader:
        row.append(dt.datetime.strptime(line[0],'%m/%d/%Y %H:%M:%S').time())
        all.append(row)
    for row in reader:
        row.append(row[0])
        all.append(row)
    writer.writerows(all)
The program runs and writes the H:M:S time stamps to a new .csv file. The problem is that instead of replacing only the time column, it replaces every column, producing an output file that looks like the 2nd image:
At this point I don't really know how to make the new output file look like the file in the first image, with the H:M:S format in the first column ONLY, not all scrambled like in the second image. Any suggestions?
SCREENSHOT FOR BAH:
See the K column, it should be column A of the first image, and columns B,C,D,E,F,G,I,and J should stay the same like in image 1.
Download link for the .csv file: http://www.speedyshare.com/z2jwq/HiSAM1-data-160215-164858.csv
The main problem with your code seems to be that you keep appending the time from every line of the csv to the first row, which produces the second image posted in the question.
The idea is to keep track of the individual lines and modify just the first element of each one. You should also keep the first line, which holds the column labels. To solve the issue, the code would look like:
import csv
import datetime as dt
import os
File = 'C:/Users/Alan Cedeno/Desktop/Test_Folder/HiSAM1_data_160215_164858.csv'
root, ext = os.path.splitext(File)
output = root + '-new.csv'
with open(File,'r') as csvinput,open(output, 'w') as csvoutput:
    writer = csv.writer(csvoutput, lineterminator='\n')
    reader = csv.reader(csvinput)
    rows = [next(reader)]
    for line in reader:
        line[0] = str(dt.datetime.strptime(line[0],'%m/%d/%Y %H:%M:%S').time())
        rows.append(line)
    writer.writerows(rows)
Note the list rows has the modified lines from the csvinput.
The resulting output csv file (tested with the first line in the question duplicated) would be
With some simplified data:
#!python3
import csv
import datetime as dt
import os
File = 'data.csv'
root, ext = os.path.splitext(File)
output = root + '-new.csv'
# csv module documents opening with `newline=''` mode in Python 3.
with open(File,'r',newline='') as csvinput,open(output, 'w',newline='') as csvoutput:
    writer = csv.writer(csvoutput)
    reader = csv.reader(csvinput)
    # Copy the header
    row = next(reader)
    writer.writerow(row)
    # Edit the first column of each row.
    for row in reader:
        row[0] = dt.datetime.strptime(row[0],'%m/%d/%Y %H:%M:%S').time()
        writer.writerow(row)
Input:
Time,0.3(/L),0.5(/L)
02/15/2016 13:44:01,88452,16563
02/15/2016 13:44:02,88296,16282
Output:
Time,0.3(/L),0.5(/L)
13:44:01,88452,16563
13:44:02,88296,16282
If actually on Python 2, the csv module documents using binary mode. Replace the with line with:
with open(File,'rb') as csvinput,open(output, 'wb') as csvoutput:
You cannot overwrite a single row in the CSV file. You'll have to write all the rows you want to a new file and then rename it back to the original file name.
Your pattern of usage may fit a database better than a CSV file. Look into the sqlite3 module for a lightweight database.
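The "write to a new file, then rename" approach could be sketched like this in Python 3. The function name rewrite_csv and its transform_row callback are made up for illustration; the temporary file is created in the same directory as the original, on the assumption that os.replace should not have to cross filesystems.

```python
import csv
import os
import tempfile


def rewrite_csv(path, transform_row):
    """Rewrite `path` with transform_row applied to each data row; the header is kept."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(suffix='.csv', dir=directory)
    try:
        # Wrap the temp file descriptor first so it is closed even if the source fails to open.
        with os.fdopen(fd, 'w', newline='') as dst, open(path, newline='') as src:
            reader = csv.reader(src)
            writer = csv.writer(dst)
            writer.writerow(next(reader))  # copy the header unchanged
            for row in reader:
                writer.writerow(transform_row(row))
        os.replace(tmp_path, path)  # swap the finished file into place
    except Exception:
        os.remove(tmp_path)  # do not leave a half-written temp file behind
        raise
```

For the time column in the question, transform_row could be a small function that parses row[0] with strptime and puts the time back into the first field before returning the row.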

delete rows by date and add file name column for multiple csv

I have multiple comma-delimited csv files with recorded water pipe pressure sensor data, already sorted by date from oldest to newest. In all the original files, the first column always contains dates formatted as YYYYMMDD. I have looked at similar discussion threads but couldn't find what I need.
I need a Python script that can:
Add a new column titled "Pipe" to every csv file in the directory, where each row of the new column holds the file name without the file extension.
Optionally take a cut-off date as YYYYMMDD and delete rows from the original input file. For example, if a file has dates 20140101 to 20140630, I would like to cut out the rows whose date is < 20140401.
Optionally either overwrite the original files after making these modifications or save each file to a different directory, with the same file names as the originals.
Input: PipeRed.csv; Headers: Date,Pressure1,Pressure2,Temperature1,Temperature2 etc,
Output: PipeRed.csv; Headers: Pipe,Date,Pressure1,Pressure2,Temperature1, Temperature2,etc,
I have found some code and modified it a little, but it doesn't delete rows like was described above and adds the file name column last rather than 1st.
import csv
import sys
import glob
import re
for filename in glob.glob(sys.argv[1]):
#def process_file(filename):
    # Read the contents of the file into a list of lines.
    f = open(filename, 'r')
    contents = f.readlines()
    f.close()
    # Use a CSV reader to parse the contents.
    reader = csv.reader(contents)
    # Open the output and create a CSV writer for it.
    f = open(filename, 'wb')
    writer = csv.writer(f)
    # Process the header.
    writer = csv.writer(f)
    writer.writerow( ('Date','Pressure1','Pressure2','Pressure3','Pressure4','Pipe') )
    header = reader.next()
    header.append(filename.replace('.csv',""))
    writer.writerow(header)
    # Process each row of the body.
    for row in reader:
        row.append(filename.replace('.csv',""))
        writer.writerow(row)
    # Close the file and we're done.
    f.close()
This function should be very close to what you want. I've tested it in both Python 2.7.9 and 3.4.2. The initial version I posted had some problems because, as I mentioned then, it was untested. I'm not sure if you're using Python 2 or 3, but this worked properly in either one.
Another change from the previous version is that the optional date keyword argument has been renamed from cutoff_date to start_date to better reflect what it is. A cutoff date usually means the last date on which it is possible to do something, which is the opposite of the way you used it in your question. Also note that any date provided should be a string, i.e. start_date='20140401', not an integer.
One enhancement is that it will now create the output directory if one is specified but doesn't already exist.
import csv
import os
import sys
def open_csv(filename, mode='r'):
    """ Open a csv file in proper mode depending on Python version. """
    return (open(filename, mode=mode+'b') if sys.version_info[0] == 2 else
            open(filename, mode=mode, newline=''))
def process_file(filename, start_date=None, new_dir=None):
    # Read the entire contents of the file into memory skipping rows before
    # any start_date given (assuming row[0] is a date column).
    with open_csv(filename, 'r') as f:
        reader = csv.reader(f)
        header = next(reader) # Save first row.
        contents = [row for row in reader if start_date and row[0] >= start_date
                    or not start_date]
    # Create different output file path if new_dir was specified.
    basename = os.path.basename(filename) # Remove dir name from filename.
    output_filename = os.path.join(new_dir, basename) if new_dir else filename
    if new_dir and not os.path.isdir(new_dir): # Create directory if necessary.
        os.makedirs(new_dir)
    # Open the output file and create a CSV writer for it.
    with open_csv(output_filename, 'w') as f:
        writer = csv.writer(f)
        # Add name of new column to header.
        header = ['Pipe'] + header # Prepend new column name.
        writer.writerow(header)
        # Data for new column is the base filename without extension.
        new_column = [os.path.splitext( os.path.split(basename)[1] )[0]]
        # Process each row of the body by prepending data for new column to it.
        writer.writerows((new_column+row for row in contents))
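The row filter in process_file compares dates as strings, which is why start_date has to be a string too. A small sketch of that step in isolation (the helper name rows_on_or_after is made up for illustration) shows why this works for zero-padded YYYYMMDD values:

```python
def rows_on_or_after(rows, start_date=None):
    """Keep rows whose first field (a YYYYMMDD string) is >= start_date."""
    if not start_date:
        return list(rows)
    # Zero-padded YYYYMMDD strings sort in the same order as the dates they
    # encode, so a plain string comparison is enough here.
    return [row for row in rows if row[0] >= start_date]
```

This only holds while every date has the same fixed-width format; a value such as 2014401 (missing a zero) would not compare the way the date does.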
