Add a date column to csv using python - python

I have the following code, which writes data output to a CSV file. I want to add a date column at the end of each row, using the same yesterday.strftime value that I use to create the filename.
Thanks!
For example, my current output looks like:
columnA
1
2
and I want to add the following column:
Date
2/5/2016
2/5/2016
.
.
.
Code:
filepath = 'C:\\test\\'
filename = yesterday.strftime('%Y-%m-%d') + '_' + 'test.csv'
f = open( filename, 'wt')
writer = csv.writer(f, lineterminator='\n')
header = [h['name'][3:] for h in results.get('columnHeaders')]
writer.writerow(header)
print(''.join('%30s' % h for h in header))
# Write data table.
if results.get('rows', []):
    for row in results.get('rows'):
        writer.writerow(row)
        print(''.join('%30s' % r for r in row))
else:
    print ('No Rows Found')
f.close()

In [26]: import pandas as pd
In [27]: import datetime
In [28]: a = pd.read_csv('a.csv')
In [29]: a
Out[29]:
   columnA
0        1
1        2
In [30]: a['Date'] = [datetime.date.today()]*len(a)
In [31]: a
Out[31]:
   columnA        Date
0        1  2016-02-05
1        2  2016-02-05
In [32]: a.to_csv('adate.csv')
Generally: https://www.airpair.com/python/posts/top-mistakes-python-big-data-analytics
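The same result can also be had without pandas, by extending each row just before it is written. What follows is a minimal sketch, assuming Python 3. The helper name write_with_date is hypothetical, the in-memory buffer stands in for the real output file, and the sample header and rows stand in for the values taken from results in the question:

```python
import csv
import datetime
import io


def write_with_date(out, header, rows, date_str):
    """Write header and rows to `out`, appending a Date column to each."""
    writer = csv.writer(out, lineterminator='\n')
    writer.writerow(list(header) + ['Date'])
    for row in rows:
        # Every data row gets the same date value appended at the end.
        writer.writerow(list(row) + [date_str])


# Same date that the question uses for the filename.
yesterday = datetime.date.today() - datetime.timedelta(days=1)
date_str = yesterday.strftime('%Y-%m-%d')

buf = io.StringIO()  # stand-in for open(filename, 'wt')
write_with_date(buf, ['columnA'], [[1], [2]], date_str)
print(buf.getvalue())
```

Changing the strftime format string changes how the date appears in the column without touching the rest of the code.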

Related

How do I compare two rows from one column in a CSV file and create a new column accordingly in Python

I have a CSV file that looks like this:
id data
1 abc
1 htf
2 kji
3 wdc
3 vnc
3 acd
4 mef
5 klm
5 def
... and so on
What I want to do is compare the id of the current row to that of the previous one; if they are the same, I want to create, in a new CSV file, a new column containing the data from that row. Here is how I want the output CSV file to look:
id data1 data2 data3
1 abc htf
2 kji
3 wdc vnc acd
4 mef
5 klm def
Is this possible? Or is it better to do it in the same CSV file?
This should help:
from collections import defaultdict
def manipulate_file():
    # Map each id to the list of data values seen for it.
    # Assumes comma-separated lines; a header line is treated like any other row.
    dictionary = defaultdict(list)
    with open("sample.csv", "r") as f:
        for line in f.read().splitlines():
            if not line:
                continue  # skip blank lines, e.g. after a trailing newline
            id_, data_ = line.split(",")
            dictionary[id_].append(data_)
    return dictionary
def rewrite(dictionary):
    # Build one output line per id: the id followed by all of its values.
    file_ = ""
    for id_ in dictionary.keys():
        row = str(id_)
        for word in dictionary[id_]:
            row += "," + word
        file_ += row + "\n"
    return file_
def main():
    dictionary = manipulate_file()
    file_ = rewrite(dictionary)
    with open("output.csv", "w") as f:
        f.write(file_)
main()
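Since the question is phrased as comparing each id with the one on the previous row, itertools.groupby is another natural fit: it groups consecutive items that share a key. The sketch below is only an illustration, assuming Python 3 and a comma-separated file. The helper name group_consecutive and the inline sample data are hypothetical stand-ins for the real file:

```python
import csv
import io
from itertools import groupby


def group_consecutive(rows):
    """Collapse consecutive rows that share an id into one output row."""
    grouped = []
    for id_, group in groupby(rows, key=lambda row: row[0]):
        grouped.append([id_] + [row[1] for row in group])
    return grouped


# Inline sample in the shape of the question (comma-separated is an assumption).
sample = "id,data\n1,abc\n1,htf\n2,kji\n3,wdc\n3,vnc\n3,acd\n"
reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header line
result = group_consecutive(reader)

# Build a header wide enough for the longest group: id, data1, data2, ...
width = max(len(row) for row in result) - 1
header = ["id"] + ["data%d" % n for n in range(1, width + 1)]
print(header)
for row in result:
    print(row)
```

Unlike the dictionary approach, groupby only merges rows that are adjacent, so the input has to be sorted or already grouped by id.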

Create and Append CSV file in Python with header once

I am creating a CSV file using Python code.
I am able to create the file and store data, but I am unable to add a header, i.e. the column names, to my CSV file.
I have a data frame created.
The code for creating and appending to the CSV is as follows:
main_app.py:
def handle_data(data):
    msg5 = data.replace("\n"," ").replace("\r"," ").split(",")
    print(msg5)
    d = dict(s.split(':') for s in msg5)
    data_frame = pd.DataFrame(list(d.items())).transpose()
    data_frame.columns = data_frame.iloc[0]
    data_frame = data_frame.reindex(data_frame.index.drop(0))
    print(data_frame)
    filename = (value_text + time2 + ".csv")
    #print(filename)
    fields = list(data_frame.columns)
    with open(filename, 'a',newline='') as writeFile:
        writeFile = csv.writer(writeFile)
        writeFile.writerow(fields)
        writeFile.writerows(data_frame.values)
def read_from_port(ser):
    while True:
        reading = ser.readline()
        handle_data(reading.decode("utf-8"))
read_from_port(serial_port)
This code adds the column names every iteration:
Output I get:
A B C
0 0 0
A B C
1 1 1
.......
Output I need is:
A B C
0 0 0
1 1 1
2 2 2
.....
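Can someone help me out?
Thanks in advance.
csv.DictWriter can write the header for you, but writeheader() must only be called once (for example, while the file is still empty), otherwise the header is repeated on every append. DictWriter.writerows also expects dictionaries rather than arrays:
with open(filename, 'a', newline='') as writeFile:
    csvfile = csv.DictWriter(writeFile, fieldnames=fields)
    if writeFile.tell() == 0:  # empty file: the header has not been written yet
        csvfile.writeheader()
    csvfile.writerows(data_frame.to_dict('records'))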
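To make the header-once idea concrete, here is a small self-contained sketch, assuming Python 3. The helper name append_rows and the temporary demo file are hypothetical, and plain dictionaries stand in for the rows of the data frame:

```python
import csv
import os
import tempfile


def append_rows(filename, fieldnames, rows):
    """Append dict rows to a CSV file; write the header only if the file is new or empty."""
    need_header = not os.path.exists(filename) or os.path.getsize(filename) == 0
    with open(filename, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if need_header:
            writer.writeheader()
        writer.writerows(rows)


# Two separate calls, as would happen on two readings from the serial port.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
append_rows(demo_path, ["A", "B", "C"], [{"A": 0, "B": 0, "C": 0}])
append_rows(demo_path, ["A", "B", "C"], [{"A": 1, "B": 1, "C": 1}])

with open(demo_path, newline="") as f:
    lines = f.read().splitlines()
print(lines)
```

Checking the file size before opening keeps the decision independent of how the file object behaves in append mode.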

Two types of headers: txt to pandas dataframe

Let's say I have a .txt file like this:
#D=H|ID|STRINGIDENTIFIER
#D=T|SEQ|DATETIME|VALUE
H|879|IDENTIFIER1
T|1|1569972384|7
T|2|1569901951|9
T|3|1569801600|8
H|892|IDENTIFIER2
T|1|1569972300|109
T|2|1569907921|101
T|3|1569803600|151
And I need to create a dataframe like this:
IDENTIFIER SEQ DATETIME VALUE
879_IDENTIFIER1 1 1569972384 7
879_IDENTIFIER1 2 1569901951 9
879_IDENTIFIER1 3 1569801600 8
892_IDENTIFIER2 1 1569972300 109
892_IDENTIFIER2 2 1569907921 101
892_IDENTIFIER2 3 1569803600 151
What would be the possible code?
A basic way to do it might be to process the text file and convert it into a CSV before using the read_csv function in pandas. Assuming the file you want to process is as consistent as the example:
import pandas as pd
with open('text.txt', 'r') as file:
    fileAsRows = file.read().split('\n')
pdInput = 'IDENTIFIER,SEQ,DATETIME,VALUE\n'  # add header
for row in fileAsRows:
    cols = row.split('|')  # break up row
    if row.startswith('H'):  # get identifier info from H row
        Identifier = cols[1] + '_' + cols[2]
    if row.startswith('T'):  # get other info from T row
        Seq = cols[1]
        DateTime = cols[2]
        Value = cols[3]
        tempList = [Identifier, Seq, DateTime, Value]
        pdInput += (','.join(tempList) + '\n')
# Open with "w" rather than "a" so that re-running does not append a second copy
with open("pdInput.csv", "w") as file:
    file.write(pdInput)
## import into pandas
df = pd.read_csv("pdInput.csv")
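The intermediate CSV file is not strictly needed: the same pass over the lines can collect the records in memory. The following is a rough sketch under the same assumption that the file is as regular as the example. The function name parse_records is hypothetical, and the inline sample is a shortened copy of the data from the question:

```python
def parse_records(lines):
    """Turn H/T lines into flat rows, carrying the last H identifier forward."""
    rows = []
    identifier = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the #D= format descriptions
        cols = line.split("|")
        if cols[0] == "H":
            identifier = cols[1] + "_" + cols[2]
        elif cols[0] == "T":
            rows.append({
                "IDENTIFIER": identifier,
                "SEQ": int(cols[1]),
                "DATETIME": int(cols[2]),
                "VALUE": int(cols[3]),
            })
    return rows


sample = """#D=H|ID|STRINGIDENTIFIER
#D=T|SEQ|DATETIME|VALUE
H|879|IDENTIFIER1
T|1|1569972384|7
H|892|IDENTIFIER2
T|1|1569972300|109
"""
rows = parse_records(sample.splitlines())
print(rows)
```

Passing the resulting list of dictionaries to pd.DataFrame(rows) should then give a frame with the four columns, without writing anything to disk.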

Python to combine Excel spreadsheets

Hello all, a question about using pandas to combine Excel spreadsheets.
The problem is that the order of the columns is lost when the files are combined. The more files there are to combine, the worse the result gets.
With a large number of files it also raises an error:
ValueError: column index (256) not an int in range(256)
What I am using is below:
import os
import pandas as pd
df = pd.DataFrame()
for f in ['c:\\1635.xls', 'c:\\1644.xls']:
    data = pd.read_excel(f, 'Sheet1')
    data.index = [os.path.basename(f)] * len(data)
    df = df.append(data)
df.to_excel('c:\\CB.xls')
The original files and combined look like:
What's the best way to combine a large number of such similar Excel files?
Thanks.
I usually use xlrd and xlwt:
#!/usr/bin/env python
# encoding: utf-8
import xlwt
import xlrd
import os
current_file = xlwt.Workbook()
write_table = current_file.add_sheet('sheet1', cell_overwrite_ok=True)
# Write the header row once.
key_list = [u'City', u'Country', u'Received Date', u'Shipping Date', u'Weight', u'1635']
for title_index, text in enumerate(key_list):
    write_table.write(0, title_index, text)
# .xls files as in the question; xlrd 2.0 and later reads only the .xls format
file_list = ['1635.xls', '1644.xls']
i = 1  # next output row
for name in file_list:
    data = xlrd.open_workbook(name)
    table = data.sheets()[0]
    nrows = table.nrows
    for row in range(nrows):
        if row == 0:
            continue  # skip each file's own header row
        for index, context in enumerate(table.row_values(row)):
            write_table.write(i, index, context)
        i += 1
current_file.save(os.getcwd() + '/result.xls')
Instead of data.index = [os.path.basename(f)] * len(data) you should use df.reset_index().
For example:
1.xlsx:
a b
1 1
2 2
3 3
2.xlsx:
a b
4 4
5 5
6 6
code:
df = pd.DataFrame()
for f in [r"C:\Users\Adi\Desktop\1.xlsx", r"C:\Users\Adi\Desktop\2.xlsx"]:
    data = pd.read_excel(f, 'Sheet1')
    df = df.append(data)
df.reset_index(inplace=True, drop=True)
df.to_excel('c:\\CB.xls')
cb.xls:
a b
0 1 1
1 2 2
2 3 3
3 4 4
4 5 5
5 6 6
If you don't want the dataframe's index to be in the output file, you can use df.to_excel('c:\\CB.xls', index=False).
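Note that DataFrame.append was removed in pandas 2.0, so on current versions the loop above needs pd.concat instead. The sketch below illustrates the idea with two small in-memory frames standing in for the pd.read_excel results. The column values are made up for illustration, and sort=False is used on the assumption that the goal is to keep the columns in the order in which they first appear:

```python
import pandas as pd

# Two small in-memory frames stand in for pd.read_excel(...) results;
# the values are hypothetical sample data.
first = pd.DataFrame({"City": ["Rome"], "Weight": [10]})
second = pd.DataFrame({"City": ["Oslo", "Lima"], "Country": ["NO", "PE"]})

# Collect all frames in a list and concatenate once.
# ignore_index=True renumbers the rows; sort=False keeps first-seen column order.
combined = pd.concat([first, second], ignore_index=True, sort=False)
print(combined)
```

Collecting the frames in a list and concatenating once also avoids copying the growing frame on every iteration, which matters with many files.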

CSV files and Python

I'm working on a Python script that should merge some columns of some CSV files (a lot, something around 200 files).
All the files look like:
Timestamp; ...; ...; ...; Value; ...
date1;...;...;...; FirstValue;...
date2;...;...;...; SecondValue;...
and so on.
From the first file I want to extract the timestamp and the column Value. From the other files I need only the column Values.
My script for now is:
#!/usr/bin/python
import csv
import os, sys
# Open a file
path = "Z:/myfolder"
dirs = os.listdir( path )
# Count the number of files in the folder
print len(dirs)
# Take the name of the first file
file = dirs[0]
# Open the first file to read the timestamp and the first value (Value)
primofile = csv.reader(open(file, 'rb'), delimiter=";", quotechar='|')
timestamp, firstValue = [], []
# For each row of primofile
for row in primofile:
    # Copy the timestamp
    timestamp.append(row[2])
    # and the Value
    firstValue.append(row[15])
with open("provascript.csv", 'wb') as f:
    writer = csv.writer(f, delimiter=';')
    i = 0
    while i < len(timestamp):
        writer.writerow([timestamp[i]] + [firstValue[i]])
        i = i+1
So in "provascript.csv" I have the timestamp and the first column with my values from the first file. The next step is to open, one by one, the files in the list "dirs", read the column "Value" (index 15), save this column in an array and write it to "provascript.csv".
My code is:
for file in dirs:
    data = csv.reader(open(file, 'rb'), delimiter=";", quotechar='|')
    column = []
    for row in data:
        column.append(row[15])
In the array "column" I should have the values. I have to add these values as a new column in "provascript.csv" and move on, doing the same thing with all the files. How can I do that?
I would like to have something like
TimestampFromFirstFile;ValueFromFirstFile;ValueFromSecondFile;ValueFromThirdFile;...
date1;value;value;value;...
date2;value;value;value;...
date3;value;value;value;...
So far so good. I fixed it (thanks), but instead of reading and writing Value in the first row I would like to write a part of the name. Instead of having Timestamp;Value;Value;Value I would prefer Timestamp;Temperature1;Temperature2;Presence1;Presence2.
How can I do it?
I would build the full structure in memory first and write it to the output file at the end (assuming the rows are in the same order in every file).
# create the full structure: output_rows
primofile = csv.reader(open(file, 'rb'), delimiter=";", quotechar='|')
output_rows = []
for row in primofile:
    output_rows.append([row[2], row[15]])
Once we have an ordered list of lists, complete it with the other files:
for file in dirs[1:]:  # skip the first file, which is already in output_rows
    data = csv.reader(open(file, 'rb'), delimiter=";", quotechar='|')
    for idx, row in enumerate(data):
        output_rows[idx].append(row[15])
Finally, save it to a file:
with open("output.csv", 'wb') as f:
    writer = csv.writer(f, delimiter=';')
    for row in output_rows:
        writer.writerow(row)
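For the follow-up about using part of the file name as the column title, one option is to build the header row from the file names while the columns are collected. This is only a sketch, assuming Python 3, that every file starts with its own header line, and that all files have the same number of rows in the same order. The function name merge_value_columns, its parameters, and the sample file names are hypothetical, and in-memory buffers stand in for the opened files:

```python
import csv
import io
import os


def merge_value_columns(sources, time_idx=2, value_idx=15, delimiter=";"):
    """sources: list of (file_name, open_file) pairs.

    Returns a list of rows whose first row is a header built from the file names.
    """
    header = ["Timestamp"]
    output_rows = []
    for n, (name, handle) in enumerate(sources):
        # Use the file name without directory and extension as the column label.
        header.append(os.path.splitext(os.path.basename(name))[0])
        reader = csv.reader(handle, delimiter=delimiter)
        next(reader)  # skip each file's own header line
        for idx, row in enumerate(reader):
            if n == 0:
                # The first file also provides the timestamp column.
                output_rows.append([row[time_idx]])
            output_rows[idx].append(row[value_idx])
    return [header] + output_rows


# Tiny two-column stand-ins for the real files (hence time_idx=0, value_idx=1).
a = io.StringIO("Timestamp;Value\ndate1;20.5\ndate2;21.0\n")
b = io.StringIO("Timestamp;Value\ndate1;1\ndate2;0\n")
merged = merge_value_columns(
    [("Temperature1.csv", a), ("Presence1.csv", b)], time_idx=0, value_idx=1
)
for row in merged:
    print(";".join(row))
```

With the real files, the defaults time_idx=2 and value_idx=15 match the indices used in the question, and each pair would be built from a name in dirs and the corresponding opened file.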
You can do it with pandas:
import pandas as pd
file1 = pd.read_csv("file1", index_col=0, sep=";", skipinitialspace=1)
file2 = pd.read_csv("file2", index_col=0, sep=";", skipinitialspace=1)
file3 = pd.read_csv("file3", index_col=0, sep=";", skipinitialspace=1)
Here you have plenty of options, notably parsing dates while reading the CSV.
file1 looks like this:
... ....1 ....2 Value ....3
Timestamp
date1 ... ... ... FirstValue ...
date2 ... ... ... SecondValue ...
f1 = pd.DataFrame(file1.Value)
f2 = pd.DataFrame(file2.Value)
f3 = pd.DataFrame(file3.Value)
f2
Value
Timestamp
date1 AAA
date2 BBB
f3
Value
Timestamp
date1 456
date2 123
Then define a function that merges the frames one after another:
def recursive_merge(list_df):
    suffixe = range(1,len(list_df)+1)
    merged = list_df[0]
    for i in range(1,len(list_df)):
        merged = merged.merge(list_df[i], left_index=True, right_index=True,
                              suffixes=('_%s' %suffixe[i-1], '_%s' %suffixe[i]))
    if len(list_df)%2 !=0 :
        merged.rename(
            columns = {'Value':"Value_%s" %suffixe[i]},
            inplace = True) # with an odd number of frames, the last 'Value' column has no suffix yet
    return merged
and call it:
recursive_merge([f1,f2,f3])
Output:
Value_1 Value_2 Value_3
Timestamp
date1 FirstValue AAA 456
date2 SecondValue BBB 123
You can then easily write that dataframe with:
recursive_merge([f1,f2,f3]).to_csv("output.csv")
Of course, if you have more than 3 files, you can use for-loops and/or functions to open the files and end up with a list like [f1,f2,f3,...f200].
Hope this helps.
