I am creating a CSV file using Python. I can create the file and append data to it, but I am unable to add a header (the column names) to the CSV file.
I already have a DataFrame created.
The code for creating and appending the CSV is as follows:
main_app.py:
import csv
import pandas as pd

def handle_data(data):
    msg5 = data.replace("\n", " ").replace("\r", " ").split(",")
    print(msg5)
    d = dict(s.split(':') for s in msg5)
    data_frame = pd.DataFrame(list(d.items())).transpose()
    data_frame.columns = data_frame.iloc[0]
    data_frame = data_frame.reindex(data_frame.index.drop(0))
    print(data_frame)
    filename = value_text + time2 + ".csv"
    #print(filename)
    fields = list(data_frame.columns)
    with open(filename, 'a', newline='') as writeFile:
        writer = csv.writer(writeFile)
        writer.writerow(fields)
        writer.writerows(data_frame.values)

def read_from_port(ser):
    while True:
        reading = ser.readline()
        handle_data(reading.decode("utf-8"))

read_from_port(serial_port)
This code adds the column names on every iteration.
The output I get:
A B C
0 0 0
A B C
1 1 1
.......
The output I need is:
A B C
0 0 0
1 1 1
2 2 2
.....
Can someone help me out?
Thanks in advance.
csv.DictWriter can write the header for you. Note that its writerows expects dictionaries, so convert the DataFrame rows first:
csvfile = csv.DictWriter(writeFile, fieldnames=fields)
csvfile.writeheader()
csvfile.writerows(data_frame.to_dict('records'))
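That said, since handle_data reopens the file in append mode on every call, the header will still be written once per call unless you guard it. A minimal sketch of that guard (the file name, field names, and rows here are made up for illustration): write the header only when the file is new or empty.

```python
import csv
import os

def append_rows(filename, fields, rows):
    # Write the header only when the file does not yet exist or is empty,
    # then append the data rows.
    write_header = not os.path.isfile(filename) or os.path.getsize(filename) == 0
    with open(filename, 'a', newline='') as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(fields)
        writer.writerows(rows)

# Two calls simulate two serial-port iterations; the header appears only once.
append_rows('demo.csv', ['A', 'B', 'C'], [[0, 0, 0]])
append_rows('demo.csv', ['A', 'B', 'C'], [[1, 1, 1]])
```

Calling the guarded helper from handle_data in place of the plain writerow/writerows pair gives the single-header output shown above.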
I have a CSV file that looks like this:
id data
1 abc
1 htf
2 kji
3 wdc
3 vnc
3 acd
4 mef
5 klm
5 def
... and so on
What I want to do is compare the id of the current row to the previous one; if they match, I want to append that row's data as a new column in a new CSV file. Here is how I want the output CSV file to look:
id data1 data2 data3
1 abc htf
2 kji
3 wdc vnc acd
4 mef
5 klm def
Is it possible? Or is it better to do it in the same CSV file?
This should help:
from collections import defaultdict

def manipulate_file():
    dictionary = defaultdict(list)
    with open("sample.csv", "r") as f:
        data = f.read()
    data = data.split("\n")
    for i in range(len(data) - 1):
        id_, data_ = data[i].split(",")
        dictionary[id_].append(data_)
    return dictionary

def rewrite(dictionary):
    file_ = ""
    for id_ in dictionary.keys():
        row = str(id_)
        for word in dictionary[id_]:
            row += "," + word
        file_ += row + "\n"
    return file_

def main():
    dictionary = manipulate_file()
    file_ = rewrite(dictionary)
    with open("output.csv", "w") as f:
        f.write(file_)

main()
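For comparison, the same grouping can be sketched with the csv module, which also handles quoting for you; the input file is recreated from the question's sample data so the sketch runs end to end.

```python
import csv
from collections import defaultdict

# Recreate the question's input as sample.csv (illustrative data only).
with open("sample.csv", "w", newline="") as f:
    f.write("1,abc\n1,htf\n2,kji\n3,wdc\n3,vnc\n3,acd\n4,mef\n5,klm\n5,def\n")

def group_by_id(in_path, out_path):
    # Collect every data value under its id, preserving first-seen order.
    grouped = defaultdict(list)
    with open(in_path, newline="") as f:
        for id_, value in csv.reader(f):
            grouped[id_].append(value)
    # One output row per id: the id followed by all of its values.
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        for id_, values in grouped.items():
            writer.writerow([id_] + values)

group_by_id("sample.csv", "output.csv")
```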
I am reading multiple CSV files and combining them into one CSV file. The desired outcome of the combined data looks like the following:
0 4 6 8 10 12
1 2 5 4 2 1
5 3 0 1 5 10
....
In the following code, I intend the columns to go 0, 4, 6, 8, 10, 12.
for indx, file in enumerate(files_File1):
    if file.endswith('csv'):  # reading the csv files in the designated folder
        filepath = os.path.join(folder_File1, file)
        current = pd.read_csv(filepath, header=None)
        if indx == 0:
            mydata_File1 = current.copy()
            mydata_File1.columns.values[1] = 4
            print(mydata_File1.columns.values)
        else:
            mydata_File1[2*indx+4] = current.iloc[:, 1]
            print(mydata_File1.columns.values)
But instead, the outcome looks like this, where the columns go 0, 2, 4, 6, 8, 10, 12:
0 4 2 6 8 10 12
1 2 5 4 2 1
5 3 0 1 5 10
....
I am not quite sure what causes the column named "2".
Any ideas?
If there is some reason you need pandas, then this will work. Your code references mydata_File1.columns.values, which holds the column names, not the values in the columns. If this doesn't answer your question, please provide a more complete example, per @juanpa.arrivillaga's comment.
#! python3
import os
import pandas as pd
import glob

folder_File1 = r"C:\Users\Public\Documents\Python\CombineCSVFiles"
csv_only = r"\*.csv"
files_File1 = glob.glob(f'{folder_File1}{csv_only}')
new_csv = f'{folder_File1}\\newcsv.csv'

mydata_File1 = []
for indx, file in enumerate(files_File1):
    if file == new_csv:
        pass
    else:
        current = pd.read_csv(file, header=None)  # read each csv in the designated folder
        print(current)
        if indx == 0:
            mydata_File1 = current.copy()
            print(mydata_File1.values)
        else:
            mydata_File1 = mydata_File1.append(current, ignore_index=True)
            print(mydata_File1.values)
mydata_File1.to_csv(new_csv)
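Note that DataFrame.append was deprecated and later removed in pandas 2.0; a sketch of the same combine step with pd.concat (the folder and sample files below are made up so the snippet runs on its own):

```python
import glob
import os
import pandas as pd

# Build two small sample files so the sketch is self-contained (illustrative data).
os.makedirs("combine_demo", exist_ok=True)
with open("combine_demo/a.csv", "w") as f:
    f.write("1,2\n5,3\n")
with open("combine_demo/b.csv", "w") as f:
    f.write("4,6\n0,1\n")

files = sorted(glob.glob("combine_demo/*.csv"))
frames = [pd.read_csv(f, header=None) for f in files]
combined = pd.concat(frames, ignore_index=True)  # replaces the removed DataFrame.append
combined.to_csv("combine_demo/newcsv.csv", index=False, header=False)
```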
If you are really just trying to combine .csv files, there is no need for pandas.
#! python3
import glob

folder_File1 = r"C:\Users\Public\Documents\Python\CombineCSVFiles"
csv_only = r"\*.csv"
files_File1 = glob.glob(f'{folder_File1}{csv_only}')
new_csv = f'{folder_File1}\\newcsv.csv'

lines = []
for file in files_File1:
    with open(file) as filein:
        if filein.name == new_csv:
            pass
        else:
            for line in filein:
                line = line.strip()  # or some other preprocessing
                lines.append(line)   # storing everything in memory!

with open(new_csv, 'w') as out_file:
    out_file.writelines(line + u'\n' for line in lines)
I have 13 files with the extension .las. Each has 80 columns and 180 thousand lines.
I need to read the files sequentially, one after the other: the first, then the second, and so on.
Below is my script, in which I process the data from the files.
At the end, the program must write the data to a file with the same extension.
I am programming with pandas in a Jupyter notebook.
Thank you in advance for your response!
import pandas as pd
from scipy import stats

cols = ['IK05', 'IK20', 'DA20', 'LLS', 'LLD', 'STP']
data = pd.read_table("data/1.las", delim_whitespace=True, na_values='-999.25', index_col=False)
ndata = data.STP_AX.as_matrix(columns=None)
nstop = 1
stop = 1
for i in range(len(ndata)):
    if ndata[i] > 0.1:
        stop = 0
        ndata[i] = nstop
    else:
        if stop == 0:
            stop = 1
            nstop = nstop + 1
data.STP = ndata
df = data[cols]
df1 = df.groupby('STP')
df1.head()
dfp = pd.DataFrame()
for name, group in df1:
    k, p = stats.mstats.normaltest(group[5:-5])
    dfp[name] = p
Assuming you don't want to create a variable for each file name, put them all in a list. Here is an example using 3 files:
d = ["file1.las", "file2.las", "file3.las"]
Assuming your output file is output.las, open it using 'w+', which allows both writing and appending:
output = open("output.las", 'w+')
Now you can use a for loop to open each file, process it, and write the processed data to the output file:
for i in d:
    file = open(i, 'r')
    contents = file.read()
    # ... your processing here ...
    output.write(processedData)
    file.close()
Finally, you might want to close the output file:
output.close()
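The same loop can be sketched with context managers, which close every file automatically even if the processing raises an error. The file names and the process function below are placeholders, and tiny sample inputs are created first so the sketch runs end to end:

```python
# Create tiny sample inputs so the sketch is self-contained (placeholder content).
input_files = ["file1.las", "file2.las", "file3.las"]
for name in input_files:
    with open(name, "w") as f:
        f.write(name + "\n")

def process(contents):
    # Stand-in for the real per-file processing step.
    return contents.upper()

with open("output.las", "w") as output:
    for name in input_files:
        with open(name) as f:
            output.write(process(f.read()))
```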
I have a CSV file which has only a single column, which acts as my input.
I use that input to compute my outputs; I have multiple outputs and I need them written to another CSV file.
Can anyone please suggest ways to do this?
Here is the code:
import urllib.request

jd = {input 1}
# Some code to find the outputs - a, b, c, d, e
# Code to write the outputs to a csv file.
# Repeat with the next input from the input csv file.
The input CSV file has only a single column and is represented below:
1
2
3
4
5
The output would be in a separate CSV, in the format given below: multiple rows and multiple columns.
a b c d e
Here is a simple example:
data.csv is a CSV with one column and multiple rows.
results.csv contains the mean and median of the input: a CSV with 1 row and 2 columns (the mean in the 1st column, the median in the 2nd).
Example:
import numpy as np
import pandas as pd
import csv

# load the data
data = pd.read_csv("data.csv", header=None)

# calculate things for the 1st column, which holds the data
calculate_mean = np.mean(data.loc[:, 0])
calculate_median = np.median(data.loc[:, 0])
row = [calculate_mean, calculate_median]

# write results to csv (text mode with newline='' for csv in Python 3)
with open("results.csv", "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerow(row)
In pseudo code, you'll do something like this:
for each_file in a_folder_that_contains_csv:  # go through all the input csv files
    with open(each_file) as csv_file, open(other_file, 'w') as output_file:  # open each csv file, and a new output file
        process_the_input_from_each_csv  # process the data read from the csv_file
        export_to_output_file            # export the data to the new csv file
Now, I won't write a full working example, because it's better for you to start digging and ask specific questions once you have some. Right now you're just asking: "write this for me because I don't know Python."
Here is the official Python documentation;
here you can read about the csv module;
here you can read about the os module.
I think you need read_csv for reading the file into a Series, and to_csv for writing each output to a file, looping over the Series with Series.iteritems.
# file content:
# 1
# 3
# 5
s = pd.read_csv('file', squeeze=True, names=['a'])
print(s)
0    1
1    3
2    5
Name: a, dtype: int64

for i, val in s.iteritems():
    #print(val)
    # some operation with the scalar value val
    df = pd.DataFrame({'a': np.arange(val)})
    df['a'] = df['a'] * 10
    print(df)
    # write to csv, file named by val
    df.to_csv(str(val) + '.csv', index=False)
   a
0  0
   a
0   0
1  10
2  20
   a
0   0
1  10
2  20
3  30
4  40
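Note that in current pandas versions the squeeze argument of read_csv and Series.iteritems have been removed; a sketch of the same loop against the modern API (recreating the same made-up input file first):

```python
import numpy as np
import pandas as pd

# Recreate the answer's example input file.
with open('file', 'w') as f:
    f.write('1\n3\n5\n')

s = pd.read_csv('file', names=['a'])['a']  # select the column instead of squeeze=True
for i, val in s.items():                   # Series.items() replaces iteritems()
    df = pd.DataFrame({'a': np.arange(val) * 10})
    df.to_csv(str(val) + '.csv', index=False)
```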
I have the following code, which writes a data output to a CSV file. I want to add a date column at the end of each row, using the same yesterday.strftime value that I am already using to create the filename. Thanks!
My current output is:
columnA
1
2
and I want to add the following column:
Date
2/5/2016
2/5/2016
.
.
.
Code:
filepath = 'C:\\test\\'
filename = yesterday.strftime('%Y-%m-%d') + '_' + 'test.csv'
f = open(filename, 'wt')
writer = csv.writer(f, lineterminator='\n')
header = [h['name'][3:] for h in results.get('columnHeaders')]
writer.writerow(header)
print(''.join('%30s' % h for h in header))
# Write data table.
if results.get('rows', []):
    for row in results.get('rows'):
        writer.writerow(row)
        print(''.join('%30s' % r for r in row))
else:
    print('No Rows Found')
f.close()
In [26]: import pandas as pd
In [27]: import datetime
In [28]: a = pd.read_csv('a.csv')
In [29]: a
Out[29]:
columnA
0 1
1 2
In [30]: a['Date'] = [datetime.date.today()]*len(a)
In [31]: a
Out[31]:
columnA Date
0 1 2016-02-05
1 2 2016-02-05
In [32]: a.to_csv('adate.csv')
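If you would rather stay with the csv module from the question, here is a sketch that appends the date to each row as it is written; the rows below are stand-ins for results.get('rows'):

```python
import csv
import datetime

yesterday = datetime.date.today() - datetime.timedelta(days=1)
date_str = yesterday.strftime('%Y-%m-%d')

rows = [[1], [2]]  # stand-in for results.get('rows')

with open(date_str + '_test.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['columnA', 'Date'])  # header with the extra Date column
    for row in rows:
        writer.writerow(row + [date_str])  # append the date to every data row
```

This keeps the filename and the per-row date in sync, since both come from the same date_str.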
For general background: https://www.airpair.com/python/posts/top-mistakes-python-big-data-analytics