Create a csv file from a python list - python

I have a list like the following, with \n separating the lines:
['Data,9,record,timestamp,"896018545",s,position_lat,"504719750",semicircles,position_long,"-998493490",semicircles,distance,"10.87",m,altitude,"285.79999999999995",m,speed,"1.773",m/s,unknown,"3929",,unknown,"1002",,enhanced_altitude,"285.79999999999995",m,enhanced_speed,"1.773",m/s,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,\n', 'Data,9,record,timestamp,"896018560",s,position_lat,"504717676",semicircles,position_long,"-998501870",semicircles,distance,"71.85",m,altitude,"285.0",m,speed,"5.533",m/s,unknown,"3924",,unknown,"1001",,enhanced_altitude,"285.0",m,enhanced_speed,"5.533",m/s,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,]
This is hard to read, so I need to extract the following out of this list into a CSV file in Python, in the format below, without using pandas:
timestamp, position_lat,altitude
"896018545","504719750","285.79999999999995"
"896018560","504717676","285.0"
I have the following, but I am confused about how to add the data into the CSV file:
import csv

header = ['timestamp', 'latitude', 'altitude']
with open('target.csv', 'w', encoding='UTF8', newline='') as f:
    writer = csv.writer(f)
    # write the header
    writer.writerow(header)
    # write the data

If I'm understanding your question correctly, all you need to do is write additional rows that include your data.
...
...
writer.writerow(["896018545","504719750","285.79999999999995"])
writer.writerow(["896018560","504717676","285.0"])
# alternatively,
data = [["896018545","504719750","285.79999999999995"],
["896018560","504717676","285.0"]]
...
...
for row in data:
writer.writerow(row)
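If you would rather pull the three values straight out of the original list instead of hard-coding the rows, here is a minimal sketch. It assumes every string in the list follows the repeating name,"value",units layout shown in the question; raw_rows stands in for your list. Note that csv.writer only quotes values when needed, so pass quoting=csv.QUOTE_ALL if you want every value quoted as in your example output.

import csv

raw_rows = [...]  # stand-in for your list of 'Data,9,record,...' strings

wanted = ['timestamp', 'position_lat', 'altitude']
header = ['timestamp', 'latitude', 'altitude']

with open('target.csv', 'w', encoding='UTF8', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(header)
    for line in raw_rows:
        fields = next(csv.reader([line]))  # csv.reader strips the quotes for us
        # build a name -> value mapping from the repeating name, value, units triplets
        record = {fields[i]: fields[i + 1] for i in range(3, len(fields) - 1, 3)}
        writer.writerow([record.get(name, '') for name in wanted])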

Related

Reading a CSV and writing the exact file back with one column changed

I'm looking to read a CSV file and make some calculations to the data in there (that part I have done).
After that, I need to write all the data into a new file exactly the way it is in the original, with the exception of one column, which will be changed to the result of the calculations.
I can't show the actual code (confidentiality issues) but here is an example of the code.
headings = ["losts", "of", "headings"]
with open("read_file.csv", mode="r") as read,\
open("write.csv", mode='w') as write:
reader = csv.DictReader(read, delimiter = ',')
writer = csv.DictWriter(write, fieldnames=headings)
writer.writeheader()
for row in reader:
writer.writerows()
At this stage I am just trying to return the same CSV in "write" as I have in "read"
I haven't used csv much, so I'm not sure if I'm going about it the wrong way. I also understand that this example is super simple, but I can't seem to get my head around the logic of it.
You're really close!
headings = ["losts", "of", "headings"]
with open("read_file.csv", mode="r") as read, open("write.csv", mode='w') as write:
reader = csv.DictReader(read, delimiter = ',')
writer = csv.DictWriter(write, fieldnames=headings)
writer.writeheader()
for row in reader:
# do processing here to change the values in that one column
processed_row = some_function(row)
writer.writerow(processed_row)
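For completeness, some_function above is just a placeholder for whatever calculation you need. A hypothetical version could look like the following (the column name 'value' is made up for illustration):

def some_function(row):
    # row is a dict keyed by the CSV headings; 'value' is a hypothetical column name
    row['value'] = float(row['value']) * 2
    return row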
Why don't you use pandas?
import pandas as pd
csv_file = pd.read_csv('read_file.csv')
# do_something_and_add_a_column_here()
csv_file.to_csv('write_file.csv')
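If you go the pandas route, the placeholder comment above could be a hypothetical one-liner like the sketch below ('value' is again an assumed column name); index=False keeps to_csv from writing an extra index column.

import pandas as pd

csv_file = pd.read_csv('read_file.csv')
csv_file['value'] = csv_file['value'] * 2  # hypothetical calculation on an assumed 'value' column
csv_file.to_csv('write_file.csv', index=False)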

CSV file formatting – start a new row after every nth entry

I am currently trying to organize data collected by a web scraper into a .csv file. The data is collected in a list and then written as csv.
My problem is that the program is writing the data into a single row in the file.
How can I tell the program to start a new row after for example every fifth entry?
Here is what my code looks like right now:
import csv

csvFile = open('practive.csv', 'w+')
try:
    writer = csv.writer(csvFile, delimiter=',')
    writer.writerow(('kaufpreis', 'ort', 'wohnfläche', 'zimmer', 'company'))
    writer.writerows([dataliste])
finally:
    csvFile.close()
Split your list into chunks of 5 elements, using one of the techniques in How do you split a list into evenly sized chunks?.
Then write the chunked list to the CSV file.
writer.writerows(chunks(dataliste, 5))
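chunks here is not a built-in; it refers to one of the helpers from the linked question. A minimal version, assuming dataliste is a flat list, might be:

def chunks(lst, n):
    # yield successive n-sized chunks from lst
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

writerows then accepts the resulting iterable of rows directly.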
It depends on how you are giving data to the writer:
import csv

csvFile = open('practive.csv', 'w+')
dataliste = [[1, 2, 3, 4, 5], [11, 22, 33, 44, 55]]
try:
    writer = csv.writer(csvFile, delimiter=',')
    writer.writerow(('kaufpreis', 'ort',
                     'wohnfläche', 'zimmer', 'company'))
    writer.writerows(data for data in dataliste)
finally:
    csvFile.close()

add a new column to an existing csv file

I have a csv file with 5 columns and I want to add data in a 6th column. The data I have is in an array.
Right now, the code that I have will insert the data I would want in the 6th column only AFTER all the data that already exists in the csv file.
For instance I have:
wind, site, date, time, value
10, 01, 01-01-2013, 00:00, 5.1
89.6 ---> this is the value I want to add in a 6th column but it puts it after all the data from the csv file
Here is the code I am using:
csvfile = 'filename'
with open(csvfile, 'a') as output:
    writer = csv.writer(output, lineterminator='\n')
    for val in data:
        writer.writerow([val])
I thought using 'a' would append the data in a new column, but instead it just puts it after ('under') all the other data... I don't know what to do!
Appending writes data to the end of a file, not to the end of each row.
Instead, create a new file and append the new value to each row.
csvfile = 'filename'
with open(csvfile, 'r', newline='') as fin, open('new_' + csvfile, 'w', newline='') as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout, lineterminator='\n')
    if you_have_headers:
        writer.writerow(next(reader) + [new_heading])
    for row, val in zip(reader, data):
        writer.writerow(row + [val])
On Python 2.x, remove the newline='' arguments and change the filemodes from 'r' and 'w' to 'rb' and 'wb', respectively.
Once you are sure this is working correctly, you can replace the original file with the new one:
import os
os.remove(csvfile) # not needed on unix
os.rename('new_'+csvfile, csvfile)
The csv module does not support writing or appending a column in place, so the only thing you can do is read from one file, append the 6th column's data to each row, and write to another file, as shown below:
with open('in.txt') as fin, open('out.txt', 'w') as fout:
    index = 0
    for line in fin:
        # append the next value to the row, just before its newline
        fout.write(line.replace('\n', ', ' + str(data[index]) + '\n'))
        index += 1
Here data is a list of ints. I tested this code in Python and it runs fine.
Say we have a CSV file, data.csv, with the following contents:
#data.csv
1,Joi,Python
2,Mark,Laravel
3,Elon,Wordpress
4,Emily,PHP
5,Sam,HTML
Now we want to add a column to this CSV file in which every entry contains the same value, i.e. Something text.
Example
from csv import writer
from csv import reader

new_column_text = 'Something text'
with open('data.csv', 'r') as read_object, \
        open('data_output.csv', 'w', newline='') as write_object:
    csv_reader = reader(read_object)
    csv_writer = writer(write_object)
    for row in csv_reader:
        row.append(new_column_text)
        csv_writer.writerow(row)
Output
#data_output.csv
1,Joi,Python,Something text
2,Mark,Laravel,Something text
3,Elon,Wordpress,Something text
4,Emily,PHP,Something text
5,Sam,HTML,Something text
The append mode of opening files is meant to add data to the end of a file. What you need to do is provide random access to your file writing; you need to use the seek() method.
You can see an example here:
http://www.tutorialspoint.com/python/file_seek.htm
or read the Python docs on it here: https://docs.python.org/2.4/lib/bltin-file-objects.html (which aren't terribly useful).
If you want to add to the end of a column, you may want to open the file, read a line to figure out its length, then seek to the end.

Python: appending/merging multiple csv files respecting headers and write to csv

[Using Python 3] I'm very new to (Python) programming, but nonetheless I am writing a script that scans a folder for certain CSV files; I want to read them all, append them, and write them into another CSV file.
In between, data should be returned only where the values in a certain column match set criteria.
All the CSV files have the same columns and look something like this:
header1 header2 header3 header4 ...
string float string float ...
string float string float ...
string float string float ...
string float string float ...
... ... ... ... ...
The code I'm working with right now is the following (below); however, it just keeps overwriting the data from the previous file. That does make sense to me; I just cannot figure out how to get it working, though.
Code:
import csv
import datetime
import sys
import glob
import itertools
from collections import defaultdict

# Raw data files have names like '2013-06-04'. To be able to use this script during the whole of 2013,
# the glob is set to search for the pattern '2013-*.csv'
files = [f for f in glob.glob('2013-*.csv')]

# Output file looks like '20130620-filtered.csv'
outfile = '{:%Y%m%d}-filtered.csv'.format(datetime.datetime.now())

# List of 'Header4' values to be filtered for writing output
header4 = ['string1', 'string2', 'string3', 'string4']

for f in files:
    with open(f, 'r') as f_in:
        dict_reader = csv.DictReader(f_in)
        with open(outfile, 'w') as f_out:
            dict_writer = csv.DictWriter(f_out, lineterminator='\n', fieldnames=dict_reader.fieldnames)
            dict_writer.writeheader()
            for row in dict_reader:
                if row['Campaign'] in campaign_names:
                    dict_writer.writerow(row)
I also tried something like readers = list(itertools.chain(*map(lambda f: csv.DictReader(open(f)), files))) and iterating over the readers, but then I cannot figure out how to work with the headers (I get an error that itertools.chain() does not have the fieldnames attribute).
Any help is very much appreciated!
You keep re-opening the file and overwriting it.
Open outfile once, before your loops start. For the first file you read, write the header and the rows. For rest of the files, just write the rows.
Something like
with open(outfile, 'w') as f_out:
    dict_writer = None
    for f in files:
        with open(f, 'r') as f_in:
            dict_reader = csv.DictReader(f_in)
            if not dict_writer:
                dict_writer = csv.DictWriter(f_out, lineterminator='\n', fieldnames=dict_reader.fieldnames)
                dict_writer.writeheader()
            for row in dict_reader:
                if row['Campaign'] in campaign_names:
                    dict_writer.writerow(row)

Parsing a pipe-delimited file in Python

I'm trying to parse a pipe-delimited file and pass the values into a list, so that later I can print selective values from the list.
The file looks like:
name|age|address|phone|||||||||||..etc
It has more than 100 columns.
Use the 'csv' library.
First, register your dialect:
import csv
csv.register_dialect('piper', delimiter='|', quoting=csv.QUOTE_NONE)
Then, use your dialect on the file:
with open(myfile, "rb") as csvfile:
for row in csv.DictReader(csvfile, dialect='piper'):
print row['name']
Use Pandas:
import pandas as pd
pd.read_csv(filename, sep="|")
This stores the file in a DataFrame. For each column, you can apply conditions to select the required values to print. It executes very quickly; I tried it with 111,047 rows.
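As a hypothetical example of such a condition (the 'name' and 'age' column names are assumed from the sample header line in the question):

import pandas as pd

df = pd.read_csv(filename, sep='|')
# select the 'name' values of all rows where 'age' is greater than 30 (assumed columns)
print(df.loc[df['age'] > 30, 'name'])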
If you're parsing a very simple file that won't contain any | characters in the actual field values, you can use split:
fileHandle = open('file', 'r')
for line in fileHandle:
    fields = line.split('|')
    print(fields[0])  # prints the first field's value
    print(fields[1])  # prints the second field's value
fileHandle.close()
A more robust way to parse tabular data would be to use the csv library as mentioned in Spencer Rathbun's answer.
In 2022, with Python 3.8 or above, you can simply do:
import csv

with open(file_path, "r") as csvfile:
    reader = csv.reader(csvfile, delimiter='|')
    for row in reader:
        print(row[0], row[1])
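If you prefer to access fields by name, as in the dialect-based answer above, csv.DictReader accepts the same delimiter argument (the 'name' column comes from the question's sample header):

import csv

with open(file_path, "r", newline='') as csvfile:
    for row in csv.DictReader(csvfile, delimiter='|'):
        print(row['name'])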
