I have the following code which I am using to create and add a new row to a csv file.
def calcPrice(data):
    fieldnames = ["ReferenceID","clientName","Date","From","To","Rate","Price"]
    with open('rec2.csv', 'a') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerow(data)
    return
However, it adds the header as a new row every time as well. How can I prevent this?
Here's a link to the gist with the whole code: https://gist.github.com/chriskinyua/5ff8a527b31451ddc7d7cf157c719bba
You could check if the file already exists
import csv
import os

def calcPrice(data):
    filename = 'rec2.csv'
    # Only write the header if the file does not exist yet
    write_header = not os.path.exists(filename)
    fieldnames = ["ReferenceID","clientName","Date","From","To","Rate","Price"]
    with open(filename, 'a') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerow(data)
Let's assume there's a function we can call that will tell us whether we should write out the header or not, so the code would look like this:
import csv

def calcPrice(data):
    fieldnames = ["ReferenceID","clientName","Date","From","To","Rate","Price"]
    with open('rec2.csv', 'a') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        if should_write_header(csvfile):
            writer.writeheader()
        writer.writerow(data)
What will should_write_header look like? Here are three possibilities. For all of them, we will need to import the io module:
import io
The logic of all these functions is the same: we want to work out if the end of the file is the same as the beginning of the file. If that is true, then we want to write the header row.
This function is the most verbose: it finds the current position using the file's tell method, moves to the beginning of the file using its seek method, then runs tell again to see if the reported positions are the same. If they are not it seeks back to the end of the file before returning the result. We don't simply compare the value of EOF to zero because the Python docs state that the result of tell for text files does not necessarily correspond to the actual position of the file pointer.
def should_write_header1(fileobj):
    EOF = fileobj.tell()
    fileobj.seek(0, io.SEEK_SET)
    res = fileobj.tell() == EOF
    if not res:
        fileobj.seek(EOF, io.SEEK_SET)
    return res
This version assumes that while the tell method does not necessarily correspond to the position of the file pointer in general, tell will always return zero for an empty file. This will probably work in common cases.
def should_write_header2(fileobj):
    return fileobj.tell() == 0
This version accesses the tell method of the binary stream that TextIOWrapper (the text file object class) wraps. For binary streams, tell is documented to return the actual file pointer position. This removes the uncertainty of should_write_header2, but unfortunately buffer is not guaranteed to exist in all Python implementations, so this isn't portable.
def should_write_header3(fileobj):
    return fileobj.buffer.tell() == 0
So for 100% certainty, use should_write_header1. For less certainty but shorter code, use one of the others. If performance is a concern favour should_write_header3, because tell in binary streams is faster than tell in text streams.
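Putting the pieces together, the whole flow might look roughly like the sketch below. It uses the should_write_header2 logic; append_row and FIELDNAMES are names made up for this illustration, and newline='' assumes Python 3.

```python
import csv

FIELDNAMES = ["ReferenceID", "clientName", "Date", "From", "To", "Rate", "Price"]

def should_write_header(fileobj):
    # Same logic as should_write_header2: a position of 0 right after
    # opening in append mode is taken to mean the file is empty.
    return fileobj.tell() == 0

def append_row(filename, data, fieldnames=FIELDNAMES):
    # append_row is a hypothetical helper, not part of the original code.
    with open(filename, "a", newline="") as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        if should_write_header(csvfile):
            writer.writeheader()
        writer.writerow(data)
```

Calling append_row repeatedly on the same file should then produce one header line followed by one line per call.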
Related
Recently I came across a strange behavior of the with open() statement in Python.
The following code returns output just for the first read-statement, having an empty lines-list.
input_csv = []
with open(self.path, 'r') as f:  # Opening the CSV
    r = csv.DictReader(f)
    for row in r:
        input_csv.append(row)  # Storing its contents in a dictionary for later use
    lines = f.readlines()  # Reading it in as a list too
    f.close()
While splitting it into two open() statements returns the objects as desired.
input_csv = []
with open(self.path, 'r') as f:  # Opening the CSV
    r = csv.DictReader(f)
    for row in r:
        input_csv.append(row)  # Storing its contents in a dictionary for later use
    f.close()
with open(self.path, 'r') as f:  # Opening the CSV
    lines = f.readlines()  # Reading it in as a list too
    f.close()
Why is the f variable just used once in the first statement?
Many thanks
If you look at the documentation of csv.reader(), which DictReader uses internally (as its reader attribute):
Return a reader object which will iterate over lines in the given csvfile. csvfile can be any object which supports the iterator protocol and returns a string each time its __next__() method is called...
Hence, it relies on the behavior of the file-like object, for which each iteration is essentially f.readline(). That operation also advances the current position in the file until EOF is reached, at which point the iteration raises a StopIteration exception. It is the same behavior you would observe trying:
with open(self.path, 'r') as f:
    for l in f:
        pass  # each line was read
    print(f.readlines())
You can add print(f.tell()) to see how the position changes as you execute each line.
If you (re)open a new file, you start at position 0 (again). If you've read through once and wanted to use the same handle again, you need to return to the beginning of the file: f.seek(0).
Note: you really do not need to perform f.close() in a managed context using with. Once you leave it, it'll close the file handle for you.
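As a minimal sketch of the f.seek(0) approach, assuming Python 3 (read_rows_and_lines is a hypothetical helper name used only for illustration):

```python
import csv

def read_rows_and_lines(path):
    # Reads the same file twice through one handle by rewinding in between.
    with open(path, "r", newline="") as f:
        rows = list(csv.DictReader(f))  # consumes the file up to EOF
        f.seek(0)                       # return to the beginning of the file
        lines = f.readlines()           # now yields every line, header included
    return rows, lines
```

No f.close() is needed here, since the with block closes the handle on exit.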
I am trying to add a new row to my old CSV file. Basically, it gets updated each time I run the Python script.
Right now I am storing the old CSV rows values in a list and then deleting the CSV file and creating it again with the new list value.
I wanted to know are there any better ways of doing this.
with open('document.csv','a') as fd:
    fd.write(myCsvRow)
Opening a file with the 'a' parameter allows you to append to the end of the file instead of simply overwriting the existing content. Try that.
I prefer this solution using the csv module from the standard library and the with statement to avoid leaving the file open.
The key point is using 'a' for appending when you open the file.
import csv
fields=['first','second','third']
with open(r'name', 'a') as f:
    writer = csv.writer(f)
    writer.writerow(fields)
If you are using Python 2.7 you may experience superfluous new lines on Windows. You can try to avoid them by using 'ab' instead of 'a'; this will, however, cause "TypeError: a bytes-like object is required, not 'str'" in Python 3.6. Adding newline='', as Natacha suggests, will cause a backward incompatibility between Python 2 and 3.
Based on the answer of @G M and paying attention to @John La Rooy's warning, I was able to append a new row by opening the file in 'a' mode.
Even on Windows, in order to avoid the newline problem, you must declare it as newline=''.
Now you can open the file in 'a' mode (without the b).
import csv
with open(r'names.csv', 'a', newline='') as csvfile:
    fieldnames = ['This','aNew']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writerow({'This':'is', 'aNew':'Row'})
I didn't try with the regular writer (without the Dict), but I think that it'll be ok too.
If you use pandas, you can append your dataframes to an existing CSV file this way:
df.to_csv('log.csv', mode='a', index=False, header=False)
With mode='a' we ensure that we append, rather than overwrite, and with header=False we ensure that we append only the values of df rows, rather than header + values.
Are you opening the file with mode of 'a' instead of 'w'?
See Reading and Writing Files in the python docs
7.2. Reading and Writing Files
open() returns a file object, and is most commonly used with two arguments: open(filename, mode).
>>> f = open('workfile', 'w')
>>> print f
<open file 'workfile', mode 'w' at 80a0960>
The first argument is a string containing the filename. The second argument is
another string containing a few characters describing the way in which
the file will be used. mode can be 'r' when the file will only be
read, 'w' for only writing (an existing file with the same name will
be erased), and 'a' opens the file for appending; any data written to
the file is automatically added to the end. 'r+' opens the file for
both reading and writing. The mode argument is optional; 'r' will be
assumed if it’s omitted.
On Windows, 'b' appended to the mode opens the file in binary mode, so
there are also modes like 'rb', 'wb', and 'r+b'. Python on Windows
makes a distinction between text and binary files; the end-of-line
characters in text files are automatically altered slightly when data
is read or written. This behind-the-scenes modification to file data
is fine for ASCII text files, but it’ll corrupt binary data like that
in JPEG or EXE files. Be very careful to use binary mode when reading
and writing such files. On Unix, it doesn’t hurt to append a 'b' to
the mode, so you can use it platform-independently for all binary
files.
If the file exists and contains data, then it is possible to generate the fieldnames parameter for csv.DictWriter automatically:
# read header automatically
with open(myFile, "r") as f:
    reader = csv.reader(f)
    for header in reader:
        break
# add row to CSV file
with open(myFile, "a", newline='') as f:
    writer = csv.DictWriter(f, fieldnames=header)
    writer.writerow(myDict)
I use the following approach to append a new line in a .csv file:
pose_x = 1
pose_y = 2
with open('path-to-your-csv-file.csv', mode='a') as file_:
    file_.write("{},{}".format(pose_x, pose_y))
    file_.write("\n")  # Next line.
[NOTE]:
mode='a' is append mode.
# I like using the codecs opening in a with
import codecs
import csv

field_names = ['latitude', 'longitude', 'date', 'user', 'text']
with codecs.open(filename,"ab", encoding='utf-8') as logfile:
    logger = csv.DictWriter(logfile, fieldnames=field_names)
    logger.writeheader()
    # some more code stuff
    for video in aList:
        video_result = {}
        video_result['date'] = video['snippet']['publishedAt']
        video_result['user'] = video['id']
        video_result['text'] = video['snippet']['description'].encode('utf8')
        logger.writerow(video_result)
I have a large CSV file (~250000 rows) and before I work on fully parsing and sorting it I was trying to display only a part of it by writing it to a text file.
csvfile = open(file_path, "rb")
rows = csvfile.readlines()
text_file = open("output.txt", "w")
row_num = 0
while row_num < 20:
    text_file.write(", ".join(row[row_num]))
    row_num += 1
text_file.close()
I want to iterate through the CSV file and write only a small section of it to a text file so I can look at how it does this and see if it would be of any use to me. Currently the text file ends up empty.
A way I thought might do this would be to iterate through the file with a for loop that exits after a certain number of iteration but I could be wrong and I'm not sure how to do this, any ideas?
There's nothing specifically wrong with what you're doing, but it's not particularly Pythonic. In particular reading the whole file into memory with readlines() at the start seems pointless if you're only using 20 lines.
Instead you could use a for loop with enumerate and break when necessary.
csvfile = open(file_path, "r")
text_file = open("output.txt", "w")
for i, row in enumerate(csvfile):
    if i >= 20:  # stop once 20 rows have been written
        break
    text_file.write(row)
csvfile.close()
text_file.close()
You could further improve this by using with blocks to open the files, rather than closing them explicitly. For example:
with open(file_path, "rb") as csvfile:
    #your code here involving csvfile
#now the csvfile is closed!
Also note that Python might not be the best tool for this - you could do it directly from Bash, for example, with just head -n20 csvfile.csv > output.txt.
A simple solution would be to just do :
#!/usr/bin/python
# -*- encoding: utf-8 -*-
file_path = './test.csv'
with open(file_path, 'rb') as csvfile:
    with open('output.txt', 'wb') as textfile:
        for i, row in enumerate(csvfile):
            if i >= 20:  # stop after the first 20 rows
                break
            textfile.write(row)
Explanation:
with open(file_path, 'rb') as csvfile:
    with open('output.txt', 'wb') as textfile:
Instead of calling open and close yourself, it is recommended to use with. Just write the lines that you want to execute while the file is open at a new level of indentation.
'rb' and 'wb' are the modes that open a file for reading and for writing, respectively, in binary mode.
for i, row in enumerate(csvfile):
This line reads your CSV file line by line, and unpacking the tuple (i, row) gives you both the content of the row and its index. enumerate is one of Python's handy built-in functions; see its documentation for more about it.
Hope this helps !
EDIT : Note that Python has a CSV package that can do that without enumerate :
# -*- encoding: utf-8 -*-
import csv
file_path = './test.csv'
with open(file_path, 'rb') as csvfile:
    reader = csv.reader(csvfile)
    with open('output.txt', 'wb') as textfile:
        writer = csv.writer(textfile)
        i = 0
        while i < 20:
            row = next(reader)
            writer.writerow(row)
            i += 1
All we need to use is its reader and writer. They have functions next (that reads one line) and writerow (that writes one). Note that here, the variable row is not a string, but a list of strings, because the function does the split job by itself. It might be faster than the previous solution.
Also, this has the major advantage of allowing you to look anywhere you want in the file, not necessarily from the beginning (just change the bounds for i).
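One way to pick an arbitrary range of rows is itertools.islice. The sketch below assumes Python 3 (text mode with newline='' rather than 'rb'/'wb'); copy_rows, start and stop are hypothetical names chosen for this example.

```python
import csv
from itertools import islice

def copy_rows(src_path, dst_path, start=0, stop=20):
    # Copies rows start..stop-1 (0-based) from one CSV file to another.
    with open(src_path, "r", newline="") as src, \
         open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        # islice skips the first `start` rows and stops before row `stop`,
        # so the rest of the file is never read.
        writer.writerows(islice(reader, start, stop))
```

Unlike the while loop with next(reader), this does not raise StopIteration when the file has fewer rows than requested; it simply copies what is there.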
What is the 'Python way' of working with a CSV file? If I want to run some methods on the data in a particular column, should I copy the whole thing into an array, or should I pass the open file into a series of methods?
I tried to return the open file and got this error:
ValueError: I/O operation on closed file
here's the code:
import sys
import os
import csv

def main():
    pass

def openCSVFile(CSVFile, openMode):
    with open(CSVFile, openMode) as csvfile:
        zipreader = csv.reader(csvfile, delimiter=',')
    return zipreader

if __name__ == '__main__':
    zipfile = openCSVFile('propertyOutput.csv','rb')
    numRows = sum(1 for row in zipfile)
    print"Rows equals %d." % numRows
Well there are many ways you could go about manipulating csv files. It depends
largely on how big your data is and how often you will perform these operations.
I will build on the already good answers and comments to present a somewhat more
complex handling, that wouldn't be far off from a real world example.
First of all, I prefer csv.DictReader because most csv files have a header
row with the column names. csv.DictReader takes advantage of that and gives
you the opportunity to grab each cell value by its column name.
Also, most of the times you need to perform various validation and normalization
operations on said data, so we're going to associate some functions with specific
columns.
Suppose we have a csv with information about products.
e.g.
Product Name,Release Date,Price
foo product,2012/03/23,99.9
awesome product,2013/10/14,40.5
.... and so on ........
Let's write a program to parse it and normalize the values
into appropriate native python objects.
import csv
import datetime
from decimal import Decimal

def stripper(value):
    # Strip any whitespace from the left and right
    return value.strip()

def to_decimal(value):
    return Decimal(value)

def to_date(value):
    # We expect dates like: "2013/05/23"
    return datetime.datetime.strptime(value, '%Y/%m/%d').date()

# Functions to apply, in order, to each column's raw string value
OPERATIONS = {
    'Product Name': [stripper],
    'Release Date': [stripper, to_date],
    'Price': [stripper, to_decimal]
}

def parse_csv(filepath):
    with open(filepath, 'r') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            for column in row:
                operations = OPERATIONS[column]
                value = row[column]
                for op in operations:
                    value = op(value)
                # Print the cleaned value or store it somewhere
                print(value)
Things to note:
1) We operate on the csv in a line by line basis. DictReader yields lines
one at a time and that means we can handle arbitrary sizes of csv files,
since we are not going to load the whole file into memory.
2) You can go crazy with normalizing the values of a csv, by building special
classes with magic methods or whatnot. As I said, it depends on the complexity
of your files, the quality of the data and the operations you need to perform
on them.
Have fun.
The csv module provides one row at a time, parsing its content by splitting it into a list object (or a dict in the case of DictReader).
As Python knows how to loop over such an object, if you're just interested in some specific fields, building a list with these fields seems 'Pythonic' to me. Using an iterator is also valid if each item should be considered separately from the others.
You probably need to read PEP 343: The 'with' statement
Relevant quote:
Some standard Python objects now support the context management protocol and can be used with the 'with' statement. File objects are one example:
with open('/etc/passwd', 'r') as f:
    for line in f:
        print line
        ... more processing code ...
After this statement has executed, the file object in f will have been automatically closed, even if the 'for' loop raised an exception part-way through the block.
So your csvfile is already closed once you are outside the with statement, and therefore outside the openCSVFile function. You need to either not use the with statement,
def openCSVFile(CSVFile, openMode):
    csvfile = open(CSVFile, openMode)
    return csv.reader(csvfile, delimiter=',')
or move it to __main__:
def get_csv_reader(filelike):
    return csv.reader(filelike, delimiter=',')

if __name__ == '__main__':
    with open('propertyOutput.csv', 'rb') as csvfile:
        zipfile = get_csv_reader(csvfile)
        numRows = sum(1 for row in zipfile)
        print"Rows equals %d." % numRows
Firstly, the reason you're getting ValueError: I/O operation on closed file is that in the following, the with acting as a context manager is operating on an opened file which is the underlying fileobj that zipreader is then set to work on. What happens, is that as soon as the with block is exited, the file that was opened is then closed, which leaves the file unusable for zipreader to read from...
with open(CSVFile, openMode) as csvfile:
    zipreader = csv.reader(csvfile, delimiter=',')
return zipreader
Generally, acquire the resource and then pass it to a function if needed. So, in your main program, open the file and create the csv.reader, then pass that to whatever needs it, and have the file closed in the main program at the point where it makes sense to say "you're done with it now".
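Another option, not mentioned in the answers above, is to wrap the file handling in a generator, so the file stays open for exactly as long as rows are being consumed. This is only a sketch assuming Python 3; iter_csv_rows and count_rows are hypothetical names.

```python
import csv

def iter_csv_rows(path, delimiter=","):
    # Because this is a generator, the with block stays active (and the
    # file stays open) until the caller has consumed every row or the
    # generator is closed.
    with open(path, "r", newline="") as csvfile:
        for row in csv.reader(csvfile, delimiter=delimiter):
            yield row

def count_rows(path):
    # Rows are read lazily, one at a time, while the file is still open.
    return sum(1 for _ in iter_csv_rows(path))
```

With this shape, count_rows('propertyOutput.csv') would not hit the "I/O operation on closed file" error, because nothing reads from the file after the with block has exited.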
I am trying to parse a "pseudo-CSV" file with the python CSV reader, and am having some doubts about how to add some extra logic. The reason I call it a "pseudo-CSV" file is because some of the lines in the input file will have text (30-40 chars) before the actual CSV data starts. I am trying to figure out the best way to remove this text.
Currently, I have found 3 options for removing said text:
1. From Python, call grep and sed and pipe the output to a temp file which can then be fed to the csv reader. (Ugh, I would like to avoid this option)
2. Create a CSV dialect to remove the unwanted text. (This option just feels wrong)
3. Extend the File object, implementing the next() function to remove the unwanted text as necessary.
I have no control over how the input file is generated, so it's not an option to modify the generation.
Here is the related code I had when I realized the problem with the input file.
with open('myFile', 'r') as csvfile:
    theReader = csv.reader(csvfile)
    for row in theReader:
        # my logic here
If I go with option 3 above, the solution is quite straightforward, but
then I won't be able to incorporate the with open() syntax.
So, here is my question (2 actually): Is option 3 the best way to solve this
problem? If so, how can I incorporate it with the with open() syntax?
Edit: Forgot to mention that I'm using Python 2.7 on Linux.
csv.reader accepts an arbitrary iterable besides files:
with open('myFile', 'rb') as csvfile:
    reader = csv.reader(filter_line(line) for line in csvfile)
    for row in reader:
        # my logic here
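filter_line is left undefined above. A possible sketch follows, under a loud assumption: the question does not say how the unwanted prefix can be recognised, so this version pretends it always ends with a known marker string. MARKER, its value and read_pseudo_csv are all hypothetical, and the code assumes Python 3 text mode rather than 'rb'.

```python
import csv

MARKER = ">>"  # hypothetical: assumes the unwanted prefix always ends with this

def filter_line(line, marker=MARKER):
    # Drop everything up to and including the first marker; lines that do
    # not contain the marker are passed through unchanged.
    _, sep, rest = line.partition(marker)
    return rest if sep else line

def read_pseudo_csv(path):
    with open(path, "r", newline="") as csvfile:
        # csv.reader only needs an iterable of strings, so a generator
        # expression that cleans each line works in place of the file.
        reader = csv.reader(filter_line(line) for line in csvfile)
        return list(reader)
```

If the prefix is instead a fixed width or matches a pattern, only the body of filter_line would need to change.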
You can just use contextlib and create your own context manager.
from contextlib import contextmanager

@contextmanager
def csv_factory(filename, mode="r"):
    # setup here
    fileobj = open(filename, mode)
    reader = mycsv.reader(fileobj)
    try:
        yield reader  # return value for usage in with
    finally:
        fileobj.close()  # clean up here

with csv_factory("myFile") as csvfile:
    for line in csvfile:
        print(line)
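For reference, a self-contained variant of the same idea might look like this. It assumes Python 3 and that the standard csv module stands in for mycsv; open_csv_reader is a hypothetical name.

```python
import csv
from contextlib import contextmanager

@contextmanager
def open_csv_reader(filename, mode="r"):
    # Open the file, hand a csv reader to the with block, and make sure
    # the file is closed afterwards even if the block raises.
    fileobj = open(filename, mode, newline="")
    try:
        yield csv.reader(fileobj)
    finally:
        fileobj.close()
```

It is used the same way as csv_factory: with open_csv_reader("myFile") as reader, followed by a loop over reader inside the block.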