io.StringIO vs open() in Python 3 - python

All I could find is this statement:
The easiest way to create a text stream is with open(), optionally
specifying an encoding:
f = open("myfile.txt", "r", encoding="utf-8")
In-memory text streams are also available as StringIO objects:
f = io.StringIO("some initial text data")
But this gives no insight at all on when I should use open() over io.StringIO and vice-versa. I know that they do not work exactly the same behind the scene. But why would someone go for open() in Python 3 ?

The difference is: open takes a file name (and some other arguments like mode or encoding), io.StringIO takes a plain string and both return file-like objects.
Hence:
Use open to read files ;
Use StringIO when you need a file-like object and you want to pass the content of a string.
An example with StringIO:
import csv
import io
reader = csv.reader(io.StringIO("a,b,c\n1,2,3"))
print ([r for r in reader])
# output [['a', 'b', 'c'], ['1', '2', '3']]
It's very useful because you can use a string where a file was expected.
In the usual case, with a csv file on your disk, you would write something like:
with open(<path/to/file.csv>, ...) as f:
reader = csv.reader(f, ...)

Related

Creating Excel-compatible utf8-CSV with io.String

I want to create a CSV string which is being sent by mail. I'd like to avoid temporary files because it's just not necessary for me to deal with the file system. As I said, the data is passed to a mail class.
I found this neat solution
output = io.StringIO()
writer = csv.writer(output, delimiter=';', dialect='excel-tab')
writer.writerows(data)
It works perfectly except that it creates a UTF-8 file which will have messed up special characters if you open it in Excel.
I tried to pass a BOM to the StringIO constructor, but nothing worked:
output = io.StringIO('\ufeff')
I tried to somehow set the encoding to utf-8-sig but I couldn't find a way except using a file...
Any ideas how to solve this problem?
Thanks!
StringIO is strings which means unicode. To do what you want, I think you need to use BytesIO instead.
After reading this and this I think the problem is that StringIO is strings only and therefore no encoding.
You could try this [extrapolated from the first link above]:
bio = io.BytesIO()
StreamWriter = codecs.getwriter('utf-8-sig')
wrapper_file = StreamWriter(bio)
csv.writer(wrapper_file,...
writer.writerows(data)
Python character encoding makes my head hurt...

Convert bytes to a file object in python

I have a small application that reads local files using:
open(diefile_path, 'r') as csv_file
open(diefile_path, 'r') as file
and also uses linecache module
I need to expand the use to files that send from a remote server.
The content that is received by the server type is bytes.
I couldn't find a lot of information about handling IOBytes type and I was wondering if there is a way that I can convert the bytes chunk to a file-like object.
My goal is to use the API is specified above (open,linecache)
I was able to convert the bytes into a string using data.decode("utf-8"),
but I can't use the methods above (open and linecache)
a small example to illustrate
data = 'b'First line\nSecond line\nThird line\n'
with open(data) as file:
line = file.readline()
print(line)
output:
First line
Second line
Third line
can it be done?
open is used to open actual files, returning a file-like object. Here, you already have the data in memory, not in a file, so you can instantiate the file-like object directly.
import io
data = b'First line\nSecond line\nThird line\n'
file = io.StringIO(data.decode())
for line in file:
print(line.strip())
However, if what you are getting is really just a newline-separated string, you can simply split it into a list directly.
lines = data.decode().strip().split('\n')
The main difference is that the StringIO version is slightly lazier; it has a smaller memory foot print compared to the list, as it splits strings off as requested by the iterator.
The answer above that using StringIO would need to specify an encoding, which may cause wrong conversion.
from Python Documentation using BytesIO:
from io import BytesIO
f = BytesIO(b"some initial binary data: \x00\x01")

Convert file into BytesIO object using python

I have a file and want to convert it into BytesIO object so that it can be stored in database's varbinary column.
Please can anyone help me convert it using python.
Below is my code:
f = open(filepath, "rb")
print(f.read())
myBytesIO = io.BytesIO(f)
myBytesIO.seek(0)
print(type(myBytesIO))
Opening a file with open and mode read-binary already gives you a Binary I/O object.
Documentation:
The easiest way to create a binary stream is with open() with 'b' in the mode string:
f = open("myfile.jpg", "rb")
So in normal circumstances, you'd be fine just passing the file handle wherever you need to supply it. If you really want/need to get a BytesIO instance, just pass the bytes you've read from the file when creating your BytesIO instance like so:
from io import BytesIO
with open(filepath, "rb") as fh:
buf = BytesIO(fh.read())
This has the disadvantage of loading the entire file into memory, which might be avoidable if the code you're passing the instance to is smart enough to stream the file without keeping it in memory. Note that the example uses open as a context manager that will reliably close the file, even in case of errors.

Write to a csv file multiple times? [duplicate]

I am trying to add a new row to my old CSV file. Basically, it gets updated each time I run the Python script.
Right now I am storing the old CSV rows values in a list and then deleting the CSV file and creating it again with the new list value.
I wanted to know are there any better ways of doing this.
with open('document.csv','a') as fd:
fd.write(myCsvRow)
Opening a file with the 'a' parameter allows you to append to the end of the file instead of simply overwriting the existing content. Try that.
I prefer this solution using the csv module from the standard library and the with statement to avoid leaving the file open.
The key point is using 'a' for appending when you open the file.
import csv
fields=['first','second','third']
with open(r'name', 'a') as f:
writer = csv.writer(f)
writer.writerow(fields)
If you are using Python 2.7 you may experience superfluous new lines in Windows. You can try to avoid them using 'ab' instead of 'a' this will, however, cause you TypeError: a bytes-like object is required, not 'str' in python and CSV in Python 3.6. Adding the newline='', as Natacha suggests, will cause you a backward incompatibility between Python 2 and 3.
Based in the answer of #G M and paying attention to the #John La Rooy's warning, I was able to append a new row opening the file in 'a'mode.
Even in windows, in order to avoid the newline problem, you must declare it as newline=''.
Now you can open the file in 'a'mode (without the b).
import csv
with open(r'names.csv', 'a', newline='') as csvfile:
fieldnames = ['This','aNew']
writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
writer.writerow({'This':'is', 'aNew':'Row'})
I didn't try with the regular writer (without the Dict), but I think that it'll be ok too.
If you use pandas, you can append your dataframes to an existing CSV file this way:
df.to_csv('log.csv', mode='a', index=False, header=False)
With mode='a' we ensure that we append, rather than overwrite, and with header=False we ensure that we append only the values of df rows, rather than header + values.
Are you opening the file with mode of 'a' instead of 'w'?
See Reading and Writing Files in the python docs
7.2. Reading and Writing Files
open() returns a file object, and is most commonly used with two arguments: open(filename, mode).
>>> f = open('workfile', 'w')
>>> print f <open file 'workfile', mode 'w' at 80a0960>
The first argument is a string containing the filename. The second argument is
another string containing a few characters describing the way in which
the file will be used. mode can be 'r' when the file will only be
read, 'w' for only writing (an existing file with the same name will
be erased), and 'a' opens the file for appending; any data written to
the file is automatically added to the end. 'r+' opens the file for
both reading and writing. The mode argument is optional; 'r' will be
assumed if it’s omitted.
On Windows, 'b' appended to the mode opens the file in binary mode, so
there are also modes like 'rb', 'wb', and 'r+b'. Python on Windows
makes a distinction between text and binary files; the end-of-line
characters in text files are automatically altered slightly when data
is read or written. This behind-the-scenes modification to file data
is fine for ASCII text files, but it’ll corrupt binary data like that
in JPEG or EXE files. Be very careful to use binary mode when reading
and writing such files. On Unix, it doesn’t hurt to append a 'b' to
the mode, so you can use it platform-independently for all binary
files.
If the file exists and contains data, then it is possible to generate the fieldname parameter for csv.DictWriter automatically:
# read header automatically
with open(myFile, "r") as f:
reader = csv.reader(f)
for header in reader:
break
# add row to CSV file
with open(myFile, "a", newline='') as f:
writer = csv.DictWriter(f, fieldnames=header)
writer.writerow(myDict)
I use the following approach to append a new line in a .csv file:
pose_x = 1
pose_y = 2
with open('path-to-your-csv-file.csv', mode='a') as file_:
file_.write("{},{}".format(pose_x, pose_y))
file_.write("\n") # Next line.
[NOTE]:
mode='a' is append mode.
# I like using the codecs opening in a with
field_names = ['latitude', 'longitude', 'date', 'user', 'text']
with codecs.open(filename,"ab", encoding='utf-8') as logfile:
logger = csv.DictWriter(logfile, fieldnames=field_names)
logger.writeheader()
# some more code stuff
for video in aList:
video_result = {}
video_result['date'] = video['snippet']['publishedAt']
video_result['user'] = video['id']
video_result['text'] = video['snippet']['description'].encode('utf8')
logger.writerow(video_result)

How can I use io.StringIO() with the csv module?

I tried to backport a Python 3 program to 2.7, and I'm stuck with a strange problem:
>>> import io
>>> import csv
>>> output = io.StringIO()
>>> output.write("Hello!") # Fail: io.StringIO expects Unicode
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unicode argument expected, got 'str'
>>> output.write(u"Hello!") # This works as expected.
6L
>>> writer = csv.writer(output) # Now let's try this with the csv module:
>>> csvdata = [u"Hello", u"Goodbye"] # Look ma, all Unicode! (?)
>>> writer.writerow(csvdata) # Sadly, no.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unicode argument expected, got 'str'
According to the docs, io.StringIO() returns an in-memory stream for Unicode text. It works correctly when I try and feed it a Unicode string manually. Why does it fail in conjunction with the csv module, even if all the strings being written are Unicode strings? Where does the str come from that causes the Exception?
(I do know that I can use StringIO.StringIO() instead, but I'm wondering what's wrong with io.StringIO() in this scenario)
The Python 2.7 csv module doesn't support Unicode input: see the note at the beginning of the documentation.
It seems that you'll have to encode the Unicode strings to byte strings, and use io.BytesIO, instead of io.StringIO.
The examples section of the documentation includes examples for a UnicodeReader and UnicodeWriter wrapper classes (thanks #AlexeyKachayev for the pointer).
Please use StringIO.StringIO().
http://docs.python.org/library/io.html#io.StringIO
http://docs.python.org/library/stringio.html
io.StringIO is a class. It handles Unicode. It reflects the preferred Python 3 library structure.
StringIO.StringIO is a class. It handles strings. It reflects the legacy Python 2 library structure.
I found this when I tried to serve a CSV file via Flask directly without creating the CSV file on the file system. This works:
import io
import csv
data = [[u'cell one', u'cell two'], [u'cell three', u'cell four']]
output = io.BytesIO()
writer = csv.writer(output, delimiter=',')
writer.writerows(data)
your_csv_string = output.getvalue()
See also
More about CSV
The Flask part
A few notes about Strings / Unicode
From csv documentation:
The csv module doesn’t directly support reading and writing Unicode,
but it is 8-bit-clean save for some problems with ASCII NUL
characters. So you can write functions or classes that handle the
encoding and decoding for you as long as you avoid encodings like
UTF-16 that use NULs. UTF-8 is recommended.
You can find example of UnicodeReader, UnicodeWriter here http://docs.python.org/2/library/csv.html
To use CSV reader/writer with 'memory files' in python 2.7:
from io import BytesIO
import csv
csv_data = """a,b,c
foo,bar,foo"""
# creates and stores your csv data into a file the csv reader can read (bytes)
memory_file_in = BytesIO(csv_data.encode(encoding='utf-8'))
# classic reader
reader = csv.DictReader(memory_file_in)
# writes a csv file
fieldnames = reader.fieldnames # here we use the data from the above csv file
memory_file_out = BytesIO() # create a memory file (bytes)
# classic writer (here we copy the first file in the second file)
writer = csv.DictWriter(memory_file_out, fieldnames)
for row in reader:
print(row)
writer.writerow(row)

Categories

Resources