How can I use io.StringIO() with the csv module? - python

I tried to backport a Python 3 program to 2.7, and I'm stuck with a strange problem:
>>> import io
>>> import csv
>>> output = io.StringIO()
>>> output.write("Hello!") # Fail: io.StringIO expects Unicode
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unicode argument expected, got 'str'
>>> output.write(u"Hello!") # This works as expected.
6L
>>> writer = csv.writer(output) # Now let's try this with the csv module:
>>> csvdata = [u"Hello", u"Goodbye"] # Look ma, all Unicode! (?)
>>> writer.writerow(csvdata) # Sadly, no.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unicode argument expected, got 'str'
According to the docs, io.StringIO() returns an in-memory stream for Unicode text. It works correctly when I try and feed it a Unicode string manually. Why does it fail in conjunction with the csv module, even if all the strings being written are Unicode strings? Where does the str come from that causes the Exception?
(I do know that I can use StringIO.StringIO() instead, but I'm wondering what's wrong with io.StringIO() in this scenario)

The Python 2.7 csv module doesn't support Unicode input: see the note at the beginning of the documentation.
It seems that you'll have to encode the Unicode strings to byte strings and use io.BytesIO instead of io.StringIO.
The examples section of the documentation includes UnicodeReader and UnicodeWriter wrapper classes (thanks @AlexeyKachayev for the pointer).

Please use StringIO.StringIO().
http://docs.python.org/library/io.html#io.StringIO
http://docs.python.org/library/stringio.html
io.StringIO is a class. It handles Unicode. It reflects the preferred Python 3 library structure.
StringIO.StringIO is a class. It handles strings. It reflects the legacy Python 2 library structure.

I found this when I tried to serve a CSV file via Flask directly without creating the CSV file on the file system. This works:
import io
import csv
data = [[u'cell one', u'cell two'], [u'cell three', u'cell four']]
output = io.BytesIO()
writer = csv.writer(output, delimiter=',')
writer.writerows(data)
your_csv_string = output.getvalue()
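Under Python 3 the same pattern uses io.StringIO instead, since the csv module there reads and writes text; a minimal sketch of the equivalent:

```python
import csv
import io

data = [['cell one', 'cell two'], ['cell three', 'cell four']]

# In Python 3 the csv module works on text, so the in-memory
# buffer must be io.StringIO rather than io.BytesIO.
output = io.StringIO()
writer = csv.writer(output, delimiter=',')
writer.writerows(data)
your_csv_string = output.getvalue()
print(your_csv_string)
```

Note that csv.writer terminates rows with \r\n by default, which is what getvalue() returns.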

From the csv documentation:
The csv module doesn’t directly support reading and writing Unicode,
but it is 8-bit-clean save for some problems with ASCII NUL
characters. So you can write functions or classes that handle the
encoding and decoding for you as long as you avoid encodings like
UTF-16 that use NULs. UTF-8 is recommended.
You can find examples of UnicodeReader and UnicodeWriter here: http://docs.python.org/2/library/csv.html
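Those wrapper classes are Python 2 code, but the principle they implement — keep the csv work in text and move the UTF-8 encode/decode to the boundary — can be sketched with modern Python (the sample rows are illustrative):

```python
import csv
import io

rows = [['naïve', 'café'], ['héllo', 'wörld']]

# Do the CSV work on text, then encode the result to UTF-8
# (an encoding with no NUL bytes, as the docs recommend).
buf = io.StringIO()
csv.writer(buf).writerows(rows)
utf8_bytes = buf.getvalue().encode('utf-8')   # bytes, safe to store or send

# Decoding on the way back in restores the original text unchanged.
restored = list(csv.reader(io.StringIO(utf8_bytes.decode('utf-8'))))
print(restored)
```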

To use CSV reader/writer with 'memory files' in Python 2.7:
from io import BytesIO
import csv
csv_data = """a,b,c
foo,bar,foo"""
# creates and stores your csv data into a file the csv reader can read (bytes)
memory_file_in = BytesIO(csv_data.encode(encoding='utf-8'))
# classic reader
reader = csv.DictReader(memory_file_in)
# writes a csv file
fieldnames = reader.fieldnames # here we use the data from the above csv file
memory_file_out = BytesIO() # create a memory file (bytes)
# classic writer (here we copy the first file in the second file)
writer = csv.DictWriter(memory_file_out, fieldnames)
for row in reader:
    print(row)
    writer.writerow(row)
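For comparison, a sketch of the same round trip under Python 3, where csv works on text and StringIO replaces BytesIO (lineterminator='\n' is set only to keep the output byte-for-byte comparable with the input):

```python
import csv
from io import StringIO

csv_data = "a,b,c\nfoo,bar,foo"

# Python 3: the csv module reads and writes text, so the
# in-memory file is a StringIO and no encoding step is needed.
memory_file_in = StringIO(csv_data)
reader = csv.DictReader(memory_file_in)

memory_file_out = StringIO()
writer = csv.DictWriter(memory_file_out, reader.fieldnames, lineterminator='\n')
writer.writeheader()
for row in reader:
    writer.writerow(row)

print(memory_file_out.getvalue())
```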

Related

Read in-memory csv file with open() in python

I have a dataframe that I want to write from an Excel file to a CSV without saving it to disk. StringIO() looked like the clear solution. I then wanted to open the file-like object from memory with open(), but got a type error. How do I get around this type error and read the in-memory csv file with open()? Or, actually, is it wrong to assume that open() will work?
TypeError: expected str, bytes or os.PathLike object, not StringIO
The error cites the row below. From the code that's even further below.
f = open(writer_file)
To get a proper example I had to open the file "pandas_example" after creation. I then removed a row, then the code runs into an empty row.
import csv
import io
import pandas as pd
from pandas import util
df = util.testing.makeDataFrame()
df.to_excel('pandas_example.xlsx')
df_1 = pd.read_excel('pandas_example.xlsx')
writer_file = io.StringIO()
write_to_the_in_mem_file = csv.writer(writer_file, dialect='excel', delimiter=',')
write_to_the_in_mem_file.writerow(df_1)
f = open(writer_file)
while f.readline() not in (',,,,,,,,,,,,,,,,,,\n', '\n'):
    pass
final_df = pd.read_csv(f, header=None)
f.close()
Think of writer_file as what is returned from open(); you don't need to open it again.
For example:
import pandas as pd
from pandas import util
import io
# Create test file
df = util.testing.makeDataFrame()
df.to_excel('pandas_example.xlsx')
df_1 = pd.read_excel('pandas_example.xlsx')
writer_file = io.StringIO()
df_1.to_csv(writer_file)
writer_file.seek(0) # Rewind back to the start
for line in writer_file:
    print(line.strip())
The to_csv() call writes the dataframe into your memory file in CSV format.
After writer_file = io.StringIO(), writer_file is already a file-like object. It already has a readline method, as well as read, write, seek, etc. See io.TextIOBase, which io.StringIO inherits from.
In other words, open(writer_file) is unnecessary or, rather, it results in a type error (as you already observed).
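A minimal sketch of that point: the StringIO object itself already supports the file API, so reading back only requires a rewind, never an open() call:

```python
import io

writer_file = io.StringIO()
writer_file.write('a,b\n1,2\n')

# StringIO is already a text stream: seek/readline/read work directly.
writer_file.seek(0)                  # rewind before reading
first = writer_file.readline()
rest = writer_file.read()
print(first, rest)
```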

Convert bytes to a file object in python

I have a small application that reads local files using:
with open(diefile_path, 'r') as csv_file
with open(diefile_path, 'r') as file
and also uses the linecache module.
I need to expand this to files sent from a remote server.
The content received from the server is of type bytes.
I couldn't find much information about handling io.BytesIO and I was wondering if there is a way to convert the bytes chunk to a file-like object.
My goal is to use the API specified above (open, linecache).
I was able to convert the bytes into a string using data.decode("utf-8"), but I can't use the methods above (open and linecache).
A small example to illustrate:
data = b'First line\nSecond line\nThird line\n'
with open(data) as file:
    line = file.readline()
    print(line)
output:
First line
Second line
Third line
can it be done?
open is used to open actual files, returning a file-like object. Here, you already have the data in memory, not in a file, so you can instantiate the file-like object directly.
import io
data = b'First line\nSecond line\nThird line\n'
file = io.StringIO(data.decode())
for line in file:
    print(line.strip())
However, if what you are getting is really just a newline-separated string, you can simply split it into a list directly.
lines = data.decode().strip().split('\n')
The main difference is that the StringIO version is slightly lazier; it has a smaller memory footprint than the list, because it splits strings off as requested by the iterator.
The answer above using StringIO has to assume an encoding for the decode, which may produce a wrong conversion. If you want to keep the raw bytes, the Python documentation shows how to use BytesIO:
from io import BytesIO
f = BytesIO(b"some initial binary data: \x00\x01")
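When the bytes do need to be read as text with a known encoding, io.TextIOWrapper can layer the decoding on top of a BytesIO object; a sketch:

```python
import io

data = b'First line\nSecond line\nThird line\n'

# TextIOWrapper decodes the underlying bytes lazily, line by line,
# using the encoding you name explicitly instead of a guessed one.
text_file = io.TextIOWrapper(io.BytesIO(data), encoding='utf-8')
lines = [line.strip() for line in text_file]
print(lines)
```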

Import Data from scraping into CSV

I'm using pycharm and Python 3.7.
I would like to write data to a csv file, but my code writes just the first line of my data to the file... does anyone know why?
This is my code:
from pytrends.request import TrendReq
import csv
pytrend = TrendReq()
pytrend.build_payload(kw_list=['auto model A',
'auto model C'])
# Interest Over Time
interest_over_time_df = pytrend.interest_over_time()
print(interest_over_time_df.head(100))
writer = csv.writer(open("C:\\Users\\Desktop\\Data\\c.csv", 'w', encoding='utf-8'))
writer.writerow(interest_over_time_df)
Try using pandas:
import pandas as pd
interest_over_time_df.to_csv("file.csv")
Once I encountered the same problem and solved it like below:
with open("file.csv", "rb") as fh:
Precise details:
r = read mode
b = mode specifier in open() stating that the file shall be treated as binary, so the contents will remain bytes. No decoding attempt will happen this way.
Python normally tries to convert a byte array (a bytes object it assumes to be a UTF-8-encoded string) to a unicode string (str). This process is, of course, a decoding according to UTF-8 rules. When it tries this, it encounters a byte sequence which is not allowed in UTF-8-encoded strings (namely this 0xff at position 0).
You could try something like:
import csv
with open(<path to output_csv>, "w", newline='', encoding="utf-8") as csv_file:
    writer = csv.writer(csv_file, delimiter=',')
    for line in interest_over_time_df:
        writer.writerow(line)
Read more here: https://www.pythonforbeginners.com/files/with-statement-in-python
You need to loop over the data and write it in line by line.
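A sketch of that loop with plain lists standing in for the dataframe rows (the values are made up; newline='' follows the csv docs' advice for Python 3, and tempfile only keeps the example self-contained):

```python
import csv
import os
import tempfile

rows = [['date', 'auto model A', 'auto model C'],
        ['2019-01-06', '67', '21']]       # illustrative values, not real data

# newline='' stops Python from translating the \r\n that csv itself writes.
path = os.path.join(tempfile.gettempdir(), 'trends.csv')
with open(path, 'w', newline='', encoding='utf-8') as csv_file:
    writer = csv.writer(csv_file, delimiter=',')
    for line in rows:
        writer.writerow(line)

# Read the file back to confirm every row was written.
with open(path, newline='', encoding='utf-8') as csv_file:
    written = list(csv.reader(csv_file))
print(written)
```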

io.StringIO vs open() in Python 3

All I could find is this statement:
The easiest way to create a text stream is with open(), optionally
specifying an encoding:
f = open("myfile.txt", "r", encoding="utf-8")
In-memory text streams are also available as StringIO objects:
f = io.StringIO("some initial text data")
But this gives no insight at all on when I should use open() over io.StringIO and vice versa. I know that they do not work exactly the same behind the scenes. But why would someone go for open() in Python 3?
The difference is: open takes a file name (and some other arguments like mode or encoding), io.StringIO takes a plain string and both return file-like objects.
Hence:
Use open to read files;
Use StringIO when you need a file-like object and you want to pass the content of a string.
An example with StringIO:
import csv
import io
reader = csv.reader(io.StringIO("a,b,c\n1,2,3"))
print([r for r in reader])
# output [['a', 'b', 'c'], ['1', '2', '3']]
It's very useful because you can use a string where a file was expected.
In the usual case, with a csv file on your disk, you would write something like:
with open(<path/to/file.csv>, ...) as f:
reader = csv.reader(f, ...)
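Both paths hand csv.reader a text stream, so the parsing code is identical either way; a sketch using a temporary file for the open() side:

```python
import csv
import io
import os
import tempfile

csv_text = "a,b,c\n1,2,3\n"

# From memory: StringIO is the file-like object.
from_memory = list(csv.reader(io.StringIO(csv_text)))

# From disk: open() returns the file-like object (newline='' per the csv docs).
path = os.path.join(tempfile.gettempdir(), 'demo.csv')
with open(path, 'w', newline='', encoding='utf-8') as f:
    f.write(csv_text)
with open(path, newline='', encoding='utf-8') as f:
    from_disk = list(csv.reader(f))

print(from_memory == from_disk)
```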

industrial strength csv reader (python)

Here's my use case: it's my job to clean CSV files which are often scraped from web pages (most are English, but some German and other weird non-Unicode characters sneak in there). Python 3 is "utf-8" by default, and the usual
import csv
#open file
with open('input.csv', 'r', encoding='utf-8') as f:
    reader = csv.reader(f)
fails with UnicodeDecodeError even with try/catch blocks everywhere.
I can't figure out how to clean the input if I can't even open it. My end goal is simply to read each line into a list I call text.
I'm out of ideas I've even tried the following:
for encoding in ('utf-8', 'latin-1', etc, etc):
    try:
        # open the file
I can't make any assumptions about the encoding, as the files may be written on a Unix machine in another part of the world while I'm on a Windows machine. Otherwise the input is just simple strings, for example:
test case: "This is an example of a test case and the test may wrap around to a new line when opened in a text processor"
Maybe try reading in the contents entirely, then using bytes.decode() in much the same way you mentioned:
#!python3
import csv
from io import StringIO
with open('input.csv', 'rb') as binfile:
    csv_bytes = binfile.read()

for enc in ('utf-8', 'utf-16', 'latin1'):
    try:
        csv_string = csv_bytes.decode(encoding=enc, errors='strict')
        break
    except UnicodeError as e:
        last_err = e
else:  # none worked
    raise last_err

with StringIO(csv_string) as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        print(row[0])
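If none of the candidate encodings decode strictly, a last-resort variation is errors='replace', which keeps the data readable at the cost of substituting the undecodable bytes; a sketch (the sample bytes and the helper name are made up):

```python
csv_bytes = b'caf\xe9,ok\nmojibake:\xff\xfe,x\n'   # latin-1 text plus stray bytes

def decode_best_effort(raw, encodings=('utf-8', 'latin-1')):
    """Try strict decodes in order; fall back to replacement characters."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeError:
            pass
    # Every strict attempt failed: substitute U+FFFD for the bad bytes.
    return raw.decode('utf-8', errors='replace')

text = decode_best_effort(csv_bytes)
print(text.splitlines()[0])
```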
