I have this function for streaming text files:
import datetime

from flask import Response

def txt_response(filename, iterator):
    if not filename.endswith('.txt'):
        filename += '.txt'
    filename = filename.format(date=str(datetime.date.today()).replace(' ', '_'))
    response = Response((_.encode('utf-8') + '\r\n' for _ in iterator), mimetype='text/plain')
    response.headers['Content-Disposition'] = 'attachment; filename={filename}'.format(filename=filename)
    return response
I am working out how to stream a CSV in a similar manner. This page gives an example, but I wish to use the CSV module.
I can use StringIO and create a fresh "file" and CSV writer for each line, but it seems very inefficient. Is there a better way?
According to this answer (How do I clear a stringio object?), it is actually quicker to create a new StringIO object for each line than to reuse one the way I do below. However, if you still don't want to create new StringIO instances, you can achieve what you want like this:
import csv
from io import StringIO

from flask import Response

def iter_csv(data):
    line = StringIO()
    writer = csv.writer(line)
    for csv_line in data:
        writer.writerow(csv_line)
        line.seek(0)
        yield line.read()
        line.truncate(0)
        line.seek(0)  # required on Python 3, where truncate() does not move the position

def csv_response(data):
    response = Response(iter_csv(data), mimetype='text/csv')
    response.headers['Content-Disposition'] = 'attachment; filename=data.csv'
    return response
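For illustration, a minimal sketch of how this might be wired into a Flask route (the /export path and the sample rows are invented for the example):

from flask import Flask

app = Flask(__name__)

@app.route('/export')
def export():
    # Each inner list becomes one CSV row.
    rows = [['id', 'name'], [1, 'alice'], [2, 'bob']]
    return csv_response(rows)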
If you just want to stream back the results as they are created by csv.writer, you can create a custom object implementing the interface the writer expects.
import csv

from flask import Response

class Line(object):
    """A file-like buffer that holds only the last line written to it."""
    def __init__(self):
        self._line = None

    def write(self, line):
        self._line = line

    def read(self):
        return self._line

def iter_csv(data):
    line = Line()
    writer = csv.writer(line)
    for csv_line in data:
        writer.writerow(csv_line)
        yield line.read()

def csv_response(data):
    response = Response(iter_csv(data), mimetype='text/csv')
    response.headers['Content-Disposition'] = 'attachment; filename=data.csv'
    return response
A slight improvement to Justin's existing great answer. You can take advantage of the fact that the csv writer's writerow() returns the value returned by the underlying file object's write call.
import csv

from flask import Response

class DummyWriter:
    """A file-like object that returns whatever it is asked to write."""
    def write(self, line):
        return line

def iter_csv(data):
    writer = csv.writer(DummyWriter())
    for row in data:
        yield writer.writerow(row)

def csv_response(data):
    response = Response(iter_csv(data), mimetype='text/csv')
    response.headers['Content-Disposition'] = 'attachment; filename=data.csv'
    return response
If you are dealing with large amounts of data that you don't want to hold in memory, you could use SpooledTemporaryFile: it buffers in memory until it exceeds max_size, after which it rolls over to disk.
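A minimal sketch of that idea (the 1 MB max_size threshold here is arbitrary):

import csv
from tempfile import SpooledTemporaryFile

def write_spooled_csv(data):
    # Buffers in memory up to max_size bytes, then transparently spills to disk.
    spool = SpooledTemporaryFile(max_size=2**20, mode='w+', newline='')
    writer = csv.writer(spool)
    for row in data:
        writer.writerow(row)
    spool.seek(0)  # rewind so the caller can read the file back
    return spool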
However, I would stick with the recommended answer if you just want to stream back the results as they are created.
In my views.py, I am trying to send a CSV file to the client first and only then redirect to another page.
Below is my code:
def main(request):
    ...
    ...
    url = '//file1.km.in.com/sd/recipe/' + "/" + model + "_" + code + ".csv"
    filename = model + "_" + code + ".csv"
    download_csv(url, filename)
    data = {"TableForm": TableForm, "devicelist": devicelist_1}
    time.sleep(10)
    return redirect('/ProductList/BetaTest', data)

def download_csv(url, filename):
    csv = process_file(url)
    response = HttpResponse(csv, content_type='text/csv')
    response['Content-Disposition'] = 'attachment; filename=' + filename
    return response

def process_file(file_handle):
    df = pd.read_csv(file_handle, index_col=False)
    return df.to_csv(index=False)
However, the download did not work; the view went straight to the BetaTest page.
I then edited it to the code below, and now the download works, but it no longer redirects to the other page:
def main(request):
    ...
    ...
    data = {"TableForm": TableForm, "devicelist": devicelist_1}
    csv = process_file(url)
    response = HttpResponse(csv, content_type='text/csv')
    response['Content-Disposition'] = 'attachment; filename=' + filename
    return response
    # The two lines below are never reached because of the return above.
    time.sleep(10)
    return redirect('/ProductList/BetaTest', data)
Does anyone have ideas on how to overcome this issue?
A return statement ends the execution of the function call and "returns" the result (the value of the expression following the return keyword) to the caller. The statements after a return statement are not executed. If the return statement has no expression, the special value None is returned.
So there is no way to execute two return statements in a single call.
You can return multiple values by simply returning them separated by commas.
return example1, example2
Instead of using return, you should use yield:
import time

from django.http import StreamingHttpResponse

def stream_response(request):
    def generator():
        for x in range(1, 11):
            yield f"{x}\n"
            time.sleep(1)
    return StreamingHttpResponse(generator())
Try this:
views.py
def main(request):
    ...
    ...
    data = {"TableForm": TableForm, "devicelist": devicelist_1}
    if request.method == 'GET':
        csv = process_file(url)
        response = HttpResponse(csv, content_type='text/csv')
        response['Content-Disposition'] = 'attachment; filename=' + filename
        return response
    return redirect('Url_NAME', data)
I'm trying to read a huge csv.gz file from a url into chunks and write it into a database on the fly. I have to do all this in memory, no data can exist on disk.
I have the generator function below, which reads the response in chunks and yields DataFrame objects.
It works when using the response's raw stream as input to pd.read_csv, but it appears unreliable and can sometimes throw a connection-reset error: urllib3.exceptions.ProtocolError: ('Connection broken: OSError("(10054, \'WSAECONNRESET\')",)', OSError("(10054, 'WSAECONNRESET')",))
response = session.get(target, stream=True)
df_it = pd.read_csv(response.raw, compression='gzip', chunksize=10**6,
                    header=None, dtype=str, names=columns, parse_dates=['datetime'])
for i, df in enumerate(self.process_df(df_it)):
    if df.empty:
        continue
    if (i % 10) == 0:
        time.sleep(10)
    yield df
I decided to use iter_content instead, as I read that it should be more reliable. I have implemented the functionality below, but I'm getting this error: EOFError: Compressed file ended before the end-of-stream marker was reached.
I think it's because I'm passing pandas a single compressed chunk rather than the whole gzip stream (each chunk ends before the gzip end-of-stream marker), but I'm not sure how to pass pandas.read_csv an object it will accept.
response = session.get(target, stream=True)
for chunk in response.iter_content(chunk_size=10**6):
    file_obj = io.BytesIO()
    file_obj.write(chunk)
    file_obj.seek(0)
    df_it = pd.read_csv(file_obj, compression='gzip', dtype=str,
                        header=None, names=columns, parse_dates=['datetime'])
    for i, df in enumerate(self.process_df(df_it)):
        if df.empty:
            continue
        if (i % 10) == 0:
            time.sleep(10)
        yield df
Any ideas greatly appreciated! Thanks.
You may wish to try this:
import io

def iterable_to_stream(iterable, buffer_size=io.DEFAULT_BUFFER_SIZE):
    """
    Lets you use an iterable (e.g. a generator) that yields bytestrings as a
    read-only input stream.

    The stream implements Python 3's newer I/O API (available in Python 2's
    io module). For efficiency, the stream is buffered.
    """
    class IterStream(io.RawIOBase):
        def __init__(self):
            self.leftover = None

        def readable(self):
            return True

        def readinto(self, b):
            try:
                l = len(b)  # We're supposed to return at most this much
                chunk = self.leftover or next(iterable)
                output, self.leftover = chunk[:l], chunk[l:]
                b[:len(output)] = output
                return len(output)
            except StopIteration:
                return 0  # indicate EOF

    return io.BufferedReader(IterStream(), buffer_size=buffer_size)
Then:

response = session.get(target, stream=True)
response.raw.decode_content = True  # have urllib3 decode Content-Encoding (e.g. gzip)
df = pd.read_csv(iterable_to_stream(response.iter_content()), sep=';')
I use this to stream CSV files in odsclient. It seems to work, although I did not try it with gz compression.
Source: https://stackoverflow.com/a/20260030/7262247
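In case it helps with the original .gz question, an untested sketch of combining this helper with gzip so pandas sees decompressed data (target, columns, and the chunked loop come from the question's code):

import gzip

response = session.get(target, stream=True)
raw = iterable_to_stream(response.iter_content(io.DEFAULT_BUFFER_SIZE))
with gzip.GzipFile(fileobj=raw) as gz:
    for df in pd.read_csv(gz, chunksize=10**6, header=None, dtype=str,
                          names=columns, parse_dates=['datetime']):
        ...  # process each chunk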
I have a model with about 90,000 entries and I am outputting it as CSV from Django, but it never produces the CSV: I left the browser for half an hour and got no output. The same method worked fine when the data was small.
My method:
def usertype_csv(request):
    response = HttpResponse(content_type='text/csv')
    response['Content-Disposition'] = 'attachment; filename="university_list.csv"'
    writer = csv.writer(response)
    news_obj = Users.objects.using('cms').all()
    writer.writerow(['NAME', 'USERNAME', 'E-MAIL ID', 'USER TYPE', 'UNIVERSITY'])
    for item in news_obj:
        writer.writerow([item.name.encode('UTF-8'), item.username.encode('UTF-8'),
                         item.email.encode('UTF-8'), item.userTypeId.userType.encode('UTF-8'),
                         item.universityId.name.encode('UTF-8')])
    return response
I have tested this with smaller data and it worked, but for very large files it does not.
Thanks in advance.
Any large CSV files you generate should ideally be streamed back to the user.
Here you can find an example of how Django handles it.
This is how I would do it:
import csv

from django.http import StreamingHttpResponse

class Echo(object):
    """An object that implements just the write method of the file-like
    interface.
    """
    def write(self, value):
        """Write the value by returning it, instead of storing it in a buffer."""
        return value

def some_streaming_csv_view(request):
    """A view that streams a large CSV file."""
    pseudo_buffer = Echo()
    writer = csv.writer(pseudo_buffer)
    news_obj = Users.objects.using('cms').all()
    rows = (writer.writerow([item.name.encode('UTF-8'),
                             item.username.encode('UTF-8'),
                             item.email.encode('UTF-8'),
                             item.userTypeId.userType.encode('UTF-8'),
                             item.universityId.name.encode('UTF-8')])
            for item in news_obj)
    response = StreamingHttpResponse(rows, content_type="text/csv")
    response['Content-Disposition'] = 'attachment; filename="university_list.csv"'
    return response
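Note that, unlike the view in the question, this streams only the data rows. If you want the header row too, one sketch is to prepend it to the generator with itertools.chain:

import itertools

header = writer.writerow(['NAME', 'USERNAME', 'E-MAIL ID', 'USER TYPE', 'UNIVERSITY'])
response = StreamingHttpResponse(itertools.chain([header], rows),
                                 content_type="text/csv")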
Forewarning: I am very new to Python and programming in general. I am trying to use Python 3 to get some CSV data and make some changes to it before writing it to a file. My problem lies in accessing the CSV data from a variable, like so:
import csv
import requests

csvfile = session.get(url)
reader = csv.reader(csvfile.content)
for row in reader:
    do(something)
This returns:
_csv.Error: iterator should return strings, not int (did you open the file in text mode?)
Googling revealed that I should be feeding the reader text instead of bytes, so I also attempted:
reader = csv.reader(csvfile.text)
This also does not work, as the loop works through it letter by letter instead of line by line. I also experimented with TextIOWrapper and similar options with no success. The only way I have managed to get this to work is by writing the data to a file, reading it back, and then making changes, like so:
csvfile = session.get(url)
with open("temp.txt", 'wb') as f:
    f.write(csvfile.content)
with open("temp.txt", 'rU', encoding="utf8") as data:
    reader = csv.reader(data)
    for row in reader:
        do(something)
I feel like this is far from the most optimal way of doing this, even if it works. What is the proper way to read and edit the CSV data directly from memory, without having to save it to a temporary file?
You don't have to write to a temp file. Here is what I would do, using the csv and requests modules:
import csv
import requests

__csvfilepathname__ = r'c:\test\test.csv'
__url__ = 'https://server.domain.com/test.csv'

def csv_reader(filename, enc='utf_8'):
    with open(filename, 'r', encoding=enc) as openfileobject:
        reader = csv.reader(openfileobject)
        for row in reader:
            # do something
            print(row)
    return

def csv_from_url(url):
    line = ''
    datalist = []
    s = requests.Session()
    r = s.get(url)
    for x in r.text.replace('\r', ''):
        if not x[0] == '\n':
            line = line + str(x[0])
        else:
            datalist.append(line)
            line = ''
    datalist.append(line)
    # at this point you already have a data list 'datalist';
    # no need really to use the csv.reader object, but here goes:
    reader = csv.reader(datalist)
    for row in reader:
        # do something
        print(row)
    return

def main():
    csv_reader(__csvfilepathname__)
    csv_from_url(__url__)
    return

if __name__ == '__main__':
    main()
Not very pretty, and probably not very good with regards to memory/performance, depending on how "big" your CSV/data is.
HTH, Edwin.
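For reference, since csv.reader accepts any iterable of strings, a more direct sketch of the same idea skips the manual character-by-character splitting entirely:

import csv
import requests

r = requests.get(__url__)
reader = csv.reader(r.text.splitlines())  # splitlines() handles both \r\n and \n
for row in reader:
    print(row)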
I have a function which takes a list of custom objects, conforms some values then writes them to a CSV file. Something really strange is happening in that when the list only contains a few objects, the resulting CSV file is always blank. When the list is longer, the function works fine. Is it some kind of weird anomaly with the temporary file perhaps?
I have to point out that this function returns the temporary file to a web server allowing the user to download the CSV. The web server function is below the main function.
import csv
from tempfile import NamedTemporaryFile

def makeCSV(things):
    # make the csv headers from an object
    headers = [h for h in dir(things[0]) if not h.startswith('_')]

    # this just pretties up the object and returns it as a dict
    def cleanVals(item):
        new_item = {}
        for h in headers:
            try:
                new_item[h] = getattr(item, h)
            except:
                new_item[h] = ''
            if isinstance(new_item[h], list):
                if new_item[h]:
                    new_item[h] = [z.__str__() for z in new_item[h]]
                    new_item[h] = ', '.join(new_item[h])
                else:
                    new_item[h] = ''
            new_item[h] = new_item[h].__str__()
        return new_item

    things = map(cleanVals, things)
    f = NamedTemporaryFile(delete=True)
    dw = csv.DictWriter(f, sorted(headers), restval='', extrasaction='ignore')
    dw.writer.writerow(dw.fieldnames)
    for t in things:
        try:
            dw.writerow(t)
            # I can always see the dicts here...
            print(t)
        except Exception as e:
            # and there are no exceptions
            print(e)
    return f
Web server function:
f = makeCSV(search_results)
response = FileResponse(f.name)
response.headers['Content-Disposition'] = (
    "attachment; filename=export_%s.csv" % collection)
return response
Any help or advice greatly appreciated!
Summarizing eumiro's answer: the file needs to be flushed. Call f.flush() at the end of makeCSV().
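That is, the tail of makeCSV() would look like this; small writes sit in the file's buffer until flushed, which is why only the short files came out blank:

    for t in things:
        ...
    f.flush()  # push buffered rows to disk before the server reads f.name
    return f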