Python: generate xlsx in memory and stream file download?

For example, the following code creates the xlsx file first and then streams it as a download, but I'm wondering whether it is possible to send the xlsx data as it is being created. Imagine a very large xlsx file needs to be generated: the user has to wait until it is finished before receiving the download. What I'd like is to start the xlsx download in the user's browser right away, and then send the data over as it is being generated. This seems trivial with a .csv file, but not with an xlsx file.
import io
from django.http import HttpResponse
from xlsxwriter.workbook import Workbook
def your_view(request):
    # your view logic here
    # create a workbook in memory (xlsx data is binary, so use a bytes buffer)
    output = io.BytesIO()
    book = Workbook(output)
    sheet = book.add_worksheet('test')
    sheet.write(0, 0, 'Hello, world!')
    book.close()
    # construct response (HttpResponse takes content_type; the old mimetype argument was removed in Django 1.7)
    output.seek(0)
    response = HttpResponse(output.read(), content_type="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")
    response['Content-Disposition'] = "attachment; filename=test.xlsx"
    return response

Are you able to write tempfiles to disk while generating the XLSX?
If you are able to use tempfile you won't be memory bound, which is nice, but the download will still only start when the XLSX writer is done assembling the document.
If you can't write tempfiles, you'll have to follow this example http://xlsxwriter.readthedocs.org/en/latest/example_http_server.html and your code is unfortunately completely memory bound.
Streaming CSV is very easy, on the other hand. Here is code we use to stream any iterator of rows in a CSV response:
import csv
import io
def csv_generator(data_generator):
    # On Python 3 the csv module writes text, so buffer in a StringIO
    # and encode each chunk before handing it to the server.
    csvfile = io.StringIO()
    csvwriter = csv.writer(csvfile)
    def read_and_flush():
        # Return everything written so far, then empty the buffer.
        csvfile.seek(0)
        data = csvfile.read()
        csvfile.seek(0)
        csvfile.truncate()
        return data.encode('utf-8')
    for row in data_generator:
        csvwriter.writerow(row)
        yield read_and_flush()
def csv_stream_response(response, iterator, file_name="xxxx.csv"):
    # response is assumed to be a WebOb-style response object with an app_iter attribute
    response.content_type = 'text/csv'
    response.content_disposition = 'attachment;filename="' + file_name + '"'
    response.charset = 'utf8'
    response.content_encoding = 'utf8'
    response.app_iter = csv_generator(iterator)
    return response
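A leaner variant of the same idea, as a sketch for Python 3: instead of a real buffer, csv.writer can be given a pseudo-file whose write() simply returns what it receives, because writerow() passes that return value back to the caller. The names Echo and iter_csv_bytes are illustrative and do not come from the answer above.

```python
import csv


class Echo:
    """Pseudo-file: write() returns the value instead of storing it."""

    def write(self, value):
        return value


def iter_csv_bytes(rows):
    """Yield one encoded CSV line per input row, without buffering."""
    writer = csv.writer(Echo())
    for row in rows:
        # writerow() returns whatever Echo.write() returned: the CSV line.
        yield writer.writerow(row).encode("utf-8")


if __name__ == "__main__":
    for chunk in iter_csv_bytes([["id", "name"], [1, "a,b"]]):
        print(chunk)
```

Any framework that accepts an iterator of byte strings as a response body should be able to consume a generator like this one.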

An xlsx file is a zip archive that contains several individual files, so you can't create it on the fly and send it out as it is being created.
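To illustrate the point with the standard library's zipfile module (a sketch only, not how any particular xlsx writer is implemented): the archive's central directory is written when the archive is closed, so the bytes captured before close() are not yet a readable zip file. The member name below is a hypothetical stand-in for a worksheet part.

```python
import io
import zipfile

buf = io.BytesIO()
zf = zipfile.ZipFile(buf, "w")

# Hypothetical member, shaped like a worksheet part inside an xlsx file.
zf.writestr("xl/worksheets/sheet1.xml", "<worksheet/>")

# Bytes emitted so far: the member itself, but no central directory yet.
partial = buf.getvalue()

# close() appends the central directory and the end-of-archive record.
zf.close()
complete = buf.getvalue()

partial_ok = zipfile.is_zipfile(io.BytesIO(partial))
complete_ok = zipfile.is_zipfile(io.BytesIO(complete))
print(partial_ok, complete_ok)
```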

Related

Django: How do I download .xls file through a django view

I have a button that downloads an Excel file with the .xls extension. I am using the xlrd module to parse the file and return it to the user. However, the downloaded file contains the object's name instead of the data.
How can I return the file to the user with the data rather than the object's name?
View
def download_file(self, testname):
    import csv, socket, os, xlrd
    extension = '.xls'
    path = r"C:\tests\{}_Report{}".format(testname, extension)
    try:
        f = xlrd.open_workbook(path)
        response = HttpResponse(f, content_type='application/ms-excel')
        response['Content-Disposition'] = 'attachment; filename={}_Report{}'.format(testname, extension)
        return response
    except Exception as Error:
        return HttpResponse(Error)
    return redirect('emissions_dashboard:overview_view_record')
Excel result: the download succeeds, but the file's content is the object's name rather than the data.
Note: I understand this is an old file format but is required for this particular project.

You are trying to send an xlrd.book.Book object, not a file.
xlrd only reads workbooks; a Book object has no method for saving itself or turning itself back into file bytes. Use xlrd for whatever you need to inspect in the workbook:
workbook = xlrd.open_workbook(path)
#... do something
Then send the file on disk like any other file:
with open(path, 'rb') as f:
    response = HttpResponse(f.read(), content_type="application/ms-excel")
response['Content-Disposition'] = 'attachment; filename={}_Report{}'.format(testname, extension)
return response

convert dataframe to excel and download

I have created a Flask application that takes an Excel file, cleans the data, and returns the output as an Excel file. The user uploads the Excel file and, after submitting, the browser should download the filtered Excel file.
Can someone suggest references? I need to know how to set the path. I tried converting the data to HTML with the code below, but it doesn't trigger a download; it just saves the cleaned file as HTML.
data1 = df.to_html()
#write html to file
text_file = open("data1.html", "w")
text_file.write(data1)
text_file.close()
return render_template("success.html", name = text_file)

I have an app that receives an input file, reads it with pandas, processes it (with a make_processing() function I wrote), and returns it as a .csv. The approach is almost the same for an Excel file.
file = request.files['file']
content = file.read()
df = pd.read_csv(io.BytesIO(content))
df2 = make_processing(df)
si = io.StringIO()
df2.to_csv(si, index=False, encoding='utf8')
output = flask.make_response(si.getvalue())
output.headers["Content-Disposition"] = f"attachment; filename=periodicidad.csv"
output.headers["Content-type"] = "text/csv"
return output

Python Load CSV File from API and iterate over it in memory

I'm using the requests library in python to hit an API endpoint that is supposed to return a CSV file. In the API's documentation they give an example on how to get the file.
requestDownload = requests.request("GET", requestDownloadUrl, headers=headers, stream=True)
# with open("RequestFile.zip", "w") as f:
for chunk in requestDownload.iter_lines(chunk_size=1024):
    f.write(chunk)
zipfile.ZipFile("RequestFile.zip").extractall("MyDownload")
I don't want to write out the file to a zip or anything else. I just want to iterate over each row. I've tried the following but it's returning binary instead of text.
from contextlib import closing
import csv
import requests
with closing(
    requests.get(
        'api_URL/csvfile',
        stream=True,
    )
) as r:
    reader = csv.reader(
        (line.replace('\0','') for line in r.iter_lines()),
        delimiter=',',
        quotechar='"'
    )
    for row in reader:
        # Handle each row here...
        print row
The result of printing out row is a bunch of the following:
['\x13\xa4\xa3\xedr\xae\xe6\x0b\x9b\x08\x9c\xabX\xda\xa3d%\\+\xcd\xd5\xfat\x13\xf3']
['51W\x91o\xe2\xef(\x19\x18\xa9\xe2}\xe2\xbca\xd4]\x93\x1d#8:\x8d\xab\xa0\x08\xe6\xd4\xc7\xc5\xcdb\xaf\x8d\xf6\xa2\x8d~s\xb5\xea?\x04\x1c\xfb\xc5\xed9\x8c']
What do I need to do to see the actual text instead?

You can use the io module to read the url into a file-like object and then use that to create an in-memory zipfile. In this example, I didn't use streaming because the entire zipfile needs to be in memory to extract from it. At the point where the zipfile is created there are several copies of the data in memory which could be problematic on large files. You could potentially build a file-like object that wraps resp.iter_content but that was a bit much for this example.
from contextlib import closing
import requests
import zipfile
import io
import csv
with closing(requests.get("http://localhost:8000/test.zip")) as resp:
    incore_zip = zipfile.ZipFile(io.BytesIO(resp.content))
try:
    with incore_zip.open('test.csv') as fp:
        reader = csv.reader(io.TextIOWrapper(fp, encoding="utf-8"))
        for row in reader:
            print(row)
finally:
    del incore_zip

How to generate a file without saving it to disk in python?

I'm using Python 2.7 and Django 1.7.
I have a method in my admin interface that generates some kind of a csv file.
def generate_csv(args):
    ...
    # some code that generates a dictionary to be written as csv
    ...
    # this creates a directory and returns its filepath
    dirname = create_csv_dir('stock')
    csvpath = os.path.join(dirname, 'mycsv_file.csv')
    fieldnames = [...]  # some field names
    # this function creates the csv file in the directory shown by the csvpath
    newcsv(data, csvheader, csvpath, fieldnames)
    # this automatically starts a download from that directory
    return HttpResponseRedirect('/media/csv/stock/%s' % csvfile)
All in all I create a csv file, save it somewhere on the disk, and then pass its URL to the user for download.
I was wondering whether all of this can be done without writing to disk. I googled around a bit, and it seems a Content-Disposition: attachment header might help, but I got a bit lost in the documentation.
Anyway if there's an easier way of doing this I'd love to know.

Thanks to @Ragora for pointing me in the right direction.
I rewrote the newcsv method:
from io import StringIO  # on Python 2.7: from StringIO import StringIO
import csv
def newcsv(data, csvheader, fieldnames):
    """
    Create a new csv file that represents generated data.
    """
    new_csvfile = StringIO()
    # write the header row first, then one row per dictionary in data
    wr = csv.writer(new_csvfile, quoting=csv.QUOTE_ALL)
    wr.writerow(csvheader)
    wr = csv.DictWriter(new_csvfile, fieldnames=fieldnames)
    for key in data.keys():
        wr.writerow(data[key])
    return new_csvfile
and in the admin:
csvfile = newcsv(data, csvheader, fieldnames)
response = HttpResponse(csvfile.getvalue(), content_type='text/csv')
response['Content-Disposition'] = 'attachment; filename=stock.csv'
return response

If it bothers you that you are saving a file to disk, serve the file with the application/octet-stream content type together with a Content-Disposition header, then delete the file from disk.
If this header (Content-Disposition) is used in a response with the application/octet-stream content-type, the implied suggestion is that the user agent should not display the response, but directly enter a `save response as...' dialog.

Sending multiple .CSV files to .ZIP without storing to disk in Python

I'm working on a reporting application for my Django powered website. I want to run several reports and have each report generate a .csv file in memory that can be downloaded in batch as a .zip. I would like to do this without storing any files to disk. So far, to generate a single .csv file, I am following the common operation:
mem_file = StringIO.StringIO()
writer = csv.writer(mem_file)
writer.writerow(["My content", my_value])
mem_file.seek(0)
response = HttpResponse(mem_file, content_type='text/csv')
response['Content-Disposition'] = 'attachment; filename=my_file.csv'
This works fine, but only for a single, unzipped .csv. If I had, for example, a list of .csv files created with a StringIO stream:
firstFile = StringIO.StringIO()
# write some data to the file
secondFile = StringIO.StringIO()
# write some data to the file
thirdFile = StringIO.StringIO()
# write some data to the file
myFiles = [firstFile, secondFile, thirdFile]
How could I return a compressed file that contains all objects in myFiles and can be properly unzipped to reveal three .csv files?

zipfile is a standard library module that does exactly what you're looking for. For your use-case, the meat and potatoes is a method called "writestr" that takes a name of a file and the data contained within it that you'd like to zip.
In the code below, I've used a sequential naming scheme for the files when they're unzipped, but this can be switched to whatever you'd like.
import zipfile
import StringIO
zipped_file = StringIO.StringIO()
with zipfile.ZipFile(zipped_file, 'w') as zf:
    # myFiles is the list of StringIO buffers from the question
    for i, csv_file in enumerate(myFiles):
        csv_file.seek(0)
        zf.writestr("{}.csv".format(i), csv_file.read())
zipped_file.seek(0)
If you want to future-proof your code (hint hint Python 3 hint hint), you might want to switch over to using io.BytesIO instead of StringIO, since zip data is binary and Python 3 keeps bytes and text strictly apart. The version below also reads each buffer with getvalue(), which returns the entire contents regardless of the current position, so the explicit seek before each read is no longer needed (I haven't tested this behavior with Django's HttpResponse, so I've left that final seek in there just in case).
import io
import zipfile
zipped_file = io.BytesIO()
with zipfile.ZipFile(zipped_file, 'w') as zf:
    # myFiles is the list of in-memory CSV buffers from the question
    for i, csv_file in enumerate(myFiles):
        zf.writestr("{}.csv".format(i), csv_file.getvalue())
zipped_file.seek(0)
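To check that an archive built this way really unzips into separate .csv files, here is a self-contained Python 3 sketch that builds three small CSV buffers, zips them in memory, and reads the archive back. The helper make_csv and the sample rows are made up for illustration.

```python
import csv
import io
import zipfile


def make_csv(rows):
    """Write rows into an in-memory text buffer and return the buffer."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf


# Stand-ins for the report buffers described in the question.
my_files = [make_csv([["a", 1]]), make_csv([["b", 2]]), make_csv([["c", 3]])]

zipped_file = io.BytesIO()
with zipfile.ZipFile(zipped_file, "w") as zf:
    for i, csv_file in enumerate(my_files):
        # writestr() accepts str and encodes it as UTF-8.
        zf.writestr("{}.csv".format(i), csv_file.getvalue())

# Read the archive back from memory to confirm its contents.
zipped_file.seek(0)
with zipfile.ZipFile(zipped_file) as zf:
    names = zf.namelist()
    first = zf.read("0.csv").decode("utf-8")

print(names)
```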

The stdlib comes with the module zipfile, and the main class, ZipFile, accepts a file or file-like object:
import StringIO
from zipfile import ZipFile
temp_file = StringIO.StringIO()
zipped = ZipFile(temp_file, 'w')
# create temp csv_files = [(name1, data1), (name2, data2), ... ]
for name, data in csv_files:
    data.seek(0)
    zipped.writestr(name, data.read())
zipped.close()
temp_file.seek(0)
# etc. etc.
I'm not a user of StringIO so I may have the seek and read out of place, but hopefully you get the idea.

import io
import zipfile
from django.http import HttpResponse
def zip_files(files):
    outfile = io.BytesIO()  # StringIO.StringIO() on Python 2
    with zipfile.ZipFile(outfile, 'w') as zf:
        for n, f in enumerate(files):
            zf.writestr("{}.csv".format(n), f.getvalue())
    return outfile.getvalue()
zipped_file = zip_files(myFiles)
response = HttpResponse(zipped_file, content_type='application/octet-stream')
response['Content-Disposition'] = 'attachment; filename=my_file.zip'
StringIO has a getvalue method that returns the entire contents of the buffer. To compress the archive, pass zipfile.ZIP_DEFLATED: zipfile.ZipFile(outfile, 'w', zipfile.ZIP_DEFLATED). The default compression is ZIP_STORED, which creates the zip file without compressing its members.
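The difference between the two compression settings can be seen with a small sketch. The helper zip_size and the sample data are illustrative only; the exact sizes depend on the input, so none are quoted here.

```python
import io
import zipfile


def zip_size(data, compression):
    """Return the size in bytes of a one-member archive built in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression) as zf:
        zf.writestr("report.csv", data)
    return len(buf.getvalue())


# Highly repetitive sample data, which deflate compresses well.
data = "id,value\r\n" + "1,hello\r\n" * 1000

stored = zip_size(data, zipfile.ZIP_STORED)
deflated = zip_size(data, zipfile.ZIP_DEFLATED)
print(stored, deflated)
```

With ZIP_STORED the archive is slightly larger than the raw data because of the zip headers; with ZIP_DEFLATED it should come out much smaller for repetitive CSV content like this.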
