StringIO generated csv file that includes BOM - python

I'm trying to generate a CSV that opens correctly in Excel, but using StringIO instead of a file.
output = StringIO("\xef\xbb\xbf") # Tried just writing a BOM here, didn't work
fieldnames = ['id', 'value']
writer = csv.DictWriter(output, fieldnames, dialect='excel')
writer.writeheader()
for d in Data.objects.all():
    writer.writerow({
        'id': d.id,
        'value': d.value
    })
response = HttpResponse(output.getvalue(), content_type='text/csv')
response['Content-Disposition'] = 'attachment; filename=data.csv'
return response
This is part of a Django view, so I'd really rather not get into the business of dumping out temporary files for this.
I've also tried the following:
response = HttpResponse(output.getvalue().encode('utf-8').decode('utf-8-sig'), content_type='text/csv')
with no luck
What can I do to get the output file correctly encoded in utf-8-sig, with a BOM, so that Excel will open the file and show multi-byte unicode characters correctly?

HttpResponse accepts bytes:
output = StringIO()
...
response = HttpResponse(output.getvalue().encode('utf-8-sig'), content_type='text/csv')
or let Django do encoding:
response = HttpResponse(output.getvalue(), content_type='text/csv; charset=utf-8-sig')
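A minimal, framework-free sketch of the first option, assuming a hypothetical helper named build_csv_bytes and made-up rows in place of the Django queryset. It also shows why the .encode('utf-8').decode('utf-8-sig') attempt from the question has no effect: decoding with utf-8-sig strips a BOM, it never adds one.

```python
import codecs
import csv
from io import StringIO


def build_csv_bytes(rows, fieldnames):
    """Write rows to an in-memory CSV and return UTF-8 bytes with a BOM."""
    output = StringIO()
    writer = csv.DictWriter(output, fieldnames, dialect='excel')
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    # 'utf-8-sig' prepends the three BOM bytes when encoding.
    return output.getvalue().encode('utf-8-sig')


# Hypothetical sample data standing in for Data.objects.all().
payload = build_csv_bytes([{'id': 1, 'value': 'café'}], ['id', 'value'])
starts_with_bom = payload.startswith(codecs.BOM_UTF8)

# The round trip from the question leaves the text unchanged: no BOM is added.
round_trip = 'café'.encode('utf-8').decode('utf-8-sig')
```

The returned bytes can be passed to HttpResponse exactly as in the answer above.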

An alternate way:
import codecs
csv: StringIO
# ..
resp = response.text(f'{codecs.BOM_UTF8.decode("utf-8")}{csv.getvalue()}',
                     content_type='text/csv; charset=utf-8',
                     headers={'Content-Disposition': f'Attachment; filename={filename}'})
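Prefixing the decoded BOM character and encoding as plain UTF-8 yields the same bytes as encoding with utf-8-sig. A quick check, using a made-up string in place of the CSV text:

```python
import codecs

text = 'id,value\r\n1,ü\r\n'  # hypothetical CSV text

# codecs.BOM_UTF8 decodes to the single character U+FEFF.
bom_char = codecs.BOM_UTF8.decode('utf-8')

prefixed = (bom_char + text).encode('utf-8')
via_codec = text.encode('utf-8-sig')
same_bytes = prefixed == via_codec
```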

Related

How to read a csv django http response

In a view, I create a Django HttpResponse object composed entirely of a csv using a simple csv writer:
response = HttpResponse(content_type='text/csv')
response['Content-Disposition'] = 'attachment; filename="foobar.csv"'
writer = csv.writer(response)
table_headers = ['Foo', 'Bar']
writer.writerow(table_headers)
bunch_of_rows = [['foo', 'bar'], ['foo2', 'bar2']]
for row in bunch_of_rows:
    writer.writerow(row)
return response
In a unit test, I want to test some aspects of this csv, so I need to read it. I'm trying to do so like so:
response = views.myview(args)
reader = csv.reader(response.content)
headers = next(reader)
row_count = 1 + sum(1 for row in reader)
self.assertEqual(row_count, 3) # header + 1 row for each attempt
self.assertIn('Foo', headers)
But the test fails with the following on the headers = next(reader) line:
nose.proxy.Error: iterator should return strings, not int (did you open the file in text mode?)
I see in the HttpResponse source that response.content returns the data as a byte string, but I'm not sure of the correct way to handle that so csv.reader can read it. I thought I could just replace response.content with response (since you write to the object itself, not its content), but that only produced a slight variation of the error:
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
Which seems closer but obviously still wrong. Reading the csv docs, I assume I am failing to open the file correctly. How do I "open" this file-like object so that csv.reader can parse it?
response.content provides bytes. You need to decode this to a string:
foo = response.content.decode('utf-8')
Then pass this string to the csv reader using io.StringIO:
import io
reader = csv.reader(io.StringIO(foo))
You can use io.TextIOWrapper to convert the provided bytestring to a text stream:
import io
reader = csv.reader(io.TextIOWrapper(io.BytesIO(response.content), encoding='utf-8'))
This will convert the bytes to strings as they're being read by the reader.
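Both answers can be tried without a Django test client. The sketch below uses a hard-coded byte string as a stand-in for response.content (an assumption for illustration) and reads it back both ways.

```python
import csv
import io

# Stand-in for response.content from the view above (assumed value).
content = b'Foo,Bar\r\nfoo,bar\r\nfoo2,bar2\r\n'

# Option 1: decode everything, then wrap the string in StringIO.
reader = csv.reader(io.StringIO(content.decode('utf-8')))
headers = next(reader)
row_count = 1 + sum(1 for _ in reader)

# Option 2: wrap the bytes and decode lazily while reading.
# newline='' leaves line endings for the csv module to interpret.
stream = io.TextIOWrapper(io.BytesIO(content), encoding='utf-8', newline='')
rows = list(csv.reader(stream))
```

If the response was written with a BOM, decoding with 'utf-8-sig' instead of 'utf-8' strips it, so the first header is not prefixed with U+FEFF.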

In Memory CSV download file name

I am using csv writer to create a csv in memory and returning it as a response using django.
My code looks like this.
response = HttpResponse(content_type='text/csv')
writer = csv.writer(response)
writer.writerow(['rizwan','mumtaz'])
......
return response
The code works fine, but the downloaded file is always named 'download.csv'. How can I change 'download.csv' to somethingelse.csv?
response['Content-Disposition'] = 'attachment; filename="somefilename.csv"'
https://docs.djangoproject.com/en/stable/howto/outputting-csv/
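When the name is computed at runtime, it can help to build the header value in one place. A minimal sketch, assuming an ASCII-only filename; attachment_header is a hypothetical helper, not part of Django:

```python
def attachment_header(filename):
    """Return a Content-Disposition value for an ASCII filename (hypothetical helper)."""
    # Inside a quoted string, backslashes and double quotes must be escaped.
    escaped = filename.replace('\\', '\\\\').replace('"', '\\"')
    return 'attachment; filename="{}"'.format(escaped)


header_value = attachment_header('somefilename.csv')
```

In a view this would be used as response['Content-Disposition'] = attachment_header('somefilename.csv').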

How to generate a file without saving it to disk in python?

I'm using Python 2.7 and Django 1.7.
I have a method in my admin interface that generates some kind of a csv file.
def generate_csv(args):
    ...
    # some code that generates a dictionary to be written as csv
    ...
    # this creates a directory and returns its filepath
    dirname = create_csv_dir('stock')
    csvpath = os.path.join(dirname, 'mycsv_file.csv')
    fieldnames = [...]  # some field names
    # this function creates the csv file in the directory shown by the csvpath
    newcsv(data, csvheader, csvpath, fieldnames)
    # this automatically starts a download from that directory
    return HttpResponseRedirect('/media/csv/stock/%s' % csvfile)
All in all I create a csv file, save it somewhere on the disk, and then pass its URL to the user for download.
I was wondering whether all of this can be done without writing to disk. I googled around a bit, and a Content-Disposition attachment header might help, but I got a bit lost in the documentation.
Anyway if there's an easier way of doing this I'd love to know.
Thanks to @Ragora for pointing me in the right direction.
I rewrote the newcsv method:
from io import StringIO
import csv
def newcsv(data, csvheader, fieldnames):
    """
    Create a new csv file that represents generated data.
    """
    # In-memory text buffer (on Python 2.7, use StringIO.StringIO from the StringIO module instead)
    new_csvfile = StringIO()
    wr = csv.writer(new_csvfile, quoting=csv.QUOTE_ALL)
    wr.writerow(csvheader)
    wr = csv.DictWriter(new_csvfile, fieldnames=fieldnames)
    for key in data.keys():
        wr.writerow(data[key])
    return new_csvfile
and in the admin:
csvfile = newcsv(data, csvheader, fieldnames)
response = HttpResponse(csvfile.getvalue(), content_type='text/csv')
response['Content-Disposition'] = 'attachment; filename=stock.csv'
return response
If it bothers you that you are saving a file to disk, serve the file with the application/octet-stream content type together with a Content-Disposition header, then delete the file from disk. As the HTTP/1.1 specification (RFC 2616) puts it:
If this header (Content-Disposition) is used in a response with the application/octet-stream content-type, the implied suggestion is that the user agent should not display the response, but directly enter a `save response as...' dialog.

Python: generate xlsx in memory and stream file download?

For example, the following code creates the xlsx file first and then streams it as a download, but I'm wondering whether it is possible to send the xlsx data as it is being created. If a very large xlsx file needs to be generated, the user has to wait until it is finished before the download begins. What I'd like is to start the download in the user's browser and then send the data as it is generated. This seems trivial with a .csv file but not with an xlsx file.
import io
from django.http import HttpResponse
from xlsxwriter.workbook import Workbook
def your_view(request):
    # your view logic here
    # create a workbook in memory (XLSX is binary, so use a bytes buffer)
    output = io.BytesIO()
    book = Workbook(output)
    sheet = book.add_worksheet('test')
    sheet.write(0, 0, 'Hello, world!')
    book.close()
    # construct response
    output.seek(0)
    # content_type replaces the older mimetype argument, which current Django versions no longer accept
    response = HttpResponse(output.read(), content_type="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")
    response['Content-Disposition'] = "attachment; filename=test.xlsx"
    return response
Are you able to write tempfiles to disk while generating the XLSX?
If you are able to use tempfile you won't be memory bound, which is nice, but the download will still only start when the XLSX writer is done assembling the document.
If you can't write tempfiles, you'll have to follow this example http://xlsxwriter.readthedocs.org/en/latest/example_http_server.html and your code is unfortunately completely memory bound.
Streaming CSV is very easy, on the other hand. Here is code we use to stream any iterator of rows in a CSV response:
import csv
import io
def csv_generator(data_generator):
    # csv.writer needs a text buffer on Python 3; each chunk is encoded before it is yielded
    csvfile = io.StringIO()
    csvwriter = csv.writer(csvfile)
    def read_and_flush():
        # return everything written so far, then empty the buffer
        csvfile.seek(0)
        data = csvfile.read()
        csvfile.seek(0)
        csvfile.truncate()
        return data.encode('utf-8')
    for row in data_generator:
        csvwriter.writerow(row)
        yield read_and_flush()
def csv_stream_response(response, iterator, file_name="xxxx.csv"):
    response.content_type = 'text/csv'
    response.content_disposition = 'attachment;filename="' + file_name + '"'
    response.charset = 'utf8'
    # Content-Encoding describes compression such as gzip, so it is not set here
    response.app_iter = csv_generator(iterator)
    return response
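Another common way to stream CSV avoids the seek/truncate bookkeeping. csv.writer.writerow returns whatever the underlying write() returns, so a pseudo-buffer whose write() hands the value back turns each row into one chunk. This is a sketch under the assumption that the consumer accepts an iterator of strings; Echo and stream_csv_rows are illustrative names.

```python
import csv


class Echo:
    """Pseudo-buffer: write() returns the value instead of storing it."""

    def write(self, value):
        return value


def stream_csv_rows(rows):
    """Yield one CSV-formatted string per input row."""
    writer = csv.writer(Echo())
    for row in rows:
        # writerow() passes the formatted line to Echo.write() and returns its result.
        yield writer.writerow(row)


chunks = list(stream_csv_rows([['id', 'value'], [1, 'a,b']]))
```

In Django, such a generator can be handed to StreamingHttpResponse(stream_csv_rows(rows), content_type='text/csv').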
xlsx format is a zip file that contains several individual files, so you can't create it on the fly and send it out as it is being created.
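The container structure can be inspected with the standard zipfile module. The sketch below builds a small archive in memory, with made-up member names standing in for the parts of a workbook, and checks that the end-of-central-directory record, which indexes every member, is only written at the very end when the archive is closed.

```python
import io
import zipfile

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w', zipfile.ZIP_STORED) as archive:
    # Hypothetical member names; a real workbook holds several XML parts.
    archive.writestr('part_one.xml', '<a/>')
    archive.writestr('part_two.xml', '<b/>')
    size_before_close = buffer.tell()

data = buffer.getvalue()
# The end-of-central-directory record starts with this signature.
eocd_offset = data.rfind(b'PK\x05\x06')
names = zipfile.ZipFile(io.BytesIO(data)).namelist()
```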

Confused about making a CSV file into a ZIP file in django

I have a view that takes data from my site and then makes it into a zip compressed csv file. Here is my working code sans zip:
def backup_to_csv(request):
    response = HttpResponse(content_type='text/csv')
    response['Content-Disposition'] = 'attachment; filename=backup.csv'
    writer = csv.writer(response, dialect='excel')
    # code for writing the csv file goes here...
    return response
and it works great. Now I want that file to be compressed before it gets sent out. This is where I get stuck.
def backup_to_csv(request):
    output = StringIO.StringIO()  ## temp output file
    writer = csv.writer(output, dialect='excel')
    # code for writing the csv file goes here...
    response = HttpResponse(content_type='application/zip')
    response['Content-Disposition'] = 'attachment; filename=backup.csv.zip'
    z = zipfile.ZipFile(response, 'w')  ## write zip to response
    z.writestr("filename.csv", output)  ## write csv file to zip
    return response
But that's not it, and I have no idea how to do this.
OK I got it. Here is my new function:
def backup_to_csv(request):
    output = StringIO.StringIO()  ## temp output file
    writer = csv.writer(output, dialect='excel')
    # code for writing the csv file goes here...
    response = HttpResponse(content_type='application/zip')
    response['Content-Disposition'] = 'attachment; filename=backup.csv.zip'
    z = zipfile.ZipFile(response, 'w')  ## write zip to response
    z.writestr("filename.csv", output.getvalue())  ## write csv file to zip
    z.close()  ## finalize the archive so its central directory is written
    return response
Note how, in the working case, you return response... and in the NON-working case you return z, which is NOT an HttpResponse of course (while it should be!).
So: use your csv_writer NOT on response but on a temporary file; zip the temporary file; and write THAT zipped bytestream into the response!
zipfile.ZipFile(response,'w')
doesn't seem to work in Python 2.7.9. The response is a django.http.HttpResponse object (which is said to be file-like), but it gives the error "HttpResponse object does not have an attribute 'seek'". When the same code is run in Python 2.7.0 or 2.7.6 (I haven't tested other versions) it works, so you'd better test it with Python 2.7.9 and see whether you get the same behaviour.
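One way to sidestep the missing seek attribute, sketched here with the standard library only, is to build the archive in an io.BytesIO buffer, which is fully seekable, and hand the finished bytes to the response. The helper name zip_csv_bytes and the sample rows are assumptions for illustration.

```python
import csv
import io
import zipfile


def zip_csv_bytes(rows, member_name='filename.csv'):
    """Return the bytes of a zip archive holding one CSV built from rows."""
    csv_buffer = io.StringIO()
    csv.writer(csv_buffer, dialect='excel').writerows(rows)

    zip_buffer = io.BytesIO()
    # The with-block closes the archive, which writes its central directory.
    with zipfile.ZipFile(zip_buffer, 'w', zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(member_name, csv_buffer.getvalue())
    return zip_buffer.getvalue()


payload = zip_csv_bytes([['id', 'value'], [1, 'x']])
with zipfile.ZipFile(io.BytesIO(payload)) as check:
    member_names = check.namelist()
    csv_text = check.read('filename.csv').decode('utf-8')
```

The result can then be returned as HttpResponse(payload, content_type='application/zip') with the same Content-Disposition header as in the view above.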
