Returning a Python bytearray in an HttpResponse

I have a django view that I want to return an Excel file. The code is below:
def get_template(request, spec_pk):
    spec = get_object_or_404(Spec, pk=spec_pk)
    response = HttpResponse(spec.get_template(), mimetype='application/ms-excel')
    response['Content-Disposition'] = 'attachment; filename=%s_template.xls' % spec.name
    return response
In that example, the type of spec.get_template() is <type 'bytearray'> which contains the binary data of an Excel spreadsheet.
The problem is that when I download from that view and open the result in Excel, it comes through as garbled binary data. I know the bytearray is correct, though, because if I do the following:
f = open('temp.xls', 'wb')
f.write(spec.get_template())
I can open temp.xls in Excel perfectly.
I've even gone so far as to modify my view to:
def get_template(request, spec_pk):
    spec = get_object_or_404(Spec, pk=spec_pk)
    f = open('/home/user/temp.xls', 'wb')
    f.write(spec.get_template())
    f.close()
    f = open('/home/user/temp.xls', 'rb')
    response = HttpResponse(f.read(), mimetype='application/ms-excel')
    response['Content-Disposition'] = 'attachment; filename=%s_template.xls' % spec.name
    return response
And it works perfectly: I can open the .xls file from the browser in Excel and everything is fine.
So my question is: what do I need to do to that bytearray before passing it to the HttpResponse? Why does saving it as binary and re-opening it work perfectly, while passing the bytearray itself results in garbled data?

Okay, through completely random (and very persistent) trial and error, I found a solution using the Python binascii module.
This works:
response = HttpResponse(binascii.a2b_qp(spec.get_template()), mimetype='application/ms-excel')
According to the Python docs for binascii.a2b_qp:
Convert a block of quoted-printable data back to binary and return the binary data. More than one line may be passed at a time. If the optional argument header is present and true, underscores will be decoded as spaces.
Would love for someone to tell me why saving it as binary, then reopening it worked though.

TLDR: Cast the bytearray to bytes
The problem is that Django's HttpResponse doesn't treat bytearray objects the same as bytes objects. HttpResponse has a special case for bytes, which it sends to the client as-is, but it has no such case for bytearray; that falls through to a catch-all which treats it as an iterable of int.
If you open the corrupted Excel file in a text editor, you'll probably see a bunch of ASCII numbers: the numeric values of the bytes you were trying to return from the bytearray. This also explains the save-and-reopen workaround (f.read() on a file opened in binary mode returns bytes, which hit the as-is path), and most likely the binascii.a2b_qp hack as well, since a2b_qp returns bytes too.
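A minimal sketch of the fix, per the TLDR above; the cast is the whole change (the question's Django version takes mimetype=, which newer versions spell content_type=):
response = HttpResponse(bytes(spec.get_template()), mimetype='application/ms-excel')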
Mr. Digital Flapjack gives a very complete explanation here: https://www.digitalflapjack.com/blog/2021/4/27/bytes-not-bytearrays-with-django-please

Related

How can I input the image as byte data instead of string?

I'm new to Python and was playing around with how to change my Instagram profile picture. The part I just can't get past is how to feed my image into the program. This is my code:
from instagram_private_api import Client, ClientCompatPatch
user_name = 'my_username'
password = 'my_password'
api = Client(user_name, password)
api.change_profile_picture('image.png')
Now, from what I read in the API documentation, I can't just pass in an image; it needs to be byte data. The parameter is described like this:
photo_data – byte string of image
I converted the image on an encoding website and now I have the file image.txt with the byte data of the image. So I changed the last line to this:
api.change_profile_picture('image.txt')
But this still doesn't work. The program doesn't read it as byte data. I get the following error:
Exception has occurred: TypeError
a bytes-like object is required, not 'str'
What is the right way to put in the picture?
The error is telling you that "image.txt" (or "image.png") is a string, and it will always say that as long as you pass in a filename, because filenames are always strings. It doesn't matter what's in the file, because the API never reads the file.
It doesn't want the filename of the image; it wants the actual image data that's in the file. That's why the parameter is named photo_data and not photo_filename. So read the file (in binary mode, so you get bytes rather than text) and pass the contents instead.
with open("image.png", "rb") as imgfile:
api.change_profile_picture(imgfile.read())
The with statement ensures that the file is closed after you're done with it.
If you have a .png or .jpeg (or similar) image file, use this:
with open("image.png", "rb") as f:
    api.change_profile_picture(f.read())
And if you have a .txt file, use this:
with open("image.txt", "rb") as f:
    api.change_profile_picture(f.read())

'application/octet-stream' instead of application/csv?

I am quite new to Python. I want to confirm that the dataset (URL in the code below) is indeed a CSV file. However, when checking via the headers I get 'application/octet-stream' instead of 'application/csv'.
I assume that I defined something in the wrong way when reading in the data, but I don't know what.
import requests

url = "https://opendata.ecdc.europa.eu/covid19/casedistribution/csv/data.csv"
d1 = requests.get(url)

filePath = 'data/data_notebook-1_covid-new.csv'
with open(filePath, "wb") as f:
    f.write(d1.content)

## data type via headers #PROBLEM
headerDict = d1.headers
# accessing the Content-Type header
if "Content-Type" in headerDict:
    print("Content-Type:")
    print(headerDict['Content-Type'])
I assume that I defined something in the wrong way when reading in the data
No, you didn't. The Content-Type header is supposed to indicate what the response body is, but there is nothing you can do to force the server to set that to a value you expect. Some servers are just badly configured and don't play along.
application/octet-stream is the most generic content type of them all - it gives you no more info than "it's a bunch of bytes, have fun".
What's more, there isn't necessarily One True Type for each kind of content, only more-or-less widely agreed-upon conventions. For CSV, a common one would be text/csv.
So if you're sure what the content is, feel free to ignore the Content-Type header.
import requests

url = "https://opendata.ecdc.europa.eu/covid19/casedistribution/csv/data.csv"
response = requests.get(url)

filePath = 'data/data_notebook-1_covid-new.csv'
with open(filePath, "wb") as f:
    f.write(response.content)
Writing to file in binary mode is a good idea in the absence of any further information, because this will retain the original bytes exactly as they were.
In order to convert that to string, it needs to be decoded using a certain encoding. Since the Content-Type did not give any indication here (it could have said Content-Type: text/csv; charset=XYZ), the best first assumption for data from the Internet would be UTF-8:
import csv

filePath = 'data/data_notebook-1_covid-new.csv'
with open(filePath, encoding='utf-8') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        print(row)
Should that turn out to be wrong (i.e. there are decoding errors or garbled characters), you can try a different encoding until you find one that works. That would not be possible if you had written the file in text mode in the beginning, as any data corruption from wrong decoding would have made it into the file.
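A sketch of that trial-and-error loop (the candidate encodings are just illustrative assumptions):
import csv

filePath = 'data/data_notebook-1_covid-new.csv'
rows = None
for enc in ('utf-8', 'utf-8-sig', 'latin-1'):  # illustrative candidates
    try:
        with open(filePath, encoding=enc) as f:
            rows = list(csv.reader(f, delimiter=','))
        break  # first encoding that decodes cleanly wins
    except UnicodeDecodeError:
        continue  # garbled decode, try the next candidate
Note that 'latin-1' maps every possible byte to some character, so it never raises UnicodeDecodeError; it works as a last-resort fallback, though it may still produce wrong characters.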

FTP-downloaded data contains literal Unicode such as \x00 and \n

I'm currently pulling a group of files from an FTP server to run a parser on them. When I download a file in-browser and feed it into the parser, I get no errors, since it's a plain TXT file.
But when I pull the data from the server through Python's ftplib, the data comes out formatted differently, for example with extra \x00 and 'b' characters inside it. It's important that the data keeps its integrity, since I'm working with a COBOL database structure that can't be altered in the slightest or it will ruin the entire store.
I am successfully downloading using the following functions:
data = []

def handle_binary(more_data):
    data.append(str(more_data))

def get_file(filename):
    resp = ftp.retrbinary("RETR {0}".format(filename), callback=handle_binary)
    file = "".join(data)
    # returning the data as well as saving it to inspect
    save = open("save.txt", "w+")
    save.write(file)
    save.close()
    return file
I tried to change my store from:
data.append(str(more_data))
To
data.append(more_data)
As well as changing my join function to be b"" to indicate a byte join, but I got errors following that.
An example of a long string of data that wasn't in the original download:
\x00\x00\x00'b'\x00\n
Edit: Upon comparison, it seems the FTP-downloaded data is missing the newlines that the in-browser download has (which fits, given the stray \n in the example above).
Thanks for any help regarding this question.
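For what it's worth, a sketch combining the two changes the question already tried (appending the raw bytes and using a b"" join) with the piece that usually trips this up, writing in binary mode. On Python 3, str(more_data) turns a bytes chunk into the literal text b'...', which is exactly where the stray b and \x00 characters come from (the host name below is hypothetical):
from ftplib import FTP

ftp = FTP('ftp.example.com')  # hypothetical host
ftp.login()

data = []

def handle_binary(more_data):
    data.append(more_data)  # keep the raw bytes; str() would embed b'...' literals

def get_file(filename):
    ftp.retrbinary("RETR {0}".format(filename), callback=handle_binary)
    contents = b"".join(data)  # bytes chunks need a bytes join
    with open("save.txt", "wb") as save:  # binary mode, so nothing gets re-encoded
        save.write(contents)
    return contents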

How to write array of raw bytes to google cloud storage as binary file

I'm making a Google App Engine (AE) app that algorithmically produces a raw binary file (mostly 16-bit values with some 32-bit header values), which is then downloaded by the user. I can produce this array and write it to a file in normal Python using the following simple code:
import numpy as np
import array as arr
toy_data_to_write = arr.array('h', np.zeros(10, dtype=np.int16))
output_file = open('testFile.xyz', 'wb')
toy_data_to_write.tofile(output_file)
Obviously, this won't work on AE because file writes are not allowed, so I tried to do something similar with GCS:
import cloudstorage as gcs
self.response.headers['Content-Type'] = 'application/octet-stream'
self.response.headers['Content-Disposition'] = 'attachment; filename=testFile.xyz'
testfilename = '/' + 'xyz-app.appspot.com' + '/testfile'
gcs_file = gcs.open(testfilename,'w',content_type='application/octet-stream')
testdata = arr.array('h', np.zeros(10, dtype=np.int16))
gcs_file.write(testdata)
gcs_file.close()
gcs_file = gcs.open(testfilename)
self.response.write(gcs_file.read())
gcs_file.close()
This code gives me a TypeError, basically saying that write() expects a string and nothing else. Note that the array module's .tofile() function does not work with AE/GCS.
Is there any way for me to write a file like this? My understanding is that I can't somehow encode a string to be raw binary that will be written correctly (not ASCII or somesuch), so what I want may be impossible? :( Is the Blobstore any different? Also there's a similar(-sounding) question (here), but it's not at all what I'm trying to do.
It's worth mentioning that I don't really need to write the file -- I just need to be able to serve it back to the user, if that helps.
Use the array.tostring function:
gcs_file.write(testdata.tostring())
(On Python 3, the same method is spelled tobytes; tostring was deprecated and finally removed in Python 3.9.)
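Since the question notes the file never needs to persist, here is a sketch that skips GCS entirely and writes the bytes straight into the response (the handler class and route are hypothetical; assumes the Python 2 webapp2 stack the question appears to use):
import array as arr
import numpy as np
import webapp2

class DownloadHandler(webapp2.RequestHandler):  # hypothetical handler
    def get(self):
        testdata = arr.array('h', np.zeros(10, dtype=np.int16))
        self.response.headers['Content-Type'] = 'application/octet-stream'
        self.response.headers['Content-Disposition'] = 'attachment; filename=testFile.xyz'
        self.response.write(testdata.tostring())  # raw bytes straight to the client

app = webapp2.WSGIApplication([('/download', DownloadHandler)])  # illustrative route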

Write gzipped-already data into a file

I have a database where some of the data is binary (the blob datatype in MySQL); the records are actually webpages, scraped and gzipped. Now I want to extract them and write each record into a gzip file, which I'd assume is doable; after all, they're gzipped data already, right?
The question is, however, how would I do that? Searching turns up a million examples of how to write a gzip file from original data, not from already-gzipped data. Writing the gzipped string directly into a file doesn't produce a gzip file, not to mention I got a load of "ordinal not in range" exceptions.
Could you guys help? Thanks in advance. I'm a newbie to Python...
Edit: Here is the method I used:
def store_cache(self, content, news_id):
    if not content:
        return
    # some of the records may contain normal (not gzipped) data, hence this try block
    try:
        content = self.gunzip(content)
    except:
        return
    import gzip
    with gzip.open('static/cache/%s' % (self.base36encode(news_id),), 'wb') as f:
        f.write(content)
This causes an exception:
<type 'exceptions.UnicodeEncodeError'> at /migrate
'ascii' codec can't encode character u'\u1edb' in position 186: ordinal not in range(128)
And this is the innermost traceback:
E:\Python27\lib\gzip.py in write
self.crc = zlib.crc32(data, self.crc) & 0xffffffffL
You said it yourself: extract them and then write them into a gzip file. There is nothing special about writing "from gzipped data": you un-gzip the data to get the original data, and then write the original data as if it were original data (because it is). The documentation shows you how to do these things.
However, gzip is just a compression format, not an archive format. It is not built to handle multiple files, so you must use something else to create a single file from the multiple inputs. Typically this is done by making a tar archive which is then gzipped. You can do this in Python using the tarfile module. Since your data will come from gzip-decompression streams, you will want to use the TarFile.addfile(tarinfo, fileobj) method to add them to the archive. You should be able to use the gzip.GzipFile instance as the fileobj to add this way.
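A sketch of that combination (Python 3 here for brevity; the records list and the .html member names are assumptions, not from the question):
import gzip
import io
import tarfile

# Stand-in for rows pulled from the MySQL table: (news_id, gzipped blob) pairs.
records = [(1, gzip.compress(b'<html>page one</html>'))]

with tarfile.open('cache.tar.gz', 'w:gz') as tar:
    for news_id, blob in records:
        data = gzip.GzipFile(fileobj=io.BytesIO(blob)).read()  # un-gzip back to the original page
        info = tarfile.TarInfo(name='%s.html' % news_id)
        info.size = len(data)  # tar needs each member's size up front
        tar.addfile(info, io.BytesIO(data))
Reading the decompressed bytes first, rather than handing the GzipFile straight to addfile, makes it easy to fill in the size field that TarInfo requires.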
