How do you get Google App Engine to gunzip during download? - python

I am trying to get Google App Engine to gunzip my .gz blob file (single file compressed) automatically by setting the response headers as follows:
class download(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, resource):
        resource = str(urllib.unquote(resource))
        blob_info = blobstore.BlobInfo.get(resource)
        self.response.headers['Content-Encoding'] = str('gzip')
        # self.response.headers['Content-type'] = str('application/x-gzip')
        self.response.headers['Content-type'] = str(blob_info.content_type)
        self.response.headers['Content-Length'] = str(blob_info.size)
        cd = 'attachment; filename=%s' % (blob_info.filename)
        self.response.headers['Content-Disposition'] = str(cd)
        self.response.headers['Cache-Control'] = str('must-revalidate, post-check=0, pre-check=0')
        self.response.headers['Pragma'] = str(' public')
        self.send_blob(blob_info)
When this runs, the file is downloaded without the .gz extension, but the downloaded data is still gzipped: its size matches the .gz file size on the server, and I can confirm it by manually gunzipping the downloaded file. I am trying to avoid that manual gunzip step.
I am trying to get the blob file to automatically gunzip during the download. What am I doing wrong?
By the way, the gzip file contains only a single file. On my self-hosted (non-Google) server, I could get the automatic gunzip by setting the same response headers; albeit, my code there is written in PHP.
UPDATE:
I rewrote the handler to serve data from the bucket. However, this generates an HTTP 500 error. The file is partially downloaded before the failure. The rewrite is as follows:
class download(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, resource):
        resource = str(urllib.unquote(resource))
        blob_info = blobstore.BlobInfo.get(resource)
        file = '/gs/mydatabucket/%s' % blob_info.filename
        print file
        self.response.headers['Content-Encoding'] = str('gzip')
        self.response.headers['Content-Type'] = str('application/x-gzip')
        # self.response.headers['Content-Length'] = str(blob_info.size)
        cd = 'filename=%s' % (file)
        self.response.headers['Content-Disposition'] = str(cd)
        self.response.headers['Cache-Control'] = str('must-revalidate, post-check=0, pre-check=0')
        self.response.headers['Pragma'] = str(' public')
        self.send_blob(file)
This downloads 540,672 bytes of the 6,094,848-byte file to the client before the server terminates and issues a 500 error. When I run 'file' on the partially downloaded data from the command line, Mac OS correctly identifies the format as an 'SQLite 3.x database' file. Any idea why the server returns the 500 error? How can I fix the problem?

You should first check whether your requesting client supports gzipped content. If it does support gzip content encoding, you may pass the gzipped blob as is with the proper content-encoding and content-type headers; otherwise you need to decompress the blob for the client. You should also verify that your blob's content_type isn't gzip (this depends on how you created your blob to begin with!).
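For example, a minimal sketch of that check, assuming the webapp2-style handler from the question and a Python 2 runtime:

accept_encoding = self.request.headers.get('Accept-Encoding', '')
if 'gzip' in accept_encoding:
    # Client can decompress: serve the stored gzip bytes as-is
    self.response.headers['Content-Encoding'] = 'gzip'
    self.send_blob(blob_info)
else:
    # Client cannot: decompress server-side before sending
    import gzip
    import StringIO
    raw = blobstore.BlobReader(blob_info.key()).read()
    self.response.out.write(
        gzip.GzipFile(fileobj=StringIO.StringIO(raw)).read())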
You may also want to look at Google Cloud Storage as this automatically handles gzip transportation so long as you properly compress the data before storing it with the proper content-encoding and content-type metadata.
See this SO question: Google cloud storage console Content-Encoding to gzip
Or the GCS Docs: https://cloud.google.com/storage/docs/gsutil/addlhelp/WorkingWithObjectMetadata#content-encoding
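For instance, uploading a pre-compressed object with that metadata could look like this with gsutil (the bucket, object name, and content type here are only placeholders):

gsutil -h "Content-Encoding:gzip" -h "Content-Type:application/x-sqlite3" cp mydb.gz gs://mydatabucket/mydb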
You may use GCS as easily (if not more easily) as you use the blobstore in App Engine, and it seems to be the preferred storage layer going forward. I say this because the File API, which made blobstore interaction easier, has been deprecated, and great effort has gone into the GCS libraries, making their API similar to Python's built-in file interface.
UPDATE:
Since the objects are stored in GCS, you can use 302 redirects to point users to files rather than relying on the Blobstore API. This eliminates any unknown behavior of the Blobstore API and ensures GAE delivers your stored objects with the content-type and content-encoding you intended. For objects with a public-read ACL, you may simply direct users to either storage.googleapis.com/<bucket>/<object> or <bucket>.storage.googleapis.com/<object>. Alternatively, if you'd like application logic to dictate access, you should keep the objects' ACL private and use GCS Signed URLs to create short-lived URLs for the 302 redirect.
It's worth noting that if you want users to be able to upload objects via GAE, you'd still use the Blobstore API to handle storing the file in GCS, but you'd have to modify the object after it was uploaded to ensure the proper gzip compression and content-encoding metadata are used.
class legacy_download(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, resource):
        filename = str(urllib.unquote(resource))
        url = 'https://storage.googleapis.com/mybucket/' + filename
        self.redirect(url)
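For the private-ACL route, a minimal sketch of building a signed URL with the App Engine app identity service (the bucket name and expiry are placeholders, and this assumes the app's service account has read access to the object):

import base64
import time
import urllib

from google.appengine.api import app_identity

def make_signed_url(bucket, obj, expires_in=60):
    expires = int(time.time()) + expires_in
    # Canonical string for a GET on /bucket/obj, per the GCS signed-URL spec
    string_to_sign = 'GET\n\n\n%d\n/%s/%s' % (expires, bucket, obj)
    _, signature = app_identity.sign_blob(string_to_sign)
    return ('https://storage.googleapis.com/%s/%s'
            '?GoogleAccessId=%s&Expires=%d&Signature=%s') % (
        bucket, obj,
        app_identity.get_service_account_name(),
        expires,
        urllib.quote_plus(base64.b64encode(signature)))

The handler would then redirect to make_signed_url('mybucket', filename) instead of the public URL.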

GAE already serves everything using gzip if the client supports it.
So I think what's happening after your update is that the browser expects there to be more of the file, but GAE thinks it's already at the end of the file since it's already gzipped. That's why you get the 500.
(if that makes sense)
Anyway, since GAE already handles compression for you, the easiest way is probably to put uncompressed files in GCS and let the Google infrastructure handle the compression automatically for you when you serve them.

Related

How to export CSV to absolute path in Flask, using Content-disposition?

I have the following code in my Flask app:
@app.route('/transform', methods=['POST'])
def transform_view():
    # other irrelevant code
    resp = make_response(data1.to_csv())
    resp.headers['Content-Disposition'] = \
        'attachment; filename= export.csv'
    resp.headers['Content-Type'] = 'text/csv'
    return resp
Whenever I click my download-csv button in the frontend, export.csv is downloaded to the Downloads folder. But I want to export it to the directory where my project resides, which is C:\Users\Admin\Desktop\crap9\flask.
I tried doing:
resp.headers['Content-Disposition'] = \
    "attachment; filename= r'C:\Users\Admin\Desktop\crap9\flask\export.csv'"
but throws
File ".\app.py", line 480
"attachment; filename= C:\Users\Admin\Desktop\crap9\flask\export.csv"
^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes
in position 24-25: truncated \UXXXXXXXX escape
This isn't really answering your exception/unicode error, but some advice regarding the use of Content-Disposition...
The attachment directive isn't meant to include a full path. See MDN docs:
The filename is always optional and must not be used blindly by the application: path information should be stripped, and conversion to the server file system rules should be done. This parameter provides mostly indicative information. When used in combination with Content-Disposition: attachment, it is used as the default filename for an eventual "Save As" dialog presented to the user.
So when you mention:
export.csv is downloaded in Downloads folder. But I want to export it to the directory where my project resides, which is C:\Users\Admin\Desktop\crap9\flask
Selection of the Downloads folder happens because your browser config has it as the default Save As location; this can't be tweaked via a header from your server. If you really want to save the file to the server's filesystem, it may be advisable to do so with a standard Python method. Something like:
with open('export.csv', 'w') as f:  # 'w', not 'wb': to_csv() returns a string
    f.write(data1.to_csv())
I think you may be confusing the server's filesystem with the client's filesystem, because you're running the app locally at this stage. When deployed, these are two completely separate things. That full path will not exist on the client's filesystem, and your transform_view function, which makes the file downloadable to the end user, cannot dictate which directory the browser will save it to.
So it's probably wise to decide whether to actually save to the server's filesystem, or to make the file downloadable and let the client pick the save location (you can still set the filename itself).
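If you want both at once, a minimal sketch (assuming data1 is the DataFrame from the question and app is your Flask app):

import os
from flask import make_response

@app.route('/transform', methods=['POST'])
def transform_view():
    csv_text = data1.to_csv()
    # Keep a copy next to the app on the server's filesystem...
    with open(os.path.join(app.root_path, 'export.csv'), 'w') as f:
        f.write(csv_text)
    # ...and still offer it as a download; the browser decides where it lands
    resp = make_response(csv_text)
    resp.headers['Content-Disposition'] = 'attachment; filename=export.csv'
    resp.headers['Content-Type'] = 'text/csv'
    return resp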

Send file via JSON instead of uploading to server, Django

I have an app that currently allows a user to upload a file and it saves the file on the web server. My client has now decided to use a third party cloud hosting service for their file storage needs. The company has their own API for doing CRUD operations on their server, so I wrote a script to test their API and it sends a file as a base64 encoded JSON payload to the API. The script works fine but now I'm stuck on how exactly how I should implement this functionality into Django.
json_testing.py
import base64
import json
import requests
import magic

filename = 'test.txt'

# Open the file, read it, and encode it as a base64 string
with open(filename, "rb") as test_file:
    encoded_string = base64.b64encode(test_file.read())

# Get the MIME type using the magic module
mime = magic.Magic(mime=True)
mime_type = mime.from_file(filename)

# Concatenate MIME type and encoded string with string data;
# use .decode() on the byte data for mime_type and the encoded string
file_string = 'data:%s;base64,%s' % (mime_type.decode(), encoded_string.decode())

payload = {
    "client_id": 1,
    "file": file_string
}
headers = {
    "token": "AuthTokenGoesHere",
    "content-type": "application/json",
}

request = requests.post('https://api.website.com/api/files/', json=payload, headers=headers)
print(request.json())
models.py
def upload_location(instance, filename):
    return '%s/documents/%s' % (instance.user.username, filename)

class Document(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL)
    category = models.ForeignKey(Category, on_delete=models.CASCADE)
    file = models.FileField(upload_to=upload_location)

    def __str__(self):
        return self.filename()

    def filename(self):
        return os.path.basename(self.file.name)
So to reiterate, when a user uploads a file, instead of storing the file somewhere on the web server, I want to base64 encode the file so I can send the file as a JSON payload. Any ideas on what would be the best way to approach this?
The simplest way I can put this is that I want to avoid saving the
file to the web server entirely. I just want to encode the file, send
it as a payload, and discard it, if that's possible.
From the django docs:
Upload Handlers
When a user uploads a file, Django passes off the file data to an
upload handler – a small class that handles file data as it gets
uploaded. Upload handlers are initially defined in the
FILE_UPLOAD_HANDLERS setting, which defaults to:
["django.core.files.uploadhandler.MemoryFileUploadHandler",
"django.core.files.uploadhandler.TemporaryFileUploadHandler"]
Together, MemoryFileUploadHandler and TemporaryFileUploadHandler provide Django's default file upload behavior of reading small files into memory and large ones onto disk.
You can write custom handlers that customize how Django handles files.
You could, for example, use custom handlers to enforce user-level
quotas, compress data on the fly, render progress bars, and even send
data to another storage location directly without storing it locally.
See Writing custom upload handlers for details on how you can
customize or completely replace upload behavior.
Contrary thoughts:
I think you should consider sticking with the default file upload handlers because they keep someone from uploading a file that will overwhelm the server's memory.
Where uploaded data is stored
Before you save uploaded files, the data needs to be stored somewhere.
By default, if an uploaded file is smaller than 2.5 megabytes, Django
will hold the entire contents of the upload in memory. This means that
saving the file involves only a read from memory and a write to disk
and thus is very fast.
However, if an uploaded file is too large, Django will write the
uploaded file to a temporary file stored in your system’s temporary
directory. On a Unix-like platform this means you can expect Django to
generate a file called something like /tmp/tmpzfp6I6.upload. If an
upload is large enough, you can watch this file grow in size as Django
streams the data onto disk.
These specifics – 2.5 megabytes; /tmp; etc. – are simply “reasonable
defaults” which can be customized as described in the next section.
request.FILES info:
# forms.py:
from django import forms

class UploadFileForm(forms.Form):
    title = forms.CharField(max_length=50)
    json_file = forms.FileField()
A view handling this form will receive the file data in request.FILES,
which is a dictionary containing a key for each FileField (or
ImageField, or other FileField subclass) in the form. So the data from
the above form would be accessible as request.FILES['json_file'].
Note that request.FILES will only contain data if the request method
was POST and the <form> that posted the request has the attribute
enctype="multipart/form-data". Otherwise, request.FILES will be empty.
HttpRequest.FILES
A dictionary-like object containing all uploaded files. Each key in
FILES is the name from the <input type="file" name="" />. Each value
in FILES is an UploadedFile.
The source code for TemporaryFileUploadHandler contains this:
class TemporaryFileUploadHandler(FileUploadHandler):
    """
    Upload handler that streams data into a temporary file.
    """
    ...
    ...
    def new_file(self, *args, **kwargs):
        """
        Create the file object to append to as data is coming in.
        """
        ...
        self.file = TemporaryUploadedFile(....)  # <***HERE
And the source code for TemporaryUploadedFile contains this:
class TemporaryUploadedFile(UploadedFile):
    """
    A file uploaded to a temporary location (i.e. stream-to-disk).
    """
    def __init__(self, name, content_type, size, charset, content_type_extra=None):
        ...
        file = tempfile.NamedTemporaryFile(suffix='.upload')  # <***HERE
And the python tempfile docs say this:
tempfile.NamedTemporaryFile(...., delete=True)
...
If delete is true (the default), the file is deleted as soon as it is closed.
Similarly, the other of the two default file upload handlers, MemoryFileUploadHandler, creates a file of type BytesIO:
A stream implementation using an in-memory bytes buffer. It inherits
BufferedIOBase. The buffer is discarded when the close() method is
called.
Therefore, all you have to do is close request.FILES['field_name'] to erase the file (whether the file contents are stored in memory or on disk in the /tmp directory), e.g.:
uploaded_file = request.FILES['json_file']
file_contents = uploaded_file.read()
# Send file_contents to the other server here.
uploaded_file.close()  # erases the file
If for some reason you don't want django to write to the server's /tmp directory at all, then you'll need to write a custom file upload handler to reject uploaded files that are too large.
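Putting the pieces together, a minimal sketch of a view that relays the upload without keeping it (the URL, token, and field names come from the question's script and are placeholders):

import base64

import requests
from django.http import JsonResponse
from django.views.decorators.http import require_POST

@require_POST
def forward_upload(request):
    uploaded = request.FILES['json_file']
    # Build the same data-URI payload as json_testing.py
    file_string = 'data:%s;base64,%s' % (
        uploaded.content_type,
        base64.b64encode(uploaded.read()).decode())
    resp = requests.post(
        'https://api.website.com/api/files/',
        json={'client_id': 1, 'file': file_string},
        headers={'token': 'AuthTokenGoesHere'})
    uploaded.close()  # discards the temp file or in-memory buffer
    return JsonResponse(resp.json(), safe=False)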

Export spreadsheet as text/csv using Drive v3 gives 500 Internal Error

I was trying to export a Google Spreadsheet in csv format using the Google client library for Python:
# OAuth and setups...
req = g['service'].files().export_media(fileId=fileid, mimeType=MIMEType)
fh = io.BytesIO()
downloader = http.MediaIoBaseDownload(fh, req)
# Other file IO handling...
This works for MIMEType: application/pdf, MS Excel, etc.
According to Google's documentation, text/csv is supported. But when I try to make a request, the server gives a 500 Internal Error.
Even using Google's Drive API playground, it gives the same error.
Tried:
Like in v2, I added a field:
gid = 0
to the request to specify the worksheet, but then it's a bad request.
This is a known bug in Google's code. https://code.google.com/a/google.com/p/apps-api-issues/issues/detail?id=4289
However, if you manually build your own request, you can download the whole file in bytes (the media management stuff won't work).
With file as the file ID and http as the http object you've authorized against, you can download the file with:
from apiclient.http import HttpRequest

def postproc(*args):
    return args[1]

data = HttpRequest(http=http,
                   postproc=postproc,
                   uri='https://docs.google.com/feeds/download/spreadsheets/Export?key=%s&exportFormat=csv' % file,
                   headers={}).execute()
data here is a bytes object that contains your CSV. You can read it with something like:
import io

lines = io.TextIOWrapper(io.BytesIO(data), encoding='utf-8', errors='replace')
for line in lines:
    pass  # do whatever with each line
You just need to implement an Exponential Backoff.
Looking at this documentation of ExponentialBackOffPolicy.
The idea is that the servers are only temporarily unavailable, and they should not be overwhelmed when they are trying to get back up.
The default implementation requires back off for 500 and 503 status codes. Subclasses may override if different status codes are required.
Here is a snippet (Java, from the first link) of an Exponential Backoff implementation:
ExponentialBackOff backoff = ExponentialBackOff.builder()
    .setInitialIntervalMillis(500)
    .setMaxElapsedTimeMillis(900000)
    .setMaxIntervalMillis(6000)
    .setMultiplier(1.5)
    .setRandomizationFactor(0.5)
    .build();
request.setUnsuccessfulResponseHandler(new HttpBackOffUnsuccessfulResponseHandler(backoff));
You may want to look at this documentation for the summary of the ExponentialBackoff implementation.
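In Python, a minimal hand-rolled sketch of the same idea (assuming the client library's HttpError, as used elsewhere in this thread; the helper name is hypothetical):

import random
import time

from apiclient.errors import HttpError

def with_backoff(call, retries=5, initial=0.5, multiplier=1.5):
    """Retry call() on 500/503 responses, sleeping exponentially
    longer (with jitter) between attempts."""
    delay = initial
    for attempt in range(retries):
        try:
            return call()
        except HttpError as e:
            if e.resp.status not in (500, 503) or attempt == retries - 1:
                raise
            time.sleep(delay * (1 + random.random() * 0.5))
            delay *= multiplier

# e.g.: data = with_backoff(lambda: req.execute())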

GAE - how to use blobstore stub in testbed?

My code goes like this:
self.testbed.init_blobstore_stub()
upload_url = blobstore.create_upload_url('/image')
upload_url = re.sub('^http://testbed\.example\.com', '', upload_url)
response = self.testapp.post(upload_url, params={
    'shopid': id,
    'description': 'JLo',
}, upload_files=[('file', imgPath)])
self.assertEqual(response.status_int, 200)
How come it shows a 404 error? For some reason, the upload path does not seem to exist at all.
You can't do this. I think the problem is that webtest (which I assume is where self.testapp came from) doesn't work well with testbed blobstore functionality. You can find some info at this question.
My solution was to override unittest.TestCase and add the following methods:
def create_blob(self, contents, mime_type):
    "Since uploading blobs doesn't work in testing, create them this way."
    fn = files.blobstore.create(mime_type=mime_type,
                                _blobinfo_uploaded_filename="foo.blt")
    with files.open(fn, 'a') as f:
        f.write(contents)
    files.finalize(fn)
    return files.blobstore.get_blob_key(fn)

def get_blob(self, key):
    return self.blobstore_stub.storage.OpenBlob(key).read()
You will also need the solution here.
For my tests where I would normally do a get or post to a blobstore handler, I instead call one of the two methods above. It is a bit hacky but it works.
Another solution I am considering is to use Selenium's HtmlUnit driver. This would require the dev server to be running but should allow full testing of blobstore and also javascript (as a side benefit).
I think Kekito is right, you cannot POST to the upload_url directly.
But if you want to test the BlobstoreUploadHandler, you can fake the POST request it would normally receive from the blobstore in the following way. Assuming your handler is at /handler:
import email
...

def test_upload(self):
    blob_key = 'abcd'
    # The blobstore upload handler receives a multipart form request
    # containing uploaded files. But instead of containing the actual
    # content, the files contain an 'email' message that has some meta
    # information about the file. They also contain a blob-key that is
    # the key to get the blob from the blobstore
    # see blobstore._get_upload_content
    m = email.message.Message()
    m.add_header('Content-Type', 'image/png')
    m.add_header('Content-Length', '100')
    m.add_header('X-AppEngine-Upload-Creation', '2014-03-02 23:04:05.123456')
    # This needs to be valid base64-encoded
    m.add_header('content-md5', 'd74682ee47c3fffd5dcd749f840fcdd4')
    payload = m.as_string()
    # The blob-key in the Content-Type is important
    params = [('file', webtest.forms.Upload('test.png', payload,
                                            'image/png; blob-key=' + blob_key))]
    self.testapp.post('/handler', params, content_type='blob-key')
I figured that out by digging into the blobstore code. The important bit is that the POST request the blobstore sends to the UploadHandler doesn't contain the file content. Instead, it contains an 'email message' (well, information encoded like an email) with metadata about the file (content-type, content-length, upload time and md5). It also contains a blob-key that can be used to retrieve the file from the blobstore.

how to create a downloadable csv file in appengine

I use Python App Engine. I'm trying to create a link on a webpage that a user can click to download a csv file. How can I do this?
I've looked at the csv module, but it seems to want to open a file on the server, and App Engine doesn't allow that.
I've looked at remote_api, but it seems that it's only for uploading or downloading using app config, and from the account owner's terminal.
Any help thanks.
Pass a StringIO object as the first parameter to csv.writer; then set the content-type and content-disposition on the response appropriately (probably "text/csv" and "attachment", respectively) and send the StringIO as the content.
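A minimal sketch of that approach inside a handler's get method (Python 2 StringIO, matching the App Engine runtime of the era; the column values are just examples):

import csv
import StringIO

output = StringIO.StringIO()
writer = csv.writer(output)
writer.writerow(['name', 'count'])
writer.writerow(['foo', 3])

self.response.headers['Content-Type'] = 'text/csv'
self.response.headers['Content-Disposition'] = 'attachment; filename=export.csv'
self.response.out.write(output.getvalue())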
I used this code:
self.response.headers['Content-Type'] = 'application/csv'
writer = csv.writer(self.response.out)
writer.writerow(['foo','foo,bar', 'bar'])
Put it in your handler's get method. When the user requests it, the browser will download the content automatically.
Got from: generating a CSV file online on Google App Engine
