I currently use the following code to allow my users to upload files:
uploadurl = blobstore.create_upload_url('/process?session=' + session, gs_bucket_name='mybucketname')
and I can serve images like this:
imgurl = get_serving_url(blob_key, size=1600, crop=False, secure_url=True)
After content is uploaded using the method in the first code snippet, the blob key contains encoded_gs_file: and that's how it knows to serve it from Google Cloud Storage instead of the standard blobstore.
However, I'm unsure how I'd serve any other kind of file (for example .pdf, or .rtf). I do not want the content to be displayed in the browser, but rather sent to the client as a download (so they get the save file dialog and choose a location on their computer to save it).
How would I go about doing this? Thanks.
Using a Google serving URL works only for images.
To serve a PDF from the blobstore you can use:
import logging
import mimetypes

from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers

class DynServe(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, resource):
        # Split "key.ext" into the blob key and the file extension.
        (blob_key, extension) = resource.rpartition('.')[::2]
        blob_info = blobstore.BlobInfo.get(blob_key)
        if not blob_info:
            logging.error('Blob NOT FOUND %s' % resource)
            self.abort(404)
        # guess_type() returns (type, encoding); only the type goes in the header.
        self.response.headers['Content-Type'] = mimetypes.guess_type(blob_info.filename)[0]
        self.send_blob(blob_key, save_as=blob_info.filename)
The webapp2 route for this handler looks like:
webapp2.Route(r'/dynserve/<resource:(.*)>', handler=DynServe)
To serve:
PDF download
I'm going to answer my own question based on the answer from @voscausa.
This is what my handler looks like (inside a file named view.py):
import logging

from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers

class DynServe(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, resource):
        blob_key = resource
        if not blobstore.get(blob_key):
            logging.warning('Blob NOT FOUND %s' % resource)
            self.abort(404)
            return
        else:
            blob_info = blobstore.BlobInfo.get(blob_key)
            self.send_blob(blob_key, save_as=blob_info.filename)
We need this in app.yaml:
- url: /download/.*
  script: view.app
  secure: always
secure: always is optional, but I always use it while handling user data.
Put this at the bottom of view.py:
app = webapp2.WSGIApplication([('/download/([^/]+)?', DynServe),
                               ], debug=False)
Now visit /download/BLOB_KEY_HERE (you can find your blob key in the datastore).
That's a fully working example which works with both the standard blobstore AND Google Cloud Storage.
NOTE: all blob keys for content stored in GCS start with encoded_gs_file: and the ones which don't are in the standard blobstore; App Engine uses this prefix to determine where to locate the file.
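For illustration, here is a minimal sketch of how you could tell the two apart in your own code. It assumes the encoded_gs_file: prefix described above, which is an implementation detail rather than a documented API:
GS_PREFIX = 'encoded_gs_file:'  # implementation detail, not a documented API

def is_gcs_blob(blob_key):
    # Keys created via create_gs_key() or GCS uploads carry this prefix;
    # regular blobstore keys do not.
    return str(blob_key).startswith(GS_PREFIX)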
Related
I've recently deployed my Python GAE app from the development server and my image upload function stopped working properly.
After a bit of testing, it seems that the get_uploads function from blobstore is returning an empty list, and hence I get an index out of range error from the upload handler (I also tried the get_file_infos function with the same result).
However, when I check the GCS browser, the file is properly uploaded, so my problem seems to be that I can't find a way to extract the image link from the post to the upload handler.
Anybody have clues as to why this is happening? and if there's a way around this?
(The form uses a post method with multipart/form-data so hopefully that isn't an issue)
Here's the function I'm calling to post to the upload handler:
upload_url = blobstore.create_upload_url('/upload', gs_bucket_name='BUCKET')
result = urlfetch.fetch(url=upload_url,
                        payload=self.request.body,
                        method=urlfetch.POST,
                        headers=self.request.headers)
And here's the code for the upload handler:
class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        upload_files = self.get_uploads('file')
        blob_info = upload_files[0]
        self.response.write(str(blob_info.key()))
What are you trying to do?
It looks like you are trying to post a received body to GCS. Why not write it using the Google Cloud Storage client library?
with gcs.open(gcs_filename, 'w', content_type, options={'x-goog-acl': 'public-read'}) as f:
    f.write(blob)
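As a fuller (untested) sketch of that idea: a handler that writes the posted file straight to GCS instead of re-posting the body to an upload URL. The handler name, bucket path, and content type are assumptions for illustration:
import cloudstorage as gcs
import webapp2

BUCKET = '/BUCKET'  # placeholder bucket name, matching the question

class DirectGcsUpload(webapp2.RequestHandler):
    def post(self):
        # For a plain multipart/form-data POST (not a Blobstore upload URL),
        # the raw file bytes and the filename are available on the request.
        field = self.request.POST['file']
        gcs_filename = BUCKET + '/uploads/' + field.filename
        with gcs.open(gcs_filename, 'w',
                      content_type='image/jpeg',  # assumed; set to match the upload
                      options={'x-goog-acl': 'public-read'}) as f:
            f.write(field.value)
        self.response.write(gcs_filename)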
I'm using Google App Engine in Python to handle a small webapp.
I have some files stored in my GCS that I want to serve only if the user is logged in.
I thought it was really easy, but I'm surely missing a step, since my code:
import webapp2
import cloudstorage as gcs
from google.appengine.api import users

class Handler(webapp2.RequestHandler):
    def write(self, *a, **kw):
        self.response.out.write(*a, **kw)

class testHandler(Handler):
    def get(self):
        bucket = '/my_bucket'
        filename = '/pdf/somedoc.pdf'
        user = users.get_current_user()
        if user:
            pdf = gcs.open(bucket + filename)
            self.write(pdf)
only gives:
<cloudstorage.storage_api.ReadBuffer object at 0xfbb931d0>
and what I need is the file itself.
Can anyone tell me which step I'm missing?
Thanks
After some thinking, shower and coffee, I realized I had two problems.
First, I was writing the file object itself (hence the object address in the output), not the file contents.
So the correct call would be:
self.write(pdf.read())
Also, I had to change the 'Content-Type' header to 'application/pdf' so the browser serves the file as a PDF and not as a text file.
Anyhow, the result was:
import webapp2
import cloudstorage as gcs
from google.appengine.api import users

class pHandler(webapp2.RequestHandler):
    def write(self, *a, **kw):
        # Serve the response as a PDF rather than plain text.
        self.response.headers['Content-Type'] = 'application/pdf'
        self.response.out.write(*a, **kw)

class testHandler(pHandler):
    def get(self):
        bucket = '/my_bucket'
        filename = '/pdf/somedoc.pdf'
        user = users.get_current_user()
        if user:
            pdf = gcs.open(bucket + filename)
            # Write the file contents, not the buffer object itself.
            self.write(pdf.read())
            pdf.close()
Even though the OP has answered his own question, I just want to add a few thoughts.
The OP's code writes the content of the PDF file into the HTTP response:
self.write(pdf.read())
According to the GAE quota limitations, this will fail if the response size is larger than 32MB.
Also, it would be good to set the urlfetch_timeout value, as the default of 5 seconds may not be enough in some circumstances and would result in a DeadlineExceededError.
I would recommend the following: when a request is received, use the Google Cloud Storage API (not the GAE one) to copy the file to a temporary location. Also make sure to set the ACL of the new object to publicly readable, then serve the public URL of the new object.
Also, send a request to a task queue and set the ETA of the task to a timeout value of your choice. Once the task executes, remove the file from the temporary location so that it can no longer be accessed.
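A rough sketch of that approach, using the GAE cloudstorage client and the deferred library; the temporary prefix, lifetime, and function names are all made up for illustration:
import datetime

import cloudstorage as gcs
from google.appengine.ext import deferred

TEMP_PREFIX = '/my_bucket/tmp/'                  # hypothetical temporary location
LINK_LIFETIME = datetime.timedelta(minutes=10)   # hypothetical timeout

def delete_temp_copy(gcs_filename):
    # Runs from the task queue once the chosen lifetime has passed.
    try:
        gcs.delete(gcs_filename)
    except gcs.NotFoundError:
        pass

def publish_temporarily(source_filename, temp_name):
    # Copy the private object to a temporary, publicly readable object.
    with gcs.open(source_filename) as src:
        with gcs.open(TEMP_PREFIX + temp_name, 'w',
                      options={'x-goog-acl': 'public-read'}) as dst:
            dst.write(src.read())
    # Schedule removal of the temporary copy.
    deferred.defer(delete_temp_copy, TEMP_PREFIX + temp_name,
                   _countdown=int(LINK_LIFETIME.total_seconds()))
    return 'https://storage.googleapis.com' + TEMP_PREFIX + temp_name
Note this sketch reads the whole object into memory; for large files you would copy in chunks or use the JSON API's server-side copy instead.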
UPDATE:
Use Service Account auth: generate a new JSON key and get the private key.
Set the scope to FULL_CONTROL, as we need to change ACL settings.
I haven't tested the code yet as I am at work, but will do when I have time.
import httplib2
from apiclient.discovery import build
from apiclient.errors import HttpError
from oauth2client.client import SignedJwtAssertionCredentials

# Need to modify ACLs, therefore need full control access
GCS_SCOPE = 'https://www.googleapis.com/auth/devstorage.full_control'

def get_gcs_client(project_id,
                   service_account=None,
                   private_key=None):
    credentials = SignedJwtAssertionCredentials(service_account, private_key, scope=GCS_SCOPE)
    http = httplib2.Http()
    http = credentials.authorize(http)
    # The Cloud Storage JSON API version is 'v1'.
    service = build('storage', 'v1', http=http)
    return service
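Untested usage sketch: once you have the service object you could, for example, make an existing object publicly readable via the JSON API. The project ID, service account email, private key, bucket, and object names below are all placeholders:
service = get_gcs_client('my-project-id',
                         service_account='xxx@developer.gserviceaccount.com',
                         private_key=PRIVATE_KEY)
# Grant read access to everyone on a single object (needs the FULL_CONTROL scope).
service.objectAccessControls().insert(
    bucket='my_bucket',
    object='pdf/somedoc.pdf',
    body={'entity': 'allUsers', 'role': 'READER'}).execute()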
I think you'd be better off using the Blobstore API on GCS to serve this kind of file. Based on Using the Blobstore API with Google Cloud Storage, I've come up with this approach:
import cloudstorage as gcs
import webapp2
from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers

GCS_PREFIX = '/gs'
BUCKET = '/my_bucket'
FILE = '/pdf/somedoc.pdf'
BLOBSTORE_FILENAME = GCS_PREFIX + BUCKET + FILE

class GCSWebAppHandler(webapp2.RequestHandler):
    def get(self):
        blob_key = blobstore.create_gs_key(BLOBSTORE_FILENAME)
        self.response.headers['Content-Type'] = 'application/pdf'
        self.response.write(blobstore.fetch_data(blob_key, 0, blobstore.MAX_BLOB_FETCH_SIZE - 1))

class GCSBlobDlHandler(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self):
        blob_key = blobstore.create_gs_key(BLOBSTORE_FILENAME)
        self.send_blob(blob_key)

app = webapp2.WSGIApplication([
    ('/webapphandler', GCSWebAppHandler),
    ('/blobdlhandler', GCSBlobDlHandler)],
    debug=True)
As you can see, there are two example handlers you can use here, webapphandler and blobdlhandler. It's probably better to use the latter, since the former is limited by MAX_BLOB_FETCH_SIZE in fetch_data(), which is 1MB; but if your served files are smaller than this size, it's OK.
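If your files are bigger than that, one possible workaround (a sketch, not tested, building on the snippet above; the handler name is made up) is to stream the blob in MAX_BLOB_FETCH_SIZE chunks with BlobReader. Keep in mind the overall 32MB response limit still applies:
class GCSChunkedHandler(webapp2.RequestHandler):
    def get(self):
        blob_key = blobstore.create_gs_key(BLOBSTORE_FILENAME)
        self.response.headers['Content-Type'] = 'application/pdf'
        reader = blobstore.BlobReader(blob_key)
        while True:
            # Read at most one Blobstore fetch worth of data per iteration.
            chunk = reader.read(blobstore.MAX_BLOB_FETCH_SIZE)
            if not chunk:
                break
            self.response.write(chunk)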
I'm trying to upload a file using the Blobstore API to Google Cloud Storage. The image uploads correctly, but when I then try to process it (link it to a user), I get the error:
Index out of range
This is my code:
class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        upload_files = self.get_file_infos('file')  # 'file' is the file upload field in the form
        file_info = upload_files[0]
        #self.response.headers['Content-Type'] = 'application/x-www-form-urlencoded'
        #self.response.headers.add_header('Access-Control-Allow-Origin', '*')
        gcs_filename = file_info.gs_object_name
        file_key = blobstore.create_gs_key(gcs_filename)
        File(file=file_key, owner=utils.get_current_user(),
             url=images.get_serving_url(file_key)).put()
My code fails at the file_info = upload_files[0] line.
Where is your code that puts the file into your Google Cloud Storage bucket?
I think the problem might be these two lines, depending on how you implemented the GCS upload...
gcs_filename = file_info.gs_object_name
file_key = blobstore.create_gs_key(gcs_filename)
The gs_object_name should only return a meaningful result if the item is from GCS. This would cause create_gs_key() to fail as well if the gcs_filename is not correct.
For how to use blobstore API with Google Cloud Storage, please see this article for details - https://cloud.google.com/appengine/docs/python/blobstore/#Python_Using_the_Blobstore_API_with_Google_Cloud_Storage
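For reference, gs_object_name is only populated when the upload URL itself targets GCS. A minimal, illustrative form handler might look like this, with the '/upload' path and bucket name as placeholders:
class UploadFormHandler(webapp2.RequestHandler):
    def get(self):
        # Passing gs_bucket_name makes the upload land in GCS, which is what
        # populates file_info.gs_object_name in the upload handler.
        upload_url = blobstore.create_upload_url(
            '/upload', gs_bucket_name='my_bucket')
        self.response.write("""
            <form action="%s" method="POST" enctype="multipart/form-data">
              <input type="file" name="file">
              <input type="submit" value="Upload">
            </form>""" % upload_url)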
I am trying to get Google App Engine to gunzip my .gz blob file (single file compressed) automatically by setting the response headers as follows:
class download(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, resource):
        resource = str(urllib.unquote(resource))
        blob_info = blobstore.BlobInfo.get(resource)
        self.response.headers['Content-Encoding'] = str('gzip')
        # self.response.headers['Content-type'] = str('application/x-gzip')
        self.response.headers['Content-type'] = str(blob_info.content_type)
        self.response.headers['Content-Length'] = str(blob_info.size)
        cd = 'attachment; filename=%s' % (blob_info.filename)
        self.response.headers['Content-Disposition'] = str(cd)
        self.response.headers['Cache-Control'] = str('must-revalidate, post-check=0, pre-check=0')
        self.response.headers['Pragma'] = str(' public')
        self.send_blob(blob_info)
When this runs, the file is downloaded without the .gz extension. However, the downloaded file is still gzipped. The file size of the downloaded data matches the .gz file size on the server. Also, I can confirm this by manually gunzipping the downloaded file. I am trying to avoid the manual gunzip step.
I am trying to get the blob file to automatically gunzip during the download. What am I doing wrong?
By the way, the gzip file contains only a single file. On my self-hosted (non-Google) server, I could accomplish the automatic gunzip by setting the same response headers, albeit my code there is written in PHP.
UPDATE:
I rewrote the handler to serve data from the bucket. However, this generates an HTTP 500 error. The file is partially downloaded before the failure. The rewrite is as follows:
class download(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, resource):
        resource = str(urllib.unquote(resource))
        blob_info = blobstore.BlobInfo.get(resource)
        file = '/gs/mydatabucket/%s' % blob_info.filename
        print file
        self.response.headers['Content-Encoding'] = str('gzip')
        self.response.headers['Content-Type'] = str('application/x-gzip')
        # self.response.headers['Content-Length'] = str(blob_info.size)
        cd = 'filename=%s' % (file)
        self.response.headers['Content-Disposition'] = str(cd)
        self.response.headers['Cache-Control'] = str('must-revalidate, post-check=0, pre-check=0')
        self.response.headers['Pragma'] = str(' public')
        self.send_blob(file)
This downloads 540,672 bytes of the 6,094,848-byte file to the client before the server terminates and issues a 500 error. When I run 'file' on the partially downloaded file from the command line, Mac OS correctly identifies the file format as a 'SQLite 3.x database' file. Any idea why the server returns a 500 error? How can I fix the problem?
You should first check whether your requesting client supports gzipped content. If it does support gzip content encoding, then you may pass the gzipped blob as-is with the proper content-encoding and content-type headers; otherwise you need to decompress the blob for the client. You should also verify that your blob's content_type isn't gzip (this depends on how you created your blob to begin with!).
You may also want to look at Google Cloud Storage as this automatically handles gzip transportation so long as you properly compress the data before storing it with the proper content-encoding and content-type metadata.
See this SO question: Google cloud storage console Content-Encoding to gzip
Or the GCS Docs: https://cloud.google.com/storage/docs/gsutil/addlhelp/WorkingWithObjectMetadata#content-encoding
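If you go that route from Python, a rough sketch (untested; the helper name is made up, and it assumes the GAE cloudstorage client accepts a content-encoding option) of storing a gzip-compressed object with the right metadata could look like this:
import gzip
import io

import cloudstorage as gcs

def write_gzipped(gcs_filename, raw_bytes, content_type):
    # Compress in memory, then store with Content-Encoding: gzip so GCS and
    # the client can handle decompression transparently.
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode='wb') as gz:
        gz.write(raw_bytes)
    with gcs.open(gcs_filename, 'w',
                  content_type=content_type,
                  options={'content-encoding': 'gzip'}) as f:
        f.write(buf.getvalue())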
You may use GCS as easily (if not more easily) as you use the blobstore in App Engine, and it seems to be the preferred storage layer going forward. I say this because the Files API, which made blobstore interaction easier, has been deprecated, while great efforts and advancements have been made in the GCS libraries, making the API similar to the base Python file interaction API.
UPDATE:
Since the objects are stored in GCS, you can use 302 redirects to point users to files rather than relying on the Blobstore API. This eliminates any unknown behavior of the Blobstore API and GAE delivering your stored objects with the content-type and content-encoding you intended to use. For objects with a public-read ACL, you may simply direct them to either storage.googleapis.com/<bucket>/<object> or <bucket>.storage.googleapis.com/<object>. Alternatively, if you'd like to have application logic dictate access, you should keep the ACL to the objects private and can use GCS Signed URLs to create short lived URLs to use when doing a 302 redirect.
It's worth noting that if you want users to be able to upload objects via GAE, you'd still use the Blobstore API to handle storing the file in GCS, but you'd have to modify the object after it was uploaded to ensure proper gzip compression and content-encoding metadata are used.
class legacy_download(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, resource):
        filename = str(urllib.unquote(resource))
        url = 'https://storage.googleapis.com/mybucket/' + filename
        self.redirect(url)
GAE already serves everything using gzip if the client supports it.
So I think what's happening after your update is that the browser expects there to be more of the file, but GAE thinks it's already at the end of the file since it's already gzipped. That's why you get the 500.
(if that makes sense)
Anyway, since GAE already handles compression for you, the easiest way is probably to put uncompressed files in GCS and let the Google infrastructure handle the compression automatically for you when you serve them.
Is it possible to upload an image from an external site (e.g. www.example.com/image.png) to Google App Engine? This is not the same as uploading a file from a user-submitted form.
If it's possible, any solutions?
I'm also using Google Cloud Storage, so if anybody has found a way to accomplish the same thing by uploading straight to Google Cloud Storage, please let me know.
-- UPDATE ---
I've followed the example from here - https://developers.google.com/appengine/docs/python/googlecloudstorageclient/getstarted and
replaced the text write "abcd" with this:
url="example.com/image.jpeg"
opener1 = urllib2.build_opener()
page1 = opener1.open(url)
write_retry_params = gcs.RetryParams(backoff_factor=1.1)
gcs_file = gcs.open(filename,
'w',
content_type='image/jpeg',
options={'x-goog-meta-foo': 'foo',
'x-goog-meta-bar': 'bar'},
retry_params=write_retry_params)
gcs_file.write(page1.read())
gcs_file.close()
The problem is that when I run this code, it tries to download the image to my computer (the client) instead of writing it to the gcs_file; I get the file download popup. That's not what I'm trying to do. What am I doing wrong?
Yes, you are looking at using the Google Cloud Storage client library. Specifically this:
https://developers.google.com/appengine/docs/python/googlecloudstorageclient/functions#open
Open the file for writing, then write the URL-fetched image to it.
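A minimal sketch of that idea (the function name, URL, object name, and content type are placeholders; urlfetch is used instead of urllib2 since the fetch happens server-side on App Engine):
import cloudstorage as gcs
from google.appengine.api import urlfetch

def copy_image_to_gcs(image_url, gcs_filename):
    # Fetch the remote image on the server, then write it into a GCS object.
    result = urlfetch.fetch(image_url)
    if result.status_code != 200:
        raise ValueError('Fetch failed with status %d' % result.status_code)
    with gcs.open(gcs_filename, 'w', content_type='image/jpeg') as f:
        f.write(result.content)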