How to serve a pdf file from GCS in GAE? - python

I'm using Google App Engine in Python to handle a small webapp.
I have some files stored in my GCS that I want to serve only if the user is logged in.
I thought it would be really easy, but I'm clearly missing a step, since my code:
import cloudstorage as gcs
import webapp2
from google.appengine.api import users

class Handler(webapp2.RequestHandler):
    def write(self, *a, **kw):
        self.response.out.write(*a, **kw)

class testHandler(Handler):
    def get(self):
        bucket = '/my_bucket'
        filename = '/pdf/somedoc.pdf'
        user = users.get_current_user()
        if user:
            pdf = gcs.open(bucket + filename)
            self.write(pdf)
only gives:
<cloudstorage.storage_api.ReadBuffer object at 0xfbb931d0>
and what I need is the file itself.
Can anyone tell me which step I'm missing?
Thanks

After some thinking, shower and coffee, I realized I had two problems.
First, I was writing the file object's representation, not the file's contents.
So the correct call would be:
self.write(pdf.read())
Also, I had to change the 'Content-Type' header to 'application/pdf', so the browser treats the response as a PDF rather than plain text.
Anyhow, the result was:
class pHandler(webapp2.RequestHandler):
    def write(self, *a, **kw):
        self.response.headers['Content-Type'] = 'application/pdf'
        self.response.out.write(*a, **kw)

class testHandler(pHandler):
    def get(self):
        bucket = '/my_bucket'
        filename = '/pdf/somedoc.pdf'
        user = users.get_current_user()
        if user:
            pdf = gcs.open(bucket + filename)
            self.write(pdf.read())

Even though the OP has answered his own question, I just want to add a few thoughts.
The OP's code writes the content of the PDF file into the HTTP response:
self.write(pdf.read())
According to the GAE quota limits, this will fail if the response size is larger than 32 MB.
Also, it would be good to set the urlfetch timeout, as the default value of 5 seconds may not be enough in some circumstances and would result in a DeadlineExceededError.
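For example, one common way to raise that limit (a minimal sketch, assuming the App Engine urlfetch service is what times out underneath):

from google.appengine.api import urlfetch

# Raise the default URL Fetch deadline from 5 seconds to 60 seconds
# for requests made during this request handler.
urlfetch.set_default_fetch_deadline(60)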
I would recommend the following: when a request is received, use the Google Cloud Storage JSON API (not the GAE cloudstorage library) to copy the file to a temporary location. Make sure to set the ACL of the new object to publicly readable, then serve the public URL of the new object.
Also, enqueue a task with an eta (or countdown) set to a timeout of your choice; when the task executes, remove the file from the temporary location so it can no longer be accessed. A sketch of this is shown after the code below.
UPDATE:
Use service account auth: generate a new JSON key and get the private key.
Set the scope to FULL_CONTROL, as we need to change ACL settings.
I haven't tested the code yet as I am at work, but I will when I have time.
import httplib2
from apiclient.discovery import build
from apiclient.errors import HttpError
from oauth2client.client import SignedJwtAssertionCredentials

# Need to modify ACLs, therefore need full control access
GCS_SCOPE = 'https://www.googleapis.com/auth/devstorage.full_control'

def get_gcs_client(project_id,
                   service_account=None,
                   private_key=None):
    credentials = SignedJwtAssertionCredentials(service_account, private_key, scope=GCS_SCOPE)
    http = httplib2.Http()
    http = credentials.authorize(http)
    # The Cloud Storage JSON API version is 'v1'
    service = build('storage', 'v1', http=http)
    return service
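As a rough, untested sketch of the rest of the approach described above (the /tasks/cleanup handler, the tmp/ prefix and the bucket/object names are placeholders of mine, not part of the original answer), copying the object, making it public and scheduling its removal could look like this:

from google.appengine.api import taskqueue

def serve_temp_copy(service, bucket, object_name, timeout_seconds=300):
    # Copy the object to a temporary location in the same bucket.
    temp_name = 'tmp/' + object_name
    service.objects().copy(
        sourceBucket=bucket, sourceObject=object_name,
        destinationBucket=bucket, destinationObject=temp_name,
        body={}).execute()
    # Make the temporary copy publicly readable.
    service.objectAccessControls().insert(
        bucket=bucket, object=temp_name,
        body={'entity': 'allUsers', 'role': 'READER'}).execute()
    # Enqueue a task (countdown plays the role of the eta mentioned above)
    # that deletes the temporary copy once the timeout expires.
    taskqueue.add(url='/tasks/cleanup',
                  params={'bucket': bucket, 'object': temp_name},
                  countdown=timeout_seconds)
    return 'https://storage.googleapis.com/%s/%s' % (bucket, temp_name)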

I think you'd be better off using the Blobstore API with GCS to serve these kinds of files. Based on "Using the Blobstore API with Google Cloud Storage", I've come up with this approach:
import cloudstorage as gcs
import webapp2
from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers

GCS_PREFIX = '/gs'
BUCKET = '/my_bucket'
FILE = '/pdf/somedoc.pdf'
BLOBSTORE_FILENAME = GCS_PREFIX + BUCKET + FILE

class GCSWebAppHandler(webapp2.RequestHandler):
    def get(self):
        blob_key = blobstore.create_gs_key(BLOBSTORE_FILENAME)
        self.response.headers['Content-Type'] = 'application/pdf'
        self.response.write(blobstore.fetch_data(blob_key, 0, blobstore.MAX_BLOB_FETCH_SIZE - 1))

class GCSBlobDlHandler(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self):
        blob_key = blobstore.create_gs_key(BLOBSTORE_FILENAME)
        self.send_blob(blob_key)

app = webapp2.WSGIApplication([
    ('/webapphandler', GCSWebAppHandler),
    ('/blobdlhandler', GCSBlobDlHandler)],
    debug=True)
As you can see, there are two example handlers here, webapphandler and blobdlhandler. It's probably better to use the latter, since the former is limited by MAX_BLOB_FETCH_SIZE in fetch_data(), which is about 1 MB; but if your served files are smaller than that, it's fine.
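If you do need to push a larger file through a plain webapp2 handler, a possible workaround (an untested sketch reusing the names from the snippet above) is to read the blob in MAX_BLOB_FETCH_SIZE chunks; note that the response as a whole is still subject to the 32 MB limit mentioned earlier:

class GCSChunkedHandler(webapp2.RequestHandler):
    def get(self):
        blob_key = blobstore.create_gs_key(BLOBSTORE_FILENAME)
        self.response.headers['Content-Type'] = 'application/pdf'
        start = 0
        chunk_size = blobstore.MAX_BLOB_FETCH_SIZE
        while True:
            # fetch_data reads the inclusive byte range [start, end]
            data = blobstore.fetch_data(blob_key, start, start + chunk_size - 1)
            if not data:
                break
            self.response.out.write(data)
            start += len(data)
            if len(data) < chunk_size:
                break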

Related

Setting ["GOOGLE_APPLICATION_CREDENTIALS"] from a dict rather than file path

I'm trying to set the environment variable from a dict, but I get an error when connecting.
# service account: pulls in an Airflow variable that contains the JSON dict with the service account credentials
service_account = Variable.get('google_cloud_credentials')
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = str(service_account)
error
PermissionDeniedError: Error executing an HTTP request: HTTP response code 403 with body '<?xml version='1.0' encoding='UTF-8'?><Error><Code>AccessDenied</Code><Message>Access denied.</Message><Details>Anonymous caller does not have storage.objects.get access to the Google Cloud Storage object.</Details></Error>'
When I instead point it at a file path, there are no issues:
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/file/path/service_account.json"
I'm wondering: is there a way to convert the dict object to an os.path-like object? I don't want to store the JSON file on the container, and the Airflow/Google documentation isn't clear at all.
The Python StringIO module lets you create a file-like object backed by a string, but that won't help here because the consumer of this environment variable expects a file path, not a file-like object. I don't think it's possible to do what you're trying to do. Is there a reason you don't want to just put the credentials in a file?
There is a way to do it, but the Google documentation is terrible. So I wrote a GitHub gist to document the recipe that a colleague (Imee Cuison) and I developed to use the key securely. Sample code below:
import json
from google.oauth2.service_account import Credentials
from google.cloud import secretmanager

def access_secret(project_id: str, secret_id: str, version_id: str = "latest") -> str:
    """Return the secret in string format."""
    # Create the Secret Manager client.
    client = secretmanager.SecretManagerServiceClient()
    # Build the resource name of the secret version.
    name = f"projects/{project_id}/secrets/{secret_id}/versions/{version_id}"
    # Access the secret version.
    response = client.access_secret_version(name=name)
    # Return the decoded payload.
    return response.payload.data.decode('UTF-8')

def get_credentials_from_token(token: str) -> Credentials:
    """Given an authentication token, return a Credentials object."""
    credential_dict = json.loads(token)
    return Credentials.from_service_account_info(credential_dict)

credentials_secret = access_secret("my_project", "my_secret")
creds = get_credentials_from_token(credentials_secret)
# And now you can use the `creds` Credentials object to authenticate to an API
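For example, to use them with a storage client (a minimal sketch; the project name is the same placeholder used above):

from google.cloud import storage

storage_client = storage.Client(credentials=creds, project="my_project")
print(list(storage_client.list_buckets()))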
Putting the service account key into the repository is not good practice. As a best practice, you should use authentication propagated from the default Google auth within your application.
For instance, using Google Kubernetes Engine you can use the following Python code:
import google.auth
import google.auth.transport.requests
from google.cloud.container_v1 import ClusterManagerClient

credentials, project = google.auth.default(
    scopes=['https://www.googleapis.com/auth/cloud-platform'])
credentials.refresh(google.auth.transport.requests.Request())
cluster_manager = ClusterManagerClient(credentials=credentials)
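The same application-default-credentials pattern also works for Cloud Storage, which is closer to the original question; a minimal sketch:

import google.auth
from google.cloud import storage

credentials, project = google.auth.default(
    scopes=['https://www.googleapis.com/auth/cloud-platform'])
client = storage.Client(credentials=credentials, project=project)
print(list(client.list_buckets()))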

Trouble with Google Application Credentials

Hi there, first and foremost, this is my first time using Google's services. I'm trying to develop an app with the Google AutoML Vision API (custom model). I have already built a custom model and generated the API keys (I hope I did that correctly, though).
After many attempts at developing via Ionic & Android, I failed to connect to the API.
I have now taken the provided prediction code in Python (on Google Colab), and even with that I still get an error message saying "Could not automatically determine credentials". I'm not sure where I have gone wrong. Please help. Dying.
# installing & importing libraries
!pip3 install google-cloud-automl
import sys
from google.cloud import automl_v1beta1
from google.cloud.automl_v1beta1.proto import service_pb2

# import key.json file generated by GOOGLE_APPLICATION_CREDENTIALS
from google.colab import files
credentials = files.upload()

# explicit function given by Google:
# https://cloud.google.com/docs/authentication/production#auth-cloud-implicit-python
def explicit():
    from google.cloud import storage
    # Explicitly use service account credentials by specifying the private key
    # file.
    storage_client = storage.Client.from_service_account_json(credentials)
    # Make an authenticated API request
    buckets = list(storage_client.list_buckets())
    print(buckets)

# import image for prediction
from google.colab import files
YOUR_LOCAL_IMAGE_FILE = files.upload()

# prediction code from modelling
def get_prediction(content, project_id, model_id):
    prediction_client = automl_v1beta1.PredictionServiceClient()
    name = 'projects/{}/locations/uscentral1/models/{}'.format(project_id, model_id)
    payload = {'image': {'image_bytes': content}}
    params = {}
    request = prediction_client.predict(name, payload, params)
    return request  # waits till request is returned

# print function, substituted with values
content = YOUR_LOCAL_IMAGE_FILE
project_id = "REDACTED_PROJECT_ID"
model_id = "REDACTED_MODEL_ID"
print(get_prediction(content, project_id, model_id))
The error message appears when running the last line of code.
credentials = files.upload()
storage_client = storage.Client.from_service_account_json(credentials)
These two lines are the issue, I think.
The first one actually loads the contents of the file, but the second one expects a path to a file, instead of the contents.
Let's tackle the first line first:
Just passing the credentials you get after calling credentials = files.upload() will not work, as explained in the docs for it. Done this way, credentials doesn't actually contain the contents of the file directly, but rather a dictionary mapping filenames to contents.
Assuming you're only uploading the 1 credentials file, you can get the contents of the file like this (stolen from this SO answer):
from google.colab import files

uploaded = files.upload()
# In Python 3, dict.keys() is not subscriptable, so wrap it in list()
credentials_as_string = uploaded[list(uploaded.keys())[0]]
So now we actually have the contents of the uploaded file as a string, next step is to create an actual credentials object out of it.
This answer on GitHub shows how to create a credentials object from a string converted to JSON.
import json
from google.oauth2 import service_account
credentials_as_dict = json.loads(credentials_as_string)
credentials = service_account.Credentials.from_service_account_info(credentials_as_dict)
Finally we can create the storage client object using this credentials object:
storage_client = storage.Client(credentials=credentials)
Please note I've not tested this though, so please give it a go and see if it actually works.
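The same credentials object can also be passed to the AutoML client, which is where the "Could not automatically determine credentials" error actually comes from; a sketch (also not tested) of how get_prediction could use it:

from google.cloud import automl_v1beta1

# Pass the explicit credentials instead of relying on
# GOOGLE_APPLICATION_CREDENTIALS being set in the Colab runtime.
prediction_client = automl_v1beta1.PredictionServiceClient(credentials=credentials)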

Appengine Blobstore: Index out of range

I'm trying to upload a file to Google Cloud Storage using the Blobstore API. The image uploads correctly, but when I then try to process it (link it to a user), I get the error:
Index out of range
This is my code:
class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        upload_files = self.get_file_infos('file')  # 'file' is the file upload field in the form
        file_info = upload_files[0]
        #self.response.headers['Content-Type'] = 'application/x-www-form-urlencoded'
        #self.response.headers.add_header('Access-Control-Allow-Origin', '*')
        gcs_filename = file_info.gs_object_name
        file_key = blobstore.create_gs_key(gcs_filename)
        File(file=file_key, owner=utils.get_current_user(),
             url=images.get_serving_url(file_key)).put()
My code fails at the file_info = upload_files[0] line.
Where is your code that puts the file into your Google Cloud Storage bucket?
I think the problem might be these two lines, depending on how you implemented the GCS upload...
gcs_filename = file_info.gs_object_name
file_key = blobstore.create_gs_key(gcs_filename)
gs_object_name only returns a meaningful result if the item was uploaded to GCS. If gcs_filename is not valid, create_gs_key() would fail as well.
For how to use the Blobstore API with Google Cloud Storage, please see this article for details: https://cloud.google.com/appengine/docs/python/blobstore/#Python_Using_the_Blobstore_API_with_Google_Cloud_Storage
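In particular, gs_object_name is only populated when the upload URL was created with a GCS bucket; a minimal sketch of that (the handler path and bucket name are placeholders):

upload_url = blobstore.create_upload_url(
    '/upload_handler',
    gs_bucket_name='my_bucket')  # uploads go to GCS, so gs_object_name is set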

Python App Engine serving files with Google Cloud Storage

I currently use the following code for allowing my users to upload files;
uploadurl = blobstore.create_upload_url('/process?session=' + session, gs_bucket_name='mybucketname')
and I can serve images like this;
imgurl = get_serving_url(blob_key, size=1600, crop=False, secure_url=True)
After content is uploaded using the method in the first code snippet, the blob key contains encoded_gs_file:, and that's how it knows to serve it from Google Cloud Storage rather than the standard blobstore.
However, I'm unsure how I'd serve any other kind of file (for example .pdf or .rtf). I do not want the content to be displayed in the browser, but rather sent to the client as a download (so they get the save-file dialog and choose a location on their computer to save it).
How would I go about doing this? Thanks.
Using a Google serving_url works only for images.
To serve a PDF from the blobstore you can use:
import logging
import mimetypes

from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers

class DynServe(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, resource):
        (blob_key, extension) = resource.rpartition('.')[::2]
        blob_info = blobstore.BlobInfo.get(blob_key)
        if not blob_info:
            logging.error('Blob NOT FOUND %s' % resource)
            self.abort(404)
        # guess_type() returns a (type, encoding) tuple; use the type only
        self.response.headers[b'Content-Type'] = mimetypes.guess_type(blob_info.filename)[0]
        self.send_blob(blob_key, save_as=blob_info.filename)
The webapp2 route for this handler looks like:
webapp2.Route(r'/dynserve/<resource:(.*)>', handler=DynServe)
To serve the PDF, link to a URL matching that route, for example /dynserve/<blob_key>.pdf.
I'm going to answer my own question, based on the answer from @voscausa.
This is what my handler looks like (inside a file named view.py):
import logging

from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers

class DynServe(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, resource):
        blob_key = resource
        if not blobstore.get(blob_key):
            logging.warning('Blob NOT FOUND %s' % resource)
            self.abort(404)
            return
        else:
            blob_info = blobstore.BlobInfo.get(blob_key)
            self.send_blob(blob_key, save_as=blob_info.filename)
We need this in app.yaml:
- url: /download/.*
  script: view.app
  secure: always
secure: always is optional, but I always use it while handling user data.
Put this at the bottom of view.py:
app = webapp2.WSGIApplication([('/download/([^/]+)?', DynServe),
                               ], debug=False)
Now visit /download/BLOB_KEY_HERE. (you can check the datastore for your blob key)
That's a fully working example which works with both the standard blobstore AND Google Cloud Storage.
NOTE: All blob keys which refer to GCS objects start with encoded_gs_file:, and the ones which don't are in the standard blobstore; App Engine automatically uses this to determine where to locate the file.

GAE - how to use blobstore stub in testbed?

My code goes like this:
self.testbed.init_blobstore_stub()
upload_url = blobstore.create_upload_url('/image')
upload_url = re.sub('^http://testbed\.example\.com', '', upload_url)
response = self.testapp.post(upload_url, params={
    'shopid': id,
    'description': 'JLo',
}, upload_files=[('file', imgPath)])
self.assertEqual(response.status_int, 200)
How come it shows a 404 error? For some reason, the upload path does not seem to exist at all.
You can't do this. I think the problem is that webtest (which I assume is where self.testapp came from) doesn't work well with testbed blobstore functionality. You can find some info at this question.
My solution was to subclass unittest.TestCase and add the following methods:
from google.appengine.api import files  # needed by the methods below

def create_blob(self, contents, mime_type):
    """Since uploading blobs doesn't work in testing, create them this way."""
    fn = files.blobstore.create(mime_type=mime_type,
                                _blobinfo_uploaded_filename="foo.blt")
    with files.open(fn, 'a') as f:
        f.write(contents)
    files.finalize(fn)
    return files.blobstore.get_blob_key(fn)

def get_blob(self, key):
    return self.blobstore_stub.storage.OpenBlob(key).read()
You will also need the solution here.
For my tests where I would normally do a get or post to a blobstore handler, I instead call one of the two methods above. It is a bit hacky but it works.
Another solution I am considering is to use Selenium's HtmlUnit driver. This would require the dev server to be running but should allow full testing of blobstore and also javascript (as a side benefit).
I think Kekito is right, you cannot POST to the upload_url directly.
But if you want to test the BlobstoreUploadHandler, you can fake the POST request it would normally receive from the blobstore in the following way. Assuming your handler is at /handler:
import email
...

def test_upload(self):
    blob_key = 'abcd'
    # The blobstore upload handler receives a multipart form request
    # containing uploaded files. But instead of containing the actual
    # content, the files contain an 'email' message that has some meta
    # information about the file. They also contain a blob-key that is
    # the key to get the blob from the blobstore
    # see blobstore._get_upload_content
    m = email.message.Message()
    m.add_header('Content-Type', 'image/png')
    m.add_header('Content-Length', '100')
    m.add_header('X-AppEngine-Upload-Creation', '2014-03-02 23:04:05.123456')
    # This needs to be valid base64 encoded
    m.add_header('content-md5', 'd74682ee47c3fffd5dcd749f840fcdd4')
    payload = m.as_string()
    # The blob-key in the Content-Type is important
    params = [('file', webtest.forms.Upload('test.png', payload,
                                            'image/png; blob-key=' + blob_key))]
    self.testapp.post('/handler', params, content_type='blob-key')
I figured that out by digging into the blobstore code. The important bit is that the POST request the blobstore sends to the UploadHandler doesn't contain the file content. Instead, it contains an "email message" (well, information encoded like in an email) with metadata about the file (content-type, content-length, upload time and md5). It also contains a blob-key that can be used to retrieve the file from the blobstore.
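For reference, the handler at /handler that this test exercises could be a standard upload handler along these lines (a sketch, assuming a form field named 'file'):

from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers

class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        # get_uploads() returns the BlobInfo records created for the uploaded files.
        upload = self.get_uploads('file')[0]
        self.response.out.write(str(upload.key()))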
