Appengine Blobstore: Index out of range - python

I'm trying to upload a file to Google Cloud Storage using the Blobstore API. The image uploads correctly, but when I then try to process it (link it to a user) I get the error:
Index out of range
This is my code:
class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        upload_files = self.get_file_infos('file')  # 'file' is the file upload field in the form
        file_info = upload_files[0]
        #self.response.headers['Content-Type'] = 'application/x-www-form-urlencoded'
        #self.response.headers.add_header('Access-Control-Allow-Origin', '*')
        gcs_filename = file_info.gs_object_name
        file_key = blobstore.create_gs_key(gcs_filename)
        File(file=file_key, owner=utils.get_current_user(),
             url=images.get_serving_url(file_key)).put()
My code fails on the file_info = upload_files[0] line.

Where is your code that puts the file into your Google Cloud Storage bucket?
I think the problem might be these two lines, depending on how you implemented the GCS upload...
gcs_filename = file_info.gs_object_name
file_key = blobstore.create_gs_key(gcs_filename)
gs_object_name only returns a meaningful value if the object was actually uploaded to GCS; if it is empty or wrong, create_gs_key() will fail as well.
For details on using the Blobstore API with Google Cloud Storage, see https://cloud.google.com/appengine/docs/python/blobstore/#Python_Using_the_Blobstore_API_with_Google_Cloud_Storage
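In particular, gs_object_name is only populated when the upload URL was created with a gs_bucket_name, so the file lands in GCS rather than the classic blobstore. A minimal sketch of that setup, assuming a bucket named 'your-bucket' and a form field named 'file' (both placeholders):
from google.appengine.ext import blobstore
import webapp2

class UploadFormHandler(webapp2.RequestHandler):
    def get(self):
        # Create a GCS-backed upload URL so FileInfo.gs_object_name is set
        upload_url = blobstore.create_upload_url('/upload',
                                                 gs_bucket_name='your-bucket')
        self.response.write(
            '<form action="%s" method="POST" enctype="multipart/form-data">'
            '<input type="file" name="file">'
            '<input type="submit">'
            '</form>' % upload_url)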

Related

Get content_type from Google Cloud file

I have two API endpoints: one takes a file from an HTTP request and uploads it to a Google Cloud Storage bucket using the Python client library, and the other downloads it again. In the first view, I get the file's content type from the HTTP request and set it as metadata when uploading to the bucket:
from google.cloud import storage

file_obj = request.FILES['file']
client = storage.Client.from_service_account_json(path.join(
    path.realpath(path.dirname(__file__)),
    '..',
    'settings',
    'api-key.json'
))
bucket = client.get_bucket('storage-bucket')
blob = bucket.blob(filename)
blob.upload_from_string(
    file_text,
    content_type=file_obj.content_type
)
Then in another view, I download the file:
...
bucket = client.get_bucket('storage-bucket')
blob = bucket.blob(filename)
blob.download_to_filename(path)
How can I access the metadata (content_type) I set earlier? It isn't available on the blob object anymore, since a new one is instantiated in the second view, even though the object in the bucket still holds it.
You should try
blob = bucket.get_blob(blob_name)
blob.content_type
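The difference is that bucket.blob() merely constructs a client-side reference without contacting the API, whereas bucket.get_blob() performs a GET request and populates the metadata. If you already hold a blob reference, blob.reload() does the same thing; a short sketch with placeholder bucket and object names:
blob = bucket.blob('uploads/report.txt')
print(blob.content_type)   # None - no API call has been made yet
blob.reload()              # fetch metadata for the existing object
print(blob.content_type)   # now reflects what was set at upload time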

get_uploads from Blobstore Upload returns empty list although file shows up in bucket

I've recently deployed my Python GAE app from the development server, and my image upload function stopped working properly.
After a bit of testing, it seems that the get_uploads function from blobstore is returning an empty list, hence the index-out-of-range error from the upload handler (I also tried the get_file_infos function with the same result).
However, when I check the GCS browser the file is uploaded correctly, so my problem seems to be that I can't extract the image reference from the POST to the upload handler.
Does anybody have a clue as to why this is happening, and whether there's a way around it?
(The form uses a post method with multipart/form-data so hopefully that isn't an issue)
Here's the function I'm calling to post to the upload handler:
upload_url = blobstore.create_upload_url('/upload', gs_bucket_name='BUCKET')
result = urlfetch.fetch(url=upload_url,
                        payload=self.request.body,
                        method=urlfetch.POST,
                        headers=self.request.headers)
And here's the code for the upload handler:
class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        upload_files = self.get_uploads('file')
        blob_info = upload_files[0]
        self.response.write(str(blob_info.key()))
What are you trying to do? It looks like you are re-posting a received request body to GCS. Why not write it directly using the Google Cloud Storage client library?
with gcs.open(gcs_filename, 'w', content_type, options={b'x-goog-acl': b'public-read'}) as f:
    f.write(blob)
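A hedged sketch of that suggestion in the context of a handler (the bucket name, object path and handler name are placeholders): write the uploaded bytes straight to GCS with the App Engine cloudstorage client instead of re-posting the body to an upload URL.
import cloudstorage as gcs
import webapp2

class DirectUploadHandler(webapp2.RequestHandler):
    def post(self):
        uploaded = self.request.POST['file']          # cgi.FieldStorage from the multipart form
        gcs_filename = '/your-bucket/uploads/' + uploaded.filename
        with gcs.open(gcs_filename, 'w',
                      content_type=uploaded.type,
                      options={b'x-goog-acl': b'public-read'}) as f:
            f.write(uploaded.file.read())
        self.response.write(gcs_filename)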

Python Boto3 AWS Multipart Upload Syntax

I am successfully authenticating with AWS and using the 'put_object' method on the Bucket object to upload a file. Now I want to use the multipart API to accomplish this for large files. I found the accepted answer in this question:
How to save S3 object to a file using boto3
But when I try to implement this, I get "unknown method" errors. What am I doing wrong? My code is below. Thanks!
## Get an AWS Session
self.awsSession = Session(aws_access_key_id=accessKey,
                          aws_secret_access_key=secretKey,
                          aws_session_token=session_token,
                          region_name=region_type)
...
# Upload the file to S3
s3 = self.awsSession.resource('s3')
s3.Bucket('prodbucket').put_object(Key=fileToUpload, Body=data)  # WORKS
#s3.Bucket('prodbucket').upload_file(dataFileName, 'prodbucket', fileToUpload)  # DOESN'T WORK
#s3.upload_file(dataFileName, 'prodbucket', fileToUpload)  # DOESN'T WORK
The upload_file method has not been ported over to the bucket resource yet. For now you'll need to use the client object directly to do this:
client = self.awsSession.client('s3')
client.upload_file(...)
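For reference, a hedged sketch of what that call can look like (the local path, bucket and key are placeholders): upload_file() takes the local filename, the bucket name and the object key, and splits large files into multipart uploads automatically; TransferConfig is only needed if you want to tune the thresholds.
from boto3.s3.transfer import TransferConfig

client = self.awsSession.client('s3')
config = TransferConfig(multipart_threshold=8 * 1024 * 1024)  # multipart above 8 MB
client.upload_file('/tmp/data.bin', 'prodbucket', 'uploads/data.bin',
                   Config=config)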
The Libcloud S3 wrapper transparently handles all the splitting and uploading of the parts for you.
Use the upload_object_via_stream method to do so:
from libcloud.storage.types import Provider
from libcloud.storage.providers import get_driver
# Path to a very large file you want to upload
FILE_PATH = '/home/user/myfile.tar.gz'
cls = get_driver(Provider.S3)
driver = cls('api key', 'api secret key')
container = driver.get_container(container_name='my-backups-12345')
# This method blocks until all the parts have been uploaded.
extra = {'content_type': 'application/octet-stream'}
with open(FILE_PATH, 'rb') as iterator:
    obj = driver.upload_object_via_stream(iterator=iterator,
                                          container=container,
                                          object_name='backup.tar.gz',
                                          extra=extra)
For official documentation on the S3 multipart upload feature, refer to the AWS Official Blog.

How to serve a pdf file from GCS in GAE?

I'm using Google App Engine in Python to handle a small webapp.
I have some files stored in my GCS that I want to serve only if the user is logged in.
I thought it would be really easy, but I must be missing a step, since my code:
import cloudstorage as gcs

class Handler(webapp2.RequestHandler):
    def write(self, *a, **kw):
        self.response.out.write(*a, **kw)

class testHandler(Handler):
    def get(self):
        bucket = '/my_bucket'
        filename = '/pdf/somedoc.pdf'
        user = users.get_current_user()
        if user:
            pdf = gcs.open(bucket+filename)
            self.write(pdf)
only gives:
<cloudstorage.storage_api.ReadBuffer object at 0xfbb931d0>
and what I need is the file itself.
Anyone can tell me which is the step I'm missing?
Thanks
After some thinking, shower and coffee, I realized I had two problems.
First, I was writing out the file object itself, not its contents, so the correct call is:
self.write(pdf.read())
Also, I had to set the 'Content-Type' header to 'application/pdf' so the browser serves the file as a PDF rather than as text.
Anyhow, the result was:
class pHandler(webapp2.RequestHandler):
    def write(self, *a, **kw):
        self.response.headers['Content-Type'] = 'application/pdf'
        self.response.out.write(*a, **kw)

class testHandler(pHandler):
    def get(self):
        bucket = '/my_bucket'
        filename = '/pdf/somedoc.pdf'
        user = users.get_current_user()
        if user:
            pdf = gcs.open(bucket+filename)
            self.write(pdf.read())
Even though the OP has answered his own question, I just want to add a few thoughts.
The OP's code writes the content of the PDF file into the HTTP response:
self.write(pdf.read())
According to the GAE quota limits, this will fail if the response is larger than 32 MB.
It is also worth setting the urlfetch_timeout value, as the default of 5 seconds may not be enough in some circumstances and would result in a DeadlineExceededError.
Instead, I would recommend the following: when a request is received, use the Google Cloud Storage API (not the GAE one) to copy the file to a temporary location, make sure the ACL of the new object is publicly readable, and then serve the public URL of the new object.
Also, send a request to a task queue with the task's ETA set to a timeout value of your choice. Once the task executes, remove the file from the temporary location so that it can no longer be accessed.
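A rough sketch of that idea, using only the App Engine cloudstorage client and the task queue API; the paths, URL and handler mapping are placeholders, and the whole file is read into memory, so this is only suitable for reasonably small objects:
import cloudstorage as gcs
import webapp2
from google.appengine.api import taskqueue

def publish_temporarily(src, tmp, ttl_seconds=300):
    # Copy the object to a publicly readable temporary name
    with gcs.open(src) as f:
        data = f.read()
    with gcs.open(tmp, 'w',
                  content_type='application/pdf',
                  options={b'x-goog-acl': b'public-read'}) as f:
        f.write(data)
    # Schedule removal once the temporary link should have expired
    taskqueue.add(url='/tasks/cleanup', params={'path': tmp},
                  countdown=ttl_seconds)
    return 'https://storage.googleapis.com/' + tmp.lstrip('/')

class CleanupHandler(webapp2.RequestHandler):  # mapped to /tasks/cleanup
    def post(self):
        gcs.delete(self.request.get('path'))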
UPDATE:
Use Service Account auth: generate a new JSON key and take the private key from it.
Set the scope to FULL_CONTROL, as we need to change ACL settings.
I haven't tested the code yet as I am at work, but I will when I have time.
import httplib2
from apiclient.discovery import build
from apiclient.errors import HttpError
from oauth2client.client import SignedJwtAssertionCredentials

# Need to modify ACLs, therefore need full control access
GCS_SCOPE = 'https://www.googleapis.com/auth/devstorage.full_control'

def get_gcs_client(project_id,
                   service_account=None,
                   private_key=None):
    credentials = SignedJwtAssertionCredentials(service_account, private_key, scope=GCS_SCOPE)
    http = httplib2.Http()
    http = credentials.authorize(http)
    # The Cloud Storage JSON API version is 'v1'
    service = build('storage', 'v1', http=http)
    return service
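A hedged usage sketch (the project, service account, bucket and object names are placeholders): with the FULL_CONTROL scope, the JSON API client can mark an object as publicly readable, which is the ACL change the approach above needs.
service = get_gcs_client('my-project',
                         service_account='my-service-account@my-project.iam.gserviceaccount.com',
                         private_key=private_key)
# Grant READER on the object to allUsers, i.e. make it publicly readable
service.objectAccessControls().insert(
    bucket='my_bucket',
    object='pdf/somedoc.pdf',
    body={'entity': 'allUsers', 'role': 'READER'}).execute()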
I think you'd be better off using the Blobstore API on top of GCS to serve these kinds of files. Based on Using the Blobstore API with Google Cloud Storage, I've come up with this approach:
import cloudstorage as gcs
import webapp2
from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers

GCS_PREFIX = '/gs'
BUCKET = '/my_bucket'
FILE = '/pdf/somedoc.pdf'
BLOBSTORE_FILENAME = GCS_PREFIX + BUCKET + FILE

class GCSWebAppHandler(webapp2.RequestHandler):
    def get(self):
        blob_key = blobstore.create_gs_key(BLOBSTORE_FILENAME)
        self.response.headers['Content-Type'] = 'application/pdf'
        self.response.write(blobstore.fetch_data(blob_key, 0, blobstore.MAX_BLOB_FETCH_SIZE - 1))

class GCSBlobDlHandler(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self):
        blob_key = blobstore.create_gs_key(BLOBSTORE_FILENAME)
        self.send_blob(blob_key)

app = webapp2.WSGIApplication([
    ('/webapphandler', GCSWebAppHandler),
    ('/blobdlhandler', GCSBlobDlHandler)],
    debug=True)
As you can see, there are two example handlers here: webapphandler and blobdlhandler. It's probably better to use the latter, since the former is limited by MAX_BLOB_FETCH_SIZE in fetch_data(), which is 1 MB; if your served files are smaller than that, either one is fine.

Python App Engine serving files with Google Cloud Storage

I currently use the following code for allowing my users to upload files;
uploadurl = blobstore.create_upload_url('/process?session=' + session, gs_bucket_name='mybucketname')
and I can serve images like this;
imgurl = get_serving_url(blob_key, size=1600, crop=False, secure_url=True)
After content is uploaded using the method in the first code snippet, the blob key contains encoded_gs_file:, and that's how App Engine knows to serve it from Google Cloud Storage rather than the standard blobstore.
However, I'm unsure how I'd serve any other kind of file (for example .pdf, or .rtf). I do not want the content to be displayed in the browser, but rather sent to the client as a download (so they get the save file dialog and choose a location on their computer to save it).
How would I go about doing this? Thanks.
A Google serving_url works only for images.
To serve a pdf from the blobstore you can use:
class DynServe(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, resource):
        (blob_key, extension) = resource.rpartition('.')[::2]
        blob_info = blobstore.BlobInfo.get(blob_key)
        if not blob_info:
            logging.error('Blob NOT FOUND %s' % resource)
            self.abort(404)
        # guess_type() returns a (type, encoding) tuple; use the type part
        self.response.headers[b'Content-Type'] = mimetypes.guess_type(blob_info.filename)[0]
        self.send_blob(blob_key, save_as=blob_info.filename)
The webapp2 route for this handler looks like:
webapp2.Route(r'/dynserve/<resource:(.*)>', handler=DynServe)
To serve the file, link to that route, e.g. a "PDF download" link pointing at /dynserve/<blob_key>.pdf.
I'm going to answer my own question, based on the answer from @voscausa.
This is what my handler looks like (inside a file named view.py):
class DynServe(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, resource):
        blob_key = resource
        if not blobstore.get(blob_key):
            logging.warning('Blob NOT FOUND %s' % resource)
            self.abort(404)
            return
        else:
            blob_info = blobstore.BlobInfo.get(blob_key)
            self.send_blob(blob_key, save_as=blob_info.filename)
We need this in app.yaml:
- url: /download/.*
  script: view.app
  secure: always
secure: always is optional, but I always use it while handling user data.
Put this at the bottom of view.py:
app = webapp.WSGIApplication([('/download/([^/]+)?', DynServe),
                              ], debug=False)
Now visit /download/BLOB_KEY_HERE (you can look up your blob keys in the Datastore).
That's a fully working example which works with both the standard blobstore AND Google Cloud Storage.
NOTE: All blob keys that point into GCS start with encoded_gs_file:, and the ones that don't are in the standard blobstore; App Engine automatically uses this prefix to determine where to locate the file.
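For illustration only, a tiny hypothetical helper based on that note; the prefix check is the whole trick:
def is_gcs_backed(blob_key):
    # Blob keys created for GCS objects (e.g. via create_gs_key) carry this prefix
    return str(blob_key).startswith('encoded_gs_file:')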
