Upload image to Appengine Datastore using BlobStore and Endpoints - python

How can I upload a file/image to the Appengine Datastore using blobStore? I'm using Google Cloud Endpoints.
This is my model:
from google.appengine.ext import ndb
from endpoints_proto_datastore.ndb import EndpointsModel

class ProductImage(EndpointsModel):
    _message_fields_schema = ('product', 'enable', 'image')

    product = ndb.KeyProperty(Product)
    image = ndb.BlobKeyProperty(required=True)
    enable = ndb.BooleanProperty(default=True)
How can I test it from the API Explorer? On the frontend I'm using AngularJS.

I couldn't figure out a way to do this with just Endpoints; I had to run a hybrid server: part Endpoints application, part webapp2 blobstore_handlers application. If you use the webapp2 pieces as per the Blobstore upload examples for those parts, it works. The flow should be:
1. The client requests an upload URL (use Endpoints for this call, and have it basically do blobstore.create_upload_url(PATH)).
2. The client uploads the image to the given URL, which is handled by your blobstore_handlers.BlobstoreUploadHandler method; it pulls out the upload and returns the blob_info.key() (in JSON, for example). A minimal sketch of steps 1-2 follows below.
3. The client calls createProduct (or whatever your Endpoint is) and passes back the blob key it just received, along with the rest of your ProductImage model. You may want to call get_serving_url in that method and stash the result in your model; it shouldn't change later.
4. Clients can then use that stashed serving URL to view the image.
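A minimal sketch of steps 1 and 2, assuming an /upload path and a form field named 'file' (both are placeholders); the Endpoints side is shown only as an outline in a comment, since the response message class depends on your API:

import json
import webapp2
from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers

# Step 1 (inside your endpoints.api class; UploadUrlResponse is a
# hypothetical message with a single 'url' field):
#
#   @endpoints.method(message_types.VoidMessage, UploadUrlResponse,
#                     path='uploadUrl', http_method='GET', name='getUploadUrl')
#   def get_upload_url(self, request):
#       return UploadUrlResponse(url=blobstore.create_upload_url('/upload'))

# Step 2: a plain webapp2 handler that receives the upload and hands the
# blob key back to the client as JSON.
class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        blob_info = self.get_uploads('file')[0]  # 'file' is the form field name
        self.response.headers['Content-Type'] = 'application/json'
        self.response.write(json.dumps({'blob_key': str(blob_info.key())}))

app = webapp2.WSGIApplication([('/upload', UploadHandler)])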
Also, I had a lot of "fun" with the BlobKeyProperty. In dev deployments everything worked fine, but in production I'd get invalid-image errors when calling get_serving_url() on the stored blob key. I think that might be because the blobs weren't actually bitmaps, and the dev server doesn't care about that.

Related

Storing user images on AWS

I'm implementing a simple app using Ionic 2, which calls an API built using Flask. When setting up their profile, users are given the option to upload their own images.
I thought of storing them in an S3 bucket and serving them through CloudFront.
After some research I can only find information about:
Uploading images from the local storage using python.
Uploading images from a HTML file selector using javascript.
I can't find anything about how to deal with blobs/files when you have a front end interacting with an API. When I started researching the options I had thought of were:
1. Post the file to Amazon from the client side and return the CloudFront URL directly to the back end. I'm not too keen on this one because it would involve having some kind of secret on the client side (maybe it's not that dangerous, but I'd rather keep it on the back end).
2. Upload the image to the server and somehow tell the back end which file we want it to use. I'm not too keen on this approach either, because the client would need knowledge about the server itself (not only the API).
3. Encode the image (I had thought of base64, but given the lack of examples I suspect it's plain wrong) and post it to the back end, which would handle the whole S3 upload and store the CloudFront URL.
I feel like all these approaches are plain wrong, but I can't think (or find) what is the right way of doing it.
How should I approach it?
Have the server generate a pre-signed URL for the client to upload the image to. That means the server is in control of what the URLs will look like and it doesn't expose any secrets, yet the client can upload the image directly to S3.
Generating a pre-signed URL in Python using boto3 looks something like this:
import boto3

s3 = boto3.client('s3', aws_access_key_id=..., aws_secret_access_key=...)
params = dict(Bucket='my-bucket', Key='myfile.jpg', ContentType='image/jpeg')
url = s3.generate_presigned_url('put_object', Params=params, ExpiresIn=600)
The ContentType is optional; if you include it, the client has to send the same Content-Type HTTP header when uploading to url. I find it handy for limiting the allowable file types when they are known.
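For completeness, the client side of the upload might look roughly like this with the requests library (this client sketch is an assumption, not part of the original answer; url is the value returned above):

import requests

with open('myfile.jpg', 'rb') as f:
    resp = requests.put(url, data=f,
                        headers={'Content-Type': 'image/jpeg'})  # must match the signed ContentType
resp.raise_for_status()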

Serving images for HTML from GAE datastore

I am developing an application that will take HTML and images from the user and save it in a datastore. So far this part is done. How do I serve these images as resources of the HTML page when a user requests a particular one?
If you're adamant you want to keep images in the GAE datastore (not usually the best approach -- Google Cloud Storage is), you can serve them e.g. with a handlers: entry such as
handlers:
- url: /img/.*
  script: images.app
and in images.py you have something like

import webapp2
from google.appengine.ext import ndb

class ImgHandler(webapp2.RequestHandler):
    def get(self, img_key_urlsafe):
        key = ndb.Key(urlsafe=img_key_urlsafe)
        img = key.get()
        self.response.headers['Content-Type'] = 'image/png'
        self.response.write(img.data)

app = webapp2.WSGIApplication([('/img/(.*)', ImgHandler)])
Of course, you'll have to arrange for the images' URLs on the client side (e.g. in HTML from jinja2 templates) to be properly prepared as
/img/some_image_key_urlsafe
and I'm assuming the images are PNGs, etc. (you could store the content type as one of the image entity's attributes, of course).
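A small hedged sketch of preparing such a URL for a jinja2 template (jinja_env, img_entity and page.html are placeholders for whatever your handler already has):

img_url = '/img/' + img_entity.key.urlsafe()
template = jinja_env.get_template('page.html')
self.response.write(template.render(img_url=img_url))
# in page.html:  <img src="{{ img_url }}">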
Unless the images are really small, this will add substantial load to your GAE app, which could be minimized by putting the images in Google Cloud Storage and serving them directly from there. Serving them directly from the datastore IS feasible (as long as they're pretty small, since a GAE entity is limited to 1MB!), but it's usually not optimal.

Low-level reading of a POST multipart request?

I'm trying to create an example which, preferably using Django (or some other comparable framework), immediately compresses uploaded content chunk-by-chunk into an unusual compression format (be it LZMA, 7zip, etc.), which is then written out to another upload request to S3.
Essentially, this is what will happen:
A user initiates a multipart upload to my endpoint at ^/upload/?$.
As chunks are received on the server, (could be 1024 bytes or some other number) they are then passed through a compression algorithm in chunks.
The compressed output is written out over the wire to an S3 bucket.
Step 3 is optional; I could store the file locally and have a message queue do the uploads in a deferred way.
Is step 2 possible using a framework like Django? Is there a low-level way of accessing the incoming data in a file-like object?
The Django Request object provides a file-like interface, so you can stream data from it. But since Django always reads the whole request into memory (or into a temporary file if the upload is too large), you can only use this API after the whole request has been received. If your temporary storage directory is big enough and you don't mind buffering the data on your server, you don't need to do anything special: just upload the data to S3 inside the view.
Be careful with timeouts, though. If the upload to S3 takes too long, the browser will receive a timeout. Therefore I would recommend moving the temporary files to a more permanent directory and initiating the upload via a worker queue like Celery.
If you want to stream directly from the client into Amazon S3 via your server, I recommend using gevent. With gevent you could write a simple greenlet that reads from a queue and writes to S3; this queue is filled by the original greenlet, which reads from the request.
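A rough sketch of that pipeline, assuming boto3 and made-up bucket/key names; it leaves out the compression step from the question and buffers the whole body before the single S3 call, so treat it as a starting point rather than true streaming to S3:

import gevent
from gevent.queue import Queue
import boto3

CHUNK = 64 * 1024

def reader(wsgi_input, content_length, q):
    # Producer greenlet: read the request body chunk by chunk into the queue.
    remaining = content_length
    while remaining > 0:
        chunk = wsgi_input.read(min(CHUNK, remaining))
        if not chunk:
            break
        remaining -= len(chunk)
        q.put(chunk)
    q.put(StopIteration)  # sentinel that ends iteration over a gevent queue

def uploader(q, bucket, key):
    # Consumer greenlet: drain the queue and push the result to S3.
    body = b''.join(q)  # iterating the queue stops at the StopIteration sentinel
    boto3.client('s3').put_object(Bucket=bucket, Key=key, Body=body)

def handle_upload(wsgi_input, content_length):
    q = Queue(maxsize=16)
    workers = [gevent.spawn(reader, wsgi_input, content_length, q),
               gevent.spawn(uploader, q, 'my-bucket', 'uploads/incoming.bin')]
    gevent.joinall(workers)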
You could use a special upload URL like http://upload.example.com/ where you deploy that special server. The Django functions can be used from outside the Django framework if you set the DJANGO_SETTINGS_MODULE environment variable and take care of some things that the middlewares normally do for you (db connect/disconnect, transaction begin/commit/rollback, session handling, etc.).
It is even possible to run your custom WSGI app and Django together in the same WSGI container. Just wrap the Django WSGI app and intercept requests to /upload/. In this case I would recommend using gunicorn with the gevent worker-class as server.
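For example, assuming your WSGI module is myproject.wsgi, that would be something like:
gunicorn -k gevent myproject.wsgi:application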
I am not too familiar with the Amazon S3 API, but as far as I know you can also generate a temporary token for file uploads directly from your users. That way you would not need to tunnel the data through your server at all.
Edit: You can indeed allow anonymous uploads to your buckets. See this question which talks about this topic: S3 - Anonymous Upload - Key prefix

how to simulate image upload to google app engine blobstore

I'm uploading images to the GAE blobstore using create_upload_url
uploadURL = blobstore.create_upload_url('/upload')
For the purpose of unit testing the GAE code, can you simulate the image upload? Or should I insert the image data into my testbed and assume the upload is successful? If so, how do you upload an image to the testbed?
Agree with #fredrik on what exactly you're testing there.
Anyway, if you're doing some functional/blackbox/similar testing, you could simply use Webtest framework (see post method) and do the actual upload, e.g.
# upload_files takes (fieldname, filename) tuples; file content can be
# passed as an optional third element instead of reading it from disk.
payload = [(fieldname, filename)]
test_app.post(uploadURL, upload_files=payload)
Have a look at Handler Testing for Python for details on how to initialize the above test_app.
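For reference, a rough initialization sketch along those lines (the main module, its app object, and the /upload route are assumptions; fully simulating the Blobstore upload machinery may need extra stubbing):

import unittest
import webtest
from google.appengine.ext import testbed

import main  # the module that defines your webapp2 application as 'app'

class UploadTest(unittest.TestCase):
    def setUp(self):
        self.testbed = testbed.Testbed()
        self.testbed.activate()
        self.testbed.init_datastore_v3_stub()
        self.testbed.init_memcache_stub()
        self.testbed.init_blobstore_stub()
        self.test_app = webtest.TestApp(main.app)

    def tearDown(self):
        self.testbed.deactivate()

    def test_upload(self):
        payload = [('file', 'test.png', b'fake image bytes')]
        response = self.test_app.post('/upload', upload_files=payload)
        self.assertEqual(response.status_int, 200)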
Could you provide some code showing what your test looks like?
I think you should be able to fake a request to the upload_url using webapp2. Have a look here for some sample code on how to fake requests.
On the other hand, you should think about what the purpose of your test is. Is it to test that the image upload works, or to test how your code behaves after the upload is complete?
When running unit tests, try to break the dependencies on other libraries so that you only test your own code. Then add a separate suite of integration-style tests, i.e. make a request to a URL and check that you get the expected response. As in test_redirect_if_no_session from the example above: make a request to a page that requires a user and expect a redirect (HTTP response code 302).
..fredrik

Asynchronous File Upload to Amazon S3 with Django

I am using this file storage engine to store files to Amazon S3 when they are uploaded:
http://code.welldev.org/django-storages/wiki/Home
It takes quite a long time to upload because the file must first be uploaded from the client to the web server, and then from the web server to Amazon S3, before a response is returned to the client.
I would like to make the process of sending the file to S3 asynchronous, so the response can be returned to the user much faster. What is the best way to do this with the file storage engine?
Thanks for your advice!
I've taken another approach to this problem.
My models have two file fields: one uses the standard file storage backend and the other uses the S3 file storage backend. When the user uploads a file, it gets stored locally.
I have a management command in my application that uploads all the locally stored files to S3 and updates the models (a sketch follows below).
So when a request comes in for the file, I check whether the model object uses the S3 storage field; if so, I send a redirect to the correct URL on S3, and if not, I send a redirect so that nginx can serve the file from disk.
This management command can of course be triggered by any event, such as a cronjob.
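A hedged sketch of what such a management command could look like (MyModel, local_file and s3_file are placeholder names, not from the original answer):

from django.core.management.base import BaseCommand
from myapp.models import MyModel

class Command(BaseCommand):
    help = 'Copy locally stored files to the S3-backed storage field.'

    def handle(self, *args, **options):
        for obj in MyModel.objects.filter(s3_file=''):
            obj.local_file.open('rb')
            try:
                # Saving through the S3-backed field pushes the bytes to S3
                # and updates the model in one go.
                obj.s3_file.save(obj.local_file.name, obj.local_file, save=True)
            finally:
                obj.local_file.close()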
It's possible to have your users upload files directly to S3 from their browser using a special form (with an encrypted policy document in a hidden field). They will be redirected back to your application once the upload completes.
More information here: http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1434
There is an app for that :-)
https://github.com/jezdez/django-queued-storage
It does exactly what you need - and much more, because you can set any "local" storage and any "remote" storage. This app will store your file in fast "local" storage (for example MogileFS) and then, using Celery (django-celery), will attempt to upload it asynchronously to the "remote" storage.
A few remarks:
The tricky thing is that you can set it up with a copy-and-upload strategy, or with an upload-and-delete strategy that removes the local file once it has been uploaded.
Second tricky thing: it will serve the file from the "local" storage until the upload has completed.
It can also be configured to retry a number of times on upload failures.
Installation & usage is also very simple and straightforward:
pip install django-queued-storage
append to INSTALLED_APPS:
INSTALLED_APPS += ('queued_storage',)
in models.py:
from django.db import models
from queued_storage.backends import QueuedStorage

queued_s3storage = QueuedStorage(
    'django.core.files.storage.FileSystemStorage',
    'storages.backends.s3boto.S3BotoStorage',
    task='queued_storage.tasks.TransferAndDelete')

class MyModel(models.Model):
    my_file = models.FileField(upload_to='files', storage=queued_s3storage)
You could decouple the process:
1. The user selects a file to upload and sends it to your server. After this they see a page saying "Thank you for uploading foofile.txt, it is now stored in our storage backend".
2. When the user has uploaded the file, it is stored in a temporary directory on your server and, if needed, some metadata is stored in your database.
3. A background process on your server then uploads the file to S3 (a sketch follows after this list). This is only possible if you have full access to your server, so you can create some kind of "daemon" to do this (or simply use a cronjob).*
4. The page that is displayed polls asynchronously and shows some kind of progress bar (or a simple "please wait" message) to the user. This is only needed if the user should be able to "use" the file (put it in a message, or something like that) directly after uploading.
[*: In case you only have shared hosting, you could possibly build a solution which uses a hidden iframe in the user's browser to start a script which then uploads the file to S3.]
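A hedged sketch of that background step using boto3 (directory names and bucket are assumptions; both directories are expected to exist, and the script would be run from cron or a small daemon loop):

import os
import boto3

INCOMING = '/var/uploads/incoming'
DONE = '/var/uploads/done'

def push_pending_uploads():
    s3 = boto3.client('s3')
    for name in os.listdir(INCOMING):
        path = os.path.join(INCOMING, name)
        if not os.path.isfile(path):
            continue
        s3.upload_file(path, 'my-bucket', 'uploads/' + name)
        # Move the file aside so the next run does not upload it again.
        os.rename(path, os.path.join(DONE, name))

if __name__ == '__main__':
    push_pending_uploads()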
You can upload media directly to S3 without going through your web application server.
See the following references:
Amazon API Reference : http://docs.amazonwebservices.com/AmazonS3/latest/dev/index.html?UsingHTTPPOST.html
A django implementation : https://github.com/sbc/django-uploadify-s3
As some of the answers here suggest uploading directly to S3, here's a Django S3 Mixin using plupload:
https://github.com/burgalon/plupload-s3mixin
I encountered the same issue with uploaded images. You cannot pass along files to a Celery worker because Celery needs to be able to pickle the arguments to a task. My solution was to deconstruct the image data into a string and get all other info from the file, passing this data and info to the task, where I reconstructed the image. After that you can save it, which will send it to your storage backend (such as S3). If you want to associate the image with a model, just pass along the id of the instance to the task and retrieve it there, bind the image to the instance and save the instance.
When a file has been uploaded via a form, it is available in your view as an UploadedFile, a file-like object. You can get it directly out of request.FILES, or better, first bind it to your form, run is_valid and retrieve the file-like object from form.cleaned_data; at that point you at least know it is the kind of file you want it to be. After that you can get the data using read(), and get the other info using other methods/attributes. See https://docs.djangoproject.com/en/1.4/topics/http/file-uploads/
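A short sketch of the deconstruct/reconstruct approach described above (Photo, its image field and save_image are placeholder names):

from celery import shared_task
from django.core.files.base import ContentFile
from myapp.models import Photo

@shared_task
def save_image(photo_id, filename, image_data):
    # Rebuild a file object from the raw bytes and save it through the
    # model field; the configured storage backend (e.g. S3) does the upload.
    photo = Photo.objects.get(pk=photo_id)
    photo.image.save(filename, ContentFile(image_data), save=True)

# In the view, after form.is_valid():
#   f = form.cleaned_data['image']
#   save_image.delay(photo.pk, f.name, f.read())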
I actually ended up writing and distributing a little package to save an image asynchronously. Have a look at https://github.com/gterzian/django_async. Right now it's just for images, but you could fork it and add functionality for your situation. I'm using it with https://github.com/duointeractive/django-athumb and S3.
