Storing user images on AWS - python

I'm implementing a simple app using ionic2, which calls an API built using Flask. When setting up the profile, I give the option to the users to upload their own images.
I thought of storing them in an S3 bucket and serving them through CloudFront.
After some research I can only find information about:
Uploading images from the local storage using python.
Uploading images from a HTML file selector using javascript.
I can't find anything about how to deal with blobs/files when you have a front end interacting with an API. When I started researching the options I had thought of were:
Post the file to Amazon on the client side and return the
CloudFront url directly to the back end. I am not too keen on this
one because it would involve having some kind of secret on the
client side (maybe is not that dangerous, but I would rather have it
on the back end).
Upload the image to the server and somehow tell the back end about
which file we want the back end to choose. I am not too keen on
this approach either because the client would need to have knowledge
about the server itself (not only the API).
Encode the image (I have tought of base64, but with the lack of
examples I think that it is plain wrong) and post it to back end,
which will handle all the S3 upload/store CloudFront URL.
I feel like all these approaches are plain wrong, but I can't think (or find) what is the right way of doing it.
How should I approach it?

Have the server generate a pre-signed URL for the client to upload the image to. That means the server is in control of what the URLs will look like and it doesn't expose any secrets, yet the client can upload the image directly to S3.
Generating a pre-signed URL in Python using boto3 looks something like this:
s3 = boto3.client('s3', aws_access_key_id=..., aws_secret_access_key=...)
params = dict(Bucket='my-bucket', Key='myfile.jpg', ContentType='image/jpeg')
url = s3.generate_presigned_url('put_object', Params=params, ExpiresIn=600)
The ContentType is optional, and the client will have to set the same Content-Type HTTP header during upload to url; I find it handy to limit the allowable file types if known.

Related

Lambda function in python to copy files from S3 to On prem server

I am new to AWS and have to copy a file from S3 bucket to an on-prem server.
As I understand the lambda would get triggered from S3 file upload event notification. But what could be used in lambda to send the file securely.
Your best bet may be to create a hybrid network so AWS lambda can talk directly to an on-prem server. Then you could directly copy the file server-to-server over the network. That's a big topic to cover, with lots of other considerations that probably go well beyond this simple question/answer.
You could send it via an HTTPS web request. You could easily write code to send it like this, but that implies you have something on the other end set up to receive it--some sort of HTTPS web server/API. Again, that could be big topic, but here's a description of how you might do that in Python.
Another option would be to use SNS to notify something on-premise whenever a file is uploaded. Then you could write code to pull the file down from S3 on your end. But the key thing is that this pull is initiated by code on your side. Maybe it gets triggered in response to an SNS email or something like that, but the network flow is on-premise fetching the file from S3 versus lambda pushing it to on-premise.
There are many other options. You probably need to do more research and decide your architectural approach before getting too deep into the implementation details.

Upload image to Appengine Datastore using BlobStore and Endpoints

How can I upload a file/image to the Appengine Datastore using blobStore? I'm using Google Cloud Endpoints.
This is my model:
class ProductImage(EndpointsModel):
_message_fields_schema = ('product', 'enable', 'image')
product = ndb.KeyProperty(Product)
image = ndb.BlobKeyProperty(required=True)
enable = ndb.BooleanProperty(default=True)
How can I test it from API Explorer? At the frontend I'm using AngularJS.
I couldn't figure out a way to do this with just Endpoints; I had to have a hybrid server with part-endpoints application, part-webapp2 blobstore_handlers application. If you use the webapp2 stuff as per the Blobstore upload examples for those parts, it works. For example, the flow should be:
Client requests an upload URL (use Endpoints for this call, and have
it basically do blobstore.create_upload_url(PATH)
Client uploads
image to the given URL, which is handled by your
blobstore_handlers.BlobstoreUploadHandler method, which pulls out
the upload and dumps the blob_info.key() (in JSON, for example)
Client calls createProduct or whatever, an Endpoint, and passes back
the blobkey it just received, along with the rest of your
ProductImage model. You may want to call get_serving_url in that method and stash it in your model, it shouldn't change later.
Clients can then use that stashed serving url to view image.
Also I had a lot of "fun" with the BlobKeyProperty. In dev deployments, everything worked fine, but in 'production', I'd get invalid image errors while calling get_serving_url() on the stored blobkey. I think this might be due to the blobs actually not being bitmaps, though, and dev not caring.

AppEngine: Out of Blobstore and into S3?

I'm uploading data to the blobstore. There should it stay only temporary and be uploaded from within my AppEngine app to Amazon S3.
As it seems I can only get the data through a BlobDonwloadHandler as described at the Blobstore API: http://code.google.com/intl/de-DE/appengine/docs/python/blobstore/overview.html#Serving_a_Blob
So I tried to fetch that blob specific download URL from within my application (remember my question yesterday). Fetching internal URLs from within the AppEngine (not the development server) is working, even it´s bad style - I know.
But getting the blob is not working. My code looks like:
result = urllib2.urlopen('http://my-app-url.appspot.com/get_blob/'+str(qr.blob_key))
And I'm getting
DownloadError: ApplicationError: 2
Raised by:
File
"/base/python_runtime/python_lib/versions/1/google/appengine/api/urlfetch.py",
line 332, in _get_fetch_result
raise DownloadError(str(err))
Even after searching I really don't know what to do.
All tutorials I've seen are focusing only on serving through a URL back to the user. But I want to retrieve the blob from the Blobstore and send it to a S3 bucket. Anyone any idea how I could realize that or is that even not possible?
Thanks in advance.
You can use a BlobReader to access that Blob and send that to S3.
http://code.google.com/intl/de-DE/appengine/docs/python/blobstore/blobreaderclass.html

What's a Django/Python solution for providing a one-time url for people to download files?

I'm looking for a way to sell someone a card at an event that will have a unique code that they will be able to use later in order to download a file (mp3, pdf, etc.) only one time and mask the true file location so a savvy person downloading the file won't be able to download the file more than once. It would be nice to host the file on Amazon S3 to save on bandwidth where our server is co-located.
My thought for the codes would be to pre-generate the unique codes that will get printed on the cards and store those in a database that could also have a field that stores the number of times the file was downloaded. This way we could set how many attempts we would allow the user for downloading the file.
The part that I need direction on is how do I hide/mask the original file location so people can't steal that url and then download the file as many times as they want. I've done Google searches and I'm either not searching using the right keywords or there aren't very many libraries or snippets out there already for this type of thing.
I'm guessing that I might be able to rig something up using django.views.static.serve that acts as a sort of proxy between the actual file and the user downloading the file. The only drawback to this method I would think is that I would need to use the actual web server and wouldn't be able to store the file on Amazon S3.
Any suggestions or thoughts are greatly appreciated.
Neat idea. However, I would warn against the single-download method, because there is no guarantee that their first download attempt will be successful. Perhaps use a time-expiration method instead?
But it is certainly possible to do this with Django. Here is an outline of the basic approach:
Set up a django url for serving these files
Use a GET parameter which is a unique string to identify which file to get.
Keep a database table which has a FileField for the file to download. This table maps the unique strings to the location of the file on the file system.
To serve the file as a download, set the response headers in the view like this:
(path is the location of the file to serve)
with open(path, 'rb') as f:
response = HttpResponse(f.read())
response['Content-Type'] = 'application/octet-stream';
response['Content-Disposition'] = 'attachment; filename="%s"' % 'insert_filename_here'
return response
Since we are using this Django page to serve the file, the user cannot find out the original file location.
You can just use something simple such as mod_xsendfile. This functionality is also available in other popular webservers such lighttpd or nginx.
It works like this: when enabled your application (e.g. a trivial PHP script) can send a special response header, causing the webserver to serve a static file.
If you want it to work with S3 you will need to handle each and every request this way, meaning the traffic will go through your site, from there to AWS, back to your site and back to the client. Does S3 support symbolic links / aliases? If so you might just redirect a valid user to one of the symbolic URLs and delete that symlink after a couple of hours.

Asynchronous File Upload to Amazon S3 with Django

I am using this file storage engine to store files to Amazon S3 when they are uploaded:
http://code.welldev.org/django-storages/wiki/Home
It takes quite a long time to upload because the file must first be uploaded from client to web server, and then web server to Amazon S3 before a response is returned to the client.
I would like to make the process of sending the file to S3 asynchronous, so the response can be returned to the user much faster. What is the best way to do this with the file storage engine?
Thanks for your advice!
I've taken another approach to this problem.
My models have 2 file fields, one uses the standard file storage backend and the other one uses the s3 file storage backend. When the user uploads a file it get's stored localy.
I have a management command in my application that uploads all the localy stored files to s3 and updates the models.
So when a request comes for the file I check to see if the model object uses the s3 storage field, if so I send a redirect to the correct url on s3, if not I send a redirect so that nginx can serve the file from disk.
This management command can ofcourse be triggered by any event a cronjob or whatever.
It's possible to have your users upload files directly to S3 from their browser using a special form (with an encrypted policy document in a hidden field). They will be redirected back to your application once the upload completes.
More information here: http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1434
There is an app for that :-)
https://github.com/jezdez/django-queued-storage
It does exactly what you need - and much more, because you can set any "local" storage and any "remote" storage. This app will store your file in fast "local" storage (for example MogileFS storage) and then using Celery (django-celery), will attempt asynchronous uploading to the "remote" storage.
Few remarks:
The tricky thing is - you can setup it to copy&upload, or to upload&delete strategy, that will delete local file once it is uploaded.
Second tricky thing - it will serve file from "local" storage until it is not uploaded.
It also can be configured to make number of retries on uploads failures.
Installation & usage is also very simple and straightforward:
pip install django-queued-storage
append to INSTALLED_APPS:
INSTALLED_APPS += ('queued_storage',)
in models.py:
from queued_storage.backends import QueuedStorage
queued_s3storage = QueuedStorage(
'django.core.files.storage.FileSystemStorage',
'storages.backends.s3boto.S3BotoStorage', task='queued_storage.tasks.TransferAndDelete')
class MyModel(models.Model):
my_file = models.FileField(upload_to='files', storage=queued_s3storage)
You could decouple the process:
the user selects file to upload and sends it to your server. After this he sees a page "Thank you for uploading foofile.txt, it is now stored in our storage backend"
When the users has uploaded the file it is stored temporary directory on your server and, if needed, some metadata is stored in your database.
A background process on your server then uploads the file to S3. This would only possible if you have full access to your server so you can create some kind of "deamon" to to this (or simply use a cronjob).*
The page that is displayed polls asynchronously and displays some kind of progress bar to the user (or s simple "please wait" Message. This would only be needed if the user should be able to "use" (put it in a message, or something like that) it directly after uploading.
[*: In case you have only a shared hosting you could possibly build some solution which uses an hidden Iframe in the users browser to start a script which then uploads the file to S3]
You can directly upload media to the s3 server without using your web application server.
See the following references:
Amazon API Reference : http://docs.amazonwebservices.com/AmazonS3/latest/dev/index.html?UsingHTTPPOST.html
A django implementation : https://github.com/sbc/django-uploadify-s3
As some of the answers here suggest uploading directly to S3, here's a Django S3 Mixin using plupload:
https://github.com/burgalon/plupload-s3mixin
I encountered the same issue with uploaded images. You cannot pass along files to a Celery worker because Celery needs to be able to pickle the arguments to a task. My solution was to deconstruct the image data into a string and get all other info from the file, passing this data and info to the task, where I reconstructed the image. After that you can save it, which will send it to your storage backend (such as S3). If you want to associate the image with a model, just pass along the id of the instance to the task and retrieve it there, bind the image to the instance and save the instance.
When a file has been uploaded via a form, it is available in your view as a UploadedFile file-like object. You can get it directly out of request.FILES, or better first bind it to your form, run is_valid and retrieve the file-like object from form.cleaned_data. At that point at least you know it is the kind of file you want it to be. After that you can get the data using read(), and get the other info using other methods/attributes. See https://docs.djangoproject.com/en/1.4/topics/http/file-uploads/
I actually ended up writing and distributing a little package to save an image asyncly. Have a look at https://github.com/gterzian/django_async Right it's just for images and you could fork it and add functionalities for your situation. I'm using it with https://github.com/duointeractive/django-athumb and S3

Categories

Resources