I'm building a small website that involves users uploading images that will be displayed later. The images are stored in an S3 bucket.
Sometimes, I need to display a lot of these images at once, and I'm not sure how best to accommodate that, without allowing public access to S3.
Currently, when a request reaches the server, the server downloads the object from S3 and then returns the file to the client. This is understandably slow. I would love to just return the S3 URL and have the client load from there (so the traffic doesn't have to pass through my server, and I don't have to wait for the image to travel S3->Server->Client), but I also don't want unsecured S3 bucket URLs that anyone can visit.
What is the best architecture to solve this? Is there a way of giving people very brief temporary permission to a bucket? Is it possible to scope that to a specific url?
I looked around on Stack Overflow and GitHub for similar questions, but most of them seem to be about how the files are uploaded, not about accessing them securely.
As suggested by @jarmod, you can pre-sign your objects' URLs.
In this case, whenever you need to share an image, you create a pre-signed URL for the object and share that URL.
Your server only provides the URL. The user accesses the image directly, without your server in the middle of the request.
The AWS site explains how to use pre-signed URLs:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-presigned-url.html
https://docs.aws.amazon.com/sdk-for-go/v1/developer-guide/s3-example-presigned-urls.html
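For a page that shows many images at once, a minimal Python sketch of the idea might look like the following. The function name `presign_image_urls`, the bucket name, and the five-minute default are assumptions for illustration. `s3_client` is expected to be something like `boto3.client("s3")`, whose `generate_presigned_url` method does the signing; it is passed in as a parameter so the sketch does not depend on how credentials are configured.

```python
from typing import Any, Dict, Iterable


def presign_image_urls(
    s3_client: Any,
    bucket: str,
    keys: Iterable[str],
    expires_in: int = 300,
) -> Dict[str, str]:
    """Return {object key: temporary GET URL} for every key.

    `s3_client` is assumed to expose boto3's generate_presigned_url method.
    """
    urls = {}
    for key in keys:
        # Each URL is scoped to exactly one object and stops working
        # once `expires_in` seconds have passed.
        urls[key] = s3_client.generate_presigned_url(
            "get_object",
            Params={"Bucket": bucket, "Key": key},
            ExpiresIn=expires_in,
        )
    return urls
```

The server would return this mapping (or embed the URLs in the page), and the browser would then fetch each image straight from S3 while the bucket itself stays private.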
Related
We have deployed a django server (nginx/gunicorn/django) but to scale the server there are multiple instances of same django application running.
Here is the diagram (architecture):
Each blue rectangle is a Virtual Machine.
HAProxy sends all requests to example.com/admin to Server 3. Other requests are divided between Server 1 and Server 2 (load balancing).
Old Problem:
Each machine has a media folder, and when the admin uploads something, the uploaded media exists only on Server 3 (normal users can't upload anything).
We solved this by sending all requests to example.com/media/* to Server 3, and nginx on Server 3 serves all static files and media.
Problem right now:
We are also using sorl-thumbnail.
When a request comes in for example.com/, sorl-thumbnail tries to access the media file, but it doesn't exist on that machine because it's on Server 3.
So now all requests to that machine (Server 1 or 2) get a 404 for that media file.
One solution that comes to mind is to make a shared partition between all 3 machines and use it as media.
Another solution is to sync all media folders after each upload, but this has a problem: we have almost 2000 requests per second, and sometimes the sync might not be fast enough, so sorl-thumbnail creates a database record for an empty file and a 404 happens.
Thanks in advance and sorry for long question.
You should use an object store to save and serve your user uploaded files. django-storages makes the implementation really simple.
If you don’t want to use cloud-based AWS S3 or an equivalent, you can host your own on-prem S3-compatible object store with MinIO.
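As a rough sketch of what that could look like, here is a settings fragment, assuming django-storages with its boto3-based S3 backend is installed. The bucket name, environment variable names, and endpoint URL are placeholders, and newer Django releases configure the backend through the `STORAGES` setting rather than `DEFAULT_FILE_STORAGE`, so treat this as a starting point rather than a drop-in configuration.

```python
# settings.py (fragment): assumes django-storages and boto3 are installed.
import os

# Send FileField/ImageField uploads to the object store instead of a
# per-machine media folder, so every app server sees the same files.
DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"

# Placeholder names; read real values from the environment.
AWS_STORAGE_BUCKET_NAME = os.environ.get("MEDIA_BUCKET", "example-media")
AWS_ACCESS_KEY_ID = os.environ.get("MEDIA_ACCESS_KEY", "")
AWS_SECRET_ACCESS_KEY = os.environ.get("MEDIA_SECRET_KEY", "")

# Only needed for a self-hosted S3-compatible store (hypothetical host).
AWS_S3_ENDPOINT_URL = os.environ.get(
    "MEDIA_ENDPOINT_URL", "http://minio.internal:9000"
)
```

With a shared store like this, it no longer matters which server handled the upload.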
With your current setup, I don’t see an easy fix if the number of VMs changes dynamically with load.
If you have deployment automation, then maybe try rsync so that each VM takes care of syncing files with the other VMs.
Question: What was the problem?
We got 404s on the other machines because normal requests (requests asking for a template) would get a 404 Not Found on the thumbnail media.
The real problem was with the sorl-thumbnail template tags.
Here is what we ended up doing:
In models that needed a thumbnail, we added functions to create that specific thumbnail.
Using a post-save signal on the admin machine, we called all those functions to make sure all the thumbnails were created after save and the sorl-thumbnail table was filled.
Now, in templates, instead of calling the sorl-thumbnail template tags, we call a function on the model.
So I am working on a Flask application which is pretty much a property manager that involves allowing users to upload images of their properties. I am new to Flask and have never had to deal with images before. From a lot of Googling I understand that there are various ways to manage static files like images.
One way is to allow users to upload images directly to the file system, and then display them by retrieving the file location in the static folder using something like:
<img src="static/images/filename.jpg">
However, is this really an efficient way, since it means generating and storing the location of each image in the database? Especially when it comes to deploying the application? Another way I discovered was base64-encoding the image and storing it directly in the database, which doesn't sound very efficient either.
Another way, which I think might be the best to go about this, is to use an AWS S3 bucket. The user would then be able to upload an image directly to that bucket and be assigned a URL to that image. This URL is stored in the database and can then be used to display the image similarly to the file system method. Is my understanding of this correct? Is there a better way to go about this? And is there something similar to django-storages that can be used to connect Flask to S3?
Any input or pointing me in the right direction would be much appreciated. Thank you!
If you want to store the images on the web server, then the best approach is to use nginx as a proxy in front of Flask and let nginx serve the static folder for all the images.
Nginx is pretty much enough for a small website. Don't try to serve the files using Flask. It is too slow.
If you want to store the images in S3, then you just need to store the name of the image in the bucket in the database. You can tell Flask to use the S3 bucket as the static folder. You can use the boto3 library in Python to access S3.
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html
If you are concerned about exposing the S3 bucket to users, then you can use a CloudFront distribution. It is cheaper to serve from and also hides your bucket.
I am trying to serve files securely (images in this case) to my users. I would like to do this using Flask and preferably Amazon S3; however, I would be open to another cloud storage solution if required.
I have managed to get my Flask static files, like CSS and such, onto S3; however, this is all non-secure, so everyone who has the link can open the static files. This is obviously not what I want for secure content. I can't seem to figure out how to make a file available only to the authenticated user who 'owns' the file.
For example: when I log into my Dropbox account and copy a random file's download link, then go over to another computer and use this link, it will deny me access, even though I am still logged in and the download link is available to the user on the latter PC.
Make the request to your Flask application, which will authenticate the user and then issue a redirect to the S3 object. The trick is that the redirect should be to a signed temporary URL that expires in a minute or so, so it can't be saved and used later or by others.
You can use the generate_url method of boto.s3.key.Key in your Flask app to create the temporary URL (in boto3, the client's generate_presigned_url method plays the same role).
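The authenticate-then-redirect flow could be sketched roughly as below, in framework-neutral Python. Everything here is hypothetical naming: `serve_private_image`, the `(status, headers)` return shape, and the `sign_url` callable, which stands in for whatever produces the signed URL (for example a thin wrapper around boto's `generate_url`). In a Flask view, the 302 case would map to `redirect(url)` and the other cases to `abort(401)` or `abort(403)`.

```python
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str]]


def serve_private_image(
    current_user_id: Optional[int],
    owner_id: int,
    object_key: str,
    sign_url: Callable[[str, int], str],
    ttl_seconds: int = 60,
) -> Response:
    """Decide how to answer a request for a privately stored image."""
    if current_user_id is None:
        # Not logged in at all.
        return 401, {}
    if current_user_id != owner_id:
        # Logged in, but not the owner of this file.
        return 403, {}
    # Owner confirmed: hand out a short-lived signed URL via a redirect,
    # and ask caches not to keep the redirect around.
    location = sign_url(object_key, ttl_seconds)
    return 302, {"Location": location, "Cache-Control": "no-store"}
```

Because the permission check happens on every request and the URL expires after about a minute, a copied link stops being useful almost immediately.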
I would like for a user, without having to have an Amazon account, to be able to upload multi-gigabyte files to an S3 bucket of mine.
How can I go about this? I want to enable a user to do this by giving them a key, or perhaps through an upload form, rather than making a bucket world-writeable, obviously.
I'd prefer to use Python on my server side, but the idea is that a user would need nothing more than their web browser, or perhaps opening up their terminal and using built-in executables.
Any thoughts?
You are attempting to proxy the files through your Python backend to S3, and large files at that. Instead, you can configure S3 to accept files from the user directly (without proxying through your backend code).
It is explained here: Browser Uploads to S3 using HTML POST Forms. This way your server need not handle any upload load at all.
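A hedged Python sketch of the server side of such a form is below. It assumes `s3_client` behaves like `boto3.client("s3")`, whose `generate_presigned_post` method returns a dict with a `url` and the hidden `fields` the form must carry. The function names, the size limit, and the 15-minute expiry are illustrative choices only.

```python
from html import escape
from typing import Any, Dict


def browser_upload_form(
    s3_client: Any, bucket: str, key: str, max_bytes: int, expires_in: int = 900
) -> Dict[str, Any]:
    """Ask the SDK for a signed POST policy limited to one key and a size range."""
    return s3_client.generate_presigned_post(
        Bucket=bucket,
        Key=key,
        Conditions=[["content-length-range", 1, max_bytes]],
        ExpiresIn=expires_in,
    )


def render_upload_form(post: Dict[str, Any]) -> str:
    """Render an HTML form that posts the file directly to S3."""
    hidden = "\n".join(
        '  <input type="hidden" name="%s" value="%s">' % (escape(name), escape(value))
        for name, value in post["fields"].items()
    )
    # The file input comes after the signed fields, as S3 expects.
    return (
        '<form action="%s" method="post" enctype="multipart/form-data">\n'
        "%s\n"
        '  <input type="file" name="file">\n'
        '  <input type="submit" value="Upload">\n'
        "</form>" % (escape(post["url"]), hidden)
    )
```

The user only ever sees the form, and the signed policy limits what can be uploaded, where, and for how long.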
If you also want your users to use an identity from elsewhere (Google, Facebook, etc.) for this workflow, that is possible too. They will be able to upload these files to a sub-folder (path) in your bucket without exposing other parts of your bucket. This is detailed here: Web Identity Federation with Mobile Applications. Though it says mobile, you can apply the same to web apps.
Having said all that, as @Ratan points out, large file uploads could break midway when attempted from a browser, and the browser can't retry only the failed parts. This is where the need for a dedicated app comes in. Another option is to ask your users to keep the files in their Dropbox/Box.com account and have your server read from there; these services already take care of large file uploads, with all the retries etc., using their apps.
This answer is relevant to .Net as language.
We had such a requirement, for which we created an executable. The executable internally called a web method, which validated whether the app was authenticated to upload files to AWS S3 or not.
You can do this using a web browser too, but I would not suggest it if you are targeting big files.
I'd like to securely display a grid of thumbnail images to an authenticated user on our site. All the images will be stored in Amazon S3.
One way, I suppose, is to implement "security by obscurity" by uploading these images with public read access, and making the keys long and random.
I also could set up ACLs, but then I'd have to disclose the access key in the URL (I think), or pull the image into my application via the API and display it securely through the web server.
Is there a preferred way to do this? And to be able to display the images quickly without requiring a tremendous number of requests to S3 from the server every time a page is generated?
Thanks in advance
You can generate URLs to S3 with an expiry date. Generating such a URL does not require a request to S3 and does not result in the disclosure of your secret key: you use your secret key to generate a signature that is appended to the URL (the access key ID is in that URL, but that's OK).
See the docs on query string authorization
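To make the "no request to S3, no secret in the URL" point concrete, here is a standard-library Python sketch of query string signing. It follows the current AWS Signature Version 4 scheme, which may differ from the scheme the linked docs described when this answer was written. The function name, the virtual-hosted URL layout, and the omission of session tokens are simplifying assumptions; in practice an SDK would normally do this for you.

```python
import datetime
import hashlib
import hmac
from urllib.parse import quote


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_get_url(access_key: str, secret_key: str, region: str, bucket: str,
                    key: str, expires_in: int, now: datetime.datetime) -> str:
    """Build a SigV4 pre-signed GET URL locally; `now` must be a UTC time."""
    host = f"{bucket}.s3.{region}.amazonaws.com"
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    scope = f"{date_stamp}/{region}/s3/aws4_request"
    canonical_uri = "/" + quote(key, safe="/-_.~")
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",  # key ID only, not the secret
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires_in),
        "X-Amz-SignedHeaders": "host",
    }
    canonical_query = "&".join(
        f"{quote(k, safe='-_.~')}={quote(v, safe='-_.~')}"
        for k, v in sorted(params.items())
    )
    canonical_request = "\n".join(
        ["GET", canonical_uri, canonical_query, f"host:{host}", "", "host",
         "UNSIGNED-PAYLOAD"]
    )
    string_to_sign = "\n".join(
        ["AWS4-HMAC-SHA256", amz_date, scope,
         hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()]
    )
    # The secret key only feeds the HMAC chain; it never appears in the URL.
    signing_key = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    for part in (region, "s3", "aws4_request"):
        signing_key = _hmac_sha256(signing_key, part)
    signature = hmac.new(signing_key, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()
    return f"https://{host}{canonical_uri}?{canonical_query}&X-Amz-Signature={signature}"
```

Since this is pure computation, building a grid of thumbnails only costs a few hashes per image on the server, and every S3 request comes from the browser.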