We have deployed a Django server (nginx/gunicorn/Django), and to scale it we run multiple instances of the same Django application.
Here is the diagram (architecture):
Each blue rectangle is a Virtual Machine.
HAProxy sends all requests for example.com/admin to Server 3. Other requests are load-balanced between Server 1 and Server 2.
Old Problem:
Each machine has its own media folder, so when an admin uploads something, the uploaded media exists only on Server 3 (normal users can't upload anything).
We solved this by routing all requests for example.com/media/* to Server 3, where nginx serves all static files and media.
Current Problem:
We are also using sorl-thumbnail.
When a request comes in for example.com/, sorl-thumbnail tries to access the media file, but the file doesn't exist on that machine because it is on Server 3.
So every request to that machine (Server 1 or 2) gets a 404 for that media file.
One solution that comes to mind is to make a shared partition between all 3 machines and use it as media.
Another solution is to sync all media folders after each upload, but this has a problem: we get almost 2,000 requests per second, so sometimes the sync might not be fast enough, in which case sorl-thumbnail creates a database record for an empty file and a 404 happens.
Thanks in advance, and sorry for the long question.
You should use an object store to save and serve your user-uploaded files. django-storages makes the implementation really simple.
If you don't want to use cloud-based AWS S3 or an equivalent, you can host your own on-prem S3-compatible object store with MinIO.
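To give a rough idea, a settings.py excerpt might look like the sketch below. It assumes Django 4.2+ (for the STORAGES setting) and django-storages 1.14+ (for the storages.backends.s3.S3Storage backend); the bucket name, MinIO endpoint and environment-variable names are placeholders, not values from the question.

```python
# settings.py (excerpt): a sketch assuming Django >= 4.2 and django-storages >= 1.14.
import os

# Every VM reads and writes media through the same bucket, so a file uploaded
# on the admin server is immediately visible to Server 1 and Server 2.
STORAGES = {
    "default": {
        "BACKEND": "storages.backends.s3.S3Storage",
        "OPTIONS": {
            # Placeholder names: adjust the bucket and endpoint to your setup.
            "bucket_name": os.environ.get("MEDIA_BUCKET", "media"),
            # Point at MinIO for an on-prem store; omit this for AWS S3 itself.
            "endpoint_url": os.environ.get("S3_ENDPOINT_URL", "http://minio.internal:9000"),
            "access_key": os.environ.get("S3_ACCESS_KEY", ""),
            "secret_key": os.environ.get("S3_SECRET_KEY", ""),
        },
    },
    # Static files can stay on local disk (collected on each VM at deploy time).
    "staticfiles": {
        "BACKEND": "django.contrib.staticfiles.storage.StaticFilesStorage",
    },
}
```

Because every VM talks to the same bucket, an upload made on Server 3 can be read by sorl-thumbnail on Server 1 and Server 2; you would also want sorl-thumbnail's key-value store to be shared between the machines rather than per-machine.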
With your current setup, I don't see an easy fix, especially since the number of VMs can change dynamically with load.
If you have deployment automation, you could try rsync so that each VM takes care of syncing files with the other VMs.
Question: What was the problem?
We got 404s on the other machines because normal requests (requests for a template page) would get a 404 Not Found on the thumbnail media.
The real problem was with the sorl-thumbnail template tags.
Here is what we ended up doing:
In models that needed a thumbnail, we added methods to create that specific thumbnail.
Then, using a post-save signal on the admin machine, we called all those methods so that every thumbnail was created right after save and the sorl-thumbnail table was filled.
Now, instead of calling sorl-thumbnail template tags in templates, we call a method on the model.
Related
I'm building a small website that involves users uploading images that will be displayed later. The images are stored in an S3 bucket.
Sometimes, I need to display a lot of these images at once, and I'm not sure how best to accommodate that, without allowing public access to S3.
Currently, when there's a request, the server downloads the object from S3 and then returns the file to the client, which is understandably slow. I would love to just return the S3 URL and have the client load the image from there (so the traffic doesn't pass through my server and I don't have to wait for the image to travel S3 -> server -> client), but I also don't want unsecured S3 bucket URLs that anyone can visit.
What is the best architecture to solve this? Is there a way of giving people very brief temporary permission to a bucket? Is it possible to scope that to a specific url?
I looked around on Stack Overflow and GitHub for similar questions, but most of them deal with how files are uploaded, not with accessing them securely.
As suggested by @jarmod, you can presign your objects' URLs.
In this case, whenever you need to share an image, you create a presigned URL for the object and share that URL.
Your server only provides the URL; the user fetches the image directly, without your server in the middle of the request.
The AWS documentation explains how to use presigned URLs:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-presigned-url.html
https://docs.aws.amazon.com/sdk-for-go/v1/developer-guide/s3-example-presigned-urls.html
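In Python you would normally let boto3 do this with `s3_client.generate_presigned_url("get_object", Params={"Bucket": bucket, "Key": key}, ExpiresIn=300)`. To show what that call computes, here is a minimal stdlib-only sketch of a SigV4 query-string presigned GET URL. It assumes virtual-hosted-style URLs (a DNS-compatible bucket name) and long-term credentials; temporary credentials would additionally need an X-Amz-Security-Token parameter. The function and parameter names are illustrative.

```python
import datetime
import hashlib
import hmac
from typing import Optional
from urllib.parse import quote


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_get_url(bucket: str, key: str, region: str, access_key: str,
                    secret_key: str, expires: int = 300,
                    now: Optional[datetime.datetime] = None) -> str:
    """Build a SigV4 query-string presigned GET URL for a private S3 object."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    datestamp = now.strftime("%Y%m%d")
    host = f"{bucket}.s3.{region}.amazonaws.com"
    scope = f"{datestamp}/{region}/s3/aws4_request"

    # Query parameters that are themselves covered by the signature.
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    query = "&".join(
        f"{quote(k, safe='-_.~')}={quote(v, safe='-_.~')}"
        for k, v in sorted(params.items())
    )
    canonical_uri = "/" + quote(key, safe="/-_.~")
    canonical_request = "\n".join([
        "GET", canonical_uri, query,
        f"host:{host}\n",   # canonical headers block ends with a newline
        "host",             # signed headers
        "UNSIGNED-PAYLOAD",
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    # Derive the signing key: secret -> date -> region -> service -> request.
    k = _hmac(("AWS4" + secret_key).encode("utf-8"), datestamp)
    for part in (region, "s3", "aws4_request"):
        k = _hmac(k, part)
    signature = hmac.new(k, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"https://{host}{canonical_uri}?{query}&X-Amz-Signature={signature}"
```

The server hands out the returned URL; anyone holding it can fetch that one object until X-Amz-Expires seconds have passed, which covers both the "brief temporary permission" and the "scoped to a specific URL" parts of the question.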
The app I am currently hosting on Heroku allows users to submit photos. Initially, I was thinking about storing those photos on the filesystem, as storing them in the database is apparently bad practice.
However, it seems there is no permanent filesystem on Heroku, only an ephemeral one. Is this true and, if so, what are my options with regards to storing photos and other files?
It is true. Heroku runs your app on dynos, lightweight containers created from a compiled "slug" of your code, which can be replicated many times on Amazon's EC2 (that's why scaling is so easy with Heroku). Each dyno's filesystem is ephemeral: anything written to it is lost whenever the dyno restarts or is replaced, for example when you push a new version of your app and the slug is recompiled (dynos are also restarted at least once a day).
Your best bet (whether on Heroku or otherwise) is to save user-submitted photos to an external file store, ideally served through a CDN. Since you are on Heroku, and Heroku runs on AWS, I'd recommend Amazon S3, optionally with CloudFront in front of it as the CDN.
This helps not only because it gets around Heroku's ephemeral "limitation", but also because serving files from S3 and a CDN takes that load off your app and is usually faster, which gives your users a better experience.
Depending on the technology you're using, your best bet is likely to stream the uploads to S3 (Amazon's storage service). A client library makes it simple to upload and retrieve the files; Boto (now boto3) is one for Python, and equivalents exist for all popular languages.
Another thing to keep in mind is that Heroku filesystems are not shared between dynos either. This means the same process that handles the upload must also put the file to S3 (rather than, say, handing it off to a worker process). If you can, keep the upload in memory, never write it to disk, and post it directly to S3. This will speed up your uploads.
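A minimal sketch of that advice, assuming boto3: its `upload_fileobj` method reads from any file-like object, so an in-memory upload can go straight to S3 without touching disk. The function name and parameters are hypothetical, and the client is passed in so the function does not depend on how you configure boto3.

```python
from typing import Any, BinaryIO


def stream_upload_to_s3(s3_client: Any, upload: BinaryIO, bucket: str,
                        key: str, content_type: str = "application/octet-stream") -> str:
    """Push an in-memory upload straight to S3 without writing it to local disk.

    `s3_client` is expected to be a boto3 S3 client; upload_fileobj streams
    the file-like object (switching to multipart uploads for large bodies).
    """
    upload.seek(0)  # the web framework may already have read the stream
    s3_client.upload_fileobj(upload, bucket, key,
                             ExtraArgs={"ContentType": content_type})
    return key
```

In a request handler you might call `stream_upload_to_s3(boto3.client("s3"), uploaded_file, "my-bucket", "photos/123.jpg", "image/jpeg")`, where the bucket and key are placeholders.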
Because Heroku is hosted on AWS, transfers from your app to S3 are very fast; expect uploads to be noticeably slower when you're developing locally.
In a Django project of mine, users upload video files. Initially, I was uploading them directly to Azure Blob Storage (the equivalent of Amazon S3). That is, in models.py I had:
from django.db import models

class Video(models.Model):
    video_file = models.FileField(upload_to=upload_path, storage=OverwriteStorage())
Here, OverwriteStorage subclasses Storage from django.core.files.storage and essentially uploads the file to Azure.
Now I need to upload this file to a separate Linux server (not the same one that serves my Django web application). In this separate server, I'll perform some operations on the video file (compression, format change), and then I'll upload it to Azure Storage like before.
My question is: given my goal, how do I change the way I'm uploading the file in models.py? An illustrative example would be nice. I'm thinking I'll need to change FileField.upload_to, but all the examples I've seen indicate it's only to define a local filesystem path. Moreover, I don't want to let the user upload the content normally and then run a process to upload the file to another server. Doing it directly is my preference. Any ideas?
I've solved a similar issue with Amazon's S3, but the concept should be the same.
First, I use django-storages (which also supports Azure) and upload my media files to S3 by default. Then my team set up an NFS share on the destination server that we occasionally need to write user uploads to, and mounted it on our Django web servers. We then override django-storages for that field by using "upload_to" with the local path where the other server's share is mounted.
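In Django, the per-field override is normally expressed by passing a different storage to the field, for example `models.FileField(storage=FileSystemStorage(location="/mnt/video-share"))`, since upload_to is interpreted relative to the storage's root. The stdlib sketch below shows what writing into the mounted share amounts to. The mount path and function name are hypothetical, and the write-to-temp-then-rename step is an addition not mentioned in the answer, meant to keep a processing job on the other server from picking up half-written videos.

```python
import os
import shutil
import tempfile
from typing import BinaryIO


def save_to_share(upload: BinaryIO, mount_dir: str, name: str) -> str:
    """Write an upload into an NFS-mounted directory, then rename it into place.

    `name` is assumed to be a plain file name (no subdirectories).
    """
    final_path = os.path.join(mount_dir, name)
    # Temp file in the same directory, so os.replace is a rename on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=mount_dir, prefix=".partial-")
    try:
        with os.fdopen(fd, "wb") as out:
            shutil.copyfileobj(upload, out)
        # mkstemp creates the file as 0600; let the processing server read it.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, final_path)
    except BaseException:
        os.unlink(tmp_path)
        raise
    return final_path
```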
This answer has a quick example of how to set up an NFS share from one server on another: https://superuser.com/questions/300662/how-to-mount-a-folder-from-a-linux-machine-on-another-linux-machine
There are a few ways to skin the cat, but this one seemed easiest to our team. Good luck!
I would like users, without needing an Amazon account, to be able to upload multi-gigabyte files to an S3 bucket of mine.
How can I go about this? I want to enable a user to do this by giving them a key or perhaps through an upload form, rather than, obviously, making the bucket world-writable.
I'd prefer to use Python on the server side, but the idea is that a user would need nothing more than their web browser, or perhaps their terminal and its built-in executables.
Any thoughts?
You are attempting to proxy files, large ones at that, through your Python backend to S3. Instead, you can configure S3 to accept files directly from the user (without proxying them through your backend code).
It is explained here: Browser Uploads to S3 using HTML POST Forms. This way your server need not handle any upload load at all.
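For a Python server, boto3 can generate the form fields with `s3_client.generate_presigned_post(Bucket=..., Key=..., Conditions=[...], ExpiresIn=...)`. The sketch below uses only the standard library to show what goes into those fields: a base64 policy document signed with a SigV4 key. It is an illustration under stated assumptions; the prefix-per-user layout and the function name are hypothetical. Note that S3 caps a single POST upload at 5 GB, so larger files need multipart uploads.

```python
import base64
import datetime
import hashlib
import hmac
import json
from typing import Dict, Optional


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presigned_post_fields(bucket: str, key_prefix: str, region: str,
                          access_key: str, secret_key: str,
                          max_bytes: int, expires_in: int = 3600,
                          now: Optional[datetime.datetime] = None) -> Dict[str, str]:
    """Return the hidden form fields for a browser POST upload (SigV4 policy)."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    datestamp = now.strftime("%Y%m%d")
    credential = f"{access_key}/{datestamp}/{region}/s3/aws4_request"
    expiration = (now + datetime.timedelta(seconds=expires_in)).strftime("%Y-%m-%dT%H:%M:%SZ")

    policy = {
        "expiration": expiration,
        "conditions": [
            {"bucket": bucket},
            ["starts-with", "$key", key_prefix],     # user may only write under this prefix
            ["content-length-range", 1, max_bytes],  # reject oversized bodies
            {"x-amz-algorithm": "AWS4-HMAC-SHA256"},
            {"x-amz-credential": credential},
            {"x-amz-date": amz_date},
        ],
    }
    policy_b64 = base64.b64encode(json.dumps(policy).encode("utf-8")).decode("ascii")

    # Same SigV4 signing-key derivation as for presigned URLs.
    k = _hmac(("AWS4" + secret_key).encode("utf-8"), datestamp)
    for part in (region, "s3", "aws4_request"):
        k = _hmac(k, part)
    signature = hmac.new(k, policy_b64.encode("ascii"), hashlib.sha256).hexdigest()

    return {
        "key": key_prefix + "${filename}",  # S3 substitutes the uploaded file's name
        "x-amz-algorithm": "AWS4-HMAC-SHA256",
        "x-amz-credential": credential,
        "x-amz-date": amz_date,
        "policy": policy_b64,
        "x-amz-signature": signature,
    }
```

The page then renders an HTML form that posts multipart/form-data to the bucket's endpoint with these fields as hidden inputs and the file input last; S3 checks the policy and signature itself, so your server never sees the bytes.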
If you also want your users to sign in with an identity from elsewhere (Google, Facebook, etc.) for this workflow, that is possible too. They will be able to upload files to a sub-folder (path) in your bucket without exposing other parts of it. This is detailed in Web Identity Federation with Mobile Applications; although it says mobile, the same approach applies to web apps.
Having said all that, as @Ratan points out, large file uploads from a browser can break partway through, and the browser can't retry only the failed parts. This is where a dedicated app comes in. Another option is to ask your users to put the files in their Dropbox/Box.com account and have your server read them from there; these services already handle large uploads, including retries, in their own apps.
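The "retry only the failed parts" idea rests on S3 multipart uploads, where a file is split into independently uploaded parts. As an illustration, here is a hypothetical helper that plans byte ranges within S3's documented multipart limits (parts of at least 5 MiB except the last, at most 5 GiB each, and no more than 10,000 parts); a client would upload each range as one part and retry just that range on failure.

```python
from typing import List, Tuple

MIN_PART = 5 * 1024 ** 2   # S3 minimum part size (all parts except the last)
MAX_PART = 5 * 1024 ** 3   # S3 maximum part size
MAX_PARTS = 10_000         # S3 maximum number of parts per upload


def plan_parts(total_bytes: int, preferred: int = 64 * 1024 ** 2) -> List[Tuple[int, int, int]]:
    """Split an object into (part_number, start, end_exclusive) byte ranges.

    Each range can be uploaded and, on failure, retried on its own.
    """
    if total_bytes <= 0:
        raise ValueError("nothing to upload")
    # Grow the part size (ceiling division) until the file fits in MAX_PARTS parts.
    size = max(preferred, MIN_PART, -(-total_bytes // MAX_PARTS))
    if size > MAX_PART:
        raise ValueError("object too large for a multipart upload")
    return [
        (i + 1, start, min(start + size, total_bytes))
        for i, start in enumerate(range(0, total_bytes, size))
    ]
```

For a 200 MiB file and the default 64 MiB preference, this yields four parts: three full ones and a final 8 MiB part.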
This answer applies to .NET.
We had a similar requirement, for which we created an executable. The executable internally called a web method that checked whether the app was authorized to upload files to AWS S3.
You can do this from a web browser too, but I would not recommend it if you are targeting big files.
I am developing a Django application that lets users upload and view photos, which are stored as private objects in S3. Every time I need to show the thumbnails, I generate a URL and pass it to the template. This process is very slow.
I am hoping there is some other way that I haven't explored. I was hoping for something like X-Sendfile, where I authenticate the user and then redirect them to S3. Please let me know if I am missing anything.
I forked sorl-thumbnail to make it fast with S3. My code is here: sorl_thumbnail-async
But I then found that easy_thumbnails does exactly what I was trying to do, so I am using it in my current project. Sorl hasn't been updated since last year; use easy_thumbnails with remote storages like S3. You might find my post on the topic useful.
[Edit]: sorl-thumbnail now has new maintainers and is kept up to date with the latest Django releases.
You can use sorl-thumbnail to serve thumbnails with pluggable S3 backend support and memcached or redis for caching.
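The caching is what removes most of the slowness: once a thumbnail's details sit in memcached or Redis (with sorl-thumbnail, via the THUMBNAIL_KVSTORE setting, e.g. its Redis key-value store), rendering a page no longer needs a round trip to S3 per image. A tiny in-process TTL cache illustrates the principle; it is a hypothetical stand-in, not sorl-thumbnail's implementation. If the cached values are presigned URLs, the TTL should be shorter than the URLs' expiry.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Tiny in-process stand-in for memcached/Redis: key -> (value, expiry)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}

    def get_or_set(self, key: str, compute: Callable[[], str]) -> str:
        now = self.clock()
        hit = self._data.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]              # fresh entry: no round trip to S3
        value = compute()              # slow path, e.g. storage lookup + presign
        self._data[key] = (value, now + self.ttl)
        return value
```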
You might find this question helpful: Storing images and thumbnails on s3 in django