I am working my way towards deploying my Django app on a Linux server. I have encountered various problems and want someone to clarify them for me. I searched for days, but the answers I found are either too broad or drift away from the topic.
1) According to the Django docs, it is inefficient to serve static files locally (i.e. with Django itself). Does that mean that all the static files, including HTML, CSS and JS files, should be hosted on another server?
2) I have an AWS S3 bucket hosting all my media files. If the answer above is yes, can I use that same bucket (or create a new bucket) to host my static files? If so, is that efficient?
3) From what I have read, in order for Django to scale horizontally it should be a stateless app. Does that mean I also have to host my database somewhere other than my own Linux server?
1) It is completely fine to host your static files on the same server as your Django application; however, to serve those files you should use a web server such as Nginx or Apache. Django was not designed to serve static content in a production environment. Nginx and Apache, on the other hand, do a great job at it.
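For reference, the Django side of that setup is just a couple of settings; a minimal sketch, assuming the web server is configured to serve the directories below (the paths themselves are placeholders, not from the question):

```python
# settings.py -- static/media locations that a web server (Nginx/Apache) serves directly
# (paths below are placeholders)
STATIC_URL = "/static/"
STATIC_ROOT = "/var/www/example/static/"  # `python manage.py collectstatic` copies files here

MEDIA_URL = "/media/"
MEDIA_ROOT = "/var/www/example/media/"    # user uploads are written here
```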
2) You can definitely host your static and media files in an S3 bucket. This will scale a lot better than hosting them on a single server because they're provided by a separate entity, meaning that no matter how many application servers you're running behind a load balancer, all of them will be requesting static files from the same source. To make it more efficient you can even configure AWS CloudFront, which is Amazon's CDN (content delivery network).
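As a rough sketch of what that looks like with django-storages and boto3 (the bucket name, region and settings style are assumptions; exact setting names depend on your django-storages version):

```python
# settings.py -- static and media files stored in (and served from) S3 via django-storages
# pip install django-storages boto3   (bucket name and region below are placeholders)
INSTALLED_APPS += ["storages"]

AWS_STORAGE_BUCKET_NAME = "my-example-bucket"
AWS_S3_REGION_NAME = "eu-west-1"

# collectstatic pushes static files to the bucket; uploads go there through the
# default file storage, so every app server behind the load balancer sees the same files.
STATICFILES_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"

# Optional: serve everything through CloudFront instead of the raw bucket URL.
# AWS_S3_CUSTOM_DOMAIN = "dxxxxxxxxxxxx.cloudfront.net"
```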
3) Ideally your database should be hosted on a separate server. Databases are heavy on resources, so hosting your database on the same server as your application may lead to slowdowns and sometimes outright crashes. When scaling horizontally you'd be connecting a lot of application servers to a single database instance, effectively increasing the load on that server.
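For completeness, pointing Django at a database on a separate host (or a managed service such as RDS) is only a settings change; the host name and credentials below are placeholders:

```python
# settings.py -- database living on its own server instead of the app box
import os

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "myapp",
        "USER": "myapp",
        "PASSWORD": os.environ["DB_PASSWORD"],  # keep credentials out of source control
        "HOST": "db.internal.example.com",      # placeholder for the dedicated DB host
        "PORT": "5432",
    }
}
```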
All of the points above are relative to your use case and resources. If the application you are running doesn't deal with heavy traffic - say a few hundred hits a day - and your server has an adequate amount of resources (RAM, CPU, storage) it's acceptable to run everything off a single box.
However, if you're planning to accept tens of thousands of connections every day, it's better to separate the responsibilities for optimum scalability. Not only does it make your application more efficient and responsive, but it also makes your life easier in the long run when you need to scale further (database clustering, nearline content delivery, etc.).
TL;DR: you can run everything off a single server if it's beefy enough but in the long run it'll make your life harder.
Related
We have deployed a Django server (nginx/gunicorn/django), but to scale it there are multiple instances of the same Django application running.
Here is the diagram (architecture):
Each blue rectangle is a Virtual Machine.
HAProxy sends all requests to example.com/admin to Server 3; other requests are divided between Server 1 and Server 2 (load balancing).
Old Problem:
Each machine has a media folder, and when the admin uploads something the uploaded media ends up only on Server 3 (normal users can't upload anything).
We solved this by routing all requests to example.com/media/* to Server 3, where nginx serves all the static files and media.
Problem right now
We are also using sorl-thumbnail.
When a request comes in for example.com/, sorl-thumbnail tries to access the media file, but it doesn't exist on that machine because it's on Server 3.
So all requests to that machine (Server 1 or 2) get a 404 for that media file.
One solution that comes to mind is to make a shared partition between all 3 machines and use it as media.
Another solution is to sync all media folders after each upload, but this has a problem: we get almost 2,000 requests per second, and sometimes the sync isn't fast enough, so sorl-thumbnail creates a database record for an empty file and the 404 happens.
Thanks in advance, and sorry for the long question.
You should use an object store to save and serve your user-uploaded files. django-storages makes the implementation really simple.
If you don't want to use cloud-based AWS S3 or an equivalent, you can host your own on-prem, S3-compatible object store with MinIO.
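A rough sketch of pointing django-storages at a self-hosted, S3-compatible endpoint such as MinIO (the endpoint URL, bucket name and environment variables are placeholders):

```python
# settings.py -- django-storages talking to a self-hosted S3-compatible store (e.g. MinIO)
import os

DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"

AWS_S3_ENDPOINT_URL = "https://minio.internal.example.com:9000"  # placeholder endpoint
AWS_STORAGE_BUCKET_NAME = "media"                                # placeholder bucket
AWS_ACCESS_KEY_ID = os.environ["MINIO_ACCESS_KEY"]
AWS_SECRET_ACCESS_KEY = os.environ["MINIO_SECRET_KEY"]
```

With all media in the object store, every VM reads and writes the same files, so sorl-thumbnail no longer depends on which machine handled the upload.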
With your current setup I don't see any easy fix, especially when the number of VMs is dynamic depending on load.
If you have deployment automation, then maybe try out rsync so that each VM takes care of syncing files with the other VMs.
Question: What was the problem?
We got 404s on the other machines because normal requests (requests rendering a template) would get a 404 Not Found for the thumbnail media.
The real problem was with the sorl-thumbnail template tags.
Here is what we ended up doing:
In models that needed a thumbnail, we added functions to create that specific thumbnail.
Then a post-save signal on the admin machine called all those functions to make sure every thumbnail was created after save and the sorl-thumbnail table was filled.
Now, in templates, instead of calling the sorl-thumbnail template tags, we call a method on the model.
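A minimal sketch of that approach, assuming a hypothetical Photo model with an image field (the model, field and method names are illustrative, not from the question):

```python
# models.py -- pre-generate thumbnails on save so sorl-thumbnail's table is already
# filled before other app servers render templates (names are hypothetical)
from django.db import models
from django.db.models.signals import post_save
from django.dispatch import receiver
from sorl.thumbnail import get_thumbnail


class Photo(models.Model):
    image = models.ImageField(upload_to="photos/")

    def list_thumbnail(self):
        # Called from templates instead of the sorl-thumbnail template tag.
        return get_thumbnail(self.image, "100x100", crop="center", quality=90)

    def build_thumbnails(self):
        # Creating the thumbnails here writes the files and fills sorl's key-value store.
        self.list_thumbnail()
        get_thumbnail(self.image, "300x200", crop="center", quality=90)


@receiver(post_save, sender=Photo)
def create_thumbnails(sender, instance, **kwargs):
    instance.build_thumbnails()
```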
The app I am currently hosting on Heroku allows users to submit photos. Initially, I was thinking about storing those photos on the filesystem, as storing them in the database is apparently bad practice.
However, it seems there is no permanent filesystem on Heroku, only an ephemeral one. Is this true and, if so, what are my options with regards to storing photos and other files?
It is true. Heroku allows you to create cloud apps, but those cloud apps are not "permanent" - they are instances (or "slugs") that can be replicated multiple times on Amazon's EC2 (that's why scaling is so easy with Heroku). If you were to push a new version of your app, then the slug will be recompiled, and any files you had saved to the filesystem in the previous instance would be lost.
Your best bet (whether on Heroku or otherwise) is to save user-submitted photos to an external store such as a CDN. Since you are on Heroku, and Heroku runs on AWS, I'd recommend Amazon S3, optionally with CloudFront enabled in front of it.
This is beneficial not only because it gets around Heroku's ephemeral "limitation", but also because a CDN is much faster, and will provide a better service for your webapp and experience for your users.
Depending on the technology you're using, your best bet is likely to stream the uploads to S3 (Amazon's storage service). You can interact with S3 with a client library to make it simple to post and retrieve the files. Boto is an example client library for Python - they exist for all popular languages.
Another thing to keep in mind is that Heroku filesystems are not shared either. This means you'll have to put the file to S3 from the same application that handles the upload (instead of, say, a worker process). If you can, load the upload into memory, never write it to disk, and post it directly to S3. This will increase the speed of your uploads.
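As a rough illustration (assuming a Django view and boto3; the bucket name and form field name are placeholders), the uploaded file object can be handed to S3 without ever touching the dyno's disk:

```python
# views.py -- stream a user upload straight to S3 instead of writing it to local disk
# (bucket name and form field name are placeholders)
import uuid

import boto3
from django.http import JsonResponse

s3 = boto3.client("s3")
BUCKET = "my-example-bucket"


def upload_photo(request):
    uploaded = request.FILES["photo"]  # file-like object (kept in memory for small uploads)
    key = f"photos/{uuid.uuid4()}-{uploaded.name}"
    s3.upload_fileobj(uploaded, BUCKET, key)  # streams the object to S3 in chunks
    return JsonResponse({"key": key})
```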
Because Heroku is hosted on AWS, the streams to S3 happen at a very high speed. Keep that in mind when you're developing locally.
I am using Django to develop my site and I am trying to optimize it for speed, so I want to use a CDN for Bootstrap and, if it fails, fall back to the copy on my server. I have seen
How to load local files if CDN is not working
but it does it in JavaScript, which doesn't solve my problem. I want to know:
how to check from Django whether the CDN is working and, if not, serve the static files from my own server?
Do not try to do this on the server side. CDN services are built to be reliable: they are geographically distributed, fault-tolerant, and run according to the best practices available.
You can't find out whether the CDN servers work for your user by pinging them from your Django application. Your user is in a different location and might have very different network conditions, e.g. a mobile connection from another country with a provider that experiences outages.
You could, indeed, ping the CDN servers, which would probably mean your Django application resolving one CDN load balancer address, checking whether it responds, and falling back to others if the CDN source is down. Then you would have to check, for every resource you have, that is every JavaScript and CSS file, whether it is available, and load a local backup if not. On the server side. This is very slow and error-prone, and networks can fail for a googolplex of different reasons.
The proper way to go about this is to
Only use your own servers for serving those static files, distributing the load across application servers that each have their own versioned copy of the static files. If your application server works, it should have the copies available as well;
Do the checks on the client side, because server-side queries will grind your server to a halt if it is not close to your CDN network, and you generally do not wish to depend on any external resources on the server side;
Or, as I would recommend, set up your own CDN which serves your local files from a proxied URL or subdomain. Read more below.
Ideally, if you wish to use a reliable CDN source, you would set up a CDN server with redundancy on the same infrastructure you use to host your files.
Say your site is located at www.example.com, which is your Django application server's address. You would set up a cdn.example.com domain backed by a CDN service, for example CloudFront or similar, that proxies requests to www.example.com/static/ and mirrors your static files, taking the load off your application server. You then simply configure your Django application to use the http://cdn.example.com/static/ address for serving static files. There are multiple services for providing a CDN for your application; CloudFront is just one option. This gets your static, CDN-able files close to your users.
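On the Django side that last step is a one-line settings change (the domain follows the example above; the collectstatic path is a placeholder):

```python
# settings.py -- serve collected static files through the CDN subdomain
STATIC_URL = "https://cdn.example.com/static/"

# The origin the CDN proxies/mirrors is still populated the usual way:
STATIC_ROOT = "/var/www/example/static/"  # placeholder path for `collectstatic`
```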
Ideally, your application servers and CDN servers are hosted on the same, redundant infrastructure, and you can assume that if one part of your infrastructure works the others will as well, or otherwise your service provider is violating your SLA. You do NOT wish to use broken infrastructure and drive away your customers, or use hacks that will eventually break in production.
I don't know that there is a good way of doing this, but this is the method I would use if the people who paid me really wanted me to make it work.
You could set up custom URL template tags in a separate pluggable app and have them ping your CDN target, and if it fails, serve a local URI. Admittedly, pinging the CDN target doesn't mean it will actually serve the file, so a more robust way would be to attempt to GET the file from the CDN provider and, if successful, emit the remote URI, and if it fails, emit the local URI. This would double the traffic on your static files for every request.
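For illustration only, such a template tag might look like the sketch below (the tag name, CDN base URL and timeout are all hypothetical, and as noted this adds a server-side request per asset):

```python
# templatetags/cdn_fallback.py -- illustrative sketch: return the CDN URL if it responds,
# otherwise fall back to the locally served static file (names/URLs are hypothetical)
import requests
from django import template
from django.templatetags.static import static

register = template.Library()

CDN_BASE = "https://cdn.example.com/static/"  # placeholder


@register.simple_tag
def static_with_fallback(path):
    cdn_url = CDN_BASE + path
    try:
        if requests.head(cdn_url, timeout=1).ok:
            return cdn_url
    except requests.RequestException:
        pass
    return static(path)  # local copy, served by your own static file setup
```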
This also requires you to set up static file serving just as you would if you planned to serve everything from that server. I wouldn't recommend any of this. I would recommend doing what #ceejayoz says and just using a reliable CDN; their whole purpose in life is to prevent you from having to do any of this.
This is achievable, but the setup might be a little tedious.
Basically you are trying to do failover between the CDN and your origin server: if the CDN fails, requests fail over to your origin server. One option is DNS-level failover, with the primary record a CNAME to the CDN and the backup record pointing to your origin server's hostname.
You also include health checks in your DNS setup for both the CDN and the origin server. Once the health check for the CDN fails, DNS fails over to your origin server and the static files are served from there automatically.
I'm fairly new to Django and I'm trying to deploy a small hobby project on OpenShift. I know that there are certain conventions that recommend against letting Django serve static and media files because it's inefficient. I've also noticed that Django refuses to serve media files when DEBUG is turned off. That's why I'm looking at better ways to serve this content.
Django's documentation endorses CDNs like Amazon S3 as one of the best way to serve static, but as a hobbyist I'd rather stick to freemium solutions for now. I found out that MongoDB - another technology I'm fairly new to - provides GridFS as a storage backend. I can get free MongoDB storage through MongoLab, so this is looking interesting to me.
Would such a construction work in practice or is this crazy talk? If this is feasible, which changes would I need to make to my OpenShift environment and Django settings to let GridFS serve static content? I've seen alternative setups where people use CloudFlare's free CDN to serve their static content, but then I wouldn't be able to upload/access media files from my local development environment.
To make a long story short: it is a bad idea.
Here is why: in order to serve a static file, you would first need to process the request and fetch the data from GridFS, which actually splits files into 255 kB chunks that have to be reassembled (depending on the size, of course), and only then could the file be returned.
What I tend to do is use Varnish to cache the static files served by the application, be it Django or a servlet container. It works like this:
All requests are sent to Varnish, which either serves the requested resource from cache or hands them to a backend.
Varnish uses your Django app as a backend. I usually have Django run behind an additional lighttpd, though.
When a static file is returned by Django, Varnish puts it into an in-memory caching area. I usually have some 100 MB allocated for this, sometimes even less; the total size of your static files should be well known to you. I tend to use a simple caching configuration like "cache all files ending with .css, .js, .png".
All subsequent requests for that static resource will then be served by Varnish from memory, via the sendfile system call, without even hitting the backend.
All in all: this way, load is taken off your application, latency is reduced, the resources are delivered lightning fast, and it is easy to set up without any programming effort.
Edit: as for an OpenShift environment: simply leave it as is. Serving static files from MongoDB simply does not make sense.
I'm working on a Flask application in which we will have multiple clients (10-20), each with their own configuration (for the DB, client-specific settings, etc.). Each client will have a subdomain, like www.client1.myapp.com and www.client2.myapp.com. I'm using uWSGI as the application server and nginx as the proxy server.
There are a couple of ways to deploy this, I suppose: one would be to use application dispatching with a single uWSGI instance; another would be to run a separate uWSGI instance for each client and have nginx forward traffic to the right app based on the subdomain. Does anyone know the pros and cons of each scenario? Just out of curiosity, how do applications like Jira handle this?
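For reference, the application-dispatching option boils down to a small WSGI callable that picks a per-client Flask app by the Host header; a minimal sketch, where create_app() and the client hostnames are hypothetical:

```python
# dispatcher.py -- one uWSGI process serving several per-client Flask apps by subdomain
# (create_app() and the hostnames below are hypothetical)
from myapp import create_app  # hypothetical app factory taking a client name

apps = {
    "www.client1.myapp.com": create_app("client1"),
    "www.client2.myapp.com": create_app("client2"),
}


def application(environ, start_response):
    host = environ.get("HTTP_HOST", "").split(":")[0]
    app = apps.get(host)
    if app is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Unknown client\n"]
    return app(environ, start_response)
```

Pointing uWSGI at `dispatcher:application` lets nginx proxy every subdomain to the same socket.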
I would recommend having multiple instances, forwarded to by nginx. I'm doing something similar with a PHP application, and it works very well.
The reason, and the benefit, of doing it this way is that you can keep everything completely separate: if one client's setup goes screwy, you can re-instance it and there's no problem for anyone else. Also, no user, even if they manage to break the application-level security, can access any other user's data.
I keep each client in their own database (one MySQL instance, multiple databases), so I can do a complete sqldump per client (if using MySQL, etc.), or, for another application that uses SQLite rather than MySQL, copy the .sqlite file completely for backup.
Going this way also means you can easily set up a 'test' version of a client's site as well as a live one, then swap which one is actually live just by changing your nginx settings. Say for doing upgrades: you can upgrade the testing one, check it's all OK, then swap. (Also, for some applications, the client may like having their own 'testing' version, which they can break to their hearts' content, knowing that they (or you) can re-instance it in moments without harming their 'real' data.)
With application dispatching, you cannot easily get nginx to serve separate client upload directories without having a separate nginx config per client (and if you're doing that, why not go for individual uWSGI instances anyway?). Likewise for individual SSL certificates (if you want that...).
Each subdomain (or entirely separate domain, for some) has its own logging, so if a certain client is being DoS'd or otherwise attacked, it's easy to see which one.
You can set up filesystem-level size quotas per user, so that if one client starts uploading GBs of video, your server doesn't get filled up for everyone else.
The way I work is to use Ansible to provision and set up the server how I want it, with the client-specific details kept in a separate host_vars file. So my inventory is:
[servers]
myapp.com #or whatever...
[application_clients]
apples
pears
elephants
[application_clients:vars]
ansible_address=myapp.com
then host_vars/apples:
url=apples.myapp.com
db_user=apples
db_pass=secret
Then, in the provisioning, I set up two new users and one group for each client. For instance: apples and web.apples as the two users, and the group simply apples (which both users are in).
This way, all the application code is owned by the apples user, but the PHP-FPM instance (or uWSGI instance in your case) is run by web.apples. The permissions on all the code are rwXr-X---, and the permissions on the uploads and static directories are rwXrwXr-X. Nginx runs as its own user, so it can access ONLY the upload/static directories, which it can serve as straight files. Any private files that you want served by the uWSGI app can be set up that way easily. The web user can read and execute the code but cannot edit it. The actual client user can read and write the code, but isn't normally used except for updates, installing plugins, etc.
I can give a client an SFTP user chroot'd to their uploads directory if they want to upload outside of the application interface.
Using ansible, or another provisioning system, means there's very little work needed to create a new client setup, and if a client (for whatever reason) wants to move over to their own server, it's just a couple of lines to change in the provisioning details, and re-run the scripts. It also means I can keep a development server installed with the exact same provisioning as the main server, and also I can keep a backup amazon instance on standby which is ready to take over if ever I need it to.
I realise this doesn't exactly answer your question about the pros and cons of each way, but it may be helpful anyway. Multiple instances of uWSGI, or of any other WSGI server (mainly I use waitress, but there are plenty of good ones), are very simple to set up and, if done logically with a good provisioning system, easy to administer.