Serving static files through MongoDB's GridFS

Serving static files through MongoDB's GridFS - python

I'm fairly new to Django and I'm trying to deploy a small hobby project on OpenShift. I know that there are certain conventions that recommend against letting Django serve static and media files because it's inefficient. I've also noticed that Django refuses to serve media files when DEBUG is turned off. That's why I'm looking at better ways to serve this content.
Django's documentation endorses CDNs like Amazon S3 as one of the best way to serve static, but as a hobbyist I'd rather stick to freemium solutions for now. I found out that MongoDB - another technology I'm fairly new to - provides GridFS as a storage backend. I can get free MongoDB storage through MongoLab, so this is looking interesting to me.
Would such a construction work in practice or is this crazy talk? If this is feasible, which changes would I need to make to my OpenShift environment and Django settings to let GridFS serve static content? I've seen alternative setups where people use CloudFlare's free CDN to serve their static content, but then I wouldn't be able to upload/access media files from my local development environment.

To make a long story short: it is a bad idea.
Here is why: in order to serve static files, you would first need to process the request, get the data from GridFS, which actually scatters the files in 255k chunks which would have to be collected (depending on the size, of course) and only then they could be returned.
What I tend to do is to use varnish to cache the static files served by the application, be it either django or a Servlet container. This works like this:
All requests are sent to varnish, which either serves the requested ressource from cache or hands then to a backend.
Varnish uses your django app as a backend. I usually have django run behind an additional lighttpd, though.
When a static file is returned by django, varnish puts this file into an in memory caching area. I usually have some 100M allocated for this - sometimes even less. The size of your static files should be well known to you. I tend to do a simple caching configuration like "cache all files ending with .css, .js, .png".
All subsequent requests for this static ressource will be served by varnish now - from memory, via the sendfile system call, not even hitting the backend.
All in all: this way, load is taken from your application, latency is reduced, the ressources are delivered lightning fast, it is easy to set up without any programming effort.
Edit: as for an OpenShift environment: simply leave it as is. Serving static files from MongoDB simply does not make sense.

Related

Why isn't Django serving staticfiles in production?

I am wondering the reason why Django does not serve the statifiles in production, when DEGUB = False.
STATICFILES_DIRS
We specify STATICFILES_DIRS to tell Django where to look for staticfiles that are tied up to a specified app.
STATIC_ROOT
We specify STATIC_ROOT to tell Django where to store the files once we run python manage.py collectstatic, so everystatic file is stored in the path specified in STATIC_ROOT.
Assume that we set STATIC_ROOT = "staticfiles/".
This means that once we run the collectstatic command, all the files that are inside STATICFILES_DIRS paths are going to be stored in "staticfiles/"
STATIC_URL
Finally we specify STATIC_URL as "prefix" to tell Djando where to look for staticfiles, for example in the HTML <link> tag, the url that we see is based on STATIC_URL value
When we upload our project to the server, we upload the entire project, so every single file. Why can't Django serve staticfiles itself when running on server?
As I just said, we upload the entire folder, so the files we uploaded are there (and the staticfiles too!).
QUESTIONS
I am just wondering, why do we have to specify the staticfiles based on server in production, when Django could do everything for us as it have always done in localhost?
Isn't load the files from another storage so much slower than load them from main folder of the project?

I am just wondering, why do we have to specify the staticfiles based on server in production, when Django could do everything for us as it have always done in localhost?
Because it is likely inefficient and insecure. Each time a request is made, the request passes through all middleware then the view will produce a response that will again pass through the middleware to the client. If you request the same file a second time, it will likely not have any caching, and thus repeat that process again. If you work with a webserver like Nginx/Apache, it will probably cache the result. If you work with a CDN, then it will also contact the nearest server and thus get access to these resources in a more efficient way.
Another problem is security. If you specify a path to a file that is not supposed to be served, then the webserver should prevent the browser from accessing that file. Some hackers for example try to access the source files of the browser to then look for vulnerabilities. This should not be possible. Likely a web server like Apache or Nginx will have more advanced security mechanisms for this in place.
If you really want to, you can use WhiteNoise to let Django serve static files and media files in production. This Django application has been optimized for security and efficiency. Although it is hard to tell if it will have the same level as aan Apache or Nginx server.
Isn't load the files from another storage so much slower than load them from main folder of the project?
The webserver will not contact the other storage: the browser will do that. It thus is possible that instead of the webserver, it will contact a CDN. It is possible that this is slightly less efficient, since a webbrowser usually reuses the open connection to the server to make more requests, but often you already contacted that CDN, for example for JavaScript files. Furthermore CDNs are optimized to deliver content as efficient as possible: the browser will usually contact a browerser close to the client, and usually there is also load balancing and redundancy in place to make it less likely that the server can no longer serve the resource.

Deployment of django app - Scaling, Static Files, Servers

I am working my way to deploy my django app on a Linux Server. Encountered with various problems and want someone to clarify that for me. I searched for days but the answer I found are either too broad or faded away from topic.
1) According to django docs, it is inefficient to serve static files locally. Does that means that all the static files including html,css,js files should be hosted on another server?
2) I have an AWS S3 bucket hosting all my media files, can I use that same bucket (or create a new bucket) to host my static files if above answer is yes. If so, is that efficient?
3) According to search, in order for django to scale horizontally, it should be a stateless app, does it means that I also have to host my database on a different location than my own Linux server ?

1) It is completely fine to host your staticfiles on the same server as your django application however to serve said files you should use a web server such as NGINX or Apache. Django was not designed to serve static data in a production environment. Nginx and Apache on the other hand do great job at it.
2) You can definitely host your static and media files inside an S3 bucket. This will scale a lot better than hosting them on a single server as they're provided by a separate entity, meaning that no matter how many application servers you're running behind a load balancer, all of them will be requesting staticfiles from the same source. To make it more efficient you can even configure AWS' CloudFront which is Amazons CDN (content delivery network).
3) Ideally your database should be hosted on a separate server. Databases are heavy on resources therefore hosting your database on the same server as your application may lead to slowdowns and sometimes outright crashes. Scaling horizontally, you'd be connecting a lot of application servers to a single database instance; effectively increasing the load on that server.
All of the points above are relative to your use case and resources. If the application you are running doesn't deal with heavy traffic - say a few hundred hits a day - and your server has an adequate amount of resources (RAM, CPU, storage) it's acceptable to run everything off a single box.
However if you're planning to accept tens of thousands of connections every day it's better to separate the responsibilities for optimum scalability. Not only it makes your application more efficient and responsive but it also makes your life easier in the long run when you need to scale further (database clustering, nearline content delivery, etc).
TL;DR: you can run everything off a single server if it's beefy enough but in the long run it'll make your life harder.

How to load local static files if CDN fails? (django)

I am using django to develop my site and I am trying to optimize my site for speed so I want to use CDN for my bootstrap and if it fails than i want to use the copy from my server, I have seen
How to load local files if CDN is not working
but it does it in javascript but it doesn't solve my problem, I want to know
how to check if CDN working with Django and if not serve the static files from server?

Do not try to do this in server-side. CDN services are built to be reliable in that they are geographically distributed and fault-tolerant and use the best practices available.
You can't find out if the CDN servers work for your user by pinging them from your Django application. Your user is located differently and might have very different network conditions, e.g. be using a mobile network connection from a different country, and have a network provider that experiences outages.
You could, indeed, ping the CDN servers, which would probably resolve into your Django application getting one CDN load balancer address and trying to see if that works for you or not, and falling back to others, if the CDN source is down. Then you would probably have to see, for every resource you have, that is every JavaScript and CSS file, if they are available, and load a local backup, if not. On the server side. This is very slow and error-prone. Networks can fail for a googolplex different reasons.
The proper way to go about this is to
Only use local servers for serving those static files, distribute the load with application servers that each have their own versioned copies of your static files. If your application server works, it should have the copies available as well;
Do the checks on the client-side, because server side queries will slow your server down to a halt if it is not close to your CDN network, and you generally do not wish to depend on any external resources on the server side;
Or, as I would recommend, set up your own CDN which serves your local files from a proxied URL or subdomain. Read more below.
Ideally, if you wish to use a reliable CDN source, you would set up a CDN server with redundancy on the same infrastructure you use to host your files in.
For your site is located in www.example.com, which is your Django application server address. You would set up cdn.example.com domain which would be a CDN service, for example CloudFront or similar, that proxies your requests to www.example.com/static/ and mirrors your static files as a CDN, taking the load off your application server. You can just define your Django application to use the http://cdn.example.com/static address for serving static files. There are multiple different services for providing a CDN for your application, CloudFront is just one option. This will get your static, CDNable files near to your user.
Ideally, your application servers and CDN servers are hosted on the same, redundant infrastructure, and you can claim that if one part of your infrastructure works the others will as well, or other your service provider is violating your SLA. You do NOT wish to use broken infrastructure and drive away your customers, or use hacks that will eventually break in production.

I don't know that there would be a good way of doing this, but this is the method I would use if the people who paid me really wanted me to make this work
You could setup custom URL tags in a separate pluggable app, and have it ping your CDN target, and then if it fails, serve a local URI. Admittedly, pinging the CDN target doesn't mean it will actually serve the file, so a more robust way would be to attempt to GET the file from the CDN provider, and and if successful, send the remote URI, and if it fails, send the local URI. This would double the traffic of your static files for every request.
This also requires you to setup static file serving just like you would if you planned to serve everything from that server. I wouldn't recommend any of this. I would recommend doing what #ceejayoz says and just using a reliable CDN. That's their whole purpose in life is to prevent doing any of this.

This is achievable, but the setup might be a little tedious.
Basically you are trying to do failover between CDN and your origin server. If CDN fails, the request fails over to your origin server. One option is to use DNS level failover with primary to CDN CNAME, backup to your origin server hostname.
And you also include healthcheck in your DNS setup for both CDN and origin server. Once healthcheck fails for CDN, DNS should fail over to your origin server and serve static file from there automatically.

Is it a good practice to save images in the backend database in mysql/django?

I am trying to create my personal web page. So in that I needed to put in the recommendations panel , which contains recommendations by ex employees/friends etc.
So I was planning to create a model in django with following attributes:-
author_name
author_designation
author_image
author_comments
I have following questions in my mind related to image part:-
Is it good practice to store images in the backend database?(database is for structured information from what i understand)
How to store images so that scaling the content and managing it becomes really easy?

in short: no.
use Django's built in ImageField and have your webserver serve the files from disk.
Alternatively you can use ImageField with a custom storage backend such as django-storages and put the files up on e.g. Amazon S3 and have them served from there (maybe adding something like CloudFront CDN in front)

No.Not good, especially as it scales.
https://docs.djangoproject.com/en/1.9/howto/static-files/deployment/#serving-static-files-in-production
When you think about what happens in the request/response cycle you'll know that your python scripts get interpreted by some modules. So if you're using apache for instance, mod_wsgi could be doing this work.
Usually, you don't want your static files being served by this same process because that is not very efficient, static files being static. In a typical scenario, you'll want a very fast web server, say nginx serving your static content without "thinking". This delegation gives a very efficient and scalable design. As #Anentropic said, you could choose to host static media on a CDN.

The best way to do this is to store the images in your server in some specific, general folder for this images. After that you store a string in your DB with the path to the image that you want to load. This will be a more efficient way to do this.

An "image-serving web framework" in Python?

I'm planning an iOS app that requires a server backend capable of efficiently serving image files and performing some dynamic operations based on the requests it gets (like reading and writing into a data store, such as Redis). I'm most comfortable with, and would thus prefer to write the backend in Python.
I've looked at a lot of Python web framework/server options, Flask, Bottle, static and Tornado among them. The common thread seems to be that either they support serving static files as a development-time convenience only, discouraging it in production, or are efficient static file servers but not really geared towards the dynamic framework-like side of things. This is not to say they couldn't function as the backend, but at a quick glance they all seem a bit awkward at it.
In short, I need a web framework that specializes in serving JPEGs instead of generating HTML. I'm pretty certain no such thing exists, but right now I'm hoping that someone could suggest a solution that works without bending the used Python applications in ways they are not meant for.
Specifications and practical requirements
The images I'd be serving to the clients live in the file system in a shallow directory hierarchy. The actual file names would be invisible to the clients. The server would essentially read the directory hierarchy at startup, assigning a numeric ID for each file, and would route the requests to controller methods that then actually serve the image files. Here are a few examples of ways the client would want to access the images in different circumstances:
Randomly (example URL path: /image/random)
Randomly, each file only once (/image/random_unique), produces some suitable non-200 HTTP status code when the files are exhausted
Sequentially in either direction (/image/0, /image/1, /image/2 etc.)
and so on. In addition, there would be URL endpoints for things like ratings, image info and other metadata, some client-specific information as well (the client would "register" with the server, so that needs some logic, too). This data would live in a Redis datastore, most likely.
All in all, the backend needs to be good at serving image/jpeg and application/json (which it would also generate). The scalability and concurrency requirements are modest, at least to start with (this is not an App Store app, going for ad-hoc or enterprise distribution).
I don't want the app to rely on redirects. That is, I don't want a model where a request to a URL would return a redirect to another URL that is backed by, say, nginx as a separate static file server, leaving only the image selection logic for the Python backend. Instead, a request to a URL from the client should always return image/jpeg, with metadata in custom HTTP headers where necessary. I specify this because it is a way of avoiding serving static files from Python that I thought of, and someone else might think of too ;-)
Given this information, what sort of solution would you consider a good choice, and why? Or is this something for which I need to code non-trivial extensions to existing projects?
EDIT: I've been thinking about this a bit more. I don't want redirects due to the delay inherent in the multiple requests they entail, plus I'd like to abstract out the file names from the client, but I was wondering if something like this would be possible:
It's pretty self-explanatory, but the idea is that the Python program is given the request info by nginx (or whatever serves the role), mulls it over and then tells nginx to respond to the client's request with a specific file from the file system. It does so. The client is none the wiser about how the request was fulfilled, it just receives a response with the correct content type.
This would be pretty optimal in my view, but is it possible? If not with nginx, perhaps something else?

I've been using Django for well over a year now, and it is the hammer I use for all my nails. You could probably do this with a bit of database-image storage and django's builtin orm and url routing (with regex). If you store the images in the database, you will automatically get the unique-id's set. According to this stackoverflow answer, you can use redis with django.

I don't want a model where a request to a URL would return a redirect to another URL that is backed by, say, nginx as a separate static file server, leaving only the image selection logic for the Python backend.
I think Nginx for serving static and python for figuring out the image url is the better solution.
Still if you do not want to do that I would suggest you use any Python web framework (like Django) and write your models and convert them into REST resources (Eg. Using django-tastypie) and/or return a base64 encoded image which you can then decode in your iOS client.
Refs:
Decoding a Base64 image
TastyPie returns the path as default, you might have to do extra work to either store the image blob in the table or write more code to return a base64 encoded image string

You might want to look at one of the async servers like Tornado or Twisted.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.