Django - Understanding X-Sendfile

Django - Understanding X-Sendfile - python

I've been doing some research regarding file downloads with access control, using Django. My goal is to completely block access to a file, except when accessed by a specific user. I've read that when using Django, X-Sendfile is one of the methods of choice for achieving this (based on other SO questions, etc). My rudimentary understanding of using X-Sendfile with Django is:
User requests URI to get a protected file
Django app decides which file to return based on URL, and checks user permission, etc.
Django app returns an HTTP Response with the 'X-Sendfile' header set to the server's file path
The web server finds the file and returns it to the requester (I assume the webs server also strips out the 'X-Sendfile' header along the way)
Compared with chucking the file directly from Django, X-Sendfile seems likely to be a more efficient method of achieving protected downloads (since I can rely on Nginx to serve files, vs Django), but leaves 2 questions for me:
Is my explanation of X-Sendfile at least abstractly correct?
Is it really secure, assuming I don't provide normal, front-end HTTP access (e.g. http://www.example.com/downloads/secret-file.jpg) to the directory that the file is stored (ie, don't keep it in my public_html directory)? Or, could a tech-savvy user examine headers, etc. and reverse engineer a way to access a file (to then distribute)?
Is it really a big difference in performance. Am I going to bog my application server down by providing 8b chunked downloads of 150Mb files directly from Django, or is this sort-of a non-issue? The reason I ask is because if both versions are near equal, the Django version would be preferable due to my ability to do things in Python, like log the number of completed downloads, tally bandwidth of downloads etc.
Thanks in advance.

Yes, that's just how it works.
The exact implementation depends on the webserver but in the case of nginx, it's recommended to mark the location as internal to prevent external access.
Nginx can asynchronously serve files while with Django you need one thread per request which can get problematic for higher numbers of parallel requests.
Remember to send a X-Accel-Redirect header for nginx instead of X-Sendfile.
See http://wiki.nginx.org/XSendfile for more information.

Related

Flask App Dispatching: Multiple uWSGI instances or dispatch to single instance.

I'm working on a Flask application, in which we will have multiple clients (10-20) each with their own configuration (for the DB, client specific settings etc.) Each client will have a subdomain, like www.client1.myapp.com, www.cleint2.myapp.com. I'm using uWSGI as the middleware and nginx as the proxy server.
There are a couple of ways to deploy this I suppose, one would be to use application dispatching, and a single instance of uwsgi. Another way would be to just run a separate uwsgi instance for each client, and forward the traffic with nginx based on subdomain to the right app. Does anyone know of the pros and cons of each scenario? Just curious, how do applications like Jira handle this?

I would recommend having multiple instances, forwarded to by nginx. I'm doing something similar with a PHP application, and it works very well.
The reason, and benefit of doing it this way is that you can keep everything completely separate, and if one client's setup goes screwy, you can re-instance it and there's no problem for anyone else. Also, no user, even if they manage to break the application level security can access any other user's data.
I keep all clients on their own databases (one mysql instance, multiple dbs), so I can do a complete sqldump (if using mysql, etc) or for another application which uses sqlite rather than mysql: copy the .sqlite database completely for backup.
Going this way means you can also easily set up a 'test' version of a client's site, as well as as live one. Then you can swap which one is actually live just by changing your nginx settings. Say for doing upgrades, you can upgrade the testing one, check it's all OK, then swap. (Also, for some applications, the client may like having their own 'testing' version, which they can break to their hearts content, and know they (or you) can reinstance it in moments, without harming their 'real' data).
Going with application dispatching, you cannot easily get nginx to serve separate client upload directories, without having a separate nginx config per client (and if you're doing that, then why not go for individual uWSGI instances anyway). Likewise for individual SSL certificates (if you want that...).
Each subdomain (or separate domain entirely for some) has it's own logging, so if a certain client is being DOS'd, or hacked otherwise, it's easy to see which one.
You can set up file-system level size quotas per user, so that if one client starts uploading gBs of video, your server doesn't get filled up as well.
The way I'm working is using ansible to provision and set up the server how I want it, with the client specific details kept in a separate host_var file. So my inventory is:
[servers]
myapp.com #or whatever...
[application_clients]
apples
pears
elephants
[application_clients:vars]
ansible_address=myapp.com
then host_vars/apples:
url=apples.myapp.com
db_user=apples
db_pass=secret
then in the provisioning, I set up a two new users & one group for each client. For instance: apples, web.apples as the two users, and the group simply as apples (which both are in).
This way, all the application code is owned by apples user, but the PHP-FPM instance (or uWSGI instance in your case) is run by web.apples. The permissions of all the code is rwXr-X---, and the permissions of uploads & static directories is rwXrwXr-X. Nginx runs as it's own user, so it can access ONLY the upload/static directories, which it can serve as straight files. Any private files which you want to be served by the uWSGI app can be set that way easily. The web user can read the code, and execute it, but cannot edit it. The actual user itself can read and write to the code, but isn't normally used, except for updates, installing plugins, etc.
I can give out a SFTP user to a client which is chroot'd to their uploads directory if they want to upload outside of the application interface.
Using ansible, or another provisioning system, means there's very little work needed to create a new client setup, and if a client (for whatever reason) wants to move over to their own server, it's just a couple of lines to change in the provisioning details, and re-run the scripts. It also means I can keep a development server installed with the exact same provisioning as the main server, and also I can keep a backup amazon instance on standby which is ready to take over if ever I need it to.
I realise this doesn't exactly answer your question about pros and cons each way, but it may be helpful anyway. Multiple instances of uWSGI or any other WSGI server (mainly I use waitress, but there are plenty of good ones) are very simple to set up and if done logically, with a good provisioning system, easy to administrate.

nodejs or python proxy lib with relative url support

I am looking for a lib that lets me roughly:
connect to localhost:port, but see http://somesite.com
rewrite all static assets to point to localhost:port instead of somesite.com
support cookies / authentication
i know that http://betterinternet.co/ does this already, but they wont give me their source code for some reason.
I assume this doesnt exist as free code, so if i were to write one, are there any nuances to it? If i replace all occurrences of somesite.com in html and headers, will that be enough?

So...you want an http proxy that does link rewriting? Sounds like Apache and mod_proxy_html. It's not written in node or Python, but I think it will do what you're asking.

I don't see any straight forward solution to your problem. If I've understood correctly, you want a caching HTTP proxy which serves static contents locally, with URL rewriting rules defined in Python (or nodejs). That's quite a task.
A caching HTTP proxy implementation is not trivial. So I'd use an existing implementation, such as Squid (or Apache if it does caching too).
You could then place a (relatively) simple HTTP server written in Python in front of that (e.g. based on BaseHTTPServer and urllib2) which performs the URL rewriting as you want them and forwards the requests to the proxy (or direct to internet).
The idea would be to rely on the proxy setup to perform all the processing you don't want to modify (including basic rewrite rules, authentication, caching and cache management) and limit your front-end implementation to performing only the custom rewriting you are interested in.

An "image-serving web framework" in Python?

I'm planning an iOS app that requires a server backend capable of efficiently serving image files and performing some dynamic operations based on the requests it gets (like reading and writing into a data store, such as Redis). I'm most comfortable with, and would thus prefer to write the backend in Python.
I've looked at a lot of Python web framework/server options, Flask, Bottle, static and Tornado among them. The common thread seems to be that either they support serving static files as a development-time convenience only, discouraging it in production, or are efficient static file servers but not really geared towards the dynamic framework-like side of things. This is not to say they couldn't function as the backend, but at a quick glance they all seem a bit awkward at it.
In short, I need a web framework that specializes in serving JPEGs instead of generating HTML. I'm pretty certain no such thing exists, but right now I'm hoping that someone could suggest a solution that works without bending the used Python applications in ways they are not meant for.
Specifications and practical requirements
The images I'd be serving to the clients live in the file system in a shallow directory hierarchy. The actual file names would be invisible to the clients. The server would essentially read the directory hierarchy at startup, assigning a numeric ID for each file, and would route the requests to controller methods that then actually serve the image files. Here are a few examples of ways the client would want to access the images in different circumstances:
Randomly (example URL path: /image/random)
Randomly, each file only once (/image/random_unique), produces some suitable non-200 HTTP status code when the files are exhausted
Sequentially in either direction (/image/0, /image/1, /image/2 etc.)
and so on. In addition, there would be URL endpoints for things like ratings, image info and other metadata, some client-specific information as well (the client would "register" with the server, so that needs some logic, too). This data would live in a Redis datastore, most likely.
All in all, the backend needs to be good at serving image/jpeg and application/json (which it would also generate). The scalability and concurrency requirements are modest, at least to start with (this is not an App Store app, going for ad-hoc or enterprise distribution).
I don't want the app to rely on redirects. That is, I don't want a model where a request to a URL would return a redirect to another URL that is backed by, say, nginx as a separate static file server, leaving only the image selection logic for the Python backend. Instead, a request to a URL from the client should always return image/jpeg, with metadata in custom HTTP headers where necessary. I specify this because it is a way of avoiding serving static files from Python that I thought of, and someone else might think of too ;-)
Given this information, what sort of solution would you consider a good choice, and why? Or is this something for which I need to code non-trivial extensions to existing projects?
EDIT: I've been thinking about this a bit more. I don't want redirects due to the delay inherent in the multiple requests they entail, plus I'd like to abstract out the file names from the client, but I was wondering if something like this would be possible:
It's pretty self-explanatory, but the idea is that the Python program is given the request info by nginx (or whatever serves the role), mulls it over and then tells nginx to respond to the client's request with a specific file from the file system. It does so. The client is none the wiser about how the request was fulfilled, it just receives a response with the correct content type.
This would be pretty optimal in my view, but is it possible? If not with nginx, perhaps something else?

I've been using Django for well over a year now, and it is the hammer I use for all my nails. You could probably do this with a bit of database-image storage and django's builtin orm and url routing (with regex). If you store the images in the database, you will automatically get the unique-id's set. According to this stackoverflow answer, you can use redis with django.

I don't want a model where a request to a URL would return a redirect to another URL that is backed by, say, nginx as a separate static file server, leaving only the image selection logic for the Python backend.
I think Nginx for serving static and python for figuring out the image url is the better solution.
Still if you do not want to do that I would suggest you use any Python web framework (like Django) and write your models and convert them into REST resources (Eg. Using django-tastypie) and/or return a base64 encoded image which you can then decode in your iOS client.
Refs:
Decoding a Base64 image
TastyPie returns the path as default, you might have to do extra work to either store the image blob in the table or write more code to return a base64 encoded image string

You might want to look at one of the async servers like Tornado or Twisted.

how can i redirect a user to a different server with python?

I'm trying to write a script which will monitor packets (using pypcap) and redirect certain URLs/IPs to something I choose. I know I could just edit the hosts file, but that won't work because I'm not an admin.
I'm thinking that CGI might be useful, but this one has really got me confused.
EDIT:
sorry if it sounded malicious or like a MITM attack. The reason I need this is because I have an (old) application which grabs a page from a site, but the domain has changed recently causing it to not function anymore. I didn't write the application, so I can't just change the domain it accesses.
I basically need to accomplish what can be done by editing the hosts file without having access to it.

pypcap needs administrative rights, so this is not an option.
And you don't have access to the pcs internals, to the source code or to the webserver.
There are a few options left:
Modify the host name in the applications files with a hexeditor and disassembler.
Modify the loaded application in memory with Cheat Engine and other memory tools.
Start the application in a virtual environment which can modify os api calls. A modified wine might be able to do this.
Modify the request between the pc and the webserver with a (transparent)proxy / modified router.
If the application supports the usage of proxies, it might be the easiest solution to set up a local squid with a redirector.

How to do cloud computing with Python and Java? Final Year project

For my final year project I plan to code a cloud in Python. The client will be written in Java by the other member of my team. The client will have a tabbed interface and it will provide a text editor, a media player, a couple of small Java based games and a maybe a few more services.
The server will work like this:
1) Validate the user.
2) Send a file, called "dump" to the user. Dump will contain all the file names and file types that the user created by himself or the files which the user can read/write. This info will be fetched from the database.
3) The tabs in the client will display the file types associated with the tab application. e.g the media tab will only select and show the media files from the dump readable by user. The text editor tab will show only the txt files from the dump readable by the user.
4) A request to open the file will send the file back to client, which the associated application will open.
5) All the changes made to the files and all the actions (overwriting, saving, deleting etc.) will be sent back to the server along with the new object. Something similar will be done to the newly created objects.
My Questions are:
What are the best approaches for the communication between the client and the server. For the dump I plan to use some sort of encrypted XML file. For the other way round, I don't have a clue :/.
For easy integration with the database, I was planning to use Django (which I started few days back). How can I send my requests from the client to the server (without Django I'd use SQL queries) and the files from the server to the client? Maybe GET and POST will work for the former problem? Any other suggestions?

Q1: how should I transfer data between client/server securely
A: HTTPS to support encryption & JSON to serialise objects between languages (Python/Java) seems to be the most natural. You could experiment with XML-RPC over SSL or TSL if you want to be creative.
Q2: How do I send queries to the server's db?
A: My first response is to say talk to the person coding the server, and see what's easiest on that end. However, I think that your client should stick to HTTP. The server developer would ensure the server supports RESTful URIs. Then your client only access a URI and have the results processed by the server.
At its most raw, this could be implemented like this:
https://www.example.com/db?q="SELECT * FROM docs"
There are smarter ways to do it, but you get the idea.

If you're going to use a web framework on the server, it makes sense to use an HTTP-based protocol. The downside is that only the client can initiate a connection (e.g., the client needs to first ask for the "dump" file), but a simple GET request will suffice (remember, the server can send anything in the HTTP response, including your XML file).
Regarding encryption, it's best to use an existing protocol like HTTPS. There are well-vetted libraries that will correctly establish a secure connection between your client and the server.
Overall, I'm advocating the highest-level protocols that are appropriate for your application. HTTP(S) goes hand-in-hand with your web-based architecture, so make use of it.

Stick to Django. It's really productive. I would use JSON instead of XML. More convenient. import json. This should help you in communicating between client-server.
Also cloud computing is just a recent word that's just thrown around for (client+server+some services). Oh by the way all that you want to do can be completely done in Django itself. No need to go to JAVA.
Django is Cool :)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.