I am looking for a lib that lets me roughly:
connect to localhost:port, but see http://somesite.com
rewrite all static assets to point to localhost:port instead of somesite.com
support cookies / authentication
I know that http://betterinternet.co/ does this already, but they won't give me their source code for some reason.
I assume this doesn't exist as free code, so if I were to write one, are there any nuances to it? If I replace all occurrences of somesite.com in the HTML and headers, will that be enough?
So... you want an HTTP proxy that does link rewriting? Sounds like Apache and mod_proxy_html. It's not written in Node or Python, but I think it will do what you're asking.
I don't see any straightforward solution to your problem. If I've understood correctly, you want a caching HTTP proxy that serves static content locally, with URL rewriting rules defined in Python (or Node.js). That's quite a task.
A caching HTTP proxy implementation is not trivial, so I'd use an existing implementation such as Squid (or Apache, which can also do caching via mod_cache).
You could then place a (relatively) simple HTTP server written in Python in front of that (e.g. based on BaseHTTPServer and urllib2) which performs the URL rewriting you want and forwards the requests to the proxy (or directly to the internet).
The idea would be to rely on the proxy setup to perform all the processing you don't want to modify (including basic rewrite rules, authentication, caching and cache management) and limit your front-end implementation to performing only the custom rewriting you are interested in.
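For illustration, here is a minimal sketch of such a rewriting front-end, using http.server and urllib.request (the Python 3 descendants of BaseHTTPServer and urllib2); the host names and port are placeholders:

    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.request import Request, urlopen

    UPSTREAM = "somesite.com"   # the site being mirrored (placeholder)
    LOCAL = "localhost:8080"    # what the browser actually connects to

    class RewriteProxy(BaseHTTPRequestHandler):
        def do_GET(self):
            req = Request("http://%s%s" % (UPSTREAM, self.path))
            # Pass cookies through so authenticated sessions keep working.
            if "Cookie" in self.headers:
                req.add_header("Cookie", self.headers["Cookie"])
            with urlopen(req) as resp:
                body = resp.read()
                ctype = resp.headers.get("Content-Type", "")
                # Rewrite absolute URLs in textual responses only; binary
                # assets (images etc.) are passed through untouched.
                if "html" in ctype or "text" in ctype:
                    body = body.replace(UPSTREAM.encode(), LOCAL.encode())
                self.send_response(resp.status)
                for name, value in resp.headers.items():
                    # Rewrite redirect targets and cookie domains too, and
                    # drop the headers we recompute ourselves.
                    if name.lower() not in ("content-length", "transfer-encoding"):
                        self.send_header(name, value.replace(UPSTREAM, LOCAL))
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)

    HTTPServer(("localhost", 8080), RewriteProxy).serve_forever()

This is only the custom-rewriting layer; caching, error handling and everything else would still be the proxy's job, as described above.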
I have a REST (or almost REST) web api,
I want the API users to be able to use the whole API even if, for some reason, they can only make GET calls. So the plan is to accept a URL parameter (query string) like request_method that can be GET (the default), POST, PUT, or DELETE, and I want to route the requests accordingly.
My question is: other than the standard request handler overrides, and checking in each RequestHandler's get(self) method whether the request is meant to be a POST, PUT, or DELETE and calling the appropriate function, is there a way to do this "routing" in a more general way, like in the URL patterns in the application definition, or by overriding a routing function or something?
To make it clear, these requests are all coming over GET with a parameter for example like ?request_method=POST
Any suggestions are appreciated.
Possible Solutions:
Only have a ".*" URL pattern and handle all the routing in a single RequestHandler. This should work fine, except that I won't be taking advantage of Tornado's URL pattern matching features.
Add an if to the get(self) method in every request handler: check whether the request should really be handled by get and, if not, call the relevant method (a rough sketch of this follows below).
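For what it's worth, a more general form of the second option would be a base handler that overrides prepare(); Tornado picks the handler method from self.request.method after prepare() returns, so overwriting it there reroutes the request. This is an untested sketch, and note the warning in the answer below before actually doing this:

    import tornado.web

    class MethodTunnelingHandler(tornado.web.RequestHandler):
        """Base handler: rewrites the HTTP method from the
        ?request_method= parameter before Tornado dispatches."""

        def prepare(self):
            tunneled = self.get_argument("request_method", None)
            if tunneled and self.request.method == "GET":
                # Overwriting the method here changes which handler
                # method (get/post/put/delete) Tornado calls next.
                self.request.method = tunneled.upper()

    class ItemHandler(MethodTunnelingHandler):
        def get(self, item_id):
            pass  # normal GET

        def delete(self, item_id):
            pass  # reached via ?request_method=DELETE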
This would be a very foolish thing to do. Both Chrome and Firefox, along with many other web user agents, will speculatively fetch (GET) some or all of the links on a page, including your request_method=DELETE URLs. You will find your database has been emptied out just because someone was looking around. Do not deliberately break HTTP. GET is defined to be a "safe" method, meaning it's okay to GET any URL you like and nothing bad will happen.
EDIT for others in similar situations:
The OP says he is using JSONP and is in control of both the API server and the client web app. In such a case the ideal solution is Cross-Origin Resource Sharing (CORS), although this technology requires IE8+, Firefox 3.5+, Safari 4+, or Chrome 3+. If you need to target earlier browsers, and you control both domains, I would recommend merging the content of the two domains, at least for your own client web app. The API domain can remain for external clients, but they would be restricted by the CORS browser requirements.
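On the server side, enabling CORS comes down to a few response headers. A minimal Tornado sketch (the origin below is a hypothetical placeholder for your client app's domain):

    import tornado.web

    class CORSHandler(tornado.web.RequestHandler):
        def set_default_headers(self):
            # Allow the client web app's origin (placeholder) to call us.
            self.set_header("Access-Control-Allow-Origin",
                            "http://app.example.com")
            self.set_header("Access-Control-Allow-Methods",
                            "GET, POST, PUT, DELETE")

        def options(self, *args, **kwargs):
            # Preflight requests need nothing beyond the headers above.
            self.set_status(204)
            self.finish()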
I have implemented a very small application with Tornado, where HTTP GET requests are used to perform actions. Now I would like to secure these requests. What would be a preferable way? Using .htaccess? How can I accomplish that?
It doesn't have to be for certain requests, it should be for all requests running on a certain port.
If you based your application on the Tornado "Hello World" example then you probably haven't done this, but you really should consider writing your application as a WSGI application. Tornado has no problem with that, and the advantage is that your application will then run under a multitude of other environments (Apache + mod_wsgi, to name but one).
But how does that solve your original problem? Well, just Google "WSGI authentication middleware"; it'll yield plenty of hits. Basically, what that entails is transparently 'wrapping' your WSGI application in another one, allowing you to completely decouple that aspect of your application. If you're lucky, and one of the hits turns out to be a perfect fit, you might get away with not writing any extra code at all.
Since you mentioned .htaccess: it is possible to have Apache do the authentication in an Apache/mod_wsgi configuration.
.htaccess files are not supported by Tornado as far as I know. Look into setting up basic authentication in Tornado; something like this gist is probably what you want: https://gist.github.com/660185. You'll need to store your own user credentials, however, as Tornado has no baked-in support for that the way Apache does with .htaccess files.
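In case the gist moves, the shape of such a decorator is roughly this (a sketch only; the credential check is a placeholder you'd replace with your own user store):

    import base64
    import functools
    import tornado.web

    def check_credentials(user, password):
        # Placeholder: look the user up in your own credential store.
        return user == "admin" and password == "secret"

    def basic_auth(method):
        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            header = self.request.headers.get("Authorization", "")
            if header.startswith("Basic "):
                user, _, password = base64.b64decode(
                    header[6:]).decode().partition(":")
                if check_credentials(user, password):
                    return method(self, *args, **kwargs)
            # No or bad credentials: challenge the client.
            self.set_status(401)
            self.set_header("WWW-Authenticate", 'Basic realm="Restricted"')
            self.finish()
        return wrapper

    class MainHandler(tornado.web.RequestHandler):
        @basic_auth
        def get(self):
            self.write("hello")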
I'm planning an iOS app that requires a server backend capable of efficiently serving image files and performing some dynamic operations based on the requests it gets (like reading and writing into a data store, such as Redis). I'm most comfortable with, and would thus prefer to write the backend in Python.
I've looked at a lot of Python web framework/server options, Flask, Bottle, static, and Tornado among them. The common thread seems to be that either they support serving static files as a development-time convenience only, discouraging it in production, or are efficient static file servers but not really geared towards the dynamic framework-like side of things. This is not to say they couldn't function as the backend, but at a quick glance they all seem a bit awkward at it.
In short, I need a web framework that specializes in serving JPEGs instead of generating HTML. I'm pretty certain no such thing exists, but right now I'm hoping that someone could suggest a solution that works without bending the used Python applications in ways they are not meant for.
Specifications and practical requirements
The images I'd be serving to the clients live in the file system in a shallow directory hierarchy. The actual file names would be invisible to the clients. The server would essentially read the directory hierarchy at startup, assigning a numeric ID for each file, and would route the requests to controller methods that then actually serve the image files. Here are a few examples of ways the client would want to access the images in different circumstances:
Randomly (example URL path: /image/random)
Randomly, each file only once (/image/random_unique), produces some suitable non-200 HTTP status code when the files are exhausted
Sequentially in either direction (/image/0, /image/1, /image/2 etc.)
and so on. In addition, there would be URL endpoints for things like ratings, image info and other metadata, some client-specific information as well (the client would "register" with the server, so that needs some logic, too). This data would live in a Redis datastore, most likely.
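To make the URL scheme concrete, here's roughly what I have in mind (a sketch only; Flask is used purely for illustration, the directory scanning is simplified, and the Redis-backed metadata is omitted):

    import os
    import random
    from flask import Flask, send_file, abort

    app = Flask(__name__)

    # Hypothetical image root, scanned once at startup;
    # a file's numeric ID is its index in this list.
    IMAGE_DIR = "/srv/images"
    IMAGES = sorted(
        os.path.join(root, name)
        for root, _, names in os.walk(IMAGE_DIR)
        for name in names
        if name.lower().endswith((".jpg", ".jpeg"))
    )

    @app.route("/image/random")
    def random_image():
        return send_file(random.choice(IMAGES), mimetype="image/jpeg")

    @app.route("/image/<int:image_id>")
    def image_by_id(image_id):
        if not 0 <= image_id < len(IMAGES):
            abort(404)
        return send_file(IMAGES[image_id], mimetype="image/jpeg")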
All in all, the backend needs to be good at serving image/jpeg and application/json (which it would also generate). The scalability and concurrency requirements are modest, at least to start with (this is not an App Store app, going for ad-hoc or enterprise distribution).
I don't want the app to rely on redirects. That is, I don't want a model where a request to a URL would return a redirect to another URL that is backed by, say, nginx as a separate static file server, leaving only the image selection logic for the Python backend. Instead, a request to a URL from the client should always return image/jpeg, with metadata in custom HTTP headers where necessary. I specify this because it is a way of avoiding serving static files from Python that I thought of, and someone else might think of too ;-)
Given this information, what sort of solution would you consider a good choice, and why? Or is this something for which I need to code non-trivial extensions to existing projects?
EDIT: I've been thinking about this a bit more. I don't want redirects because of the delay inherent in the multiple requests they entail, plus I'd like to abstract the file names away from the client, but I was wondering if something like the following would be possible:
The idea is that the Python program is given the request info by nginx (or whatever serves that role), mulls it over, and then tells nginx to respond to the client's request with a specific file from the file system, which nginx then serves. The client is none the wiser about how the request was fulfilled; it just receives a response with the correct content type.
This would be pretty optimal in my view, but is it possible? If not with nginx, perhaps something else?
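In code, I imagine it would look something like this (a sketch; I gather nginx's X-Accel-Redirect header does more or less this, and /protected/ plus the helper method are made up):

    import tornado.web

    class ImageHandler(tornado.web.RequestHandler):
        def get(self, image_id):
            # Selection logic lives in Python...
            filename = self.pick_file(int(image_id))  # hypothetical helper
            self.set_header("Content-Type", "image/jpeg")
            self.set_header("X-Image-Id", str(image_id))  # custom metadata
            # ...but nginx does the actual sending. "/protected/" would be
            # an internal-only location in the nginx config, e.g.:
            #     location /protected/ { internal; alias /srv/images/; }
            self.set_header("X-Accel-Redirect", "/protected/" + filename)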
I've been using Django for well over a year now, and it is the hammer I use for all my nails. You could probably do this with a bit of database image storage and Django's built-in ORM and URL routing (with regexes). If you store the images in the database, you will automatically get unique IDs for them. According to this Stack Overflow answer, you can use Redis with Django.
I don't want a model where a request to a URL would return a redirect to another URL that is backed by, say, nginx as a separate static file server, leaving only the image selection logic for the Python backend.
I think nginx for serving the static files and Python for figuring out the image URL is the better solution.
Still, if you do not want to do that, I would suggest you use any Python web framework (like Django), write your models and convert them into REST resources (e.g. using django-tastypie), and/or return a base64-encoded image which you can then decode in your iOS client.
Refs:
Decoding a Base64 image
Tastypie returns the path by default; you might have to do extra work to either store the image blob in the table or write more code to return a base64-encoded image string.
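A minimal sketch of the base64 route, for what it's worth:

    import base64
    import json

    def image_as_json(path):
        # Read the JPEG and embed it, base64-encoded, in a JSON payload
        # that the iOS client can decode (see the refs above).
        with open(path, "rb") as f:
            data = base64.b64encode(f.read()).decode("ascii")
        return json.dumps({"content_type": "image/jpeg", "image": data})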
You might want to look at one of the async servers like Tornado or Twisted.
I've been doing some research regarding file downloads with access control, using Django. My goal is to completely block access to a file, except when accessed by a specific user. I've read that when using Django, X-Sendfile is one of the methods of choice for achieving this (based on other SO questions, etc). My rudimentary understanding of using X-Sendfile with Django is:
User requests URI to get a protected file
Django app decides which file to return based on URL, and checks user permission, etc.
Django app returns an HTTP Response with the 'X-Sendfile' header set to the server's file path
The web server finds the file and returns it to the requester (I assume the web server also strips out the 'X-Sendfile' header along the way)
Compared with serving the file directly from Django, X-Sendfile seems likely to be a more efficient way of achieving protected downloads (since I can rely on nginx to serve files instead of Django), but it leaves a few questions for me:
Is my explanation of X-Sendfile at least abstractly correct?
Is it really secure, assuming I don't provide normal front-end HTTP access (e.g. http://www.example.com/downloads/secret-file.jpg) to the directory the file is stored in (i.e., I don't keep it in my public_html directory)? Or could a tech-savvy user examine headers, etc., and reverse-engineer a way to access a file (and then distribute it)?
Is there really a big difference in performance? Am I going to bog my application server down by serving 8b chunked downloads of 150 MB files directly from Django, or is this sort of a non-issue? The reason I ask is that if both versions are nearly equal, the Django version would be preferable, due to my ability to do things in Python, like log the number of completed downloads, tally the bandwidth of downloads, etc.
Thanks in advance.
Yes, that's just how it works.
The exact implementation depends on the webserver but in the case of nginx, it's recommended to mark the location as internal to prevent external access.
Nginx can asynchronously serve files while with Django you need one thread per request which can get problematic for higher numbers of parallel requests.
Remember to send an X-Accel-Redirect header for nginx instead of X-Sendfile.
See http://wiki.nginx.org/XSendfile for more information.
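A rough sketch of steps 2-3 from the question, for nginx (the /protected/ prefix, paths, and view are placeholders; the matching nginx location is shown in the comment):

    from django.http import HttpResponse, HttpResponseForbidden

    # Matching nginx config; the "internal" flag blocks direct access:
    #     location /protected/ {
    #         internal;
    #         alias /srv/protected/;
    #     }

    def download(request, filename):
        # Permission check goes here. (is_authenticated is a method in
        # older Django versions, a property from 1.10 onwards.)
        if not request.user.is_authenticated():
            return HttpResponseForbidden()
        # Validate filename against path traversal in real code.
        response = HttpResponse(content_type="application/octet-stream")
        # nginx's equivalent of X-Sendfile: nginx swallows this header
        # and serves the file from the internal location itself.
        response["X-Accel-Redirect"] = "/protected/" + filename
        return response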
I am currently working on a project to create a simple file uploader site that will update the user on the progress of an upload.
I've been attempting this in pure Python (with CGI) on the server side, but to get the progress of the file I obviously need to send requests to the server continually. I was looking to use AJAX for this, but I was wondering how hard it would be, instead of changing to some other framework (web.py for instance), to just write my own web server for receiving the XMLHttpRequests.
My main problem is that the request is sent from HTML and JavaScript, so it all seems like magic trickery at the moment.
Can anyone advise me as to the best way to go about receiving these requests on the server?
EDIT: It seems that a framework would be the way to go. Would web.py be a good route to take?
I would recommend using a microframework like Sinatra for Ruby. There seem to be some equivalents for Python; see "What Python equivalent of Sinatra would you recommend?".
Such a framework allows you to simply map a single method to a route.
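For instance, in Flask (one of the Sinatra-like Python options; Bottle looks nearly identical) a polling endpoint for the upload progress could be as small as this (a sketch; the progress store is a placeholder):

    from flask import Flask, jsonify

    app = Flask(__name__)

    # Placeholder store mapping upload IDs to percent complete; in a
    # real app this would be fed by whatever handles the upload itself.
    PROGRESS = {}

    @app.route("/progress/<upload_id>")
    def progress(upload_id):
        # The page's XMLHttpRequest polls this and gets JSON back.
        return jsonify(percent=PROGRESS.get(upload_id, 0))

    if __name__ == "__main__":
        app.run()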
Writing a very basic HTTP server won't be very hard (see http://docs.python.org/library/simplehttpserver.html for an example), but you will be missing many features that are provided by real servers and web frameworks.
For your project, I suggest you pick one of the many Python web frameworks and run your application behind Apache/mod_wsgi.
There's absolutely no need to write your own web server. Plenty of options exist, including lightweight ones like nginx.
You should use one of those, and either your own custom WSGI code to receive the request, or (better) one of the microframeworks like Flask or Bottle.