I have a Java server sending out requests where the path contains the hostname.
On the other side I've got an ancient Django server (1.8) that wrongly treats the hostname as part of the path, and so always returns a 404.
Assuming that sending properly formed requests from the Java side is not possible, how would I go about modifying the request URL into something that resolves correctly, without actually redirecting? (A 30x response is also unacceptable in this case.)
I ended up doing a rewrite in nginx (using proxy_pass).
I couldn't solve this at the application level in any feasible way.
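For reference, the rewrite looked roughly like this. A sketch only: the upstream address and the duplicated-host pattern here are assumptions, not the exact config:

    # Strip a leading duplicated hostname from the path before
    # handing the request to the Django app (names/ports assumed).
    location ~ ^/example\.com(/.*)$ {
        proxy_pass http://127.0.0.1:8000$1;
        proxy_set_header Host $host;
    }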
Let's say I have a domain. Under the home directory of the domain I have a text (.txt) file called note.txt, like below:
https://www.example.com/note.txt
When I access the URL, the browser displays the text string contained in the file. But when I run a Flask app under that domain instead of a traditional HTML/CSS/JavaScript/PHP app, the server returns a 404 error even though the file does in fact exist at the same location; I can see it from the FTP client.
So why does the server return a 404 error when the site hosts a Python app instead of a more traditional HTML/CSS/JavaScript/PHP app?
What you are missing here is that Flask has its own URL routing.
Python code must be exposed through the app.route decorator.
Static files, like your note.txt, can be served through Flask, but they are often handled by the front-end web server (Nginx, Apache) through its configuration.
The answer to "why does the server return a 404 error" is that URL routing should be explicit (nothing happens unless you tell it to happen) instead of implicit (everything on the server is exposed by default). Because PHP chose the latter approach, WordPress, Drupal, et al. traditional PHP sites get hacked very easily when they are handed to people who don't have the full picture of what they are doing. It might be convenient in the beginning, but it is also an open invitation for script kiddies to raid your server.
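A minimal sketch of what explicit routing means in Flask; the file and route names here are just for illustration:

    # Nothing is exposed unless a route (or the static folder) says so.
    from flask import Flask, send_from_directory

    app = Flask(__name__)

    @app.route("/")
    def index():
        return "Hello"

    # Explicitly expose a single text file; without a rule like this
    # (or front-end web server configuration), /note.txt is a 404.
    @app.route("/note.txt")
    def note():
        return send_from_directory(".", "note.txt")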
I have a REST (or almost-REST) web API.
I want API users to be able to use the whole API even if, for some reason, they can only make GET calls. So the plan is to accept a URL parameter (query string) like request_method that can be GET (the default) or POST, PUT, DELETE, and route accordingly.
My question is: other than the standard request handler overrides, and checking in each httpRequestHandler's get(self) method whether the request is meant to be a POST, PUT, or DELETE and calling the appropriate function, is there a way to do this "routing" in a more general way, e.g. in the URL patterns in the application definition, or by overriding a routing function, or something similar?
To make it clear, these requests all come in over GET with a parameter, for example ?request_method=POST
Any suggestions are appreciated.
Possible Solutions:
only have a ".*" url pattern and handle all the routing in a single RequestHandler. Should work fine, except that I won't be taking advantage of the url pattern matching features of Tornado.
add an if to all the get(self) methods in all the request handlers and check if the request should be handled by get if not, call the relevant method.
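For the second option, the check can at least be centralized in one base class instead of repeated in every handler. A sketch only, and note the warning in the answer below before shipping anything like it:

    import tornado.web

    class MethodOverrideHandler(tornado.web.RequestHandler):
        # WARNING: dispatching unsafe methods from GET is dangerous;
        # see the answer below about crawlers and link prefetching.
        def get(self, *args, **kwargs):
            method = self.get_argument("request_method", "GET").upper()
            if method == "GET":
                return self.do_get(*args, **kwargs)
            handler = getattr(self, method.lower(), None)
            if handler is None:
                raise tornado.web.HTTPError(405)
            return handler(*args, **kwargs)

        # Subclasses implement do_get() instead of get();
        # post/put/delete keep their usual names.
        def do_get(self, *args, **kwargs):
            raise tornado.web.HTTPError(405)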
This would be a very foolish thing to do. Both Chrome and Firefox, along with many other web user agents, will speculatively fetch (GET) some or all of the links on a page, including your request_method=DELETE URLs. You will find your database has been emptied out just because someone was looking around. Do not deliberately break HTTP. GET is defined to be a "safe" method, meaning it's okay to GET any URL you like and nothing bad will happen.
EDIT for others in similar situations:
The OP says he is using JSONP and is in control of both the API server and the client web app. In such a case the ideal solution is Cross-Origin Resource Sharing (CORS, spec), although this technology requires IE8+, Firefox 3.5+, Safari 4+ or Chrome 3+. If you need to target earlier browsers, and you control both domains, I would recommend merging the content of the two domains at least for your own client web app. The api domain can remain for external clients, but they would be restricted by the CORS browser requirements.
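In Tornado terms, enabling CORS just means adding the response headers yourself. A sketch, with the allowed origin as a placeholder for your client web app's domain:

    import tornado.web

    class CORSHandler(tornado.web.RequestHandler):
        def set_default_headers(self):
            # Allow the client web app's origin to call this API directly.
            self.set_header("Access-Control-Allow-Origin", "https://app.example.com")
            self.set_header("Access-Control-Allow-Methods", "GET, POST, PUT, DELETE")
            self.set_header("Access-Control-Allow-Headers", "Content-Type")

        def options(self, *args, **kwargs):
            # Preflight requests only need the headers above and a 2xx.
            self.set_status(204)
            self.finish()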
I am looking for a lib that lets me roughly:
connect to localhost:port, but see http://somesite.com
rewrite all static assets to point to localhost:port instead of somesite.com
support cookies / authentication
I know that http://betterinternet.co/ does this already, but they won't give me their source code for some reason.
I assume this doesn't exist as free code, so if I were to write one, are there any nuances to it? If I replace all occurrences of somesite.com in the HTML and the headers, will that be enough?
So... you want an HTTP proxy that does link rewriting? Sounds like Apache and mod_proxy_html. It's not written in Node or Python, but I think it will do what you're asking.
I don't see any straightforward solution to your problem. If I've understood correctly, you want a caching HTTP proxy that serves static content locally, with URL rewriting rules defined in Python (or Node.js). That's quite a task.
A caching HTTP proxy implementation is not trivial. So I'd use an existing implementation, such as Squid (or Apache if it does caching too).
You could then place a (relatively) simple HTTP server written in Python in front of that (e.g. based on BaseHTTPServer and urllib2) which performs the URL rewriting as you want and forwards the requests to the proxy (or directly to the internet).
The idea would be to rely on the proxy setup to perform all the processing you don't want to modify (including basic rewrite rules, authentication, caching and cache management) and limit your front-end implementation to performing only the custom rewriting you are interested in.
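A bare-bones version of that front end, using the modules mentioned above (Python 2 era). somesite.com and the local port are placeholders, and this deliberately ignores cookies, POST bodies, and compressed responses:

    import BaseHTTPServer
    import urllib2

    UPSTREAM = "http://somesite.com"
    LOCAL = "http://localhost:8080"

    class RewritingProxy(BaseHTTPServer.BaseHTTPRequestHandler):
        def do_GET(self):
            # Fetch the real page, then point its links back at us.
            resp = urllib2.urlopen(UPSTREAM + self.path)
            body = resp.read().replace(UPSTREAM, LOCAL)
            self.send_response(resp.getcode())
            self.send_header("Content-Type", resp.info().gettype())
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        BaseHTTPServer.HTTPServer(("", 8080), RewritingProxy).serve_forever()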
I am doing a POST request to my Tastypie API, which creates a resource.
It normally returns the resource URI in the Location header of the response.
The problem I'm having is that the Location header contains a non-SSL URL, even though my initial request (and the whole of my application) is under HTTPS.
From my request headers:
URL: https://example.com/api/v1/resource/
From my response headers:
Location: http://example.com/api/v1/resource/80/
Because this is a reusable application that is not always running under SSL, I do not want to hardcode an ugly string replace. Also, there is already a 301 redirect in place from http to https, but I do not want that redirect to happen.
All help appreciated!
Update:
This actually didn't have anything to do with Tastypie; it was caused by the server/proxy configuration. See the answer below for resolution details.
The reason is simple: it seems request.is_secure() returns False in your case, so the URL is constructed using http instead of https.
There are a couple of solutions, but you should first find out what causes request.is_secure() to return False. I bet you are running behind a proxy or load balancer. If you did not change the logic behind URL generation, then this is probably the cause of your issue.
To fix that, you can take a look at SECURE_PROXY_SSL_HEADER setting in Django, which defines headers that indicate the SSL connection established with the proxy or load balancer:
If your Django app is behind a proxy, though, the proxy may be "swallowing" the fact that a request is HTTPS, using a non-HTTPS connection between the proxy and Django. In this case, is_secure() would always return False -- even for requests that were made via HTTPS by the end user.
In this situation, you'll want to configure your proxy to set a custom HTTP header that tells Django whether the request came in via HTTPS, and you'll want to set SECURE_PROXY_SSL_HEADER so that Django knows what header to look for.
But if you are designing a reusable app, first make sure this really is the cause and not something different. If it is, then leave the setting to the user: the header that indicates a secure request should be set explicitly, and only by the programmer who deploys your app; otherwise it could open a security hole.
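For reference, the documented pattern looks like this, assuming the proxy is configured to set X-Forwarded-Proto (only add it when that header really is controlled by the proxy):

    # settings.py of the deploying project, not of the reusable app:
    # trust the proxy's X-Forwarded-Proto header to mark HTTPS requests.
    SECURE_PROXY_SSL_HEADER = ('HTTP_X_FORWARDED_PROTO', 'https')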
I am using google app engine's urlfetch feature to remotely log into another web service. Everything works fine on development, but when I move to production the login procedure fails. Do you have any suggestions on how to debug production URL fetch?
I am using cookies and other headers in my URL fetch (I manually set up the cookies within the header). One of the cookies is a session cookie.
There is no error or exception. On production, posting the login to the URL returns the session cookies, but when you then request a page using those session cookies, they are ignored and you are prompted for login information again. On development, once you get the session cookies you can access the internal pages just fine. I thought the problem was related to saving the cookies, but they look correct, as the requests are nearly identical.
This is how I call it:
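    # follow_redirects=False keeps the login response (and its Set-Cookie
    # headers) visible here instead of being consumed by a redirect: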
    fetchresp = urlfetch.fetch(url=req.get_full_url(),
                               payload=req.get_data(),
                               method=method,
                               headers=all_headers,
                               allow_truncated=False,
                               follow_redirects=False,
                               deadline=10)
Here are some guesses as to the problem:
The distributed nature of Google's urlfetch implementation is messing things up.
On production, headers are sent in a different order than in development, perhaps confusing the server.
Some of Google's servers are blacklisted by the target server.
Here are some hypotheses I've ruled out:
Google's caching is too aggressive. But I still get the problem after turning off caching with the header Cache-Control: no-store.
Google's urlfetch is too fast for the target server. But I still get the problem after inserting delays between calls.
Google appends some data to the User-Agent header. But I have added that header in development and I don't get the problem.
What other differences are there between the production URL fetch and the development URL fetch? Do you have any ideas for debugging this?
UPDATE 2
(First update was incorporated above)
I don't know if it was something I did (maybe the delays or the cache disabling mentioned above), but the production environment now works about 50% of the time. This definitely looks like a race condition. Unfortunately, I have no idea if the problem is in my code, Google's code, or the target server's code.
As others have mentioned, the key differences between dev and prod are the originating IP and how some of the request headers are handled. See here for a list of restricted headers. I don't know if this is documented, but in prod your app ID is appended to the end of your user agent. I had an issue once where requests in prod only were getting detected as a search engine spider because my app ID contained the string "bot".
You mentioned that you're setting up cookies manually, including the session cookie. Does this mean that you established a session in dev and are trying to re-use it in prod? Is it possible that the remote server logs the source IP that establishes a session and requires that subsequent requests come from the same IP?
You said that it doesn't work, but you don't get an exception. What exactly does this mean? You get an HTTP 200 and an empty response body? Another HTTP status? Your best bet may be to contact the owners of the remote service and see if they can tell you more specifically what was wrong with your request. Anything else is just speculation.
Check your server's logs to see if GAE is chopping any headers off. I've noticed that GAE (though I think I've seen it on the dev server too) will chop off headers it doesn't like.
Depending on the web service you're calling, it might also be less happy about being called from GAE than from your local machine.
I ran across this issue while making a webapp with an analogous problem. Looking at urlfetch's documentation, it turns out that the maximum deadline for a fetch call is 60 seconds, but it defaults to 5 seconds.
Five seconds was long enough to fetch my URLs on my local machine, but on GAE the fetch only completed within 5 seconds about 20% of the time.
I included the parameter deadline=60 and it has been working fine since.
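That is, something like this (same urlfetch call shape as in the question above; the URL is a placeholder):

    # Raise the per-request timeout from the 5-second default to the maximum.
    fetchresp = urlfetch.fetch(url=some_url, deadline=60)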
Hope this helps others!