Flask Redirect URLs escaping - python

I'm trying to redirect to a Graphite URL with Flask. The Graphite URLs I'm building are complex and must include the literal characters {, }, and |. Flask is escaping them to %7B, %7D, and %7C.
Is there any way I can stop this? On the Graphite side, I want a target that looks like this: sumSeries({metric|metric|metric})
@app.route("/")
def index():
    instances = get_data()
    url = build_graphite_url(instances)
    print(url)
    return redirect(url)

If you dig into the Flask source you will eventually run into a function called get_wsgi_headers in wrappers.py in Werkzeug.
This function is called when the final response is created and returned. If you scroll down a little, you will find that it checks whether a Location header was set and, if so, does some auto-correction to make sure the URL is absolute. While doing so, it escapes the URL, which is why your URL ends up escaped.
To the best of my knowledge, the only way to prevent this is to patch get_wsgi_headers so that it does not escape certain characters, since after all Flask is open source :)
As a side note, the reason you cannot listen for the after_request callback and modify the response headers there is that Werkzeug's get_wsgi_headers is called after the callback, so whatever Location you set in the callback will end up being escaped as well.
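For reference, the escaping itself is plain percent-encoding, which a server normally decodes before it looks at the query. The sketch below uses only the standard library (not Flask or Werkzeug) to show the round trip for the target from the question. Whether Graphite decodes the target this way is an assumption worth checking against your own instance.

```python
from urllib.parse import quote, unquote

target = "sumSeries({metric|metric|metric})"

# Percent-encode the target, keeping the parentheses literal.
# '{' becomes %7B, '|' becomes %7C and '}' becomes %7D.
escaped = quote(target, safe="()")
print(escaped)

# Decoding restores the original characters exactly.
restored = unquote(escaped)
print(restored == target)
```

If the receiving side decodes the value like this, the escaped redirect may already work without patching anything.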

Related

Signing and validating URLs with Flask

I am building an API using Flask and I would like to make use of HMAC-signed URLs. However, I am not finding any standard implementations for doing this which work well with Flask's url_for mechanism.
Ideally, I would like to be able to support parameters sent both via the request routing functions and via GET query strings, without having to know which is which up-front.
The only library I managed to find for handling URL signing in Python is Ska, which is not suitable for my needs as it wants to handle all of its own query string building. In particular, simply passing a flask.url_for URL in results in an invalid URL if that URL already contains a query string:
>>> ska.sign_url(url='http://example.com/foo?bar=baz',secret_key='qwerpoiu',auth_user='')
'http://example.com/foo?bar=baz?signature=Fz3fKO3BqD6o4mmMn4EAg9f6pts%3D&auth_user=&valid_until=1561903773.0&extra='
(Note the second ? in the resulting URL, which should be a &.)
Since Flask already does a perfectly good job of parsing and building URLs, is there a standard pattern for signing and validating URLs produced by Flask?
What I ended up doing is parsing the URL before signing it, and then if there's an existing query string, passing a suffix of '&' to ska.sign_url; for example:
ska.sign_url(url=my_url,
             suffix='&' if '?' in my_url else '?',
             **signing_options)
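If you would rather not depend on a signing library at all, a minimal sketch with the standard library's hmac module might look like the following. This is not Ska's scheme: the function names sign_url and verify_url and the "signature" parameter name are made up for illustration, and there is no expiry handling. It parses the existing query string first, so it does not matter whether the URL already has one.

```python
import hashlib
import hmac
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def _digest(url, secret_key):
    """Return a hex HMAC-SHA256 of the URL (hypothetical helper)."""
    return hmac.new(secret_key.encode(), url.encode(), hashlib.sha256).hexdigest()


def sign_url(url, secret_key):
    """Append a signature parameter, reusing any existing query string."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    # Normalise the URL the same way verify_url will rebuild it.
    unsigned = urlunsplit(parts._replace(query=urlencode(query)))
    query.append(("signature", _digest(unsigned, secret_key)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def verify_url(url, secret_key):
    """Check that the signature matches the rest of the URL."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    signatures = [value for key, value in query if key == "signature"]
    remaining = [(key, value) for key, value in query if key != "signature"]
    if len(signatures) != 1:
        return False
    unsigned = urlunsplit(parts._replace(query=urlencode(remaining)))
    return hmac.compare_digest(signatures[0], _digest(unsigned, secret_key))


signed = sign_url("http://example.com/foo?bar=baz", "qwerpoiu")
print(signed)
print(verify_url(signed, "qwerpoiu"))
```

Because the URL goes in and comes out as a plain string, this should work on whatever flask.url_for returns, regardless of which values ended up in the path and which in the query string.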

Do I need to use methods=['GET', 'POST'] in @app.route()?

My forms send the age parameter via GET, and it worked with just this:
@app.route("/foo")
def foo():
    age = request.args['age']
I did not bother with
@app.route('/foo', methods=['GET', 'POST'])
Does it matter?
It does not matter, in the sense that it will work. Usually, however, you want different methods to do different things: for example, POST to /foo adds an element, GET to /foo retrieves the element(s), and DELETE to /foo deletes an element.
If you don't specify a methods argument to app.route(), then the default is to only accept GET and HEAD requests (*).
You only need to set methods explicitly if you need to accept other HTTP methods, such as POST. Otherwise, Flask responds with a 405 Method Not Allowed HTTP status code when a client uses an HTTP method you didn't list, and your route function is simply not called.
So if your route should handle both GET and POST requests, but you forgot to add methods=['GET', 'POST'] to @app.route(), then you have a bug, as POST requests result in a 405 response instead of your route handling the request.
In your case, however, you should not use methods=['GET', 'POST'], and instead let clients that try to use POST anyway know your route doesn't handle that method. Better to be explicit about the error than let it silently pass.
(*) HEAD is added whenever you register a route that handles GET; for a HEAD request, your route is called and only the headers are then served to the client. Flask automatically handles OPTIONS for you; the route is not called in that case.
As always, the answer is: it depends.
If you don't provide "methods" arguments, then Flask assumes the HTTP method is GET (and also accepts HEAD). So long as that assumption is valid, your code will work just fine.
If, however, the client sends the request with the POST method (or DELETE, etc.), Flask will refuse it and respond that the POST (or DELETE, etc.) method is not allowed.
Think of this requirement as a redundancy check. Flask could have been written to adapt to whatever method is used in the HTTP request. Instead, Flask insists that you specify the method as a signal that the form of communication is intentional. This requirement makes the Flask implementation a little simpler at the cost of imposing the responsibility of coordinating the client-server interface on the programmer.
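The behaviour described in the answers above can be checked quickly with Flask's built-in test client. This is only a small sketch: the /bar route is a hypothetical second route added for comparison, and the handler bodies are placeholders.

```python
from flask import Flask, request

app = Flask(__name__)


@app.route("/foo")  # no methods argument: only GET (plus HEAD and OPTIONS)
def foo():
    return "age=" + request.args["age"]


@app.route("/bar", methods=["GET", "POST"])  # hypothetical route for comparison
def bar():
    return request.method


client = app.test_client()
get_foo = client.get("/foo?age=30")   # handled by foo()
post_foo = client.post("/foo")        # rejected before foo() is called
post_bar = client.post("/bar")        # handled by bar()
print(get_foo.status_code, post_foo.status_code, post_bar.status_code)
```

The POST to /foo should come back as 405 Method Not Allowed, while the same request to /bar reaches the view function.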

Python tornado AsyncHTTPClient fluke

I have a problem here:
import tornado.httpclient
from tornado.httpclient import AsyncHTTPClient
AsyncHTTPClient.configure("tornado.curl_httpclient.CurlAsyncHTTPClient")
# inside a function
client = AsyncHTTPClient()
result = yield client.fetch('http://some-site.com/#hash?&key=value', raise_error=False)
print(result.effective_url) # prints: http://some.site/some/path/
Note that the key-value pairs come after the hash. A site that I scrape issues redirects like this. If I comment out the AsyncHTTPClient.configure("tornado.curl_httpclient.CurlAsyncHTTPClient") line, it all works fine, but then I cannot use a proxy to intercept and view the HTTP exchanges. With this line, the stuff after the hash disappears. Can anyone tell me why?
Everything after the # is called the "fragment", and it is not normally sent to the server. Instead, it is made available for the browser and javascript to use. At the level of HTTP, http://some-site.com/#hash?&key=value is equivalent to http://some-site.com/. AsyncHTTPClient should be stripping off the fragment whether you use curl or not; the difference you're seeing here is probably a bug.
You want to pass the #fragment part. It is used by browsers to navigate to anchors on a page or for client-side routing (more info in RFC 3986, section 3.5).
The fragment is not sent to the server by the browser. libcurl does not send the fragment part either, as its documentation says:
While space is not typically a "legal" letter, libcurl accepts them.
When a user wants to pass in a '#' (hash) character it will be treated
as a fragment and get cut off by libcurl if provided literally. You
will instead have to escape it by providing it as backslash and its
ASCII value in hexadecimal: "\23".
You could replace # with %23 as well, but then the server needs to know how to handle it. Most likely it does not, since this part is normally handled by the browser.
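To see how the URL from the question is split, here is a short sketch using the standard library's urllib.parse (not Tornado or libcurl). It shows that everything after the # counts as the fragment, including what looks like a query string.

```python
from urllib.parse import quote, urldefrag, urlsplit

url = "http://some-site.com/#hash?&key=value"

parts = urlsplit(url)
print(repr(parts.query))     # nothing before the '#' looks like a query
print(repr(parts.fragment))  # the "key=value" part lives in the fragment

# What remains once the fragment is dropped.
without_fragment, fragment = urldefrag(url)
print(without_fragment)

# Percent-encoding the hash makes it part of the path instead of a fragment.
escaped_hash = quote("#", safe="")
print(escaped_hash)
```

So a client that strips the fragment ends up requesting just the site root, which matches what the curl-based client does here.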

Remove # from the URL in Python Simple HTTP Server

I have an AngularJS app that uses Angular UI Router, and the URLs it creates have a # in them, e.g. http://localhost:8081/#/login. I am using Python's SimpleHTTPServer to run the app while developing. I need to remove the # from the URL. I know how to remove it by enabling HTML5 mode in Angular, but that method has its problems and I want to remove the # on the server side.
How can I do this using Python's SimpleHTTPServer?
You can derive from SimpleHTTPRequestHandler and override the do_GET method.
class MyHTTPHandler(SimpleHTTPRequestHandler):
    def do_GET(self):
        # Do your own processing here. To hand the request back to the
        # default handler with the new path, just do the following:
        self.path = self.path.strip('/#')
        SimpleHTTPRequestHandler.do_GET(self)
Well, I had a similar problem, but the difference is that I had Spring on the server side.
You can capture the page-not-found exception in your server-side implementation and redirect to the default page [route] in your app. In Spring, we have handlers for page-not-found exceptions; I guess they are available in Python too.
You could take the string that represents the web address and, if there is a # in it, replace it with an empty string.
stringA = 'http://localhost:8081/#/login'
stringB = stringA.replace("#", "")
print(stringB)
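The page-not-found fallback suggested above can be sketched in Python roughly as follows. This assumes Python 3's http.server (the successor of SimpleHTTPServer) and an index.html entry page in the served directory; the class name SpaFallbackHandler and the serve helper are made up for illustration. Keep in mind that the browser never sends the part after # to the server, so the server cannot strip it by itself. All it can do is answer paths like /login with the app's entry page, which in turn assumes the client-side router works without the hash.

```python
import os
from http.server import HTTPServer, SimpleHTTPRequestHandler


class SpaFallbackHandler(SimpleHTTPRequestHandler):
    """Serve index.html for any path that is not an existing file or directory."""

    def do_GET(self):
        # translate_path() maps the URL path onto the served directory.
        if not os.path.exists(self.translate_path(self.path)):
            self.path = "/index.html"
        super().do_GET()


def serve(port=8081):
    """Run the development server on localhost (blocks until interrupted)."""
    HTTPServer(("localhost", port), SpaFallbackHandler).serve_forever()
```

Calling serve() from the directory that contains index.html should then answer http://localhost:8081/login with the entry page instead of a 404.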

can WSGI get the full URL rather than simply : environ['SERVER_NAME'] ?.. if so.. Mod_Rewrite alternative?

Currently, environ['SERVER_NAME'] can get the domain name, such as domain.tld or sub.domain.tld. Can it also get domain.tld/file/in/question.extension?
If that is possible, can this somehow replace mod_rewrite and do something like the following? Suppose the URL is domain.tld/01. We split this at / and, if the segment starts with 0, it must be something that we are looking for, so we use the number after the 0 to pull out variable content.
Can this replace mod_rewrite? I do not like touching mod_rewrite because of the complexity of its regexes.
It cannot really replace mod_rewrite if the intent is that the rewritten URL be fed back in as a sub request into Apache as a whole with request content still available. All you can really do is perform URL rewriting for use of the URL and request within a specific Python WSGI application.
The qualification in the first paragraph about request content is there because, when using daemon mode of mod_wsgi, it is possible to return a 200 response with no content and a Location response header, where the value of the Location header is a server-relative URL. This will cause a new sub request to be performed at the Apache level, but in doing that it can only issue a GET request with no request content.
The 200/Location response can thus be used as a way for a WSGI application to reroute a request the WSGI application handles back to Apache to serve up a different resource, such as a static file or even a PHP application. This is by no means though a general mechanism that could be used to replace mod_rewrite and would only be suitable in certain use cases.
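As a rough sketch of what such a response looks like from the WSGI side, assuming mod_wsgi daemon mode as described above: the target path /static/report.html is a hypothetical server-relative URL, and outside of that setup a client would simply see an empty 200 response.

```python
def application(environ, start_response):
    # Hypothetical server-relative URL that Apache should serve instead.
    target = "/static/report.html"
    # A 200 status with a Location header and no body, as described above.
    start_response("200 OK", [("Location", target), ("Content-Length", "0")])
    return []
```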
And on the specific point of access to other parts of the original request information, you have:
SERVER_NAME
SCRIPT_NAME
PATH_INFO
See for example URL reconstruction in the WSGI specification at:
https://www.python.org/dev/peps/pep-3333/#url-reconstruction
Under Apache/mod_wsgi you also have REQUEST_URI, but that can be complicated to deal with, as it is neither decoded nor normalised and can be either a server-relative URL or, technically, a full URI including scheme and host/port information.
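Putting these pieces together, a small WSGI application can rebuild the full URL and do the kind of path-based dispatch the question asks about. The sketch below uses the standard library's wsgiref.util.request_uri for the reconstruction. The "segment starts with 0" rule is taken from the question; the response bodies and the call helper (which fakes a request for domain.tld) are made up for illustration.

```python
from wsgiref.util import request_uri, setup_testing_defaults


def application(environ, start_response):
    # Rebuild the full URL from the WSGI environ (scheme, host, path, query).
    full_url = request_uri(environ)
    # Look at the first path segment, e.g. "01" for /01.
    first_segment = environ.get("PATH_INFO", "").lstrip("/").split("/")[0]
    if first_segment.startswith("0") and first_segment[1:].isdigit():
        status = "200 OK"
        body = "item %s requested via %s" % (first_segment[1:], full_url)
    else:
        status = "404 Not Found"
        body = "nothing at %s" % full_url
    start_response(status, [("Content-Type", "text/plain; charset=utf-8")])
    return [body.encode("utf-8")]


def call(path):
    """Invoke the application with a fake request (helper for trying it out)."""
    environ = {"HTTP_HOST": "domain.tld", "SCRIPT_NAME": "", "PATH_INFO": path}
    setup_testing_defaults(environ)
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(application(environ, start_response)).decode("utf-8")
    return captured["status"], body


print(call("/01"))
print(call("/about"))
```

This only rewrites URLs inside the one WSGI application, which is the limitation described above; it does not feed anything back into Apache.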
