I'm trying to get consistent URL strings in my mobile client before submitting, and on the server once received, so that I can reliably add a hash for security-checksum purposes. Currently I add the hash after URL-encoding on the client and try to grab the URL before anything gets decoded on the server, but one character (a period) arrives already decoded:
When I post something like this:
https://myapp.appspot.com/endpt?par=0%3Afirstlast%40gmail%2Ecom&di . . .
From this on the server:
self.request.url
I get:
https://myapp.appspot.com/endpt?par=0%3Afirstlast%40gmail.com&di . . .
And from this:
self.request.get('par')
I get it completely decoded as I would expect:
0:firstlast@gmail.com
I'm wondering how I can grab the URL before ANY decoding happens? Or alternatively, I could do my hashing outside of the encoding/decoding if it's possible to grab the URL with the entire query portion decoded. I.e. I can inject my hash at any point where I can get consistent, reliable results. Thanks.
You could fetch this directly from your WSGI environment, but I'd suggest taking a page out of Amazon's book instead, and defining a canonical format for signing URLs. Then you can encode and format the URLs in the same way on both ends, and you don't have to rely on the vagaries of frameworks and proxies not to interfere with the trivial encoding details of your URLs.
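A minimal sketch of what such a canonical signing scheme might look like, in the spirit of Amazon's signed requests: sort the parameters, percent-encode keys and values identically on both ends, and HMAC the result. The key, parameter names, and sorting rules here are illustrative, not any provider's actual format:

```python
import hashlib
import hmac
from urllib.parse import quote

SECRET = b"shared-secret"  # example key; distribute securely in practice

def canonical_query(params):
    # Sort keys and percent-encode each key and value the same way on
    # both client and server, so the signed string is byte-identical
    # regardless of how a framework later re-encodes the URL.
    return "&".join(
        f"{quote(str(k), safe='')}={quote(str(v), safe='')}"
        for k, v in sorted(params.items())
    )

def sign(params):
    canonical = canonical_query(params)
    sig = hmac.new(SECRET, canonical.encode(), hashlib.sha256).hexdigest()
    return canonical + "&sig=" + sig

def verify(params, sig):
    expected = hmac.new(SECRET, canonical_query(params).encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)
```

Because the hash is computed over the canonical form rather than over whatever string happens to arrive, it no longer matters whether a proxy normalizes `%2E` to `.` in transit.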
Related
I am building an API using Flask and I would like to make use of HMAC-signed URLs. However, I am not finding any standard implementations for doing this which work well with Flask's url_for mechanism.
Ideally, I would like to be able to support parameters sent both via the request routing functions and via GET query strings, without having to know which is which up-front.
The only library I managed to find for handling URL signing in Python is Ska, which is not suitable for my needs as it wants to handle all of its own query string building. In particular, simply passing a flask.url_for URL in results in an invalid URL if that URL already contains a query string:
>>> ska.sign_url(url='http://example.com/foo?bar=baz',secret_key='qwerpoiu',auth_user='')
'http://example.com/foo?bar=baz?signature=Fz3fKO3BqD6o4mmMn4EAg9f6pts%3D&auth_user=&valid_until=1561903773.0&extra='
(Note the second ? in the resulting URL, which should be an &.)
Since Flask already does a perfectly good job of parsing and building URLs, is there a standard pattern for signing and validating URLs produced by Flask?
What I ended up doing is inspecting the URL before signing it, and passing a suffix of '&' to ska.sign_url when the URL already contains a query string; for example:

ska.sign_url(url=my_url,
             suffix='&' if '?' in my_url else '?',
             **signing_options)
I have a problem here:
import tornado.httpclient
from tornado.httpclient import AsyncHTTPClient
AsyncHTTPClient.configure("tornado.curl_httpclient.CurlAsyncHTTPClient")
# inside a function
client = AsyncHTTPClient()
result = yield client.fetch('http://some-site.com/#hash?&key=value', raise_error=False)
print(result.effective_url) # prints: http://some.site/some/path/
Note that the key-value pairs come after the hash. A site that I scrape issues redirects like this. If I comment out the AsyncHTTPClient.configure("tornado.curl_httpclient.CurlAsyncHTTPClient") line, everything works fine, but then I cannot use a proxy to intercept and view the HTTP exchanges. With that line, the part after the hash disappears... Can anyone tell me why?
Everything after the # is called the "fragment", and it is not normally sent to the server. Instead, it is made available for the browser and javascript to use. At the level of HTTP, http://some-site.com/#hash?&key=value is equivalent to http://some-site.com/. AsyncHTTPClient should be stripping off the fragment whether you use curl or not; the difference you're seeing here is probably a bug.
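You can see why the two URLs are equivalent at the HTTP level by splitting the URL yourself: the fragment is a separate component that never becomes part of the request line. A small illustration using only the standard library:

```python
from urllib.parse import urlsplit, urlunsplit

parts = urlsplit('http://some-site.com/#hash?&key=value')
print(parts.fragment)   # 'hash?&key=value' -- everything after '#'
print(parts.query)      # '' -- the '?' came after '#', so no query at all

# What actually goes on the wire is the URL minus its fragment:
on_the_wire = urlunsplit(parts._replace(fragment=''))
print(on_the_wire)      # 'http://some-site.com/'
```

Note that because the `#` appears before the `?`, the "query string" in that URL is really part of the fragment, so the server never sees `key=value` at all.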
You are trying to pass the #fragment part of the URL. Fragments are used by browsers to navigate to anchors on a page or for client-side routing (more info: RFC 3986, section 3.5).
The fragment is not sent to the server by the browser. libcurl does not send the fragment part either, as its documentation says:
While space is not typically a "legal" letter, libcurl accepts them.
When a user wants to pass in a '#' (hash) character it will be treated
as a fragment and get cut off by libcurl if provided literally. You
will instead have to escape it by providing it as backslash and its
ASCII value in hexadecimal: "\23".
You could replace # with %23 as well, but the server would have to know how to handle it, and it most likely does not, since the fragment is normally handled entirely by the browser.
I'm having issues getting a URL pattern to work.
The URL is in the format of the following:
/API#access_token=<string>&expires_in=<timestamp>
I can't change the #access_token=&expires_in= part unfortunately, as this is outside of my control, and I simply have to just make my side of the code work.
I've tried a number of different patterns, a number of which are outlined below. This is my first Django project, and any advice, and pointers would be much appreciated.
url(r'^API#access_token=(?P<token_info>\w+)&expires_in(?P<time>\d+)$'
url(r'^API#(?P<tokens>\w+)$'
url(r'^API/#(?P<tokens>\w+)&(?P<expiration>\d+)$'
The issue is that the anchor #, also called the fragment identifier, is not sent to the server by the browser. The regex cannot capture what is not there. From the wikipedia article on the fragment identifier:
The fragment identifier functions differently than the rest of the
URI: namely, its processing is exclusively client-side with no
participation from the web server — of course the server typically
helps to determine the MIME type, and the MIME type determines the
processing of fragments. When an agent (such as a Web browser)
requests a web resource from a Web server, the agent sends the URI to
the server, but does not send the fragment. Instead, the agent waits
for the server to send the resource, and then the agent processes the
resource according to the document type and fragment value.
The only way around this is to parse the fragment in JavaScript on the client side and send it as a separate asynchronous request. For a GET request, you could send the fragment as a query parameter (after stripping off the hash) or put it in the header as a custom value.
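Once the client-side script has forwarded the fragment as an ordinary query string (or a custom header), parsing it server-side is straightforward. A sketch using only the standard library; the parameter names mirror the ones in the question, and the helper function is illustrative:

```python
from urllib.parse import parse_qs

def parse_forwarded_fragment(fragment):
    # The client-side script sends the fragment text, e.g.
    # "access_token=abc123&expires_in=1561903773", as an ordinary
    # query string; parse_qs returns a dict mapping keys to lists.
    values = parse_qs(fragment)
    token = values.get('access_token', [None])[0]
    expires = values.get('expires_in', [None])[0]
    return token, expires

print(parse_forwarded_fragment('access_token=abc123&expires_in=1561903773'))
# ('abc123', '1561903773')
```

The Django URL pattern itself then only needs to match `^API$`; the token arrives as request data, not as part of the path.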
I'm trying to program a small HTTP local proxy server to run on my machine and run some tests.
My server currently runs perfectly and serves the requests fine.
However, when I try to analyse the packet, I get a problem.
I'm searching for the "<head>" tag in my packets, and print a message to a log when I find it.
It works on a very limited number of websites, while on the other, like StackOverflow for example, it doesn't.
Do I need to some sort of decoding before I search for the word in the received data? If so - which decoding? How do I recode the data to serve to the browser?
Here's my code for the searching and replacing:
data = i.recv(8192)
if data:
    if "<head>" in data:
        print "Found Head Tag."
The above code is a simple python code to retrieve the data from the socket, save it to the data object, and search for the wanted tag. As I said, it works on very few websites, and not on the others.
Many webservers use compression to lower bandwidth usage.
You will need to check HTTP headers for Content-Encoding and apply the required operations (i.e. gzip decompression) to get the plain text.
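A sketch of what that check might look like once you have split the headers from the body (Python 3 here, unlike the Python 2 code in the question; the header-dict shape is an assumption, and `wbits=-15` covers the raw-deflate variant some servers send):

```python
import gzip
import zlib

def decode_body(headers, body):
    # headers: dict of lower-cased header names -> values
    # body: raw bytes received from the socket after the blank line
    encoding = headers.get('content-encoding', 'identity').lower()
    if encoding == 'gzip':
        return gzip.decompress(body)
    if encoding == 'deflate':
        # Some servers send raw deflate rather than a zlib stream;
        # wbits=-15 handles the raw case shown here.
        return zlib.decompress(body, -15)
    return body  # 'identity' or unknown: pass through untouched

page = decode_body({'content-encoding': 'gzip'},
                   gzip.compress(b'<html><head></head></html>'))
print(b'<head>' in page)  # True
```

Remember that if you modify the decompressed body before forwarding it to the browser, you must either re-compress it or strip the Content-Encoding header (and fix Content-Length), or the browser will fail to decode it.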
I am serving some JSON content from a Google App Engine server. I need to serve an ETAG for the content to know if its changed since i last loaded the data. Then my app will remove its old data and use the new JSON data to populate its views.
self.response.headers['Content-Type'] = "application/json; charset=utf-8"
self.response.out.write(json.dumps(to_dict(objects,"content")))
What's the best practice for setting ETags on the response? Do I have to calculate the ETag myself, or is there a way to get the HTTP layer to do this for me?
If you're using webapp2, it can add an md5 ETag based on the response body automatically.
self.response.md5_etag()
http://webapp-improved.appspot.com/guide/response.html
You'll have to calculate the e-tag value yourself. E-tags are opaque strings that only have meaning to the application.
Best practice is to just concatenate all the input variables (converted to string) that determine the JSON content; anything that, if changed, would result in a change in the JSON output, should be part of this. If there is anything sensitive in those strings you wouldn't want to be exposed, use the MD5 hash of the values instead.
For example, in a CMS application that I administer, the front page has the following e-tag:
|531337735|en-us;en;q=0.5|0|Eli Visual Theme|1|943ed3c25e6d44497deb3fe274f98a96||
The input variables that we care about have been concatenated with the | symbol into an opaque value, but it does represent several distinct input values, such as a last-modified timestamp (the number), the browser accepted language header, the current visual theme, and a internal UID that is retrieved from a browser cookie (and which determines what context the content on the front page is taken from). If any of those variables would change, the page is likely to be different and the cached copy would be stale.
Note that an e-tag is useless without a means to verify it quickly. A client will include it in a If-None-Match request header, and the server should be able to quickly re-calculate the e-tag header from the current variables and see if the tag is still current. If that re-calculation would take the same amount of time as regenerating the content body, you only save a little bandwidth sending the 304 Not Modified response instead of a full JSON body in a 200 OK response.
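Put together, the pattern might look like the following for the handler in the question. This is a sketch: the choice of input variables, the framework-free `respond` helper, and the quoting convention for the header value are all illustrative:

```python
import hashlib
import json

def compute_etag(*input_vars):
    # Concatenate every variable that influences the JSON output, then
    # hash so nothing sensitive leaks into the header value. ETags are
    # conventionally sent wrapped in double quotes.
    raw = '|'.join(str(v) for v in input_vars)
    return '"%s"' % hashlib.md5(raw.encode('utf-8')).hexdigest()

def respond(if_none_match, last_modified, objects):
    # Recomputing the tag from the input variables is cheap; serializing
    # the full JSON body is only done when the tag no longer matches.
    etag = compute_etag(last_modified, len(objects))
    if if_none_match == etag:
        return 304, etag, ''           # Not Modified: empty body
    return 200, etag, json.dumps(objects)

status, etag, body = respond(None, 1531337735, [{'id': 1}])
print(status)   # 200
status2, _, _ = respond(etag, 1531337735, [{'id': 1}])
print(status2)  # 304
```

The essential property is that `compute_etag` depends only on cheap-to-fetch inputs (a timestamp, a count, a cookie value), not on the serialized body itself, so the 304 path stays fast.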