How is Django's HttpRequest.META dictionary populated?

How is Django's HttpRequest.META dictionary populated? Do all the keys and values come from the headers of the HTTP request sent by the client? If so, must I assume that all of these values can be modified by the client?
I am asking because I can't find most of the keys in the headers that are displayed in my Chrome debugging console. And some of those keys are definitely not the client's business, for example the username of a user logged in via Shibboleth. It makes no sense to me why this kind of data would be sent first from the server to the client and then back to the server via the HTTP request.

Most of request.META comes from the script's environment; cf. the django.core.handlers.wsgi.WSGIRequest class initializer. I'm talking about the WSGI handler only here, but AFAICT it's currently the only concrete handler subclass, and all other deployment options end up using WSGI one way or another (cf. django.core.servers.fastcgi and django.core.servers.basehttp).
In other words: what you get in request.META depends on what the calling script passed in, which depends on the front server etc.

I believe you are correct, and the data should never be trusted. The only thing between you and the client is the server, e.g. nginx, which might modify the headers, e.g. allow only a certain size and so on. But I could be wrong :)

I met the same problem when I tried to add custom request.META keys in Django.
From the official documentation:
With the exception of CONTENT_LENGTH and CONTENT_TYPE, as given above, any HTTP headers in the request are converted to META keys by converting all characters to uppercase, replacing any hyphens with underscores and adding an HTTP_ prefix to the name. So, for example, a header called X-Bender would be mapped to the META key HTTP_X_BENDER.
Note that runserver strips all headers with underscores in the name, so you won’t see them in META. This prevents header-spoofing based on ambiguity between underscores and dashes both being normalizing to underscores in WSGI environment variables. It matches the behavior of Web servers like Nginx and Apache 2.4+.
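That means a header named "New-Meta" is converted to "HTTP_NEW_META", and a header named "New" is converted to "HTTP_NEW". (A header spelled with an underscore, such as "new_meta", would map to the same key, which is why runserver drops it, as noted above.)
By the way, the other keys in request.META, those without the "HTTP_" prefix, come from the server environment; you can find many of them by running export on your server host.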
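To make the naming rule quoted from the docs concrete, here is a minimal sketch in plain Python. It is not Django's own code; the helper names header_to_meta_key and client_supplied_items are made up for illustration, and the second one only applies the prefix rule to an ordinary dict.

```python
def header_to_meta_key(header_name: str) -> str:
    """Map an HTTP header name to the key it would get in request.META."""
    key = header_name.upper().replace("-", "_")
    # CONTENT_LENGTH and CONTENT_TYPE are the two documented exceptions:
    # they are stored without the HTTP_ prefix.
    if key in ("CONTENT_LENGTH", "CONTENT_TYPE"):
        return key
    return "HTTP_" + key


def client_supplied_items(meta: dict) -> dict:
    """Return the META entries that originate from request headers.

    Hypothetical helper: everything it returns can be set by the client
    and should be treated as untrusted input.
    """
    return {
        key: value
        for key, value in meta.items()
        if key.startswith("HTTP_") or key in ("CONTENT_LENGTH", "CONTENT_TYPE")
    }
```

For example, header_to_meta_key("X-Bender") gives "HTTP_X_BENDER", matching the docs. Keys that client_supplied_items leaves out (SERVER_NAME, REMOTE_ADDR and so on) were set by the server side, although a front server may itself derive some of them from the request.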

Related

Can WSGI get the full URL rather than just environ['SERVER_NAME']? If so, could it be a mod_rewrite alternative?

Currently, environ['SERVER_NAME'] gives me the domain name, such as domain.tld or sub.domain.tld. Can it also give me the full URL, such as domain.tld/file/in/question.extension?
If that is possible, can it somehow replace mod_rewrite and do something like this? Suppose the URL is domain.tld/01. We split it at "/", and if the part after it starts with 0, it must be something we are looking for, so we use the number after the '0' to pull out variable content.
Can this replace mod_rewrite? I do not like touching mod_rewrite because of the complexity of its regexes.

It cannot really replace mod_rewrite if the intent is that the rewritten URL be fed back in as a sub request into Apache as a whole with request content still available. All you can really do is perform URL rewriting for use of the URL and request within a specific Python WSGI application.
The qualification about request content in the first sentence is this: when using daemon mode of mod_wsgi, it is possible to return a 200 response with no content and a Location response header whose value is a server-relative URL. This causes a new sub request to be performed at the Apache level, but that sub request can only be a GET with no request content.
The 200/Location response can thus be used as a way for a WSGI application to reroute a request the WSGI application handles back to Apache to serve up a different resource, such as a static file or even a PHP application. This is by no means though a general mechanism that could be used to replace mod_rewrite and would only be suitable in certain use cases.
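As a rough sketch of the 200/Location technique described above: the WSGI application below returns a 200 status, a Location header holding a server-relative URL, and an empty body. The target path /static/report.html is a made-up example, and the sub request behaviour itself is something mod_wsgi daemon mode provides according to the description above; nothing in this code triggers it on other servers.

```python
def application(environ, start_response):
    # Hypothetical server-relative target that Apache should serve instead.
    target = "/static/report.html"
    # 200 status plus a Location header and no content: under mod_wsgi
    # daemon mode this is described as causing a GET sub request.
    start_response("200 OK", [("Location", target)])
    return []
```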
And on the specific point of access to other parts of the original request information, you have:
SERVER_NAME
SCRIPT_NAME
PATH_INFO
See for example URL reconstruction in the WSGI specification at:
https://www.python.org/dev/peps/pep-3333/#url-reconstruction
Under Apache/mod_wsgi you also have REQUEST_URI, but that can be complicated to deal with, as it is neither decoded nor normalised and can be either a server-relative URL or, technically, a full URI including scheme and host/port information.
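For reference, the URL reconstruction recipe from the WSGI specification can be wrapped in a function roughly like this. The function name reconstruct_url is mine; the logic follows the PEP 3333 recipe. The second helper, content_id_from_path, is a hypothetical sketch of the "starts with 0" idea from the question, handled inside the WSGI application rather than by mod_rewrite.

```python
from urllib.parse import quote


def reconstruct_url(environ):
    """Rebuild the full request URL from a WSGI environ (after PEP 3333)."""
    scheme = environ["wsgi.url_scheme"]
    url = scheme + "://"
    if environ.get("HTTP_HOST"):
        url += environ["HTTP_HOST"]
    else:
        url += environ["SERVER_NAME"]
        # Only append the port when it is not the default for the scheme.
        default_port = "443" if scheme == "https" else "80"
        if environ["SERVER_PORT"] != default_port:
            url += ":" + environ["SERVER_PORT"]
    url += quote(environ.get("SCRIPT_NAME", ""))
    url += quote(environ.get("PATH_INFO", ""))
    if environ.get("QUERY_STRING"):
        url += "?" + environ["QUERY_STRING"]
    return url


def content_id_from_path(path_info):
    """Return the number after a leading '0' in the first path segment, else None."""
    first = path_info.strip("/").split("/")[0]
    if first.startswith("0") and first[1:].isdigit():
        return int(first[1:])
    return None
```

With PATH_INFO set to "/01", content_id_from_path returns 1, which the application could then use to look up its content, with no regex involved.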

Constant Flask Session IDs

I've a Flask application, served with Nginx+WSGI (FastCGI & Gevent) and use standard Flask sessions. I do not use the session.permanent=True or any other extra option, but simply set SECRET_KEY in the default configuration.
I do not save any (key, value) pairs in the session, and only rely on the SID = session['_id'] entry to identify a returning user. I use the following code to read the SID:
@page.route('/')
def main(page='home', template='index.html'):
    if not request.args.get('silent', False):
        # Python 2 print statement, writing to stderr
        print >> sys.stderr, "Session ID: %r" % session['_id']
I made the following observations:
(1) For the same IP address but different browsers I get different SIDs - that's expected;
(2) For different IPs and the same browser I again get different SIDs - expected;
(3) For the same IP address with the same browser I get the same SID - also expected.
Now, point (3) is interesting because even if I delete the corresponding cookie the SID remains constant! To some extent even that might be understandable, but actually I was expecting the SID to change between different cookies. But the only difference I see is that
session.new is True
for the first request immediately after the deletion of the cookie. Even that is very much expected; but given these facts I face the following problems:
(1) Does this mean that for different users sitting behind the same IP (with the same browser configuration) my back-end will mistake them for the same user?
(2) If point (1) is not the case, the current behavior of these "sticky" sessions is actually quite pleasant, since it avoids the situation where my users might lose their data just because they deleted the corresponding cookie. They can still save the day by revisiting the site from the same network with the same browser. I like that, but only if point (1) is not the case.
(3) Assuming point (1) will actually bite me, would the conclusion be to save a token in the session and hence accept the fate that the user can blow himself up by simply deleting his cookie?
(4) Or is there a way to tell Flask to give different SIDs for each fresh cookie?
Actually, this question arose because I used a load impact service, which was simulating different users (on the same IP), but my back-end kept seeing them as a single user since the corresponding SIDs were all the same.
The application is available for tests at http://webed.blackhan.ch (which upon release will move to https://notex.ch [a browser based text editor]). Thank you for your answers.

It looks like you're using the Flask-Login extension. Here's the code that generates the id token:
def _create_identifier():
    base = unicode("%s|%s" % (request.remote_addr,
                              request.headers.get("User-Agent")), 'utf8', errors='replace')
    hsh = md5()
    hsh.update(base.encode("utf8"))
    return hsh.digest()
It's basically just md5(ip_address + user_agent).
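To see why this produces the "sticky" value from the question, here is a small standalone sketch of the same computation in Python 3. It is not Flask-Login's code: create_identifier is a made-up name, it takes the address and user agent as plain arguments instead of reading the request, and it returns a hex digest for readability. The addresses and user-agent strings in the usage note are examples only.

```python
from hashlib import md5


def create_identifier(remote_addr, user_agent):
    """md5 over "ip|user_agent", mirroring the snippet above."""
    base = "%s|%s" % (remote_addr, user_agent)
    return md5(base.encode("utf8")).hexdigest()
```

Two calls such as create_identifier("198.51.100.7", "Mozilla/5.0") return the same value no matter which cookie the browser holds, so two users behind the same IP with the same browser string get the same identifier; changing either input changes it.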
Flask uses Werkzeug's secure cookies to store this identifier. Secure cookies are (as their name suggests) secure:
This module implements a cookie that is not alterable from the client because it adds a checksum the server checks for. You can use it as session replacement if all you have is a user id or something to mark a logged in user.
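The "checksum the server checks for" idea can be sketched with the standard library alone. This is only an illustration of the principle, not Werkzeug's actual implementation or cookie format; SECRET_KEY, sign and unsign are hypothetical names.

```python
import hashlib
import hmac

SECRET_KEY = b"change-me"  # hypothetical; known only to the server


def sign(value: str) -> str:
    """Append an HMAC of the value so the server can detect tampering."""
    mac = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return value + "." + mac


def unsign(cookie: str):
    """Return the original value if the HMAC matches, else None."""
    value, sep, mac = cookie.rpartition(".")
    if not sep:
        return None
    expected = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison on bytes.
    if hmac.compare_digest(mac.encode("utf-8"), expected.encode("utf-8")):
        return value
    return None
```

The client can read the value but cannot change it without invalidating the checksum, because producing a valid one requires the secret key.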

session['_id'] is not an actual session identifier. It's just a value used by Flask-Login to implement Session Protection.
Standard Flask sessions do not have an SID, as it would serve no purpose: the actual content of the session is stored in the cookie itself.

It's now 2022, and Flask-Session does support session.sid to get a generated UUID that looks something like this:
print(session.sid)
# prints something like: f9c792fa-70e0-46e3-b84a-3a11813468ce
From the docs (https://flasksession.readthedocs.io/en/latest/)
sid
Session id, internally we use uuid.uuid4() to generate one session id. You can access it with session.sid.

Is cookie a common and secure implementation of session?

I'm using the Pyramid web framework. I was confused by the relationship between cookies and sessions. Only after looking it up on Wikipedia did I learn that a session is an abstract concept and a cookie may be just one approach to implementing it (on the client side).
So, my question is: what's the most common implementation (on both the client and the server)? Can somebody give some example code (or maybe just a description)? (I don't want to use the session support built into Pyramid, because I want to learn.)

The most common implementation of sessions is to use a cookie.
A cookie provides a way to store an arbitrary piece of text, which is usually used as a session identifier. When the cookie gets sent along with a HTTP request, the server (technically the code running on it) can use the cookie text (if it exists) to recognise that it has seen a client before. Text in a cookie usually provides enough information to retrieve extra information from the database about this client.
For example, a very naive implementation might store the primary key to the shopping_cart table in a database, so that when the server receives the cookie text it can directly use it to access the appropriate shopping cart for that particular client.
(And it's a naive approach because a user can do something like change their own cookie to a different primary key and access someone else's cart that way. Choosing a proper session id isn't as simple as it seems, which is why it's almost always better to use an existing implementation of sessions.)
An alternate approach to storing a session identifier is to use a GET parameter in the URL (for example, in something like http://example.com/some/page?sid=4s6da4sdasd48, the sid GET param serves the same function as the cookie string). In this approach, all links to other pages on the site have the GET param appended to them.

In general, the cookie stored with the client is just a long, hard-to-guess hash code string that can be used as a key into a database. On the server side, you have a table mapping those session hashes to primary keys (a session hash should never be a primary key) and expiration timestamps.
So when you get a request, first thing you do is look for the cookie. If there isn't one, create a session entry (cookie + expiration timestamp) in the database table. If there is one, look it up and make sure it hasn't expired; if it has, make a new one. In either case, if you made a new cookie, you might want to pass that fact down to later code so it knows if it needs to ask for a login or something. If you didn't need to make a new cookie, reset the expiration timestamp so you don't expire the session too soon.
While handling the view code and generating a response, you can use that session primary key to index into other tables that have data associated with the session. Finally, in the response sent back to the client, set the cookie to the session key hash.
If someone has cookies disabled, then their session cookie will always be new, and any session-based features won't work.
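The lookup/create/expire steps described in this answer could look roughly like the sketch below. It uses an in-memory dict where the answer talks about a database table, and every name in it (SESSION_LIFETIME, _sessions, get_or_create_session) is an assumption made for illustration. The token comes from the secrets module so that it is hard to guess.

```python
import secrets
import time

SESSION_LIFETIME = 30 * 60  # seconds; hypothetical value
_sessions = {}  # token -> {"expires": timestamp, "data": dict}


def get_or_create_session(cookie_token, now=None):
    """Return (token, session_entry, is_new) for the token found in the cookie."""
    now = time.time() if now is None else now
    entry = _sessions.get(cookie_token) if cookie_token else None
    if entry is None or entry["expires"] <= now:
        # Missing, unknown or expired: drop any stale entry and start fresh.
        _sessions.pop(cookie_token, None)
        cookie_token = secrets.token_urlsafe(32)
        entry = {"expires": now + SESSION_LIFETIME, "data": {}}
        _sessions[cookie_token] = entry
        return cookie_token, entry, True
    # Known and still valid: push the expiration forward.
    entry["expires"] = now + SESSION_LIFETIME
    return cookie_token, entry, False
```

The view code would call this with the cookie value from the request, use is_new to decide whether a login is needed, and set the returned token as the cookie in the response.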

A session is (usually) a cookie that has a unique value. This value maps to a value in a database or held in memory that then tells you what session to load. PHP has an alternate method where it appends a unique value to the end of every URL (if you've ever seen PHPSESSID in a URL you now know why) but that has security implications (in theory).
Of course, cookies are sent back and forth with every request, so unless you're talking over HTTPS, you are sending the only thing that (reliably) tells the server that the client it is talking to now is the same one that logged in ten seconds ago to anyone on the same wireless network. See programs like Firesheep for reasons why switching to HTTPS is a good idea.
Finally, if you do want to build your own, I was given some advice on the matter by a university professor: give out a new token on every page load, and invalidate all of a user's tokens if an invalid token is used. This means that if an attacker does get a token and logs in with it while it is still valid, then when the victim clicks a link both parties get logged out.
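One possible reading of the professor's scheme, as an in-memory sketch: each successful request swaps the token for a new one, and presenting an old token logs the user out everywhere. The class name RotatingTokens and its methods are invented for this example, and a real version would need persistent storage and expiry.

```python
import secrets


class RotatingTokens:
    def __init__(self):
        self._current = {}  # user -> the one token that is valid right now
        self._owner = {}    # every token ever issued -> user

    def issue(self, user):
        """Create a fresh token for the user, replacing the previous one."""
        token = secrets.token_urlsafe(32)
        self._current[user] = token
        self._owner[token] = user
        return token

    def redeem(self, token):
        """Return (user, next_token) if the token is current, else None."""
        user = self._owner.get(token)
        if user is None:
            return None  # never issued by us
        if self._current.get(user) != token:
            # A stale (possibly stolen) token was used: invalidate everything.
            self._current.pop(user, None)
            return None
        return user, self.issue(user)
```

If an attacker redeems a stolen token first, the victim's next click presents a stale token, which invalidates the attacker's new token as well, so both parties are logged out.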

Alternate host/IP for python script

I want my Python script to access a URL through an IP specified in the script instead of through the default DNS for the domain. Basically I want the equivalent of adding an entry to my /etc/hosts file, but I want the change to apply only to my script instead of globally on the whole server. Any ideas?

Whether this works or not will depend on whether the far end site is using HTTP/1.1 name-based virtual hosting or not.
If they're not, you can simply replace the hostname part of the URL with their IP address, per @Greg's answer.
If they are, however, you have to ensure that the correct Host: header is sent as part of the HTTP request. Without that, a virtual hosting web server won't know which site's content to give you. Refer to your HTTP client API (Curl?) to see if you can add or change default request headers.
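With Python's standard library, one way to do this is to put the IP address in the URL and pass the real hostname as an explicit Host header. The sketch below only builds the request object; request_via_ip is a made-up helper name, and the address 203.0.113.10 and hostname example.com are placeholders. It covers plain HTTP only; HTTPS adds certificate and SNI concerns that this does not address.

```python
import urllib.request


def request_via_ip(ip, hostname, path="/"):
    """Build a request that connects to `ip` but asks for `hostname`'s site."""
    return urllib.request.Request(
        "http://%s%s" % (ip, path),
        headers={"Host": hostname},  # tells a virtual-hosting server which site we want
    )
```

Passing the result to urllib.request.urlopen(request_via_ip("203.0.113.10", "example.com", "/index.html")) would then fetch the page from that machine, affecting only this script.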

You can use an explicit IP number to connect to a specific machine by embedding that into the URL: http://127.0.0.1/index.html is equivalent to http://localhost/index.html
That said, it isn't a good idea to use IP numbers instead of DNS entries. IPs change a lot more often than DNS entries, meaning your script has a greater chance of breaking if you hard-code the address instead of letting it resolve normally.

I wish to "retrieve" the cookies sent by the client in my subclass of BaseHTTPRequestHandler.
Firstly, I'm unsure of the exact sequence in which headers are sent. In a typical HTTP request and response, this is my understanding of the sequence of events:
1. The client sends the request (method, path, HTTP version, host, and ALL headers).
2. The server responds with a response code, followed by a bunch of headers of its own.
3. The server then sends the body of the response.
When exactly is the client's POST data sent? Does any overlapping occur in this sequence as described above?
Secondly, when is it safe to assume that the "Cookie" header has been received by the server? Should all of the client headers have been received by the time self.send_response is called by the server? When in the HTTP communication is the appropriate time to peek at cookie headers in self.headers?
Thirdly, what is the canonical way to parse cookies in Python? I currently believe a Cookie.SimpleCookie should be instantiated, and then data from the cookie headers somehow fed into it. Further clouding this problem is the Cookie classes' clunkiness when dealing with the HTTPRequestHandler interfaces. Why does the output from Cookie.output() not end with a line terminator to fit into self.wfile.write(cookie.output()), or instead drop the implicitly provided header name to fit nicely into self.send_header("Set-Cookie", cookie.output())?
Finally, the cookie classes in the Cookie module give the illusion that they're dictionaries of dictionaries. Assigning to different keys in the cookie does not pack more data into the cookie, but rather generates more cookies, all apparently in the one class, and each generating its own Set-Cookie header. What is the best practise for packing containers of values into cookie(s)?

HTTP is a request/response protocol, without overlap; the body of a POST comes as part of the request (when the verb is POST, of course).
All headers also come as part of the request, including Cookie: if any (there might be no such header of course, e.g. when the browser is running with cookies disabled or whatever). So peek at the headers whenever you've received the request and are serving it.
I'm not sure what your "thirdly" problem is. No newline gets inserted if none is part of the cookie -- why ever should it be? Edit: see later.
On the fourth point, I think you may be confusing cookies with "morsels". There is no limit to the number of Set-Cookie headers in the HTTP response, so why's that a problem?
Edit: you can optionally pass to output up to three arguments: the set of morsel attributes you want in the output for each morsel (default None meaning all attributes), the header string you want to use in front of each morsel (default Set-Cookie:), the separator string you want between morsels (default \r\n). So it seems that your intended use of a cookie is single-morsel (otherwise you couldn't stick the string representation into a single header, which you appear most keen to do): in that case
thecookie.output(None, '')
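will give you the string you want, apart from a leading space that output leaves where the header name would go (strip it, or call OutputString() on the morsel instead). Just make multiple SimpleCookie instances with one morsel each (since one morsel is what fits into a single header!-).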
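In Python 3 the module is http.cookies, and a small sketch of the "one header per morsel" idea might look like this. The helper name set_cookie_header_values is mine; the cookie names in the usage note are examples.

```python
from http.cookies import SimpleCookie


def set_cookie_header_values(cookie):
    """Return one Set-Cookie header value per morsel, without the header name."""
    return [morsel.OutputString() for morsel in cookie.values()]
```

Inside a request handler this could be used as: for value in set_cookie_header_values(cookie): self.send_header("Set-Cookie", value). Each morsel gets its own Set-Cookie header, so a SimpleCookie holding several keys works as well as several single-morsel instances.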

Here's a quick way to get the cookies with no third-party libraries. While it only answers part of the question, it may be the part that most visitors are after.
from http.cookies import SimpleCookie  # Python 2: from Cookie import SimpleCookie

# Method of your BaseHTTPRequestHandler subclass
def do_GET(self):
    cookies = SimpleCookie()
    cookies_string = self.headers.get('Cookie')
    if cookies_string:
        # Parse the raw Cookie header into morsels
        cookies.load(cookies_string)
    if 'my-cookie' in cookies:
        print(cookies['my-cookie'].value)
