Testing request parameters in Django ("+" behaves differently)

I have a Django View that uses a query parameter to do some content filtering. Something like this:
/page/?filter=one+and+two
/page/?filter=one,or,two
I have noticed that Django converts the + to a space (request.GET.get('filter') returns one and two), and I'm OK with that. I just need to adjust the split() function I use in the View accordingly.
But...
When I test this View by calling:
from django.test import Client
client = Client()
client.get('/page/', {'filter': 'one+and+two'})
Then request.GET.get('filter') in the View returns one+and+two, with plus signs and no spaces. Why is this?
I would like to think that Client().get() mimics the browser behaviour, so what I would like to understand is why calling client.get('/page/', {'filter': 'one+and+two'}) is not like browsing to /page/?filter=one+and+two. For testing purposes it should be the same in my opinion, and in both cases the view should receive a consistent value for filter: be it with + or with spaces.
What I don't get is why there are two different behaviours.

The plusses in a query string are the normal and correct encoding for spaces. This is a historical artifact; the form value encoding for URLs differs ever so slightly from encoding other elements in the URL.
Django is responsible for decoding the query string back to key-value pairs; that decoding reverses the URL percent-encoding and, because query strings use form encoding, also turns each + into a space.
When using the test client, you pass in unencoded data, so you'd use:
client.get('/page/', {'filter': 'one and two'})
This is then encoded to a query string for you, and decoded again when you access the parameters in the view.

This is because the test client (actually, RequestFactory) runs django.utils.http.urlencode on your data, which escapes each literal plus sign and results in filter=one%2Band%2Btwo. Conversely, if you were to use {'filter': 'one and two'}, it would be converted to filter=one+and+two (the encoder turns spaces into plus signs), and would come into your view with spaces.
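The same round trip can be reproduced with the standard library alone. This is a sketch rather than Django's code, assuming (as in recent Django versions) that django.utils.http.urlencode delegates to urllib.parse.urlencode, whose default quote_plus encoding turns a space into + and a literal + into %2B:

```python
from urllib.parse import parse_qs, urlencode

# A value containing literal plus signs: each '+' is percent-escaped.
encoded_plus = urlencode({'filter': 'one+and+two'})
# A value containing spaces: each space becomes '+'.
encoded_space = urlencode({'filter': 'one and two'})
print(encoded_plus)   # filter=one%2Band%2Btwo
print(encoded_space)  # filter=one+and+two

# Decoding undoes both steps, so the view gets back exactly what was passed in.
print(parse_qs(encoded_plus)['filter'][0])   # one+and+two
print(parse_qs(encoded_space)['filter'][0])  # one and two

# A query string typed into the browser is decoded the same way: '+' means space.
print(parse_qs('filter=one+and+two')['filter'][0])  # one and two
```

So passing {'filter': 'one and two'} to the test client yields the same query string a browser sends for /page/?filter=one+and+two.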
If you really absolutely must have the pluses in your query string, I believe it may be possible to manually override the query string with something like: client.get('/page/', QUERY_STRING='filter=one+and+two'), but that just seems unnecessary and ugly in my opinion.
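On the view side, the split() adjustment mentioned in the question could tolerate both forms. A minimal sketch; split_filter is a hypothetical helper, and treating spaces, literal plus signs and commas as interchangeable separators is an assumption about the filter syntax:

```python
import re


def split_filter(value):
    """Hypothetical helper: split a filter value into its terms.

    Accepts 'one and two' (a '+' the browser sent, already decoded to spaces),
    'one+and+two' (a literal plus that reached the view) and 'one,or,two'.
    """
    # Any run of spaces, '+' or ',' counts as one separator; empty pieces are dropped.
    return [term for term in re.split(r'[\s+,]+', value) if term]


print(split_filter('one and two'))  # ['one', 'and', 'two']
print(split_filter('one,or,two'))   # ['one', 'or', 'two']
```

In the view this would be called as split_filter(request.GET.get('filter', '')).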

Related

Tornado - What is the difference between RequestHandler's get_argument(), get_query_argument() and get_body_argument()?

When to use RequestHandler.get_argument(), RequestHandler.get_query_argument() and RequestHandler.get_body_argument()?
What is the use-case for each of them?
Also what does the request.body and request.argument do in these cases? Which are to be used in which scenarios?
And, is there a request.query or something similar too?

Most HTTP requests store extra parameters (say, form values) in one of two places: the URL (in the form of a ?foo=bar&spam=eggs query string), or in the request body (when using a POST request and either the application/x-www-form-urlencoded or multipart/form-data mime type).
RequestHandler.get_query_argument() looks for URL parameters, while RequestHandler.get_body_argument() lets you retrieve parameters set in the POST body. The RequestHandler.get_argument() method retrieves either a body or a URL parameter (in that order).
Use RequestHandler.get_argument() when you explicitly don't care where the parameter comes from and your endpoint supports both GET and POST parameters. Otherwise, use one of the other methods, to keep it explicit where your parameters come from.
The RequestHandler.get_*_argument methods use the request.body_arguments and request.query_arguments values (with request.arguments being their aggregate), decoded to Unicode. request.body is the undecoded, unparsed raw request body; and yes, there is an equivalent self.request.query containing the query string from the URL.
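To make the relationship between the three dictionaries concrete, here is a standard-library sketch of how a form-encoded POST could be split. It is not Tornado's implementation and split_arguments is a hypothetical helper; it only mirrors the idea that request.arguments aggregates the query and body arguments (Tornado itself stores the values as bytes). As far as I know, the single-value getters return the last value when a name repeats, which is why the body value comes out ahead of the query value:

```python
from urllib.parse import parse_qs


def split_arguments(query, body):
    """Hypothetical helper mirroring query_arguments / body_arguments / arguments.

    Illustration only: not Tornado's code, and it uses str instead of bytes.
    """
    query_arguments = parse_qs(query, keep_blank_values=True)
    body_arguments = parse_qs(body, keep_blank_values=True)
    # Aggregate: start from the query values, then append body values per key.
    arguments = {key: list(values) for key, values in query_arguments.items()}
    for key, values in body_arguments.items():
        arguments.setdefault(key, []).extend(values)
    return query_arguments, body_arguments, arguments


# e.g. POST /item?id=1&name=url with the form body name=form
query_args, body_args, all_args = split_arguments('id=1&name=url', 'name=form')
print(query_args['name'])  # ['url']          (what get_query_argument looks at)
print(body_args['name'])   # ['form']         (what get_body_argument looks at)
print(all_args['name'])    # ['url', 'form']  (what get_argument looks at)
```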

Escaping characters for instance query matching in webpy

(The title may be in error here, but I believe that the problem is related to escaping characters)
I'm using webpy to create a VERY simple todo list using peewee with Sqlite to store simple, user submitted todo list items, such as "do my taxes" or "don't forget to interact with people", etc.
What I've noticed is that the DELETE request fails on certain inputs that contain specific symbols. For example, while I can add the following entries to my Sqlite database that contains all the user input, I cannot DELETE them:
what?
test#
test & test
This is a test?
Any other user input with any other symbols I'm able to DELETE with no issues. Here's the webpy error message I get in the browser when I try to DELETE the inputs list above:
<class 'peewee.UserInfoDoesNotExist'> at /del/test
Instance matching query does not exist: SQL: SELECT "t1"."id", "t1"."title" FROM "userinfo" AS t1 WHERE ("t1"."title" = ?) PARAMS: [u'test']
Python /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/peewee.py in get, line 2598
Web POST http://0.0.0.0:7700/del/test
When I view the database file (called todoUserList.db) in sqlitebrowser, I can see that these entries do exist with the symbols, they're all there.
In my main webpy app script, I'm using a regex to search through the db to make a DELETE request, it looks like this:
urls = (
    '/', 'Index',
    '/del/(.*?)', 'Delete'
)
I've tried variations of the regex, such as '/del/(.*)', but still get the same error, so I don't think that's the problem.
Given the error message above, is webpy not "seeing" certain symbols in the user input because they're not being escaped properly?
Confused as to why it seems to only happen with the select symbols listed above.

Depending on how the URL escaping works, it could be an issue in particular with how "?" and "&" are interpreted by the browser: in a typical GET-style request, & and ? are special characters used to separate query string parameters. Likewise, # marks the start of a fragment, which the browser never sends to the server.

Instead of passing those in as part of the URL path itself, you should pass them in as an escaped query string. As far as I know, unescaped values like these will not survive intact as part of a URL path. If they are escaped and put in the query string (or POST body) you'll be fine, though.
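The truncation can be reproduced with the standard library's URL parser, and urlencode produces the kind of escaped query string this answer recommends. A sketch: the /del?title=... URL shape and the delete_url helper are hypothetical (the route in the question would need to change to match), while the host and port come from the error message:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = 'http://0.0.0.0:7700'

# Unescaped, '?' starts the query string and '#' starts the fragment,
# so the '/del/(.*)' route only ever sees the text before them.
print(urlsplit(BASE + '/del/what?').path)  # /del/what
print(urlsplit(BASE + '/del/test#').path)  # /del/test


def delete_url(title):
    """Hypothetical helper: put the title in an escaped query string instead."""
    return BASE + '/del?' + urlencode({'title': title})


url = delete_url('test & test')
print(url)  # http://0.0.0.0:7700/del?title=test+%26+test

# Decoding the query string (as a framework's input parser would) recovers the title.
print(parse_qs(urlsplit(url).query)['title'][0])  # test & test
```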

Representation of python dictionaries with unicode in database queries

I have a problem that I would like to know how to efficiently tackle.
I have data that is JSON-formatted (used with dumps / loads) and contains unicode.
This is part of a protocol implemented with JSON to send messages. So messages will be sent as strings and then loaded into python dictionaries. This means that the representation, as a python dictionary, afterwards will look something like:
{u"mykey": u"myVal"}
It is no problem in itself for the system to handle such structures, but the problem arises when I make a database query to store this structure.
I'm using pyOrient towards OrientDB. The command ends up something like:
"CREATE VERTEX TestVertex SET data = {u'mykey': u'myVal'}"
Which will end up in the data field getting the following values in OrientDB:
{'_NOT_PARSED_': '_NOT_PARSED_'}
I'm assuming this problem relates to other cases as well when you wish to make a query or somehow represent a data object containing unicode.
How could I efficiently get a representation of this data, of arbitrary depth, to be able to use it in a query?
To clarify even more, this is the string the db expects:
"CREATE VERTEX TestVertex SET data = {'mykey': 'myVal'}"
If I'm simply stating the wrong problem/question and should handle it some other way, I'm very much open to suggestions. But what I want to achieve is to have an efficient way to use python2.7 to build a db-query towards orientdb (using pyorient) that specifies an arbitrary data structure. The data property being set is of the OrientDB type EMBEDDEDMAP.
Any help greatly appreciated.
EDIT1:
Clarifying that the first code block shows the object as a dict AFTER it has been dumped / loaded with json, to avoid confusion.

Dargolith:
OK, based on your last response it seems you are simply looking for code that will dump a Python expression in a way that lets you control how unicode and other data types print. Here is a very simple function that provides this control. There are ways to make this function more efficient (for example, by using a string buffer rather than doing all of the recursive string concatenation happening here). Still, this is a very simple function, and as it stands its execution time is probably still dominated by your DB lookup.
As you can see in each of the 'if' statements, you have full control of how each data type prints.
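def expr_to_str(thing):
    # Mappings (anything with .keys()) print as {key:value, ...}
    if hasattr(thing, 'keys'):
        pairs = ['%s:%s' % (expr_to_str(k), expr_to_str(v)) for k, v in thing.iteritems()]
        return '{%s}' % ', '.join(pairs)
    # Lists (anything supporting slice assignment) print as [a, b, ...]
    if hasattr(thing, '__setslice__'):
        parts = [expr_to_str(ele) for ele in thing]
        return '[%s]' % (', '.join(parts),)
    # str and unicode both print single-quoted, without the u prefix
    # (Python 2: str() raises UnicodeEncodeError on non-ASCII unicode)
    if isinstance(thing, basestring):
        return "'%s'" % (str(thing),)
    # Numbers and everything else fall back to str()
    return str(thing)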
print "dumped: %s" % expr_to_str({'one': 33, 'two': [u'unicode', 'just a str', 44.44, {'hash': 'here'}]})
outputs:
dumped: {'two':['unicode', 'just a str', 44.44, {'hash':'here'}], 'one':33}
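The string-buffer variant mentioned above might look like this in Python 3. It is a sketch, not the answerer's code: expr_to_str_buffered is a hypothetical name, it writes every piece into a single io.StringIO instead of building intermediate strings, and it keeps the same quoting rules (strings single-quoted without escaping, everything else via str()):

```python
import io


def expr_to_str_buffered(thing):
    """Hypothetical Python 3 variant of expr_to_str that writes into one buffer."""
    buf = io.StringIO()

    def write(obj):
        if hasattr(obj, 'keys'):
            # Mappings: {key:value, key:value}
            buf.write('{')
            for i, (k, v) in enumerate(obj.items()):
                if i:
                    buf.write(', ')
                write(k)
                buf.write(':')
                write(v)
            buf.write('}')
        elif isinstance(obj, list):
            # Lists: [a, b, c]
            buf.write('[')
            for i, ele in enumerate(obj):
                if i:
                    buf.write(', ')
                write(ele)
            buf.write(']')
        elif isinstance(obj, str):
            # Strings: single-quoted, no u prefix (and no escaping of quotes)
            buf.write("'%s'" % obj)
        else:
            buf.write(str(obj))

    write(thing)
    return buf.getvalue()


print(expr_to_str_buffered({'one': 33, 'two': ['unicode', 'just a str', 44.44, {'hash': 'here'}]}))
```

Because Python 3.7+ dicts keep insertion order, the keys come out in the order they were written, unlike the Python 2 output above.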

I went on to use json.dumps() as sobolevn suggested in the comment. I didn't think of that one at first since I wasn't really using json in the driver. It turned out however that json.dumps() provided exactly the formats I needed on all the data types I use. Some examples:
>>> json.dumps('test')
'"test"'
>>> json.dumps(['test1', 'test2'])
'["test1", "test2"]'
>>> json.dumps([u'test1', u'test2'])
'["test1", "test2"]'
>>> json.dumps({u'key1': u'val1', u'key2': [u'val21', 'val22', 1]})
'{"key2": ["val21", "val22", 1], "key1": "val1"}'
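Applied to the query from the question, this approach amounts to a one-line formatter. A Python 3 sketch; create_vertex_query is a hypothetical helper, it assumes (as this answer reports) that OrientDB accepts JSON's double-quoted strings in an EMBEDDEDMAP literal, and it does not escape the class name, so it should not be fed untrusted input:

```python
import json


def create_vertex_query(cls_name, data):
    """Hypothetical helper: build a CREATE VERTEX command from JSON-serializable data."""
    # json.dumps handles arbitrary nesting of dicts, lists, numbers, booleans and None.
    return 'CREATE VERTEX %s SET data = %s' % (cls_name, json.dumps(data))


print(create_vertex_query('TestVertex', {'mykey': 'myVal'}))
# CREATE VERTEX TestVertex SET data = {"mykey": "myVal"}
```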
If you need to take more control of the format, quotes or other things regarding this conversion, see the reply by Dan Oblinger.

Catch full URL in python

I'm developing an application using Bottle. How do I get the full query string when I get a GET request?
I don't want to fetch individual parameters like:
param_a = request.GET.get("a","")
as I don't want to fix the number of parameters in the URL.
How can I get the full query string of the requested URL?

You can use the attribute request.query_string to get the whole query string.

Use request.query or request.query.getall(key) if you have more than one value for a single key.
For example, request.query.a will return the param_a you wanted, request.query.b will return the parameter for b, and so on.
If you only want the query string itself, you can use @halex's answer.
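To see what the two answers give you without naming parameters up front, here is a standard-library sketch. The query string is a made-up example standing in for request.query_string; grouping repeated keys into lists is similar in spirit to request.query.getall(key), but this is not Bottle's implementation:

```python
from urllib.parse import parse_qsl

query_string = 'a=1&b=2&b=3&c='  # made-up stand-in for request.query_string

# Every parameter, in order, without knowing the names in advance.
pairs = parse_qsl(query_string, keep_blank_values=True)
print(pairs)  # [('a', '1'), ('b', '2'), ('b', '3'), ('c', '')]

# Group repeated keys, similar in spirit to request.query.getall(key).
params = {}
for key, value in pairs:
    params.setdefault(key, []).append(value)
print(params['b'])  # ['2', '3']
```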

json object as get parameter

I'm writing an API for a mongo database. I need to pass a JSON object as a GET parameter:
example.com/api/obj/list/1/?find={"foo":"bar"}
How should I organize this better?
I thought about using JSON-like objects without quotes and spaces, for example:
{$or:[{a:foo+bar},{b:2}]}
So are there any tools to parse it in Python/Django?

It should be fine as long as the JSON objects aren't too big, they don't contain sensitive data (it sucks to see your password in your browser history) and you URL-escape them.
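URL-escaping the whole object can be left to urlencode. A sketch using the example URL from the question; the compact separators only keep the result shorter. In Django the view would, as far as I know, receive the already-unescaped text in request.GET['find'] and only need json.loads:

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

find = {"foo": "bar"}
# Serialize compactly, then let urlencode percent-escape every reserved character.
query = urlencode({'find': json.dumps(find, separators=(',', ':'))})
url = 'http://example.com/api/obj/list/1/?' + query
print(url)  # http://example.com/api/obj/list/1/?find=%7B%22foo%22%3A%22bar%22%7D

# Server side: undo the URL escaping, then parse the JSON.
decoded = json.loads(parse_qs(urlsplit(url).query)['find'][0])
print(decoded)  # {'foo': 'bar'}
```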
Unfortunately, you have to take shortcuts if you want to have a human-readable JSON parameter. All JSON brackets ({, }, [, ]) are recommended for escaping. You don't have to escape them, but you are taking a risk if you don't. More annoying is the :, which is ubiquitous in JSON and must be escaped.
If you want human-readable query strings, then the sensible solution is to encode all query parameters explicitly. A compromise that might work quite well is to unpack the top-level JSON object into explicit query parameters, each of which remains JSON-encoded. Going a small step further, you could drop any top-level delimiters that remain, e.g.:
JSON: {"foo":"bar", "items":[1, 2, 3], "staff":{"id":432, "first":"John", "last":"Doe"}}
Query: foo=bar&items=1,2,3&staff="id"%3A432,"first"%3A"John","last"%3A"Doe"
Since you know that foo is a string, items is an array and staff is an object, you can rehydrate the JSON syntax correctly before sending the lot to a JSON parser.
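The compromise of one JSON-encoded query parameter per top-level key can be sketched as follows. encode_params and decode_params are hypothetical helpers; this version keeps each value's own JSON delimiters (so strings stay quoted), because dropping them as in the example above would need per-key knowledge of the expected types:

```python
import json
from urllib.parse import parse_qsl, urlencode


def encode_params(obj):
    """Hypothetical helper: one query parameter per top-level key, each value JSON-encoded."""
    return urlencode({key: json.dumps(value, separators=(',', ':')) for key, value in obj.items()})


def decode_params(query):
    """Hypothetical inverse of encode_params: JSON-decode each parameter value."""
    return {key: json.loads(value) for key, value in parse_qsl(query)}


data = {"foo": "bar", "items": [1, 2, 3], "staff": {"id": 432, "first": "John", "last": "Doe"}}
query = encode_params(data)
print(query)
print(decode_params(query) == data)  # True
```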

Develop Reference
