I'm writing an API for a Mongo database. I need to pass a JSON object as a GET parameter:
example.com/api/obj/list/1/?find={"foo":"bar"}
How should I organize this better?
I thought about using JSON-like objects without quotes and spaces, for example:
{$or:[{a:foo+bar},{b:2}]}
So are there any tools to parse it in Python/Django?
It should be fine as long as the JSON objects aren't too big, don't contain sensitive data (it sucks to see your password in your browser history), and you URL-escape them.
Unfortunately, you have to take shortcuts if you want a human-readable JSON parameter. Escaping all of the JSON brackets ({, }, [, ]) is recommended; you can get away without escaping them, but you are taking a risk if you don't. More annoying is the colon (:), which is ubiquitous in JSON and must be escaped.
If you want human-readable query strings, then the sensible solution is to encode all query parameters explicitly. A compromise that might work quite well is to unpack the top-level JSON object into explicit query parameters, each of which remains JSON-encoded. Going a small step further, you could drop any top-level delimiters that remain, e.g.:
JSON: {"foo":"bar", "items":[1, 2, 3], "staff":{"id":432, "first":"John", "last":"Doe"}}
Query: foo=bar&items=1,2,3&staff="id"%3A432,"first"%3A"John","last"%3A"Doe"
Since you know that foo is a string, items is an array and staff is an object, you can rehydrate the JSON syntax correctly before sending the lot to a JSON parser.
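As a sketch of the baseline approach (URL-escaping the whole JSON document; the find parameter name comes from the question):

```python
import json
from urllib.parse import parse_qs, quote

find = {"foo": "bar"}

# Client side: serialize compactly and URL-escape the whole document.
query = "find=" + quote(json.dumps(find, separators=(",", ":")))
print(query)  # find=%7B%22foo%22%3A%22bar%22%7D

# Server side: the framework (or parse_qs) undoes the escaping,
# and json.loads restores the original structure.
restored = json.loads(parse_qs(query)["find"][0])
assert restored == find
```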
Related
I'm having a hard time understanding what is going on with this Walmart API, and I can't seem to iterate through the key/value pairs the way I'd like. I get different errors depending on how I attack the problem.
import requests
import json
import urllib
response=requests.get("https://grocery.walmart.com/v0.1/api/stores/4104/departments/1256653758154/aisles/1256653758260/products?count=60&start=0")
info = json.loads(response.text)
print(info)
I'm not sure if I'm playing with a dictionary or a JSON object.
I'm thrown off because the API itself has no quotes over key/val.
When I do a json.loads it comes in but only comes in with single quotes.
I've tried going at it with for-loops but can only traverse the top layer and nothing else. My overall goal is to retrieve the info from the API link, turn it into JSON, and be able to grab whichever key/value I need from it.
I'm not sure if I'm playing with a dictionary or a JSON object.
Python has no concept of a "JSON Object". It's a dictionary.
I'm thrown off because the API itself has no quotes over key/val.
Yes, it does:
{"aisleName":"Organic Dairy, Eggs & Meat","productCount":17,"products":[{"data":
When I do a json.loads it comes in but only comes in with single quotes
Because it's a Python dictionary, and the repr() of a dict uses single quotes.
Try print(info['aisleName']) for example
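To illustrate, here is a trimmed-down sample shaped like the snippet above (the productCount and the inner structure of the products entries are assumptions, not the real Walmart response):

```python
import json

# A trimmed-down sample shaped like the API snippet above; the "products"
# entries are made up for illustration.
payload = ('{"aisleName": "Organic Dairy, Eggs & Meat", "productCount": 2, '
           '"products": [{"data": {"name": "Milk"}}, {"data": {"name": "Eggs"}}]}')

info = json.loads(payload)        # now a plain Python dict
print(info["aisleName"])          # Organic Dairy, Eggs & Meat

# Nested values are reached by indexing each layer explicitly.
for product in info["products"]:
    print(product["data"]["name"])
```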
I'm working on an API written in Python that accepts JSON payloads from clients, applies some validation and stores the payloads in MongoDB so that they can be processed asynchronously.
However, I'm running into some trouble with payloads that (legitimately) include keys that start with $ and/or contain a . character. According to the MongoDB documentation, my best bet is to escape these characters:
In some cases, you may wish to build a BSON object with a user-provided key. In these situations, keys will need to substitute the reserved $ and . characters. Any character is sufficient, but consider using the Unicode full width equivalents: U+FF04 (i.e. "＄") and U+FF0E (i.e. "．").
Fair enough, but here's where it gets interesting. I'd like this process to be transparent to the application, so:
The keys should be unescaped when retrieving the documents...
...but only keys that needed to be escaped in the first place.
For example, suppose a (nefarious) user sends a JSON payload that includes a key like \uff04mixed.chars. When the application gets this document from the storage backend, this key should be converted back into \uff04mixed.chars, not $mixed.chars.
My primary concern here is information leakage; I don't want anybody to discover that the application requires special treatment for $ and . characters. The bad guys probably know how to exploit MongoDB far better than I know how to secure it, and I don't want to take any chances.
Here's the approach I ended up going with:
Before inserting a document into Mongo, run it through a SONManipulator that searches for and escapes any illegal keys in the document.
The original keys get stored as a separate attribute in the document so that we can restore them later.
After retrieving a document from Mongo, run it through the SONManipulator to extract the original keys and restore them.
Here's an abbreviated example:
# Example of a document with naughty keys.
document = {
'$foo': 'bar',
'$baz': 'luhrmann'
}
##
# Before inserting the document, we must first run it through our
# SONManipulator.
manipulator = KeyEscaper()
escaped = manipulator.transform_incoming(document, collection.name)
# Now we can insert the document.
document_id = collection.insert_one(escaped).inserted_id
##
# Later, we retrieve the document.
raw = collection.find_one({'_id': document_id})
# Run the document through our KeyEscaper to restore the original
# keys.
unescaped = manipulator.transform_outgoing(raw, collection.name)
assert unescaped == document
The actual document stored in MongoDB looks like this:
{
"_id": ObjectId('582cebe5cd9b344c814d98e3'),
"__escaped__1": "luhrmann",
"__escaped__0": "bar",
"__escaped__": {
"__escaped__1": ["$baz", {}],
"__escaped__0": ["$foo", {}]
}
}
Note the __escaped__ attribute that contains the original keys so that they can be restored when the document is retrieved.
This makes querying against the escaped keys a little tricky, but that's infinitely preferable to not being able to store the document in the first place.
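For illustration, a stripped-down version of the escaping step might look like the sketch below. The function names and the bookkeeping-free design are mine, not the gist's. Note that without the __escaped__ bookkeeping shown above, a naive round trip would also "unescape" keys that legitimately contained the full-width characters, which is exactly the information leak the question worries about.

```python
# Hypothetical, stripped-down escaping: replace '$' and '.' in keys with
# their Unicode full-width equivalents, recursing into nested containers.
ESCAPES = {'$': '\uff04', '.': '\uff0e'}
UNESCAPES = {v: k for k, v in ESCAPES.items()}

def _map_keys(doc, table):
    # Rewrite every dict key, character by character, at any depth.
    if isinstance(doc, dict):
        return {''.join(table.get(c, c) for c in key): _map_keys(value, table)
                for key, value in doc.items()}
    if isinstance(doc, list):
        return [_map_keys(item, table) for item in doc]
    return doc

def escape_keys(doc):
    return _map_keys(doc, ESCAPES)

def unescape_keys(doc):
    return _map_keys(doc, UNESCAPES)

doc = {'$foo': 'bar', 'nested.key': {'$baz': 'luhrmann'}}
assert escape_keys(doc)['\uff04foo'] == 'bar'
assert unescape_keys(escape_keys(doc)) == doc
```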
Full code with unit tests and example usage:
https://gist.github.com/todofixthis/79a2f213989a3584211e49bfba582b40
Some background first: I have a few rather simple data structures which are persisted as json files on disk. These json files are shared between applications of different languages and different environments (like web frontend and data manipulation tools).
For each of the files I want to create a Python "POPO" (Plain Old Python Object), and a corresponding data mapper class for each item should implement some simple CRUD-like behavior (e.g. save will serialize the class and store it as a JSON file on disk).
I think a simple mapper (which only knows about basic types) will work. However, I'm concerned about security: some of the JSON files will be generated by a web frontend, so there's a possible security risk if a user feeds me bad JSON.
Finally, here is the simple mapping code (found at How to convert JSON data into a Python object):
class User(object):
def __init__(self, name, username):
self.name = name
self.username = username
import json
j = json.loads(your_json)
u = User(**j)
What possible security issues do you see?
NB: I'm new to Python.
Edit: Thanks all for your comments. I've found that one of my JSON files contains two arrays, each holding a map. Unfortunately this approach starts to get cumbersome as I accumulate more of these.
I'm extending the question to mapping a json input to a recordtype. The original code is from here: https://stackoverflow.com/a/15882054/1708349.
Since I need mutable objects, I'd change it to use a namedlist instead of a namedtuple:
import json
from namedlist import namedlist
data = '{"name": "John Smith", "hometown": {"name": "New York", "id": 123}}'
# Parse JSON into an object with attributes corresponding to dict keys.
x = json.loads(data, object_hook=lambda d: namedlist('X', d.keys())(*d.values()))
print x.name, x.hometown.name, x.hometown.id
Is it still safe?
There's not much that can go wrong in the first case. You're limiting what arguments can be provided, and it's easy to add validation/conversion right after loading from JSON.
The second example is a bit worse. Packing things into records like this will not help you in any way. You don't inherit any methods, because each type you define is new. You can't compare values easily, because dicts are not ordered. You don't know if you have all arguments handled, or if there is some extra data, which can lead to hidden problems later.
So in summary: with User(**data), you're pretty safe. With namedlist there's room for ambiguity, and you don't really gain anything (compared to bare, parsed JSON).
If you blindly accept a user's JSON input without a sanity check, you are at risk of becoming a JSON injection victim.
See a detailed explanation of the JSON injection attack here: https://www.acunetix.com/blog/web-security-zone/what-are-json-injections/
Besides the security vulnerability, parsing JSON into a Python object this way is not type-safe.
With your example of the User class, I would assume you expect both fields name and username to be strings. What if the JSON input is like this:
{
"name": "my name",
"username": 1
}
j = json.loads(your_json)
u = User(**j)
type(u.username) # int
You end up with an object of an unexpected type.
One way to ensure type safety is to use JSON Schema to validate the input JSON. More about JSON Schema: https://json-schema.org/
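As a minimal illustration of that idea without pulling in a schema library, a hand-rolled type check might look like this (the EXPECTED_TYPES mapping and the load_user helper are hypothetical names, not part of the question's code):

```python
import json

# Hypothetical field -> type mapping; reject anything that doesn't match
# before the object is ever constructed.
EXPECTED_TYPES = {'name': str, 'username': str}

def load_user(raw_json):
    data = json.loads(raw_json)
    if set(data) != set(EXPECTED_TYPES):
        raise ValueError('unexpected or missing fields: %s' % sorted(data))
    for field, expected in EXPECTED_TYPES.items():
        if not isinstance(data[field], expected):
            raise TypeError('%s must be %s' % (field, expected.__name__))
    return data

load_user('{"name": "my name", "username": "bob"}')  # passes
try:
    load_user('{"name": "my name", "username": 1}')
except TypeError as exc:
    print(exc)  # username must be str
```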
I have a problem that I would like to know how to efficiently tackle.
I have data that is JSON-formatted (used with dumps / loads) and contains unicode.
This is part of a protocol implemented with JSON to send messages, so messages will be sent as strings and then loaded into Python dictionaries. This means that the representation as a Python dictionary will afterwards look something like:
{u"mykey": u"myVal"}
Handling such structures is no problem in itself for the system, but trouble arises when I make a database query to store one of them.
I'm using pyOrient towards OrientDB. The command ends up something like:
"CREATE VERTEX TestVertex SET data = {u'mykey': u'myVal'}"
Which will end up in the data field getting the following values in OrientDB:
{'_NOT_PARSED_': '_NOT_PARSED_'}
I'm assuming this problem relates to other cases as well when you wish to make a query or somehow represent a data object containing unicode.
How could I efficiently get a representation of this data, of arbitrary depth, to be able to use it in a query?
To clarify even more, this is the string the db expects:
"CREATE VERTEX TestVertex SET data = {'mykey': 'myVal'}"
If I'm simply stating the wrong problem/question and should handle it some other way, I'm very much open to suggestions. What I want to achieve is an efficient way, using Python 2.7, to build a DB query for OrientDB (via pyorient) that specifies an arbitrary data structure. The data property being set is of the OrientDB type EMBEDDEDMAP.
Any help greatly appreciated.
EDIT1:
To avoid confusion: the first code block shows the object as a dict AFTER being dumped/loaded with json.
Dargolith:
OK, based on your last response it seems you are simply looking for code that will dump a Python expression in a way that lets you control how unicode and other data types print. Here is a very simple function that provides this control. There are ways to make it more efficient (for example, by using a string buffer rather than all of the recursive string concatenation happening here), but as it stands its execution time is probably still dominated by your DB lookup.
As you can see in each of the 'if' statements, you have full control of how each data type prints.
def expr_to_str(thing):
if hasattr(thing, 'keys'):
pairs = ['%s:%s' % (expr_to_str(k), expr_to_str(v)) for k, v in thing.iteritems()]
return '{%s}' % ', '.join(pairs)
if hasattr(thing, '__setslice__'):
parts = [expr_to_str(ele) for ele in thing]
return '[%s]' % (', '.join(parts),)
if isinstance(thing, basestring):
return "'%s'" % (str(thing),)
return str(thing)
print "dumped: %s" % expr_to_str({'one': 33, 'two': [u'unicode', 'just a str', 44.44, {'hash': 'here'}]})
outputs:
dumped: {'two':['unicode', 'just a str', 44.44, {'hash':'here'}], 'one':33}
I went on to use json.dumps() as sobolevn suggested in the comments. I didn't think of it at first since I wasn't really using JSON in the driver. It turned out, however, that json.dumps() provided exactly the format I needed for all the data types I use. Some examples:
>>> json.dumps('test')
'"test"'
>>> json.dumps(['test1', 'test2'])
'["test1", "test2"]'
>>> json.dumps([u'test1', u'test2'])
'["test1", "test2"]'
>>> json.dumps({u'key1': u'val1', u'key2': [u'val21', 'val22', 1]})
'{"key2": ["val21", "val22", 1], "key1": "val1"}'
If you need to take more control of the format, quotes or other things regarding this conversion, see the reply by Dan Oblinger.
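Putting it together, the query might be assembled as in this sketch (per this answer, OrientDB accepted json.dumps's double-quoted output even though the question shows single quotes):

```python
import json

data = {u'mykey': u'myVal'}

# json.dumps drops the u'' prefixes and produces plain, quoted JSON text,
# which can be embedded directly in the statement.
query = 'CREATE VERTEX TestVertex SET data = %s' % json.dumps(data)
print(query)  # CREATE VERTEX TestVertex SET data = {"mykey": "myVal"}
```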
I have a Django View that uses a query parameter to do some content filtering. Something like this:
/page/?filter=one+and+two
/page/?filter=one,or,two
I have noticed that Django converts the + to a space (request.GET.get('filter') returns one and two), and I'm OK with that. I just need to adjust the split() function I use in the View accordingly.
But...
When I try to test this View, and I call:
from django.test import Client
client = Client()
client.get('/page/', {'filter': 'one+and+two'})
request.GET.get('filter') returns one+and+two: with plus signs and no spaces. Why is this?
I would like to think that Client().get() mimics the browser behaviour, so what I would like to understand is why calling client.get('/page/', {'filter': 'one+and+two'}) is not like browsing to /page/?filter=one+and+two. For testing purposes it should be the same in my opinion, and in both cases the view should receive a consistent value for filter: be it with + or with spaces.
What I don't get is why there are two different behaviours.
The plusses in a query string are the normal and correct encoding for spaces. This is a historical artifact; the form value encoding for URLs differs ever so slightly from encoding other elements in the URL.
Django is responsible for decoding the query string back to key-value pairs; that decoding includes decoding the URL percent encoding, where a + is decoded to a space.
When using the test client, you pass in unencoded data, so you'd use:
client.get('/page/', {'filter': 'one and two'})
This is then encoded to a query string for you, and subsequently decoded again when you try and access the parameters.
This is because the test client (actually, RequestFactory) runs django.utils.http.urlencode on your data, resulting in filter=one%2Band%2Btwo. Similarly, if you were to use {'filter': 'one and two'}, it would be converted to filter=one%20and%20two, and would come into your view with spaces.
If you really absolutely must have the pluses in your query string, I believe it may be possible to manually override the query string with something like: client.get('/page/', QUERY_STRING='filter=one+and+two'), but that just seems unnecessary and ugly in my opinion.
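The same round trip can be demonstrated with the standard library alone (a sketch; Django's django.utils.http.urlencode percent-encodes slightly differently, as noted above, but the principle is identical):

```python
from urllib.parse import parse_qs, urlencode

# Encoding: a space in a form value becomes a plus sign...
qs = urlencode({'filter': 'one and two'})
print(qs)                      # filter=one+and+two

# ...and decoding turns the plus sign back into a space.
print(parse_qs(qs)['filter'])  # ['one and two']

# A literal '+' in the input gets percent-encoded, so it survives the
# round trip as a plus sign, not a space.
print(urlencode({'filter': 'one+and+two'}))  # filter=one%2Band%2Btwo
```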