Ordered attributes in retrieving documents from mongo with pymongo? - python

I store a document like the following in mongo:
{
name: Somename,
profile: Someprofile
}
When I use find_one(), I get a result like:
{
profile: Someprofile,
_id: 35353432326532(random mongo id),
name: Somename
}
Is there some way in Python, before or after calling find_one, to get the result as a JSON string that is ordered like:
{
_id: 35353432326532(random mongo id),
name: Somename,
profile: Someprofile
}
I tried using an OrderedDict like below, but it does not seem to help.
somedocument = db.mycollection
theordereddict = OrderedDict(data_layer.find_one())
print str(theordereddict)
How do I get my output string in the right order in regards to attributes? Is this order dictated by something else before I even insert the document into the database?

collections.OrderedDict doesn't sort keys; it only preserves insertion order. You need to insert the keys in the order you want to retrieve them.
import collections
d = data_layer.find_one()
def key_function(item):
    """This defines the sort order for the sorted builtin"""
    return item[0]
sorted_dict = collections.OrderedDict((k, v) for k, v in sorted(d.items(),
                                                                key=key_function))
That said, it looks like print str(sorted_dict) doesn't give you the output you want. I think you need to build your sorted string representation manually. E.g.:
s = "{" + ",".join(["%s:%s" % (k, v) for k, v in sorted(d.items(), key=key_function)]) + "}"
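As a minimal sketch of the point above (written for Python 3, with a plain dict standing in for the document returned by find_one(); the sample values come from the question and the variable names are only illustrative), sorting the items before building the OrderedDict gives the key order the question asks for:

```python
from collections import OrderedDict

# Stand-in for the document returned by find_one() (assumed sample data).
doc = {"profile": "Someprofile", "_id": "35353432326532", "name": "Somename"}

# OrderedDict keeps insertion order, so insert the items already sorted by key.
ordered = OrderedDict(sorted(doc.items()))

# Build the string representation by hand, one "key: value" pair at a time.
text = "{" + ", ".join("%s: %s" % (k, v) for k, v in ordered.items()) + "}"
print(text)
```

Here "_id" sorts first only because "_" precedes lowercase letters in ASCII; a different ordering would need a custom key function.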

Basically the same as @Mike Steder's answer, but maybe less fancy and clearer:
import json
from collections import OrderedDict
theordereddict = OrderedDict()
d = data_layer.find_one()
for k in sorted(d.keys()):
    theordereddict[k] = d[k]
json.dumps(theordereddict)
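As an aside, json.dumps can also sort the keys itself through its sort_keys parameter, which avoids building the OrderedDict by hand. A minimal Python 3 sketch, again with a plain dict standing in for the find_one() result; default=str is an assumption added here so that a value json cannot serialize on its own (such as a Mongo ObjectId) would be written as a string:

```python
import json

# Stand-in for the document returned by find_one() (assumed sample data).
doc = {"profile": "Someprofile", "_id": "35353432326532", "name": "Somename"}

# sort_keys=True emits the keys in sorted order; default=str is a fallback
# for values that json cannot serialize by itself.
json_text = json.dumps(doc, sort_keys=True, default=str)
print(json_text)
```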

Related

Formatting dicts and nested dicts

Amazon's DynamoDB requires specially formatted JSON when inserting items into the database.
I have a function that takes a dictionary and transforms values into a nested dict formatted for insertion; the value is transformed into a nested dict where the nested key is the value's data type.
For example, input like {'id':1, 'firstName':'joe'} would be transformed to {'id': {'N':1}, 'firstName': {'S':'joe'}}
This is currently successful with this function:
type_map = {
    str:'S', unicode:'S', dict:'M',
    float:'N', int:'N', bool:'BOOL'
}
def format_row(self, row):
    """ Accepts a dict, formats for DynamoDB insertion. """
    formatted = {}
    for k,v in row.iteritems():
        type_dict = {}
        type_dict[ self.type_map[type(v)] ] = v
        formatted[k] = type_dict
    return formatted
I need to modify this function to handle values that might be dicts.
So, for example:
{
'id':1,
'a':{'x':'hey', 'y':1},
'b':{'x':1}
}
Should be transformed to:
{
'id': {'N':1},
'a':{'M': {'x': {'S':'hey'}, 'y':{'N':1}}},
'b': {'M': {'x': {'N':1}}}
}
I'm thinking the correct way to do this is to call the function from within itself, right?
Note: I'm using Python 2.7
What ultimately ended up working for me was the following function:
def format_row(self, row):
    """ Accepts a dict, formats for DynamoDB insertion. """
    formatted = {}
    for k,v in row.iteritems():
        if type(v) == dict:
            v = self.format_row(v)
            type_dict = {}
            type_dict['M'] = v
            formatted[k] = type_dict
        else:
            type_dict = {}
            type_dict[ self.type_map[type(v)] ] = v
            formatted[k] = type_dict
    return formatted
If anyone has a better way of doing this, please let me know!
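One possible tightening, offered only as a sketch: the same recursion written as a standalone Python 3 function (so items() replaces iteritems() and there is no unicode type). The module-level TYPE_MAP name and the dropped self parameter are assumptions made so the example runs on its own:

```python
# Maps Python types to DynamoDB type tags (dicts are handled separately below).
TYPE_MAP = {str: 'S', float: 'N', int: 'N', bool: 'BOOL'}

def format_row(row):
    """Accepts a dict, returns a copy formatted for DynamoDB insertion."""
    formatted = {}
    for key, value in row.items():
        if isinstance(value, dict):
            # Nested dicts become a map ('M') whose contents are formatted recursively.
            formatted[key] = {'M': format_row(value)}
        else:
            formatted[key] = {TYPE_MAP[type(value)]: value}
    return formatted

row = {'id': 1, 'a': {'x': 'hey', 'y': 1}, 'b': {'x': 1}}
print(format_row(row))
```

Looking the type up with type(value) rather than isinstance keeps bool values from being tagged as numbers, since bool is a subclass of int.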

json query that returns parent element and child data?

Given the following json:
{
"README.rst": {
"_status": {
"md5": "952ee56fa6ce36c752117e79cc381df8"
}
},
"docs/conf.py": {
"_status": {
"md5": "6e9c7d805a1d33f0719b14fe28554ab1"
}
}
}
is there a query language that can produce:
{
"README.rst": "952ee56fa6ce36c752117e79cc381df8",
"docs/conf.py": "6e9c7d805a1d33f0719b14fe28554ab1",
}
My best attempt so far with JMESPath (http://jmespath.org/) isn't very close:
>>> jmespath.search('*.*.md5[]', db)
['952ee56fa6ce36c752117e79cc381df8', '6e9c7d805a1d33f0719b14fe28554ab1']
I've gotten to the same point with ObjectPath (http://objectpath.org):
>>> t = Tree(db)
>>> list(t.execute('$..md5'))
['952ee56fa6ce36c752117e79cc381df8', '6e9c7d805a1d33f0719b14fe28554ab1']
I couldn't make any sense of JSONiq (do I really need to read a 105 page manual to do this?) This is my first time looking at json query languages..
Not sure why you want a query language; this is pretty easy:
def find_key(data, key="md5"):
    for k, v in data.items():
        if k == key:
            return v
        if isinstance(v, dict):
            result = find_key(v, key)
            if result is not None:
                return result
dict((k, find_key(v, "md5")) for k, v in json_result.items())
It's even easier if the value dict always has "_status" and "md5" as keys:
dict((k,v["_status"]["md5"]) for k,v in json_result.items())
Alternatively, I think you could do something like:
>>> t = Tree(db)
>>> dict(zip(t.execute("$."),t.execute('$..md5')))
although I don't know that it would match them up quite right ...
Here is the JSONiq code that does the job:
{|
for $key in keys($document)
return {
$key: $document.$key._status.md5
}
|}
You can execute it here with the Zorba engine.
If the 105-page manual you mention is the specification, I do not recommend reading it as a JSONiq user. I would rather advise reading tutorials or books online, which give a more gentle introduction.
In ObjectPath:
l = op.execute("[keys($.*), $..md5]")
you'll get:
[
[
"README.rst",
"docs/conf.py"
],
[
"952ee56fa6ce36c752117e79cc381df8",
"6e9c7d805a1d33f0719b14fe28554ab1"
]
]
then in Python:
dict(zip(l[0],l[1]))
to get:
{
'README.rst': '952ee56fa6ce36c752117e79cc381df8',
'docs/conf.py': '6e9c7d805a1d33f0719b14fe28554ab1'
}
Hope that helps. :)
PS. I'm using ObjectPath's keys() to show how to write a full query that works anywhere in the document, not only when the keys are at the root of the document.
PS2. I might add a new function so that it would look like: object([keys($.*), $..md5]). Shoot me a tweet at http://twitter.com/adriankal if you want that.
Missed the Python requirement, but if you are willing to call an external program, this will still work.
Please note that jq >= 1.5 is required for this to work.
# If single "key" $p[0] has multiple md5 keys, this will reduce the array to one key.
cat /tmp/test.json | \
jq-1.5 '[paths(has("md5")?) as $p | { ($p[0]): getpath($p)["md5"]}] | add '
# this will not create single object, but you'll see all key, md5 combinations
cat /tmp/test.json | \
jq-1.5 '[paths(has("md5")?) as $p | { ($p[0]): getpath($p)["md5"]}] '
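How it works: paths(has("md5")?) yields the path to every object that has an "md5" key; the ? suppresses errors (such as testing a scalar for a key). For each resulting path $p, the filter builds an object ({} around the pair) that maps the top-level key $p[0] to the md5 value. These objects are collected into an array (the [] surrounding the whole expression), which | add then merges into a single object.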
https://stedolan.github.io/jq/
A solution that implements a new query language:
import re
def keylist(db):
    "Return all the keys in db."
    def _keylist(db, prefix, res):
        if prefix is None:
            prefix = []
        for key, val in db.items():
            if isinstance(val, dict):
                _keylist(val, prefix + [key], res)
            else:
                res.append(prefix + [key])
    res = []
    _keylist(db, [], res)
    return ['::'.join(key) for key in res]
def get_key(db, key):
    "Get path and value from key."
    def _get_key(db, key, path):
        k = key[0]
        if len(key) == 1:
            return path + [k, db[k]]
        return _get_key(db[k], key[1:], path + [k])
    return _get_key(db, key, [])
def search(query, db):
    "Convert query to regex and use it to search key space."
    keys = keylist(db)
    query = query.replace('*', r'(?:.*?)')
    matching = [key for key in keys if re.match(query, key)]
    res = [get_key(db, key.split('::')) for key in matching]
    return dict(('::'.join(r[:-1]), r[-1]) for r in res)
which gives me something that's pretty close to the requirements:
>>> pprint.pprint(search("*::md5", db))
{'README.rst::_status::md5': '952ee56fa6ce36c752117e79cc381df8',
'docs/conf.py::_status::md5': '6e9c7d805a1d33f0719b14fe28554ab1'}
and a query language that looks like a glob/re hybrid (if we're making a new language, at least make it look familiar):
>>> pprint.pprint(search("docs*::md5", db))
{'docs/conf.py::_status::md5': '6e9c7d805a1d33f0719b14fe28554ab1'}
Since the data contains file paths, I've arbitrarily used :: as a path separator. (I'm pretty sure it doesn't handle the full JSON grammar yet, but that should be mostly grunt work.)
If your JSON is well structured, i.e. you are assured of having _status and md5 sub-elements, you could just load the JSON and use a list comprehension to pull out the items you're looking for.
>>> import json
>>> my_json = json.loads(json_string)
>>> print [(key, value['_status']['md5']) for key, value in my_json.iteritems()]
[(u'README.rst', u'952ee56fa6ce36c752117e79cc381df8'), (u'docs/conf.py', u'6e9c7d805a1d33f0719b14fe28554ab1')]
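The list of pairs above can be turned directly into the mapping the question asks for. A minimal Python 3 sketch, assuming every top-level entry has the _status and md5 sub-elements; the json_string literal reproduces the data from the question and md5_by_file is an illustrative name:

```python
import json

# Sample input copied from the question.
json_string = '''
{
  "README.rst": {"_status": {"md5": "952ee56fa6ce36c752117e79cc381df8"}},
  "docs/conf.py": {"_status": {"md5": "6e9c7d805a1d33f0719b14fe28554ab1"}}
}
'''

my_json = json.loads(json_string)

# Map each top-level key (the file name) straight to its nested md5 value.
md5_by_file = {name: entry["_status"]["md5"] for name, entry in my_json.items()}
print(md5_by_file)
```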

Python `dict` indexed by tuple: Getting a slice of the pie

Let's say I have
my_dict = {
("airport", "London"): "Heathrow",
("airport", "Tokyo"): "Narita",
("hipsters", "London"): "Soho"
}
What is an efficient (no scanning of all keys), yet elegant way to get all airports out of this dictionary, i.e. expected output ["Heathrow", "Narita"]. In databases that can index by tuples, it's usually possible to do something like
airports = my_dict.get(("airport",*))
(but usually only with the 'stars' sitting at the rightmost places in the tuple since the index usually is only stored in one order).
Since I imagine Python indexes dictionaries with tuple keys in a similar way (using the keys' inherent order), I imagine there might be a method I could use to slice the index this way?
Edit1: Added expected output
Edit2: Removed last phrase. Added '(no scanning of all keys)' to the conditions to make it clearer.
The way your data is currently organized doesn't allow efficient lookup - essentially you have to scan all the keys.
Dictionaries are hash tables behind the scenes, and the only way to access a value is to get the hash of the key - and for that, you need the whole key.
Use a nested hierarchy like this, so you can do a direct O(1) lookup:
my_dict = {
"airport": {
"London": "Heathrow",
"Tokyo": "Narita",
},
"hipsters": {
"London": "Soho"
}
}
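If the data already exists in the tuple-keyed form, the nested layout can be built once and reused. A minimal Python 3 sketch, assuming every key is a (category, city) pair; the names nested and airports are illustrative. Building the index still visits every key one time, but each later lookup by category is a single hash lookup:

```python
from collections import defaultdict

my_dict = {
    ("airport", "London"): "Heathrow",
    ("airport", "Tokyo"): "Narita",
    ("hipsters", "London"): "Soho",
}

# One pass over the flat dict builds a two-level index keyed by category.
nested = defaultdict(dict)
for (category, city), value in my_dict.items():
    nested[category][city] = value

# Direct lookup by the first tuple element, no scan of the other categories.
airports = list(nested.get("airport", {}).values())
print(airports)
```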
Check whether "airport" is present in each key of the dictionary.
Demo:
>>> [value for key, value in my_dict.items() if "airport" in key]
['Narita', 'Heathrow']
>>>
Yes, a nested dictionary would be a better option.
>>> my_dict = {
... "airport": {
... "London": "Heathrow",
... "Tokyo": "Narita",
... },
... "hipsters": {
... "London": "Soho"
... }
... }
>>>
>>> if "airport" in my_dict:
... result = my_dict["airport"].values()
... else:
... result = []
...
>>> print result
['Heathrow', 'Narita']
>>>
What I'd like to avoid, if possible, is to go through all dictionary keys and filter them down.
Why? Why do you think Python is doing the equivalent of a DB full table scan? Filtering a dictionary does not mean sequentially scanning it.
Python:
[value for key, value in my_dict.items() if key[0] == "airport"]
Output:
['Narita', 'Heathrow']

Retrieve a key-value pair from a dict as another dict

I have a dictionary that looks like:
{u'message': u'Approved', u'reference': u'A71E7A739E24', u'success': True}
I would like to retrieve the key-value pair for reference, i.e. { 'reference' : 'A71E7A739E24' }.
I'm trying to do this using iteritems which does return k, v pairs, and then I'm adding them to a new dictionary. But then, the resulting value is unicode rather than str for some reason and I'm not sure if this is the most straightforward way to do it:
ref = {}
for k, v in charge.iteritems():
if k == 'reference':
ref['reference'] = v
print ref
{'reference': u'A71E7A739E24'}
Is there a built-in way to do this more easily? Or, at least, to avoid using iteritems and simply return:
{ 'reference' : 'A71E7A739E24' }
The trouble with using iteritems is that you increase lookup time to O(n), where n is the dictionary size, because you are no longer using the hash table.
If you only need to get one key-value pair, it's as simple as
ref = {key: charge[key]}
If there may be multiple pairs that are selected by some condition,
either use the dict constructor with an iterable of pairs (the 2nd version is better if your condition depends on values, too):
ref = dict((k, charge[k]) for k in charge if <condition>)
ref = dict((k, v) for k, v in charge.iteritems() if <condition>)
or (since 2.7) a dict comprehension (which is syntactic sugar for the above):
ref = {k: charge[k] for k in charge if <condition>}
ref = {k: v for k, v in charge.iteritems() if <condition>}
I don't understand the question. Is this what you are trying to do?
ref = {'reference': charge["reference"]}
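The approaches in the answers above can be sketched as runnable Python 3 (where items() replaces iteritems() and every string is already str, so the u prefix disappears). The charge dict reuses the values from the question; the wanted set and the subset name are assumptions standing in for a real condition:

```python
charge = {"message": "Approved", "reference": "A71E7A739E24", "success": True}

# Single pair: index the dict directly, which is one hash lookup.
ref = {"reference": charge["reference"]}

# Several pairs chosen by a condition: a dict comprehension over the items.
wanted = {"reference", "success"}  # hypothetical selection criterion
subset = {k: v for k, v in charge.items() if k in wanted}

print(ref)
print(subset)
```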

Python : convert unicode string to raw object/text

I've got a dictionary of key-value pairs in my Django application. The values in the dictionary are strings.
{u'question': u'forms.CharField(max_length=512)'}
I need to convert this "value" string to an actual object, and get something like this.
properties = {
'question' : forms.CharField(max_length=512)
}
Notice that values in the second dictionary are actual Django form fields and NOT strings. I need to do this manipulation to create dynamic forms. The second dictionary is to be passed to "type" built-in function. Sample code can be found on this page. http://dougalmatthews.com/articles/2009/dec/16/nicer-dynamic-forms-django/ .
If you modify your representation a bit:
fields = {u'question': u'{"field": "django.forms.CharField", "params": {"max_length": 512}}'}
then you can use the following:
from django.utils import importlib, simplejson
def get_field(fname):
    module, name = fname.rsplit('.', 1)
    return getattr(importlib.import_module(module), name)
print dict((k.encode('ascii', 'ignore'), get_field(v['field'])(**v['params']))
           for k, v in ((k, simplejson.loads(v)) for k, v in fields.iteritems()))
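The dotted-path lookup in this answer does not depend on Django itself. A minimal Python 3 sketch using only the standard library's importlib and json, which may be needed where the django.utils shims are not available; datetime.timedelta is a stand-in chosen so the example runs without Django, and get_object, spec and obj are illustrative names:

```python
import importlib
import json

def get_object(dotted_path):
    """Resolve a 'package.module.Name' string to the object it names."""
    module_name, attr_name = dotted_path.rsplit('.', 1)
    return getattr(importlib.import_module(module_name), attr_name)

# Same shape as the "field"/"params" representation suggested above,
# but pointing at a standard-library class instead of a Django form field.
spec = json.loads('{"field": "datetime.timedelta", "params": {"days": 2}}')

# Look the class up by name, then call it with the stored keyword arguments.
obj = get_object(spec["field"])(**spec["params"])
print(obj)
```

With Django installed, a path such as "django.forms.CharField" and params like {"max_length": 512} would presumably be resolved the same way.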
Following your code, I suggest separating the field name from the field attrs:
my_fields = {u'question': {'name': 'CharField', 'attrs': {'max_length': 512} }}
and then something like:
properties = {}
for field_name, field_def in my_fields.items():
    properties[field_name] = getattr(forms, field_def['name'])(**field_def['attrs'])
EDIT: Based on clarifying comments by the OP, this isn't an appropriate solution.
I don't know the constraints you are under, but perhaps the following representation will work better:
{u'question': lambda: forms.CharField(max_length=512)}
You can then "realise" the fields thus:
dict((k, v()) for (k, v) in props.iteritems())
