Formatting dicts and nested dicts - python

Amazon's DynamoDB requires specially formatted JSON when inserting items into the database.
I have a function that takes a dictionary and wraps each value in a nested dict whose key is the value's data type, which is the format needed for insertion.
For example, input like {'id':1, 'firstName':'joe'} would be transformed to {'id': {'N':1}, 'firstName': {'S':'joe'}}
This currently works with the following function:
type_map = {
    str:'S', unicode:'S', dict:'M',
    float:'N', int:'N', bool:'BOOL'
}
def format_row(self, row):
    """ Accepts a dict, formats for DynamoDB insertion. """
    formatted = {}
    for k,v in row.iteritems():
        type_dict = {}
        type_dict[ self.type_map[type(v)] ] = v
        formatted[k] = type_dict
    return formatted
I need to modify this function to handle values that might be dicts.
So, for example:
{
    'id':1,
    'a':{'x':'hey', 'y':1},
    'b':{'x':1}
}
Should be transformed to:
{
    'id': {'N':1},
    'a': {'M': {'x': {'S':'hey'}, 'y': {'N':1}}},
    'b': {'M': {'x': {'N':1}}}
}
I'm thinking the correct way to do this is to call the function from within itself (recursion), right?
Note: I'm using Python 2.7

What ultimately ended up working for me was the following function:
def format_row(self, row):
    """ Accepts a dict, formats for DynamoDB insertion. """
    formatted = {}
    for k,v in row.iteritems():
        if type(v) == dict:
            v = self.format_row(v)
            type_dict = {}
            type_dict['M'] = v
            formatted[k] = type_dict
        else:
            type_dict = {}
            type_dict[ self.type_map[type(v)] ] = v
            formatted[k] = type_dict
    return formatted
If anyone has a better way of doing this, please let me know!
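One possible tidy-up, sketched here in Python 3 syntax rather than the Python 2.7 of the question: the two branches differ only in how the type tag is chosen, so they can share the final assignment. The standalone function and the module-level TYPE_MAP name are assumptions made for illustration; the original is a method that reads self.type_map.

```python
# Hypothetical module-level map; the original keeps it on the class as type_map.
TYPE_MAP = {str: 'S', dict: 'M', float: 'N', int: 'N', bool: 'BOOL'}


def format_row(row):
    """Wrap every value in a {type_tag: value} dict, recursing into nested dicts."""
    formatted = {}
    for key, value in row.items():
        if isinstance(value, dict):
            # Format the nested dict first, then tag it as a map.
            value = format_row(value)
            tag = 'M'
        else:
            # Exact-type lookup, so True maps to 'BOOL' rather than 'N'.
            tag = TYPE_MAP[type(value)]
        formatted[key] = {tag: value}
    return formatted


example = format_row({'id': 1, 'a': {'x': 'hey', 'y': 1}, 'b': {'x': 1}})
print(example)
```

A value whose type is missing from the map (a list, for instance) still raises KeyError, exactly as in the original function.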

Related

Creating python objectify Element and then adding attributes

I am trying to serialize a Python class named Origin, which has a dictionary as an attribute, into XML with lxml objectify. The dictionary is initialized with the value "default" for each key.
class Origin:
    def __init__(self):
        self.dict = {"A":"default", "B":"default" ...}  # my dictionary actually has 6 keys
The dictionary is filled by first parsing an XML document. Not every key is filled. For example: dict = {"A":"hello", "B":"default" ...}
I want to create my Origin XML element with the dictionary entries as attributes, but without the keys whose value is still "default".
My solution is to use nested ifs:
if(self.dict["A"] != "default"):
    if(self.dict["B"] != "default"):
        ...
            objectify.Element("Origin", A=self.dict["A"], B=self.dict["B"]...)
But this is an ugly and impractical solution if you have more than one or two keys.
Is there a way to first create my element with origin = objectify.Element("Origin") and then add my dictionary's keys if their values are different from "default"?
Something more dynamic, like
for key in self.dict:
    if(self.dict[key] != "default"):
        origin.addAttribute(key=self.dict[key])
Thank you
I would filter the dictionary to only the values that are not "default".
The dict comprehension feature is a big help here.
Example:
data = {
    "A": "foo",
    "B": "default",
    "C": "bar",
}
data = {key: value for key, value in data.items() if value != "default"}
print(data)
Output:
{'A': 'foo', 'C': 'bar'}
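To connect the filtered dictionary back to element attributes, here is a minimal sketch. It uses the standard library's xml.etree.ElementTree as a stand-in for lxml.objectify, which is an assumption made so the example runs without lxml installed. lxml elements also provide set(), and the question already passes attributes as keyword arguments to objectify.Element, so the same two patterns should carry over, but that has not been verified here.

```python
import xml.etree.ElementTree as ET

data = {"A": "foo", "B": "default", "C": "bar"}
filtered = {key: value for key, value in data.items() if value != "default"}

# Option 1: unpack the filtered dict as keyword attributes at creation time.
origin = ET.Element("Origin", **filtered)

# Option 2: create the element first, then add the attributes one by one.
origin_later = ET.Element("Origin")
for key, value in filtered.items():
    origin_later.set(key, value)

print(ET.tostring(origin, encoding="unicode"))
```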

Convert a list-of-dictionaries to a dictionary

I have this list of dictionaries that I want to convert to one dictionary:
vpcs = [{'VPCRegion': 'us-east-1', 'VPCId': '12ededd4'},
        {'VPCRegion': 'us-east-1', 'VPCId': '9847'},
        {'VPCRegion': 'us-west-2', 'VPCId': '99485003'}]
I want to convert it to
{'us-east-1': '12ededd4', 'us-east-1': '9847', 'us-west-2': '99485003'}
I used this function:
def convert_dict(tags):
    return {tag['VPCRegion']:tag['VPCId'] for tag in tags}
but I get the output below, in which the first dictionary in the list is missing:
{'us-east-1': '9847', 'us-west-2': '99485003'}
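The first record is not skipped; it is overwritten. A dict holds one value per key, so the second 'us-east-1' entry replaces the first. The loop below is a small illustrative sketch that spells out what the comprehension does step by step:

```python
vpcs = [{'VPCRegion': 'us-east-1', 'VPCId': '12ededd4'},
        {'VPCRegion': 'us-east-1', 'VPCId': '9847'},
        {'VPCRegion': 'us-west-2', 'VPCId': '99485003'}]

converted = {}
for tag in vpcs:
    # Assigning to a key that already exists replaces its value.
    converted[tag['VPCRegion']] = tag['VPCId']

print(converted)  # {'us-east-1': '9847', 'us-west-2': '99485003'}

# The same rule applies to a dict literal with a repeated key.
literal = {'us-east-1': '12ededd4', 'us-east-1': '9847'}
print(literal)  # {'us-east-1': '9847'}
```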
Perhaps a list of dictionaries would fit your needs; see the code below:
[{'us-east-1': '12ededd4'}, {'us-east-1': '9847'}, {'us-west-2': '99485003'}]
To elaborate on what others commented about dictionary keys having to be unique: in the code below, new_dict.update() overwrites the earlier value whenever a 'VPCRegion' repeats, so dict1 keeps only the last 'us-east-1' entry, while list_dict keeps all three records.
vpcs = [{'VPCRegion': 'us-east-1', 'VPCId': '12ededd4'},
        {'VPCRegion': 'us-east-1', 'VPCId': '9847'},
        {'VPCRegion': 'us-west-2', 'VPCId': '99485003'}]
def changekey(listofdict):
    new_dict = {}
    new_list = []
    for member in listofdict:
        new_key = member['VPCRegion']
        new_val = member['VPCId']
        new_dict.update({new_key:new_val})
        new_list.append({new_key:new_val})
    return new_dict, new_list
dict1,list_dict=changekey(vpcs)
print(dict1)
print(list_dict)
#dict4=dict(zip(*[iter(list_dict)]*2))
#print(dict4)
Since your output must group several values under the same name, your output will be a dict of lists, not a dict of strings.
One way to quickly do it:
import collections
def group_by_region(vpcs):
    result = collections.defaultdict(list)
    for vpc in vpcs:
        result[vpc['VPCRegion']].append(vpc['VPCId'])
    return result
The result of group_by_region(vpcs) will be a defaultdict equal to {'us-east-1': ['12ededd4', '9847'], 'us-west-2': ['99485003']}.
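For entertainment, here's a cryptic but efficient way to get this in one expression. Note that itertools.groupby only groups consecutive records, so this relies on vpcs already being ordered by region:
import itertools
{key: [rec['VPCId'] for rec in group]
 for (key, group) in itertools.groupby(vpcs, lambda vpc: vpc['VPCRegion'])}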
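Because itertools.groupby merges only adjacent records with equal keys, the one-liner depends on the input order. A minimal sketch that sorts first, so the input order no longer matters (the function name group_by_region_sorted and the shuffled sample list are made up for this example):

```python
import itertools
from operator import itemgetter


def group_by_region_sorted(vpcs):
    """Group VPC ids by region, regardless of the order of the input list."""
    by_region = itemgetter('VPCRegion')
    ordered = sorted(vpcs, key=by_region)  # stable sort keeps ids in input order
    return {region: [rec['VPCId'] for rec in group]
            for region, group in itertools.groupby(ordered, key=by_region)}


# Same records as in the question, but with the regions interleaved.
unordered = [{'VPCRegion': 'us-east-1', 'VPCId': '12ededd4'},
             {'VPCRegion': 'us-west-2', 'VPCId': '99485003'},
             {'VPCRegion': 'us-east-1', 'VPCId': '9847'}]

grouped = group_by_region_sorted(unordered)
print(grouped)
```

Without the sorted() call, the interleaved input would yield two separate 'us-east-1' groups and the dict comprehension would keep only the later one.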

How to properly keep structure when removing keys in JSON using python?

I'm using this as a reference: Elegant way to remove fields from nested dictionaries
I have a large amount of JSON-formatted data here, and we've determined a list of unnecessary keys (and all their underlying values) that we can remove.
I'm a bit new to working with JSON and Python specifically (I mostly did sysadmin work) and initially thought it was just a plain dictionary of dictionaries. While some of the data looks like that, several more pieces of data consist of dictionaries of lists, which can in turn contain more lists or dictionaries with no specific pattern.
The idea is to keep the data identical EXCEPT for the specified keys and associated values.
Test Data:
to_be_removed = ['leecher_here']
easy_modo = {
    'hello_wold':'konnichiwa sekai',
    'leeching_forbidden':'wanpan kinshi',
    'leecher_here':'nushiyowa'
}
lunatic_modo = {
    'hello_wold': {
        'leecher_here':'nushiyowa','goodbye_world':'aokigahara'
    },
    'leeching_forbidden':'wanpan kinshi',
    'leecher_here':'nushiyowa',
    'something_inside': {
        'hello_wold':'konnichiwa sekai',
        'leeching_forbidden':'wanpan kinshi',
        'leecher_here':'nushiyowa'
    },
    'list_o_dicts': [
        {
            'hello_wold':'konnichiwa sekai',
            'leeching_forbidden':'wanpan kinshi',
            'leecher_here':'nushiyowa'
        }
    ]
}
Obviously, the original question posted there isn't accounting for lists.
Here is my code, modified to fit my requirements:
from copy import deepcopy
def remove_key(json,trash):
    """
    <snip>
    """
    keys_set = set(trash)
    modified_dict = {}
    if isinstance(json,dict):
        for key, value in json.items():
            if key not in keys_set:
                if isinstance(value, dict):
                    modified_dict[key] = remove_key(value, keys_set)
                elif isinstance(value,list):
                    for ele in value:
                        modified_dict[key] = remove_key(ele,trash)
                else:
                    modified_dict[key] = deepcopy(value)
    return modified_dict
Something is clearly messing with the structure, because the code fails the test I wrote, in which the expected data is exactly the same as the input minus the removed keys. The test shows that the keys are removed properly, but wherever there should be a list of dictionaries, a single dictionary is returned instead, which will have unfortunate implications down the line.
I'm sure it's because the function returns a dictionary, but I don't know how to proceed from here in order to maintain the structure.
At this point, I need help finding what I have overlooked.
When you go through your json file, you only need to determine whether it is a list, a dict or neither. Here is a recursive way to modify your input dict in place:
def remove_key(d, trash=None):
    if not trash: trash = []
    if isinstance(d,dict):
        keys = [k for k in d]
        for key in keys:
            if any(key==s for s in trash):
                del d[key]
        for value in d.values():
            remove_key(value, trash)
    elif isinstance(d,list):
        for value in d:
            remove_key(value, trash)
remove_key(lunatic_modo,to_be_removed)
remove_key(easy_modo,to_be_removed)
Result:
{
    "hello_wold": {
        "goodbye_world": "aokigahara"
    },
    "leeching_forbidden": "wanpan kinshi",
    "something_inside": {
        "hello_wold": "konnichiwa sekai",
        "leeching_forbidden": "wanpan kinshi"
    },
    "list_o_dicts": [
        {
            "hello_wold": "konnichiwa sekai",
            "leeching_forbidden": "wanpan kinshi"
        }
    ]
}
{
    "hello_wold": "konnichiwa sekai",
    "leeching_forbidden": "wanpan kinshi"
}
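If the original data must stay untouched, a non-mutating variant is also possible. The following is a minimal sketch, not the answer's code: it returns a new structure, rebuilding dicts as dicts and lists as lists, which is the part the question's version lost. The name without_keys and the small sample are made up for illustration.

```python
def without_keys(data, trash):
    """Return a copy of data with every key listed in trash removed, at any depth."""
    trash = set(trash)
    if isinstance(data, dict):
        return {key: without_keys(value, trash)
                for key, value in data.items() if key not in trash}
    if isinstance(data, list):
        # Rebuild the list so a list of dicts stays a list of dicts.
        return [without_keys(item, trash) for item in data]
    return data  # strings, numbers, None: returned unchanged


sample = {
    'leecher_here': 'nushiyowa',
    'hello_wold': {'leecher_here': 'nushiyowa', 'goodbye_world': 'aokigahara'},
    'list_o_dicts': [{'hello_wold': 'konnichiwa sekai', 'leecher_here': 'nushiyowa'}],
}
cleaned = without_keys(sample, ['leecher_here'])
print(cleaned)
```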

PyYAML replace dash in keys with underscore

I would like to map some configuration parameters from YAML directly onto Python argument names. Is there a way, without writing extra code to modify the keys afterwards, to have the YAML parser replace each dash '-' in a key with an underscore '_'?
some-parameter: xyz
some-other-parameter: 123
When parsed with PyYAML (or maybe another library), this should become a dictionary with these values:
{'some_parameter': 'xyz', 'some_other_parameter': 123}
Then I can pass the dictionary to a function as named parameters:
foo(**parsed_data)
I know I can iterate through the keys afterwards and rename them, but I don't want to do that :)
At least for your stated case, you don't need to transform the keys. Given:
import pprint
def foo(**kwargs):
    print 'KWARGS:', pprint.pformat(kwargs)
If you set:
values = {
    'some-parameter': 'xyz',
    'some-other-parameter': 123,
}
And then call:
foo(**values)
You get:
KWARGS: {'some-other-parameter': 123, 'some-parameter': 'xyz'}
If your goal is actually to call a function like this:
def foo(some_parameter=None, some_other_parameter=None):
    pass
Then sure, you would need to map the key names. But you could just do this:
foo(**dict((k.replace('-','_'),v) for k,v in values.items()))
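The same key mapping as a small runnable sketch, written as a Python 3 dict comprehension. Here foo returns its arguments only so that the effect is visible; that return value is an addition for illustration.

```python
def foo(some_parameter=None, some_other_parameter=None):
    # Return the arguments so the call below shows what was received.
    return some_parameter, some_other_parameter


values = {
    'some-parameter': 'xyz',
    'some-other-parameter': 123,
}

# Rename only the keys; the values are passed through untouched.
renamed = {key.replace('-', '_'): value for key, value in values.items()}
print(foo(**renamed))  # ('xyz', 123)
```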
I think I found a solution: There is a package which is called yconf: https://pypi.python.org/pypi/yconf
There I can map values and work with them using the well-known argparse interface:
config.yml
logging:
    log-level: debug
Argparse-like definition:
parser.add_argument("--log-level", dest="logging.log-level")
Well, if you parse the YAML file into a Python dictionary, you can use the following code to replace every dash in the keys (inside all nested dictionaries and arrays) with an underscore.
def hyphen_to_underscore(dictionary):
    """
    Takes an array or dictionary and replaces every hyphen ('-') in any of its keys with an underscore ('_')
    :param dictionary:
    :return: a new object of the same shape with all hyphens in keys replaced by underscores
    """
    # By default return the same object
    final_dict = dictionary
    # for an array, apply this function to every item
    if type(dictionary) is type([]):
        final_dict = []
        for item in dictionary:
            final_dict.append(hyphen_to_underscore(item))
    # for a dictionary, traverse all the keys and replace hyphens with underscores
    elif type(dictionary) is type({}):
        final_dict = {}
        for k, v in dictionary.items():
            # If the value is a sub-dictionary or an array, apply this function to it recursively
            if type(dictionary[k]) is type({}) or type(dictionary[k]) is type([]):
                value = hyphen_to_underscore(v)
                final_dict[k.replace('-', '_')] = value
            else:
                final_dict[k.replace('-', '_')] = v
    return final_dict
Here is a sample usage:
customer_information = {
    "first-name":"Farhan",
    "last-name":"Haider",
    "address":[{
        "address-line-1": "Blue Mall",
        "address-line-2": None,
        "address-type": "Work"
    },{
        "address-line-1": "DHA",
        "address-line-2": "24-H",
        "address-type": "Home"
    }],
    "driver_license":{
        "number": "209384092834",
        "state-region": "AB"
    }
}
print(hyphen_to_underscore(customer_information))
# {'first_name': 'Farhan', 'last_name': 'Haider', 'address': [{'address_line_1': 'Blue Mall', 'address_line_2': None, 'address_type': 'Work'}, {'address_line_1': 'DHA', 'address_line_2': '24-H', 'address_type': 'Home'}], 'driver_license': {'number': '209384092834', 'state_region': 'AB'}}

Ordered attributes in retrieving documents from mongo with pymongo?

When I store a document like the following in mongo:
{
    name: Somename,
    profile: Someprofile
}
and then use find_one(), I get a result like this:
{
    profile: Someprofile,
    _id: 35353432326532(random mongo id),
    name: Somename
}
Is there something I can do in Python, before or after find_one, so that I get the result as a JSON string ordered like this:
{
    _id: 35353432326532(random mongo id),
    name: Somename,
    profile: Someprofile
}
I tried using an OrderedDict like below, but it does not seem to help.
somedocument = db.mycollection
theordereddict = OrderedDict(data_layer.find_one())
print str(theordereddict)
How do I get my output string in the right order in regards to attributes? Is this order dictated by something else before I even insert the document into the database?
collections.OrderedDict doesn't sort keys; it only preserves insertion order, so you need to insert the keys in the order in which you want to retrieve them.
import collections
d = data_layer.find_one()
def key_function(item):
    """This defines the sort order for the sorted builtin"""
    return item[0]
sorted_dict = collections.OrderedDict((k,v) for k, v in sorted(d.items(),
                                                              key=key_function))
That said, it looks like print str(sorted_dict) doesn't give you the output you want. I think you need to build your sorted string representation manually. E.g.:
s = "{" + ",".join(["%s:%s" % (k, v) for k,v in sorted(d.items(), key=key_function)]) + "}"
Basically the same as @Mike Steder's answer, but maybe less fancy and clearer:
import json
from collections import OrderedDict
theordereddict = OrderedDict()
d = data_layer.find_one()
for k in sorted(d.keys()):
    theordereddict[k] = d[k]
json.dumps(theordereddict)
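When alphabetical key order is all that is needed, json.dumps can sort the keys itself through its sort_keys parameter. A minimal sketch with a plain dict standing in for the find_one() result; the string _id is an assumption, since a real document carries an ObjectId, which the json module cannot serialize without extra handling.

```python
import json

# Stand-in for data_layer.find_one(); a real result would hold an ObjectId in _id.
document = {
    'profile': 'Someprofile',
    '_id': '35353432326532',
    'name': 'Somename',
}

# sort_keys=True emits the keys in sorted order; '_' sorts before lowercase letters.
ordered_json = json.dumps(document, sort_keys=True)
print(ordered_json)
# {"_id": "35353432326532", "name": "Somename", "profile": "Someprofile"}
```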
