Resorting a nested json into a dictionary - python

I have a JSON string in the following format that I get from an API and need to reformat, so that I can check the difference between two lookups (i.e. which settings differ between the modules):
{ "modules": [
{
"name": "A1",
"bar" : "AA",
"settings" :[
{
"name" : "set1",
"value" : "1"
},
{
"name" : "set2",
"value" : "2"
}
]
},
{
"name": "A2",
"bar" : "DD",
"settings" :[
{
"name" : "set1",
"value" : "A21"
}
]
},
{
"name": "A1",
"settings" :[
{
"name" : "set3",
"value" : "1"
}
]
}
]
}
and need to get it into a dictionary of the format
'A1': {
    'bar': 'AA',
    'settings': {
        'set1': '1',
        'set2': '2',
        'set3': '1'
    }
}, ...
Is there any nicer, easier way to do this than the following, assuming I have read the string from above into a dictionary json_dict?
modules_a = {module['name']: {'bar': module.get('bar'), 'settings': {}} for module in json_dict['modules']}
for module in json_dict['modules']:
    modules_a[module['name']]['settings'].update({s['name']: s['value'] for s in module['settings']})

You have some errors in the input; you missed a comma after "bar". Here is a more readable version:
# First, merge together the modules with the same names
concatenated_json = {'modules': []}
reference_dict = {}
for module in json_dict["modules"]:
    # Check whether this module name has already been seen
    if module["name"] not in reference_dict:
        # Module has not been mentioned yet: add it and remember it by name
        concatenated_json['modules'].append(module)
        reference_dict[module["name"]] = module
    else:
        # Append to the settings of the previously mentioned module
        reference_dict[module["name"]]["settings"] += module["settings"]
json_dict = concatenated_json

# Format the dict in the required way
modules_a = {
    module["name"]: {
        "bar": module.get("bar"),
        "settings": {
            setting["name"]: setting["value"] for setting in module["settings"]
        }
    }
    for module in json_dict["modules"]
}
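If you do not need the merged intermediate structure, a single pass with dict.setdefault gives the same result; this is just a sketch, assuming json_dict holds the parsed input from the question:
modules_a = {}
for module in json_dict['modules']:
    # Reuse the existing entry for this name, or start a fresh one
    entry = modules_a.setdefault(module['name'], {'settings': {}})
    if 'bar' in module:
        entry['bar'] = module['bar']
    # Merge this module's settings into the accumulated dict
    entry['settings'].update({s['name']: s['value'] for s in module['settings']})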

Here's a way to do it, although I'm not sure what you meant about "resorting".
# Preallocate result.
modules_a = {module['name']: {'settings': []} for module in json_dict['modules']}
for module in json_dict['modules']:
    obj = modules_a[module['name']]
    obj.update({k: v for k, v in module.items() if k != 'settings'})
    # Accumulate 'settings' in a list.
    obj['settings'].extend([{setting['name']: setting['value']}
                            for setting in module['settings']])

import json
print(json.dumps(modules_a, indent=4))
Result:
{
    "A1": {
        "settings": [
            {
                "set1": "1"
            },
            {
                "set2": "2"
            },
            {
                "set3": "1"
            }
        ],
        "bar": "AA",
        "name": "A1"
    },
    "A2": {
        "settings": [
            {
                "set1": "A21"
            }
        ],
        "bar": "DD",
        "name": "A2"
    }
}

Related

Basic request to mongodb with pymongo

I need to get all objects inside "posts" that have "published": true with pymongo. I've already tried so many variants, but all I can do is:
for elt in db[collection].find({}, {"posts"}):
    print(elt)
And it shows all "posts". I've tried something like this:
for elt in db[collection].find({}, {"posts", {"published": {"$eq": True}}}):
    print(elt)
But it doesn't work. Help, I've been trying for 3 days already =\
What you want to do is use the aggregation $filter, like so:
db[collection].aggregate([
    {
        # only fetch documents with such posts
        "$match": {
            "posts.published": {"$eq": True}
        }
    },
    {
        "$project": {
            "posts": {
                "$filter": {
                    "input": "$posts",
                    "as": "post",
                    "cond": {"$eq": ["$$post.published", True]}
                }
            }
        }
    }
])
Note that the current structure returned will be:
[
    {posts: [post1, post2]},
    {posts: [post3, post4]}
]
If you want to retrieve it as a list of posts, you'll need to add an $unwind stage to flatten the array.
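For example, the flattening could be done by appending two more stages to the pipeline above (a sketch; extra_stages is just an illustrative name, and the $replaceRoot stage simply promotes each post to a top-level document):
extra_stages = [
    {"$unwind": "$posts"},                    # one output document per post
    {"$replaceRoot": {"newRoot": "$posts"}},  # make each post the whole document
]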
The query (find) options are quite limited: you can do it with $elemMatch (projection) or with the $ operator, but both of these return only the first post that matches the condition, which is not what you want.
------- EDIT --------
Realizing posts is actually an object and not an array, you'll have to turn the object into an array, filter it, and then restore the structure, like so:
db.collection.aggregate([
    {
        $project: {
            "posts": {
                "$arrayToObject": {
                    $filter: {
                        input: { "$objectToArray": "$posts" },
                        as: "post",
                        cond: { $eq: ["$$post.v.published", true] }
                    }
                }
            }
        }
    }
])
Mongo Playground
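Since the question uses pymongo, the same pipeline can be passed as plain Python dicts (a sketch, reusing the db and collection variables from the question; operators quoted and True capitalized):
result = list(db[collection].aggregate([
    {
        "$project": {
            "posts": {
                "$arrayToObject": {
                    "$filter": {
                        "input": {"$objectToArray": "$posts"},
                        "as": "post",
                        "cond": {"$eq": ["$$post.v.published", True]},
                    }
                }
            }
        }
    }
]))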
I assumed that your document looks like this:
{
    "_id" : ObjectId("5f8570f8afdefd2cfe7473a7"),
    "posts" : {
        "a" : {
            "p" : false,
            "name" : "abhishek"
        },
        "k" : {
            "p" : true,
            "name" : "jack"
        },
        "c" : {
            "p" : true,
            "name" : "abhinav"
        }
    }
}
You can try the following query, but the result format will be a bit different; adding the result below for clarification:
db.getCollection('temp2').aggregate([
    {
        $project: {
            subPost: { $objectToArray: "$posts" }
        }
    },
    {
        '$unwind' : '$subPost'
    },
    {
        '$match' : { 'subPost.v.p': true }
    },
    {
        '$group': { _id: '$_id', subPosts: { $push: { subPost: "$subPost" } } }
    }
])
Result format:
{
    "_id" : ObjectId("5f8570f8afdefd2cfe7473a7"),
    "subPosts" : [
        {
            "subPost" : {
                "k" : "k",
                "v" : {
                    "p" : true,
                    "name" : "jack"
                }
            }
        },
        {
            "subPost" : {
                "k" : "c",
                "v" : {
                    "p" : true,
                    "name" : "abhinav"
                }
            }
        }
    ]
}

What is the most Pythonic way to sort a dictionary with string number keys numerically?

Here is my dict format:
{
    "server" : {
        "2" : {
            name : "Chris",
            text : "Hello!"
        },
        "1" : {
            name : "David",
            text : "Hey!"
        }
    }
}
How can I sort dict['server'] numerically, although the keys are strings? All ways that I can come up with require multiple names and don't feel very Pythonic at all.
Using collections.OrderedDict
Ex:
import collections

d = {
    "server" : {
        "2" : {
            "name" : "Chris",
            "text" : "Hello!"
        },
        "1" : {
            "name" : "David",
            "text" : "Hey!"
        }
    }
}

print(collections.OrderedDict(sorted(d["server"].items(), key=lambda x: int(x[0]))))
output:
OrderedDict([('1', {'text': 'Hey!', 'name': 'David'}), ('2', {'text': 'Hello!', 'name': 'Chris'})])
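As a side note, on Python 3.7+ plain dicts preserve insertion order, so an OrderedDict is not strictly needed; a minimal sketch reusing d from above:
# Regular dicts keep insertion order in Python 3.7+
sorted_server = dict(sorted(d["server"].items(), key=lambda x: int(x[0])))
print(sorted_server)  # '1' (David) first, then '2' (Chris)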

Filtering out desired data from a JSON file (Python)

This is a sample of my JSON file:
{
    "pops": [
        {
            "name": "pop_a",
            "subnets": {
                "Public": ["1.1.1.0/24,2.2.2.0/24"],
                "Private": ["192.168.0.0/24,192.168.1.0/24"],
                "more DATA": ""
            }
        },
        {
            "name": "pop_b",
            "subnets": {
                "Public": ["3.3.3.0/24,4.4.4.0/24"],
                "Private": ["192.168.2.0/24,192.168.3.0/24"],
                "more DATA": ""
            }
        }
    ]
}
After I read it, I want to make a dict object and store some of the things that I need from this file.
I want my object to be like this:
[
    {
        "name": "pop_a",
        "subnets": {"Public": ["1.1.1.0/24,2.2.2.0/24"], "Private": ["192.168.0.0/24,192.168.1.0/24"]}
    },
    {
        "name": "pop_b",
        "subnets": {"Public": ["3.3.3.0/24,4.4.4.0/24"], "Private": ["192.168.2.0/24,192.168.3.0/24"]}
    }
]
Then I want to be able to access some of the Public/Private values.
Here is what I tried; I know there are also update() and setdefault(), but they gave the same unwanted results:
def my_function():
    nt_json = [{'name': "", 'subnets': []}]
    Pname = []
    Psubnet = []
    for pop in pop_json['pops']:  # it prints only the last key/value
        nt_json[0]['name'] = pop['name']
        nt_json[0]['subnets'] = pop['subnets']
    pprint(nt_json)
    for pop in pop_json['pops']:
        """
        it prints the names in a row, then all of the IPs
        """
        Pname.append(pop['name'])
        Psubnet.append(pop['subnets'])
    nt_json[0]['pop_name'] = Pname
    nt_json[0]['subnets'] = Psubnet
    pprint(nt_json)
Here's a quick solution using a list comprehension. Note that this approach can only be taken with enough knowledge of the JSON structure.
>>> import json
>>>
>>> data = ... # your data
>>> new_data = [{ "name" : x["name"], "subnets" : {"Public" : x["subnets"]["Public"], "Private" : x["subnets"]["Private"]}} for x in data["pops"]]
>>>
>>> print(json.dumps(new_data, indent=2))
[
  {
    "name": "pop_a",
    "subnets": {
      "Private": [
        "192.168.0.0/24,192.168.1.0/24"
      ],
      "Public": [
        "1.1.1.0/24,2.2.2.0/24"
      ]
    }
  },
  {
    "name": "pop_b",
    "subnets": {
      "Private": [
        "192.168.2.0/24,192.168.3.0/24"
      ],
      "Public": [
        "3.3.3.0/24,4.4.4.0/24"
      ]
    }
  }
]
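If you'd rather not hardcode "Public" and "Private" inside the comprehension, a variant that keeps only a chosen set of keys (wanted_keys is just an illustrative name, and data is the parsed JSON as above):
wanted_keys = {"Public", "Private"}  # the subnet keys to keep
new_data = [
    {
        "name": pop["name"],
        "subnets": {k: v for k, v in pop["subnets"].items() if k in wanted_keys},
    }
    for pop in data["pops"]
]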

Python json schema that extracts parameters

I need to parse requests to a single URL that come in as JSON, but in several different formats. For example, some have the timestamp noted as a timestamp attribute, others as unixtime, etc. So I want to create JSON schemas for all types of requests that not only validate the incoming JSON but also extract its parameters from the specified places. Is there a library that can do that?
Example:
If I could define a schema that would look something like this
schema = {
    "type" : "object",
    "properties" : {
        "price" : {
            "type" : "number",
            "mapped_name": "product_price"
        },
        "name" : {
            "type" : "string",
            "mapped_name": "product_name"
        },
        "added_at": {
            "type" : "int",
            "mapped_name": "timestamp"
        },
    },
}
and then apply it to a dict
request = {
    "name" : "Eggs",
    "price" : 34.99,
    'added_at': 1234567
}
by some magical function
params = validate_and_extract(request, schema)
I want params to hold the mapped values:
{"product_name": "Eggs", "product_price": 34.99, "timestamp": 1234567}
So this is the kind of module I'm looking for. It should support nested dicts in the request, not just flat dicts.
The following code may help. It supports nested dicts as well.
import json

def valid_type(type_name, obj):
    if type_name == "number":
        return isinstance(obj, int) or isinstance(obj, float)
    if type_name == "int":
        return isinstance(obj, int)
    if type_name == "float":
        return isinstance(obj, float)
    if type_name == "string":
        return isinstance(obj, str)

def validate_and_extract(request, schema):
    ''' Validate request (dict) against the schema (dict).
        Validation is limited to naming and type information.
        No check is done to ensure all elements in schema
        are present in the request. This could be enhanced by
        specifying mandatory/optional/conditional information
        within the schema and subsequently checking for that.
    '''
    out = {}
    for k, v in request.items():
        if k not in schema['properties'].keys():
            print("Key '{}' not in schema ... skipping.".format(k))
            continue
        if schema['properties'][k]['type'] == 'object':
            v = validate_and_extract(v, schema['properties'][k])
        elif not valid_type(schema['properties'][k]['type'], v):
            print("Wrong type for '{}' ... skipping.".format(k))
            continue
        out[schema['properties'][k]['mapped_name']] = v
    return out
# Sample Data 1
schema1 = {
    "type" : "object",
    "properties" : {
        "price" : {
            "type" : "number",
            "mapped_name": "product_price"
        },
        "name" : {
            "type" : "string",
            "mapped_name": "product_name"
        },
        "added_at": {
            "type" : "int",
            "mapped_name": "timestamp"
        },
    },
}

request1 = {
    "name" : "Eggs",
    "price" : 34.99,
    'added_at': 1234567
}

# Sample Data 2: containing nested dict
schema2 = {
    "type" : "object",
    "properties" : {
        "price" : {
            "type" : "number",
            "mapped_name": "product_price"
        },
        "name" : {
            "type" : "string",
            "mapped_name": "product_name"
        },
        "added_at": {
            "type" : "int",
            "mapped_name": "timestamp"
        },
        "discount": {
            "type" : "object",
            "mapped_name": "offer",
            "properties" : {
                "percent": {
                    "type" : "int",
                    "mapped_name": "percentage"
                },
                "last_date": {
                    "type" : "string",
                    "mapped_name": "end_date"
                },
            }
        },
    },
}

request2 = {
    "name" : "Eggs",
    "price" : 34.99,
    'added_at': 1234567,
    'discount' : {
        'percent' : 40,
        'last_date' : '2016-09-25'
    }
}

params = validate_and_extract(request1, schema1)
print(params)

params = validate_and_extract(request2, schema2)
print(params)
Output from running this:
{'timestamp': 1234567, 'product_name': 'Eggs', 'product_price': 34.99}
{'offer': {'percentage': 40, 'end_date': '2016-09-25'}, 'timestamp': 1234567, 'product_name': 'Eggs', 'product_price': 34.99}
See http://json-schema.org
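If you only need the validation half, the third-party jsonschema package can do that part (the extraction/renaming would still be your own code); a minimal sketch, assuming jsonschema is installed and using JSON Schema's standard "integer" type name:
import jsonschema  # pip install jsonschema

plain_schema = {
    "type": "object",
    "properties": {
        "price": {"type": "number"},
        "name": {"type": "string"},
        "added_at": {"type": "integer"},
    },
}
# Raises jsonschema.ValidationError if the request does not match the schema
jsonschema.validate(instance={"name": "Eggs", "price": 34.99, "added_at": 1234567},
                    schema=plain_schema)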
This doesn't look like a Python question.

add a new value to array while keeping existing one

Below is an example of my data:
{
    "id": "2",
    "items": {
        "3" : { "blocks" : { "3" : { "txt" : 'xx' } } },
        "4" : { "blocks" : { "1" : { "txt" : 'yy' }, "2" : { "txt" : 'zz' } } }
    }
}
I want to make it look like the example data below: simply append a new value to items.3.blocks.3.txt while keeping its existing value:
{
    "id": "2",
    "items": {
        "3" : { "blocks" : { "3" : { "txt" : 'xx, tt' } } },
        "4" : { "blocks" : { "1" : { "txt" : 'yy' }, "2" : { "txt" : 'zz' } } }
    }
}
I ran the following, but it did not make any difference:
dbx.test.update({"_id": ObjectId("5192264c02a03e374e67d7be")}, {'$addToSet': {'items.3.blocks.0.txt': 'tt'}}, )
What would be the correct syntax? Any help is appreciated.
Regards
You need to use '$push' instead of '$addToSet':
dbx.test.update({"_id": ObjectId("5192264c02a03e374e67d7be")}, {'$push': {'items.3.blocks.0.txt': 'tt'}}, )
I did not realize that this was a mongodb|pymongo question to start with.
a = {
    "id": "2",
    "items": {
        "3" : { "blocks" : { "3" : { "txt" : 'xx' } } },
        "4" : { "blocks" : { "1" : { "txt" : 'yy' }, "2" : { "txt" : 'zz' } } }
    }
}

a['items']['3']['blocks']['3']['txt'] += ', tt'
?
Shorter answer on how to modify dicts:
a = {'moo' : 'cow', 'quack' : 'duck'}
a['moo'] = 'bull'
a['quack'] = 'duckling'
print(str(a))
if a['moo'] == 'bull':
    print('Yes the bull says moo')
If your data is a JSON string, you first need to convert it into a Python dictionary, like so:
import json
a = json.loads(<your string goes here>)
It's simple: for $addToSet to work, the data needs to be in arrays [ ], but you have it as { "key": "value" } objects, e.g. { "blocks" : { "1" : { "txt" : 'yy' }, "2" : { "txt" : 'zz' } } }. I would recommend working with arrays. Example:
"items":
[
{ "algo" : { "blocks" : { "txt" : ['xx']} } } ,
{ "algoo" : { "blocks" : { "txt" : ['yy','zz']} } }
]
db.foo.update({"_id":1,"items.algo.blocks.txt":'xx'},{$addToSet:{"items.$.algo.blocks.txt":'tt'}});
And avoid using numbers as field names.
{
    "_id" : 1,
    "items" : [
        {
            "algo" : {
                "blocks" : {
                    "txt" : [
                        "xx",
                        "tt"
                    ]
                }
            }
        },
        {
            "algoo" : {
                "blocks" : {
                    "txt" : [
                        "yy",
                        "zz"
                    ]
                }
            }
        }
    ]
}
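In pymongo, the same update against the array-based layout above would look roughly like this (a sketch; update_one is the current pymongo method):
db.foo.update_one(
    {"_id": 1, "items.algo.blocks.txt": "xx"},
    {"$addToSet": {"items.$.algo.blocks.txt": "tt"}},
)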
