Python: tree like implementation of dict datastructure [closed] - python

Closed 9 years ago.
I have a dict object similar to this:
topo = {
    'name': 'm0',
    'children': [{
        'name': 'm1',
        'children': []
    }, {
        'name': 'm2',
        'children': []
    }, {
        'name': 'm3',
        'children': []
    }]
}
Now I want to insert one more dict object, let's say
{
    'name': 'ABC',
    'children': []
}
as a child of the dict named "m2", inside m2's children array.
Could you please suggest how I should go about it?
Should I use a separate data structure implementation?

I would suggest you first convert it to a data structure like this:
topo = {
    'm0': {
        'm1': {},
        'm2': {},
        'm3': {},
    },
}
That is, every value of the 'name' key becomes a key in a dictionary, every value of the 'children' key becomes that key's value, and the container changes from a list to a dictionary.
Now you don't need to assume beforehand the index position where m2 is found. You do need to know that m2 is inside m0, but then you can simply say
topo['m0']['m2']['ABC'] = {}
You can convert between formats with this code:
def verbose_to_compact(verbose):
    return {item['name']: verbose_to_compact(item['children']) for item in verbose}

def compact_to_verbose(compact):
    return [{'name': key, 'children': compact_to_verbose(value)} for key, value in compact.items()]
Call them like this:
compact_topo = verbose_to_compact([topo]) # function expects list; make one-item list
verbose_topo = compact_to_verbose(compact_topo)[0] # function returns list; extract the single item
I am assuming the format you have is the direct interpretation of some file format. You can read it in that way, convert it, work with it in the compact format, and then just convert it back when you need to write it out to a file again.
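The converters above are short enough to try end to end. Here is a self-contained sketch using the node names from the question (note that compact_to_verbose must iterate compact.items() rather than compact itself):

```python
def verbose_to_compact(verbose):
    # {'name': ..., 'children': [...]} list -> nested dict keyed by name
    return {item['name']: verbose_to_compact(item['children']) for item in verbose}

def compact_to_verbose(compact):
    # nested dict -> list of {'name': ..., 'children': [...]} dicts
    return [{'name': key, 'children': compact_to_verbose(value)}
            for key, value in compact.items()]

topo = {'name': 'm0', 'children': [
    {'name': 'm1', 'children': []},
    {'name': 'm2', 'children': []},
    {'name': 'm3', 'children': []},
]}

compact_topo = verbose_to_compact([topo])   # function expects a list
compact_topo['m0']['m2']['ABC'] = {}        # the insertion from the question
round_tripped = compact_to_verbose(compact_topo)[0]
print(round_tripped['children'][1])
# {'name': 'm2', 'children': [{'name': 'ABC', 'children': []}]}
```

Since Python dicts preserve insertion order, the round trip also keeps m1, m2, m3 in their original positions.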

Your issue is a common tree structure; consider using http://pythonhosted.org/ete2/tutorial/tutorial_trees.html and populating each node with your dict value (don't reinvent the wheel).

Add it to the dictionary as you normally would, using .append():
topo['children'][1]['children'].append({'name' : 'ABC', 'children' : []})
topo is now:
{
    "name": "m0",
    "children": [
        {
            "name": "m1",
            "children": []
        },
        {
            "name": "m2",
            "children": [
                {
                    "name": "ABC",
                    "children": []
                }
            ]
        },
        {
            "name": "m3",
            "children": []
        }
    ]
}

topo['children'][1]['children'].append({'name': 'ABC', 'children': []})
This adds the new dictionary under the children of the second child of topo:
{'children': [{'children': [], 'name': 'm1'},
              {'children': [{'children': [], 'name': 'ABC'}], 'name': 'm2'},
              {'children': [], 'name': 'm3'}],
 'name': 'm0'}
But I would not use dict and list builtin objects for such a task - I'd rather create my own objects.
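For readers who prefer that route, here is a minimal sketch of a hand-rolled tree. The Node class and its find/to_dict helpers are illustrative names of mine, not something from the thread:

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.children = []

    def find(self, name):
        # Depth-first search for the first node with the given name.
        if self.name == name:
            return self
        for child in self.children:
            found = child.find(name)
            if found:
                return found
        return None

    def to_dict(self):
        # Convert back to the question's {'name': ..., 'children': [...]} shape.
        return {'name': self.name,
                'children': [c.to_dict() for c in self.children]}

root = Node('m0')
for name in ('m1', 'm2', 'm3'):
    root.children.append(Node(name))

root.find('m2').children.append(Node('ABC'))
print(root.to_dict()['children'][1])
# {'name': 'm2', 'children': [{'name': 'ABC', 'children': []}]}
```

The advantage over raw dicts is that insertion no longer depends on knowing index positions; find() locates the parent by name.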

topo['children'][1]['children'].append({'name': 'ABC',
                                        'children': []})

Related

Change value in nested JSON string

I want to process a nested JSON document with an Azure Function and Python. The JSON has more than 3, sometimes up to 10 or more, nested layers. This is a simple example of a JSON document passed to the function:
[{
    "node1": {
        "tattr": {
            "usqx": "qhi123"
        },
        "x-child": [{
            "el1": "ast",
            "tattr": {
                "usx": "xht",
                "ust": "cr12"
            },
            "x-child": [{
                "el1": "asx",
                "tattr": {
                    "usx": "md0"
                },
                "x-child": [{
                    "el1": "ast",
                    "tattr": {
                        "usx": "mdw"
                    },
                    "x-child": [{
                        "el1": "ast",
                        "tattr": {
                            "usx": "mdh"
                        },
                        "x-child": [{
                            "el1": "ast",
                            "x-child": "placeholder_a"
                        }]
                    }, {
                        "el1": "ast",
                        "tattr": {
                            "usx": "ssq"
                        },
                        "x-child": "placeholder_c"
                    }, {
                        "el1": "div",
                        "tattr": {
                            "usx": "mdf"
                        },
                        "x-child": "abc"
                    }]
                }]
            }]
        }]
    }
}, {
    "node02": {
        "tattr": {
            "usx": "qhi123"
        }
    }
}]
In this example, placeholder_a should be replaced.
Somewhere in this structure is a value that needs to be replaced. My idea is to recursively iterate over the JSON and process every key that has a dict or list as its value. I think the recursive call of the function with part of the JSON string just copies the JSON, so even if the searched string is found and changed in one recursion, the original string does not change. What is the best approach to replace the "placeholder" in the JSON? It can be at any level.
Since my approach seems to be wrong, I am looking for ideas on how to solve the issue. Currently I am torn between a simple string replace, where I am not sure whether the replaced string will be a key or a value in the JSON, and a recursive function that takes the JSON, searches and replaces, and rebuilds the JSON on every recursion.
The code finds search_para and replaces it, but the change is not reflected in the original string.
def lin(json_para, search_para, replace_para):
    json_decoded = json.loads(json_para)
    if isinstance(json_decoded, list):
        for list_element in json_decoded:
            lin(json.dumps(list_element), search_para, replace_para)
    elif isinstance(json_decoded, dict):
        for dict_element in json_decoded:
            if isinstance(json_decoded[dict_element], dict):
                lin(json.dumps(json_decoded[dict_element]), search_para, replace_para)
            elif isinstance(json_decoded[dict_element], str):
                if json_decoded[dict_element] == search_para:
                    json_decoded[dict_element] = replace_para
While it certainly could be accomplished via recursion given the nature of the problem, I think there's an even more elegant approach, based on an idea I got a long time ago from @Mike Brennan's answer to another JSON-related question, How to get string objects instead of Unicode from JSON?
The basic idea is to use the optional object_hook parameter that both json.load() and json.loads() accept to watch what is being decoded and check it for the sought-after value (and replace it when it's encountered).
The function passed will be called with the result of every JSON object literal decoded (i.e. a dict), in other words at any depth. What may not be obvious is that the dict can also be changed if desired.
The nice thing about this overall approach is that it's based (primarily) on prewritten, debugged, and relatively fast code, because it's part of the standard library. It also allows the object_hook callback function to be kept relatively simple.
Here's what I'm suggesting:
import json

def replace_placeholders(json_para, search_para, replace_para):
    # Local nested function.
    def decode_dict(a_dict):
        if search_para in a_dict.values():
            for key, value in a_dict.items():
                if value == search_para:
                    a_dict[key] = replace_para
        return a_dict
    return json.loads(json_para, object_hook=decode_dict)

result = replace_placeholders(json_para, 'placeholder_a', 'REPLACEMENT')
print(json.dumps(result, indent=2))
You can use recursion as follows:
data = [{'k1': 'placeholder_a', 'k2': [{'k3': 'placeholder_b', 'k4': 'placeholder_a'}]},
        {'k5': 'placeholder_a', 'k6': 'placeholder_c'}]

def replace(data, val_from, val_to):
    if isinstance(data, list):
        return [replace(x, val_from, val_to) for x in data]
    if isinstance(data, dict):
        return {k: replace(v, val_from, val_to) for k, v in data.items()}
    return val_to if data == val_from else data  # other cases

print(replace(data, "placeholder_a", "REPLACED"))
# [{'k1': 'REPLACED', 'k2': [{'k3': 'placeholder_b', 'k4': 'REPLACED'}]}, {'k5': 'REPLACED', 'k6': 'placeholder_c'}]
I've changed the input/output for the sake of simplicity. You can check that the function replaces 'placeholder_a' at any level with 'REPLACED'.

Python filter nested dict with key value and print a part

Let's say I have a nested dict like this, but much longer:
{
    "problems": [{
        "1": [{
            "name": "asprin abc",
            "dose": "",
            "strength": "100 mg"
        }],
        "2": [{
            "name": "somethingElse",
            "dose": "",
            "strenght": "51g"
        }],
        "3": [{
            "name": "againSomethingElse",
            "dose": "",
            "strenght": "511g"
        }]
    }],
    "labs": [{
        "missing_field": "missing_value"
    }]
}
Now I want to iterate through the dict and do some filtering. I just want the part where the key "name" is LIKE '%asprin%', as in Transact-SQL.
So the output should be the following:
[{
"name":"asprin abc",
"dose":"",
"strength":"100 mg"
}]
I know how to iterate through the dict, but I don't know how to achieve the value filtering where I print the whole part whose name matches.
The following is a general solution making no assumption about the structure of the passed object, which could be a list, dictionary, etc. It will recursively descend through the structure looking for a dictionary with a key "name" whose value contains asprin, and will yield that dictionary:
d = {
    "problems": [{
        "1": [{
            "name": "asprin abc",
            "dose": "",
            "strength": "100 mg"
        }],
        "2": [{
            "name": "somethingElse",
            "dose": "",
            "strenght": "51g"
        }],
        "3": [{
            "name": "againSomethingElse",
            "dose": "",
            "strenght": "511g"
        }]
    }],
    "labs": [{
        "missing_field": "missing_value"
    }]
}
def filter(obj):
    if isinstance(obj, list):
        for item in obj:
            yield from filter(item)
    elif isinstance(obj, dict):
        if "name" in obj and "asprin" in obj["name"]:
            yield obj
        else:
            for v in obj.values():
                if isinstance(v, (list, dict)):
                    yield from filter(v)

print(list(filter(d)))
Prints:
[{'name': 'asprin abc', 'dose': '', 'strength': '100 mg'}]
I found an easier solution than the one described above; a list comprehension is much simpler:
[item for group in d['problems'] for sub in group.values() for item in sub if item['name'].find("asprin") != -1]
You can try this below:
output = []
for p_obj in json_data["problems"]:
    for i in p_obj.keys():
        for j in p_obj[i]:
            if "asprin" in j["name"]:
                output.append(j)
print(output)

Create partial dict from recursively nested field list

After parsing a URL parameter for partial responses, e.g. ?fields=name,id,another(name,id),date, I'm getting back an arbitrarily nested list of strings, representing the individual keys of a nested JSON object:
['name', 'id', ['another', ['name', 'id']], 'date']
The goal is to map that parsed 'graph' of keys onto an original, larger dict and just retrieve a partial copy of it, e.g.:
input_dict = {
    "name": "foobar",
    "id": "1",
    "another": {
        "name": "spam",
        "id": "42",
        "but_wait": "there is more!"
    },
    "even_more": {
        "nesting": {
            "why": "not?"
        }
    },
    "date": 1584567297
}
should simplify to:
output_dict = {
    "name": "foobar",
    "id": "1",
    "another": {
        "name": "spam",
        "id": "42"
    },
    "date": 1584567297
}
So far, I've glanced over nested defaultdicts, addict, and glom, but the mappings they take as inputs are not compatible with my list (I might have missed something, of course), and I end up with garbage.
How can I do this programmatically, accounting for any nesting that might occur?
You can use:
def rec(d, f):
    result = {}
    for i in f:
        if isinstance(i, list):
            result[i[0]] = rec(d[i[0]], i[1])
        else:
            result[i] = d[i]
    return result

f = ['name', 'id', ['another', ['name', 'id']], 'date']
rec(input_dict, f)
output:
{'name': 'foobar',
 'id': '1',
 'another': {'name': 'spam', 'id': '42'},
 'date': 1584567297}
The assumption here is that in a nested list, the first element is a valid key at the current level and the second element contains the keys to keep from the nested dict stored under that first element.
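If some requested fields may be missing from the source dict, a slightly more defensive variant (an extension of mine, not part of the original answer) skips them instead of raising KeyError:

```python
def rec_tolerant(d, f):
    # Same shape as the answer's rec(), but silently skips fields absent from d.
    result = {}
    for i in f:
        if isinstance(i, list):
            if i[0] in d:
                result[i[0]] = rec_tolerant(d[i[0]], i[1])
        elif i in d:
            result[i] = d[i]
    return result

input_dict = {"name": "foobar", "id": "1",
              "another": {"name": "spam", "id": "42", "but_wait": "there is more!"},
              "date": 1584567297}

# 'missing' is not in input_dict, so it is simply dropped from the output.
fields = ['name', 'id', ['another', ['name', 'id']], 'missing', 'date']
print(rec_tolerant(input_dict, fields))
# {'name': 'foobar', 'id': '1', 'another': {'name': 'spam', 'id': '42'}, 'date': 1584567297}
```

Whether to skip or raise on unknown fields is an API design choice; for URL-supplied field lists, skipping avoids a 500 on a client typo.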

Delete a dictionary based on the value of a key in another list of dictionaries

I have a list of dictionaries and a main dictionary.
The list of dictionaries has the following format. Values are assigned from variables that change dynamically in the program.
list_dict = [{'url': url_value , 'title' : title_value}, {'url': url_value , 'title' : title_value}]
main_dict = {"execution_time": "2017-06-05", "target_url": "http://www.bloomberg.com", "data": [{ "url" : url1}, { "url" : url2}], "name": "Michael", "occupation": "software"}
If any url value (url1 or url2) under data in main_dict matches the url_value in any of the dictionaries in list_dict, I want to delete that dictionary from data.
Output: Assuming url_value is url1 then:
main_dict = {"execution_time": "2017-06-05", "target_url": "http://www.bloomberg.com", "data": [{ "url" : url2}], "name": "Michael", "occupation": "software"}
I thought about using dict comprehensions, but everything I tried failed. I would appreciate a starting point or any guidance.
You can try this:
>>> list_dict = [{'url': "url1" , 'title' : "title_value1"}, {'url': "other_url" , 'title' : "title_value2"}]
>>> main_dict = {"execution_time": "2017-06-05", "target_url": "http://www.bloomberg.com", "data": [{ "url" : "url1"}, { "url" : "url2"}], "name": "Michael", "occupation": "software"}
>>> S = set(d["url"] for d in list_dict)
>>> main_dict["data"] = [d for d in main_dict["data"] if d["url"] not in S]
>>> main_dict
{'execution_time': '2017-06-05', 'target_url': 'http://www.bloomberg.com', 'data': [{'url': 'url2'}], 'name': 'Michael', 'occupation': 'software'}
Instead of deleting elements of main_dict["data"], the idea is to recreate the list without the matching urls:
extract the distinct urls of the list_dict in S;
filter the dicts d in main_dict["data"] on the rule: d["url"] not in S.
Note on naming: try to name your variables according to the content and not the type.
list_dict is a list of dictionaries (I can see that), but I would like to know immediately what's in those dictionaries. web_pages would be better, if you accept that a url + a title makes a page. But you should specify why those pages are in this list (e.g. dead_link_pages, or whatever).
main_dict is a dictionary (pretty obvious and not really informative): something like task is better. Again, a better specification is informative: update_task, retrieve_task?
OK, I replaced S with page_urls!
Have a look, this is far more readable:
>>> web_pages = [{'url': "url1" , 'title' : "title_value1"}, {'url': "other_url" , 'title' : "title_value2"}]
>>> task = {"execution_time": "2017-06-05", "target_url": "http://www.bloomberg.com", "data": [{ "url" : "url1"}, { "url" : "url2"}], "name": "Michael", "occupation": "software"}
>>> page_urls = set(p["url"] for p in web_pages)
>>> task["data"] = [t for t in task["data"] if t["url"] not in page_urls]
>>> task
{'execution_time': '2017-06-05', 'target_url': 'http://www.bloomberg.com', 'data': [{'url': 'url2'}], 'name': 'Michael', 'occupation': 'software'}

How can I insert records which have dicts and lists in Flask Eve?

I'm using Flask-Eve to provide an API for my data. I would like to insert my records using Eve, so that I get a _created attribute and the other Eve-added attributes.
Two of my fields are dicts, and one is a list. When I try to insert them via Eve, the structure seems to get flattened, losing some information. Trying to tell Eve about the dict and list elements gives me an error on POST, saying those fields need to be dicts and lists, but they already are! Please can someone tell me what I'm doing wrong?
My Eve conf looked like this:
'myendpoint': {
    'allow_unknown': True,
    'schema': {
        'JobTitle': {
            'type': 'string',
            'required': True,
            'empty': False,
            'minlength': 3,
            'maxlength': 99
        },
        'JobDescription': {
            'type': 'string',
            'required': True,
            'empty': False,
            'minlength': 32,
            'maxlength': 102400
        },
    },
},
But when I POST the following structure using requests:
{
    "_id" : ObjectId("56e840686dbf9a5fe069220e"),
    "Salary" : {
        "OtherPay" : "On Application"
    },
    "ContactPhone" : "xx",
    "JobTypeCodeList" : [
        "Public Sector",
        "Other"
    ],
    "CompanyName" : "Scc",
    "url" : "xx",
    "JobTitle" : "xxx",
    "WebAdID" : "TA7494725_1_1",
    "JobDescription" : "xxxx",
    "JobLocation" : {
        "DisplayCity" : "BRIDGWATER",
        "City" : "BRIDGWATER",
        "StateProvince" : "Somerset",
        "Country" : "UK",
        "PostalCode" : "TA6"
    },
    "CustomField1" : "Permanent",
    "CustomField3" : "FTJOBUKNCSG",
    "WebAdManagerEmail" : "xxxx",
    "JobType" : "Full",
    "ProductID" : "JCPRI0UK"
}
The post line looks like this:
resp = requests.post(url, data = job)
It gets 'flattened' and loses the information from the dicts and list:
{
    "_id" : ObjectId("56e83f5a6dbf9a6395ea559d"),
    "Salary" : "OtherPay",
    "_updated" : ISODate("2016-03-15T16:59:06Z"),
    "ContactPhone" : "xx",
    "JobTypeCodeList" : "Public Sector",
    "CompanyName" : "Scc",
    "url" : "xxx",
    "JobTitle" : "xx",
    "WebAdID" : "TA7494725_1_1",
    "JobDescription" : "xxx",
    "JobLocation" : "DisplayCity",
    "CustomField1" : "Permanent",
    "_created" : ISODate("2016-03-15T16:59:06Z"),
    "CustomField3" : "55d8d394141652f5dc2892a900aa450403a63d10" if False else "FTJOBUKNCSG",
    "_etag" : "55d8d394141652f5dc2892a900aa450403a63d10",
    "JobType" : "Full",
    "ProductID" : "JCPRI0UK"
}
I've tried updating my schema to say some are dicts and lists:
'JobTypeCodeList': { 'type': 'list'},
'Salary': { 'type': 'dict'},
'JobLocation': { 'type': 'dict'},
But then when I POST in the new record I get an error saying
{u'Salary': u'must be of dict type', u'JobTypeCodeList': u'must be of list type', u'JobLocation': u'must be of dict type'},
I've verified before the POST that type(job['Salary']) == dict etc., so I'm not sure how to resolve this. While I can POST the record directly into MongoDB fine, bypassing Eve, I'd prefer to use Eve if possible.
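A note on the likely cause of the flattening (my reading of the symptoms, not confirmed in the thread): requests form-encodes a dict passed via data=, and form encoding iterates over container values, so a nested dict contributes only its keys. That matches the "Salary": "OtherPay" result above. The standard-library urlencode shows the same effect, and sending the payload as a JSON body instead (requests supports a json= keyword argument on post) preserves the nesting:

```python
from urllib.parse import urlencode
import json

job = {
    "Salary": {"OtherPay": "On Application"},
    "JobTypeCodeList": ["Public Sector", "Other"],
    "JobTitle": "xxx",
}

# Form encoding iterates container values: a dict contributes only its keys,
# a list contributes one key=value pair per element.
print(urlencode(job, doseq=True))
# Salary=OtherPay&JobTypeCodeList=Public+Sector&JobTypeCodeList=Other&JobTitle=xxx

# A JSON body keeps the nested structure intact; this is what
# requests.post(url, json=job) would transmit.
body = json.dumps(job)
print(json.loads(body)["Salary"])
# {'OtherPay': 'On Application'}
```

Note that the server (here, Eve) must also accept a JSON request body for this to help; the workaround below takes a different route.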
In case this is useful to anyone else, I ended up working around this issue by posting a flat structure into Eve, and then using the on_insert and on_update events to loop through the keys and construct objects (and lists) from them.
It's a bit convoluted but it does the trick and now that it's in place it's fairly transparent to use. My objects added to MongoDB through Eve now have embedded lists and hashes, but they also get the handy Eve attributes like _created and _updated, while the POST and PATCH requests also get validated through Eve's normal schema.
The only really awkward thing is that on_insert and on_update send slightly different arguments, so there's a lot of repetition in the code below which I haven't yet refactored out.
Any characters can be used as flags: I'm using two underscores to indicate key/values which should end up as a single object, and two ampersands for values which should be split into a list. The structure I'm posting in now looks like this:
{
    "Salary__OtherPay" : "On Application",
    "ContactPhone" : "xx",
    "JobTypeCodeList" : "Public Sector&&Other",
    "CompanyName" : "Scc",
    "url" : "xx",
    "JobTitle" : "xxx",
    "WebAdID" : "TA7494725_1_1",
    "JobDescription" : "xxxx",
    "JobLocation__DisplayCity" : "BRIDGWATER",
    "JobLocation__City" : "BRIDGWATER",
    "JobLocation__StateProvince" : "Somerset",
    "JobLocation__Country" : "UK",
    "JobLocation__PostalCode" : "TA6",
    "CustomField1" : "Permanent",
    "CustomField3" : "FTJOBUKNCSG",
    "WebAdManagerEmail" : "xxxx",
    "JobType" : "Full",
    "ProductID" : "JCPRI0UK"
}
And my Eve schema has been updated accordingly to validate the values of those new key names. Then in the backend I've defined the function below which checks the incoming keys/values and converts them into objects/lists, and also deletes the original __ and && data:
import re

def flat_to_complex(items=None, orig=None):
    if type(items) is dict:  # inserts of new objects
        objects = {}  # hash-based container for each object
        lists = {}    # hash-based container for each list
        for key, value in items.items():
            has_object_wildcard = re.search(r'^([^_]+)__', key, re.IGNORECASE)
            if bool(has_object_wildcard):
                objects[has_object_wildcard.group(1)] = None
            elif bool(re.search(r'&&', str(value))):
                lists[key] = str(value).split('&&')
        for list_name, this_list in lists.items():
            items[list_name] = this_list
        for obj_name in objects:
            this_obj = {}
            # Iterate over a copy so keys can be deleted while looping.
            for key, value in list(items.items()):
                if key.startswith('{s}__'.format(s=obj_name)):
                    match = re.search(r'__(.+)$', key)
                    this_obj[match.group(1)] = value
                    del items[key]
            objects[obj_name] = this_obj
        for obj_name, this_obj in objects.items():
            items[obj_name] = this_obj
    elif type(items) is list:  # updates to existing objects
        for idx in range(len(items)):
            if type(items[idx]) is dict:
                objects = {}  # hash-based container for each object
                lists = {}    # hash-based container for each list
                for key, value in items[idx].items():
                    has_object_wildcard = re.search(r'^([^_]+)__', key, re.IGNORECASE)
                    if bool(has_object_wildcard):
                        objects[has_object_wildcard.group(1)] = None
                    elif bool(re.search(r'&&', str(value))):
                        lists[key] = str(value).split('&&')
                for list_name, this_list in lists.items():
                    items[idx][list_name] = this_list
                for obj_name in objects:
                    this_obj = {}
                    # Iterate over a copy so keys can be deleted while looping.
                    for key, value in list(items[idx].items()):
                        if key.startswith('{s}__'.format(s=obj_name)):
                            match = re.search(r'__(.+)$', key)
                            this_obj[match.group(1)] = value
                            del items[idx][key]
                    objects[obj_name] = this_obj
                for obj_name, this_obj in objects.items():
                    items[idx][obj_name] = this_obj
And then I just tell Eve to run that function on inserts and updates to that collection:
app.on_insert_myendpoint += flat_to_complex
app.on_update_myendpoint += flat_to_complex
This achieves what I needed and the resulting record in Mongo is the same as the one from the question above (with _created and _updated attributes). It's obviously not ideal but it gets there, and it's fairly easy to work with once it's in place.
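For readers who just want to see the key-flag transformation in isolation, here is a compact, self-contained sketch of the dict (insert) case. Unlike the Eve hook above, it returns a new dict instead of mutating in place, and the function name is mine:

```python
import re

def flat_to_complex_simple(items):
    # Collapse 'Prefix__Field' keys into nested dicts and split '&&'-joined
    # string values into lists. Keys without either flag pass through.
    out = {}
    for key, value in items.items():
        match = re.match(r'^([^_]+)__(.+)$', key)
        if match:
            out.setdefault(match.group(1), {})[match.group(2)] = value
        elif isinstance(value, str) and '&&' in value:
            out[key] = value.split('&&')
        else:
            out[key] = value
    return out

flat = {
    "Salary__OtherPay": "On Application",
    "JobTypeCodeList": "Public Sector&&Other",
    "JobLocation__City": "BRIDGWATER",
    "JobLocation__Country": "UK",
    "JobTitle": "xxx",
}
print(flat_to_complex_simple(flat))
# {'Salary': {'OtherPay': 'On Application'},
#  'JobTypeCodeList': ['Public Sector', 'Other'],
#  'JobLocation': {'City': 'BRIDGWATER', 'Country': 'UK'},
#  'JobTitle': 'xxx'}
```

The same idea generalizes to the list (update) case by applying the function to each dict element.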
