Copy nested dictionary to another dictionary with all the levels - python

For some third-party APIs, a large amount of data needs to be sent in the API parameters, and the input data comes to our application in CSV format.
I receive all the rows of the CSV, each with around 120 columns, as flat dicts from csv.DictReader:
file_data_obj = csv.DictReader(open(file_path, newline=''))
This gives me each row in the following format:
CSV_PARAMS = {
    'param7': "Param name",
    'param6': ["some name"],
    'param5': 1234,
    'param4': 999999999,
    'param3': "some ",
    'param2': {"x name": "y_value"},
    'param1': None,
    'paramA': "",
    'paramZ': 2.687
}
There is also one nested dictionary containing all the third-party API parameters as keys with blank values, e.g.:
API_PARAMS = {
    "param1": "",
    "param2": "",
    "param3": "",
    "paramAZ": {
        "paramA": "",
        "paramZ": {"test1": 1234, "name": {"hello": 1}},
        ...
    },
    "param67": {
        "param6": "",
        "param7": ""
    },
    ...
}
I have to map all the CSV values to the API parameters dynamically. The following code works, but only up to 3 levels of nesting:
def update_nested_params(self, paramdict, inpdict, result={}):
    """Iterate over nested dictionary up to level 3 """
    for k, v in paramdict.items():
        if isinstance(v, dict):
            for k1, v1 in v.items():
                if isinstance(v1, dict):
                    for k2, _ in v1.items():
                        result.update({k: {k1: {k2: inpdict.get(k2, '')}}})
                else:
                    result.update({k: {k1: inpdict.get(k1, '')}})
        else:
            result.update({k: inpdict.get(k, '')})
    return result

self.update_nested_params(API_PARAMS, CSV_PARAMS)
Is there a more efficient way to achieve this for an arbitrary nesting depth of the API parameters?

You could use recursion:
def update_nested_params(self, template, source):
    result = {}
    for key, value in template.items():
        if key in source:
            result[key] = source[key]
        elif not isinstance(value, dict):
            # assume the template value is a default
            result[key] = value
        else:
            # recurse
            result[key] = self.update_nested_params(value, source)
    return result
This copies the template (API_PARAMS) recursively: for each key it takes the value from source if the key is present there; otherwise, if the template value is another dictionary, it recurses into it, and if not, it keeps the template value as a default. This handles nesting up to roughly sys.getrecursionlimit() levels (1000 by default).
Alternatively, use an explicit stack:
# extra import to add at the module top
from collections import deque

def update_nested_params(self, template, source):
    top_level = {}
    stack = deque([(top_level, template)])
    while stack:
        result, template = stack.pop()
        for key, value in template.items():
            if key in source:
                result[key] = source[key]
            elif not isinstance(value, dict):
                # assume the template value is a default
                result[key] = value
            else:
                # push nested dict into the stack
                result[key] = {}
                stack.append((result[key], value))
    return top_level
This essentially just moves the call stack used by the recursion into an explicit stack. The order in which keys are processed changes (each dictionary's own keys are all handled before any of its nested dictionaries are visited), but that doesn't matter for your specific problem.
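To try the recursive version outside a class, here is a standalone sketch with `self` dropped, applied to trimmed-down data shaped like the question's `API_PARAMS` and `CSV_PARAMS`. The function name `fill_template` and the sample values are hypothetical; the branching is the same as in the answer above.

```python
def fill_template(template, source):
    """Copy `template`, taking each key's value from the flat `source` dict when present."""
    result = {}
    for key, value in template.items():
        if key in source:
            # A matching CSV column wins, even if the template value is a dict
            result[key] = source[key]
        elif isinstance(value, dict):
            # No direct match: descend into the nested template
            result[key] = fill_template(value, source)
        else:
            # Keep the template's default (e.g. "")
            result[key] = value
    return result


# Hypothetical, trimmed-down sample data
csv_row = {"param1": None, "param3": "some ", "paramA": "", "paramZ": 2.687,
           "param6": ["some name"], "param7": "Param name"}
api_params = {
    "param1": "",
    "param2": "",
    "param3": "",
    "paramAZ": {"paramA": "", "paramZ": {"test1": 1234}},
    "param67": {"param6": "", "param7": ""},
}

print(fill_template(api_params, csv_row))
```

Because the `key in source` check comes first, a CSV column replaces a whole nested template dict of the same name (as `paramZ` does here). Swap the first two branches if nested template dicts should always be descended into instead.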

Related

how to stack the keys from a nested dict and flatten it

I had a task to flatten nested dict, which was easy. This is my code for that:
class Simple:
    def __init__(self):
        self.store_data = {}

    def extract_data(self, config):
        for key in config:
            if isinstance(config[key], dict):
                # descend into nested dicts, keeping only the leaf keys
                self.extract_data(config[key])
            else:
                self.store_data[key] = config[key]
        return self.store_data
This was my input:
input = {
    'k1_lv1': {'k1_lv2': 'v1_lv2', 'k2_lv2': 'v2_lv2'},
    'k2_lv1': 'v1_lv1',
    'k3_lv1': {'k1_lv2': 'v1_lv2', 'k2_lv2': 'v2_vl2'},
    'k4_lv1': 'v1_lv1',
}
and this was my output (imagine that the keys are unique):
output = {
'k1_lv2': 'v1_lv2', 'k2_lv2': 'v2_lv2',
'k2_lv1': 'v1_lv1',
'k1_lv2': 'v1_lv2', 'k2_lv2': 'v2_vl2',
'k4_lv1': 'v1_lv1'
}
But now the task has changed, and my output has to look like this:
output = {
'k1_lv1_k1_lv2': 'v1_lv2',
'k1_lv1_k2_lv2': 'v2_lv2',
'k2_lv1': 'v1_lv1',
'k3_lv1_k1_lv2': 'v1_lv2',
'k3_lv1_k2_lv2': 'v2_vl2',
'k4_lv1': 'v1_lv1'
}
So I not only have to flatten the nested dict, but also have to keep the keys of the nested dicts as prefixes.
I tried to produce that output, but I'm failing.
You can use recursion for the task:
dct = {
    "k1_lv1": {"k1_lv2": "v1_lv2", "k2_lv2": "v2_lv2"},
    "k2_lv1": "v1_lv1",
    "k3_lv1": {"k1_lv2": "v1_lv2", "k2_lv2": "v2_vl2"},
    "k4_lv1": "v1_lv1",
}

def flatten(d, path=""):
    if isinstance(d, dict):
        for k, v in d.items():
            yield from flatten(v, (path + "_" + k).strip("_"))
    else:
        yield (path, d)

out = dict(flatten(dct))
print(out)
Prints:
{
"k1_lv1_k1_lv2": "v1_lv2",
"k1_lv1_k2_lv2": "v2_lv2",
"k2_lv1": "v1_lv1",
"k3_lv1_k1_lv2": "v1_lv2",
"k3_lv1_k2_lv2": "v2_vl2",
"k4_lv1": "v1_lv1",
}
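One side effect of `.strip("_")` above is that it also removes underscores belonging to the keys themselves (a top-level key `_id` would come out as `id`). A variant sketch that carries the path as a tuple and joins it only once at each leaf; the name `flatten_keys` and the `sep` parameter are hypothetical, and it assumes all keys are strings (`str.join` requires that):

```python
def flatten_keys(d, path=(), sep="_"):
    """Yield (joined_key, value) pairs for every leaf of a nested dict."""
    if isinstance(d, dict):
        for k, v in d.items():
            # Remember this key and keep descending; nothing is joined yet
            yield from flatten_keys(v, path + (k,), sep)
    else:
        # Leaf value: join the accumulated keys exactly once
        yield sep.join(path), d


dct = {
    "k1_lv1": {"k1_lv2": "v1_lv2", "k2_lv2": "v2_lv2"},
    "k2_lv1": "v1_lv1",
    "_id": {"x": 1},  # leading underscore that .strip("_") would lose
}
print(dict(flatten_keys(dct)))
# {'k1_lv1_k1_lv2': 'v1_lv2', 'k1_lv1_k2_lv2': 'v2_lv2', 'k2_lv1': 'v1_lv1', '_id_x': 1}
```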
Why don't you loop through the keys using input.keys() and then combine the keys using
output['{}_{}'.format(key_level1, key_level2)] = input[key_level1][key_level2]
You might need to nest for loops and add a condition to test the depth of the keys in your dictionary.
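A sketch of that loop for exactly two levels of nesting. It uses the name `nested` instead of `input` to avoid shadowing the built-in; each additional level would need another nested loop, which is why the recursive answer scales better.

```python
nested = {
    "k1_lv1": {"k1_lv2": "v1_lv2", "k2_lv2": "v2_lv2"},
    "k2_lv1": "v1_lv1",
}

output = {}
for key_level1, value1 in nested.items():
    if isinstance(value1, dict):
        # Second level: combine both keys into one
        for key_level2, value2 in value1.items():
            output["{}_{}".format(key_level1, key_level2)] = value2
    else:
        # Top-level leaf: keep the key as-is
        output[key_level1] = value1

print(output)
# {'k1_lv1_k1_lv2': 'v1_lv2', 'k1_lv1_k2_lv2': 'v2_lv2', 'k2_lv1': 'v1_lv1'}
```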

How can I convert/transform a JSON tree structure to a Merkle tree

I'm running a web server, where I receive data in JSON format and planning to store it in a NoSQL database. Here is an example:
data_example = {
    "key1": "val1",
    "key2": [1, 2, 3],
    "key3": {
        "subkey1": "subval1",
        .
        .
    }
}
I was thinking about using a Merkle tree to represent my data, since JSON is also a tree-like structure.
Essentially, what I want to do is store my data in (or as) a more secure, decentralized tree-like structure. Many entities will be able to create, read, update or delete (CRUD) records in it. These CRUD operations will ideally need to be verified by other entities in the network, which will also hold a copy of the database. Just like in a blockchain.
I'm having a design/concept problem, and I'm trying to understand how I can turn my JSON into a Merkle tree structure. This is my Node class:
class Node:
    """ class that represents a node in a merkle tree"""
    def __init__(self, data):
        self.data = data
        self.hash = self.calculate_some_hash()  # based on the data or based on its child nodes
I'm interested in the conception/design of this as I couldn't figure out how this can work. Any idea how to save/store my data_example object in a Merkle tree? (is it possible?)
You can create a Merkle tree by first converting your dictionary into a tree of class objects, and then recursively traversing the tree, hashing the sum of the child node hashes. Since a Merkle tree requires a single root node, any input dictionary that has more than one key at the topmost level should become the child dictionary of an empty root node (with a default key of None):
data_example = {
    "key1": "val1",
    "key2": [1, 2, 3],
    "key3": {
        "subkey1": "subval1",
        "subkey2": "subval2",
        "subkey3": "subval3",
    }
}

class MTree:
    def __init__(self, key, value):
        self.key, self.hash = key, None
        self.children = value if not isinstance(value, (dict, list)) else self.__class__.build(value, False)

    def compute_hashes(self):
        # build hashes up from the bottom
        if not isinstance(self.children, list):
            self.hash = hash(self.children)
        else:
            self.hash = hash(sum([i.compute_hashes() for i in self.children]))
        return self.hash

    def update_kv(self, k, v):
        # recursively update a value in the tree with an associated key;
        # dict/list values are rebuilt into subtrees so they stay hashable
        if self.key == k:
            self.children = v if not isinstance(v, (dict, list)) else self.__class__.build(v, False)
        elif isinstance(self.children, list):
            _ = [i.update_kv(k, v) for i in self.children]

    def update_tree(self, payload):
        # update key-value pairs in the tree from payload
        for a, b in payload.items():
            self.update_kv(a, b)
        self.compute_hashes()  # after update is complete, recompute the hashes

    @classmethod
    def build(cls, dval, root=True):
        # normalize input into (key, value) pairs: dict items as-is, list elements get key None
        vals = [i if isinstance(i, (list, tuple)) else (None, i) for i in getattr(dval, 'items', lambda: dval)()]
        if root:
            if len(vals) > 1:
                return cls(None, dval)
            return cls(vals[0][0], vals[0][-1])
        return [cls(a, b) for a, b in vals]

    def __repr__(self):
        return f'{self.__class__.__name__}({self.hash}, {repr(self.children)})'

tree = MTree.build(data_example)  # create the basic tree with the input dictionary
_ = tree.compute_hashes()  # get the hashes for each node (predicated on its children)
print(tree)
Output (the string hashes will differ from run to run, because Python randomizes str hashing per process):
MTree(-1231139208667999673, [MTree(-8069796171680625903, 'val1'), MTree(6, [MTree(1, 1), MTree(2, 2), MTree(3, 3)]), MTree(-78872064628455629, [MTree(-8491910191379857244, 'subval1'), MTree(1818926376495655970, 'subval2'), MTree(1982425731828357743, 'subval3')])])
Updating the tree with the contents from a payload:
tree.update_tree({"key1": "newVal1"})
print(tree)
Output:
MTree(1039734050960246293, [MTree(5730292134016089818, 'newVal1'), MTree(6, [MTree(1, 1), MTree(2, 2), MTree(3, 3)]), MTree(-78872064628455629, [MTree(-8491910191379857244, 'subval1'), MTree(1818926376495655970, 'subval2'), MTree(1982425731828357743, 'subval3')])])
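One caveat for the decentralized use case: Python's built-in hash() is not a cryptographic hash, and str hashes are randomized per process, so other entities in the network would compute different values for the same data. Summing child hashes also makes a node's hash independent of the order of its children. A sketch of the same bottom-up idea using SHA-256 from hashlib, where each node hashes its key together with its children's digests. The function name `merkle_hash` and the encoding scheme (JSON-encoded keys and leaves, sorted dict keys, type tags) are assumptions, not a standard; it only returns the root digest, so to keep per-node hashes for verification you would store each child digest on its node, as MTree does.

```python
import hashlib
import json


def merkle_hash(value, key=None):
    """Return the hex SHA-256 digest of `value`, computed bottom-up from its children."""
    h = hashlib.sha256()
    # Mix in the key so {"a": 1} and {"b": 1} get different digests
    h.update(json.dumps(key).encode("utf-8"))
    if isinstance(value, dict):
        h.update(b"dict")
        # JSON objects are unordered, so visit keys in sorted order
        for k in sorted(value):
            h.update(bytes.fromhex(merkle_hash(value[k], k)))
    elif isinstance(value, list):
        h.update(b"list")
        # List order is significant: children are hashed in sequence, keyed by index
        for i, item in enumerate(value):
            h.update(bytes.fromhex(merkle_hash(item, i)))
    else:
        # Leaf: hash a canonical JSON encoding of the scalar
        h.update(b"leaf")
        h.update(json.dumps(value).encode("utf-8"))
    return h.hexdigest()


data_example = {
    "key1": "val1",
    "key2": [1, 2, 3],
    "key3": {"subkey1": "subval1", "subkey2": "subval2", "subkey3": "subval3"},
}

root = merkle_hash(data_example)
print(root)  # the same 64-character hex string on every run and every machine
print(merkle_hash(dict(data_example, key1="newVal1")) != root)  # True
```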

rename duplicate key in json file python

I have a JSON file that has duplicate keys.
Example:
{
"data":"abc",
"data":"xyz"
}
I want to make this as
{
"data1":"abc",
"data2":"xyz"
}
I tried using object_pairs_hook with json.loads, but it is not working. Could anyone help me with a Python solution for the above problem?
You can pass json.loads the object_pairs_hook keyword argument, which receives each object's key/value pairs as a list in document order; in that hook you can check for duplicates like this:
raw_text_data = """{
"data":"abc",
"data":"xyz",
"data":"xyz22"
}"""
def manage_duplicates(pairs):
d = {}
k_counter = Counter(defaultdict(int))
for k, v in pairs:
d[k+str(k_counter[k])] = v
k_counter[k] += 1
return d
print(json.loads(raw_text_data, object_pairs_hook=manage_duplicates))
I used Counter to count each key, if it already exists, I'm saving the key as k+str(k_counter[k) - so it will be added with a trailing number.
P.S.
If you have control over the input, I would highly recommend changing your JSON structure to:
{"data": ["abc", "xyz"]}
RFC 4627 for the application/json media type recommends unique keys but doesn't explicitly forbid duplicates (its successor, RFC 8259, keeps the same wording):
The names within an object SHOULD be unique.
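If you can't change the producer but want the list structure recommended above, object_pairs_hook can build it while parsing. A minimal sketch (the hook name `collect_duplicates` is hypothetical); note that a key occurring only once keeps its plain value, so consumers have to handle both shapes:

```python
import json


def collect_duplicates(pairs):
    """object_pairs_hook that gathers the values of repeated keys into a list."""
    result = {}
    repeated = set()  # keys whose value has already been turned into a list
    for key, value in pairs:
        if key not in result:
            result[key] = value                  # first occurrence: keep as-is
        elif key in repeated:
            result[key].append(value)            # third and later occurrences
        else:
            result[key] = [result[key], value]   # second occurrence: start a list
            repeated.add(key)
    return result


raw = '{"data": "abc", "data": "xyz", "other": [1, 2]}'
print(json.loads(raw, object_pairs_hook=collect_duplicates))
# {'data': ['abc', 'xyz'], 'other': [1, 2]}
```

Tracking converted keys in `repeated` (rather than checking `isinstance(..., list)`) keeps a value that was already a list in the source from being appended to by mistake.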
A quick and dirty solution using re:
import re

s = '{ "data":"abc", "data":"xyz", "test":"one", "test":"two", "no":"numbering" }'

def find_dupes(s):
    keys = re.findall(r'"(\w+)":', s)
    return list(set(filter(lambda w: keys.count(w) > 1, keys)))

for key in find_dupes(s):
    for i in range(1, len(re.findall(r'"{}":'.format(key), s)) + 1):
        s = re.sub(r'"{}":'.format(key), r'"{}{}":'.format(key, i), s, count=1)
print(s)
Prints this string:
{ "data1":"abc", "data2":"xyz", "test1":"one", "test2":"two", "no":"numbering" }

Copy keys and list contents from JSON in python

I am trying to skim through a dictionary that contains asymmetrical data and make a list of unique headings. Aside from the normal key:value items, the data within the dictionary also includes other dictionaries, lists, lists of dictionaries, None values, and so on at various levels throughout. I would like to keep the hierarchy of keys/indexes if possible. This will be used to assess the scope of the data and its availability. The data comes from a JSON file, and its contents are subject to change.
My latest attempt is to do this through a series of type checks within a function, skim(), as seen below.
def skim(obj, header='', level=0):
    if obj is None:
        return
    def skim_iterable(iterable):
        lvl = level + 1
        if isinstance(iterable, (list, tuple)):
            for value in iterable:
                h = ':'.join([header, iterable.index(value)])
                return skim(value, header=h, level=lvl)
        elif isinstance(iterable, dict):
            for key, value in iterable.items():
                h = ':'.join([header, key])
                return skim(value, header=h, level=lvl)
    if isinstance(obj, (int, float, str, bool)):
        return ':'.join([header, obj, level])
    elif isinstance(obj, (list, dict, tuple)):
        return skim_iterable(obj)
The intent is to call skim() recursively until the key or list index position at the deepest level is reached, and then return it. skim() has an inner function that handles iterable objects, which carries the level, along with the key or list index position, forward through each nested iterable object.
An example:
test = {"level_0Item_1": {
"level_1Item_1": {
"level_2Item_1": "value",
"level_2Item_2": "value"
},
"level_1Item_2": {
"level_2Item_1": "value",
"level_2Item_2": {}
}},
"level_0Item_2": [
{
"level_1Item_1": "value",
"level_1Item_2": 569028742
}
],
"level_0Item_3": []
}
collection = [skim(test)]
Right now I'm getting a return of [None] on the above code and would like some help troubleshooting or guidance on how best to approach this. What I was expecting is something like this:
['level_0Item_1:level_1Item_1:level_2Item_1',
'level_0Item_1:level_1Item_1:level_2Item_2',
'level_0Item_1:level_1Item_2:level_2Item_1',
'level_0Item_1:level_1Item_2:level_2Item_2',
'level_0Item_2:level_1Item_1',
'level_0Item_2:level_1Item_2',
'level_0Item_3']
Among other resources, I recently came across this question (python JSON complex objects (accounting for subclassing)) and read it and its included references. Full disclosure: I only began coding recently.
Thank you for your help.
You can try something like:
def skim(obj, connector=':', level=0, builded_str=''):
    if isinstance(obj, dict):
        for k, v in obj.items():
            if isinstance(v, dict) and v:
                yield from skim(v, connector, level + 1, builded_str + k + connector)
            elif isinstance(v, list) and v:
                yield from skim(v[0], connector, level + 1, builded_str + k + connector)
            else:
                yield builded_str + k
    else:
        yield builded_str
Test:
test = {"level_0Item_1": {
"level_1Item_1": {
"level_2Item_1": "value",
"level_2Item_2": "value"
},
"level_1Item_2": {
"level_2Item_1": "value",
"level_2Item_2": {}
}},
"level_0Item_2": [
{
"level_1Item_1": "value",
"level_1Item_2": 569028742
}
],
"level_0Item_3": []
}
lst = list(skim(test))
print(lst)
['level_0Item_1:level_1Item_1:level_2Item_1',
'level_0Item_1:level_1Item_1:level_2Item_2',
'level_0Item_1:level_1Item_2:level_2Item_1',
'level_0Item_1:level_1Item_2:level_2Item_2',
'level_0Item_2:level_1Item_1',
'level_0Item_2:level_1Item_2',
'level_0Item_3']
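The answer above follows only the first element of each list (`v[0]`). Since the question also asks to keep list indexes in the hierarchy, here is a sketch that walks every list element; the name `key_paths` and the `with_index` flag are hypothetical. With `with_index=False`, positions are left out, and `dict.fromkeys` removes duplicate headings while keeping their first-seen order, which reproduces the list the question expected.

```python
def key_paths(obj, prefix=(), sep=":", with_index=True):
    """Yield sep-joined key/index paths to every leaf of nested dicts and lists."""
    if isinstance(obj, dict) and obj:
        for key, value in obj.items():
            yield from key_paths(value, prefix + (str(key),), sep, with_index)
    elif isinstance(obj, (list, tuple)) and obj:
        for index, item in enumerate(obj):
            # Record the list position only when with_index is True
            step = (str(index),) if with_index else ()
            yield from key_paths(item, prefix + step, sep, with_index)
    else:
        # Scalars, None and empty containers end a path
        yield sep.join(prefix)


test = {
    "level_0Item_1": {
        "level_1Item_1": {"level_2Item_1": "value", "level_2Item_2": "value"},
        "level_1Item_2": {"level_2Item_1": "value", "level_2Item_2": {}},
    },
    "level_0Item_2": [{"level_1Item_1": "value", "level_1Item_2": 569028742}],
    "level_0Item_3": [],
}

print(list(key_paths(test)))  # includes positions such as 'level_0Item_2:0:level_1Item_1'
print(list(dict.fromkeys(key_paths(test, with_index=False))))
```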

Accessing json type data without knowing layout of data?

I have a file with JSON data I am loading using json.load.
Suppose I want to put a variable in the json data, which references another data field. How can I process this reference in python?
For example:
{
"dictionary" : {
"list_1" : [
"item_1"
],
"list_2" : [
"$dictionary.list_1"
]
}
}
When I come across $, I want list_2 to grab the data from dictionary.list_1
and extend list_2 with it, as if I had written in my Python code:
jsonData["dictionary"]["list_2"].extend(jsonData["dictionary"]["list_1"])
As far as I know, there is nothing in the JSON standard for doing references. My first suggestion would be to use YAML which does have references in the form of Node Anchors. Python has a good implementation of YAML which supports those.
That being said, if you're set on using JSON, you'll have to roll your own implementation.
One possible example is below. Rather than extending the current array with the referenced array (which is ambiguous in the case of dicts), it replaces the reference with the value it refers to. Note that it doesn't handle malformed references: you'll have to add the error checking yourself or guarantee that there aren't any. If you want it to extend instead of replacing, you can change it; you know your use case better than I do. This is meant to give you a starting point.
_ROOT = object()  # sentinel meaning "start from the top"; a JSON null (None) must not trigger it

def resolve_references(structure, sub_structure=_ROOT):
    if sub_structure is _ROOT:
        return resolve_references(structure, structure)
    if isinstance(sub_structure, list):
        tmp = []
        for item in sub_structure:
            tmp.append(resolve_references(structure, item))
        return tmp
    if isinstance(sub_structure, dict):
        tmp = {}
        for key, value in sub_structure.items():
            tmp[key] = resolve_references(structure, value)
        return tmp
    if isinstance(sub_structure, str):
        if not sub_structure.startswith("$"):
            return sub_structure
        keys = sub_structure[1:].split(".")

        def get_value(obj, key):
            if isinstance(obj, dict):
                return obj[key]
            if isinstance(obj, list):
                return obj[int(key)]
            # a path step into a scalar means the reference is malformed
            raise TypeError("cannot look up %r in %r" % (key, obj))

        value = get_value(structure, keys[0])
        for key in keys[1:]:
            value = get_value(value, key)
        return value
    return sub_structure
Example usage:
>>> import json
>>> json_str = """
... {
... "dictionary" : {
... "list_1" : [
... "item_1"
... ],
...
... "list_2" : "$dictionary.list_1"
... }
... }
... """
>>> obj = json.loads(json_str)
>>> resolve_references(obj)
{'dictionary': {'list_1': ['item_1'], 'list_2': ['item_1']}}
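The question asked for list_2 to be extended with list_1 rather than replaced by it. A sketch of that variant, under the assumption (not part of any JSON standard) that a "$path" string appearing inside a list and pointing at another list should be spliced in place, while references anywhere else are replaced. The helper names `is_ref`, `lookup` and `resolve_extend` are hypothetical; like the answer above it does no error checking, and a reference cycle would recurse until RecursionError.

```python
import json


def is_ref(value):
    """True for strings of the form "$some.path"."""
    return isinstance(value, str) and value.startswith("$")


def lookup(root, path):
    """Follow a dotted path such as 'dictionary.list_1' through dicts and lists."""
    value = root
    for part in path.split("."):
        # Lists are indexed by position, dicts by key
        value = value[int(part)] if isinstance(value, list) else value[part]
    return value


def resolve_extend(root, node):
    """Copy `node`, resolving "$path" strings against `root`.

    Inside a list, a reference to another list is spliced in (extend);
    everywhere else a reference is replaced by the value it points to.
    """
    if isinstance(node, dict):
        return {k: resolve_extend(root, v) for k, v in node.items()}
    if isinstance(node, list):
        out = []
        for item in node:
            resolved = resolve_extend(root, item)
            if is_ref(item) and isinstance(resolved, list):
                out.extend(resolved)  # splice the referenced list's items
            else:
                out.append(resolved)
        return out
    if is_ref(node):
        # Resolve the target too, so references inside it are followed
        return resolve_extend(root, lookup(root, node[1:]))
    return node


doc = json.loads("""
{
    "dictionary": {
        "list_1": ["item_1"],
        "list_2": ["$dictionary.list_1"]
    }
}
""")
print(resolve_extend(doc, doc))
# {'dictionary': {'list_1': ['item_1'], 'list_2': ['item_1']}}
```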
