Recursively accessing paths and values of a nested dictionary

In Python 2.7, how does one dynamically access and print out the keys and values of a nested dictionary? Here's a nonsensical example: https://jsoneditoronline.org/?id=da7a486dc2e24bf8b94add9f04c71b4d
Normally, I would do something like:
import json
json_sample = 'sample_dict.json'
json_file = open(json_sample, 'r')
json_data = json.load(json_file)
items = json_data['sample_dict']
for item in items:
    dict_id = item['dict_id']
    person = item['person']['person_id']
    family = item['family']['members']
    print dict_id
    print person
    print family
I can hard code it like this and it'll give me desirable results, but how would I access each of the keys and values dynamically so that:
The first row just prints the keys (dict_id, person['person_id'], person['name'], family['members']['father'])
The second row prints the values respectively (5, 15, "Martin", "Jose")
The end result should be in a CSV file.

You can use a recursive visitor/generator which returns all the path/value pairs of the leaves:
def visit_dict(d, path=[]):
    for k, v in d.items():
        if not isinstance(v, dict):
            yield path + [k], v
        else:
            yield from visit_dict(v, path + [k])
(yield from requires Python 3.3 or later. On older versions, including 2.7, replace it with an explicit loop: for item in visit_dict(v, path + [k]): yield item)
Getting the keys:
>>> ','.join('/'.join(k) for k, v in visit_dict(json_data['sample_dict'][0]))
'dict_id,person/person_id,person/name,person/age,family/person_id,family/members/father,family/members/mother,family/members/son,family/family_id,items_id,furniture/type,furniture/color,furniture/size,furniture/purchases'
and the values:
>>> ','.join(str(v) for k, v in visit_dict(json_data['sample_dict'][0]))
'5,15,Martin,18,20,Jose,Maddie,Jerry,2,None,Chair,Brown,Large,[]'
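Since the question asks for the result in a CSV file, here is a minimal sketch of that last step. It assumes Python 3 for the csv part, writes to an in-memory buffer instead of a real file, and uses a cut-down `sample` dict as a stand-in for one entry of `json_data['sample_dict']`. The generator uses an explicit loop in place of `yield from`, which is the form that would also work on 2.7.

```python
import csv
import io

def visit_dict(d, path=()):
    """Yield (path, value) for every non-dict leaf of a nested dict."""
    for k, v in d.items():
        if isinstance(v, dict):
            # Explicit loop instead of `yield from` (the pre-3.3 form)
            for pair in visit_dict(v, path + (k,)):
                yield pair
        else:
            yield path + (k,), v

# Hypothetical stand-in for one entry of json_data['sample_dict']
sample = {
    "dict_id": 5,
    "person": {"person_id": 15, "name": "Martin"},
    "family": {"members": {"father": "Jose"}},
}

pairs = list(visit_dict(sample))

buf = io.StringIO()  # swap for open('out.csv', 'w', newline='') to write a file
writer = csv.writer(buf)
writer.writerow(['/'.join(path) for path, _ in pairs])  # first row: key paths
writer.writerow([value for _, value in pairs])          # second row: values
csv_text = buf.getvalue()
print(csv_text)
```

Using csv.writer rather than ','.join means values that contain commas or quotes are escaped correctly.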

Related

How to collect nested json keys to a linear list

I'm working with large nested JSON and need to collect all of the JSON keys into a list.
For example, for this JSON:
{"taxIncludedAmount":{},"impactingPriceParameter":[{}],"extensions":{"additionalProp1":{}}}
I'd like to collect the keys into a list
and add brackets so I know the type of each key.
So for the above JSON I'd like to get (in the right order):
eventType
event{}.dataStrategy
event{}.error{}.code
event{}.error{}.characteristics[].name
I managed to get all the keys using a code example I found,
but I'm having trouble finding a way to add the brackets: {} for a dict and [] for a list.
Code:
def get_keys(d, curr_key=[]):
    for k, v in d.items():
        if isinstance(v, dict):
            yield from get_keys(v, curr_key + [k])
        elif isinstance(v, list):
            for i in v:
                yield from get_keys(i, curr_key + [k])
        else:
            yield '.'.join(curr_key + [k])
def main():
    array_json_keys = [*get_keys(json_data)]
output:
event.dataStrategy
event.error.characteristics.name
event.error.code
eventType
This is almost there; I need to add the brackets ({} for a dict, [] for an array).
In addition, I'd like the result sorted so that first-level objects are displayed first.
Update:
Thanks to @blhsing, the brackets are solved.
For some reason it skips empty keys, for example
"impactingPriceParameter": [
    {}
]
or
"extensions": {
    "additionalProp1": {}
}

You can simply concatenate '{}' or '[]' to the key k depending on the data type of the value v:
def get_keys(d, curr_key=[]):
    for k, v in d.items():
        if isinstance(v, dict):
            yield from get_keys(v, curr_key + [k + '{}'])
        elif isinstance(v, list):
            for i in v:
                yield from get_keys(i, curr_key + [k + '[]'])
        else:
            yield '.'.join(curr_key + [k])
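The update in the question notes that empty containers are skipped: an empty dict has no items to recurse into, so nothing is ever yielded for it. One possible way to handle that, sketched here under my own assumptions (the `sorted_keys` helper and the "sort by nesting depth, then name" rule are mine, not part of the answer), is to yield the path of the container itself when it has nothing to descend into:

```python
def get_keys(d, prefix=()):
    """Yield dotted key paths, marking dicts with {} and lists with []."""
    for k, v in d.items():
        if isinstance(v, dict):
            path = prefix + (k + '{}',)
            if v:
                yield from get_keys(v, path)
            else:
                yield '.'.join(path)  # empty dict: report the key itself
        elif isinstance(v, list):
            path = prefix + (k + '[]',)
            children = [i for i in v if isinstance(i, dict) and i]
            if children:
                for child in children:
                    yield from get_keys(child, path)
            else:
                yield '.'.join(path)  # empty list, [{}], or list of scalars
        else:
            yield '.'.join(prefix + (k,))

def sorted_keys(d):
    # Deduplicate, then show shallower paths first.
    # Assumes keys do not themselves contain '.'.
    return sorted(set(get_keys(d)), key=lambda p: (p.count('.'), p))

json_data = {
    "taxIncludedAmount": {},
    "impactingPriceParameter": [{}],
    "extensions": {"additionalProp1": {}},
}
print(sorted_keys(json_data))
# ['impactingPriceParameter[]', 'taxIncludedAmount{}', 'extensions{}.additionalProp1{}']
```

The isinstance check on list items also avoids the AttributeError the original would raise on a list of plain strings or numbers.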

How to create a list if it doesn't exist and add to list if it does

Say I have a dictionary that looks like this:
mappings = {"some_key": 3}
or it could look like this:
mappings = {"some_key": [4,5,6]}
Say I have a value 100 and a key of "some_key" in this function:
def add_to_mappings(key, value):
    if key in mappings:
        mappings[key] = ?
and I either want to add to the list if it exists or create one if it does not. At the end, I want my mappings to look like either:
mappings = {"some_key": [3, 100]}
or
mappings = {"some_key": [4,5,6,100]}

Without defaultdict:
mappings = dict()
def add_to_mappings(key, value):
    try:
        mappings[key].append(value)
    except KeyError:
        mappings[key] = [value]
With defaultdict:
from collections import defaultdict
mappings = defaultdict(list)
def add_to_mappings(key, value):
    mappings[key].append(value)
Edit: I misunderstood the original requirements. To take an item that already exists and create a list out of it and the new item, the first example could be changed to this:
mappings = dict(foo=3)
def add_to_mappings(key, value):
    try:
        mappings[key].append(value)
    except KeyError:
        # Key not present: start a new list
        mappings[key] = [value]
    except AttributeError:
        # Existing value is not a list: wrap it together with the new value
        mappings[key] = [mappings[key], value]
add_to_mappings("foo", 5)
# mappings ==> { "foo": [3, 5] }

You check if something is a list with isinstance(x, list). You can extract existing values from a dictionary and replace the value with simple assignment. So:
def add_to_mappings(d, key, value): # Remember to pass in the dict too!
    if key in d:
        # The value is present
        v = d[key]
        if isinstance(v, list):
            # Already a list: just append to it
            v.append(value)
        else:
            # Not a list: make a new list
            d[key] = [v, value]
    else:
        # Not present at all: make a new list
        d[key] = [value]
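To check the behaviour against the cases in the question, here is a small self-contained sketch. It takes the dict as a parameter, as the answer above does, and adds a third case (key missing entirely) that the question only implies; the variable names `scalar`, `existing` and `empty` are made up for the illustration.

```python
def add_to_mappings(d, key, value):
    """Append value under key, promoting an existing scalar to a list."""
    if key not in d:
        d[key] = [value]            # not present: start a new list
    elif isinstance(d[key], list):
        d[key].append(value)        # already a list: append in place
    else:
        d[key] = [d[key], value]    # scalar: wrap old and new value together

scalar = {"some_key": 3}
existing = {"some_key": [4, 5, 6]}
empty = {}
for m in (scalar, existing, empty):
    add_to_mappings(m, "some_key", 100)

print(scalar)    # {'some_key': [3, 100]}
print(existing)  # {'some_key': [4, 5, 6, 100]}
print(empty)     # {'some_key': [100]}
```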

Get specific key of a nested iterable and check if its value exists in a list

I am trying to access a specific key in a nested dictionary, then match its value to a string in a list. If the string in the list contains the string in the dictionary value, I want to override the dictionary value with the list value. Below is an example.
my_list = ['string1~', 'string2~', 'string3~', 'string4~', 'string5~', 'string6~']
my_iterable = {'A':'xyz',
               'B':'string6',
               'C':[{'B':'string4', 'D':'123'}],
               'E':[{'F':'321', 'B':'string1'}],
               'G':'jkl'
               }
The key I'm looking for is B, the objective is to override string6 with string6~, string4 with string4~, and so on for all B keys found in the my_iterable.
I have written a function to compute the Levenshtein distance between two strings, but I am struggling to write an efficient way to override the values of the keys.
def find_and_replace(key, dictionary, original_list):
    for k, v in dictionary.items():
        if k == key:
            #function to check if original_list item contains v
            yield v
        elif isinstance(v, dict):
            for result in find_and_replace(key, v, original_list):
                yield result
        elif isinstance(v, list):
            for d in v:
                if isinstance(d, dict):
                    for result in find_and_replace(key, d, original_list):
                        yield result
If I call
updated_dict = find_and_replace('B', my_iterable, my_list)
I want updated_dict to return the below:
{'A':'xyz',
 'B':'string6~',
 'C':[{'B':'string4~', 'D':'123'}],
 'E':[{'F':'321', 'B':'string1~'}],
 'G':'jkl'
}
Is this the right approach to the most efficient solution, and how can I modify it to return a dictionary with the updated values for B?

You can use the code below. I have assumed that the structure of the input dict stays the same throughout execution.
# Input list
my_list = ['string1~', 'string2~', 'string3~', 'string4~', 'string5~', 'string6~']
# Input dict
# Removed duplicate key "B" from the dict
my_iterable = {'A':'xyz',
               'B':'string6',
               'C':[{'B':'string4', 'D':'123'}],
               'E':[{'F':'321', 'B':'string1'}],
               'G':'jkl',
               }
# Setting search key
search_key = "B"
# Main code
for i, v in my_iterable.items():
    if i == search_key:
        if not isinstance(v, list):
            search_in_list = [s for s in my_list if v in s]
            if search_in_list:
                my_iterable[i] = search_in_list[0]
    else:
        try:
            for j, k in v[0].items():
                if j == search_key:
                    search_in_list = [l for l in my_list if k in l]
                    if search_in_list:
                        v[0][j] = search_in_list[0]
        except (AttributeError, IndexError, KeyError, TypeError):
            # v is not a non-empty list of dicts: skip it
            continue
# Print output
print(my_iterable)
# Result -> {'A': 'xyz', 'B': 'string6~', 'C': [{'B': 'string4~', 'D': '123'}], 'E': [{'F': '321', 'B': 'string1~'}], 'G': 'jkl'}
The above has scope for optimization, for example with a list comprehension or by wrapping it in a function.
I hope this helps!

In some cases, if your nesting is fairly complex, you can treat the dictionary as a JSON string and do all sorts of replacements on it. It's probably not what people would call very Pythonic, but it gives you a little more flexibility.
import re, json
my_list = ['string1~', 'string2~', 'string3~', 'string4~', 'string5~', 'string6~']
my_iterable = {'A':'xyz',
               'B':'string6',
               'C':[{'B':'string4', 'D':'123'}],
               'E':[{'F':'321', 'B':'string1'}],
               'G':'jkl'}
json_str = json.dumps(my_iterable, ensure_ascii=False)
for val in my_list:
    json_str = re.sub(re.compile(f"""("[B]":\\W?")({val[:-1]})(")"""), r"\1" + val + r"\3", json_str)
my_iterable = json.loads(json_str)
print(my_iterable)
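Neither approach above is recursive, which is what the question's generator was reaching for. As a rough sketch of that idea (not the asker's final code), the function below returns an updated copy instead of yielding values. It assumes "match" means plain substring containment, as in the question's example, and that the first containing candidate wins; a Levenshtein-based test could be swapped in at the marked line.

```python
def find_and_replace(key, node, candidates):
    """Return a copy of node where every string value stored under `key`
    is replaced by the first candidate that contains it."""
    if isinstance(node, dict):
        out = {}
        for k, v in node.items():
            if k == key and isinstance(v, str):
                # Substring test; replace with another similarity check if needed
                out[k] = next((c for c in candidates if v in c), v)
            else:
                out[k] = find_and_replace(key, v, candidates)
        return out
    if isinstance(node, list):
        return [find_and_replace(key, item, candidates) for item in node]
    return node  # scalars are returned unchanged

my_list = ['string1~', 'string2~', 'string3~', 'string4~', 'string5~', 'string6~']
my_iterable = {'A': 'xyz',
               'B': 'string6',
               'C': [{'B': 'string4', 'D': '123'}],
               'E': [{'F': '321', 'B': 'string1'}],
               'G': 'jkl'}

updated_dict = find_and_replace('B', my_iterable, my_list)
print(updated_dict)
```

Because it builds new dicts and lists, the input is left untouched, and it works at any nesting depth rather than only one level of lists.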

Find duplicates for mixed type values in dictionaries

I would like to recognize and group duplicate values in a dictionary. To do this I build a pseudo-hash (better read: signature) of my data set as follows:
from collections import defaultdict
from pickle import dumps
taxonomy = {}
binder = defaultdict(list)
for key, value in ds.items():
    signature = dumps(value)
    taxonomy[signature] = value
    binder[signature].append(key)
For a concrete use-case see this question.
Unfortunately I realized that if the following statement is True:
>>> ds['key1'] == ds['key2']
True
This one is not always True anymore:
>>> dumps(ds['key1']) == dumps(ds['key2'])
False
I noticed that the key order in the dumped output differs between the two dicts. If I copy/paste the output of ds['key1'] and ds['key2'] into new dictionaries, the comparison succeeds.
As an overkill alternative I could traverse my dataset recursively and replace dict instances with OrderedDict:
import collections.abc
import copy
def faithfulrepr(od):
    od = copy.deepcopy(od)
    if isinstance(od, collections.abc.Mapping):
        res = collections.OrderedDict()
        for k, v in sorted(od.items()):
            res[k] = faithfulrepr(v)
        return repr(res)
    if isinstance(od, list):
        for i, v in enumerate(od):
            od[i] = faithfulrepr(v)
        return repr(od)
    return repr(od)
>>> faithfulrepr(ds['key1']) == faithfulrepr(ds['key2'])
True
I am worried about this naive approach because I do not know whether I cover all the possible situations.
What other (generic) alternative can I use?

The first thing is to remove the call to deepcopy which is your bottleneck here:
import collections.abc
def faithfulrepr(ds):
    # collections.abc.Mapping is the Python 3 location of collections.Mapping
    if isinstance(ds, collections.abc.Mapping):
        res = collections.OrderedDict(
            (k, faithfulrepr(v)) for k, v in sorted(ds.items())
        )
    elif isinstance(ds, list):
        res = [faithfulrepr(v) for v in ds]
    else:
        res = ds
    return repr(res)
However, sorted and repr have their drawbacks:
you can't truly compare custom types;
you can't use mappings with different types of keys.
So the second thing is to get rid of faithfulrepr and compare objects with __eq__:
binder, values = [], []
for key, value in ds.items():
    try:
        index = values.index(value)
    except ValueError:
        values.append(value)
        binder.append([key])
    else:
        binder[index].append(key)
grouped = dict(zip(map(tuple, binder), values))
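To see the __eq__-based grouping in action, here is a self-contained sketch on a small made-up data set (the `ds` contents and the `group_duplicates` wrapper name are assumptions for illustration). Two dicts with the same content but different key order end up in one group, which is the case where comparing pickled output failed.

```python
def group_duplicates(ds):
    """Group keys of ds whose values compare equal with ==."""
    values, binder = [], []
    for key, value in ds.items():
        try:
            index = values.index(value)  # linear search using ==
        except ValueError:
            values.append(value)         # first time this value is seen
            binder.append([key])
        else:
            binder[index].append(key)    # duplicate of an earlier value
    return dict(zip(map(tuple, binder), values))

# Hypothetical data: key1 and key2 differ only in key order
ds = {
    'key1': {'a': 1, 'b': [1, 2]},
    'key2': {'b': [1, 2], 'a': 1},
    'key3': 'other',
}
grouped = group_duplicates(ds)
print(grouped)
# {('key1', 'key2'): {'a': 1, 'b': [1, 2]}, ('key3',): 'other'}
```

The trade-off is cost: values.index is a linear scan, so grouping n values takes on the order of n² comparisons, but it works for unhashable and mixed-type values.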

Method of expanding list of dictionaries into list of formatted strings

I've run across a need to call out to an external executable using the subprocess module. Everything is working fine, however I'd like to improve how I'm generating the commandline arguments.
The executable's command line options require formatting as follows:
--argname=argvalue
I currently have a list of dictionaries as follows:
[{arg1:value1},{arg2:value2}]
What is the best method of expanding these dictionaries into their proper string format? I'm currently iterating over the list, appending to a different list, however it feels there's a more pythonic method I should utilize.

Use items() as in http://docs.quantifiedcode.com/python-code-patterns/readability/not_using_items_to_iterate_over_a_dictionary.html
for key, val in d.items():
    print("{} = {}".format(key, val))

' '.join('--{key}={value}'.format(key = k, value = v) for d in arg_list for k, v in d.items())
Essentially, this iterates over each dictionary in the list (for d in arg_list) and then iterates over the items in each dictionary (for k, v in d.items()). Each item is formatted into the proper form, and then all of those key-value pairs are combined.
It is equivalent to:
arg_list = [{'arg1': 'value1'}, {'arg2': 'value2'}]
formatted_args = []
for d in arg_list:
    for k, v in d.items():
        # Format each item per dictionary
        formatted_args.append('--{key}={value}'.format(key = k, value = v))
# Combine the arguments into one string
args_string = ' '.join(formatted_args)
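Since the arguments are destined for the subprocess module, it may be simpler to skip the final join: subprocess accepts the command as a list of strings, which also keeps values containing spaces intact. A minimal sketch, where `some_executable` and the `build_args` helper are placeholders rather than anything from the question:

```python
def build_args(arg_list):
    """Turn [{'name': 'value'}, ...] into ['--name=value', ...]."""
    return ['--{}={}'.format(k, v) for d in arg_list for k, v in d.items()]

arg_list = [{'arg1': 'value1'}, {'arg2': 'value with spaces'}]

# 'some_executable' is a hypothetical program name
cmd = ['some_executable'] + build_args(arg_list)
print(cmd)
# ['some_executable', '--arg1=value1', '--arg2=value with spaces']

# The list can then be handed to subprocess as is, e.g. subprocess.run(cmd),
# and each option reaches the program as a single argument.
```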

Try this
','.join('{}={}'.format(k, v) for d in arg_list for k, v in d.items())

How about
def format_dict(d):
    key = next(iter(d))  # the single key; d.keys()[0] only works on Python 2
    return '--%s=%s' % (key, d[key])
ex = [{'a':'3'}, {'b':'4'}]
print(' '.join(map(format_dict, ex))) # --a=3 --b=4
