Recursing in nested dictionaries and lists

This is sort of a followup question to one of my previous questions. I have some dictionaries where I need to look at every value they contain and if that value is a datetime I need to format it a specific way. I also need to be able to recurse into nested dictionaries and lists. This is what I have so far:
import datetime
def fix_time(in_time):
    out_time = '{}-{:02d}-{:02d} {:02d}:{:02d}:{:02d}'.format(in_time.year, in_time.month, in_time.day, in_time.hour, in_time.minute, in_time.second)
    return out_time
def fix_recursive(dct):
    for key, value in dct.items():
        if isinstance(value, datetime.datetime):
            dct[key] = fix_time(value)  # update the dict being walked, not the global mydict
        elif isinstance(value, dict):
            fix_recursive(value)
mydict = {
    'Field1': 'Value1',
    'SomeDateField1': 1516312413.729,
    'Field2': 'Value2',
    'Field3': [
        {
            'Subfield3_1': 'SubValue1',
            'SubDateField3_1': 1516312413.729
        },
        {
            'Subfield3_2': 'SubValue2',
            'SubDateField3_2': 1516312413.729
        },
        {
            'Subfield3_3': 'SubValue3',
            'SubDateField3_3': 1516312413.729
        }
    ],
    'Field4': {
        'Subfield4_1': 'SubValue1',
        'SubDateField4_1': 1516312413.729
    }
}
fix_recursive(mydict)
This works great for dictionaries and nested dictionaries, but not so much for lists. So in the above example fix_recursive would correct SomeDateField1 and SubDateField4_1, but not SubDateField3_1, SubDateField3_2, or SubDateField3_3. Also, as I don't know what the input will look like before I get it, I am trying to create a function that can reach values in lists nested 3 or 4 levels deep.
Any suggestions would be appreciated.
Thanks!

You need to differentiate between looping over a list and a dictionary:
def fix_recursive(obj):
    if isinstance(obj, list):  # could replace with collections.abc.MutableSequence
        itr = enumerate(obj)
    elif isinstance(obj, dict):  # could replace with collections.abc.MutableMapping
        itr = obj.items()
    else:
        return  # don't iterate -- pass back up
    for key, value in itr:
        if isinstance(value, datetime.datetime):
            obj[key] = fix_time(value)
        else:
            fix_recursive(value)
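
To see the list handling in action, here is a minimal, self-contained sketch of the same idea. The sample data and the strftime-based fix_time are assumptions made for illustration (the question's sample holds floats rather than datetime objects, so real datetime values are substituted here).

```python
import datetime

def fix_time(in_time):
    # Assumed stand-in for the question's fix_time, using strftime
    return in_time.strftime('%Y-%m-%d %H:%M:%S')

def fix_recursive(obj):
    # Pick an iterator that yields (key_or_index, value) pairs
    if isinstance(obj, list):
        itr = enumerate(obj)
    elif isinstance(obj, dict):
        itr = obj.items()
    else:
        return  # scalars have nothing to recurse into
    for key, value in itr:
        if isinstance(value, datetime.datetime):
            obj[key] = fix_time(value)  # works for dict keys and list indexes alike
        else:
            fix_recursive(value)

stamp = datetime.datetime(2018, 1, 18, 21, 53, 33)
sample = {
    'SomeDateField1': stamp,
    'Field3': [{'SubDateField3_1': stamp}, [stamp]],
    'Field4': {'SubDateField4_1': stamp},
}
fix_recursive(sample)
print(sample)
```

Because both branches produce (key, value) pairs, the single assignment obj[key] = ... covers dictionaries and lists at any depth.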

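Following your current approach, I added list support to the recursive function.
Why use fix_time at all? If it is only for serializing and deserializing, pickle stores datetime objects directly, so no conversion is needed. The json module does not serialize datetime on its own; it needs a default= hook or a conversion such as fix_time.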

Related

How to find all instances of a substring inside a nested dict that could contain more lists or lists of dicts

I am trying to write a function that takes a substring and a dict as arguments, traverses the dict, and finds all instances of the substring in it, no matter whether it's inside a key or a value. It could also be inside a list that's the value of a key, so it should be really universal, no matter the dict structure.
When found, the whole string containing that substring should be replaced by another value. For that I already have a function that takes that string and looks up a corresponding id.
Where I'm stuck is replacing the string if it's one element of a list inside a value of the dict.
What I have so far looks like this:
def dict_extract(self, search_str: str, d: dict) -> None:
    if hasattr(d, "items"):
        for k, v in d.copy().items():
            if str(search_str) in k:
                id_ = find_id(search_str)
                d[id_] = d.pop(k)
            if isinstance(v, str) and str(search_str) in v:
                id_ = find_id(search_str)
                d[k] = d[k].replace(d[k], id_)
            if isinstance(v, dict):
                self.dict_extract(search_str, v)
            elif isinstance(v, list):
                for i in v:
                    if isinstance(i, dict):
                        self.dict_extract(search_str, i)
                    if isinstance(i, str):
                        # please replace the string inside the list
An example input dict:
d = {
"level1": {
"level2": {
"k1": "v1",
"k2": [
"{replaceme:test123}",
"{replaceme:test456}"
]
},
"replaceme:test789": "blaaa"
}
}
Using the function like so: dict_extract("replaceme", d)
should replace every string that contains "replaceme" with its looked-up id.
In the end, the original dict passed in should have the ids instead of the strings.
Final dict:
d = {
"level1": {
"level2": {
"k1": "v1",
"k2": [
"id123",
"id456"
]
},
"id789": "blaaa"
}
}
As said, this is just an example; the structure could look different and the replaceme strings could be in any position.

Lists are mutable, so the easiest approach is to replace items by index, which you can get with enumerate:
def dict_extract(self, search_str: str, d: dict) -> None:
    ...
    for indx, item in enumerate(v):
        if isinstance(item, dict):
            self.dict_extract(search_str, item)
        if isinstance(item, str):
            v[indx] = <your new value>
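
As a rough end-to-end illustration of the enumerate idea, the sketch below walks dicts and lists and replaces matching keys, values, and list items in place. It is a standalone function rather than the asker's method, and find_id here is a hypothetical lookup invented for the example (it turns the digits in the string into "id<digits>"); the real lookup is not shown in the question.

```python
import re

def find_id(text):
    # Hypothetical lookup, for illustration only: "...test123..." -> "id123"
    match = re.search(r"\d+", text)
    return "id" + match.group() if match else text

def replace_matches(obj, search_str):
    """Replace, in place, every key, value, or list item containing search_str."""
    if isinstance(obj, dict):
        for key in list(obj):  # snapshot of keys, since keys are renamed below
            value = obj.pop(key)
            hit = isinstance(key, str) and search_str in key
            obj[find_id(key) if hit else key] = replace_matches(value, search_str)
        return obj
    if isinstance(obj, list):
        for index, item in enumerate(obj):
            obj[index] = replace_matches(item, search_str)  # replace by index
        return obj
    if isinstance(obj, str) and search_str in obj:
        return find_id(obj)
    return obj

d = {
    "level1": {
        "level2": {"k1": "v1", "k2": ["{replaceme:test123}", "{replaceme:test456}"]},
        "replaceme:test789": "blaaa",
    }
}
replace_matches(d, "replaceme")
print(d)
```

Returning the (possibly replaced) object from every call lets the parent container do the assignment, so strings inside lists need no special case.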
You might have an easier time manipulating the dictionary as a string instead of as a Python structure.
The function below illustrates a straightforward replace. You could extend it with a regex that replaces everything between the quotes " instead of just old.
import json
def replace_terms_in_dict(old, new, _dict):
    string = json.dumps(_dict)
    string = string.replace(old, new)
    _dict = json.loads(string)
    return _dict

How to parse JSON and determine if values are collections or nested collections?

I am looking for different ways to check values within JSON.
For instance, to check that the value is:
int - isinstance(value, int)
String - isinstance(value, str)
Array - isinstance(value, list)
But what is the cleanest way to check if values are list, dict or a list of dictionaries? How to correctly differentiate between them?
Example
{
"test": ["a","b"]
}
vs
{
"test": {"a":0, "b":1}
}
vs
{
"test": [
{"a":0},
{"b":1}
]
}

If JSON data schema validation is what you are after, the examples you gave can be easily handled by GoodJSON
Is list (of specific type of values)
from goodjson.validators import gj_all, foreach, is_list, is_string
validate_fn = gj_all(
is_list(),
foreach(is_string)
)
validate_fn(['a', 'b', 'c'])
Is dict
from goodjson.validators import foreach_key, is_dict
validate_fn = foreach_key(test=[is_dict])
validate_fn({
'test': {
'is_dict': 'yes'
}
})
Is list of dict
from goodjson.validators import foreach_key, foreach, is_dict, is_list
validate_fn = foreach_key(
test=[
is_list(),
foreach(is_dict)
]
)
validate_fn({
'test': [{'foo': 'bar'}]
})

To recursively search through a JSON data structure, and handle the case where the items are collections such as lists or dictionaries, you could do something like the following.
Example: Recursively search for JSON key
def find_key(json_input, lookup_key):
    if isinstance(json_input, dict):
        for k, v in json_input.items():
            if k == lookup_key:
                yield v
            else:
                yield from find_key(v, lookup_key)
    elif isinstance(json_input, list):
        for item in json_input:
            yield from find_key(item, lookup_key)
Also, be sure to take advantage of the standard library json package. The main functions for JSON encoding and decoding are:
json.dumps()
json.loads()
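
For the three shapes in the question, a plain isinstance check after json.loads() may be enough. The sketch below is one possible way to tell them apart; the classify helper and its labels are made up for this example.

```python
import json

def classify(value):
    # Hypothetical helper: label a decoded JSON value by its container shape
    if isinstance(value, dict):
        return "dict"
    if isinstance(value, list):
        # A non-empty list whose items are all dicts counts as a list of dicts
        if value and all(isinstance(item, dict) for item in value):
            return "list of dicts"
        return "list"
    return type(value).__name__

samples = [
    '{"test": ["a", "b"]}',
    '{"test": {"a": 0, "b": 1}}',
    '{"test": [{"a": 0}, {"b": 1}]}',
]
labels = [classify(json.loads(text)["test"]) for text in samples]
print(labels)
```

An empty list is reported as a plain "list" here, since there is nothing to inspect; whether that is the right call depends on the data.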
See also
recursive iteration through nested json for specific key in python
Iterating through a JSON object
Validate JSON data using python

How to build path from keys of nested dictionary?

I'm writing a script that broadcasts a number of data streams over an MQTT network. I'm trying to convert the keys of the nested dicts to a string that I can then use as the MQTT broadcast channel. The data is coming in every second already formatted into a nested dict like so:
my_dict = { 'stream1': { 'dataset1': { 'value1': 123.4}},
                         'dataset2': { 'value1': 123.4,
                                       'value2': 567.8},
            'stream2': { 'dataset3': { 'value4': 910.2}},
            'stream3': { 'value5': 'abcd'}}
I've indented it to add readability, the extra spaces aren't in the actual data. As you can see it has multiple levels, not all levels have the same number of values, and some value keys are repeated. Also, one level is shallower than the rest but I can easily make it the same depth as the rest if that makes the problem easier to solve.
The dict above should provide an output like this:
("stream1/dataset1/value1", "stream1/dataset2/value1", ..., "stream3/value5")
and so on.
I feel like recursion might be a good solution to this but I'm not sure how to maintain an ordered list of keys as I pass through the structure, as well as make sure I hit each item in the structure, generating a new path for each base-level item (note the absence of "stream1/dataset1").
Here's the code I have so far:
my_dict = { as defined above }
def get_keys(input_dict, path_list, current_path):
    for key, value in input_dict.items():
        if isinstance(value, dict):
            current_path += value
            get_keys(value, path_list, current_path)
        else:
            path = '/'.join(current_path)
            path_list.append(path)
my_paths = []
cur_path = []
get_keys(my_dict, my_paths, cur_path)
[print(p) for p in my_paths]

This is a great opportunity to use yield to turn your function into a generator. A generator can yield a whole bunch of items and behave much like a list or other iterable. The caller loops over its return value and gets one yielded item each iteration until the function returns.
def get_keys(input_dict):
    for key, value in input_dict.items():
        if isinstance(value, dict):
            for subkey in get_keys(value):
                yield key + '/' + subkey
        else:
            yield key
for key in get_keys(my_dict):
    print(key)
Inside the outer for loop each value is either a dict or a plain value. If it's a plain value, just yield the key. If it's a dict, iterate over it and prepend key + '/' to each sub-key.
The nice thing is that you don't have to maintain any state. path_list and current_path are gone. get_keys() simply yields the strings one by one and the yield statements and recursive loop make the flattening of keys naturally shake out.
Output for my_dict exactly as posted in the question:
stream1/dataset1/value1
dataset2/value1
dataset2/value2
stream2/dataset3/value4
stream3/value5

You can use a generator for that purpose:
def convert(d):
    for k, v in d.items():
        if isinstance(v, dict):
            yield from (f'{k}/{x}' for x in convert(v))
        else:
            yield k
Considering your expected output, you seem to have a misplaced curly brace } in your example data, but using this test data:
my_dict = { 'stream1': { 'dataset1': { 'value1': 123.4},
                         'dataset2': { 'value1': 123.4,
                                       'value2': 567.8}},
            'stream2': { 'dataset3': { 'value4': 910.2}},
            'stream3': { 'value5': 'abcd'}}
This is the output:
print(list(convert(my_dict)))
# ['stream1/dataset1/value1', 'stream1/dataset2/value1', 'stream1/dataset2/value2', 'stream2/dataset3/value4', 'stream3/value5']
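
Since the paths are meant to be used as broadcast channels, it may help to get each path together with its value. The variation below is a sketch along those lines; iter_paths and its prefix parameter are names chosen for this example, and the actual publishing step is left out.

```python
def iter_paths(d, prefix=()):
    """Yield (path, value) pairs for every non-dict leaf of a nested dict."""
    for key, value in d.items():
        path = prefix + (str(key),)  # keys collected so far, outermost first
        if isinstance(value, dict):
            yield from iter_paths(value, path)
        else:
            yield '/'.join(path), value

my_dict = {'stream1': {'dataset1': {'value1': 123.4},
                       'dataset2': {'value1': 123.4, 'value2': 567.8}},
           'stream2': {'dataset3': {'value4': 910.2}},
           'stream3': {'value5': 'abcd'}}

for topic, payload in iter_paths(my_dict):
    print(topic, payload)
```

Passing the prefix down as an immutable tuple avoids the shared-state problem of a single mutable current_path list.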

Accessing elements in arbitrarily nested structure of lists / dicts

I have a nested structure of Python lists and dictionaries.
tree = { 'blah': [ "booz", {'foobar': [ { 'somekey': 'someval' } ] } ] }
I also have several recursive functions that let me traverse the tree hierarchy from top to bottom and return the keys and values that I need, e.g.:
def get_objectcontent(obj, objid):
    result = None
    if isinstance(obj, dict):
        for key, val in obj.items():
            if key == objid:
                result = val
            elif isinstance(val, list) or isinstance(val, dict):
                retval = get_objectcontent(val, objid)
                if retval is not None:
                    result = retval
    elif isinstance(obj, list):
        for elem in obj:
            if isinstance(elem, list) or isinstance(elem, dict):
                retval = get_objectcontent(elem, objid)
                if retval is not None:
                    result = retval
    return result
Unfortunately, I want to modify the data in the tree too, and that is the problem. The only possible solution I can see is to construct the 'path' to the element dynamically while walking down through the tree and build something like:
tree['blah'][1]['foobar'][0]['somekey'] = 'newval'
I didn't find any way to point to my key in Python (when I know where in the structure it is).
Is there some other, more intelligent way to solve this in Python 3?

You're ultimately looking for objid as a key in a dict, so you can change:
result = val
to:
result = obj
Then the caller can do:
result[objid] = new_val
You might also consider replacing the assignments to result with return statements, assuming you don't mind getting the first instance rather than the last.
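
Put together, the suggestion might look like the sketch below: return the dict that holds the key (using early returns, so the first match wins) and let the caller assign through it. The name find_container is invented for this example.

```python
def find_container(obj, objid):
    """Return the first dict that has objid as a key, or None if there is none."""
    if isinstance(obj, dict):
        if objid in obj:
            return obj  # hand back the container, not the value
        children = obj.values()
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_container(child, objid)
        if found is not None:
            return found
    return None

tree = {'blah': ["booz", {'foobar': [{'somekey': 'someval'}]}]}
container = find_container(tree, 'somekey')
if container is not None:
    container['somekey'] = 'newval'  # modifies tree in place
print(tree)
```

This works because the returned dict is the same object that sits inside tree, so no path needs to be built.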

How to change the keys of a dictionary?

Let's say I have a pretty complex dictionary.
{'fruit':'orange','colors':{'dark':4,'light':5}}
Anyway, my objective is to scan every key in this complex multi-level dictionary. Then, append "abc" to the end of each key.
So that it will be:
{'fruitabc':'orange','colorsabc':{'darkabc':4,'lightabc':5}}
How would you do that?

Keys cannot be changed in place. You will need to add a new key holding the old key's value and then remove the old key, or create a new dict with a dict comprehension or the like.

For example, like this (Python 2 syntax):
def appendabc(somedict):
    return dict(map(lambda (key, value): (str(key)+"abc", value), somedict.items()))
def transform(multilevelDict):
    new = appendabc(multilevelDict)
    for key, value in new.items():
        if isinstance(value, dict):
            new[key] = transform(value)
    return new
print transform({1:2, "bam":4, 33:{3:4, 5:7}})
This appends "abc" to each key in the dictionary and, recursively, to each key of any value that is itself a dictionary.
EDIT: There's also a really cool Python 3 version, check it out:
def transform(multilevelDict):
    return {str(key)+"abc" : (transform(value) if isinstance(value, dict) else value) for key, value in multilevelDict.items()}
print(transform({1:2, "bam":4, 33:{3:4, 5:7}}))

I use the following utility function that I wrote that takes a target dict and another dict containing the translation and switches all the keys according to it:
def rename_keys(d, keys):
    return dict([(keys.get(k), v) for k, v in d.items()])
So with the initial data:
data = { 'a' : 1, 'b' : 2, 'c' : 3 }
translation = { 'a' : 'aaa', 'b' : 'bbb', 'c' : 'ccc' }
We get the following:
>>> data
{'a': 1, 'c': 3, 'b': 2}
>>> rename_keys(data, translation)
{'aaa': 1, 'bbb': 2, 'ccc': 3}

>>> mydict={'fruit':'orange','colors':{'dark':4,'light':5}}
>>> def f(mydict):
...     return dict((k+"abc",f(v) if hasattr(v,'keys') else v) for k,v in mydict.items())
...
>>> f(mydict)
{'fruitabc': 'orange', 'colorsabc': {'darkabc': 4, 'lightabc': 5}}

My understanding is that you can't change the keys, and that you would need to make a new set of keys and assign their values to the ones the original keys were pointing to.
I'd do something like:
def change_keys(d):
    if type(d) is dict:
        return dict([(k+'abc', change_keys(v)) for k, v in d.items()])
    else:
        return d
new_dict = change_keys(old_dict)

Here's a tight little function:
def keys_swap(orig_key, new_key, d):
    d[new_key] = d.pop(orig_key)
For your particular problem:
def append_to_dict_keys(appendage, d):
    # Note that you need to iterate over a fixed list of keys (a snapshot),
    # because otherwise we would be iterating over a key list that keeps changing!
    for each in list(d.keys()):
        if type(d[each]) is dict:
            append_to_dict_keys(appendage, d[each])
        keys_swap(each, str(each) + appendage, d)
append_to_dict_keys('abc', d)

#! /usr/bin/env python
d = {'fruit':'orange', 'colors':{'dark':4,'light':5}}
def add_abc(d):
    newd = dict()
    for k,v in d.items():
        if isinstance(v, dict):
            v = add_abc(v)
        newd[k + "abc"] = v
    return newd
d = add_abc(d)
print(d)

Something like this:
def applytoallkeys( dic, func ):
    def yielder():
        for k,v in dic.items():
            if isinstance( v, dict):
                yield func(k), applytoallkeys( v, func )
            else:
                yield func(k), v
    return dict(yielder())
def appendword( s ):
    def appender( x ):
        return x+s
    return appender
d = {'fruit':'orange','colors':{'dark':4,'light':5}}
print(applytoallkeys( d, appendword('asd') ))
I kind of like the functional style: you can read just the last line and see what it does ;-)

You could do this with recursion:
import collections.abc
in_dict={'fruit':'orange','colors':{'dark':4,'light':5}}
def transform_dict(d):
    out_dict={}
    for k,v in d.items():
        k=k+'abc'
        if isinstance(v,collections.abc.MutableMapping):
            v=transform_dict(v)
        out_dict[k]=v
    return out_dict
out_dict=transform_dict(in_dict)
print(out_dict)
# {'fruitabc': 'orange', 'colorsabc': {'darkabc': 4, 'lightabc': 5}}

You should also consider the possibility of dicts nested inside lists, which the solutions above do not cover. This function adds a prefix and/or a postfix to every key within the dict.
def transformDict(multilevelDict, prefix="", postfix=""):
    """adds a prefix and/or postfix to every key name in a dict"""
    new_dict = multilevelDict
    if prefix != "" or postfix != "":
        new_key = "%s#key#%s" % (prefix, postfix)
        new_dict = {new_key.replace('#key#', str(key)): value for key, value in new_dict.items()}
    for key, value in new_dict.items():
        if isinstance(value, dict):
            new_dict[key] = transformDict(value, prefix, postfix)
        elif isinstance(value, list):
            for index, item in enumerate(value):
                if isinstance(item, dict):
                    new_dict[key][index] = transformDict(item, prefix, postfix)
    return new_dict

for k in list(theDict): theDict[k+'abc'] = theDict.pop(k)  # list() takes a snapshot, since the dict changes while looping

I use this for converting docopt POSIX-compliant command-line keys to PEP8 keys
(e.g. "--option" --> "option", "<option2>" --> "option2", "FILENAME" --> "filename"):
arguments = docopt.docopt(__doc__)  # dictionary
for key in list(arguments.keys()):  # snapshot, since the dict is modified in the loop
    if re.match('.*[-<>].*', key) or key != key.lower():
        value = arguments.pop(key)
        newkey = re.sub('[-<>]', '', key.lower())
        arguments[newkey] = value

Hi, I'm a new user. While looking for an answer to this same question I couldn't find anything that fully solved my problem, so I wrote this little function that replaces keys at every level of nesting. You can pass it a dict, or a list containing dicts.
Your dicts can contain lists of dicts or further nested dicts, and all of their keys are replaced as needed.
To indicate which keys to replace and with what, pass a dict as the "to" parameter.
See my little example at the end.
P.S.: Sorry for my bad English. =)
def re_map(value, to):
    """
    Remap dictionary keys according to the mapping given in the to parameter.
    Each key of to is a key name to replace, and its value is the key name
    to use in the new dictionary.
    This method is fully recursive and processes all nesting levels.
    @param value: list with dictionary or dictionary
    @param to: dictionary with re-map keys
    @type to: dict
    @return: list or dict transformed
    """
    if not isinstance(value, dict):
        if not isinstance(value, list):
            raise ValueError(
                "Only dict or list with dict inside accepted for value argument.")  # #IgnorePep8
    if not isinstance(to, dict):
        raise ValueError("Only dict accepted for to argument.")
    def _re_map(value, to):
        if isinstance(value, dict):
            # Re-map the dictionary keys.
            # If a key of the original dictionary is not in the "to" dictionary,
            # keep the same key; otherwise use the re-mapped key in the new
            # dictionary, with the (recursively processed) value.
            return {
                to.get(key) or key: _re_map(dict_value, to)
                for key, dict_value in value.items()
            }
        elif isinstance(value, list):
            # If value is a list, iterate over it and call _re_map again to
            # process the items in it.
            return [_re_map(item, to) for item in value]
        else:
            # If it is neither a dict nor a list, just return the value.
            # It can be a string, an integer or anything else.
            return value
    result = _re_map(value, to)
    return result
if __name__ == "__main__":
    # Sample test of re_map method.
    # -----------------------------------------
    to = {"$id": "id"}
    x = []
    for i in range(100):
        x.append({
            "$id": "first-dict",
            "list_nested": [{
                "$id": "list-dict-nested",
                "list_dic_nested": [{
                    "$id": "list-dict-list-dict-nested"
                }]
            }],
            "dict_nested": {
                "$id": "non-nested"
            }
        })
    result = re_map(x, to)
    print(str(result))

A functional (and flexible) solution: this allows an arbitrary transform to be applied to keys (recursively for embedded dicts):
def remap_keys(d, keymap_f):
    """returns a new dict by recursively remapping all of d's keys using keymap_f"""
    return dict([(keymap_f(k), remap_keys(v, keymap_f) if isinstance(v, dict) else v)
                 for k,v in d.items()])
Let's try it out; first we define our key transformation function, then apply it to the example:
def transform_key(key):
    """whatever transformation you'd like to apply to keys"""
    return key + "abc"
remap_keys({'fruit':'orange','colors':{'dark':4,'light':5}}, transform_key)
{'fruitabc': 'orange', 'colorsabc': {'darkabc': 4, 'lightabc': 5}}
(Note: if you're still on Python 2.x, you'll need to replace d.items() on the last line of remap_keys with d.iteritems(). Thanks to @Rudy for reminding me to update this post for Python 3.)

Based on @AndiDog's Python 3 version and similar to @sxc731's version, but with a flag for whether to apply it recursively:
def transform_keys(dictionary, key_fn, recursive=True):
    """
    Applies function to keys and returns as a new dictionary.
    Example of key_fn:
        lambda k: k + "abc"
    """
    return {key_fn(key): (transform_keys(value, key_fn=key_fn, recursive=recursive)
                          if recursive and isinstance(value, dict) else value)
            for key, value in dictionary.items()}
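
Several of the answers above stop at nested dicts. If the data may also hold dicts inside lists, a variant along the following lines could cover that case; transform_keys_deep is a name made up for this sketch, and it returns a new structure instead of modifying the input.

```python
def transform_keys_deep(obj, key_fn):
    """Return a copy of obj with key_fn applied to every dict key, at any depth."""
    if isinstance(obj, dict):
        return {key_fn(k): transform_keys_deep(v, key_fn) for k, v in obj.items()}
    if isinstance(obj, list):
        # Lists keep their order; only the dicts inside them get new keys
        return [transform_keys_deep(item, key_fn) for item in obj]
    return obj  # strings, numbers, None, etc. are returned unchanged

data = {'fruit': 'orange', 'colors': [{'dark': 4}, {'light': 5}]}
result = transform_keys_deep(data, lambda k: k + 'abc')
print(result)
```

Building a new structure sidesteps the "dictionary changed during iteration" pitfall that in-place key renaming runs into.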
