Accessing elements in an arbitrarily nested structure of lists / dicts - python

I have a nested structure of Python lists and dictionaries:
tree = { 'blah': [ "booz", {'foobar': [ { 'somekey': 'someval' } ] } ] }
I also have several recursive functions that let me traverse the tree from top to bottom and return the keys and values that I need, e.g.:
def get_objectcontent(obj, objid):
    result = None
    if isinstance(obj, dict):
        for key, val in obj.items():
            if key == objid:
                result = val
            elif isinstance(val, list) or isinstance(val, dict):
                retval = get_objectcontent(val, objid)
                if retval is not None:
                    result = retval
    elif isinstance(obj, list):
        for elem in obj:
            if isinstance(elem, list) or isinstance(elem, dict):
                retval = get_objectcontent(elem, objid)
                if retval is not None:
                    result = retval
    return result
Unfortunately, I also want to modify the data in tree, and that is the problem. The only solution I can see is to build the 'path' to the element dynamically while walking down the tree, and then construct something like:
tree['blah'][1]['foobar'][0]['somekey'] = 'newval'
I haven't found any way to point to my key in Python (when I know where in the structure it is).
Is there some other, more intelligent way to solve this in Python 3?

You're ultimately looking for objid as a key in a dict, so you can change:
result = val
to:
result = obj
Then the caller can do:
result[objid] = new_val
You might also consider replacing the assignments to result with return statements, assuming you don't mind getting the first instance rather than the last.
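As a rough illustration of that suggestion, here is a minimal sketch that returns the dict containing the key rather than the value, and returns as soon as it finds a match. The name find_parent is made up for this example, and note that it yields the first match in depth-first order, whereas the original function keeps the last one.

```python
def find_parent(obj, objid):
    """Return the first dict that has objid as a key, or None if there is none."""
    if isinstance(obj, dict):
        if objid in obj:
            return obj
        children = obj.values()
    elif isinstance(obj, list):
        children = obj
    else:
        return None  # scalars cannot contain the key
    for child in children:
        found = find_parent(child, objid)
        if found is not None:
            return found
    return None


tree = {'blah': ["booz", {'foobar': [{'somekey': 'someval'}]}]}

parent = find_parent(tree, 'somekey')
if parent is not None:
    parent['somekey'] = 'newval'  # mutates the nested dict inside tree

print(tree)
```

Because the function hands back a reference to the nested dict itself, assigning through it changes tree in place, so no explicit path is needed.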

How to find all instances of a substring inside a nested dict that could contain more lists or lists of dicts

I am trying to write a function that takes a substring and a dict as arguments, traverses the dict, and finds all instances of the substring, no matter whether it is inside a key or a value. It could also be inside a list that is the value of a key, so it should work regardless of the dict's structure.
When found, the whole string containing that substring should be replaced by another value. For that I already have a function that takes the string and looks up a corresponding id.
Where I'm stuck is replacing the string when it is an element of a list inside a value of the dict.
What I have so far looks like this:
def dict_extract(self, search_str: str, d: dict) -> None:
    if hasattr(d, "items"):
        for k, v in d.copy().items():
            if str(search_str) in k:
                id_ = find_id(search_str)
                d[id_] = d.pop(k)
            if isinstance(v, str) and str(search_str) in v:
                id_ = find_id(search_str)
                d[k] = d[k].replace(d[k], id_)
            if isinstance(v, dict):
                self.dict_extract(search_str, v)
            elif isinstance(v, list):
                for i in v:
                    if isinstance(i, dict):
                        self.dict_extract(search_str, i)
                    if isinstance(i, str):
                        # please replace the string inside the list
An example input dict:
d = {
    "level1": {
        "level2": {
            "k1": "v1",
            "k2": [
                "{replaceme:test123}",
                "{replaceme:test456}"
            ]
        },
        "replaceme:test789": "blaaa"
    }
}
Using the function like so: dict_extract("replaceme", d)
should replace every string that contains "replaceme" with its looked-up id.
In the end, the original dict passed in should contain the ids instead of the strings.
Final dict:
d = {
    "level1": {
        "level2": {
            "k1": "v1",
            "k2": [
                "id123",
                "id456"
            ]
        },
        "id789": "blaaa"
    }
}
As said, this is just an example; the structure could look different and the replaceme strings could be in any position.
Lists are mutable, so the easiest approach is to replace by index, which you can get with enumerate:
def dict_extract(self, search_str: str, d: dict) -> None:
    ...
                for indx, item in enumerate(v):
                    if isinstance(item, dict):
                        self.dict_extract(search_str, item)
                    if isinstance(item, str):
                        v[indx] = <your new value>
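Combining the enumerate idea with the recursion from the question, a standalone version might look roughly like the sketch below. The function name replace_matches is invented for this example, and find_id here is only a stand-in for the asker's real lookup: it is assumed to take the whole matching string and to map "...test123..." to "id123".

```python
import re


def find_id(text: str) -> str:
    """Hypothetical lookup standing in for the asker's real find_id."""
    match = re.search(r"test(\d+)", text)
    return "id" + match.group(1) if match else text


def replace_matches(obj, search_str: str) -> None:
    """Replace, in place, every key or string value that contains search_str."""
    if isinstance(obj, dict):
        for key in list(obj):  # snapshot of keys, since keys are renamed in the loop
            value = obj[key]
            if isinstance(value, str):
                if search_str in value:
                    obj[key] = find_id(value)
            else:
                replace_matches(value, search_str)
            if isinstance(key, str) and search_str in key:
                obj[find_id(key)] = obj.pop(key)
    elif isinstance(obj, list):
        for index, item in enumerate(obj):
            if isinstance(item, str):
                if search_str in item:
                    obj[index] = find_id(item)  # assign by index to mutate the list
            else:
                replace_matches(item, search_str)


d = {
    "level1": {
        "level2": {
            "k1": "v1",
            "k2": ["{replaceme:test123}", "{replaceme:test456}"],
        },
        "replaceme:test789": "blaaa",
    }
}
replace_matches(d, "replaceme")
print(d)
```

Treating dicts and lists symmetrically means lists nested inside lists are handled as well, which the loop in the question does not cover.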
You might have an easier time manipulating the dictionary as a string instead of as a Python structure?
The function below illustrates a straightforward replace. You could extend it with a regex that replaces everything between quotes " instead of just old.
import json
def replace_terms_in_dict(old, new, _dict):
    string = json.dumps(_dict)
    string = string.replace(old, new)
    _dict = json.loads(string)
    return _dict

Recursing in nested dictionaries and lists

This is sort of a followup question to one of my previous questions. I have some dictionaries where I need to look at every value they contain and if that value is a datetime I need to format it a specific way. I also need to be able to recurse into nested dictionaries and lists. This is what I have so far:
import datetime
def fix_time(in_time):
    out_time = '{}-{:02d}-{:02d} {:02d}:{:02d}:{:02d}'.format(in_time.year, in_time.month, in_time.day, in_time.hour, in_time.minute, in_time.second)
    return out_time
def fix_recursive(dct):
    for key, value in dct.items():
        if isinstance(value, datetime.datetime):
            # write back into the dict being walked, not the global mydict
            dct[key] = fix_time(value)
        elif isinstance(value, dict):
            fix_recursive(value)
mydict = {
    'Field1': 'Value1',
    'SomeDateField1': 1516312413.729,
    'Field2': 'Value2',
    'Field3': [
        {
            'Subfield3_1': 'SubValue1',
            'SubDateField3_1': 1516312413.729
        },
        {
            'Subfield3_2': 'SubValue2',
            'SubDateField3_2': 1516312413.729
        },
        {
            'Subfield3_3': 'SubValue3',
            'SubDateField3_3': 1516312413.729
        }
    ],
    'Field4': {
        'Subfield4_1': 'SubValue1',
        'SubDateField4_1': 1516312413.729
    }
}
fix_recursive(mydict)
This works great for dictionaries and nested dictionaries, but not so much for lists. So in the above example fix_recursive would correct SomeDateField1 and SubDateField4_1, but not SubDateField3_1, SubDateField3_2, or SubDateField3_3. Also, as I don't know what the input will look like before I get it, I am trying to create a function that can reach values in lists nested 3 or 4 levels deep.
Any suggestions would be appreciated.
Thanks!
You need to differentiate between looping over a list and looping over a dictionary:
def fix_recursive(obj):
    if isinstance(obj, list):  # could replace with collections.abc.MutableSequence
        itr = enumerate(obj)
    elif isinstance(obj, dict):  # could replace with collections.abc.MutableMapping
        itr = obj.items()
    else:
        return  # don't iterate -- pass back up
    for key, value in itr:
        if isinstance(value, datetime.datetime):
            obj[key] = fix_time(value)
        else:
            fix_recursive(value)
Following your current approach, I added list support to the recursive function.
Why use fix_time at all? If it is for serializing and deserializing, use JSON or pickle; there is no need to convert the datetime yourself.
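To see the list-aware version in action, here is a small self-contained sketch. The sample data is invented for the example and uses real datetime.datetime objects, and fix_time is written with strftime, which is assumed to be equivalent to the question's format string for four-digit years.

```python
import datetime


def fix_time(value):
    # Assumed equivalent to the question's manual format string
    return value.strftime('%Y-%m-%d %H:%M:%S')


def fix_recursive(obj):
    """Format every datetime found in nested dicts and lists, in place."""
    if isinstance(obj, list):
        itr = enumerate(obj)   # (index, value) pairs
    elif isinstance(obj, dict):
        itr = obj.items()      # (key, value) pairs
    else:
        return                 # scalar: nothing to descend into
    for key, value in itr:
        if isinstance(value, datetime.datetime):
            obj[key] = fix_time(value)
        else:
            fix_recursive(value)


stamp = datetime.datetime(2018, 1, 18, 21, 53, 33)
sample = {
    'Field1': 'Value1',
    'SomeDateField1': stamp,
    'Field3': [{'SubDateField3_1': stamp}, [stamp, 'text']],
    'Field4': {'SubDateField4_1': stamp},
}
fix_recursive(sample)
print(sample)
```

Replacing a value under an existing key or index does not change the container's size, so it is safe to do while iterating over items() or enumerate().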

Find duplicates for mixed type values in dictionaries

I would like to recognize and group duplicate values in a dictionary. To do this I build a pseudo-hash (or rather, a signature) of my data set as follows:
from collections import defaultdict
from pickle import dumps
taxonomy = {}
binder = defaultdict(list)
for key, value in ds.items():
    signature = dumps(value)
    taxonomy[signature] = value
    binder[signature].append(key)
For a concrete use-case see this question.
Unfortunately I realized that if the following statement is True:
>>> ds['key1'] == ds['key2']
True
This one is not always True anymore:
>>> dumps(ds['key1']) == dumps(ds['key2'])
False
I noticed that the key order in the dumped output differs between the two dicts. If I copy/paste the output of ds['key1'] and ds['key2'] into new dictionaries, the comparison succeeds.
As an overkill alternative I could traverse my dataset recursively and replace dict instances with OrderedDict:
import collections
import collections.abc
import copy
def faithfulrepr(od):
    od = copy.deepcopy(od)
    if isinstance(od, collections.abc.Mapping):
        res = collections.OrderedDict()
        for k, v in sorted(od.items()):
            res[k] = faithfulrepr(v)
        return repr(res)
    if isinstance(od, list):
        for i, v in enumerate(od):
            od[i] = faithfulrepr(v)
        return repr(od)
    return repr(od)
>>> faithfulrepr(ds['key1']) == faithfulrepr(ds['key2'])
True
I am worried about this naive approach because I do not know whether I cover all the possible situations.
What other (generic) alternative can I use?
The first thing is to remove the call to deepcopy, which is your bottleneck here:
def faithfulrepr(ds):
    if isinstance(ds, collections.abc.Mapping):
        res = collections.OrderedDict(
            (k, faithfulrepr(v)) for k, v in sorted(ds.items())
        )
    elif isinstance(ds, list):
        res = [faithfulrepr(v) for v in ds]
    else:
        res = ds
    return repr(res)
However, sorted and repr have their drawbacks:
you can't truly compare custom types;
you can't use mappings with different types of keys.
So the second thing is to get rid of faithfulrepr and compare objects with __eq__:
binder, values = [], []
for key, value in ds.items():
    try:
        index = values.index(value)
    except ValueError:
        values.append(value)
        binder.append([key])
    else:
        binder[index].append(key)
grouped = dict(zip(map(tuple, binder), values))
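For a concrete feel of the __eq__-based grouping, here is a small runnable sketch. The ds sample is made up, and the wrapper function group_by_equality is a name invented for this example. It is worth bearing in mind that list.index does a linear scan, so this approach is quadratic in the number of distinct values.

```python
def group_by_equality(ds):
    """Group the keys of ds whose values compare equal with ==."""
    binder, values = [], []
    for key, value in ds.items():
        try:
            index = values.index(value)  # linear scan using __eq__
        except ValueError:
            values.append(value)         # first time this value is seen
            binder.append([key])
        else:
            binder[index].append(key)
    return dict(zip(map(tuple, binder), values))


# Hypothetical data: key1 and key2 are equal but were built in a different key order
ds = {
    'key1': {'a': 1, 'b': [1, 2]},
    'key2': {'b': [1, 2], 'a': 1},
    'key3': {'a': 2},
}
grouped = group_by_equality(ds)
print(grouped)
```

Since dict equality ignores insertion order, key1 and key2 end up in the same group without any sorting or serialization.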

Flattening a list of dicts of lists of dicts (etc) of unknown depth in Python (nightmarish JSON structure)

I'm dealing with a JSON structure which is output to me in structures like this:
[{u'item': u'something',
u'data': {
u'other': u'',
u'else':
[
{
u'more': u'even more',
u'argh':
{
...etc..etc
As you can see, these are nested dicts and lists.
There is much discussion about flattening these recursively, but I haven't found one yet that can deal with a list of dictionaries which may in turn contain either dictionaries of lists, lists of lists, dictionaries of dictionaries etc; which are of unknown depth! In some cases the depth may be up to 100 or so.
I've been trying this so far without much luck (python 2.7.2):
def flatten(structure):
    out = []
    for item in structure:
        if isinstance(item, (list, tuple)):
            out.extend(flatten(item))
        if isinstance(item, (dict)):
            for dictkey in item.keys():
                out.extend(flatten(item[dictkey]))
        else:
            out.append(item)
    return out
Any ideas?
UPDATE
This pretty much works:
def flatten(l):
    out = []
    if isinstance(l, (list, tuple)):
        for item in l:
            out.extend(flatten(item))
    elif isinstance(l, (dict)):
        for dictkey in l.keys():
            out.extend(flatten(l[dictkey]))
    elif isinstance(l, (str, int, unicode)):
        out.append(l)
    return out
Since the depth of your data is arbitrary, it is easier to resort to recursion to flatten it. This function creates a flat dictionary, with the path to each data item composed as the key, in order to avoid collisions.
You can retrieve its contents later with for key in sorted(dic_.keys()), for example.
I didn't test it, since you did not provide a "valid" snippet of your data.
def flatten(structure, key="", path="", flattened=None):
    if flattened is None:
        flattened = {}
    if type(structure) not in (dict, list):
        flattened[((path + "_") if path else "") + key] = structure
    elif isinstance(structure, list):
        for i, item in enumerate(structure):
            flatten(item, "%d" % i, path + "_" + key, flattened)
    else:
        for new_key, value in structure.items():
            flatten(value, new_key, path + "_" + key, flattened)
    return flattened
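The same path-as-key idea can be sketched in Python 3 as below. This is a variant rather than the answer's exact function: the name flatten_paths and the sample data are made up, the path is carried as a tuple and joined with "_" only at the leaves, and empty containers simply disappear from the result. Keys that themselves contain "_" could still collide.

```python
def flatten_paths(structure, path=()):
    """Return a flat dict mapping 'a_0_b'-style paths to leaf values."""
    if isinstance(structure, dict):
        items = structure.items()
    elif isinstance(structure, (list, tuple)):
        items = enumerate(structure)
    else:
        # Leaf value: build the key from the path collected so far
        return {"_".join(map(str, path)): structure}
    flat = {}
    for key, value in items:
        flat.update(flatten_paths(value, path + (key,)))
    return flat


# Hypothetical data, loosely modelled on the structure in the question
data = [{'item': 'something',
         'data': {'other': '', 'else': [{'more': 'even more'}]}}]
flat = flatten_paths(data)
for key in sorted(flat):
    print(key, '=', repr(flat[key]))
```

A nesting depth of around 100 stays well below Python's default recursion limit, so plain recursion should be adequate here.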

How to change the keys of a dictionary?

Let's say I have a pretty complex dictionary.
{'fruit':'orange','colors':{'dark':4,'light':5}}
Anyway, my objective is to scan every key in this complex multi-level dictionary. Then, append "abc" to the end of each key.
So that it will be:
{'fruitabc':'orange','colorsabc':{'darkabc':4,'lightabc':5}}
How would you do that?
Keys cannot be changed. You will need to add a new key with the modified value then remove the old one, or create a new dict with a dict comprehension or the like.
For example like this:
def appendabc(somedict):
    return dict(map(lambda (key, value): (str(key)+"abc", value), somedict.items()))
def transform(multilevelDict):
    new = appendabc(multilevelDict)
    for key, value in new.items():
        if isinstance(value, dict):
            new[key] = transform(value)
    return new
print transform({1:2, "bam":4, 33:{3:4, 5:7}})
This will append "abc" to each key in the dictionary and any value that is a dictionary.
EDIT: There's also a really cool Python 3 version, check it out:
def transform(multilevelDict):
    return {str(key)+"abc" : (transform(value) if isinstance(value, dict) else value) for key, value in multilevelDict.items()}
print(transform({1:2, "bam":4, 33:{3:4, 5:7}}))
I use the following utility function that I wrote that takes a target dict and another dict containing the translation and switches all the keys according to it:
def rename_keys(d, keys):
    return dict([(keys.get(k), v) for k, v in d.items()])
So with the initial data:
data = { 'a' : 1, 'b' : 2, 'c' : 3 }
translation = { 'a' : 'aaa', 'b' : 'bbb', 'c' : 'ccc' }
We get the following:
>>> data
{'a': 1, 'c': 3, 'b': 2}
>>> rename_keys(data, translation)
{'aaa': 1, 'bbb': 2, 'ccc': 3}
>>> mydict={'fruit':'orange','colors':{'dark':4,'light':5}}
>>> def f(mydict):
... return dict((k+"abc",f(v) if hasattr(v,'keys') else v) for k,v in mydict.items())
...
>>> f(mydict)
{'fruitabc': 'orange', 'colorsabc': {'darkabc': 4, 'lightabc': 5}}
My understanding is that you can't change the keys, and that you would need to make a new set of keys and assign their values to the ones the original keys were pointing to.
I'd do something like:
def change_keys(d):
    if type(d) is dict:
        return dict([(k+'abc', change_keys(v)) for k, v in d.items()])
    else:
        return d
new_dict = change_keys(old_dict)
Here's a tight little function:
def keys_swap(orig_key, new_key, d):
    d[new_key] = d.pop(orig_key)
For your particular problem:
def append_to_dict_keys(appendage, d):
    # note that you need to iterate through a fixed list of keys, because
    # otherwise we will be iterating through a never ending key list!
    # (list() takes the snapshot explicitly, which Python 3 requires)
    for each in list(d.keys()):
        if type(d[each]) is dict:
            append_to_dict_keys(appendage, d[each])
        keys_swap(each, str(each) + appendage, d)
append_to_dict_keys('abc', d)
#! /usr/bin/env python
d = {'fruit':'orange', 'colors':{'dark':4,'light':5}}
def add_abc(d):
    newd = dict()
    for k,v in d.iteritems():
        if isinstance(v, dict):
            v = add_abc(v)
        newd[k + "abc"] = v
    return newd
d = add_abc(d)
print d
Something like this:
def applytoallkeys( dic, func ):
    def yielder():
        for k,v in dic.iteritems():
            if isinstance( v, dict):
                yield func(k), applytoallkeys( v, func )
            else:
                yield func(k), v
    return dict(yielder())
def appendword( s ):
    def appender( x ):
        return x+s
    return appender
d = {'fruit':'orange','colors':{'dark':4,'light':5}}
print applytoallkeys( d, appendword('asd') )
I kinda like functional style, you can read just the last line and see what it does ;-)
You could do this with recursion:
import collections
in_dict={'fruit':'orange','colors':{'dark':4,'light':5}}
def transform_dict(d):
    out_dict={}
    for k,v in d.iteritems():
        k=k+'abc'
        if isinstance(v,collections.MutableMapping):
            v=transform_dict(v)
        out_dict[k]=v
    return out_dict
out_dict=transform_dict(in_dict)
print(out_dict)
# {'fruitabc': 'orange', 'colorsabc': {'darkabc': 4, 'lightabc': 5}}
You should also consider the possibility of dicts nested inside lists, which the above solutions do not cover. This function adds a prefix and/or a postfix to every key within the dict.
def transformDict(multilevelDict, prefix="", postfix=""):
    """adds a prefix and/or postfix to every key name in a dict"""
    new_dict = multilevelDict
    if prefix != "" or postfix != "":
        new_key = "%s#key#%s" % (prefix, postfix)
        new_dict = dict(map(lambda (key, value): (new_key.replace('#key#', str(key)), value), new_dict.items()))
        for key, value in new_dict.items():
            if isinstance(value, dict):
                new_dict[key] = transformDict(value, prefix, postfix)
            elif isinstance(value, list):
                for index, item in enumerate(value):
                    if isinstance(item, dict):
                        new_dict[key][index] = transformDict(item, prefix, postfix)
    return new_dict
for k in list(theDict): theDict[k+'abc']=theDict.pop(k)
I use this for converting docopt POSIX-compliant command-line keys to PEP8 keys
(e.g. "--option" --> "option", "<option2>" --> "option2", "FILENAME" --> "filename")
arguments = docopt.docopt(__doc__) # dictionary
for key in list(arguments.keys()):
    if re.match('.*[-<>].*', key) or key != key.lower():
        value = arguments.pop(key)
        # Python 2 str.translate; in Python 3 use translate(str.maketrans('', '', '-<>'))
        newkey = key.lower().translate(None, '-<>')
        arguments[newkey] = value
Hi, I'm a new user. While looking for an answer to this same question I couldn't find anything that fully solved my problem, so I wrote this small function that replaces keys at every nesting level. You can pass it either a dict or a list of dicts.
Your dicts can contain lists of dicts or further nested dicts, and all of their keys are replaced as needed.
To indicate which keys to replace and with what, pass a dict as the "to" parameter.
See my small example at the end.
P.S.: Sorry for my bad English. =)
def re_map(value, to):
    """
    Transform dictionary keys according to the map given in the "to" parameter.
    The "to" parameter should have the key name to replace as its key and the
    new key name as its value.
    This method is fully recursive and processes all levels.
    @param value: list with dictionary or dictionary
    @param to: dictionary with re-map keys
    @type to: dict
    @return: list or dict transformed
    """
    if not isinstance(value, dict):
        if not isinstance(value, list):
            raise ValueError(
                "Only dict or list with dict inside accepted for value argument.")  # @IgnorePep8
    if not isinstance(to, dict):
        raise ValueError("Only dict accepted for to argument.")
    def _re_map(value, to):
        if isinstance(value, dict):
            # Re-map the dictionary keys.
            # If a key of the original dictionary is not in the "to" dictionary,
            # keep the same key; otherwise use the re-mapped key in the new
            # dictionary, with the (recursively re-mapped) value.
            return {
                to.get(key) or key: _re_map(dict_value, to)
                for key, dict_value in value.items()
            }
        elif isinstance(value, list):
            # If value is a list, iterate over it and call _re_map again to
            # process the values in it.
            return [_re_map(item, to) for item in value]
        else:
            # If it is neither a dict nor a list, just return the value.
            # It can be a string, an integer or anything else.
            return value
    result = _re_map(value, to)
    return result
if __name__ == "__main__":
    # Sample test of re_map method.
    # -----------------------------------------
    to = {"$id": "id"}
    x = []
    for i in range(100):
        x.append({
            "$id": "first-dict",
            "list_nested": [{
                "$id": "list-dict-nested",
                "list_dic_nested": [{
                    "$id": "list-dict-list-dict-nested"
                }]
            }],
            "dict_nested": {
                "$id": "non-nested"
            }
        })
    result = re_map(x, to)
    print(str(result))
A functional (and flexible) solution: this allows an arbitrary transform to be applied to keys (recursively for embedded dicts):
def remap_keys(d, keymap_f):
    """returns a new dict by recursively remapping all of d's keys using keymap_f"""
    return dict([(keymap_f(k), remap_keys(v, keymap_f) if isinstance(v, dict) else v)
                 for k,v in d.items()])
Let's try it out; first we define our key transformation function, then apply it to the example:
def transform_key(key):
    """whatever transformation you'd like to apply to keys"""
    return key + "abc"
remap_keys({'fruit':'orange','colors':{'dark':4,'light':5}}, transform_key)
{'fruitabc': 'orange', 'colorsabc': {'darkabc': 4, 'lightabc': 5}}
(note: if you're still on Python 2.x, you'll need to replace d.items() on the last line with d.iteritems() -- thanks to @Rudy for reminding me to update this post for Python 3).
Based on @AndiDog's python 3 version and similar to @sxc731's version but with a flag for whether to apply it recursively:
def transform_keys(dictionary, key_fn, recursive=True):
    """
    Applies function to keys and returns as a new dictionary.
    Example of key_fn:
    lambda k: k + "abc"
    """
    return {key_fn(key): (transform_keys(value, key_fn=key_fn, recursive=recursive)
                          if recursive and isinstance(value, dict) else value)
            for key, value in dictionary.items()}
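Several of the answers above only descend into values that are dicts. If dicts can also sit inside lists, a variant along the following lines may help. It is a sketch in the same spirit, not a replacement for any of the posted functions; the name transform_keys_deep and the sample data are made up for illustration.

```python
def transform_keys_deep(obj, key_fn):
    """Return a copy of obj with key_fn applied to every dict key, at any depth."""
    if isinstance(obj, dict):
        return {key_fn(k): transform_keys_deep(v, key_fn) for k, v in obj.items()}
    if isinstance(obj, list):
        return [transform_keys_deep(item, key_fn) for item in obj]
    return obj  # scalars are returned unchanged


# Hypothetical data with dicts nested inside a list
data = {'fruit': 'orange', 'colors': [{'dark': 4}, {'light': 5}]}
result = transform_keys_deep(data, lambda k: k + "abc")
print(result)
# {'fruitabc': 'orange', 'colorsabc': [{'darkabc': 4}, {'lightabc': 5}]}
```

Because it builds new dicts and lists instead of renaming keys in place, the input is left untouched and there is no risk of mutating a dict while iterating over it.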
