I have a parameter dictionary holding complex data composed of strings, lists and other dictionaries. Now I want to iterate through this data.
My problem is the way - the best practice - for having an iterator, which iterates both lists and dictionaries.
What I have:
def parse_data(key, value):
    iterator = None
    if isinstance(value, dict):
        iterator = value.items()
    elif isinstance(value, list):
        iterator = enumerate(value)
    if iterator is not None:
        for key, item in iterator:
            parse_data(key, item)
        return
    # do some cool stuff with the rest
This does not look very Pythonic. I thought of a function similar to iter that would give me the possibility to iterate over both key and item.
I think this is quite Pythonic. I would just change it to:
def parse_data(key, value):
    if isinstance(value, dict):
        iterator = value.items()
    elif isinstance(value, list):
        iterator = enumerate(value)
    else:
        # do some cool stuff with the rest
        return
    for key, item in iterator:
        parse_data(key, item)
I'm not sure if there is a shorter way of doing it, but Python lets you do one thing in many different ways. Maybe this could be another way of doing it. I haven't tested it, so it might not work.
def parse_data(key, value):
    # Start with the dict case; fall back to None for everything else
    iterator = value.items() if isinstance(value, dict) else None
    if iterator is None and isinstance(value, list):
        iterator = enumerate(value)
    if iterator is not None:
        for key, item in iterator:
            parse_data(key, item)
        return
    # do some cool stuff with the rest
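For what it's worth, the "function similar to iter" asked about in the question can be sketched as a small helper. This is only a sketch: iter_pairs and the leaves list are names made up here for illustration, and appending to leaves merely stands in for the "cool stuff".

```python
def iter_pairs(value):
    """Return an iterator of (key, item) pairs for a dict or list, else None."""
    if isinstance(value, dict):
        return iter(value.items())
    if isinstance(value, list):
        return enumerate(value)
    return None


def parse_data(key, value, leaves):
    pairs = iter_pairs(value)
    if pairs is None:
        # Hypothetical stand-in for "do some cool stuff with the rest"
        leaves.append((key, value))
        return
    for sub_key, item in pairs:
        parse_data(sub_key, item, leaves)


params = {'name': 'demo', 'tags': ['a', 'b'], 'opts': {'depth': 2}}
leaves = []
parse_data(None, params, leaves)
print(leaves)
# [('name', 'demo'), (0, 'a'), (1, 'b'), ('depth', 2)]
```

Keeping the type dispatch in one helper means the recursive function only has to distinguish "container" from "leaf".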
Related
I have this nested python dictionary
dictionary = {'a':'1', 'b':{'c':'2', 'd':{'z':'5', 'e':{'f':'13', 'g':'14'}}}}
So the expected output will be:
output = ['a:1', 'b:c:2', 'b:d:z:5', 'b:d:e:f:13', 'b:d:e:g:13']
How can I do this both with and without a recursive function?
In cases like this, I always like to try and solve the easy part first.
def flatten_dict(dictionary):
    output = []
    for key, item in dictionary.items():
        if isinstance(item, dict):
            output.append(f'{key}:???')  # Hm, here is the difficult part
        else:
            output.append(f'{key}:{item}')
    return output
Trying flatten_dict(dictionary) now prints ['a:1', 'b:???'] which is obviously not good enough. For one thing, the list has three items too few.
First, I'd like to switch to using generator functions. This is more complicated for now, but will pay off later.
def flatten_dict(dictionary):
    return list(flatten_dict_impl(dictionary))

def flatten_dict_impl(dictionary):
    for key, item in dictionary.items():
        if isinstance(item, dict):
            yield f'{key}:???'
        else:
            yield f'{key}:{item}'
No change in the output yet. Time to go recursive.
You want the output to be a flat list, so that means we have to yield multiple things in the case item is a dictionary. Only, what things? Let's try plugging in a recursive call to flatten_dict_impl on this subdictionary, that seems the most straightforward way to go.
# flatten_dict is unchanged
def flatten_dict_impl(dictionary):
    for key, item in dictionary.items():
        if isinstance(item, dict):
            for value in flatten_dict_impl(item):
                yield f'{key}:{value}'
        else:
            yield f'{key}:{item}'
The output is now ['a:1', 'b:c:2', 'b:d:z:5', 'b:d:e:f:13', 'b:d:e:g:14'], which is the output you wanted, except the final 14, but I think that's a typo on your part.
Now the non-recursive route. For that we need to manage some state ourselves, because we need to know how deep we are.
def flatten_dict_nonrecursive(dictionary):
    return list(flatten_dict_nonrecursive_impl(dictionary))

def flatten_dict_nonrecursive_impl(dictionary):
    dictionaries = [iter(dictionary.items())]
    keys = []
    while dictionaries:
        try:
            key, value = next(dictionaries[-1])
        except StopIteration:
            dictionaries.pop()
            if keys:
                keys.pop()
        else:
            if isinstance(value, dict):
                keys.append(key)
                dictionaries.append(iter(value.items()))
            else:
                yield ':'.join(keys + [key, value])
Now this gives the right output but is a lot less easy to understand, and a lot longer. It took a lot longer for me to get right too. There may be shorter and more obvious ways to do it that I missed, but in general recursive problems are easier to solve with recursive functions.
Such an approach can still be useful: if your dictionaries are nested hundreds or thousands of levels deep, then trying to do it recursively will likely overflow the stack.
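As one possible shorter non-recursive variant, here is a sketch that keeps a single stack of (path, value) pairs instead of separate lists of iterators and keys. The name flatten_dict_stack is made up for this example, and the 3000-level test dictionary is only an assumption chosen to exceed CPython's default recursion limit of 1000.

```python
def flatten_dict_stack(dictionary):
    """Flatten nested dicts into 'k1:k2:...:value' strings without recursion."""
    output = []
    # Each stack entry is (tuple of keys leading here, value found there).
    stack = [((), dictionary)]
    while stack:
        path, value = stack.pop()
        if isinstance(value, dict):
            # Push in reverse so items are popped in their original order.
            for key, item in reversed(list(value.items())):
                stack.append((path + (key,), item))
        else:
            output.append(':'.join(map(str, path + (value,))))
    return output


dictionary = {'a': '1', 'b': {'c': '2', 'd': {'z': '5', 'e': {'f': '13', 'g': '14'}}}}
print(flatten_dict_stack(dictionary))
# ['a:1', 'b:c:2', 'b:d:z:5', 'b:d:e:f:13', 'b:d:e:g:14']

# A dictionary nested deeper than the default recursion limit still works.
deep = 'leaf'
for _ in range(3000):
    deep = {'k': deep}
print(len(flatten_dict_stack(deep)))  # 1
```

The trade-off is that every stack entry carries its own copy of the path, which costs some memory but removes the bookkeeping of popping keys.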
I hope this helps. Let me know if I need to go into more detail or something.
You can use a NestedDict. First install ndicts
pip install ndicts
Then:
from ndicts.ndicts import NestedDict
dictionary = {'a': '1', 'b': {'c' :'2', 'd': {'z': '5', 'e': {'f': '13', 'g': '14'}}}}
nd = NestedDict(dictionary)
output = [":".join((*key, value)) for key, value in nd.items()]
Why isn't this code working? I'm trying to get the items whose value equals their key.
L=[0,2,2,1,5,5,6,10]
x=dict(enumerate(L))
y=(filter(x.keys()==x.values(), x.items()))
print(list(y))
The keys() method returns a view of all of the keys.
The values() method returns a view of all of the values.
So, x.keys()==x.values() is asking whether all of the keys equal all of the values, which is of course not true.
Also, filter wants a function. But you're not passing it a function, you're just passing it the result of x.keys()==x.values(), or False. To turn that into a function, you'd need to use def or lambda to create a new function.
The function you want to create is a function that takes an item, and returns true if the key equals the value. Since an item is just a 2-element tuple with the key and value for that item, the function to check that is:
y = filter((lambda item: item[0] == item[1]), x.items())
Or, if that's a bit too confusing, don't try to write it inline; just def it separately:
def key_equals_value(item):
    key, value = item
    return key == value
y = filter(key_equals_value, x.items())
However, this is pretty clumsy; it's much easier to write it as a comprehension than a filter call:
y = ((key, value) for (key, value) in x.items() if key == value)
As a general rule, whenever you don't already have a function to pass to filter or map, and would have to create one with def or lambda, a comprehension will usually be more readable, because you can just write the expression directly.
And, if you want a list rather than a generator, you can do that with a comprehension just by changing the parens to square brackets:
y = [(key, value) for (key, value) in x.items() if key == value]
And, if you want just the values, not the key-value pairs:
y = [value for (key, value) in x.items() if key == value]
If you find yourself confused by comprehensions, they can always be converted into nested statements, with an append at the bottom. So, that last one is equivalent to:
y = []
for key, value in x.items():
    if key == value:
        y.append(value)
Also, you don't really need a dict here in the first place; you just want to iterate over the index, value pairs. So:
y = [value for (index, value) in enumerate(L) if index == value]
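To tie the pieces together, here is a small runnable sketch using the list from the question. The variable names with_filter, with_comprehension and values_only are made up for illustration.

```python
L = [0, 2, 2, 1, 5, 5, 6, 10]
x = dict(enumerate(L))

# Comparing the two views gives one boolean, not a per-item test.
print(x.keys() == x.values())  # False

# filter needs a callable that is applied to each (key, value) item.
with_filter = list(filter(lambda item: item[0] == item[1], x.items()))

# The comprehension expresses the same test directly.
with_comprehension = [(key, value) for key, value in x.items() if key == value]

# Without the intermediate dict, keeping only the values.
values_only = [value for index, value in enumerate(L) if index == value]

print(with_filter)                        # [(0, 0), (2, 2), (5, 5), (6, 6)]
print(with_filter == with_comprehension)  # True
print(values_only)                        # [0, 2, 5, 6]
```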
I haven't found a way to do this.
Let's say I receive a JSON object like this:
{'1_data':{'4_data':[{'5_data':'hooray'}, {'3_data':'hooray2'}], '2_data':[]}}
It's hard to tell at a glance how to get the value of the 3_data key: data['1_data']['4_data'][1]['3_data']
I know about pprint; it helps to understand the structure a bit.
But sometimes the data is huge, and working out the path by hand takes time.
Are there any methods that may help me with that?
Here is a family of recursive generators that can be used to search through an object composed of dicts and lists. find_key yields a tuple containing a list of the dictionary keys and list indices that lead to the key that you pass in; the tuple also contains the value associated with that key. Because it's a generator, it can find every matching key if the object contains more than one.
def find_key(obj, key):
    if isinstance(obj, dict):
        yield from iter_dict(obj, key, [])
    elif isinstance(obj, list):
        yield from iter_list(obj, key, [])

def iter_dict(d, key, indices):
    for k, v in d.items():
        if k == key:
            yield indices + [k], v
        if isinstance(v, dict):
            yield from iter_dict(v, key, indices + [k])
        elif isinstance(v, list):
            yield from iter_list(v, key, indices + [k])

def iter_list(seq, key, indices):
    for k, v in enumerate(seq):
        if isinstance(v, dict):
            yield from iter_dict(v, key, indices + [k])
        elif isinstance(v, list):
            yield from iter_list(v, key, indices + [k])

# test
data = {
    '1_data': {
        '4_data': [
            {'5_data': 'hooray'},
            {'3_data': 'hooray2'}
        ],
        '2_data': []
    }
}
for t in find_key(data, '3_data'):
    print(t)
output
(['1_data', '4_data', 1, '3_data'], 'hooray2')
To get a single key list you can pass the generator returned by find_key to the built-in next function. And if you want to use a key list to fetch the associated value you can use a simple for loop.
seq, val = next(find_key(data, '3_data'))
print('seq:', seq, 'val:', val)
obj = data
for k in seq:
    obj = obj[k]
print('obj:', obj, obj == val)
output
seq: ['1_data', '4_data', 1, '3_data'] val: hooray2
obj: hooray2 True
If the key may be missing, then give next an appropriate default tuple. Eg:
seq, val = next(find_key(data, '6_data'), ([], None))
print('seq:', seq, 'val:', val)
if seq:
    obj = data
    for k in seq:
        obj = obj[k]
    print('obj:', obj, obj == val)
output
seq: [] val: None
Note that this code is for Python 3. To run it on Python 2 you need to replace all the yield from statements, eg replace
yield from iter_dict(obj, key, [])
with
for u in iter_dict(obj, key, []):
    yield u
How it works
To understand how this code works you need to be familiar with recursion and with Python generators. You may also find this page helpful: Understanding Generators in Python; there are also various Python generators tutorials available online.
The Python object returned by json.load or json.loads is generally a dict, but it can also be a list. We pass that object to the find_key generator as the obj arg, along with the key string that we want to locate. find_key then calls either iter_dict or iter_list, as appropriate, passing them the object, the key, and an empty list indices, which is used to collect the dict keys and list indices that lead to the key we want.
iter_dict iterates over each (k, v) pair at the top level of its d dict arg. If k matches the key we're looking for then the current indices list is yielded with k appended to it, along with the associated value. Because iter_dict is recursive the yielded (indices list, value) pairs get passed up to the previous level of recursion, eventually making their way up to find_key and then to the code that called find_key. Note that this is the "base case" of our recursion: it's the part of the code that determines whether this recursion path leads to the key we want. If a recursion path never finds a key matching the key we're looking for then that recursion path won't add anything to indices and it will terminate without yielding anything.
If the current v is a dict, then we need to examine all the (key, value) pairs it contains. We do that by making a recursive call to iter_dict, passing v as its starting object together with the current indices list extended with k. If the current v is a list we instead call iter_list, passing it the same args.
iter_list works similarly to iter_dict except that a list doesn't have any keys, it only contains values, so we don't perform the k == key test, we just recurse into any dicts or lists that the original list contains.
The end result of this process is that when we iterate over find_key we get pairs of (indices, value) where each indices list is the sequence of dict keys and list indices that successfully terminate in a dict item with our desired key, and value is the value associated with that particular key.
If you'd like to see some other examples of this code in use please see how to modify the key of a nested Json and How can I select deeply nested key:values from dictionary in python.
Also take a look at my new, more streamlined show_indices function.
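To illustrate the "find all matching keys" behaviour described above, here is a compact single-generator sketch of the same idea. Note that find_paths is a name made up for this example (it is not the show_indices function mentioned above), and the sample data is invented so that the key 'x' occurs three times.

```python
def find_paths(obj, key, path=()):
    """Yield (path, value) for every dict item whose key equals `key`."""
    if isinstance(obj, dict):
        children = obj.items()
    elif isinstance(obj, list):
        children = enumerate(obj)
    else:
        return  # scalars contain nothing to search
    for k, v in children:
        # Only dict keys can match; list indices just extend the path.
        if isinstance(obj, dict) and k == key:
            yield list(path) + [k], v
        yield from find_paths(v, key, path + (k,))


data = {'a': {'x': 1, 'b': [{'x': 2}, {'y': {'x': 3}}]}}
for path, value in find_paths(data, 'x'):
    print(path, value)
# ['a', 'x'] 1
# ['a', 'b', 0, 'x'] 2
# ['a', 'b', 1, 'y', 'x'] 3
```

Wrapping the call in list() collects every match at once, while next() with a default stops at the first one.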
I have a giant dict with a lot of nested dicts, like a giant tree, and the depth is unknown.
I need a function, something like find_value(), that takes a dict and a value (as a string) and returns a list of lists, each of which is a "path" (the sequential chain of keys from the first key to the key, or key value, containing the found value). If nothing is found, it returns an empty list.
I wrote this code:
def find_value(dict, sought_value, current_path, result):
    for key,value in dict.items():
        current_path.pop()
        current_path.append(key)
        if sought_value in key:
            result.append(current_path)
        if type(value) == type(''):
            if sought_value in value:
                result.append(current_path+[value])
        else:
            current_path.append(key)
            result = find_value(value, sought_value, current_path, result)
            current_path.pop()
    return result
I call this function to test:
result = find_value(self.dump, sought_value, ['START_KEY_FOR_DELETE'], [])
if not len(result):
    print "forgive me, mylord, i'm afraid we didn't find him.."
elif len(result) == 1:
    print "bless gods, for all that we have one match, mylord!"
For some inexplicable reason, my implementation of this function fails some of my tests. I started to debug and found out that even though current_path prints the correct things (it always does, I checked!), the result is inexplicably corrupted. Maybe it is because of recursion magic?
Can anyone help me with this problem? Maybe there is a simple solution for my tasks?
When you write result.append(current_path), you're not copying current_path, which continues to mutate. Change it to result.append(current_path[:]).
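A minimal sketch of that aliasing effect, separate from the recursion (the names aliased and copied are made up for illustration):

```python
current_path = ['root']
aliased, copied = [], []

aliased.append(current_path)     # stores a reference to the same list object
copied.append(current_path[:])   # stores a snapshot of the current contents

current_path.append('child')     # later mutation, as happens during the search

print(aliased)                     # [['root', 'child']]
print(copied)                      # [['root']]
print(aliased[0] is current_path)  # True
```

Every path appended without the copy ends up showing whatever current_path holds when the search finishes.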
I doubt you can do much to optimize a recursive search like that. Assuming there are many lookups on the same dictionary, and the dictionary doesn't change once loaded, then you can index it to get O(1) lookups...
def build_index(src, dest, path=[]):
    for k, v in src.iteritems():
        fk = path+[k]
        if isinstance(v, dict):
            build_index(v, dest, fk)
        else:
            try:
                dest[v].append(fk)
            except KeyError:
                dest[v] = [fk]
>>> data = {'foo': {'sub1': 'blah'}, 'bar': {'sub2': 'whatever'}, 'baz': 'blah'}
>>> index = {}
>>> build_index(data, index)
>>> index
{'blah': [['baz'], ['foo', 'sub1']], 'whatever': [['bar', 'sub2']]}
>>> index['blah']
[['baz'], ['foo', 'sub1']]
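The code above is Python 2 (iteritems). A sketch of the same indexing idea for Python 3 might look like this; it assumes that every leaf value is hashable, and uses dict.setdefault in place of the try/except. On Python 3.7+ the paths come out in insertion order, so the order differs from the session above.

```python
def build_index(src, dest, path=()):
    """Map each leaf value in nested dict `src` to the list of key paths reaching it."""
    for k, v in src.items():
        full_key = [*path, k]
        if isinstance(v, dict):
            build_index(v, dest, full_key)
        else:
            # Assumes leaf values are hashable, since they become dict keys.
            dest.setdefault(v, []).append(full_key)


data = {'foo': {'sub1': 'blah'}, 'bar': {'sub2': 'whatever'}, 'baz': 'blah'}
index = {}
build_index(data, index)
print(index['blah'])             # [['foo', 'sub1'], ['baz']]
print(index.get('missing', []))  # []
```

Using index.get(value, []) gives the empty list the question asks for when nothing matches.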