I haven't found a way to do this.
Let's say I receive a JSON object like this:
{'1_data':{'4_data':[{'5_data':'hooray'}, {'3_data':'hooray2'}], '2_data':[]}}
It's hard to tell at a glance how to get the value of the 3_data key: data['1_data']['4_data'][1]['3_data']
I know about pprint; it helps to understand the structure a bit.
But sometimes the data is huge, and reading through it takes time.
Are there any methods that may help me with that?
Here is a family of recursive generators that can be used to search through an object composed of dicts and lists. find_key yields a tuple containing a list of the dictionary keys and list indices that lead to the key that you pass in; the tuple also contains the value associated with that key. Because it's a generator, it will find every match if the object contains multiple matching keys.
def find_key(obj, key):
    if isinstance(obj, dict):
        yield from iter_dict(obj, key, [])
    elif isinstance(obj, list):
        yield from iter_list(obj, key, [])

def iter_dict(d, key, indices):
    for k, v in d.items():
        if k == key:
            yield indices + [k], v
        if isinstance(v, dict):
            yield from iter_dict(v, key, indices + [k])
        elif isinstance(v, list):
            yield from iter_list(v, key, indices + [k])

def iter_list(seq, key, indices):
    for k, v in enumerate(seq):
        if isinstance(v, dict):
            yield from iter_dict(v, key, indices + [k])
        elif isinstance(v, list):
            yield from iter_list(v, key, indices + [k])
# test
data = {
    '1_data': {
        '4_data': [
            {'5_data': 'hooray'},
            {'3_data': 'hooray2'}
        ],
        '2_data': []
    }
}
for t in find_key(data, '3_data'):
    print(t)
output
(['1_data', '4_data', 1, '3_data'], 'hooray2')
To get just the first key list you can pass the generator returned by find_key to the built-in next function. And if you want to use a key list to fetch the associated value you can use a simple for loop.
seq, val = next(find_key(data, '3_data'))
print('seq:', seq, 'val:', val)
obj = data
for k in seq:
    obj = obj[k]
print('obj:', obj, obj == val)
output
seq: ['1_data', '4_data', 1, '3_data'] val: hooray2
obj: hooray2 True
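The for loop above can also be folded into a small helper. The following is only a sketch: get_by_path is a hypothetical name that does not appear in the original answer, and it assumes the path is a list of dict keys and list indices like the ones find_key yields.

```python
from functools import reduce
from operator import getitem

def get_by_path(obj, path):
    """Follow a sequence of dict keys / list indices into a nested object.

    get_by_path is a hypothetical helper name used for illustration only.
    """
    # reduce applies obj = obj[k] for each k in path, starting from obj
    return reduce(getitem, path, obj)

data = {
    '1_data': {
        '4_data': [
            {'5_data': 'hooray'},
            {'3_data': 'hooray2'}
        ],
        '2_data': []
    }
}
path = ['1_data', '4_data', 1, '3_data']
print(get_by_path(data, path))  # hooray2
```

An empty path simply returns the object itself, because reduce falls back to its initial value.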
If the key may be missing, then give next an appropriate default tuple. Eg:
seq, val = next(find_key(data, '6_data'), ([], None))
print('seq:', seq, 'val:', val)
if seq:
    obj = data
    for k in seq:
        obj = obj[k]
    print('obj:', obj, obj == val)
output
seq: [] val: None
Note that this code is for Python 3. To run it on Python 2 you need to replace all the yield from statements, eg replace
yield from iter_dict(obj, key, [])
with
for u in iter_dict(obj, key, []):
    yield u
How it works
To understand how this code works you need to be familiar with recursion and with Python generators. You may also find this page helpful: Understanding Generators in Python; there are also various Python generators tutorials available online.
The Python object returned by json.load or json.loads is generally a dict, but it can also be a list. We pass that object to the find_key generator as the obj arg, along with the key string that we want to locate. find_key then calls either iter_dict or iter_list, as appropriate, passing them the object, the key, and an empty list indices, which is used to collect the dict keys and list indices that lead to the key we want.
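As a quick illustration of why find_key has to check for both types, here is a small sketch with made-up JSON strings (they are assumptions, not data from the question). A top-level JSON object decodes to a dict and a top-level JSON array decodes to a list.

```python
import json

# Hypothetical inputs, chosen only to show the two top-level container types
as_dict = json.loads('{"1_data": {"2_data": []}}')
as_list = json.loads('[{"3_data": "hooray2"}]')

print(type(as_dict).__name__)  # dict
print(type(as_list).__name__)  # list
```

A top-level scalar such as a bare string or number decodes to neither, and in that case find_key as written yields nothing.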
iter_dict iterates over each (k, v) pair at the top level of its d dict arg. If k matches the key we're looking for then the current indices list is yielded with k appended to it, along with the associated value. Because iter_dict is recursive the yielded (indices list, value) pairs get passed up to the previous level of recursion, eventually making their way up to find_key and then to the code that called find_key. Note that this is the "base case" of our recursion: it's the part of the code that determines whether this recursion path leads to the key we want. If a recursion path never finds a key matching the key we're looking for then that recursion path won't add anything to indices and it will terminate without yielding anything.
If the current v is a dict, then we need to examine all the (key, value) pairs it contains. We do that by making a recursive call to iter_dict, passing that v as its starting object along with the current indices list extended by k. If the current v is a list we instead call iter_list, passing it the same args.
iter_list works similarly to iter_dict except that a list doesn't have any keys, it only contains values, so we don't perform the k == key test, we just recurse into any dicts or lists that the original list contains.
The end result of this process is that when we iterate over find_key we get pairs of (indices, value) where each indices list is the sequence of dict keys and list indices that successfully terminates in a dict item with our desired key, and value is the value associated with that particular key.
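To see the "all matching keys" behaviour described above, here is a sketch that repeats the three generators and runs them on a made-up structure (sample is hypothetical data, not from the question) in which the same key appears at three depths.

```python
def find_key(obj, key):
    if isinstance(obj, dict):
        yield from iter_dict(obj, key, [])
    elif isinstance(obj, list):
        yield from iter_list(obj, key, [])

def iter_dict(d, key, indices):
    for k, v in d.items():
        if k == key:
            yield indices + [k], v
        if isinstance(v, dict):
            yield from iter_dict(v, key, indices + [k])
        elif isinstance(v, list):
            yield from iter_list(v, key, indices + [k])

def iter_list(seq, key, indices):
    for k, v in enumerate(seq):
        if isinstance(v, dict):
            yield from iter_dict(v, key, indices + [k])
        elif isinstance(v, list):
            yield from iter_list(v, key, indices + [k])

# Hypothetical sample data: 'id' occurs at three different depths
sample = {'id': 1, 'items': [{'id': 2}, {'meta': {'id': 3}}]}
matches = list(find_key(sample, 'id'))
for path, value in matches:
    print(path, value)
# ['id'] 1
# ['items', 0, 'id'] 2
# ['items', 1, 'meta', 'id'] 3
```

Each recursive call receives indices + [k], a new list, so sibling branches never see each other's path entries.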
If you'd like to see some other examples of this code in use please see how to modify the key of a nested Json and How can I select deeply nested key:values from dictionary in python.
Also take a look at my new, more streamlined show_indices function.
Related
Why isn't this code working? I'm trying to get the items whose value equals their key.
L=[0,2,2,1,5,5,6,10]
x=dict(enumerate(L))
y=(filter(x.keys()==x.values(), x.items()))
print(list(y))
The keys() method returns a view of all of the keys.
The values() method returns a view of all of the values.
So, x.keys()==x.values() is asking whether all of the keys equal all of the values, which is of course not true.
Also, filter wants a function. But you're not passing it a function, you're just passing it the result of x.keys()==x.values(), or False. To turn that into a function, you'd need to use def or lambda to create a new function.
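Both points can be checked directly. This is a small sketch assuming Python 3; the names comparison and error are introduced here only for the demonstration.

```python
L = [0, 2, 2, 1, 5, 5, 6, 10]
x = dict(enumerate(L))

# Comparing the two views as whole objects gives a single bool
comparison = x.keys() == x.values()
print(comparison)  # False

# Passing that bool to filter fails once the filter is consumed,
# because filter tries to call it on each item
error = None
try:
    list(filter(comparison, x.items()))
except TypeError as exc:
    error = exc
    print('TypeError:', exc)
```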
The function you want to create is a function that takes an item, and returns true if the key equals the value. Since an item is just a 2-element tuple with the key and value for that item, the function to check that is:
y = filter((lambda item: item[0] == item[1]), x.items())
Or, if that's a bit too confusing, don't try to write it inline; just def it separately:
def key_equals_value(item):
    key, value = item
    return key == value
y = filter(key_equals_value, x.items())
However, this is pretty clumsy; it's much easier to write it as a comprehension than a filter call:
y = ((key, value) for (key, value) in x.items() if key == value)
As a general rule, whenever you don't already have a function to pass to filter or map, and would have to create one with def or lambda, a comprehension will usually be more readable, because you can just write the expression directly.
And, if you want a list rather than a generator, you can do that with a comprehension just by changing the parens to square brackets:
y = [(key, value) for (key, value) in x.items() if key == value]
And, if you want just the values, not the key-value pairs:
y = [value for (key, value) in x.items() if key == value]
If you find yourself confused by comprehensions, they can always be converted into nested statements, with an append at the bottom. So, that last one is equivalent to:
y = []
for key, value in x.items():
    if key == value:
        y.append(value)
Also, you don't really need a dict here in the first place; you just want to iterate over the index, value pairs. So:
y = [value for (index, value) in enumerate(L) if index == value]
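Putting the variants side by side on the list from the question, as a sketch (the variable names via_filter, via_dict and via_enumerate are made up for this comparison):

```python
L = [0, 2, 2, 1, 5, 5, 6, 10]
x = dict(enumerate(L))

# filter with a lambda, then keep only the values
via_filter = [value for key, value in
              filter(lambda item: item[0] == item[1], x.items())]
# comprehension over the dict items
via_dict = [value for key, value in x.items() if key == value]
# comprehension over enumerate, no dict needed
via_enumerate = [value for index, value in enumerate(L) if index == value]

print(via_enumerate)  # [0, 2, 5, 6]
```

All three produce the same list; the last one avoids building the intermediate dict.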
I have a giant dict with a lot of nested dicts, like a giant tree, and the depth is unknown.
I need a function, something like find_value(), that takes a dict and a value (as a string) and returns a list of lists, each of which is a "path" (the sequential chain of keys from the first key to the key, or key value, where the value was found). If nothing is found, it returns an empty list.
I wrote this code:
def find_value(dict, sought_value, current_path, result):
    for key,value in dict.items():
        current_path.pop()
        current_path.append(key)
        if sought_value in key:
            result.append(current_path)
        if type(value) == type(''):
            if sought_value in value:
                result.append(current_path+[value])
        else:
            current_path.append(key)
            result = find_value(value, sought_value, current_path, result)
    current_path.pop()
    return result
I call this function to test:
result = find_value(self.dump, sought_value, ['START_KEY_FOR_DELETE'], [])
if not len(result):
    print "forgive me, mylord, i'm afraid we didn't find him.."
elif len(result) == 1:
    print "bless gods, for all that we have one match, mylord!"
For some inexplicable reason, my implementation of this function fails some of my tests. I started to debug and found out that even though current_path prints the correct things (it always does, I checked!), the result is inexplicably corrupted. Maybe it is because of recursion magic?
Can anyone help me with this problem? Maybe there is a simpler solution for my task?
When you write result.append(current_path), you're not copying current_path, which continues to mutate. Change it to result.append(current_path[:]).
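The effect is easy to reproduce in isolation. A minimal sketch, with the list names aliased and copied made up for the demonstration:

```python
current_path = ['a']
aliased, copied = [], []

aliased.append(current_path)     # stores a reference to the same list object
copied.append(current_path[:])   # stores an independent snapshot

# Mutate the path afterwards, as the recursive function does
current_path.pop()
current_path.append('b')

print(aliased)  # [['b']]
print(copied)   # [['a']]
```

The aliased entry changes along with current_path, which is why the results look corrupted even though each print of current_path was correct at the time.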
I doubt you can do much to optimize a recursive search like that. Assuming there are many lookups on the same dictionary, and the dictionary doesn't change once loaded, then you can index it to get O(1) lookups...
def build_index(src, dest, path=[]):
    for k, v in src.iteritems():
        fk = path+[k]
        if isinstance(v, dict):
            build_index(v, dest, fk)
        else:
            try:
                dest[v].append(fk)
            except KeyError:
                dest[v] = [fk]
>>> data = {'foo': {'sub1': 'blah'}, 'bar': {'sub2': 'whatever'}, 'baz': 'blah'}
>>> index = {}
>>> build_index(data, index)
>>> index
{'blah': [['baz'], ['foo', 'sub1']], 'whatever': [['bar', 'sub2']]}
>>> index['blah']
[['baz'], ['foo', 'sub1']]
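The code above uses iteritems, so it targets Python 2. A possible Python 3 variant is sketched below. It swaps the try/except for collections.defaultdict and uses a tuple default for path, both of which are choices made here rather than part of the original answer, and it assumes every leaf value is hashable.

```python
from collections import defaultdict

def build_index(src, dest, path=()):
    """Map each leaf value to the list of key paths that lead to it.

    Assumes leaf values are hashable (they become keys of dest).
    """
    for k, v in src.items():
        fk = path + (k,)
        if isinstance(v, dict):
            build_index(v, dest, fk)
        else:
            dest[v].append(list(fk))

data = {'foo': {'sub1': 'blah'}, 'bar': {'sub2': 'whatever'}, 'baz': 'blah'}
index = defaultdict(list)
build_index(data, index)
print(index['blah'])  # [['foo', 'sub1'], ['baz']]
```

On Python 3.7 and later the paths come out in insertion order, so the order differs from the Python 2 session shown above.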
def big(dict, n):
    line = []
    for k in dict:
        if k > n:
            line.append(k)
            return line
I have to find all the elements in dict larger than n.
However, my code only returns the largest number in dict larger than n.
What do I need to do in order to make it correct?
The return line is indented too far, so the function returns as soon as the first key larger than n is found, rather than going over all keys before returning. (Note: before Python 3.7 a dictionary was not guaranteed to keep the order in which you wrote it.) Try:
def big(dic, n):
    line = []
    for k in dic:
        if k > n:
            line.append(k)
    return line
In fact, you might prefer to use a list comprehension (and the function body becomes just one line).
def big(dic, n):
    return [k for k in dic if k > n]
Dictionaries consist of key-value pairs, {key: value}, and when we iterate over a dictionary we are iterating over its keys. This explains the use of the variable k to iterate over the keys. That is,
[k for k in dic] = [key1, key2, ...]
Hence, if you want the values in the dictionary that are larger than n, you can use:
    return [dic[k] for k in dic if dic[k] > n]
Note: I've changed the variable name to dic since (as @AndrewJaffe mentions) dict is a built-in type, and rebinding that name here may cause unexpected things to occur and is generally considered bad practice. For example, a check like type(dic) == dict would no longer work.
Naively iterating over a dictionary gives you a sequence of keys, not values.
So to do what you want, you need itervalues (values on Python 3):
for k in d.itervalues():  # call it "d" rather than "dict"
    if k > n:
        line.append(k)
Or, as others have pointed out, use a list comprehension.
Also, don't use dict for the name, as it shadows a builtin.
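For Python 3, where itervalues no longer exists, the same idea could look like the sketch below. The sample dictionary is made up for illustration.

```python
def big(d, n):
    """Return the values in d that are greater than n."""
    return [value for value in d.values() if value > n]

# Hypothetical sample data
print(big({'a': 10, 'b': 15, 'c': 12}, 11))  # [15, 12]
```

The result follows the dictionary's insertion order on Python 3.7 and later.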
def big(dic, n):
    line = []
    for k in dic:
        if dic[k] > n:  # compare the value instead of the key
            line.append(k)  # append k for the key, or dic[k] for the value
    return line
output:
>>> print(big({'a': 10, 'b': 15, 'c': 12}, 11))
['b', 'c']
Move the return statement back two indentation levels; otherwise the function will return on the first value larger than n.
I have a parameter dictionary holding complex data composed of strings, lists and other dictionaries. Now I want to iterate through this data.
My question is about the best practice for getting an iterator that works for both lists and dictionaries.
What I have:
def parse_data(key, value):
    iterator = None
    if isinstance(value, dict):
        iterator = value.items()
    elif isinstance(value, list):
        iterator = enumerate(value)
    if iterator is not None:
        for key, item in iterator:
            parse_data(key, item)
        return
    # do some cool stuff with the rest
This does not look very Pythonic. I thought of a function similar to iter that would give me the possibility to iterate over both key and item.
I think this is quite Pythonic. I would just change it to:
def parse_data(key, value):
    if isinstance(value, dict):
        iterator = value.items()
    elif isinstance(value, list):
        iterator = enumerate(value)
    else:
        # do some cool stuff with the rest
        return
    for key, item in iterator:
        parse_data(key, item)
I'm not sure if there is a shorter way of doing it, but Python lets you do one thing in many different ways. Maybe this could be another way of doing it. I haven't tested it so it might not work.
def parse_data(key, value):
    # dicts give (key, item) pairs via items(); anything else starts as None
    iterator = value.items() if isinstance(value, dict) else None
    if iterator is None and isinstance(value, list):
        iterator = enumerate(value)
    if iterator is not None:
        for key, item in iterator:
            parse_data(key, item)
        return
    # do some cool stuff with the rest
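The "function similar to iter" that the question asks about could be factored out as a small helper. This is only a sketch: iter_pairs and the leaves parameter are hypothetical names, and collecting leaves into a list is a stand-in for the "cool stuff" done with non-container values.

```python
def iter_pairs(value):
    """Return (key, item) pairs for a dict or list, or None for anything else."""
    if isinstance(value, dict):
        return value.items()
    if isinstance(value, list):
        return enumerate(value)
    return None

def parse_data(key, value, leaves):
    pairs = iter_pairs(value)
    if pairs is None:
        # Stand-in for "do some cool stuff with the rest"
        leaves.append((key, value))
        return
    for child_key, item in pairs:
        parse_data(child_key, item, leaves)

# Hypothetical sample data
leaves = []
parse_data(None, {'name': 'x', 'tags': ['a', 'b'], 'opts': {'depth': 2}}, leaves)
print(leaves)
# [('name', 'x'), (0, 'a'), (1, 'b'), ('depth', 2)]
```

Keeping the type dispatch in one place means parse_data itself no longer needs to know which container types are supported.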