Advice on making my JSON find-all-nested-occurrences method cleaner - python

I am parsing an unknown nested JSON object; I do not know its structure or its depth ahead of time. I am trying to search through it to find a value. This is what I came up with, but I find it fugly. Could anybody let me know how to make this look more Pythonic and cleaner?
def find(d, key):
    if isinstance(d, dict):
        for k, v in d.iteritems():
            try:
                if key in str(v):
                    return 'found'
            except:
                continue
            if isinstance(v, dict):
                for key,value in v.iteritems():
                    try:
                        if key in str(value):
                            return "found"
                    except:
                        continue
            if isinstance(v, dict):
                find(v)
            elif isinstance(v, list):
                for x in v:
                    find(x)
    if isinstance(d, list):
        for x in d:
            try:
                if key in x:
                    return "found"
            except:
                continue
            if isinstance(v, dict):
                find(v)
            elif isinstance(v, list):
                for x in v:
                    find(x)
    else:
        if key in str(d):
            return "found"
        else:
            return "Not Found"

It is generally more "Pythonic" to use duck typing; i.e., just try to search for your target rather than using isinstance. See What are the differences between type() and isinstance()?
However, your need for recursion makes it necessary to recurse the values of the dictionaries and the elements of the list. (Do you also want to search the keys of the dictionaries?)
The in operator works on strings, lists, and dictionaries alike, so there is no need to separate the dictionaries from the lists when testing for membership. Assuming you don't want to match the target as a substring, do use isinstance(value, basestring) per the previous link to treat strings separately. To test whether your target is among the values of a dictionary, test for membership in your_dictionary.values(). See Get key by value in dictionary
Because the dictionary values might themselves be lists or dictionaries, I might still test for dictionary and list types the way you did. I mention that a single statement can cover both list elements and dictionary keys because you asked about being Pythonic, and using an overloaded operator like in across two types is typical of Python.
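As a rough illustration of how in behaves on each of these types, here is a minimal Python 3 sketch. The record dictionary is made-up sample data, not something from the question.

```python
# Hypothetical sample data, not taken from the question.
record = {"name": "widget", "tags": ["red", "large"]}

# `in` on a dict tests the keys.
print("name" in record)             # True
# To test the values, ask for them explicitly.
print("widget" in record.values())  # True
# `in` on a list tests the elements.
print("red" in record["tags"])      # True
# `in` on a string is a substring test, which is usually not wanted here.
print("wid" in record["name"])      # True
```

The last line is the reason strings are better handled as a separate base case with an equality test.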
Recursion is indeed necessary here, but I wouldn't name the function find. Strictly speaking, find is not a built-in function, but it is the name of a standard string method (str.find), so another programmer might mistake the recursive call for it, which makes the code less readable.
To test for numeric types, use numbers.Number, as described at How can I check if my python object is a number?
Also, there is a solution to a variation of your problem at https://gist.github.com/douglasmiranda/5127251 . I found that before posting because ColdSpeed's regex suggestion in the comment made me wonder if I were leading you down the wrong path.
So something like
import numbers
def recursively_search(object_from_json, target):
    if isinstance(object_from_json, (basestring, numbers.Number)):
        return object_from_json == target  # the recursion base cases
    elif isinstance(object_from_json, list):
        for element in object_from_json:  # iterate the argument, not the list type
            if recursively_search(element, target):
                return True  # quit at first match
    elif isinstance(object_from_json, dict):
        if target in object_from_json:
            return True  # among the keys
        else:
            for value in object_from_json.values():
                if recursively_search(value, target):
                    return True  # quit upon finding first match
    else:
        print("recursively_search() did not anticipate type ", type(object_from_json))
        return False
    return False  # no match found among the list elements, dict keys, nor dict values
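The function above targets Python 2 (it relies on basestring). A possible Python 3 version is sketched below, under the assumption that the input comes from json.loads() and so contains only dicts, lists, strings, numbers, booleans, and None. The name contains_target and the sample document are illustrative choices, not part of the original answer.

```python
import json
import numbers


def contains_target(node, target):
    """Return True if target equals any key or scalar value inside node."""
    if isinstance(node, dict):
        # Check the keys directly, then recurse into the values.
        return target in node or any(
            contains_target(value, target) for value in node.values()
        )
    if isinstance(node, list):
        return any(contains_target(element, target) for element in node)
    if isinstance(node, (str, numbers.Number)) or node is None:
        return node == target  # base case: a scalar from the JSON document
    return False  # a type json.loads() would not normally produce


# Hypothetical document, used only to exercise the function.
doc = json.loads('{"a": [1, {"b": "needle"}], "c": null}')
print(contains_target(doc, "needle"))
print(contains_target(doc, "missing"))
```

Using any() keeps the "quit at first match" behaviour, because it stops consuming the generator as soon as one element is true.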

Related

How can lists be distinguished depending on the types of their items?

I have converted some XML files with xmltodict to native Python types (so it "feel[s] like [I am] working with JSON"). The converted objects have a lot of "P" keys with values that might be one of:
a list of strings
a list of None and a string.
a list of dicts
a list of lists of dicts
If the list contains only strings, or only strings and None, then it should be converted to a string using join. If the list contains dicts or lists, then it should be skipped without processing.
How can the code tell these cases apart, so as to determine which action should be performed?
Example data for the first two cases, which should be joined:
["Bla Bla"]
[null,"Bla bla"]
Example data for the last two cases, which should be skipped:
[{"CPV_CODE":{"CODE":79540000}}]
[
[{"CPV_CODE":{"CODE":79530000}}, {"CPV_CODE":{"CODE":79540000}}],
[{"CPV_CODE":{"CODE":79550000}}]
]
This is done in a function that processes the data:
def recursive_iter(obj):
    if isinstance(obj, dict):
        for item in obj.values():
            if "P" in obj and isinstance(obj["P"], list) and not isinstance(obj["P"], dict):
                #need to add a check for not dict and list in list
                obj["P"] = " ".join([str(e) for e in obj["P"]])
            else:
                yield from recursive_iter(item)
    elif any(isinstance(obj, t) for t in (list, tuple)):
        for item in obj:
            yield from recursive_iter(item)
    else:
        yield obj
Since you want to find the lists that contain only strings:
if ("P" in obj) and isinstance(obj["P"], list):
    if all([isinstance(z, str) for z in obj["P"]]):
        ... # keep list with strings
Is that what you want?
Let's start with the two conditions:
1. the list contains only strings or the list contains only strings and null / none
2. the list contains dict or list(s)
The first subcondition of the first condition is covered by the second subcondition, so #1 can be simplified to:
the list contains only strings or None
Now let's rephrase them in something resembling a first order logic:
1. All the list items are None or strings.
2. Some list item is a dict or a list.
The way that condition #2 is written, it could use an "All" quantifier, but in the context of the operation (whether or not to join the list items), a "some" is appropriate, and more closely aligns with the negation of condition 1 ("Some list item is not None or a string"). Also, it allows for an illustration of another implementation (shown below).
These two conditions are mutually exclusive, though not necessarily exhaustive. To simplify matters, let's assume that, in practice, these are the only two possibilities. Leaving aside the quantifiers ("All", "Some"), these are easily translatable into generator expressions:
(None == item or isinstance(item, str) for item in items)
(isinstance(item, (dict, list)) for item in items)
Note that isinstance accepts a tuple of types (which basically functions as a union type) for the second argument, allowing multiple types to be checked in one call. You could make use of this to combine the two tests into one by using NoneType (isinstance(item, (str, types.NoneType)) or isinstance(item, (str, type(None)))), but this doesn't gain you much of anything.
The "All" and "Some" quantifiers are expressed as the all and any functions, which take iterables (such as what is produced by generator expressions):
all(item is None or isinstance(item, str) for item in items)
any(isinstance(item, (dict, list)) for item in items)
Abstracting these expressions into functions gives two options for the implementation. From recursive_iter, it looks like the value for a "P" might not always be a list. To guard against this, an isinstance(items, list) condition is included:
# 1
def shouldJoin(items):
    return isinstance(items, list) and all([item is None or isinstance(item, str) for item in items])
# 2
def shouldJoin(items):
    return isinstance(items, list) and not any([isinstance(item, (dict, list)) for item in items])
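To check the first option against the four example values from the question, something like the following could be used. The snake_case name should_join is a local stand-in for shouldJoin, and the examples are parsed from JSON text so that null becomes None.

```python
import json


def should_join(items):
    """True only for a list whose items are all None or strings."""
    return isinstance(items, list) and all(
        item is None or isinstance(item, str) for item in items
    )


# The four example values from the question, written as JSON text.
samples = [
    '["Bla Bla"]',
    '[null, "Bla bla"]',
    '[{"CPV_CODE": {"CODE": 79540000}}]',
    '[[{"CPV_CODE": {"CODE": 79530000}}], [{"CPV_CODE": {"CODE": 79550000}}]]',
]
for text in samples:
    value = json.loads(text)
    print(should_join(value), value)
```

One thing to keep in mind when joining afterwards: str(None) is the text "None", so a join over str(e) will include that word unless None entries are filtered out first.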
If you want a more general version of condition #2, you can use container abstract base classes:
import collections.abc as abc
def shouldJoin(items):
    return isinstance(items, list) and not any(isinstance(item, (abc.Mapping, abc.MutableSequence)) for item in items)
Both str and list share many abstract base classes; MutableSequence is the one that is unique to list, so that is what's used in the sample. To see exactly which ABCs each concrete type descends from, you can play around with the following:
import collections.abc as abc
ABCMeta = type(abc.Sequence)
abcs = {name: val for (name, val) in abc.__dict__.items() if isinstance(val, ABCMeta)}
def abcsOf(t):
    return {name for (name, kls) in abcs.items() if issubclass(t, kls)}
# examine ABCs
abcsOf(str)
abcsOf(list)
# which ABCs does list descend from, that str doesn't?
abcsOf(list) - abcsOf(str)
# result: {'MutableSequence'}
abcsOf(tuple) - abcsOf(str)
# result: set() (the empty set)
Note that it's not possible to distinguish strs from tuples using just ABCs.
Other Notes
The expression any(isinstance(obj, t) for t in (list, tuple)) can be simplified to isinstance(obj, (list, tuple)).
Dict Loop Bug
All the references to "P" in obj and obj["P"] in the first for loop of recursive_iter are loop-invariant. This means, in general and at the very least, there's an opportunity for loop optimization. However, in this case since the branch tests an item other than the current item, it indicates a bug. How it should be fixed depends on whether or not the joined string should be yielded. If so, the test & join can be moved outside the loop, and the loop will then yield the modified value of "P":
# ...
if "P" in obj and shouldJoin(obj["P"]):
    obj["P"] = " ".join([str(item) for item in obj["P"]])
for value in obj.values():
    yield from recursive_iter(value)
#...
If not, there are a couple options (note you can use dict.items() to get the keys & their values simultaneously):
Move the test & join outside the loop (as for if the joined "P" should be yielded), but skip the modified value for "P" within the loop:
# ...
if "P" in obj and shouldJoin(obj["P"]):
    obj["P"] = " ".join([str(item) for item in obj["P"]])
for (key, value) in obj.items():
    if not ("P" == key and isinstance(value, str)):
        yield from recursive_iter(value)  # recurse into the current value
Move the test & join outside the loop (as for if the joined "P" should be yielded), but exclude "P" from the loop:
# ...
values = (value for value in obj.values())
if "P" in obj and shouldJoin(obj["P"]):
    obj["P"] = " ".join([str(item) for item in obj["P"]])
    values = (value for (key, value) in obj.items() if "P" != key)
for value in values:
    yield from recursive_iter(value)
Keep the test & join in the loop, but test the current key. In general, you need to be careful about modifying objects while looping over them, as that may invalidate or interfere with iterators. In this particular case, dict.items returns a view object, so replacing the value of an existing key shouldn't cause problems (though adding or removing keys during iteration will cause a runtime error).
# ...
for (key, value) in obj.items():
    if "P" == key and shouldJoin(value):
        obj["P"] = " ".join([str(item) for item in value])
    else:
        yield from recursive_iter(value)  # recurse into the current value
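Putting the pieces together, a complete version of the first variant (join "P" before the loop, then yield the joined string along with everything else) might look like the sketch below. It assumes that None entries should be dropped rather than rendered as the text "None", which the question does not specify, and the data dictionary is invented for the demonstration.

```python
def should_join(items):
    """True only for a list whose items are all None or strings."""
    return isinstance(items, list) and all(
        item is None or isinstance(item, str) for item in items
    )


def recursive_iter(obj):
    """Yield every leaf value, joining joinable "P" lists in place first."""
    if isinstance(obj, dict):
        if should_join(obj.get("P")):
            # Assumption: None entries are dropped, not rendered as "None".
            obj["P"] = " ".join(item for item in obj["P"] if item is not None)
        for value in obj.values():
            yield from recursive_iter(value)
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            yield from recursive_iter(item)
    else:
        yield obj


# Hypothetical input shaped like the question's data.
data = {
    "A": {"P": [None, "Bla", "bla"]},
    "B": {"P": [{"CPV_CODE": {"CODE": 79540000}}]},
}
print(list(recursive_iter(data)))  # ['Bla bla', 79540000]
```

Because the join happens before the loop starts, the loop never tests a key other than the one it is currently visiting, which avoids the loop-invariant bug described above.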

Recursive function prints but does not return [duplicate]

I have a function that takes a key and traverses nested dicts to return the value regardless of its depth. However, I can only get the value to print, not return. I've read the other questions on this issue and have tried (1) implementing yield and (2) appending the value to a list and then returning the list.
def get_item(data,item_key):
    # data=dict, item_key=str
    if isinstance(data,dict):
        if item_key in data.keys():
            print data[item_key]
            return data[item_key]
        else:
            for key in data.keys():
                # recursion
                get_item(data[key],item_key)
item = get_item(data,'aws:RequestId')
print item
Sample data:
data = OrderedDict([(u'aws:UrlInfoResponse', OrderedDict([(u'#xmlns:aws', u'http://alexa.amazonaws.com/doc/2005-10-05/'), (u'aws:Response', OrderedDict([(u'#xmlns:aws', u'http://awis.amazonaws.com/doc/2005-07-11'), (u'aws:OperationRequest', OrderedDict([(u'aws:RequestId', u'4dbbf7ef-ae87-483b-5ff1-852c777be012')])), (u'aws:UrlInfoResult', OrderedDict([(u'aws:Alexa', OrderedDict([(u'aws:TrafficData', OrderedDict([(u'aws:DataUrl', OrderedDict([(u'#type', u'canonical'), ('#text', u'infowars.com/')])), (u'aws:Rank', u'1252')]))]))])), (u'aws:ResponseStatus', OrderedDict([(u'#xmlns:aws', u'http://alexa.amazonaws.com/doc/2005-10-05/'), (u'aws:StatusCode', u'Success')]))]))]))])
When I execute, the desired value prints, but does not return:
>>>52c7e94b-dc76-2dd6-1216-f147d991d6c7
>>>None
What is happening? Why isn't the function breaking and returning the value when it finds it?
A simple fix: when a recursive call on a nested dict finds the value, you have to return that result instead of discarding it. You don't need to explicitly use an else clause because the if returns. You also don't need to call .keys():
def get_item(data, item_key):
    if isinstance(data, dict):
        if item_key in data:
            return data[item_key]
        for key in data:
            found = get_item(data[key], item_key)
            if found is not None:  # a plain "if found" would skip 0, '' and other falsy values
                return found
    return None  # Explicit vs Implicit
>>> get_item(data, 'aws:RequestId')
'4dbbf7ef-ae87-483b-5ff1-852c777be012'
One of the design principles of Python is EAFP (Easier to Ask for Forgiveness than Permission), which means that exceptions are used more commonly than in other languages. The above, rewritten in EAFP style:
def get_item(data, item_key):
    try:
        return data[item_key]
    except KeyError:
        for key in data:
            found = get_item(data[key], item_key)
            if found:
                return found
    except (TypeError, IndexError):
        pass
    return None
As other people commented, you need a return statement in the else block, too. You have two if blocks, so you would need two more return statements. Here is code that does what you may want; it collects every match into a list:
from collections import OrderedDict
def get_item(data,item_key):
    result = []
    if isinstance(data, dict):
        for key in data:
            if key == item_key:
                print data[item_key]
                result.append(data[item_key])
            # recursion
            result += get_item(data[key],item_key)
        return result
    return result
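Since the question mentions trying yield, here is one way a generator-based version could look in Python 3. This is only a sketch: the name iter_items, the extra handling of lists, and the nested sample data are assumptions added for illustration.

```python
def iter_items(data, item_key):
    """Yield every value stored under item_key, at any depth."""
    if isinstance(data, dict):
        for key, value in data.items():
            if key == item_key:
                yield value
            # Recurse into the value whether or not the key matched.
            yield from iter_items(value, item_key)
    elif isinstance(data, list):
        for element in data:
            yield from iter_items(element, item_key)


# Hypothetical nested data with the key at several depths.
nested = {"id": 1, "child": {"id": 2, "items": [{"id": 3}, {"name": "x"}]}}
print(list(iter_items(nested, "id")))             # [1, 2, 3]
# First match only, with None as the default when nothing is found.
print(next(iter_items(nested, "missing"), None))  # None
```

The key detail is yield from: a bare recursive call to a generator function creates a generator object and throws it away, which is the generator equivalent of the missing return in the original code.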
Your else block needs to return the value if it finds it.
I've made a few other minor changes to your code. You don't need to do
if item_key in data.keys():
Instead, you can simply do
if item_key in data:
Similarly, you don't need
for key in data.keys():
You can iterate directly over a dict (or any class derived from a dict) to iterate over its keys:
for key in data:
Here's my version of your code, which should run on Python 2.7 as well as Python 3.
from __future__ import print_function
from collections import OrderedDict
def get_item(data, item_key):
    if isinstance(data, dict):
        if item_key in data:
            return data[item_key]
        for val in data.values():
            v = get_item(val, item_key)
            if v is not None:
                return v
data = OrderedDict([(u'aws:UrlInfoResponse',
OrderedDict([(u'#xmlns:aws', u'http://alexa.amazonaws.com/doc/2005-10-05/'), (u'aws:Response',
OrderedDict([(u'#xmlns:aws', u'http://awis.amazonaws.com/doc/2005-07-11'), (u'aws:OperationRequest',
OrderedDict([(u'aws:RequestId', u'4dbbf7ef-ae87-483b-5ff1-852c777be012')])), (u'aws:UrlInfoResult',
OrderedDict([(u'aws:Alexa',
OrderedDict([(u'aws:TrafficData',
OrderedDict([(u'aws:DataUrl',
OrderedDict([(u'#type', u'canonical'), ('#text', u'infowars.com/')])),
(u'aws:Rank', u'1252')]))]))])), (u'aws:ResponseStatus',
OrderedDict([(u'#xmlns:aws', u'http://alexa.amazonaws.com/doc/2005-10-05/'),
(u'aws:StatusCode', u'Success')]))]))]))])
item = get_item(data, 'aws:RequestId')
print(item)
output
4dbbf7ef-ae87-483b-5ff1-852c777be012
Note that this function returns None if the isinstance(data, dict) test fails, or if the for loop fails to return. It's generally a good idea to ensure that every possible return path in a recursive function has an explicit return statement, as that makes it clearer what's happening, but IMHO it's ok to leave those returns implicit in this fairly simple function.

Sort both elements of a tuple, ignoring case, within a list, so that the keys of a dictionary line up correctly

I am iterating two instances of the same class in order to check for equality. These two instances of this class are created by different means: one from a pickle, and the other from a json document.
I am iterating through the properties of these objects, to check for equality, however, the keys in the dictionaries do not always line up, so they can not be properly compared. So I tried sorting these tuples, but I cannot get the same key from both of these objects at all times because of case sensitivity.
One attempt gives me the left side sorted:
def __eq__(self, other):
    for (self_key, other_key) in sorted(
            zip(self.__dict__, other.__dict__),
            key=lambda element: (element[0].lower(), element[1].lower())):
        print self_key, " ", other_key
        ....
which outputs
alpha VAR_THRESH
chunksize VAR_MAXITER
decay decay
....
And if I explicitly make only the right side lower() I get the opposite: e.g.
....
key=lambda element: (element[0], element[1].lower())):
....
The left side is not sorted by lower.
VAR_MAXITER alpha
VAR_THRESH chunksize
alpha VAR_THRESH
chunksize VAR_MAXITER
decay decay
If I leave out the .lower() on both of the elements, I get the second example.
How can I ensure that the correct keys always line up? Or in other words, how can I sort by both values in the tuples, ignoring case, so that they always line up?
zip() will give you the corresponding elements in each iterable that you give it. Sorting what zip() returns will not help you. You need to sort the keys before you pass them to zip(). That really isn't what you want, though. Just do this:
def __eq__(self, other):
    if self.__dict__.viewkeys() != other.__dict__.viewkeys():  # .keys() in Python 3
        return False
    return all(other.__dict__[key] == self.__dict__[key] for key in self.__dict__)
The easy way would be this:
def __eq__(self, other):
    return self.__dict__ == other.__dict__
but the first version is easier to adapt if you need to change what counts as equivalent.
The following (proposed by Padraic Cunningham) is closer to what you want:
def __eq__(self, other):
    return all(
        sorted(v) == sorted(self.__dict__[key])
        if isinstance(v, np.ndarray)  # np.array is a function; np.ndarray is the type
        else self.__dict__[key] == v
        for key, v in other.__dict__.iteritems()
    )
That is a shortcut for this:
def __eq__(self, other):
    for key, v in other.__dict__.iteritems():  # .items() in Python 3
        if isinstance(v, np.ndarray):
            # arrays are compared only by their sorted contents
            if sorted(v) != sorted(self.__dict__[key]):
                return False
        elif self.__dict__[key] != v:
            return False
    return True
Dictionaries are inherently unordered. Coercing the keys into some ordering that fits your particular definition is certainly possible, but also entirely unnecessary. You could instead try something like this:
def __eq__(self, other):
    # first, check to make sure that the two sets of keys are identical
    if self.__dict__.keys() != other.__dict__.keys():
        return False
    # now that we know the dictionaries have identical key sets, check
    # for equality of values.
    for key in self.__dict__.keys():
        if self.__dict__[key] != other.__dict__[key]:
            return False
    # if everything worked, then the dictionaries are identical!
    return True
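If pairing the keys side by side is still wanted (for printing or debugging, say), the point made earlier about sorting before zipping could be sketched as follows in Python 3. The two dictionaries are made-up stand-ins for self.__dict__ and other.__dict__, reusing key names from the question's output.

```python
# Hypothetical attribute dictionaries with the same keys in different orders.
left = {"alpha": 1, "VAR_THRESH": 2, "decay": 3, "chunksize": 4}
right = {"decay": 3, "chunksize": 4, "VAR_THRESH": 2, "alpha": 1}

# Sort each key list on its own, ignoring case, and only then pair them up.
left_keys = sorted(left, key=str.lower)
right_keys = sorted(right, key=str.lower)
for left_key, right_key in zip(left_keys, right_keys):
    print(left_key, right_key, left[left_key] == right[right_key])
```

This only lines up when both objects have the same set of keys, which is why the answers above check the key sets first instead of relying on order.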

Python type comparison

Ok, so I have a list of tuples, each containing three values (code, value, unit).
When I use these, I need to check whether a value is a str, a list or a matrix (or check if it is a list and then check for a list again).
My question is simply: should I do it like this, or is there some better way?
for code, value, unit in tuples:
    if isinstance(value, str):
        # Do for this item
    elif isinstance(value, collections.Iterable):
        # Do for each item
        for x in value:
            if isinstance(x, str):
                # Do for this item
            elif isinstance(x, collections.Iterable):
                # Do for each item
                for x in value:
                    # ...
            else:
                raise Exception
    else:
        raise Exception
The best solution is to avoid mixing types like this, but if you're stuck with it then what you've written is fine except I'd only check for the str instance. If it isn't a string or an iterable then you'll get a more appropriate exception anyway so no need to do it yourself.
for (code, value, unit) in tuples:
    if isinstance(value, str):
        # Do for this item
    else:
        # Do for each item
        for x in value:
            if isinstance(x, str):  # test the element, not the outer value
                # Do for this item
            else:
                # Do for each item
                for y in x:
                    # ...
This works but every time you call isinstance, you should ask yourself "can I add a method to value instead?" That would change the code into:
for (code,value,unit) in tuples:
value.doSomething(code, unit)
For this to work, you'll have to wrap types like str and lists in helper types that implement doSomething()
An alternative to your approach is to factor out this code into a more general generator function (assuming Python 2.x):
def flatten(x):
    if isinstance(x, basestring):
        yield x
    else:
        for y in x:
            for z in flatten(y):
                yield z  # yield the flattened leaf, not the container y
(This also incorporates the simplifications suggested and explained in Duncan's answer.)
Now, your code becomes very simple and readable:
for code, value, unit in tuples:
    for v in flatten(value):
        # whatever
Factoring the code out also helps to deal with this data structure at several places in the code.
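For Python 3, where basestring no longer exists and yield from is available, the same generator might be written as below. This is a sketch; the tuples list is invented sample data in the (code, value, unit) shape described in the question.

```python
def flatten(x):
    """Yield the strings found at any nesting depth inside x."""
    if isinstance(x, str):
        yield x  # a string is a leaf, even though it is iterable
    else:
        for y in x:
            yield from flatten(y)


# Hypothetical tuples in the (code, value, unit) shape from the question.
tuples = [
    ("a", "single", "m"),
    ("b", ["x", "y"], "s"),
    ("c", [["p", "q"], ["r"]], "kg"),
]
for code, value, unit in tuples:
    print(code, list(flatten(value)), unit)
```

The string check has to come first: a one-character string iterates to itself, so without it the recursion would never terminate.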
Just use the tuples and catch any exceptions. Don't look before you jump :)
Recursion will help.
def processvalue(value):
    if isinstance(value, list):  # string is type Iterable (thanks @sven)
        for x in value:
            processvalue(x)  # recurse on the element, not on the list itself
    else:
        # Do your processing of string or matrices or whatever.
# Test the value in each tuple.
for (code, value, unit) in tuples:
    processvalue(value)
This is a neater way of dealing with nested structures, and it also lets you process arbitrary depths.

Searching for an object

This is how I have been searching for objects in python. Is there any more efficient (faster, simpler) way of doing it?
Note: A is a known object.
for i in Very_Long_List_Of_Names:
    if A == My_Dictionary[i]:
        print "The object you are looking for is ", i
        break
The one liner would be: (i for i in List_of_names if A == My_dictionary[i]).next().
This throws a KeyError if there is an item in List_of_names that is not a key in My_dictionary and a StopIteration if the item is not found, else returns the key where it finds A.
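In Python 3 the generator method .next() is gone, and the built-in next() accepts a default value, which avoids the StopIteration case. A minimal sketch follows; the lowercase names and the sample values are placeholders for the question's My_Dictionary, List_of_names, and A.

```python
# Hypothetical data standing in for the question's names and dictionary.
my_dictionary = {"first": 10, "second": 20, "third": 30}
list_of_names = ["first", "second", "third"]
target = 20

# next() takes a default, so a missing target yields None instead of raising.
match = next(
    (name for name in list_of_names if my_dictionary[name] == target), None
)
print(match)    # second

missing = next(
    (name for name in list_of_names if my_dictionary[name] == 99), None
)
print(missing)  # None
```

A name that is not a key in the dictionary would still raise KeyError, exactly as with the original one-liner.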
I assume you're looking for an object in the values of a Python dictionary.
If you simply want to check for its existence (as in, you don't really care to know which key maps to that value), you can do:
if A in My_Dictionary.values():
    print "The object is in the dictionary"
Otherwise, if you do want to get the key associated with that value:
for k, v in My_Dictionary.iteritems():
    if v == A:
        print "The object you are looking for is ", k
        break
EDIT: Note that you can have multiple keys with the same value in the same dictionary. The above code will only find the first occurrence. Still, it sure beats having a huge list of names. :-)
It seems to me that you're using the dictionary incorrectly if you're searching through all the keys looking for a specific value.
If A is hashable, then store A in a dictionary with its values as i.
d = {A: 'a_name'}
If My_Dictionary is not huge and can trivially fit in memory, and, A is hashable, then, create a duplicate dictionary from it:
d = dict((value, key) for key, value in My_Dictionary.iteritems())
if A in d:
    print "word you're looking for is: ", d[A]
Otherwise, you'll have to iterate over every key:
for word, object_ in My_Dictionary.iteritems():
    if object_ == A:
        print "word you're looking for is: ", word
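A plain inverted dictionary keeps only one key per value. If several keys can share a value, as noted in the earlier EDIT, one option is to map each value to a list of keys. The sketch below uses Python 3, assumes the values are hashable, and uses invented sample data.

```python
from collections import defaultdict

# Hypothetical dictionary in which two names map to the same value.
my_dictionary = {"a_name": 1, "b_name": 2, "c_name": 1}

# Build value -> [keys] once; later lookups avoid scanning every item.
keys_by_value = defaultdict(list)
for key, value in my_dictionary.items():
    keys_by_value[value].append(key)

print(keys_by_value[1])          # ['a_name', 'c_name']
print(keys_by_value.get(3, []))  # []
```

Building the index costs one pass over the dictionary, so it pays off mainly when many lookups by value are expected.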
