Python type comparison - python

OK, so I have a list of tuples, each containing three values (code, value, unit).
When I use these, I need to check whether a value is a str, a list or a matrix (that is, check whether it is a list and then check whether its items are lists again).
My question is simply: should I do it like this, or is there a better way?
for code, value, unit in tuples:
    if isinstance(value, str):
        # Do for this item
    elif isinstance(value, collections.Iterable):
        # Do for each item
        for x in value:
            if isinstance(x, str):
                # Do for this item
            elif isinstance(x, collections.Iterable):
                # Do for each item
                for y in x:
                    # ...
            else:
                raise Exception
    else:
        raise Exception

The best solution is to avoid mixing types like this, but if you're stuck with it then what you've written is fine except I'd only check for the str instance. If it isn't a string or an iterable then you'll get a more appropriate exception anyway so no need to do it yourself.
for (code, value, unit) in tuples:
    if isinstance(value, str):
        # Do for this item
    else:
        # Do for each item
        for x in value:
            if isinstance(x, str):
                # Do for this item
            else:
                # Do for each item
                for y in x:
                    # ...

This works but every time you call isinstance, you should ask yourself "can I add a method to value instead?" That would change the code into:
for (code, value, unit) in tuples:
    value.doSomething(code, unit)
For this to work, you'll have to wrap types like str and list in helper types that implement doSomething().
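A minimal sketch of what such helper types might look like. The class names TextValue and ListValue, the method name do_something, and the sample data are all made up for illustration; the point is only that the loop no longer needs isinstance.

```python
class TextValue:
    """Hypothetical wrapper around a single string value."""

    def __init__(self, text):
        self.text = text

    def do_something(self, code, unit):
        # A leaf value produces exactly one result line.
        return ["{}: {} {}".format(code, self.text, unit)]


class ListValue:
    """Hypothetical wrapper around a list of wrapped values (may be nested)."""

    def __init__(self, items):
        self.items = items

    def do_something(self, code, unit):
        # Delegate to each child; nesting depth is handled by the children.
        results = []
        for item in self.items:
            results.extend(item.do_something(code, unit))
        return results


tuples = [
    ("A1", TextValue("10"), "kg"),
    ("B2", ListValue([TextValue("1"), ListValue([TextValue("2"), TextValue("3")])]), "m"),
]

report = []
for code, value, unit in tuples:
    report.extend(value.do_something(code, unit))
```

The cost is that the wrapping has to happen somewhere, typically where the tuples are built, so this pays off mainly when the values are processed in several places.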

An alternative to your approach is to factor out this code into a more general generator function (assuming Python 2.x):
def flatten(x):
    if isinstance(x, basestring):
        yield x
    else:
        for y in x:
            for z in flatten(y):
                yield z
(This also incorporates the simplifications suggested and explained in Duncan's answer.)
Now, your code becomes very simple and readable:
for code, value, unit in tuples:
    for v in flatten(value):
        # whatever
Factoring the code out also helps to deal with this data structure at several places in the code.
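For readers on Python 3, the same generator might look like the sketch below. It assumes str (and bytes) are the only leaf types, as in the question; the sample tuples are invented for illustration. A non-iterable leaf such as an int would raise TypeError in the for loop, which matches the "let the more appropriate exception happen" advice above.

```python
def flatten(x):
    """Yield every string found in an arbitrarily nested iterable."""
    if isinstance(x, (str, bytes)):
        # Strings are iterable too, so they must be treated as leaves first.
        yield x
    else:
        for y in x:
            yield from flatten(y)


# Made-up sample data in the (code, value, unit) shape from the question.
tuples = [("A1", "10", "kg"), ("B2", ["1", ["2", "3"]], "m")]

flat = [(code, v, unit) for code, value, unit in tuples for v in flatten(value)]
```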

Just use the tuples and catch any exceptions. Don't look before you jump :)

Recursion will help.
def processvalue(value):
    if isinstance(value, list):  # don't test for Iterable here: a string is Iterable too (thanks @sven)
        for x in value:
            processvalue(x)
    else:
        # Do your processing of strings or matrices or whatever.
        pass
# Process the value in each tuple.
for (code, value, unit) in tuples:
    processvalue(value)
This is a neater way of dealing with nested structures, and it also lets you process arbitrary depths.

Related

How can lists be distinguished depending on the types of their items?

I have converted some XML files with xmltodict to native Python types (so it "feel[s] like [I am] working with JSON"). The converted objects have a lot of "P" keys with values that might be one of:
a list of strings
a list of None and a string.
a list of dicts
a list of lists of dicts
If the list contains only strings, or only strings and None, then it should be converted to a string using join. If the list contains dicts or lists, then it should be skipped without processing.
How can the code tell these cases apart, so as to determine which action should be performed?
Example data for the first two cases, which should be joined:
["Bla Bla"]
[null,"Bla bla"]
Example data for the last two cases, which should be skipped:
[{"CPV_CODE":{"CODE":79540000}}]
[
[{"CPV_CODE":{"CODE":79530000}}, {"CPV_CODE":{"CODE":79540000}}],
[{"CPV_CODE":{"CODE":79550000}}]
]
This is done in a function that processes the data:
def recursive_iter(obj):
    if isinstance(obj, dict):
        for item in obj.values():
            if "P" in obj and isinstance(obj["P"], list) and not isinstance(obj["P"], dict):
                #need to add a check for not dict and list in list
                obj["P"] = " ".join([str(e) for e in obj["P"]])
            else:
                yield from recursive_iter(item)
    elif any(isinstance(obj, t) for t in (list, tuple)):
        for item in obj:
            yield from recursive_iter(item)
    else:
        yield obj
Since you want to find the lists that contain only strings:
if ("P" in obj) and isinstance(obj["P"], list):
    if all([isinstance(z, str) for z in obj["P"]]):
        ... # keep list with strings
Is that what you want?
Let's start with the two conditions:
the list contains only strings or the list contains only strings and null / none
the list contains dict or list(s)
The first subcondition of the first condition is covered by the second subcondition, so #1 can be simplified to:
the list contains only strings or None
Now let's rephrase them in something resembling a first order logic:
All the list items are None or strings.
Some list item is a dict or a list.
The way that condition #2 is written, it could use an "All" quantifier, but in the context of the operation (whether or not to join the list items), a "some" is appropriate, and more closely aligns with the negation of condition 1 ("Some list item is not None or a string"). Also, it allows for an illustration of another implementation (shown below).
These two conditions are mutually exclusive, though not necessarily exhaustive. To simplify matters, let's assume that, in practice, these are the only two possibilities. Leaving aside the quantifiers ("All", "Some"), these are easily translatable into generator expressions:
(None == item or isinstance(item, str) for item in items)
(isinstance(item, (dict, list)) for item in items)
Note that isinstance accepts a tuple of types (which basically functions as a union type) for the second argument, allowing multiple types to be checked in one call. You could make use of this to combine the two tests into one by using NoneType (isinstance(item, (str, types.NoneType)) or isinstance(item, (str, type(None)))), but this doesn't gain you much of anything.
The "All" and "Some" quantifiers are expressed as the all and any functions, which take iterables (such as what is produced by generator expressions):
all(item is None or isinstance(item, str) for item in items)
any(isinstance(item, (dict, list)) for item in items)
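As a quick check, the two expressions can be run against the four example lists from the question. The helper names all_none_or_str and any_dict_or_list and the dictionary keys are made up for this sketch.

```python
def all_none_or_str(items):
    # Condition 1: every item is None or a string.
    return all(item is None or isinstance(item, str) for item in items)


def any_dict_or_list(items):
    # Condition 2: at least one item is a dict or a list.
    return any(isinstance(item, (dict, list)) for item in items)


samples = {
    "strings": ["Bla Bla"],
    "none_and_string": [None, "Bla bla"],
    "dicts": [{"CPV_CODE": {"CODE": 79540000}}],
    "lists_of_dicts": [
        [{"CPV_CODE": {"CODE": 79530000}}, {"CPV_CODE": {"CODE": 79540000}}],
        [{"CPV_CODE": {"CODE": 79550000}}],
    ],
}

results = {
    name: (all_none_or_str(items), any_dict_or_list(items))
    for name, items in samples.items()
}
```

Note that an empty list satisfies condition 1 and not condition 2 (all of an empty iterable is True, any is False), so it would be treated as joinable.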
Abstracting these expressions into functions gives two options for the implementation. From recursive_iter, it looks like the value for a "P" might not always be a list. To guard against this, a isinstance(items, list) condition is included:
# 1
def shouldJoin(items):
    return isinstance(items, list) and all([item is None or isinstance(item, str) for item in items])
# 2
def shouldJoin(items):
    return isinstance(items, list) and not any([isinstance(item, (dict, list)) for item in items])
If you want a more general version of condition #2, you can use container abstract base classes:
import collections.abc as abc
def shouldJoin(items):
    return isinstance(items, list) and not any(isinstance(item, (abc.Mapping, abc.MutableSequence)) for item in items)
Both str and list share many abstract base classes; MutableSequence is the one that is unique to list, so that is what's used in the sample. To see exactly which ABCs each concrete type descends from, you can play around with the following:
import collections.abc as abc
ABCMeta = type(abc.Sequence)
abcs = {name: val for (name, val) in abc.__dict__.items() if isinstance(val, ABCMeta)}
def abcsOf(t):
    return {name for (name, kls) in abcs.items() if issubclass(t, kls)}
# examine ABCs
abcsOf(str)
abcsOf(list)
# which ABCs does list descend from, that str doesn't?
abcsOf(list) - abcsOf(str)
# result: {'MutableSequence'}
abcsOf(tuple) - abcsOf(str)
# result: set() (the empty set)
Note that it's not possible to distinguish strs from tuples using just ABCs.
Other Notes
The expression any(isinstance(obj, t) for t in (list, tuple)) can be simplified to isinstance(obj, (list, tuple)).
Dict Loop Bug
All the references to "P" in obj and obj["P"] in the first for loop of recursive_iter are loop-invariant. This means, in general and at the very least, there's an opportunity for loop optimization. However, in this case since the branch tests an item other than the current item, it indicates a bug. How it should be fixed depends on whether or not the joined string should be yielded. If so, the test & join can be moved outside the loop, and the loop will then yield the modified value of "P":
# ...
if "P" in obj and shouldJoin(obj["P"]):
    obj["P"] = " ".join([str(item) for item in obj["P"]])
for value in obj.values():
    yield from recursive_iter(value)
#...
If not, there are a couple options (note you can use dict.items() to get the keys & their values simultaneously):
Move the test & join outside the loop (as for if the joined "P" should be yielded), but skip the modified value for "P" within the loop:
# ...
if "P" in obj and shouldJoin(obj["P"]):
    obj["P"] = " ".join([str(item) for item in obj["P"]])
for (key, value) in obj.items():
    if not ("P" == key and isinstance(value, str)):
        yield from recursive_iter(value)
Move the test & join outside the loop (as for if the joined "P" should be yielded), but exclude "P" from the loop:
# ...
values = (value for value in obj.values())
if "P" in obj and shouldJoin(obj["P"]):
    obj["P"] = " ".join([str(item) for item in obj["P"]])
    values = (value for (key, value) in obj.items() if "P" != key)
for value in values:
    yield from recursive_iter(value)
Keep the test & join in the loop, but test the current key. In general, you need to be careful about modifying objects while looping over them as that may invalidate or interfere with iterators. In this particular case, dict.items returns a view object, so modifying values shouldn't cause problems (though adding or removing values will cause a runtime error).
# ...
for (key, value) in obj.items():
    if "P" == key and shouldJoin(value):
        obj["P"] = " ".join([str(item) for item in value])
    else:
        yield from recursive_iter(value)
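Putting the pieces together, a complete version of the first option (join, then yield the joined string) might look like the sketch below. Two things here are assumptions rather than part of the answer above: the snake_case name should_join, and the choice to drop None entries before joining (the original str(item) approach would render them as the text "None"). The sample data is made up from the question's examples.

```python
def should_join(items):
    # True only for a list whose items are all None or strings.
    return isinstance(items, list) and all(
        item is None or isinstance(item, str) for item in items
    )


def recursive_iter(obj):
    if isinstance(obj, dict):
        # Test and join once, outside the loop, since "P" is loop-invariant.
        if "P" in obj and should_join(obj["P"]):
            # Assumption: None entries are skipped instead of becoming "None".
            obj["P"] = " ".join(item for item in obj["P"] if item is not None)
        for value in obj.values():
            yield from recursive_iter(value)
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            yield from recursive_iter(item)
    else:
        yield obj


data = {
    "P": [None, "Bla bla"],
    "Q": {"P": [{"CPV_CODE": {"CODE": 79540000}}]},
}
leaves = list(recursive_iter(data))
```

After the call, data["P"] has been replaced by the joined string, while the nested "P" holding a list of dicts is left untouched and simply traversed.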

Advice on making my JSON find-all-nested-occurrences method cleaner

I am parsing an unknown nested JSON object; I do not know its structure or depth ahead of time. I am trying to search through it to find a value. This is what I came up with, but I find it fugly. Could anybody let me know how to make this more Pythonic and cleaner?
def find(d, key):
    if isinstance(d, dict):
        for k, v in d.iteritems():
            try:
                if key in str(v):
                    return 'found'
            except:
                continue
            if isinstance(v, dict):
                for key,value in v.iteritems():
                    try:
                        if key in str(value):
                            return "found"
                    except:
                        continue
                    if isinstance(v, dict):
                        find(v)
                    elif isinstance(v, list):
                        for x in v:
                            find(x)
    if isinstance(d, list):
        for x in d:
            try:
                if key in x:
                    return "found"
            except:
                continue
            if isinstance(v, dict):
                find(v)
            elif isinstance(v, list):
                for x in v:
                    find(x)
    else:
        if key in str(d):
            return "found"
        else:
            return "Not Found"
It is generally more "Pythonic" to use duck typing; i.e., just try to search for your target rather than using isinstance. See What are the differences between type() and isinstance()?
However, your need for recursion makes it necessary to recurse the values of the dictionaries and the elements of the list. (Do you also want to search the keys of the dictionaries?)
The in operator works on strings, lists, and dictionaries, so there is no need to separate the dictionaries from the lists when testing for membership. Assuming you don't want to test for the target as a substring, do use isinstance(x, basestring) per the previous link. To test whether your target is among the values of a dictionary, test for membership in your_dictionary.values(). See Get key by value in dictionary
Because the dictionary values might be lists or dictionaries, I still might test for dictionary and list types the way you did, but I mention that you can cover both list elements and dictionary keys with a single statement because you ask about being Pythonic, and using an overloaded operator like in across two types is typical of Python.
Your idea to use recursion is necessary, but I wouldn't name the function find: that is already the name of a string method (str.find), so another programmer reading the recursive call might mistakenly think you're calling that instead.
To test for numeric types, use numbers.Number as described at How can I check if my python object is a number?
Also, there is a solution to a variation of your problem at https://gist.github.com/douglasmiranda/5127251 . I found that before posting because ColdSpeed's regex suggestion in the comment made me wonder if I were leading you down the wrong path.
So something like
import numbers
def recursively_search(object_from_json, target):
    if isinstance(object_from_json, (basestring, numbers.Number)):
        return object_from_json == target  # the recursion base cases
    elif isinstance(object_from_json, list):
        for element in object_from_json:
            if recursively_search(element, target):
                return True  # quit at first match
    elif isinstance(object_from_json, dict):
        if target in object_from_json:
            return True  # among the keys
        else:
            for value in object_from_json.values():
                if recursively_search(value, target):
                    return True  # quit upon finding first match
    else:
        print ("recursively_search() did not anticipate type ",type(object_from_json))
        return False
    return False  # no match found among the list elements, dict keys, nor dict values
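The function above targets Python 2 (basestring). A more compact Python 3 sketch of the same idea follows; the name contains and the sample document are made up, and it assumes the target is a hashable scalar such as a string or number (an unhashable target would raise TypeError in the dict key test). Anything that is not a dict or list, including null/None, is compared by equality.

```python
import json


def contains(node, target):
    """Return True if target equals a dict key or any leaf value inside node."""
    if isinstance(node, dict):
        # Check the keys first, then recurse into the values.
        return target in node or any(contains(v, target) for v in node.values())
    if isinstance(node, list):
        return any(contains(item, target) for item in node)
    # Base case: strings, numbers, booleans and None.
    return node == target


document = json.loads('{"a": {"b": [1, {"c": "needle"}]}, "d": null}')

found_value = contains(document, "needle")
found_key = contains(document, "c")
missing = contains(document, "haystack")
```

Because any() short-circuits on a generator expression, the search stops at the first match, just like the explicit return True statements above.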

How to handle edge case when iterating over cartesian product of sets in Python?

I have a function on zero or more Pyomo sets:
def myfunc(*sets):
    if len(sets) == 0:
        return # Do something else that is irrelevant here
    indices = reduce(lambda x, y: x * y, sets) # Cartesian product of sets
    for i in indices:
        call_some_other_function(*i)
This fails when I pass it a single set of integers, like
import pyomo.environ
myset = pyomo.environ.Set(initialize=[1, 2])
myfunc(*myset)
because then I'm evaluating *i on an integer. What's an elegant way of handling this situation?
You can always check whether it is a collections.abc.Iterable to catch cases where it is not iterable (lists, sets, etc. are iterables; integers aren't):
from collections.abc import Iterable  # the old collections.Iterable alias was removed in Python 3.10
a = 1
isinstance(a, Iterable) # returns False
a = [1,2,3]
isinstance(a, Iterable) # returns True
so just do a check before you pass it into the function:
if isinstance(myset, Iterable):
    myfunc(*myset)
else:
    # something else
I think you're making things harder by implementing your own Cartesian product. Python's provided itertools.product since 2.6, and it works with any number of input sets.
import itertools
def args(*args):
    return repr(args)
def for_each_index(*sets):
    if not sets:
        print('No sets given. Doing something else.')
        return
    for index in itertools.product(*sets):
        print('Do something with ' + args(*index) + '...')
    return
I added the args function solely to show the exact result of expanding *args. You don't need it, except maybe for debugging.
Note also that there is no need to call len to test if a tuple is non-empty. if not sets: will do.
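The reason itertools.product handles the single-set edge case is that it always yields tuples, even for one input, so unpacking with * never hits a bare integer. A small sketch with plain lists standing in for the Pyomo sets (an assumption; the function name collect_indices is also made up):

```python
import itertools


def collect_indices(*sets):
    """Return every index tuple of the Cartesian product of the given sets."""
    return [index for index in itertools.product(*sets)]


single = collect_indices([1, 2])
pair = collect_indices([1, 2], ["a", "b"])
none_given = collect_indices()
```

With one set, each index is a 1-tuple such as (1,), so call_some_other_function(*index) receives a single argument. With no sets at all, product yields exactly one empty tuple, which is why the explicit "if not sets" guard is still worth keeping.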

Check type of items in list are all same and perform task

I want to be able to create a function that takes a list, checks if every item in the list is of a certain type (one item at a time) and if so, perform the calculation. For this particular function, I want to calculate the product of a list of integers.
My function:
def multpoly(items):
    typeInt = []
    total = 1
    for i in list:
        if type(i) is int:
            total = total * i
        elif type(i) is str:
            typelist.append(i)
        elif type(i) is list:
            typelist.append(i)
    return total
    return listInt
items = [1,2,3,4,5]
stringitems = ["1","2","3"]
listitems = [[1,1],[2,2]]
print(multpoly(items))
print(multpoly(stringitems))
print(multpoly(listitems))
I would also like to be able to create functions to do the same, changing the list to a list of strings and joining them and changing the list to a list of lists and concatenating them.
This current function doesn't work. I receive an error - "'type' object is not iterable".
If anyone could suggest fixes or could explain what's going on that would be great! :)
You're trying to iterate list, but the argument is named items. Also, i would be an int, but it wouldn't actually be int itself; you'd want isinstance(i, int) or type(i) is int. Lastly, you can't add a str to an int(total); if the goal is to fail when any item is not an int, you need to handle that when the type check fails (otherwise you'll skip the item, but still report that the list was all integers). You probably want code more like this:
# This uses the Py3 style print function, accessible in Py2 if you include
from __future__ import print_function
# at the top of the file. If you want Py2 print, that's left as an exercise
class NotUniformType(TypeError): pass
def multpoly(items):
    total = 1
    for i in items:
        if not isinstance(i, int):
            raise NotUniformType("{!r} is not of type int".format(i))
        total *= i
    return total
try:
    print(multpoly(items), "Items in list are integers")
except NotUniformType as e:
    print("Items in list include non-integer types:", e)
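The question also asks for variants that join a list of strings and concatenate a list of lists. One possible way to cover all three with a single helper is sketched below; the function name combine_uniform and its parameters are made up for illustration, and NotUniformType is redefined here so the snippet stands alone.

```python
from functools import reduce
import operator


class NotUniformType(TypeError):
    pass


def combine_uniform(items, expected_type, combine, initial):
    """Check that every item has expected_type, then fold them with combine."""
    for item in items:
        if not isinstance(item, expected_type):
            raise NotUniformType(
                "{!r} is not of type {}".format(item, expected_type.__name__)
            )
    return reduce(combine, items, initial)


product = combine_uniform([1, 2, 3, 4, 5], int, operator.mul, 1)
joined = combine_uniform(["1", "2", "3"], str, operator.add, "")
merged = combine_uniform([[1, 1], [2, 2]], list, operator.add, [])
```

One caveat: bool is a subclass of int, so isinstance(True, int) is True and a list containing booleans would pass the int check. Use type(item) is expected_type instead if that matters.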

How to convert an int to a list?

I have some code that might return either a single int, or a list. When it returns an int, I need to convert it into a list that contains only the int.
I tried the following, but it doesn't work:
newValue = list(retValue)
Apparently I can't do list(some int) because ints are not iterable. Is there another way of doing this?
Thanks in advance!
Define your own function:
def mylist(x):
    if isinstance(x, (list, tuple)):
        return x
    else:
        return [x]
>>> mylist(5)
[5]
>>> mylist([10])
[10]
In Python, duck typing is preferable - don't test for a specific type, just test whether it supports the method you need ("I don't care if it's a duck, so long as it quacks").
def make_list(n):
    if hasattr(n, '__iter__'):
        return n
    else:
        return [n]
a = make_list([1,2,3]) # => [1,2,3]
b = make_list(4) # => [4]
Try to convert the variable to an int. If it is already an int this is a no-op. If it is a list then this raises a TypeError.
try:
    return [int(x)]
except TypeError:
    return x
However, using exceptions for flow control is generally frowned upon when the exceptional case is likely to occur, because handling an exception is comparatively slow.
The other way is to use the isinstance function.
if isinstance(x, list):
    return x
else:
    return [x]
listInt = intVal if isinstance(intVal,list) else [intVal]
This always gives you a list: intVal itself if it is already a list, otherwise a one-element list containing it.
Hope this helps
if isinstance(x,list): return x
else: return [x]
That's it.
Of course, this won't deal intelligently with other iterable types, but it's not clear that you want to treat all iterables as if they were lists (maybe you do, maybe you don't).
This is really just a variation on Hugh Bothwell's answer, but... if you want state-of-the-art duck typing, you can get the semantics of hasattr(rval, '__iter__') in a more attractive package with isinstance(rval, collections.abc.Iterable) (spelled collections.Iterable before Python 3.3; that alias was removed in Python 3.10). So...
import collections.abc
def wrap_in_iterable(x):
    if isinstance(x, collections.abc.Iterable):
        return x
    else:
        return [x]
Also, perhaps you want a list, and not just an iterable; to get list-like things but eliminate generator-like and dict-like things, collections.abc.Sequence is handy. (Just don't pass an infinite generator to this function.)
def convert_to_sequence(x):
    if isinstance(x, collections.abc.Sequence):
        return x
    elif isinstance(x, collections.abc.Iterable):
        return list(x)
    else:
        return [x]
These work because collections.abc.Iterable defines a __subclasshook__ that checks for an __iter__ method, while collections.abc.Sequence recognizes types that inherit from it or are registered with it, such as list, tuple, and str.
Finally, at the risk of being boring -- if you have control over the returning function, just return a one-item list if possible.
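If the returning function cannot be changed, the answers above could be folded into one helper along these lines. This is only a sketch: the name as_list is made up, and treating strings as single values rather than as sequences of characters is an assumption about what the caller wants (in Python 3 a str is both Iterable and a Sequence, so it needs its own branch).

```python
from collections.abc import Iterable


def as_list(value):
    """Return value as a list, wrapping scalars and strings in a new list."""
    if isinstance(value, (str, bytes)):
        # Assumption: a string is one value, not a sequence of characters.
        return [value]
    if isinstance(value, list):
        return value
    if isinstance(value, Iterable):
        # Tuples, sets, generators and so on are copied into a list.
        return list(value)
    return [value]


wrapped_int = as_list(5)
same_list = as_list([10])
from_tuple = as_list((1, 2))
wrapped_str = as_list("abc")
```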
