How do we test whether X quacks like a list/tuple?
By that I mean we can possibly subset it by 0 and 1 (etc), though it cannot be a string (which could also be subset).
I thought to test hasattr(X, '__iter__') and not isinstance(X, str), but that would mean a dictionary would still pass which I do not want it to. You could then also test it is not a dictionary, but I wouldn't be so sure about sublasses of dicts etc.
Is there a more official way to test whether something quacks like a list or tuple under this simple specification?
E.g. allowed input should be:
'emailaddr' --> ('emailaddr', 'emailaddr')
('emailaddr', 'alias')
['emailaddr', 'alias']
SequenceThing('emailaddr', 'alias')
Would you like to check if a value is a sequence?
An iterable which supports efficient element access using integer indices via the getitem() special method and defines a len() method that returns the length of the sequence. Some built-in sequence types are list, str, tuple, and bytes. Note that dict also supports getitem() and len(), but is considered a mapping rather than a sequence because the lookups use arbitrary immutable keys rather than integers.
Old way (deprecated in 2.7):
import operator
print operator.isSequenceType(x)
New way:
import collections
print isinstance(x, collections.Sequence)
They are not equivalent - the latter is more accurate but custom types need to be registered with it.
Also, strings are sequences according to both definitions. You might want to treat basestring specially.
Don't allow a bare string as an argument; it's a special case waiting to bite you. Instead, document that the input must be in the form of a tuple/list/whatever, but the second element is optional. That is,
['emailaddr', ]
('emailaddr', )
SequenceThing('emailaddr')
are legal, but not
'emailaddr'
This way, your function can assume that X[0] will be an email address, and X[1] may exist and be an alias, but it is the user's problem if they supply something else.
Duck typing means you just do want you want to do, and if it doesn't work it means you're not dealing with a duck.
If you specifically don't want to get strings, check for strings and raise an exception.
And then do whatever you want to do, no type checking.
A solution based on Kevin's useful comment:
>>> import collections
>>> isinstance([], collections.Sequence) and not isinstance([], str)
True
>>> isinstance((), collections.Sequence) and not isinstance((), str)
True
>>> isinstance('', collections.Sequence) and not isinstance('', str)
False
>>> isinstance({}, collections.Sequence) and not isinstance({}, str)
False
>>> isinstance(set(), collections.Sequence) and not isinstance(set(), str)
False
Mappings such as dicts and sets will not pass the first test. Strings will, but they will obviously fail on the second test.
Related
In Python, suppose one wants to test whether the variable x is a reference to a list object. Is there a difference between if type(x) is list: and if type(x) == list:? This is how I understand it. (Please correct me if I am wrong)
type(x) is list tests whether the expressions type(x) and list evaluate to the same object and type(x) == list tests whether the two objects have equivalent (in some sense) values.
type(x) == list should evaluate to True as long as x is a list. But can type(x) evaluate to a different object from what list refers to?
What exactly does the expression list evaluate to? (I am new to Python, coming from C++, and still can't quite wrap my head around the notion that types are also objects.) Does list point to somewhere in memory? What data live there?
The "one obvious way" to do it, that will preserve the spirit of "duck typing" is isinstance(x, list). Rather, in most cases, one's code won't be specific to a list, but could work with any sequence (or maybe it needs a mutable sequence). So the recomendation is actually:
from collections.abc import MutableSequence
...
if isinstance(x, MutableSequence):
...
Now, going into your specific questions:
What exactly does the expression list evaluate to? Does list point to somewhere in memory? What data live there?
list in Python points to a class. A class that can be inherited, extended, etc...and thanks to a design choice of Python, the syntax for creating an instance of a class is indistinguishable from calling a function.
So, when teaching Python to novices, one could just casually mention that list is a "function" (I prefer not, since it is straightout false - the generic term for both functions and classes in regards to that they can be "called" and will return a result is callable)
Being a class, list does live in a specific place in memory - the "where" does not make any difference when coding in Python - but yes, there is one single place in memory where a class, which in Python is also an object, an instance of type, exists as a data structure with pointers to the various methods that one can use in a Python list.
As for:
type(x) is list tests whether the expressions type(x) and list evaluate to the same object and type(x) == list tests whether the two objects have equivalent (in some sense) values.
That is correct: is is a special operator that unlike others cannot be overriden for any class and checks for object itentity - in the cPython implementation, it checks if both operands are at the same memory address (but keep in mind that that address, though visible through the built-in function id, behaves as if it is opaque from Python code). As for the "sense" in which objects are "equal" in Python: one can always override the behavior of the == operator for a given object, by creating the special named method __eq__ in its class. (The same is true for each other operator - the language data model lists all available "magic" methods).
For lists, the implemented default comparison automatically compares each element recursively (calling the .__eq__ method for each item pair in both lists, if they have the same size to start with, of course)
type(x) == list should evaluate to True as long as x is a list. But can type(x) evaluate to a different object from what list refers to?
Not if "x" is a list proper: type(x) will always evaluate to list. But == would fail if x were an instance of a subclass of list, or another Sequence implementation: that is why it is always better to compare classes using the builtins isinstance and issubclass.
is checks exact identity, and only works if there is exactly one and only one of the list type. Fortunately, that's true for the list type (unless you've done something silly), so it usually works.
== test standard equality, which is equivalent to is for most types including list
Taking those together, there is no effective difference between type(x) is list and type(x) == list, but the former is construction better describes what's happening under the hood.
Consider avoiding use of the type(x) is sometype construction in favor of the isinstance function instead, because it will work for inherited classes too:
x = [1, 2, 3]
isinstance(x, list) # True
class Y(list):
'''A class that inherits from list'''
...
y = Y([1, 2, 3])
isinstance(y, list) # True
type(y) is list # False
Better yet, if you really just want to see if something is list-like, then use isinstance with either typing or collections.abc like so:
import collections.abc
x = [1, 2, 3]
isinstance(x, collections.abc.Iterable) # True
isinstance(x, collections.abc.Sequence) # True
x = set([1, 2, 3])
isinstance(x, collections.abc.Iterable) # True
isinstance(x, collections.abc.Sequence) # False
Note that lists are both Iterable (usable in a for loop) and a Sequence (ordered). A set is Iterable but not a Sequence, because you can use it in a for loop, but it isn't in any particular order. Use Iterable when you want to use in a for loop, or Sequence if it's important that the list be in a certain order.
Final note: The dict type is Mapping, but also counts as Iterable since you can loop over it's keys in a for loop.
What can be the best to check if var is having str, unicode or None; but not anything else?
I tried with
if not isinstance(ads['number'], (str, unicode, None)):
....
but got below exception:
TypeError: isinstance() arg 2 must be a class, type, or tuple of classes and types
`
You have the right idea, but None is not a class. It's type is NoneType, or alternatively type(None). The latter works out of the box in both Python 2 and 3. The former requires an import: from types import NoneType and only works in Python 2.
you can use
if not isinstance(a, (str, unicode, type(None))):
....
and in python 2 (which you seem to be using) this also works:
from types import NoneType
if not isinstance(a, (str, unicode, NoneType)):
....
You can use type to find the method and check that in a list
Ex:
if type(None) in [str, unicode, type(None)]:
print "ok"
In complement to the other answers (which are correct), I would add a comment on the typology of cases where one would need such a test.
Distinguishing between a string and a scalar type
If the question is distinguishing between a string and a number, then asking for forgiveness instead of permission works just as well:
a = 1
try:
b = a.upper()
except AttributeError:
...
Since that's the mindset of Python, it often makes things simpler, and it also takes care of the case where a is None. If one is trying to prevent such an error, then it might be easier to let it instead happen and then catch it. It could also be more reliable, because it could also take care of cases one had not thought of.
Distinguishing between a string and other iterables
One case where it won't work, however, is when one wants to distinguish between a string and a list (several libraries do that with function arguments). The problem is that both a string and a list are iterables, so the string might happily run through the code for the list without any error... except that now you have your string cut into pieces of one character.
>>> for el in a: print(el.upper())
...
H
E
L
L
O
(And if the code fails, the error could be confusing.) To avoid that effect:
a = "hello"
if isinstance(a, str):
...
else:
...
In my experience, it's the only case where it's really advisable to use a test with isinstance(x, (str,...)). I am curious to know whether someone knows others?
I would like to perform an operation on an argument based on the fact that it might be a map-like object or a sequence-like object. I understand that no strategy is going to be 100% reliable for type-like checking, but I'm looking for a robust solution.
Based on this answer, I know how to determine whether something is a sequence and I can do this check after checking if the object is a map.
def ismap(arg):
# How to implement this?
def isseq(arg):
return hasattr(arg,"__iter__")
def operation(arg):
if ismap(arg):
# Do something with a dict-like object
elif isseq(arg):
# Do something with a sequence-like object
else:
# Do something else
Because a sequence can be seen as a map where keys are integers, should I just try to find a key that is not an integer? Or maybe I could look at the string representation? or...?
UPDATE
I selected SilentGhost's answer because it looks like the most "correct" one, but for my needs, here is the solution I ended up implementing:
if hasattr(arg, 'keys') and hasattr(arg, '__getitem__'):
# Do something with a map
elif hasattr(arg, '__iter__'):
# Do something with a sequence/iterable
else:
# Do something else
Essentially, I don't want to rely on an ABC because there are many custom classes that behave like sequences and dictionary but that still do not extend the python collections ABCs (see #Manoj comment). I thought the keys attribute (mentioned by someone who removed his/her answer) was a good enough check for mappings.
Classes extending the Sequence and Mapping ABCs will work with this solution as well.
>>> from collections import Mapping, Sequence
>>> isinstance('ac', Sequence)
True
>>> isinstance('ac', Mapping)
False
>>> isinstance({3:42}, Mapping)
True
>>> isinstance({3:42}, Sequence)
False
collections abstract base classes (ABCs)
Sequences have an __add__ method that implements the + operator. Maps do not have that method, since adding to a map requires both a key and a value, and the + operator only has one right-hand side.
So you may try:
def ismap(arg):
return isseq(arg) and not hasattr(arg, "__add__")
In python is there an easy way to tell if something is not a sequence? I tried to just do:
if x is not sequence but python did not like that
iter(x) will raise a TypeError if x cannot be iterated on -- but that check "accepts" sets and dictionaries, though it "rejects" other non-sequences such as None and numbers.
On the other hands, strings (which most applications want to consider "single items" rather than sequences) are in fact sequences (so, any test, unless specialcased for strings, is going to confirm that they are). So, such simple checks are often not sufficient.
In Python 2.6 and better, abstract base classes were introduced, and among other powerful features they offer more good, systematic support for such "category checking".
>>> import collections
>>> isinstance([], collections.Sequence)
True
>>> isinstance((), collections.Sequence)
True
>>> isinstance(23, collections.Sequence)
False
>>> isinstance('foo', collections.Sequence)
True
>>> isinstance({}, collections.Sequence)
False
>>> isinstance(set(), collections.Sequence)
False
You'll note strings are still considered "a sequence" (since they are), but at least you get dicts and sets out of the way. If you want to exclude strings from your concept of "being sequences", you could use collections.MutableSequence (but that also excludes tuples, which, like strings, are sequences, but are not mutable), or do it explicitly:
import collections
def issequenceforme(obj):
if isinstance(obj, basestring):
return False
return isinstance(obj, collections.Sequence)
Season to taste, and serve hot!-)
PS: For Python 3, use str instead of basestring, and for Python 3.3+: Abstract Base Classes like Sequence have moved to collections.abc.
For Python 3 and 2.6+, you can check if it's a subclass of collections.Sequence:
>>> import collections
>>> isinstance(myObject, collections.Sequence)
True
In Python 3.7 you must use collections.abc.Sequence (collections.Sequence will be removed in Python 3.8):
>>> import collections.abc
>>> isinstance(myObject, collections.abc.Sequence)
True
However, this won't work for duck-typed sequences which implement __len__() and __getitem__() but do not (as they should) subclass collections.Sequence. But it will work for all the built-in Python sequence types: lists, tuples, strings, etc.
While all sequences are iterables, not all iterables are sequences (for example, sets and dictionaries are iterable but not sequences). Checking hasattr(type(obj), '__iter__') will return True for dictionaries and sets.
Since Python "adheres" duck typing, one of the approach is to check if an object has some member (method).
A sequence has length, has sequence of items, and support slicing [doc]. So, it would be like this:
def is_sequence(obj):
t = type(obj)
return hasattr(t, '__len__') and hasattr(t, '__getitem__')
# additionally: and hasattr(t, '__setitem__') and hasattr(t, '__delitem__')
They are all special methods, __len__() should return number of items, __getitem__(i) should return an item (in sequence it is i-th item, but not with mapping), __getitem__(slice(start, stop, step)) should return subsequence, and __setitem__ and __delitem__ like you expect. This is such a contract, but whether the object really do these or not depends on whether the object adheres the contract or not.
Note that, the function above will also return True for mapping, e.g. dict, since mapping also has these methods. To overcome this, you can do a heavier work:
def is_sequence(obj):
try:
len(obj)
obj[0:0]
return True
except TypeError:
return False
But most of the time you don't need this, just do what you want as if the object is a sequence and catch an exception if you wish. This is more pythonic.
For the sake of completeness. There is a utility is_sequence in numpy library ("The fundamental package for scientific computing with Python").
>>> from numpy.distutils.misc_util import is_sequence
>>> is_sequence((2,3,4))
True
>>> is_sequence(45.9)
False
But it accepts sets as sequences and rejects strings
>>> is_sequence(set((1,2)))
True
>>> is_sequence("abc")
False
The code looks a bit like #adrian 's (See numpy git code), which is kind of shaky.
def is_sequence(seq):
if is_string(seq):
return False
try:
len(seq)
except Exception:
return False
return True
The Python 2.6.5 documentation describes the following sequence types: string, Unicode string, list, tuple, buffer, and xrange.
def isSequence(obj):
return type(obj) in [str, unicode, list, tuple, buffer, xrange]
why ask why
try getting a length and if exception return false
def haslength(seq):
try:
len(seq)
except:
return False
return True
Why are you doing this? The normal way here is to require a certain type of thing (A sequence or a number or a file-like object, etc.) and then use it without checking anything. In Python, we don't typically use classes to carry semantic information but simply use the methods defined (this is called "duck typing"). We also prefer APIs where we know exactly what to expect; use keyword arguments, preprocessing, or defining another function if you want to change how a function works.
Is there some simple method that can check if an input object to some function adheres to a specific structure? For example, I want only a dictionary of string keys and values that are a list of integers.
One method would be to write a recursive function that you pass in the object and you iterate over it, checking at each level it is what you expect. But I feel that there should be a more elegant way to do it than this in python.
Why would you expect Python to provide an "elegant way" to check types, since the whole idea of type-checking is so utterly alien to the Pythonic way of conceiving the world and interacting with it?! Normally in Python you'd use duck typing -- so "an integer" might equally well be an int, a long, a gmpy.mpz -- types with no relation to each other except they all implement the same core signature... just as "a dict" might be any implementation of mapping, and so forth.
The new-in-2.6-and-later concept of "abstract base classes" provides a more systematic way to implement and verify duck typing, and 3.0-and-later function annotations let you interface with such a checking system (third-party, since Python adopts no such system for the foreseeable future). For example, this recipe provides a 3.0-and-later way to perform "kinda but not quite" type checking based on function annotations -- though I doubt it goes anywhere as deep as you desire, but then, it's early times for function annotations, and most of us Pythonistas feel so little craving for such checking that we're unlikely to run flat out to implement such monumental systems in lieu of actually useful code;-).
Short answer, no, you have to create your own function.
Long answer: its not pythonic to do what you're asking. There might be some special cases (e.g, marshalling a dict to xmlrpc), but by and large, assume the objects will act like what they're documented to be. If they don't, let the AttributeError bubble up. If you are ok with coercing values, then use str() and int() to convert them. They could, afterall, implement __str__, __add__, etc that makes them not descendants of int/str, but still usable.
def dict_of_string_and_ints(obj):
assert isinstance(obj, dict)
for key, value in obj.iteritems(): # py2.4
assert isinstance(key, basestring)
assert isinstance(value, list)
assert sum(isinstance(x, int) for x in value) == len(list)
Since Python emphasizes things just working, your best bet is to just assert as you go and trust the users of your library to feed you proper data. Let exceptions happen, if you must; that's on your clients for not reading your docstring.
In your case, something like this:
def myfunction(arrrrrgs):
assert issubclass(dict, type(arrrrrgs)), "Need a dictionary!"
for key in arrrrrgs:
assert type(key) is str, "Need a string!"
val = arrrrrgs[key]
assert type(val) is list, "Need a list!"
And so forth.
Really, it isn't worth the effort, and let your program blow up if you express yourself clearly in the docstring, or throw well-placed exceptions to guide the late-night debugger.
I will take a shot and propose a helper function that can do something like that for you in a more generic+elegant way:
def check_type(value, type_def):
"""
This validates an object instanct <value> against a type template <type_def>
presented as a simplified object.
E.g.
if value is list of dictionaries that have string values as key and integers
as values:
>> check_type(value, [{'':0}])
if value is list of dictionaries, no restriction on key/values
>> check_type(value, [{}])
"""
if type(value) != type(type_def):
return False
if hasattr(value, '__iter__'):
if len(type_def) == 0:
return True
type_def_val = iter(type_def).next()
for key in value:
if not check_type(key, type_def_val):
return False
if type(value) is dict:
if not check_type(value.values(), type_def.values()):
return False
return True
The comment explains a sample of usage, but you can always go pretty deep, e.g.
>>> check_type({1:['a', 'b'], 2:['c', 'd']}, {0:['']})
True
>>> check_type({1:['a', 'b'], 2:['c', 3]}, {0:['']})
False
P.S. Feel free to modify it if you want one-by-one tuple validation (e.g. validation against ([], '', {0:0}) which is not handled as it is expected now)