Python: check if an object is a sequence - python

In python is there an easy way to tell if something is not a sequence? I tried to just do:
if x is not sequence but python did not like that

iter(x) will raise a TypeError if x cannot be iterated on -- but that check "accepts" sets and dictionaries, though it "rejects" other non-sequences such as None and numbers.
On the other hands, strings (which most applications want to consider "single items" rather than sequences) are in fact sequences (so, any test, unless specialcased for strings, is going to confirm that they are). So, such simple checks are often not sufficient.
In Python 2.6 and better, abstract base classes were introduced, and among other powerful features they offer more good, systematic support for such "category checking".
>>> import collections
>>> isinstance([], collections.Sequence)
True
>>> isinstance((), collections.Sequence)
True
>>> isinstance(23, collections.Sequence)
False
>>> isinstance('foo', collections.Sequence)
True
>>> isinstance({}, collections.Sequence)
False
>>> isinstance(set(), collections.Sequence)
False
You'll note strings are still considered "a sequence" (since they are), but at least you get dicts and sets out of the way. If you want to exclude strings from your concept of "being sequences", you could use collections.MutableSequence (but that also excludes tuples, which, like strings, are sequences, but are not mutable), or do it explicitly:
import collections
def issequenceforme(obj):
if isinstance(obj, basestring):
return False
return isinstance(obj, collections.Sequence)
Season to taste, and serve hot!-)
PS: For Python 3, use str instead of basestring, and for Python 3.3+: Abstract Base Classes like Sequence have moved to collections.abc.

For Python 3 and 2.6+, you can check if it's a subclass of collections.Sequence:
>>> import collections
>>> isinstance(myObject, collections.Sequence)
True
In Python 3.7 you must use collections.abc.Sequence (collections.Sequence will be removed in Python 3.8):
>>> import collections.abc
>>> isinstance(myObject, collections.abc.Sequence)
True
However, this won't work for duck-typed sequences which implement __len__() and __getitem__() but do not (as they should) subclass collections.Sequence. But it will work for all the built-in Python sequence types: lists, tuples, strings, etc.
While all sequences are iterables, not all iterables are sequences (for example, sets and dictionaries are iterable but not sequences). Checking hasattr(type(obj), '__iter__') will return True for dictionaries and sets.

Since Python "adheres" duck typing, one of the approach is to check if an object has some member (method).
A sequence has length, has sequence of items, and support slicing [doc]. So, it would be like this:
def is_sequence(obj):
t = type(obj)
return hasattr(t, '__len__') and hasattr(t, '__getitem__')
# additionally: and hasattr(t, '__setitem__') and hasattr(t, '__delitem__')
They are all special methods, __len__() should return number of items, __getitem__(i) should return an item (in sequence it is i-th item, but not with mapping), __getitem__(slice(start, stop, step)) should return subsequence, and __setitem__ and __delitem__ like you expect. This is such a contract, but whether the object really do these or not depends on whether the object adheres the contract or not.
Note that, the function above will also return True for mapping, e.g. dict, since mapping also has these methods. To overcome this, you can do a heavier work:
def is_sequence(obj):
try:
len(obj)
obj[0:0]
return True
except TypeError:
return False
But most of the time you don't need this, just do what you want as if the object is a sequence and catch an exception if you wish. This is more pythonic.

For the sake of completeness. There is a utility is_sequence in numpy library ("The fundamental package for scientific computing with Python").
>>> from numpy.distutils.misc_util import is_sequence
>>> is_sequence((2,3,4))
True
>>> is_sequence(45.9)
False
But it accepts sets as sequences and rejects strings
>>> is_sequence(set((1,2)))
True
>>> is_sequence("abc")
False
The code looks a bit like #adrian 's (See numpy git code), which is kind of shaky.
def is_sequence(seq):
if is_string(seq):
return False
try:
len(seq)
except Exception:
return False
return True

The Python 2.6.5 documentation describes the following sequence types: string, Unicode string, list, tuple, buffer, and xrange.
def isSequence(obj):
return type(obj) in [str, unicode, list, tuple, buffer, xrange]

why ask why
try getting a length and if exception return false
def haslength(seq):
try:
len(seq)
except:
return False
return True

Why are you doing this? The normal way here is to require a certain type of thing (A sequence or a number or a file-like object, etc.) and then use it without checking anything. In Python, we don't typically use classes to carry semantic information but simply use the methods defined (this is called "duck typing"). We also prefer APIs where we know exactly what to expect; use keyword arguments, preprocessing, or defining another function if you want to change how a function works.

Related

type(x) is list vs type(x) == list

In Python, suppose one wants to test whether the variable x is a reference to a list object. Is there a difference between if type(x) is list: and if type(x) == list:? This is how I understand it. (Please correct me if I am wrong)
type(x) is list tests whether the expressions type(x) and list evaluate to the same object and type(x) == list tests whether the two objects have equivalent (in some sense) values.
type(x) == list should evaluate to True as long as x is a list. But can type(x) evaluate to a different object from what list refers to?
What exactly does the expression list evaluate to? (I am new to Python, coming from C++, and still can't quite wrap my head around the notion that types are also objects.) Does list point to somewhere in memory? What data live there?
The "one obvious way" to do it, that will preserve the spirit of "duck typing" is isinstance(x, list). Rather, in most cases, one's code won't be specific to a list, but could work with any sequence (or maybe it needs a mutable sequence). So the recomendation is actually:
from collections.abc import MutableSequence
...
if isinstance(x, MutableSequence):
...
Now, going into your specific questions:
What exactly does the expression list evaluate to? Does list point to somewhere in memory? What data live there?
list in Python points to a class. A class that can be inherited, extended, etc...and thanks to a design choice of Python, the syntax for creating an instance of a class is indistinguishable from calling a function.
So, when teaching Python to novices, one could just casually mention that list is a "function" (I prefer not, since it is straightout false - the generic term for both functions and classes in regards to that they can be "called" and will return a result is callable)
Being a class, list does live in a specific place in memory - the "where" does not make any difference when coding in Python - but yes, there is one single place in memory where a class, which in Python is also an object, an instance of type, exists as a data structure with pointers to the various methods that one can use in a Python list.
As for:
type(x) is list tests whether the expressions type(x) and list evaluate to the same object and type(x) == list tests whether the two objects have equivalent (in some sense) values.
That is correct: is is a special operator that unlike others cannot be overriden for any class and checks for object itentity - in the cPython implementation, it checks if both operands are at the same memory address (but keep in mind that that address, though visible through the built-in function id, behaves as if it is opaque from Python code). As for the "sense" in which objects are "equal" in Python: one can always override the behavior of the == operator for a given object, by creating the special named method __eq__ in its class. (The same is true for each other operator - the language data model lists all available "magic" methods).
For lists, the implemented default comparison automatically compares each element recursively (calling the .__eq__ method for each item pair in both lists, if they have the same size to start with, of course)
type(x) == list should evaluate to True as long as x is a list. But can type(x) evaluate to a different object from what list refers to?
Not if "x" is a list proper: type(x) will always evaluate to list. But == would fail if x were an instance of a subclass of list, or another Sequence implementation: that is why it is always better to compare classes using the builtins isinstance and issubclass.
is checks exact identity, and only works if there is exactly one and only one of the list type. Fortunately, that's true for the list type (unless you've done something silly), so it usually works.
== test standard equality, which is equivalent to is for most types including list
Taking those together, there is no effective difference between type(x) is list and type(x) == list, but the former is construction better describes what's happening under the hood.
Consider avoiding use of the type(x) is sometype construction in favor of the isinstance function instead, because it will work for inherited classes too:
x = [1, 2, 3]
isinstance(x, list) # True
class Y(list):
'''A class that inherits from list'''
...
y = Y([1, 2, 3])
isinstance(y, list) # True
type(y) is list # False
Better yet, if you really just want to see if something is list-like, then use isinstance with either typing or collections.abc like so:
import collections.abc
x = [1, 2, 3]
isinstance(x, collections.abc.Iterable) # True
isinstance(x, collections.abc.Sequence) # True
x = set([1, 2, 3])
isinstance(x, collections.abc.Iterable) # True
isinstance(x, collections.abc.Sequence) # False
Note that lists are both Iterable (usable in a for loop) and a Sequence (ordered). A set is Iterable but not a Sequence, because you can use it in a for loop, but it isn't in any particular order. Use Iterable when you want to use in a for loop, or Sequence if it's important that the list be in a certain order.
Final note: The dict type is Mapping, but also counts as Iterable since you can loop over it's keys in a for loop.

Questions about python dictionary equality [duplicate]

Curiously:
>>> a = 123
>>> b = 123
>>> a is b
True
>>> a = 123.
>>> b = 123.
>>> a is b
False
Seems a is b being more or less defined as id(a) == id(b). It is easy to make bugs this way:
basename, ext = os.path.splitext(fname)
if ext is '.mp3':
# do something
else:
# do something else
Some fnames unexpectedly ended up in the else block. The fix is simple, we should use ext == '.mp3' instead, but nonetheless if ext is '.mp3' on the surface seems like a nice pythonic way to write this and it's more readable than the "correct" way.
Since strings are immutable, what are the technical details of why it's wrong? When is an identity check better, and when is an equality check better?
They are fundamentally different.
== compares by calling the __eq__ method
is returns true if and only if the two references are to the same object
So in comparision with say Java:
is is the same as == for objects
== is the same as equals for objects
As far as I can tell, is checks for object identity equivalence. As there's no compulsory "string interning", two strings that just happen to have the same characters in sequence are, typically, not the same string object.
When you extract a substring from a string (or, really, any subsequence from a sequence), you will end up with two different objects, containing the same value(s).
So, use is when and only when you are comparing object identities. Use == when comparing values.
Simple rule for determining if to use is or == in Python
Here is an easy rule (unless you want to go to theory in Python interpreter or building frameworks doing funny things with Python objects):
Use is only for None comparison.
if foo is None
Otherwise use ==.
if x == 3
Then you are on the safe side. The rationale for this is already explained int the above comments. Don't use is if you are not 100% sure why to do it.
It would be also useful to define a class like this to be used as the default value for constants used in your API. In this case, it would be more correct to use is than the == operator.
class Sentinel(object):
"""A constant object that does not change even when copied."""
def __deepcopy__(self, memo):
# Always return the same object because this is essentially a constant.
return self
def __copy__(self):
# called via copy.copy(x)
return self
You should be warned by PyCharm when you use is with a literal with a warning such as SyntaxWarning: "is" with a literal. Did you mean "=="?. So, when comparing with a literal, always use ==. Otherwise, you may prefer using is in order to compare objects through their references.

How to test whether X quacks like a list/tuple

How do we test whether X quacks like a list/tuple?
By that I mean we can possibly subset it by 0 and 1 (etc), though it cannot be a string (which could also be subset).
I thought to test hasattr(X, '__iter__') and not isinstance(X, str), but that would mean a dictionary would still pass which I do not want it to. You could then also test it is not a dictionary, but I wouldn't be so sure about sublasses of dicts etc.
Is there a more official way to test whether something quacks like a list or tuple under this simple specification?
E.g. allowed input should be:
'emailaddr' --> ('emailaddr', 'emailaddr')
('emailaddr', 'alias')
['emailaddr', 'alias']
SequenceThing('emailaddr', 'alias')
Would you like to check if a value is a sequence?
An iterable which supports efficient element access using integer indices via the getitem() special method and defines a len() method that returns the length of the sequence. Some built-in sequence types are list, str, tuple, and bytes. Note that dict also supports getitem() and len(), but is considered a mapping rather than a sequence because the lookups use arbitrary immutable keys rather than integers.
Old way (deprecated in 2.7):
import operator
print operator.isSequenceType(x)
New way:
import collections
print isinstance(x, collections.Sequence)
They are not equivalent - the latter is more accurate but custom types need to be registered with it.
Also, strings are sequences according to both definitions. You might want to treat basestring specially.
Don't allow a bare string as an argument; it's a special case waiting to bite you. Instead, document that the input must be in the form of a tuple/list/whatever, but the second element is optional. That is,
['emailaddr', ]
('emailaddr', )
SequenceThing('emailaddr')
are legal, but not
'emailaddr'
This way, your function can assume that X[0] will be an email address, and X[1] may exist and be an alias, but it is the user's problem if they supply something else.
Duck typing means you just do want you want to do, and if it doesn't work it means you're not dealing with a duck.
If you specifically don't want to get strings, check for strings and raise an exception.
And then do whatever you want to do, no type checking.
A solution based on Kevin's useful comment:
>>> import collections
>>> isinstance([], collections.Sequence) and not isinstance([], str)
True
>>> isinstance((), collections.Sequence) and not isinstance((), str)
True
>>> isinstance('', collections.Sequence) and not isinstance('', str)
False
>>> isinstance({}, collections.Sequence) and not isinstance({}, str)
False
>>> isinstance(set(), collections.Sequence) and not isinstance(set(), str)
False
Mappings such as dicts and sets will not pass the first test. Strings will, but they will obviously fail on the second test.

How to distinguish between a sequence and a mapping

I would like to perform an operation on an argument based on the fact that it might be a map-like object or a sequence-like object. I understand that no strategy is going to be 100% reliable for type-like checking, but I'm looking for a robust solution.
Based on this answer, I know how to determine whether something is a sequence and I can do this check after checking if the object is a map.
def ismap(arg):
# How to implement this?
def isseq(arg):
return hasattr(arg,"__iter__")
def operation(arg):
if ismap(arg):
# Do something with a dict-like object
elif isseq(arg):
# Do something with a sequence-like object
else:
# Do something else
Because a sequence can be seen as a map where keys are integers, should I just try to find a key that is not an integer? Or maybe I could look at the string representation? or...?
UPDATE
I selected SilentGhost's answer because it looks like the most "correct" one, but for my needs, here is the solution I ended up implementing:
if hasattr(arg, 'keys') and hasattr(arg, '__getitem__'):
# Do something with a map
elif hasattr(arg, '__iter__'):
# Do something with a sequence/iterable
else:
# Do something else
Essentially, I don't want to rely on an ABC because there are many custom classes that behave like sequences and dictionary but that still do not extend the python collections ABCs (see #Manoj comment). I thought the keys attribute (mentioned by someone who removed his/her answer) was a good enough check for mappings.
Classes extending the Sequence and Mapping ABCs will work with this solution as well.
>>> from collections import Mapping, Sequence
>>> isinstance('ac', Sequence)
True
>>> isinstance('ac', Mapping)
False
>>> isinstance({3:42}, Mapping)
True
>>> isinstance({3:42}, Sequence)
False
collections abstract base classes (ABCs)
Sequences have an __add__ method that implements the + operator. Maps do not have that method, since adding to a map requires both a key and a value, and the + operator only has one right-hand side.
So you may try:
def ismap(arg):
return isseq(arg) and not hasattr(arg, "__add__")

Verifying that an object in python adheres to a specific structure

Is there some simple method that can check if an input object to some function adheres to a specific structure? For example, I want only a dictionary of string keys and values that are a list of integers.
One method would be to write a recursive function that you pass in the object and you iterate over it, checking at each level it is what you expect. But I feel that there should be a more elegant way to do it than this in python.
Why would you expect Python to provide an "elegant way" to check types, since the whole idea of type-checking is so utterly alien to the Pythonic way of conceiving the world and interacting with it?! Normally in Python you'd use duck typing -- so "an integer" might equally well be an int, a long, a gmpy.mpz -- types with no relation to each other except they all implement the same core signature... just as "a dict" might be any implementation of mapping, and so forth.
The new-in-2.6-and-later concept of "abstract base classes" provides a more systematic way to implement and verify duck typing, and 3.0-and-later function annotations let you interface with such a checking system (third-party, since Python adopts no such system for the foreseeable future). For example, this recipe provides a 3.0-and-later way to perform "kinda but not quite" type checking based on function annotations -- though I doubt it goes anywhere as deep as you desire, but then, it's early times for function annotations, and most of us Pythonistas feel so little craving for such checking that we're unlikely to run flat out to implement such monumental systems in lieu of actually useful code;-).
Short answer, no, you have to create your own function.
Long answer: its not pythonic to do what you're asking. There might be some special cases (e.g, marshalling a dict to xmlrpc), but by and large, assume the objects will act like what they're documented to be. If they don't, let the AttributeError bubble up. If you are ok with coercing values, then use str() and int() to convert them. They could, afterall, implement __str__, __add__, etc that makes them not descendants of int/str, but still usable.
def dict_of_string_and_ints(obj):
assert isinstance(obj, dict)
for key, value in obj.iteritems(): # py2.4
assert isinstance(key, basestring)
assert isinstance(value, list)
assert sum(isinstance(x, int) for x in value) == len(list)
Since Python emphasizes things just working, your best bet is to just assert as you go and trust the users of your library to feed you proper data. Let exceptions happen, if you must; that's on your clients for not reading your docstring.
In your case, something like this:
def myfunction(arrrrrgs):
assert issubclass(dict, type(arrrrrgs)), "Need a dictionary!"
for key in arrrrrgs:
assert type(key) is str, "Need a string!"
val = arrrrrgs[key]
assert type(val) is list, "Need a list!"
And so forth.
Really, it isn't worth the effort, and let your program blow up if you express yourself clearly in the docstring, or throw well-placed exceptions to guide the late-night debugger.
I will take a shot and propose a helper function that can do something like that for you in a more generic+elegant way:
def check_type(value, type_def):
"""
This validates an object instanct <value> against a type template <type_def>
presented as a simplified object.
E.g.
if value is list of dictionaries that have string values as key and integers
as values:
>> check_type(value, [{'':0}])
if value is list of dictionaries, no restriction on key/values
>> check_type(value, [{}])
"""
if type(value) != type(type_def):
return False
if hasattr(value, '__iter__'):
if len(type_def) == 0:
return True
type_def_val = iter(type_def).next()
for key in value:
if not check_type(key, type_def_val):
return False
if type(value) is dict:
if not check_type(value.values(), type_def.values()):
return False
return True
The comment explains a sample of usage, but you can always go pretty deep, e.g.
>>> check_type({1:['a', 'b'], 2:['c', 'd']}, {0:['']})
True
>>> check_type({1:['a', 'b'], 2:['c', 3]}, {0:['']})
False
P.S. Feel free to modify it if you want one-by-one tuple validation (e.g. validation against ([], '', {0:0}) which is not handled as it is expected now)

Categories

Resources