How to distinguish between a sequence and a mapping - python

I would like to perform an operation on an argument based on the fact that it might be a map-like object or a sequence-like object. I understand that no strategy is going to be 100% reliable for type-like checking, but I'm looking for a robust solution.
Based on this answer, I know how to determine whether something is a sequence and I can do this check after checking if the object is a map.
def ismap(arg):
    # How to implement this?
def isseq(arg):
    return hasattr(arg, "__iter__")
def operation(arg):
    if ismap(arg):
        # Do something with a dict-like object
    elif isseq(arg):
        # Do something with a sequence-like object
    else:
        # Do something else
Because a sequence can be seen as a map where keys are integers, should I just try to find a key that is not an integer? Or maybe I could look at the string representation? or...?
UPDATE
I selected SilentGhost's answer because it looks like the most "correct" one, but for my needs, here is the solution I ended up implementing:
if hasattr(arg, 'keys') and hasattr(arg, '__getitem__'):
    # Do something with a map
elif hasattr(arg, '__iter__'):
    # Do something with a sequence/iterable
else:
    # Do something else
Essentially, I don't want to rely on an ABC because there are many custom classes that behave like sequences and dictionaries but still do not extend the Python collections ABCs (see @Manoj's comment). I thought the keys attribute (mentioned by someone who removed his/her answer) was a good enough check for mappings.
Classes extending the Sequence and Mapping ABCs will work with this solution as well.
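A minimal sketch of the attribute-based check above, wrapped in a function so it can be tried out. It assumes Python 3; the names classify and Bag and the returned labels are made up for illustration and are not part of the original post. Note that in Python 3 a str has __iter__, so strings land in the sequence/iterable branch unless you handle them separately.

```python
def classify(arg):
    """Return 'map', 'seq' or 'other' using attribute checks only."""
    if hasattr(arg, 'keys') and hasattr(arg, '__getitem__'):
        return 'map'    # dict-like: exposes keys() and item lookup
    elif hasattr(arg, '__iter__'):
        return 'seq'    # any iterable, including sets and (in Python 3) str
    else:
        return 'other'


class Bag:
    """Hypothetical mapping-like class that does not extend any ABC."""

    def __init__(self):
        self._data = {'a': 1}

    def keys(self):
        return self._data.keys()

    def __getitem__(self, key):
        return self._data[key]


print(classify({'a': 1}), classify(Bag()), classify([1, 2]), classify(42))
```

The Bag class is recognised as a map even though isinstance(Bag(), Mapping) would be False, which is the trade-off described above.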

>>> from collections import Mapping, Sequence
>>> isinstance('ac', Sequence)
True
>>> isinstance('ac', Mapping)
False
>>> isinstance({3:42}, Mapping)
True
>>> isinstance({3:42}, Sequence)
False
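These checks rely on the collections abstract base classes (ABCs). In Python 3.3+ they live in collections.abc, so import Mapping and Sequence from there.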

Sequences have an __add__ method that implements the + operator. Maps do not have that method, since adding to a map requires both a key and a value, and the + operator only has one right-hand side.
So you may try:
def ismap(arg):
    return isseq(arg) and not hasattr(arg, "__add__")
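Before relying on this heuristic, it may help to see where it misfires. The sketch below (Python 3 assumed, with isseq taken from the question) shows that some iterables that are not mappings also lack __add__, so they would be reported as maps.

```python
def isseq(arg):
    return hasattr(arg, "__iter__")


def ismap(arg):
    return isseq(arg) and not hasattr(arg, "__add__")


print(ismap({'a': 1}))   # dict: iterable, no __add__
print(ismap([1, 2]))     # list: has __add__, so not reported as a map
print(ismap({1, 2}))     # set: iterable without __add__, yet not a mapping
print(ismap(range(3)))   # range: a sequence that does not support +
```

Sets and range objects pass the test even though neither is a mapping, so the check is only safe when the inputs are known to be lists, tuples, strings or dicts.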

Related

Questions about python dictionary equality [duplicate]

Curiously:
>>> a = 123
>>> b = 123
>>> a is b
True
>>> a = 123.
>>> b = 123.
>>> a is b
False
It seems that a is b is more or less defined as id(a) == id(b). It is easy to introduce bugs this way:
basename, ext = os.path.splitext(fname)
if ext is '.mp3':
    # do something
else:
    # do something else
Some fnames unexpectedly ended up in the else block. The fix is simple, we should use ext == '.mp3' instead, but nonetheless if ext is '.mp3' on the surface seems like a nice pythonic way to write this and it's more readable than the "correct" way.
Since strings are immutable, what are the technical details of why it's wrong? When is an identity check better, and when is an equality check better?
They are fundamentally different.
== compares by calling the __eq__ method
is returns true if and only if the two references are to the same object
So, in comparison with, say, Java:
is is the same as == for objects
== is the same as equals for objects
As far as I can tell, is checks for object identity equivalence. As there's no compulsory "string interning", two strings that just happen to have the same characters in sequence are, typically, not the same string object.
When you extract a substring from a string (or, really, any subsequence from a sequence), you will end up with two different objects, containing the same value(s).
So, use is when and only when you are comparing object identities. Use == when comparing values.
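A small sketch of the difference, using lists because two separately built lists are guaranteed to be distinct objects (whether two equal strings share one object depends on interning, so that case is not asserted here). The file name is a made-up example.

```python
import os.path

a = [1, 2, 3]
b = list(a)           # a new list with the same contents

print(a == b)         # the values are equal
print(a is b)         # but these are two distinct objects

fname = 'song.mp3'    # hypothetical file name
basename, ext = os.path.splitext(fname)
if ext == '.mp3':     # compare the value, not the identity
    kind = 'audio'
else:
    kind = 'unknown'
print(kind)
```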
Simple rule for determining if to use is or == in Python
Here is an easy rule (unless you want to dig into the theory of the Python interpreter or are building frameworks that do funny things with Python objects):
Use is only for None comparison.
if foo is None
Otherwise use ==.
if x == 3
Then you are on the safe side. The rationale for this is already explained in the comments above. Don't use is unless you are 100% sure why you need it.
It would be also useful to define a class like this to be used as the default value for constants used in your API. In this case, it would be more correct to use is than the == operator.
class Sentinel(object):
    """A constant object that does not change even when copied."""
    def __deepcopy__(self, memo):
        # Always return the same object because this is essentially a constant.
        return self
    def __copy__(self):
        # called via copy.copy(x)
        return self
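As a rough illustration of how such a sentinel might be used as a default value, here is a sketch in which MISSING and lookup are hypothetical names, not part of the answer. The identity check lets None remain a legitimate caller-supplied default.

```python
import copy


class Sentinel(object):
    """A constant object that does not change even when copied."""

    def __deepcopy__(self, memo):
        return self

    def __copy__(self):
        return self


MISSING = Sentinel()   # hypothetical module-level constant


def lookup(table, key, default=MISSING):
    """Return table[key]; fall back to default only if one was given."""
    if key in table:
        return table[key]
    if default is MISSING:   # identity check: None stays a valid default
        raise KeyError(key)
    return default


print(lookup({'a': 1}, 'b', default=None))
print(copy.deepcopy(MISSING) is MISSING)
```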
Python 3.8 and later (and IDEs such as PyCharm) warn you when you use is with a literal, with a message such as SyntaxWarning: "is" with a literal. Did you mean "=="?. So, when comparing with a literal, always use ==. Otherwise, you may prefer using is in order to compare objects through their references.

How to test whether X quacks like a list/tuple

How do we test whether X quacks like a list/tuple?
By that I mean we can possibly subset it by 0 and 1 (etc), though it cannot be a string (which could also be subset).
I thought to test hasattr(X, '__iter__') and not isinstance(X, str), but then a dictionary would still pass, which I do not want. You could also test that it is not a dictionary, but I wouldn't be so sure about subclasses of dict etc.
Is there a more official way to test whether something quacks like a list or tuple under this simple specification?
E.g. allowed input should be:
'emailaddr' --> ('emailaddr', 'emailaddr')
('emailaddr', 'alias')
['emailaddr', 'alias']
SequenceThing('emailaddr', 'alias')
Would you like to check if a value is a sequence?
An iterable which supports efficient element access using integer indices via the __getitem__() special method and defines a __len__() method that returns the length of the sequence. Some built-in sequence types are list, str, tuple, and bytes. Note that dict also supports __getitem__() and __len__(), but is considered a mapping rather than a sequence because the lookups use arbitrary immutable keys rather than integers.
Old way (deprecated in 2.7):
import operator
print operator.isSequenceType(x)
New way:
import collections
print isinstance(x, collections.Sequence)
They are not equivalent - the latter is more accurate but custom types need to be registered with it.
Also, strings are sequences according to both definitions. You might want to treat basestring specially.
Don't allow a bare string as an argument; it's a special case waiting to bite you. Instead, document that the input must be in the form of a tuple/list/whatever, but the second element is optional. That is,
['emailaddr', ]
('emailaddr', )
SequenceThing('emailaddr')
are legal, but not
'emailaddr'
This way, your function can assume that X[0] will be an email address, and X[1] may exist and be an alias, but it is the user's problem if they supply something else.
Duck typing means you just do what you want to do, and if it doesn't work, it means you're not dealing with a duck.
If you specifically don't want to get strings, check for strings and raise an exception.
And then do whatever you want to do, no type checking.
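One way this advice could look in code, as a sketch only: the function name parse_recipient and the error message are assumptions, and Python 3 is assumed (so the string check uses str).

```python
def parse_recipient(x):
    """Return (address, alias); the alias defaults to the address."""
    if isinstance(x, str):
        raise TypeError(
            "expected a sequence such as ('addr', 'alias'), not a bare string")
    address = x[0]                            # fails naturally if x is not indexable
    alias = x[1] if len(x) > 1 else address   # the second element is optional
    return address, alias


print(parse_recipient(('emailaddr', 'alias')))
print(parse_recipient(['emailaddr']))
```

Anything that supports indexing and len() is accepted without further type checks; only the bare string is rejected up front.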
A solution based on Kevin's useful comment:
>>> import collections
>>> isinstance([], collections.Sequence) and not isinstance([], str)
True
>>> isinstance((), collections.Sequence) and not isinstance((), str)
True
>>> isinstance('', collections.Sequence) and not isinstance('', str)
False
>>> isinstance({}, collections.Sequence) and not isinstance({}, str)
False
>>> isinstance(set(), collections.Sequence) and not isinstance(set(), str)
False
Dicts and sets will not pass the first test. Strings will, but they fail the second test.

Modifying the immutable class instance

I've got a somewhat complex data-type Mask that I'd like to be able to use fast identity checking for such as:
seen = set()
m = Mask(...)
if m in seen:
...
This suggests that Mask should be hashable and therefore immutable. However, I'd like to generate variants of m and Mask seems like the place to encapsulate the variation logic. Here is a minimalish example that demonstrates what I want to accomplish:
class Mask(object):
    def __init__(self, seq):
        self.seq = seq
        self.hash = reduce(lambda x, y: x ^ y, self.seq)
    # __hash__ and __cmp__ satisfy the hashable contract (§3.4.1)
    def __hash__(self):
        return self.hash
    def __cmp__(self, other):
        return cmp(self.seq, other.seq)
    def complement(self):
        # cannot modify self without violating immutability requirement
        # so return a new instance instead
        return Mask([-x for x in self.seq])
This satisfies all of the hashable and immutable properties. The peculiarity is having what is effectively a Factory method complement; is this a reasonable way to implement the desired function? Note that I am not at all interested in protecting against "malicious" modification as many related questions on SO are looking to achieve.
As this example is intentionally small it could be trivially solved by making a tuple of seq. The type that I am actually working with does not afford such simple casting.
Yes, this is pretty much how you write an immutable class; methods that would otherwise change an object's state become, in your terms, "factories" that create new instances.
Note that in the specific case of computing a complement, you can name the method __invert__ and use it as
inv_mask = ~mask
Operator syntax is a strong signal to client code that operations return new values of the same type.
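A Python 3 sketch of the same idea, assuming the mask elements are integers (needed for the XOR-based hash). It swaps __cmp__ for __eq__, since Python 3 no longer uses __cmp__, and stores the data as a tuple. Those choices are illustrative, not the asker's actual type.

```python
from functools import reduce
from operator import xor


class Mask:
    def __init__(self, seq):
        self.seq = tuple(seq)                   # tuple: contents can't be changed in place
        self._hash = reduce(xor, self.seq, 0)   # computed once, as in the question

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        if not isinstance(other, Mask):
            return NotImplemented
        return self.seq == other.seq

    def __invert__(self):
        # Return a new instance instead of modifying self.
        return Mask(-x for x in self.seq)


m = Mask([1, 2, 3])
seen = {m}
print(~m in seen)    # the complement is a different value
print(~~m in seen)   # complementing twice gives an equal mask
```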
When using immutable types, all 'variation' methods must return new objects/instances, and thus be factories in that sense.
Many languages make strings immutable (including Python). So all string operations return new strings; strB = strA.replace(...) etc.
If you could change the instance at all, it wouldn't be immutable.
Edit:
Rereading, you're asking if this is reasonable, and I say yes. The logic would not be better put somewhere else, and as I pointed out with string immutability it is a common paradigm to get new variations from existing ones.
Python doesn't enforce immutability. It is up to you to make sure that objects are not modified while they are in a set. Beyond that, you don't need immutability to use sets or dictionaries in Python.

Python: check if an object is a sequence

In Python, is there an easy way to tell if something is not a sequence? I tried to just do:
if x is not sequence
but Python did not like that.
iter(x) will raise a TypeError if x cannot be iterated on -- but that check "accepts" sets and dictionaries, though it "rejects" other non-sequences such as None and numbers.
On the other hand, strings (which most applications want to consider "single items" rather than sequences) are in fact sequences (so any test, unless special-cased for strings, is going to confirm that they are). So, such simple checks are often not sufficient.
In Python 2.6 and better, abstract base classes were introduced, and among other powerful features they offer better, more systematic support for such "category checking".
>>> import collections
>>> isinstance([], collections.Sequence)
True
>>> isinstance((), collections.Sequence)
True
>>> isinstance(23, collections.Sequence)
False
>>> isinstance('foo', collections.Sequence)
True
>>> isinstance({}, collections.Sequence)
False
>>> isinstance(set(), collections.Sequence)
False
You'll note strings are still considered "a sequence" (since they are), but at least you get dicts and sets out of the way. If you want to exclude strings from your concept of "being sequences", you could use collections.MutableSequence (but that also excludes tuples, which, like strings, are sequences, but are not mutable), or do it explicitly:
import collections
def issequenceforme(obj):
    if isinstance(obj, basestring):
        return False
    return isinstance(obj, collections.Sequence)
Season to taste, and serve hot!-)
PS: For Python 3, use str instead of basestring, and for Python 3.3+: Abstract Base Classes like Sequence have moved to collections.abc.
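Following that PS, a Python 3 version might look like the sketch below. Treating bytes as string-like alongside str is an assumption added here; drop it if byte strings should count as sequences for you.

```python
from collections.abc import Sequence


def issequenceforme(obj):
    # str (and, by assumption, bytes) are treated as single items.
    if isinstance(obj, (str, bytes)):
        return False
    return isinstance(obj, Sequence)


for candidate in ([], (), range(3), 'foo', b'foo', {}, set(), 23):
    print(repr(candidate), issequenceforme(candidate))
```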
For Python 2.6+ and Python 3 up to 3.9, you can check whether it is an instance of collections.Sequence:
>>> import collections
>>> isinstance(myObject, collections.Sequence)
True
In Python 3.3 and later, use collections.abc.Sequence instead; the collections.Sequence alias was deprecated and removed in Python 3.10:
>>> import collections.abc
>>> isinstance(myObject, collections.abc.Sequence)
True
However, this won't work for duck-typed sequences which implement __len__() and __getitem__() but do not (as they should) subclass collections.Sequence. But it will work for all the built-in Python sequence types: lists, tuples, strings, etc.
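If you cannot change a duck-typed class to subclass the ABC, registering it as a virtual subclass is one option. In the sketch below, Deck is a hypothetical class written only for the demonstration.

```python
from collections.abc import Sequence


class Deck:
    """Hypothetical duck-typed sequence: __len__ and __getitem__ only."""

    def __init__(self, cards):
        self._cards = list(cards)

    def __len__(self):
        return len(self._cards)

    def __getitem__(self, index):
        return self._cards[index]


before = isinstance(Deck('abc'), Sequence)
Sequence.register(Deck)   # declare Deck a virtual subclass of Sequence
after = isinstance(Deck('abc'), Sequence)
print(before, after)
```

Registration only affects isinstance and issubclass; it does not give Deck the mixin methods (such as index or count) that real subclasses of Sequence inherit.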
While all sequences are iterables, not all iterables are sequences (for example, sets and dictionaries are iterable but not sequences). Checking hasattr(type(obj), '__iter__') will return True for dictionaries and sets.
Since Python embraces duck typing, one approach is to check whether an object has certain members (methods).
A sequence has a length, has a sequence of items, and supports slicing. So it would look like this:
def is_sequence(obj):
    t = type(obj)
    return hasattr(t, '__len__') and hasattr(t, '__getitem__')
    # additionally: and hasattr(t, '__setitem__') and hasattr(t, '__delitem__')
They are all special methods: __len__() should return the number of items, __getitem__(i) should return an item (in a sequence it is the i-th item, but not in a mapping), __getitem__(slice(start, stop, step)) should return a subsequence, and __setitem__ and __delitem__ behave as you would expect. This is the contract, but whether the object really does these things depends on whether it adheres to the contract.
Note that the function above will also return True for a mapping, e.g. dict, since mappings also have these methods. To overcome this, you can do heavier work:
def is_sequence(obj):
    try:
        len(obj)
        obj[0:0]
        return True
    # A dict raises KeyError here on Python 3.12+, where slices are hashable.
    except (TypeError, KeyError):
        return False
But most of the time you don't need this, just do what you want as if the object is a sequence and catch an exception if you wish. This is more pythonic.
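As a sketch of that "just try it" style, the hypothetical helper below uses the object as a sequence and treats the usual failures as "not a usable sequence". A mapping that happens to have the keys 0 and -1 would still slip through, so this is not a strict type test.

```python
def first_and_last(obj):
    """Return (first, last) item, or None if obj can't be used that way."""
    try:
        return obj[0], obj[-1]
    except (TypeError, IndexError, KeyError):
        # TypeError: not subscriptable (int, set, ...)
        # IndexError: empty sequence
        # KeyError: a mapping without the keys 0 and -1
        return None


print(first_and_last([1, 2, 3]))
print(first_and_last(42))
```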
For the sake of completeness, there is a utility is_sequence in the numpy library ("The fundamental package for scientific computing with Python"). Note that numpy.distutils is deprecated in recent NumPy releases.
>>> from numpy.distutils.misc_util import is_sequence
>>> is_sequence((2,3,4))
True
>>> is_sequence(45.9)
False
But it accepts sets as sequences and rejects strings:
>>> is_sequence(set((1,2)))
True
>>> is_sequence("abc")
False
The code looks a bit like @adrian's (see the numpy source), which is kind of shaky.
def is_sequence(seq):
    if is_string(seq):
        return False
    try:
        len(seq)
    except Exception:
        return False
    return True
The Python 2.6.5 documentation describes the following sequence types: string, Unicode string, list, tuple, buffer, and xrange.
def isSequence(obj):
    return type(obj) in [str, unicode, list, tuple, buffer, xrange]
Why ask why? Try getting a length, and return False if that raises an exception:
def haslength(seq):
    try:
        len(seq)
    except Exception:
        return False
    return True
Why are you doing this? The normal way here is to require a certain type of thing (A sequence or a number or a file-like object, etc.) and then use it without checking anything. In Python, we don't typically use classes to carry semantic information but simply use the methods defined (this is called "duck typing"). We also prefer APIs where we know exactly what to expect; use keyword arguments, preprocessing, or defining another function if you want to change how a function works.

Verifying that an object in python adheres to a specific structure

Is there some simple method that can check if an input object to some function adheres to a specific structure? For example, I want only a dictionary of string keys and values that are a list of integers.
One method would be to write a recursive function that you pass in the object and you iterate over it, checking at each level it is what you expect. But I feel that there should be a more elegant way to do it than this in python.
Why would you expect Python to provide an "elegant way" to check types, since the whole idea of type-checking is so utterly alien to the Pythonic way of conceiving the world and interacting with it?! Normally in Python you'd use duck typing -- so "an integer" might equally well be an int, a long, a gmpy.mpz -- types with no relation to each other except they all implement the same core signature... just as "a dict" might be any implementation of mapping, and so forth.
The new-in-2.6-and-later concept of "abstract base classes" provides a more systematic way to implement and verify duck typing, and 3.0-and-later function annotations let you interface with such a checking system (third-party, since Python adopts no such system for the foreseeable future). For example, this recipe provides a 3.0-and-later way to perform "kinda but not quite" type checking based on function annotations -- though I doubt it goes anywhere as deep as you desire, but then, it's early times for function annotations, and most of us Pythonistas feel so little craving for such checking that we're unlikely to run flat out to implement such monumental systems in lieu of actually useful code;-).
Short answer, no, you have to create your own function.
Long answer: it's not Pythonic to do what you're asking. There might be some special cases (e.g., marshalling a dict to xmlrpc), but by and large, assume the objects will act like what they're documented to be. If they don't, let the AttributeError bubble up. If you are OK with coercing values, then use str() and int() to convert them. They could, after all, implement __str__, __add__, etc. that make them usable even though they are not descendants of int/str.
def dict_of_string_and_ints(obj):
    assert isinstance(obj, dict)
    for key, value in obj.iteritems(): # py2.4
        assert isinstance(key, basestring)
        assert isinstance(value, list)
        # every element must be an int (compare against the value's own length)
        assert sum(isinstance(x, int) for x in value) == len(value)
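The checks above use Python 2 names (iteritems, basestring). A Python 3 sketch of the same validation follows; it raises TypeError instead of using assert, because assert statements are skipped when Python runs with -O. The function name and messages are illustrative assumptions.

```python
def check_dict_of_str_to_int_lists(obj):
    """Raise TypeError unless obj is a dict mapping str to list of int."""
    if not isinstance(obj, dict):
        raise TypeError("expected a dict, got %s" % type(obj).__name__)
    for key, value in obj.items():
        if not isinstance(key, str):
            raise TypeError("key %r is not a string" % (key,))
        if not isinstance(value, list):
            raise TypeError("value for %r is not a list" % (key,))
        # Note: bool is a subclass of int, so True/False pass this check.
        if not all(isinstance(x, int) for x in value):
            raise TypeError("value for %r contains a non-integer" % (key,))
    return obj


check_dict_of_str_to_int_lists({'a': [1, 2], 'b': []})
```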
Since Python emphasizes things just working, your best bet is to just assert as you go and trust the users of your library to feed you proper data. Let exceptions happen, if you must; that's on your clients for not reading your docstring.
In your case, something like this:
def myfunction(arrrrrgs):
    assert isinstance(arrrrrgs, dict), "Need a dictionary!"
    for key in arrrrrgs:
        assert type(key) is str, "Need a string!"
        val = arrrrrgs[key]
        assert type(val) is list, "Need a list!"
And so forth.
Really, it isn't worth the effort. Express yourself clearly in the docstring and let the program blow up, or throw well-placed exceptions to guide the late-night debugger.
I will take a shot and propose a helper function that can do something like that for you in a more generic+elegant way:
def check_type(value, type_def):
    """
    This validates an object instance <value> against a type template <type_def>
    presented as a simplified object.
    E.g.
    if value is list of dictionaries that have string values as key and integers
    as values:
    >> check_type(value, [{'':0}])
    if value is list of dictionaries, no restriction on key/values
    >> check_type(value, [{}])
    """
    if type(value) != type(type_def):
        return False
    # Strings are leaves: in Python 3 they are iterable and would recurse forever.
    if hasattr(value, '__iter__') and not isinstance(value, str):
        if len(type_def) == 0:
            return True
        # next(iter(...)) works on Python 2.6+ and 3, unlike .next()
        type_def_val = next(iter(type_def))
        for key in value:
            if not check_type(key, type_def_val):
                return False
    if type(value) is dict:
        if not check_type(value.values(), type_def.values()):
            return False
    return True
The comment explains a sample of usage, but you can always go pretty deep, e.g.
>>> check_type({1:['a', 'b'], 2:['c', 'd']}, {0:['']})
True
>>> check_type({1:['a', 'b'], 2:['c', 3]}, {0:['']})
False
P.S. Feel free to modify it if you want one-by-one tuple validation (e.g. validation against ([], '', {0:0}) which is not handled as it is expected now)
