>>> non_iterable = 1
>>> 5 in non_iterable
Traceback (most recent call last):
File "<input>", line 1, in <module>
TypeError: 'int' object is not iterable
>>> class also_non_iterable:
...     def __contains__(self, thing):
...         return True
>>> 5 in also_non_iterable()
True
>>> from collections.abc import Iterable
>>> isinstance(also_non_iterable(), Iterable)
False
Is there a reason the in keyword claims to want an iterable object when what it truly wants is an object that implements __contains__?
It claims to want an iterable because, if the object's class does not implement __contains__, then in falls back to iterating through the object and checking whether any value it yields is equal to the left operand.
An example to show that:
>>> class C:
...     def __iter__(self):
...         return iter([1, 2, 3, 4])
>>>
>>> c = C()
>>> 2 in c
True
>>> 5 in c
False
This is explained in the documentation -
For user-defined classes which define the __contains__() method, x in y is true if and only if y.__contains__(x) is true.
For user-defined classes which do not define __contains__() but do define __iter__() , x in y is true if some value z with x == z is produced while iterating over y . If an exception is raised during the iteration, it is as if in raised that exception.
x in thing and for x in thing are very closely related. Almost everything that supports x in thing follows the rule that x in thing is true if and only if a for loop over thing will find an element equal to x. In particular, if an object supports iteration but not __contains__, Python will use iteration as a fallback for in tests.
The error message could say it needs __contains__, but that would be about as wrong as the current message, since __contains__ isn't strictly necessary. It could say it needs a container, but it's not immediately clear what counts as a container. For example, dicts support in, but calling them containers is questionable. The current message, which says it needs an iterable, is about as accurate as the other options. Its advantage is that in practice, "is it iterable" is a better check than "is it a container" or "does it support __contains__" for determining whether actual objects support in.
There are two different uses of in:
test whether a container holds a value (e.g. the right operand implements __contains__)
traverse a sequence (e.g. the right operand is iterable)
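A short sketch of that distinction (MembersOnly is a made-up class):

```python
class MembersOnly:
    """Made-up class: implements __contains__ but not __iter__."""
    def __contains__(self, item):
        return item == 42

m = MembersOnly()
print(42 in m)   # True: the membership use of `in` needs only __contains__
print(7 in m)    # False

# The traversal use of `in` (a for loop) needs __iter__ or __getitem__:
try:
    for x in m:
        pass
except TypeError as exc:
    print(exc)   # the object is not iterable
```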
Related
As far as I know, a set in Python works via a hash table to achieve O(1) look-up complexity. Because it is a hash table, every entry in a set must be hashable (and usually immutable).
So this piece of code raises an exception:
>>> {dict()}
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict'
Because dict is not hashable. But we can create our own class inherited from dict and implement the __hash__ magic method. I created mine this way:
>>> class D(dict):
...     def __hash__(self):
...         return 3
...
I know it should not work properly, but I just wanted to experiment with it. So I checked that I can now use this type in a set:
>>> {D()}
{{}}
>>> {D(name='ali')}
{{'name': 'ali'}}
So far so good, but I thought that this way of implementing the __hash__ magic method would break look-ups in the set, because every object of class D has the same hash value.
>>> d1 = D(n=1)
>>> d2 = D(n=2)
>>> hash(d1), hash(d2)
(3, 3)
>>>
>>> {d1, d2}
{{'n': 2}, {'n': 1}}
But the surprise for me was this:
>>> d3 = D()
>>> d3 in {d1, d2}
False
I expected the result to be True, because the hash of d3 is 3 and there are already values in our set with the same hash value. How does the set work internally?
To be usable in sets and dicts, a __hash__ method must guarantee that if x == y, then hash(x) == hash(y). But that's a one-sided implication. It's not at all required that if hash(x) == hash(y), then x == y must be true. Indeed, that's impossible to achieve in general (for example, there are an unbounded number of distinct Python ints, but only a finite number of hash codes - there must be distinct ints that have the same hash value).
That your hashes are all the same is fine. They only tell the set/dict where to start looking. All objects in the container with the same hash are then compared, one by one, for equality, until success, or until all such objects have been tried without success.
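To see that equality step in action, here is the D class from the question again (a minimal sketch): a non-identical but equal key is found, while an unequal one is not:

```python
class D(dict):
    """Every instance collides on the same hash, as in the question."""
    def __hash__(self):
        return 3

d1, d2 = D(n=1), D(n=2)
s = {d1, d2}

# A distinct-but-equal object IS found: the hash locates the bucket,
# then dict equality ({'n': 1} == {'n': 1}) succeeds.
print(D(n=1) in s)   # True
# An empty D() also hashes to 3, but compares unequal to both members.
print(D() in s)      # False
```

So the False result in the question comes from the equality comparison failing, not from the hash lookup.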
However, while making all hashes the same doesn't hurt correctness, it's a disaster for performance: it effectively turns the set/dict into an exceptionally slow way to do an O(n) linear search.
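The slowdown can be made visible by counting equality comparisons; BadHash below is a made-up demo class with a constant hash:

```python
class BadHash:
    """All instances share one hash value, so they all collide."""
    eq_calls = 0  # class-wide counter of equality comparisons

    def __init__(self, n):
        self.n = n

    def __hash__(self):
        return 3

    def __eq__(self, other):
        BadHash.eq_calls += 1
        return isinstance(other, BadHash) and self.n == other.n

s = set()
for i in range(100):
    s.add(BadHash(i))

# Each new key is compared against every key already in its collision
# chain, so inserting N keys costs on the order of N*N/2 comparisons.
print(BadHash.eq_calls)
```

With well-distributed hashes, the same 100 inserts would trigger almost no equality calls at all.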
I'm not too familiar with the OOP side of Python yet so this is a bit above my head.
>>> a = [[2, 3, 4]]
>>> types = (list, str, int, str, int)
>>> isinstance(a, types)
True
No matter what type I add on after the list type, the expression always returns True.
I have read that isinstance() accepts derived class types, but I can't really convince myself that I know what's going on here.
This returns True as well. Is it because all three belong to the same class?
>>> isinstance([], (tuple, list, set))
True
In [1]: isinstance?
Docstring:
isinstance(object, class-or-type-or-tuple) -> bool
Return whether an object is an instance of a class or of a subclass thereof.
With a type as second argument, return whether that is the object's type.
The form using a tuple, isinstance(x, (A, B, ...)), is a shortcut for:
isinstance(x, A) or isinstance(x, B) or ... (etc.).
Type: builtin_function_or_method
So, it is actually an or statement. Very clear, right?
That's because when you give a tuple of types as the second argument to isinstance, it works like an OR (or any()): it returns True if the type of the first argument is any of the types in that tuple.
Example:
>>> isinstance('asd',(str,bool,int))
True
Although a list [] is iterable in Python, just like tuple, dict and set, it is not derived from the same base class (unless you count object as the base type, which it is for pretty much every type).
Try this:
>>> [].__class__
<type 'list'>
>>> ().__class__
<type 'tuple'>
>>> {}.__class__
<type 'dict'>
>>> [].__class__.__bases__
(<type 'object'>,) # yeah sure
Like others have answered, giving a tuple to isinstance() is like saying this:
>>> type([]) in (tuple, list, set)
True
Classes like tuple and list are not subclasses of a common base class; they both just conform to an iterable protocol by providing an __iter__() method (whose iterator in turn provides __next__()). That's the beauty of duck typing.
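The protocol view can be checked directly: collections.abc.Iterable recognizes any class that defines __iter__, with no shared base class or registration involved (Countdown is a made-up class):

```python
from collections.abc import Iterable

class Countdown:
    """Iterable purely by protocol: it just defines __iter__."""
    def __iter__(self):
        return iter([3, 2, 1])

# Iterable's __subclasshook__ only checks for an __iter__ method,
# so this passes even though Countdown inherits only from object.
print(isinstance(Countdown(), Iterable))  # True
print(issubclass(Countdown, Iterable))    # True
print(list(Countdown()))                  # [3, 2, 1]
```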
NaN is handled perfectly when I check for its presence in a list or a set. But I don't understand how. [UPDATE: no it's not; it is reported as present if the identical instance of NaN is found; if only non-identical instances of NaN are found, it is reported as absent.]
I thought presence in a list is tested by equality, so I expected NaN to not be found since NaN != NaN.
hash(NaN) and hash(0) are both 0. How do dictionaries and sets tell NaN and 0 apart?
Is it safe to check for NaN presence in an arbitrary container using in operator? Or is it implementation dependent?
My question is about Python 3.2.1; but if there are any changes existing/planned in future versions, I'd like to know that too.
NaN = float('nan')
print(NaN != NaN) # True
print(NaN == NaN) # False
list_ = [1, 2, NaN]
print(NaN in list_) # True; works fine but how?
set_ = {1, 2, NaN}
print(NaN in set_) # True; hash(NaN) is some fixed integer, so no surprise here
print(hash(0)) # 0
print(hash(NaN)) # 0
set_ = {1, 2, 0}
print(NaN in set_) # False; works fine, but how?
Note that if I add an instance of a user-defined class to a list, and then check for containment, the instance's __eq__ method is called (if defined) - at least in CPython. That's why I assumed that list containment is tested using operator ==.
EDIT:
Per Roman's answer, it would seem that __contains__ for list, tuple, set, dict behaves in a very strange way:
def __contains__(self, x):
    for element in self:
        if x is element:
            return True
        if x == element:
            return True
    return False
I say 'strange' because I didn't see it explained in the documentation (maybe I missed it), and I think this is something that shouldn't be left as an implementation choice.
Of course, one NaN object may not be identical (in the sense of id) to another NaN object. (This is not really surprising; Python doesn't guarantee such identity. In fact, I have never seen CPython share an instance of NaN created in different places, even though it does share instances of small numbers and short strings.) This means that testing for NaN presence in a built-in container is undefined.
This is very dangerous, and very subtle. Someone might run the very code I showed above, and incorrectly conclude that it's safe to test for NaN membership using in.
I don't think there is a perfect workaround to this issue. One very safe approach is to ensure that NaNs are never added to built-in containers. (It's a pain to check for that all over the code...)
Another alternative is to watch out for cases where in might have NaN on the left side and, in such cases, test for NaN membership separately using math.isnan(). In addition, other operations (e.g., set intersection) need to be avoided or rewritten as well.
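That second workaround could be sketched like this (contains_value is a hypothetical helper name):

```python
import math

def contains_value(container, x):
    """Membership test that treats all NaNs as matching each other."""
    if isinstance(x, float) and math.isnan(x):
        return any(isinstance(e, float) and math.isnan(e) for e in container)
    return x in container

nan1, nan2 = float('nan'), float('nan')
print(contains_value([1, 2, nan1], nan2))  # True: found via isnan()
print(nan2 in [1, 2, nan1])                # False under plain `in`
```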
Question #1: why is NaN found in a container when it's the identical object?
From the documentation:
For container types such as list, tuple, set, frozenset, dict, or collections.deque, the expression x in y is equivalent to any(x is e or x == e for e in y).
This is precisely what I observe with NaN, so everything is fine. Why this rule? I suspect it's because a dict/set wants to honestly report that it contains a certain object if that object is actually in it (even if __eq__() for whatever reason chooses to report that the object is not equal to itself).
Question #2: why is the hash value for NaN the same as for 0?
From the documentation:
Called by built-in function hash() and for operations on members of hashed collections including set, frozenset, and dict. hash() should return an integer. The only required property is that objects which compare equal have the same hash value; it is advised to somehow mix together (e.g. using exclusive or) the hash values for the components of the object that also play a part in comparison of objects.
Note that the requirement is only in one direction; objects that have the same hash do not have to be equal! At first I thought it's a typo, but then I realized that it's not. Hash collisions happen anyway, even with default __hash__() (see an excellent explanation here). The containers handle collisions without any problem. They do, of course, ultimately use the == operator to compare elements, hence they can easily end up with multiple values of NaN, as long as they are not identical! Try this:
>>> nan1 = float('nan')
>>> nan2 = float('nan')
>>> d = {}
>>> d[nan1] = 1
>>> d[nan2] = 2
>>> d[nan1]
1
>>> d[nan2]
2
So everything works as documented. But... it's very very dangerous! How many people knew that multiple values of NaN could live alongside each other in a dict? How many people would find this easy to debug?..
I would recommend making NaN an instance of a float subclass that doesn't support hashing, so it cannot be accidentally added to a set/dict. I'll submit this to python-ideas.
Finally, I found a mistake in the documentation here:
For user-defined classes which do not define __contains__() but do define __iter__(), x in y is true if some value z with x == z is produced while iterating over y. If an exception is raised during the iteration, it is as if in raised that exception.
Lastly, the old-style iteration protocol is tried: if a class defines __getitem__(), x in y is true if and only if there is a non-negative integer index i such that x == y[i], and all lower integer indices do not raise IndexError exception. (If any other exception is raised, it is as if in raised that exception).
You may notice that there is no mention of is here, unlike with built-in containers. I was surprised by this, so I tried:
>>> nan1 = float('nan')
>>> nan2 = float('nan')
>>> class Cont:
...     def __iter__(self):
...         yield nan1
...
>>> c = Cont()
>>> nan1 in c
True
>>> nan2 in c
False
As you can see, identity is checked first, before ==, consistent with the built-in containers. I'll submit a report to fix the docs.
I can't reproduce your tuple/set cases using float('nan') instead of NaN.
So I assume that it worked only because id(NaN) == id(NaN), i.e. there is no interning for NaN objects:
>>> NaN = float('NaN')
>>> id(NaN)
34373956456
>>> id(float('NaN'))
34373956480
And
>>> NaN is NaN
True
>>> NaN is float('NaN')
False
I believe tuple/set lookups have an optimization related to comparison of identical objects.
Answering your question: it seems to be unsafe to rely on the in operator when checking for the presence of NaN. I'd recommend using None instead, if possible.
Just a comment: __eq__ has nothing to do with the is operator, and during lookups, comparison of object ids seems to happen prior to any value comparison:
>>> class A(object):
...     def __eq__(*args):
...         print '__eq__'
...
>>> A() == A()
__eq__ # as expected
>>> A() is A()
False # `is` checks only ids
>>> A() in [A()]
__eq__ # as expected
False
>>> a = A()
>>> a in [a]
True # surprise!
I am trying to figure out whether the following are immutable in Sage (which is built on Python, so I believe that if something is immutable in Python, in most cases it will be immutable in Sage too).
Below are objects e, f, g, i
class e: pass
f = e()
g = pi # (g's "type" in Sage is symbolic expression. It's supposed to be 3.1415....etc)
i = lambda x: x*x
I gather that e is a class, which means it is mutable (does an immutable class even make sense? Can't all classes be modified?). Since f is an instance of a class, I am guessing it is also mutable, since classes are mutable.
Since numbers are immutable in Python, g should be immutable as well, since it is a number, despite being irrational.
Finally, i is a function, which means it should be mutable?
I'm not quite sure I understand the concept of immutability. What would it mean for a function to be immutable? For a class to be immutable?
e is mutable. You can, for instance, add a new method on the class: e.foo = lambda self,x: x.
f is mutable. You can, for instance, add a new field to this class instance: f.x = 99.
g is immutable. You can't change anything about it.
i is not immutable. You can do all sorts of evil things to it: i.func_code = (lambda x: 123).func_code after which i(10) will be 123 instead of 100. (You can also do more reasonable things to it. After i.__doc__ = "This function returns the square of its argument." you will get a more helpful result from help(i).)
An object is mutable if there's something you can do to the object that changes its possible future behaviour. You can't change the behaviour of 10; you can change the behaviour of a function object, or a class, or a class instance, or a list. (But not a tuple. Once a tuple is made, it stays just as it is for as long as it exists.)
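In Python 3, the func_code experiment from above is spelled __code__; a minimal sketch:

```python
def i(x):
    return x * x

print(i(10))  # 100

# Swap in the code object of another function: the *same* function
# object now behaves differently, which is exactly what mutability means.
i.__code__ = (lambda x: 123).__code__
print(i(10))  # 123

i.__doc__ = "Returns 123 regardless of its argument (after the swap)."
```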
Formally? An object is mutable if it can change value without changing identity.
Lists are mutable, so the value of a particular instance can change over time:
>>> x = orig_x = []
>>> x == []
True
>>> x += [1]
>>> x == [] # The value of x has changed
False
>>> x is orig_x # But the identity remains the same
True
Numbers are immutable, however, so their value can't change. Instead, the variable has to be updated to refer to a completely different object:
>>> x = orig_x = 1
>>> x == 1
True
>>> x += 1
>>> x == 1 # Again, the value of x has changed
False
>>> x is orig_x # But now the identity has changed as well
False
Immutability is an important concept, since knowing that an object's value can't change lets you make certain assumptions about it. For example, dict effectively requires immutable keys, and set and frozenset require immutable members, since the value of an object affects how it is stored in the data structure; if mutable entries were permitted, they might end up in the wrong place when modified after being stored.
Contrary to popular belief, user-defined classes that don't override the definition of equality are technically immutable. This is because the default definition of the "value" of a user-defined class is just id(self). When an object's value is its identity, there is obviously no way for it to differ over time, and hence the object doesn't qualify as "mutable".
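A quick check of that claim (Plain is a made-up class with no __eq__ override):

```python
class Plain:
    """Default equality: its 'value' is its identity."""
    pass

a, b = Plain(), Plain()
print(a == b)  # False: distinct identities mean distinct 'values'
print(a == a)  # True

a.x = 99       # mutating attributes...
print(a == a)  # ...does not change a's value-as-identity
print(hash(a) == hash(a))  # and its hash stays stable, so it's a safe dict key
```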
Informally? Most people use an intuitive "Can I change it?" definition along the lines of Gareth McCaughan's answer. It's the same basic idea as the formal definition, just using a broader meaning of the term "value" than the technical definition in terms of equality checks.
Say I have a list l. Under what circumstances is l.__rmul__(other) called?
I basically understood the documentation, but I would also like to see an example to clarify its usages beyond any doubt.
When Python attempts to multiply two objects, it first tries to call the left object's __mul__() method. If the left object doesn't have a __mul__() method (or the method returns NotImplemented, indicating it doesn't work with the right operand in question), then Python wants to know if the right object can do the multiplication. If the right operand is the same type as the left, Python knows it can't, because if the left object can't do it, another object of the same type certainly can't either.
If the two objects are different types, though, Python figures it's worth a shot. However, it needs some way to tell the right object that it is the right object in the operation, in case the operation is not commutative. (Multiplication is, of course, but not all operators are, and in any case * is not always used for multiplication!) So it calls __rmul__() instead of __mul__().
As an example, consider the following two statements:
print "nom" * 3
print 3 * "nom"
In the first case, Python calls the string's __mul__() method. The string knows how to multiply itself by an integer, so all is well. In the second case, the integer does not know how to multiply itself by a string, so its __mul__() returns NotImplemented and the string's __rmul__() is called. It knows what to do, and you get the same result as the first case.
Now we can see that __rmul__() allows all of the string's special multiplication behavior to be contained in the str class, such that other types (such as integers) do not need to know anything about strings to be able to multiply by them. A hundred years from now (assuming Python is still in use) you will be able to define a new type that can be multiplied by an integer in either order, even though the int class has known nothing of it for more than a century.
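A sketch of such a future type (Repeat is a made-up class; int knows nothing about it):

```python
class Repeat:
    """Multiplies its word by an integer, in either operand order."""
    def __init__(self, word):
        self.word = word

    def __mul__(self, n):     # Repeat * int
        return self.word * n

    def __rmul__(self, n):    # int * Repeat: int's __mul__ returns
        return self.word * n  # NotImplemented, so Python tries this

r = Repeat("nom")
print(r * 3)  # 'nomnomnom'
print(3 * r)  # 'nomnomnom', thanks to __rmul__
```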
By the way, the string class's __mul__() has a bug in some versions of Python. If it doesn't know how to multiply itself by an object, it raises a TypeError instead of returning NotImplemented. That means you can't multiply a string by a user-defined type even if the user-defined type has an __rmul__() method, because the string never lets it have a chance. The user-defined type has to go first (e.g. Foo() * 'bar' instead of 'bar' * Foo()) so its __mul__() is called. They seem to have fixed this in Python 2.7 (I tested it in Python 3.2 also), but Python 2.6.6 has the bug.
Binary operators by their nature have two operands. Each operand may be on either the left or the right side of an operator. When you overload an operator for some type, you can specify for which side of the operator the overloading is done. This is useful when invoking the operator on two operands of different types. Here's an example:
class Foo(object):
    def __init__(self, val):
        self.val = val

    def __str__(self):
        return "Foo [%s]" % self.val

class Bar(object):
    def __init__(self, val):
        self.val = val

    def __rmul__(self, other):
        return Bar(self.val * other.val)

    def __str__(self):
        return "Bar [%s]" % self.val

f = Foo(4)
b = Bar(6)
obj = f * b   # Bar [24]
obj2 = b * f  # ERROR
Here, obj will be a Bar with val = 24, but the assignment to obj2 generates an error because Bar has no __mul__ and Foo has no __rmul__.
I hope this is clear enough.
In the exercise this answer apparently refers to (a Point class), __mul__() computes a dot product, so the result is a scalar: __mul__() returns x1*x2 + y1*y2. __rmul__() there is defined differently: the result is a new point with x = x1*x2 and y = y1*y2.