`in` defined for generator - python

Why is in operator defined for generators?
>>> def foo():
... yield 42
...
>>>
>>> f = foo()
>>> 10 in f
False
What are the possible use cases?
I know that range(...) objects have a __contains__ function defined so that we can do stuff like this:
>>> r = range(10)
>>> 4 in r
True
>>> r.__contains__
<method-wrapper '__contains__' of range object at 0x7f82bd51cc00>
But f above doesn't have __contains__ method.

"What are the possible use cases?" To check if the generator will produce some value.
Dunder methods serve as hooks for the particular syntax they are associated with. __contains__ isn't some kind of one-to-one mapping to x in y. The language ultimately defines the semantics of these operators.
From the documentation of membership testing, we see there are several ways for x in y to be evaluated, depending on various properties of the objects involved. I've highlted the relevant one for generator objects, which do not define a __contains__ but are iterable, i.e., they define an __iter__ method:
The operators in and not in test for membership. x in s evaluates to
True if x is a member of s, and False otherwise. x not in s returns
the negation of x in s. All built-in sequences and set types support
this as well as dictionary, for which in tests whether the dictionary
has a given key. For container types such as list, tuple, set,
frozenset, dict, or collections.deque, the expression x in y is
equivalent to any(x is e or x == e for e in y).
For the string and bytes types, x in y is True if and only if x is a
substring of y. An equivalent test is y.find(x) != -1. Empty strings
are always considered to be a substring of any other string, so "" in "abc" will return True.
For user-defined classes which define the __contains__() method, x in
y returns True if y.__contains__(x) returns a true value, and False
otherwise.
For user-defined classes which do not define contains() but do
define __iter__(), x in y is True if some value z, for which the
expression x is z or x == z is true, is produced while iterating over
y. If an exception is raised during the iteration, it is as if in
raised that exception.
Lastly, the old-style iteration protocol is tried: if a class defines
__getitem__(), x in y is True if and only if there is a non-negative integer index i such that x is y[i] or x == y[i], and no lower integer
index raises the IndexError exception. (If any other exception is
raised, it is as if in raised that exception).
The operator not in is defined to have the inverse truth value of in.
To summarize, x in y will be defined for objects that:
Are strings or bytes, and it is defined as a substring relationship.
types that define __contains__
types that are iterators, i.e. that define __iter__
the old-style iteration protocol (relies on __getitem__)
Generators fall into 3.
A broader point, you really shouldn't use the dunder methods directly, unless you really understand what they are doing. Even then, it may be best to aviod it.
It usually isn't worth trying to be credible or succinct by using something to the effect of:
x.__lt__(y)
Instead of:
x < y
You should at least understand, that this might happen:
>>> (1).__lt__(3.)
NotImplemented
>>>
And if you are just naively doing stuff like filter((1).__lt__, iterable) then you've probably got a bug.

Generators are iterable, and so have an .__iter__ method, which can be used to check membership
How this behaves is described in the Expressions docs on Membership test operations
For user-defined classes which do not define __contains__() but do define __iter__(), x in y is True if some value z, for which the expression x is z or x == z is true, is produced while iterating over y. If an exception is raised during the iteration, it is as if in raised that exception.
Emphasis mine!
This pops the whole generator, and your example with 42 simply doesn't include the tested value of 10
>>> def foo():
... yield 5
... yield 10
...
>>> f = foo()
>>> 10 in f
True
>>> 10 in f
False

Related

Comparison operator "==" for value equality or reference equality?

So many tutorials have stated that the == comparison operator is for value equality, like in this answer, quote:
== is for value equality. Use it when you would like to know if two objects have the same value.
is is for reference equality. Use it when you would like to know if two references refer to the same object.
However, I found that the Python doc says that:
x==y calls x.__eq__(y). By default, object implements __eq__() by using is, returning NotImplemented in the case of a false comparison: True if x is y else NotImplemented."
It seems the default behavior of the == operator is to compare the reference quality like the is operator, which contradicts what these tutorials say.
So what exactly should I use == for? value equality or reference equality? Or it just depends on how you implement the __eq__ method.
I think the doc of Value comparisons has illustrated this question clearly:
The operators <, >, ==, >=, <=, and != compare the values of two objects. The value of an object is a rather abstract notion in Python. Comparison operators implement a particular notion of what the value of an object is. One can think of them as defining the value of an object indirectly, by means of their comparison implementation.
The behavior of the default equality comparison, that instances with different identities are always unequal, may be in contrast to what types will need that have a sensible definition of object value and value-based equality. Such types will need to customize their comparison behavior, and in fact, a number of built-in types have done that.
The default behavior for equality comparison (== and !=) is based on the identity of the objects. Hence, equality comparison of instances with the same identity results in equality, and equality comparison of instances with different identities results in inequality. A motivation for this default behavior is the desire that all objects should be reflexive (i.e. x is y implies x == y).
It also includes a list that describes the comparison behavior of the most important built-in types like numbers, strings and sequences, etc.
It solely depends on what __eq__ does. The default __eq__ of type object behaves like is. Some builtin datatypes use their own implementation. For example, two lists are equal if all their values are equal. You just have to know this.
object implements __eq__() by using is, but many classes in the standard library implement __eq__() using value equality. E.g.:
>>> l1 = [1, 2, 3]
>>> l2 = [1, 2, 3]
>>> l3 = l1
>>> l1 is l2
False
>>> l1 == l2
True
>>> l1 is l3
True
In your own classes, you can implement __eq__() as it makes sense, e.g.:
class Point():
def __init__(self, x, y):
self.x = x
self.y = y
def __eq__(self, other):
return self.x == other.x and self.y == other.x
To add your thought process with a simple definition ,
The Equality operator == compares the values of both the operands and checks for value equality. Whereas the is operator checks whether both the operands refer to the same object or not (present in the same memory location).
In a nutshell, is checks whether two references point to the same object or not.== checks whether two objects have the same value or not.
For example:
a=[1,2,3]
b=a #a and b point to the same object
c=list(a) #c points to different object
if a==b:
print('#') #output:#
if a is b:
print('##') #output:##
if a==c:
print('###') #output:##
if a is c:
print('####') #no output as c and a point to different object
one = 1
a = one
b = one
if (a == b): # this if works like this: if (1 == 1)
if (a is 1): # this if works like this: if (int.object(1, memory_location:somewhere) == int.object(1, memory_location:variable.one))
thus, a is 1 won't work because its arguments are not pointing to the same location.

In Python, how is the in operator implemented to work? Does it use the next() method of the iterators?

In Python, it is known that in checks for membership in iterators (lists, dictionaries, etc) and looks for substrings in strings. My question is regarding how in is implemented to achieve all of the following: 1) test for membership, 2) test for substrings and 3) access to the next element in a for-loop. For example, when for i in myList: or if i in myList: is executed, does in call myList.__next__()? If it does call it, how then does it work with strings, given that str objects are not iterators(as checked in Python 2.7) and so do not have the next() method? If a detailed discussion of in's implementation is not possible, would appreciate if a gist of it is supplied here.
A class can define how the in operator works on instances of that class by defining a __contains__ method.
The Python data model documentation says:
For objects that don’t define __contains__(), the membership test first tries iteration via __iter__(), then the old sequence iteration protocol via __getitem__(), see this section in the language reference.
Section 6.10.2, "Membership test operations", of the Python language reference has this to say:
The operators in and not in test for membership. x in s evaluates to True if x is a member of s, and False otherwise. x not in s returns the negation of x in s. All built-in sequences and set types support this as well as dictionary, for which in tests whether the dictionary has a given key. For container types such as list, tuple, set, frozenset, dict, or collections.deque, the expression x in y is equivalent to any(x is e or x == e for e in y).
For the string and bytes types, x in y is True if and only if x is a substring of y. An equivalent test is y.find(x) != -1. Empty strings are always considered to be a substring of any other string, so "" in "abc" will return True.
For user-defined classes which define the __contains__() method, x in y returns True if y.__contains__(x) returns a true value, and False otherwise.
For user-defined classes which do not define __contains__() but do define __iter__(), x in y is True if some value z with x == z is produced while iterating over y. If an exception is raised during the iteration, it is as if in raised that exception.
Lastly, the old-style iteration protocol is tried: if a class defines __getitem__(), x in y is True if and only if there is a non-negative integer index i such that x == y[i], and all lower integer indices do not raise IndexError exception. (If any other exception is raised, it is as if in raised that exception).
The operator not in is defined to have the inverse true value of in.
As a comment indicates above, the expression operator in is distinct from the keyword in which forms a part of the for statement. In the Python grammar, the in is "hardcoded" as a part of the syntax of for:
for_stmt ::= "for" target_list "in" expression_list ":" suite
["else" ":" suite]
So in the context of a for statement, in doesn't behave as an operator, it's simply a syntactic marker to separate the target_list from the expression_list.
Python has __contains__ special method that is used when you do item in collection.
For example, here's a class that "__contains__" all even numbers:
>>> class EvenNumbers:
... def __contains__(self, item):
... return item % 2 == 0
...
>>> en = EvenNumbers()
>>> 2 in en
True
>>> 3 in en
False
>>>

Why does an empty list evaluates to False on a while loop in Python

What is process that occurs for a while-loop to evaluate to False on an empty list ?
For instance:
a=[1, 2, 3]
while a:
a.pop()
Essentially, I want to know which method or attribute of the list object the while-loop is inspecting in order to decide wether to terminate or not.
Loops and conditionals implicitly use bool on all their conditions. The procedure is documented explicitly in the "Truth Value Testing" section of the docs. For a sequence like a list, this usually ends up being a check of the __len__ method.
bool works like this: first it tries the __bool__ method. If __bool__ is not implemented, it checks if __len__ is nonzero, and if that isn't possible, just returns True.
As with all magic method lookup, Python will only look at the class, never the instance (see Special method lookup). If your question is about how to change the behavior, you will need to subclass. Assigning a single replacement method to an instance dictionary won't work at all.
Great question! It's inspecting bool(a), which (usually) calls type(a).__bool__(a).
Python implements certain things using "magic methods". Basically, if you've got a data type defined like so:
class MyExampleDataType:
def __init__(self, val):
self.val = val
def __bool__(self):
return self.val > 20
Then this code will do what it looks like it'll do:
a = MyExampleDataType(5)
b = MyExampleDataType(30)
if a:
print("Won't print; 5 < 20")
if b:
print("Will print; 30 > 20")
For more information, see the Python Documentation: 3.3 Special Method Names.
A condition like if my_var is equivalent to if bool(my_var) and this page explains it rather nicely:
Return Value from bool()
The bool() returns:
False if the value is omitted or false
True if the value is true
The following values are considered false in Python:
None
False
Zero of any numeric type. For example, 0, 0.0, 0j
Empty sequence. For example, (), [], ''.
Empty mapping. For example, {}
objects of Classes which has bool() or len() method which returns 0 or False
All other values except these values are considered true.
You can check whether a list is empty by using the bool() function. It evaluates to False if the variable doesn't exist or in the case of a list, when it is empty.
In your case you could do this:
a=[1,2,3]
while bool(a) is True:
a.pop()
Or even easier:
a = [1,2,3]
while len(a) > 0:
print(a.pop())

Best Practice for Equality in Python

is there a best practice to determine the equality of two arbitrary python objects?
Let's say I write a container for some sort of object and I need to figure out whether new objects are equal to the old ones stored into the container.
The problem is I cannot use "is" since this will only check if the variables are bound to the very same object (but we might have a deep copy of an object, which is in my sense equal to its original). I cannot use "==" either, since some of these objects return an element-wise equal, like numpy arrays.
Is there a best practice to determine the equality of any kind of objects?
For instance would
repr(objectA)==repr(objectB)
suffice?
Or is it common to use:
numpy.all(objectA==objectB)
Which probably fails if objectA == objectB evaluates to "[]"
Cheers,
Robert
EDIT:
Ok, regarding the 3rd comment, I elaborate more on
"What's your definition of "equal objects"?"
In the strong sense I don't have any definition of equality, I rather let the objects decide whether they are equal or not. The problem is, as far as I understand, there is no well agreed standard for eq or ==,respectively. The statement can return arrays or all kinds of things.
What I have in mind is to have some operator lets call it SEQ (strong equality) in between eq and "is".
SEQ is superior to eq in the sense that it will always evaluate to a single boolean value (for numpy arrays that could mean all elements are equal, for example) and determine if the objects consider themselves equal or not. But SEQ would be inferior to "is" in the sense that objects that are distinct in memory can be equal as well.
I suggest you write a custom recursive equality-checker, something like this:
from collections import Sequence, Mapping, Set
import numpy as np
def nested_equal(a, b):
"""
Compare two objects recursively by element, handling numpy objects.
Assumes hashable items are not mutable in a way that affects equality.
"""
# Use __class__ instead of type() to be compatible with instances of
# old-style classes.
if a.__class__ != b.__class__:
return False
# for types that implement their own custom strict equality checking
seq = getattr(a, "seq", None)
if seq and callable(seq):
return seq(b)
# Check equality according to type type [sic].
if isinstance(a, basestring):
return a == b
if isinstance(a, np.ndarray):
return np.all(a == b)
if isinstance(a, Sequence):
return all(nested_equal(x, y) for x, y in zip(a, b))
if isinstance(a, Mapping):
if set(a.keys()) != set(b.keys()):
return False
return all(nested_equal(a[k], b[k]) for k in a.keys())
if isinstance(a, Set):
return a == b
return a == b
The assumption that hashable objects are not mutable in a way that affects equality is rather safe, since it would break dicts and sets if such objects were used as keys.

Is there any difference between "foo is None" and "foo == None"?

Is there any difference between:
if foo is None: pass
and
if foo == None: pass
The convention that I've seen in most Python code (and the code I myself write) is the former, but I recently came across code which uses the latter. None is an instance (and the only instance, IIRC) of NoneType, so it shouldn't matter, right? Are there any circumstances in which it might?
is always returns True if it compares the same object instance
Whereas == is ultimately determined by the __eq__() method
i.e.
>>> class Foo(object):
def __eq__(self, other):
return True
>>> f = Foo()
>>> f == None
True
>>> f is None
False
You may want to read this object identity and equivalence.
The statement 'is' is used for object identity, it checks if objects refer to the same instance (same address in memory).
And the '==' statement refers to equality (same value).
A word of caution:
if foo:
# do something
Is not exactly the same as:
if x is not None:
# do something
The former is a boolean value test and can evaluate to false in different contexts. There are a number of things that represent false in a boolean value tests for example empty containers, boolean values. None also evaluates to false in this situation but other things do too.
(ob1 is ob2) equal to (id(ob1) == id(ob2))
The reason foo is None is the preferred way is that you might be handling an object that defines its own __eq__, and that defines the object to be equal to None. So, always use foo is None if you need to see if it is infact None.
There is no difference because objects which are identical will of course be equal. However, PEP 8 clearly states you should use is:
Comparisons to singletons like None should always be done with is or is not, never the equality operators.
is tests for identity, not equality. For your statement foo is none, Python simply compares the memory address of objects. It means you are asking the question "Do I have two names for the same object?"
== on the other hand tests for equality as determined by the __eq__() method. It doesn't cares about identity.
In [102]: x, y, z = 2, 2, 2.0
In [103]: id(x), id(y), id(z)
Out[103]: (38641984, 38641984, 48420880)
In [104]: x is y
Out[104]: True
In [105]: x == y
Out[105]: True
In [106]: x is z
Out[106]: False
In [107]: x == z
Out[107]: True
None is a singleton operator. So None is None is always true.
In [101]: None is None
Out[101]: True
For None there shouldn't be a difference between equality (==) and identity (is). The NoneType probably returns identity for equality. Since None is the only instance you can make of NoneType (I think this is true), the two operations are the same. In the case of other types this is not always the case. For example:
list1 = [1, 2, 3]
list2 = [1, 2, 3]
if list1==list2: print "Equal"
if list1 is list2: print "Same"
This would print "Equal" since lists have a comparison operation that is not the default returning of identity.
#Jason:
I recommend using something more along the lines of
if foo:
#foo isn't None
else:
#foo is None
I don't like using "if foo:" unless foo truly represents a boolean value (i.e. 0 or 1). If foo is a string or an object or something else, "if foo:" may work, but it looks like a lazy shortcut to me. If you're checking to see if x is None, say "if x is None:".
Some more details:
The is clause actually checks if the two objects are at the same
memory location or not. i.e whether they both point to the same
memory location and have the same id.
As a consequence of 1, is ensures whether, or not, the two lexically represented objects have identical attributes (attributes-of-attributes...) or not
Instantiation of primitive types like bool, int, string(with some exception), NoneType having a same value will always be in the same memory location.
E.g.
>>> int(1) is int(1)
True
>>> str("abcd") is str("abcd")
True
>>> bool(1) is bool(2)
True
>>> bool(0) is bool(0)
True
>>> bool(0)
False
>>> bool(1)
True
And since NoneType can only have one instance of itself in the python's "look-up" table therefore the former and the latter are more of a programming style of the developer who wrote the code(maybe for consistency) rather then having any subtle logical reason to choose one over the other.
John Machin's conclusion that None is a singleton is a conclusion bolstered by this code.
>>> x = None
>>> y = None
>>> x == y
True
>>> x is y
True
>>>
Since None is a singleton, x == None and x is None would have the same result. However, in my aesthetical opinion, x == None is best.
a is b # returns true if they a and b are true alias
a == b # returns true if they are true alias or they have values that are deemed equivalence
a = [1,3,4]
b = a[:] #creating copy of list
a is b # if gives false
False
a == b # gives true
True

Categories

Resources