In Python 3, it is standard procedure to make a class an iterable and iterator at the same time by defining both the __iter__ and __next__ methods. But I have problems to wrap my head around this. Take this example which creates an iterator that produces only even numbers:
class EvenNumbers:
def __init__(self, max_):
self.max_ = max_
def __iter__(self):
self.n = 0
return self
def __next__(self):
if self.n <= self.max_: # edit: self.max --> self.max_
result = 2 * self.n
self.n += 1
return result
raise StopIteration
instance = EvenNumbers(4)
for entry in instance:
print(entry)
To my knowledge (correct me if I'm wrong), when I create the loop, an iterator is created by calling something like itr = iter(instance) which internally calls the __iter__ method. This is expected to return an iterator object (which the instance is due to defining __next__ and therefore I can just return self). To get an element from it, next(itr) is called until the exception is raised.
My question here is now: if and how can __iter__ and __next__ be separated, so that the content of the latter function is defined somewhere else? And when could this be useful? I know that I have to change __iter__ so that it returns an iterator.
Btw the idea to do this comes from this site (LINK), which does not state how to implement this.
It sounds like you're confusing iterators and iterables. Iterables have an __iter__ method which returns an iterator. Iterators have a __next__ method which returns either their next value or raise a StopIteration. Now in python, it is stated that iterators are also iterables (but not visa versa) and that iter(iterator) is iterator so an iterator, itr, should return only itself from it's __iter__ method.
Iterators are required to have an __iter__() method that returns the iterator object itself so every iterator is also iterable and may be used in most places where other iterables are accepted
In code:
class MyIter:
def __iter__(self):
return self
def __next__(self):
# actual iterator logic
If you want to make a custom iterator class, the easiest way is to inherit from collections.abc.Iterator which you can see defines __iter__ as above (it is also a subclass of collections.abc.Iterable). Then all you need is
class MyIter(collections.abc.Iterator):
def __next__(self):
...
There is of course a much easier way to make an iterator, and thats with a generator function
def fib():
a = 1
b = 1
yield a
yield b
while True:
b, a = a + b, b
yield b
list(itertools.takewhile(lambda x: x < 100, fib()))
# --> [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
Just for reference, this is (simplified) code for an abstract iterator and iterable
from abc import ABC, abstractmethod
class Iterable(ABC):
#abstractmethod
def __iter__(self):
'Returns an instance of Iterator'
pass
class Iterator(Iterable, ABC):
#abstractmethod
def __next__(self):
'Return the next item from the iterator. When exhausted, raise StopIteration'
pass
# overrides Iterable.__iter__
def __iter__(self):
return self
I think I have grasped the concept now, even if I do not fully understand the passage from the documentation by #FHTMitchell. I came across an example on how to separate the two methods and wanted to document this.
What I found is a very basic tutorial that clearly distinguishes between the iterable and the iterator (which is the cause of my confusion).
Basically, you define your iterable first as a separate class:
class EvenNumbers:
def __init__(self, max_):
self.max = max_
def __iter__(self):
self.n = 0
return EvenNumbersIterator(self)
The __iter__ method only requires an object that has a __next__ method defined. Therefore, you can do this:
class EvenNumbersIterator:
def __init__(self, source):
self.source = source
def __next__(self):
if self.source.n <= self.source.max:
result = 2 * self.source.n
self.source.n += 1
return result
else:
raise StopIteration
This separates the iterator part from the iterable class. It now makes sense that if I define __next__ within the iterable class, I have to return the reference to the instance itself as it basically does 2 jobs at once.
Related
I am learning python these days and have a small question as follows:
Define a class drop_first that has only two methods init and iter
The constructor of drop_first has parameter iterable, to which an iterable is given.
Method init applies function iter to iterable and records the resulting iterator in some attribute. It then calls function next on the iterator once.
Method iter simply returns the iterator recorded in the attribute.
For example, the instance created by drop_first([1,2,3]) allows iteration skipping the first element 1.
for x in drop_first([1,2,3]): print(x)
2 and 3 will be printed.
I answered with the following codes:
class drop_first:
def __init__(self, iterable):
self.iterable = iterable
self.index = 1
def __iter__(self):
return self
def __next__(self):
try:
result = self.iterable[self.index]
except IndexError:
raise StopIteration
self.index += 1
return result
Nevertheless, I found that it was required to create a class with only 2 methods inside, and am a little bit confused about the specifications of those two methods. So I am wondering if anyone could give me some explainations about the requirement of the class...Thank you.
Define a class drop_first that has only two methods init and iter
Bad instruction. This class should be called DropFirst. But be it.
The constructor of drop_first has parameter iterable, to which an iterable is given.
def __init__(self, iterable):
Method init applies function iter to iterable and records the resulting iterator in some attribute. It then calls function next on the iterator once.
self.iterator = iter(iterable)
next(self.iterator)
Method iter simply returns the iterator recorded in the attribute.
def __iter__(self):
return self.iterator
This results in
class DropFirst: # or, if your instructor insists on it, drop_first
def __init__(self, iterable):
self.iterator = iter(iterable)
next(self.iterator)
def __iter__(self):
return self.iterator
But, to be honest, I'd take the following approach (which breaks the specifications, but is more versatile):
class DropFirst:
def __init__(self, iterable):
self.iterable = iterable
def __iter__(self):
iterator = iter(self.iterable)
next(iterator)
return iterator
This works in all cases the original one also does, but can be iterated over several times.
For the iterator protocol you create both an __iter__ and __next__ method. However, what about the following case:
class Item:
def __init__(self):
self.name = 'James'
def __iter__(self):
return self
Now I can do:
>>> i=Item()
>>> iter(i)
<__main__.Item instance at 0x10bfe6e18>
But not:
>>> next(i)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: instance has no next() method
As far as I'm aware, the definition of iterator/iterable is:
Iterable has the method __iter__
Iterator has the method __next__
Would this then mean that my item above is an Iterable but not an Iterator? Or would it be neither because doing the following wouldn't work:
>>> for item in i:
... print (item)
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: instance has no next() method
Note this would be the full class that has the iterator methods defined:
class Item:
def __init__(self):
self.name = 'James'
self.i = 0
def __iter__(self):
return self
def __next__(self):
if self.i >= len(self.name): raise StopIteration
value = self.name[self.i]
self.i += 1
return value
You're mostly right with your definitions, but not quite.
An object is iterable if it has an __iter__ method that returns an iterator.
An object is an iterator if it has a __next__ method to get the next value while iterating. But iterators in Python are also expected to be iterable. They should all have an __iter__ method that returns self.
Your first example has an __iter__ method, but because it returns self and the object is not an iterator (since it has no __next__ method), it's not really a valid iterable either.
To make a non-iterator iterable, you need to return some other object that is a valid iterator. One sneaky way to do it is to make __iter__ a generator method (by using yield in its implementation). But if you have some sequence of values to return, you could also just return an iterator over that sequence.
The class in your last code block is indeed an iterator. But if you wanted to make it an iterable that is not its own iterator (perhaps because you want to be able to iterate over it several times), you would probably want something more like this:
class Item:
def __init__(self):
self.name = 'James'
def __iter__(self):
return iter(self.name)
__iter__ and __next__, as iterable and iterator, are different things. And although it is possible to have both methods on the same class, with __iter__ returning self, this would work only for proof of concepts, not for production code.
An iterable will have an __iter__ method that returns an object that has __next__. If both are the same instances as in
this is just a demo, with faulty code
class Item:
def __init__(self, data):
self.data = data
def __iter__(self):
self.counter = 0
return self
def __next__(self):
self.counter += 1
if self.counter > len(self.data):
raise StopIteration()
return self.data[self.counter - 1]
This will work - but if you try to create two independent iterators on the same instance of Item, they won't work as desired - since both would share the same counter - the attribute counter in the instance.
It is rare however that one needs to implement __next__: if __iter__ is writen as a generator function, having a yield instead of returning self, that will just work. Python will call __next__ on the generator created automatically with each call to __iter__:
this works
class Item:
def __init__(self, data):
self.data = data
def __iter__(self):
for item in self.data:
yield item
As you can see, the correct way is "nextless" and much simpler, and can also be implemented, in this case, by returning an independent iterator for the data. (The yield implementation is needed if getting to each item requires some custom computation)
this also works
class Item:
def __init__(self, data):
self.data = data
def __iter__(self):
return iter(self.data)
(on this case, Python will call __next__ on the iterator created for self.data)
If you really want to implement __next__, the object with that method have to keep track of any counter or pointers needed to retrieve the next items, and that must be independent of the host instance. The most straightforward way to do that is to have a second class, related to your first one, and have __iter__ return an instance of that instead:
working "full" example
class Item:
def __init__(self, data):
self.data = data
def __iter__(self):
return ItemIterator(self)
class ItemIterator:
def __init__(self, item):
self.item = item
self.counter = 0
def __next__(self):
self.counter += 1
if self.counter > len(self.item.data):
raise StopIteration()
return self.item.data[self.counter - 1]
Would this then mean that my item above is an Iterable but not an Iterator?
No; to be iterable, the __iter__ method should return an iterator. Yours doesn't.
According to the glossary in the official Python docs:
Iterable
An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an __iter__() method or with a __getitem__() method that implements Sequence semantics.
Iterables can be used in a for loop and in many other places where a sequence is needed (zip(), map(), …).
Since your item is not "capable of returning its members one at a time", and cannot "be used in a for loop [or] other places where a sequence is needed", it is not iterable.
Note also that the __next__ method is not sufficient to be an iterator; an iterator must also have an __iter__ method which returns itself:
Iterator objects also need to implement this method; they are required to return themselves.
So I get generator functions for lazy evaluation and generator expressions, aka generator comprehensions as its syntactic sugar equivalent.
I understand classes like
class Itertest1:
def __init__(self):
self.count = 0
self.max_repeats = 100
def __iter__(self):
print("in __inter__()")
return self
def __next__(self):
if self.count >= self.max_repeats:
raise StopIteration
self.count += 1
print(self.count)
return self.count
as a way of implementing the iterator interface, i.e. iter() and next() in one and the same class.
But what then is
class Itertest2:
def __init__(self):
self.data = list(range(100))
def __iter__(self):
print("in __inter__()")
for i, dp in enumerate(self.data):
print("idx:", i)
yield dp
which uses the yield statement within the iter member function?
Also I noticed that upon calling the iter member function
it = Itertest2().__iter__()
batch = it.__next__()
the print statement is only executed when calling next() for the first time.
Is this due to this weird mixture of yield and iter? I think this is quite counter intuitive...
Something equivalent to Itertest2 could be written using a separate iterator class.
class Itertest3:
def __init__(self):
self.data = list(range(100))
def __iter__(self):
return Itertest3Iterator(self.data)
class Itertest3Iterator:
def __init__(self, data):
self.data = enumerate(data)
def __iter__(self):
return self
def __next__(self):
print("in __inter__()")
i, dp = next(self.state) # Let StopIteration exception propagate
print("idx:", i)
return dp
Compare this to Itertest1, where the instance of Itertest1 itself carried the state of the iteration around in it. Each call to Itertest1.__iter__ returned the same object (the instance of Itertest1), so they couldn't iterate over the data independently.
Notice I put print("in __iter__()") in __next__, not __iter__. As you observed, nothing in a generator function actually executes until the first call to __next__. The generator function itself only creates an generator; it does not actually start executing the code in it.
Having the yield statement anywhere in any function wraps the function code in a (native) generator object, and replaces the function with a stub that gives you said generator object.
So, here, calling __iter__ will give you an anonymous generator object that executes the code you want.
The main use case for __next__ is to provide a way to write an iterator without relying on (native) generators.
The use case of __iter__ is to distinguish between an object and an iteration state over said object. Consider code like
c = some_iterable()
for a in c:
for b in c:
# do something with a and b
You would not want the two interleaved iterations to interfere with each other's state. This is why such a loop would desugar to something like
c = some_iterable()
_iter1 = iter(c)
try:
while True:
a = next(_iter1)
_iter2 = iter(c)
try:
while True:
b = next(_iter2)
# do something with a and b
except StopIteration:
pass
except StopIteration:
pass
Typically, custom iterators implement a stub __iter__ that returns self, so that iter(iter(x)) is equivalent to iter(x). This is important when writing iterator wrappers.
I'm trying to figure out how to make iterator, below is an iterator that works fine.
class DoubleIt:
def __init__(self):
self.start = 1
def __iter__(self):
self.max = 10
return self
def __next__(self):
if self.start < self.max:
self.start *= 2
return self.start
else:
raise StopIteration
obj = DoubleIt()
i = iter(obj)
print(next(i))
However, when I try to pass 16 into the second argument in iter() (I expect the iterator will stop when return 16)
i = iter(DoubleIt(), 16)
print(next(i))
It throws TypeError: iter(v, w): v must be callable
Therefore, I try to do so.
i = iter(DoubleIt, 16)
print(next(i))
It returns <main.DoubleIt object at 0x7f4dcd4459e8>. Which is not I expected.
I checked the website of programiz, https://www.programiz.com/python-programming/methods/built-in/iter
Which said that callable object must be passed in the first argument so as to use the second argument, but it doesn't mention can User defined object be passed in it in order to use the second argument.
So my question is, is there a way to do so? Can the second argument be used with the "Self defined Object"?
The documentation could be a bit clearer on this, it only states
iter(object[, sentinel])
...
The iterator created in this case will call object with no arguments
for each call to its __next__() method; if the value returned is equal to sentinel, StopIteration will be raised, otherwise the value will be returned.
What is maybe not said perfectly clearly is that what the iterator yields is whatever the callable returns. And since your callable is a class (with no arguments), it returns a new instance of the class every iteration.
One way around this is to make your class callable and delegate it to the __next__ method:
class DoubleIt:
def __init__(self):
self.start = 1
def __iter__(self):
return self
def __next__(self):
self.start *= 2
return self.start
__call__ = __next__
i = iter(DoubleIt(), 16)
print(next(i))
# 2
print(list(i))
# [4, 8]
This has the dis-/advantage that it is an infinite generator that is only stopped by the sentinel value of iter.
Another way is to make the maximum an argument of the class:
class DoubleIt:
def __init__(self, max=10):
self.start = 1
self.max = max
def __iter__(self):
return self
def __next__(self):
if self.start < self.max:
self.start *= 2
return self.start
else:
raise StopIteration
i = iter(DoubleIt(max=16))
print(next(i))
# 2
print(list(i))
# [4, 8, 16]
One difference to note is that iter stops when it encounters the sentinel value (and does not yield the item), whereas this second way uses <, instead of <= comparison (like your code) and will thus yield the maximum item.
Here's an example of a doubler routine that would work with the two argument mode of iter:
count = 1
def nextcount():
global count
count *= 2
return count
print(list(iter(nextcount, 16)))
# Produces [2, 4, 8]
This mode involves iter creating the iterator for us. Note that we need to reset count before it can work again; it only works given a callable (such as a function or bound method) that has side effects (changing the counter), and the iterator will only stop upon encountering exactly the sentinel value.
Your DoubleIt class provided no particular protocol for setting a max value, and iter doesn't expect or use any such protocol either. The alternate mode of iter creates an iterator from a callable and a sentinel value, quite independent of the iterable or iterator protocols.
The behaviour you expected is more akin to what itertools.takewhile or itertools.islice do, manipulating one iterator to create another.
Another way to make an iterable object is to implement the sequence protocol:
class DoubleSeq:
def __init__(self, steps):
self.steps = steps
def __len__(self):
return self.steps
def __getitem__(self, iteration):
if iteration >= self.steps:
raise IndexError()
return 2**iteration
print(list(iter(DoubleSeq(4))))
# Produces [1, 2, 4, 8]
Note that DoubleSeq isn't an iterator at all; iter created one for us using the sequence protocol. DoubleSeq doesn't hold the iteration counter, the iterator does.
I've learnt from the official documentation of python 2.7.8 how to work with iterators and generators. I've got a question related on a curiosity.
it = iter("abcde")
print it
>>> <iterator object at 0x7ff4c2b3bad0>
class example1():
def __init__(self, word):
self.word = word
self.index = len(word)
def __iter__(self):
for x in range(self.index - 1, -1, -1):
yield self.word[x]
a = example1("altalena")
print iter(a)
>>> <generator object __iter__ at 0x7f24712000a0>
In the above examples, when I print the iterators, i read "generator","iterator" object and the hexadecimal ID. Why I can't do the same with the following code?
class example2():
def __init__(self, word):
self.word = word
self.index = len(word)
def __iter__(self):
return self
def next (self):
if self.index == 0:
raise StopIteration
self.index = self.index - 1
return self.word[self.index]
a = example2()
print iter(a)
>>> <__main__.example2 instance at 0x7f89ee2de440>
I think it is caused by "return self" in iter, that leads to the class instance, but i don't know the solution to get a more right output. It may be useless but I don't know why it is an how to avoid it.
Generators are a type of iterator. Your example1 class is returning a generator because you used yield in the __iter__; it is a separate iterator object, returned from __iter__. This makes your example1 class iterable, not an iterator itself.
In your second example your class __iter__ returns self. It is not just iterable, it is its own iterator. Because it is returning self, there is no separate object here.
Python makes an explicit distinction between iterable and iterator. An iterable can potentially be iterated over. An iterator does the actual work when iterating. By using iter() you ask Python to produce an iterator for a given iterable.
This is why iter(stringobject) returns a new object; you have produced an iterator from the iterable string.
You need this distinction because the process of iterating requires something that keeps state; namely where in the process are we now. An iterator is the object that keeps track of that, so that each time you call the .next() method on the iterator, you get the next value, or StopIteration is raised if there is no next value to produce anymore.
So in your example, the string and example1 are both just iterable, and calling iter() on them produced a new separate iterator.
However, your example2 is it's own iterator. Calling iter() on it does not produce a separate object. You cannot create separate iterators from something that already is an iterator.
If you want to produce a new independent iterator for your example2 class, you need to return a distinct, separate object from __iter__:
class ExampleIterator(object):
def __init__(self, source):
self.source = source
self.position = len(source)
def __iter__(self):
return self
def next(self):
if self.position <= 0:
raise StopIteration
self.position -= 1
return self.source[self.position]
class Example2():
def __init__(self, word):
self.word = word
def __iter__(self):
return ExampleIterator(self)
Here Example2 is just a iterable again, not an iterator. The __iter__ method returns a new, distinct ExampleIterator() instance, and it is responsible for keeping track of the iteration state.
I think you don't have a problem, you just don't understand what happens here.
In general, iter(object) returns an iterator for this iterable object.
This is obtained either via calling __iter__() or, if that doesn't exist, by providing a wrapper object which calls __getitem__() until it is exhausted (raises IndexError).
The object returned by __iter__() can either be the iterable itself (like in your 2nd example) or something else. Especially, it can be a generator object if you make __iter__() a generator function as you do in your 1st example.
The iteration itself happens with the iterator object and its next() resp. __next__() method.