What are "iterable", "iterator", and "iteration" in Python? How are they defined?
Iteration is a general term for taking each item of something, one after another. Any time you use a loop, explicit or implicit, to go over a group of items, that is iteration.
In Python, iterable and iterator have specific meanings.
An iterable is an object that has an __iter__ method which returns an iterator, or which defines a __getitem__ method that can take sequential indexes starting from zero (and raises an IndexError when the indexes are no longer valid). So an iterable is an object that you can get an iterator from.
An iterator is an object with a next (Python 2) or __next__ (Python 3) method.
Whenever you use a for loop, or map, or a list comprehension, etc. in Python, the next method is called automatically to get each item from the iterator, thus going through the process of iteration.
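As a rough illustration of what happens automatically, a for loop can be spelled out by hand with iter() and next(). This is only a sketch: manual_for is a made-up helper name, and the real loop is implemented inside the interpreter rather than in Python code like this.

```python
def manual_for(iterable):
    """Collect items the way a for loop would, spelled out by hand."""
    collected = []
    iterator = iter(iterable)      # the loop first asks the iterable for an iterator
    while True:
        try:
            item = next(iterator)  # then it requests items one at a time
        except StopIteration:      # the iterator signals that it is done
            break
        collected.append(item)     # stands in for the body of the loop
    return collected


print(manual_for("cat"))      # ['c', 'a', 't']
print(manual_for([1, 2, 3]))  # [1, 2, 3]
```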
A good place to start learning would be the iterators section of the tutorial and the iterator types section of the standard types page. After you understand the basics, try the iterators section of the Functional Programming HOWTO.
Here's the explanation I use in teaching Python classes:
An ITERABLE is:
anything that can be looped over (e.g. you can loop over a string or file) or
anything that can appear on the right-side of a for-loop: for x in iterable: ... or
anything you can call with iter() that will return an ITERATOR: iter(obj) or
an object that defines __iter__ that returns a fresh ITERATOR,
or it may have a __getitem__ method suitable for indexed lookup.
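The last case is easy to overlook, so here is a minimal sketch of it. Letters is a hypothetical class invented for this illustration; it defines no __iter__ at all, only __getitem__, and iter() falls back to calling it with 0, 1, 2, ... until IndexError is raised.

```python
class Letters:
    """Hypothetical class with no __iter__, only index-based lookup."""

    def __init__(self, text):
        self._text = text

    def __getitem__(self, index):
        # str indexing raises IndexError past the end, which ends the iteration
        return self._text[index]


letters = Letters("abc")
print(hasattr(letters, "__iter__"))  # False
print(list(letters))                 # ['a', 'b', 'c']
for ch in letters:                   # works in a for loop as well
    print(ch)
```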
An ITERATOR is an object:
with state that remembers where it is during iteration,
with a __next__ method that:
returns the next value in the iteration
updates the state to point at the next value
signals when it is done by raising StopIteration
and that is self-iterable (meaning that it has an __iter__ method that returns self).
Notes:
The __next__ method in Python 3 is spelt next in Python 2, and
The builtin function next() calls that method on the object passed to it.
For example:
>>> s = 'cat'      # s is an ITERABLE
                   # s is a str object that is immutable
                   # s has no state
                   # s has a __getitem__() method
>>> t = iter(s)    # t is an ITERATOR
                   # t has state (it starts by pointing at the "c")
                   # t has a __next__() method (next() in Python 2) and an __iter__() method
>>> next(t)        # the next() function returns the next value and advances the state
'c'
>>> next(t)        # the next() function returns the next value and advances
'a'
>>> next(t)        # the next() function returns the next value and advances
't'
>>> next(t)        # next() raises StopIteration to signal that iteration is complete
Traceback (most recent call last):
...
StopIteration
>>> iter(t) is t   # the iterator is self-iterable
True
The above answers are great, but like most of what I've seen, they don't stress the distinction enough for people like me.
Also, people tend to get "too Pythonic" by putting definitions like "X is an object that has a __foo__() method" first. Such definitions are correct--they are based on the duck-typing philosophy--but the focus on methods tends to get in the way when you are trying to understand the concept in its simplicity.
So I'll add my version.
In natural language,
iteration is the process of taking one element at a time in a row of elements.
In Python,
an iterable is an object that is, well, iterable, which simply put means that
it can be used in iteration, e.g. with a for loop. How? By using an iterator.
I'll explain below.
... while an iterator is an object that defines how to actually do the
iteration--specifically what the next element is. That's why it must have
a next() method.
Iterators are themselves also iterable, with the distinction that their __iter__() method returns the same object (self), regardless of whether or not its items have been consumed by previous calls to next().
So what does the Python interpreter think when it sees a for x in obj: statement?
Look, a for loop. Looks like a job for an iterator... Let's get one. ...
There's this obj guy, so let's ask him.
"Mr. obj, do you have your iterator?" (... calls iter(obj), which calls
obj.__iter__(), which happily hands out a shiny new iterator _i.)
OK, that was easy... Let's start iterating then. (x = _i.next() ... x = _i.next()...)
Since Mr. obj succeeded in this test (by having a certain method returning a valid iterator), we reward him with an adjective: you can now call him "iterable Mr. obj".
However, in simple cases, you don't normally benefit from having the iterator and the iterable as separate objects. So you define only one object, which is also its own iterator. (Python does not really care that the _i handed out by obj wasn't all that shiny, but just the obj itself.)
This is why in most examples I've seen (and what had been confusing me over and over),
you can see:
class IterableExample(object):

    def __iter__(self):
        return self

    def next(self):
        pass
instead of
class Iterator(object):

    def next(self):
        pass

class Iterable(object):

    def __iter__(self):
        return Iterator()
There are cases, though, when you can benefit from having the iterator separated from the iterable, such as when you want to have one row of items but several "cursors". For example, when you want to work with the "current" and the "forthcoming" element, you can have a separate iterator for each. Or with multiple threads pulling from a huge list, each can have its own iterator to traverse all items. See Raymond's and glglgl's answers above.
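A small sketch of the "current" and "forthcoming" idea, assuming the row of items is a plain list. The names items, current and upcoming are made up for this example.

```python
items = ["a", "b", "c", "d"]

current = iter(items)    # first cursor
upcoming = iter(items)   # second, independent cursor over the same list
next(upcoming, None)     # move the second cursor one step ahead

# zip() pulls one item from each cursor per step and stops when either runs out
pairs = list(zip(current, upcoming))
print(pairs)   # [('a', 'b'), ('b', 'c'), ('c', 'd')]
```

Because each call to iter(items) returns a new iterator, advancing one cursor does not disturb the other.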
Imagine what you could do:
class SmartIterableExample(object):

    def create_iterator(self):
        # An amazingly powerful yet simple way to create arbitrary
        # iterator, utilizing object state (or not, if you are fan
        # of functional), magic and nuclear waste--no kittens hurt.
        pass    # don't forget to add the next() method

    def __iter__(self):
        return self.create_iterator()
Notes:
I'll repeat again: an object that has only next() is not iterable by that fact alone. It cannot be used as
a "source" in a for loop unless it also has __iter__(). What the for loop primarily needs is __iter__()
(that returns something with next()).
Of course, for is not the only iteration loop, so above applies to some other
constructs as well (while...).
Iterator's next() can throw StopIteration to stop iteration. Does not have to,
though, it can iterate forever or use other means.
In the above "thought process", _i does not really exist. I've made up that name.
There's a small change in Python 3.x: next() method (not the built-in) now
must be called __next__(). Yes, it should have been like that all along.
You can also think of it like this: iterable has the data, iterator pulls the next
item
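To illustrate the note about iterating forever, here is a minimal sketch written for Python 3 (so the method is __next__). Evens is a hypothetical class made up for this example; it never raises StopIteration, so the consumer has to decide when to stop, here with itertools.islice.

```python
from itertools import islice


class Evens:
    """Hypothetical iterator that produces 0, 2, 4, ... and never stops."""

    def __init__(self):
        self._value = 0

    def __iter__(self):
        return self

    def __next__(self):
        value = self._value
        self._value += 2
        return value        # no StopIteration is ever raised


first_five = list(islice(Evens(), 5))   # take only the first five items
print(first_five)   # [0, 2, 4, 6, 8]
```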
Disclaimer: I'm not a developer of any Python interpreter, so I don't really know what the interpreter "thinks". The musings above are solely a demonstration of how I understand the topic from other explanations, experiments and the real-life experience of a Python newbie.
An iterable is an object which has an __iter__() method. It can possibly be iterated over several times, as is the case for lists and tuples.
An iterator is the object which iterates. It is returned by an __iter__() method, returns itself via its own __iter__() method and has a next() method (__next__() in 3.x).
Iteration is the process of calling this next() (or __next__(), respectively) until it raises StopIteration.
Example:
>>> a = [1, 2, 3] # iterable
>>> b1 = iter(a) # iterator 1
>>> b2 = iter(a) # iterator 2, independent of b1
>>> next(b1)
1
>>> next(b1)
2
>>> next(b2) # start over, as it is the first call to b2
1
>>> next(b1)
3
>>> next(b1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>> b1 = iter(a) # new one, start over
>>> next(b1)
1
Here's my cheat sheet:
  sequence
      +
      |
      v
  def __getitem__(self, index: int):
  +    ...
  |    raise IndexError
  |
  |
  |             def __iter__(self):
  |             +    ...
  |             |    return <iterator>
  |             |
  |             |
  +--> or <-----+       def __next__(self):
       +        |       +    ...
       |        |       |    raise StopIteration
       v        |       |
    iterable    |       |
       +        |       |
       |        |       v
       |        +----> and +----------> iterator
       |                                    ^
       v                                    |
iter(<iterable>) +--------------------------+
                                            |
  def generator():                          |
  +    yield 1                              |
  |                generator_expression +---+
  |                                         |
  +-> generator() +-> generator_iterator +--+
Quiz: Do you see how...
every iterator is an iterable?
a container object's __iter__() method can be implemented as a generator?
an iterable that has a __next__ method is not necessarily an iterator?
Answers:
Every iterator must have an __iter__ method. Having __iter__ is enough to be an iterable. Therefore every iterator is an iterable.
When __iter__ is called it should return an iterator (return <iterator> in the diagram above). Calling a generator returns a generator iterator which is a type of iterator.
class Iterable1:
    def __iter__(self):
        # a method (which is a function defined inside a class body)
        # calling iter() converts iterable (tuple) to iterator
        return iter((1,2,3))

class Iterable2:
    def __iter__(self):
        # a generator
        for i in (1, 2, 3):
            yield i

class Iterable3:
    def __iter__(self):
        # with PEP 380 syntax
        yield from (1, 2, 3)

# passes
assert list(Iterable1()) == list(Iterable2()) == list(Iterable3()) == [1, 2, 3]
Here is an example:
class MyIterable:

    def __init__(self):
        self.n = 0

    def __getitem__(self, index: int):
        return (1, 2, 3)[index]

    def __next__(self):
        n = self.n = self.n + 1
        if n > 3:
            raise StopIteration
        return n

# if you can iter it without raising a TypeError, then it's an iterable.
iter(MyIterable())

# but obviously `MyIterable()` is not an iterator since it does not have
# an `__iter__` method.
from collections.abc import Iterator
assert isinstance(MyIterable(), Iterator)  # AssertionError
I don’t know if it helps anybody, but I always like to visualize concepts in my head to understand them better. Since I have a little son, I visualize the iterable/iterator concept with bricks and white paper.
Suppose we are in a dark room and on the floor we have my son's bricks. The bricks differ in size and color, but that does not matter now. Suppose we have 5 such bricks. Those 5 bricks can be described as an object – let’s say a bricks kit. We can do many things with this bricks kit – we can take one brick, then a second and then a third, we can swap bricks, or put the first brick on top of the second. Therefore this bricks kit is an iterable object, or sequence, as we can go through each brick and do something with it. We can only do it like my little son does – we can play with one brick at a time. So again, I imagine this bricks kit to be an iterable.
Now remember that we are in the dark room. Or almost dark. The thing is that we don’t clearly see those bricks, what color they are, what shape, etc. So even if we want to do something with them – that is, iterate through them – we don’t really know what and how, because it is too dark.
What we can do is put a piece of white fluorescent paper next to the first brick – the first element of the bricks kit – so that we can see where it is. Each time we take a brick from the kit, we move the white piece of paper to the next brick so that we can see that one in the dark room. This white piece of paper is nothing more than an iterator. It is an object as well, but an object with which we can work and play with the elements of our iterable object – the bricks kit.
That, by the way, explains my early mistake when I tried the following in IDLE and got a TypeError:
>>> X = [1,2,3,4,5]
>>> next(X)
Traceback (most recent call last):
File "<pyshell#19>", line 1, in <module>
next(X)
TypeError: 'list' object is not an iterator
The list X here was our bricks kit but NOT the white piece of paper. I needed to get an iterator first:
>>> bricks_kit = [1,2,3,4,5]
>>> white_piece_of_paper = iter(bricks_kit)
>>> next(white_piece_of_paper)
1
>>> next(white_piece_of_paper)
2
I don’t know if it helps, but it helped me. If someone could confirm or correct this visualization of the concept, I would be grateful. It would help me to learn more.
I don't think that you can get it much simpler than the documentation, however I'll try:
An iterable is something that can be iterated over. In practice it usually means a sequence, i.e. something that has a beginning and an end and some way to go through all the items in it.
You can think of an iterator as a helper pseudo-method (or pseudo-attribute) that gives (or holds) the next (or first) item in the iterable. (In practice it is just an object that defines the method next().)
Iteration is probably best explained by the Merriam-Webster definition of the word:
b : the repetition of a sequence of computer instructions a specified
number of times or until a condition is met — compare recursion
Iterables have a __iter__ method that instantiates a new iterator every time.
Iterators implement a __next__ method that returns individual items, and a __iter__ method that returns self.
Therefore, iterators are also iterable, but iterables are not iterators.
Luciano Ramalho, Fluent Python.
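A short sketch of that difference, using a list as the iterable. The variable names are made up for this example.

```python
numbers = [1, 2, 3]           # an iterable: hands out a new iterator each time
print(list(numbers), list(numbers))   # [1, 2, 3] [1, 2, 3]

it = iter(numbers)            # an iterator: good for a single pass
first_pass = list(it)
second_pass = list(it)        # the iterator is already exhausted
print(first_pass, second_pass)        # [1, 2, 3] []

print(iter(numbers) is numbers)   # False: the list is not its own iterator
print(iter(it) is it)             # True: an iterator's __iter__ returns self
```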
Iterable: something that can be iterated over, such as a sequence like a list or a string.
It has either a __getitem__ method or an __iter__ method. If we use the iter() function on such an object, we get an iterator.
Iterator: When we get the iterator object from the iter() function, we call its __next__() method (in Python 3) or simply next() (in Python 2) to get the elements one by one. An instance of a class with this method is called an iterator.
From the docs:
The use of iterators pervades and unifies Python. Behind the scenes, the for statement calls iter() on the container object. The function returns an iterator object that defines the method __next__() which accesses elements in the container one at a time. When there are no more elements, __next__() raises a StopIteration exception which tells the for loop to terminate. You can call the __next__() method using the next() built-in function; this example shows how it all works:
>>> s = 'abc'
>>> it = iter(s)
>>> it
<iterator object at 0x00A1DB50>
>>> next(it)
'a'
>>> next(it)
'b'
>>> next(it)
'c'
>>> next(it)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
next(it)
StopIteration
Example of a class:
class Reverse:
    """Iterator for looping over a sequence backwards."""
    def __init__(self, data):
        self.data = data
        self.index = len(data)

    def __iter__(self):
        return self

    def __next__(self):
        if self.index == 0:
            raise StopIteration
        self.index = self.index - 1
        return self.data[self.index]

>>> rev = Reverse('spam')
>>> iter(rev)
<__main__.Reverse object at 0x00A1DB50>
>>> for char in rev:
...     print(char)
...
m
a
p
s
Iterators are objects that implement the __iter__ and __next__ methods. If those methods are defined, we can use the object in a for loop or a comprehension.
class Squares:
    def __init__(self, length):
        self.length = length
        self.i = 0

    def __iter__(self):
        print('calling __iter__')  # this will be called first and only once
        return self

    def __next__(self):
        print('calling __next__')  # this will be called for each iteration
        if self.i >= self.length:
            raise StopIteration
        else:
            result = self.i ** 2
            self.i += 1
            return result
Iterators get exhausted. This means that after you have iterated over the items once, you cannot iterate again; you have to create a new object. Let's say you have a class which holds a collection of cities that you want to iterate over.
class Cities:
    def __init__(self):
        self._cities = ['Brooklyn', 'Manhattan', 'Prag', 'Madrid', 'London']
        self._index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._index >= len(self._cities):
            raise StopIteration
        else:
            item = self._cities[self._index]
            self._index += 1
            return item
An instance of class Cities is an iterator. However, if you want to iterate over the cities again, you have to create a new object, which can be an expensive operation. You can split the class into two classes: one holds the cities, and the second is an iterator which takes the cities object as an init parameter.
class Cities:
    def __init__(self):
        self._cities = ['New York', 'Newark', 'Istanbul', 'London']

    def __len__(self):
        return len(self._cities)


class CityIterator:
    def __init__(self, city_obj):
        # city_obj is an instance of Cities
        self._city_obj = city_obj
        self._index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._index >= len(self._city_obj):
            raise StopIteration
        else:
            item = self._city_obj._cities[self._index]
            self._index += 1
            return item
Now if we need a new iterator, we do not have to create the data (the cities) again. We create the cities object once and pass it to the iterator. But we are still doing extra work, because the caller has to build the iterator by hand. We could avoid this by letting the Cities class hand out its own iterators.
An iterable is a Python object that implements the iterable protocol. It requires only an __iter__() method that returns a new instance of an iterator object.
class Cities:
    def __init__(self):
        self._cities = ['New York', 'Newark', 'Istanbul', 'Paris']

    def __len__(self):
        return len(self._cities)

    def __iter__(self):
        return self.CityIterator(self)

    class CityIterator:
        def __init__(self, city_obj):
            self._city_obj = city_obj
            self._index = 0

        def __iter__(self):
            return self

        def __next__(self):
            if self._index >= len(self._city_obj):
                raise StopIteration
            else:
                item = self._city_obj._cities[self._index]
                self._index += 1
                return item
Iterators have __iter__ and __next__, while iterables have __iter__, so we can say iterators are also iterables, but they are iterables that get exhausted. Iterables such as Cities, on the other hand, never become exhausted, because they always return a new iterator that is then used to iterate.
You notice that the main part of the iterable code is in the iterator, and the iterable itself is nothing more than an extra layer that allows us to create and access the iterator.
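If the separate iterator class feels like too much ceremony, one possible alternative (not the approach used in this answer) is to write __iter__ as a generator function. Each call then returns a new generator iterator, so the iterable still never gets exhausted. A minimal sketch, reusing the Cities name and data from above:

```python
class Cities:
    def __init__(self):
        self._cities = ['New York', 'Newark', 'Istanbul', 'Paris']

    def __len__(self):
        return len(self._cities)

    def __iter__(self):
        # A generator function: every call creates a new generator iterator
        for city in self._cities:
            yield city


cities = Cities()
print(list(cities))   # ['New York', 'Newark', 'Istanbul', 'Paris']
print(list(cities))   # same result again, a new iterator is created each time
```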
Iterating over an iterable
Python has a built-in function iter() which calls __iter__(). When we iterate over an iterable, Python calls iter(), which returns an iterator, and then it uses the iterator's __next__() to iterate over the data.
Note that in the above example, Cities is an iterable but it is not a sequence type, which means we cannot get a city by index. To fix this we just add __getitem__ to the Cities class.
class Cities:
    def __init__(self):
        self._cities = ['New York', 'Newark', 'Budapest', 'Newcastle']

    def __len__(self):
        return len(self._cities)

    def __getitem__(self, s):  # now a sequence type
        return self._cities[s]

    def __iter__(self):
        return self.CityIterator(self)

    class CityIterator:
        def __init__(self, city_obj):
            self._city_obj = city_obj
            self._index = 0

        def __iter__(self):
            return self

        def __next__(self):
            if self._index >= len(self._city_obj):
                raise StopIteration
            else:
                item = self._city_obj._cities[self._index]
                self._index += 1
                return item
iterable = [1, 2]
iterator = iter(iterable)
print(iterator.__next__())
print(iterator.__next__())
So:
An iterable is an object that can be looped over, e.g. a list, string or tuple.
Calling the iter function on an iterable returns an iterator object.
This iterator object has a method named __next__ (in Python 3, or just next in Python 2) through which you can access each element of the iterable.
The output of the above code will be:
1
2
An iterable is an object that has an __iter__() method which returns an iterator. It is something that can be looped over.
Example: A list is iterable because we can loop over it, BUT it is not an iterator.
An iterator is an object that you get from an iterable by calling iter() on it. It is an object with state, so that it remembers where it is during iteration.
To see whether an object has the __iter__() method, we can use the dir() function as below.
ls = ['hello','bye']
print(dir(ls))
Output
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
As you can see, the list has the __iter__() method, which means that it is an iterable object, but it doesn't have the __next__() method, which is a feature of an iterator object.
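Scanning the dir() output by eye works, but the same check can be written more directly with hasattr(). The sketch below also adds a small helper, is_iterable, which is a made-up name for this example; it simply tries iter() and so also covers objects that are iterable only through __getitem__.

```python
ls = ['hello', 'bye']
it = iter(ls)

print(hasattr(ls, '__iter__'), hasattr(ls, '__next__'))   # True False
print(hasattr(it, '__iter__'), hasattr(it, '__next__'))   # True True


def is_iterable(obj):
    """Return True if iter(obj) succeeds (hypothetical helper)."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


print(is_iterable(ls), is_iterable(42))   # True False
```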
Whenever you use a for loop or map or a list comprehension in Python the next method is called automatically to get each item from the iteration
Before dealing with iterables and iterators, it helps to start from the notion of a sequence.
Sequence: A sequence is an ordered collection of data.
Iterable: An iterable is an object that supports the __iter__ method; sequences are the most common example.
iter function: The iter function takes an iterable (such as a sequence) as input and creates an object which is known as an iterator.
Iterator: An iterator is the object on which next is called to traverse the sequence. Each call to next returns the element that the iterator has currently reached.
example:
x=[1,2,3,4]
x is a sequence which consists of a collection of data
y=iter(x)
Calling iter(x) returns an iterator only when the object x supports iteration; otherwise it raises a TypeError. If it returns an iterator, then y is an iterator over the elements of x:
1, 2, 3, 4 (y itself is not a list)
As y is an iterator, it supports the next method.
Calling next returns the individual elements of the list one by one.
After the last element of the sequence has been returned, calling next again raises a StopIteration exception.
example:
>>> next(y)   # y.next() in Python 2
1
>>> next(y)
2
>>> next(y)
3
>>> next(y)
4
>>> next(y)
StopIteration
Other people already explained comprehensively, what is iterable and iterator, so I will try to do the same thing with generators.
IMHO the main problem for understanding generators is a confusing use of the word “generator”, because this word is used in 2 different meanings:
as a tool for creating (generating) iterators,
in the form of a function returning an iterator (i.e. with the yield statement(s) in its body),
in the form of a generator expression
as a result of the use of that tool, i.e. the resulting iterator.
(In this meaning a generator is a special form of an iterator — the word “generator” points out how this iterator was created.)
Generator as a tool of the 1st type:
In[2]: def my_generator():
  ...:     yield 100
  ...:     yield 200
In[3]: my_generator
Out[3]: <function __main__.my_generator()>
In[4]: type(my_generator)
Out[4]: function
Generator as a result (i.e. an iterator) of the use of this tool:
In[5]: my_iterator = my_generator()
In[6]: my_iterator
Out[6]: <generator object my_generator at 0x00000000053EAE48>
In[7]: type(my_iterator)
Out[7]: generator
Generator as a tool of the 2nd type — indistinguishable from the resulting iterator of this tool:
In[8]: my_gen_expression = (2 * i for i in (10, 20))
In[9]: my_gen_expression
Out[9]: <generator object <genexpr> at 0x000000000542C048>
In[10]: type(my_gen_expression)
Out[10]: generator
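The two meanings can also be told apart programmatically. The following sketch uses the standard inspect module and repeats the my_generator definition from above so that it runs on its own.

```python
import inspect


def my_generator():
    yield 100
    yield 200


my_iterator = my_generator()

print(inspect.isgeneratorfunction(my_generator))   # True: the "tool"
print(inspect.isgenerator(my_iterator))            # True: the resulting iterator
print(iter(my_iterator) is my_iterator)            # True: it is its own iterator

print(list(my_iterator))   # [100, 200]
print(list(my_iterator))   # []  (a generator iterator is exhausted after one pass)
```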
Here's another view using collections.abc. This view may be useful the second time around or later.
From collections.abc we can see the following hierarchy:
builtins.object
    Iterable
        Iterator
            Generator
i.e. Generator is derived from Iterator, which is derived from Iterable, which is derived from the base object.
Hence,
Every iterator is an iterable, but not every iterable is an iterator. For example, [1, 2, 3] and range(10) are iterables, but not iterators. x = iter([1, 2, 3]) is an iterator and an iterable.
A similar relationship exists between Iterator and Generator.
Calling iter() on an iterator or a generator returns itself. Thus, if it is an iterator, then iter(it) is it is True.
Under the hood, a list comprehension like [2 * x for x in nums] or a for loop like for x in nums:, acts as though iter() is called on the iterable (nums) and then iterates over nums using that iterator. Hence, all of the following are functionally equivalent (with, say, nums=[1, 2, 3]):
for x in nums:
for x in iter(nums):
for x in iter(iter(nums)):
for x in iter(iter(iter(iter(iter(nums))))):
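These relationships can be checked directly with issubclass() and isinstance(). A short sketch, with variable names chosen only for this example:

```python
from collections.abc import Generator, Iterable, Iterator

print(issubclass(Generator, Iterator), issubclass(Iterator, Iterable))   # True True

nums = [1, 2, 3]
print(isinstance(nums, Iterable), isinstance(nums, Iterator))   # True False
print(isinstance(range(10), Iterator))                          # False

it = iter(nums)
print(isinstance(it, Iterable), isinstance(it, Iterator))       # True True
print(iter(it) is it)                                           # True

gen = (2 * x for x in nums)
print(isinstance(gen, Iterator), isinstance(gen, Generator))    # True True
```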
For me, Python's glossary was most helpful for these questions, e.g. for iterable it says:
An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an __iter__() method or with a __getitem__() method that implements sequence semantics.
Iterables can be used in a for loop and in many other places where a sequence is needed (zip(), map(), …). When an iterable object is passed as an argument to the built-in function iter(), it returns an iterator for the object. This iterator is good for one pass over the set of values. When using iterables, it is usually not necessary to call iter() or deal with iterator objects yourself. The for statement does that automatically for you, creating a temporary unnamed variable to hold the iterator for the duration of the loop. See also iterator, sequence, and generator.
For example, an iterator is created like this:
Iter1 = iter(list)
This iterator object is created over the list, and the next function is then called on it.
The filter() function also returns an iterator, but over which underlying structure is that iterator created? To which kind of data structure is the iterator attached when we use filter()?
filter is a type that implements the iterator protocol. To create an instance of filter, you give it a predicate (or None to indicate the identity predicate lambda x: x, which keeps only the truthy items) and an iterable. An iterator for the iterable value is created and stored along with the predicate.
When iterating over the filter instance, it yields only those values from its internal iterator that match the predicate. A pure Python implementation might look like
class filter:
    def __init__(self, p, itr):
        self.itr = iter(itr)
        # None means "keep the truthy items", which is what bool tests
        self.p = p if p is not None else bool

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            x = next(self.itr)  # StopIteration from the inner iterator ends the loop
            if self.p(x):
                return x
The call to iter in filter.__init__ is how you can filter an iterable (like a list) that is not itself an iterator.
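For comparison, here is how the built-in filter behaves when given a plain list. This is only a usage sketch; the variable names are made up for the example.

```python
evens = filter(lambda x: x % 2 == 0, [1, 2, 3, 4])   # a list, not an iterator, is passed in

print(iter(evens) is evens)   # True: the filter object is its own iterator
print(next(evens))            # 2
print(list(evens))            # [4]
print(list(evens))            # []  (exhausted, like any iterator)

# With None as the predicate, only the truthy items are kept
values = [0, 1, '', 'a', None, 2]
truthy = list(filter(None, values))
print(truthy)                 # [1, 'a', 2]
```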
Note that while all filter instances wrap some other iterator, not all iterators wrap another data structure. For example, the following class also implements the iterator protocol to produce an infinite stream of 1s.
class Ones:
def __iter__(self):
return self
def __next__(self):
return 1
What are "iterable", "iterator", and "iteration" in Python? How are they defined?
Iteration is a general term for taking each item of something, one after another. Any time you use a loop, explicit or implicit, to go over a group of items, that is iteration.
In Python, iterable and iterator have specific meanings.
An iterable is an object that has an __iter__ method which returns an iterator, or which defines a __getitem__ method that can take sequential indexes starting from zero (and raises an IndexError when the indexes are no longer valid). So an iterable is an object that you can get an iterator from.
An iterator is an object with a next (Python 2) or __next__ (Python 3) method.
Whenever you use a for loop, or map, or a list comprehension, etc. in Python, the next method is called automatically to get each item from the iterator, thus going through the process of iteration.
A good place to start learning would be the iterators section of the tutorial and the iterator types section of the standard types page. After you understand the basics, try the iterators section of the Functional Programming HOWTO.
Here's the explanation I use in teaching Python classes:
An ITERABLE is:
anything that can be looped over (i.e. you can loop over a string or file) or
anything that can appear on the right-side of a for-loop: for x in iterable: ... or
anything you can call with iter() that will return an ITERATOR: iter(obj) or
an object that defines __iter__ that returns a fresh ITERATOR,
or it may have a __getitem__ method suitable for indexed lookup.
An ITERATOR is an object:
with state that remembers where it is during iteration,
with a __next__ method that:
returns the next value in the iteration
updates the state to point at the next value
signals when it is done by raising StopIteration
and that is self-iterable (meaning that it has an __iter__ method that returns self).
Notes:
The __next__ method in Python 3 is spelt next in Python 2, and
The builtin function next() calls that method on the object passed to it.
For example:
>>> s = 'cat' # s is an ITERABLE
# s is a str object that is immutable
# s has no state
# s has a __getitem__() method
>>> t = iter(s) # t is an ITERATOR
# t has state (it starts by pointing at the "c"
# t has a next() method and an __iter__() method
>>> next(t) # the next() function returns the next value and advances the state
'c'
>>> next(t) # the next() function returns the next value and advances
'a'
>>> next(t) # the next() function returns the next value and advances
't'
>>> next(t) # next() raises StopIteration to signal that iteration is complete
Traceback (most recent call last):
...
StopIteration
>>> iter(t) is t # the iterator is self-iterable
The above answers are great, but as most of what I've seen, don't stress the distinction enough for people like me.
Also, people tend to get "too Pythonic" by putting definitions like "X is an object that has __foo__() method" before. Such definitions are correct--they are based on duck-typing philosophy, but the focus on methods tends to get between when trying to understand the concept in its simplicity.
So I add my version.
In natural language,
iteration is the process of taking one element at a time in a row of elements.
In Python,
iterable is an object that is, well, iterable, which simply put, means that
it can be used in iteration, e.g. with a for loop. How? By using iterator.
I'll explain below.
... while iterator is an object that defines how to actually do the
iteration--specifically what is the next element. That's why it must have
next() method.
Iterators are themselves also iterable, with the distinction that their __iter__() method returns the same object (self), regardless of whether or not its items have been consumed by previous calls to next().
So what does Python interpreter think when it sees for x in obj: statement?
Look, a for loop. Looks like a job for an iterator... Let's get one. ...
There's this obj guy, so let's ask him.
"Mr. obj, do you have your iterator?" (... calls iter(obj), which calls
obj.__iter__(), which happily hands out a shiny new iterator _i.)
OK, that was easy... Let's start iterating then. (x = _i.next() ... x = _i.next()...)
Since Mr. obj succeeded in this test (by having certain method returning a valid iterator), we reward him with adjective: you can now call him "iterable Mr. obj".
However, in simple cases, you don't normally benefit from having iterator and iterable separately. So you define only one object, which is also its own iterator. (Python does not really care that _i handed out by obj wasn't all that shiny, but just the obj itself.)
This is why in most examples I've seen (and what had been confusing me over and over),
you can see:
class IterableExample(object):
def __iter__(self):
return self
def next(self):
pass
instead of
class Iterator(object):
def next(self):
pass
class Iterable(object):
def __iter__(self):
return Iterator()
There are cases, though, when you can benefit from having iterator separated from the iterable, such as when you want to have one row of items, but more "cursors". For example when you want to work with "current" and "forthcoming" elements, you can have separate iterators for both. Or multiple threads pulling from a huge list: each can have its own iterator to traverse over all items. See #Raymond's and #glglgl's answers above.
Imagine what you could do:
class SmartIterableExample(object):
def create_iterator(self):
# An amazingly powerful yet simple way to create arbitrary
# iterator, utilizing object state (or not, if you are fan
# of functional), magic and nuclear waste--no kittens hurt.
pass # don't forget to add the next() method
def __iter__(self):
return self.create_iterator()
Notes:
I'll repeat: an object that has only next() is not iterable and cannot be used as
a "source" in a for loop. What the for loop primarily needs is __iter__()
(that returns something with next()). This is why well-behaved iterators also define __iter__() returning self.
Of course, for is not the only construct that iterates, so the above applies to some other
constructs as well (comprehensions, map(), ...).
An iterator's next() can raise StopIteration to stop the iteration. It does not have to,
though: it can iterate forever, or the loop can be ended by other means (such as break).
In the above "thought process", _i does not really exist. I've made up that name.
There's a small change in Python 3.x: next() method (not the built-in) now
must be called __next__(). Yes, it should have been like that all along.
You can also think of it like this: the iterable has the data, the iterator pulls the next
item.
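To illustrate the note that next() does not have to raise StopIteration, here is a small sketch of an iterator that never ends (Python 3 spelling; the class name Naturals is made up). The consumer then decides when to stop, for example with itertools.islice.

```python
from itertools import islice


class Naturals:
    """An iterator that never raises StopIteration."""

    def __init__(self):
        self.current = 0

    def __iter__(self):
        # Returning self lets the object be used directly in a for loop.
        return self

    def __next__(self):
        value = self.current
        self.current += 1
        return value


# islice stops pulling items after the first five.
first_five = list(islice(Naturals(), 5))   # [0, 1, 2, 3, 4]
```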
Disclaimer: I'm not a developer of any Python interpreter, so I don't really know what the interpreter "thinks". The musings above are solely demonstration of how I understand the topic from other explanations, experiments and real-life experience of a Python newbie.
An iterable is an object which has an __iter__() method. It can possibly be iterated over several times, as is the case for lists and tuples.
An iterator is the object which iterates. It is returned by an __iter__() method, returns itself via its own __iter__() method and has a next() method (__next__() in 3.x).
Iteration is the process of calling this next() (or __next__(), respectively) until it raises StopIteration.
Example:
>>> a = [1, 2, 3] # iterable
>>> b1 = iter(a) # iterator 1
>>> b2 = iter(a) # iterator 2, independent of b1
>>> next(b1)
1
>>> next(b1)
2
>>> next(b2) # start over, as it is the first call to b2
1
>>> next(b1)
3
>>> next(b1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>> b1 = iter(a) # new one, start over
>>> next(b1)
1
Here's my cheat sheet:
     sequence
  +
  |
  v
   def __getitem__(self, index: int):
  +    ...
  |    raise IndexError
  |
  |
  |              def __iter__(self):
  |             +     ...
  |             |     return <iterator>
  |             |
  |             |
  +--> or <-----+        def __next__(self):
       +        |       +    ...
       |        |       |    raise StopIteration
       v        |       |
    iterable    |       |
           +    |       |
           |    |       v
           |    +----> and +-------> iterator
           |                               ^
           v                               |
   iter(<iterable>) +----------------------+
                                           |
   def generator():                        |
  +    yield 1                             |
  |                 generator_expression +-+
  |                                        |
  +-> generator() +-> generator_iterator +-+
Quiz: Do you see how...
every iterator is an iterable?
a container object's __iter__() method can be implemented as a generator?
an iterable that has a __next__ method is not necessarily an iterator?
Answers:
Every iterator must have an __iter__ method. Having __iter__ is enough to be an iterable. Therefore every iterator is an iterable.
When __iter__ is called it should return an iterator (return <iterator> in the diagram above). Calling a generator returns a generator iterator which is a type of iterator.
class Iterable1:
    def __iter__(self):
        # a method (which is a function defined inside a class body)
        # calling iter() converts iterable (tuple) to iterator
        return iter((1,2,3))
class Iterable2:
    def __iter__(self):
        # a generator
        for i in (1, 2, 3):
            yield i
class Iterable3:
    def __iter__(self):
        # with PEP 380 syntax
        yield from (1, 2, 3)
# passes
assert list(Iterable1()) == list(Iterable2()) == list(Iterable3()) == [1, 2, 3]
Here is an example:
class MyIterable:
    def __init__(self):
        self.n = 0
    def __getitem__(self, index: int):
        return (1, 2, 3)[index]
    def __next__(self):
        n = self.n = self.n + 1
        if n > 3:
            raise StopIteration
        return n
# if you can iter it without raising a TypeError, then it's an iterable.
iter(MyIterable())
# but obviously `MyIterable()` is not an iterator since it does not have
# an `__iter__` method.
from collections.abc import Iterator
assert isinstance(MyIterable(), Iterator) # AssertionError
I don’t know if it helps anybody but I always like to visualize concepts in my head to better understand them. So as I have a little son I visualize iterable/iterator concept with bricks and white paper.
Suppose we are in the dark room and on the floor we have bricks for my son. Bricks of different size, color, does not matter now. Suppose we have 5 bricks like those. Those 5 bricks can be described as an object – let’s say bricks kit. We can do many things with this bricks kit – can take one and then take second and then third, can change places of bricks, put first brick above the second. We can do many sorts of things with those. Therefore this bricks kit is an iterable object or sequence as we can go through each brick and do something with it. We can only do it like my little son – we can play with one brick at a time. So again I imagine myself this bricks kit to be an iterable.
Now remember that we are in the dark room. Or almost dark. The thing is that we don’t clearly see those bricks, what color they are, what shape etc. So even if we want to do something with them – aka iterate through them – we don’t really know what and how because it is too dark.
What we can do is put a piece of white fluorescent paper next to the first brick (the first element of the bricks kit) so that we can see where it is. Each time we take a brick from the kit, we move the white piece of paper to the next brick so that we can still see it in the dark room. This white piece of paper is nothing more than an iterator. It is an object as well, but an object through which we can work and play with the elements of our iterable object, the bricks kit.
That by the way explains my early mistake when I tried the following in an IDLE and got a TypeError:
>>> X = [1,2,3,4,5]
>>> next(X)
Traceback (most recent call last):
File "<pyshell#19>", line 1, in <module>
next(X)
TypeError: 'list' object is not an iterator
List X here was our bricks kit but NOT a white piece of paper. I needed to find an iterator first:
>>> X = [1,2,3,4,5]
>>> bricks_kit = [1,2,3,4,5]
>>> white_piece_of_paper = iter(bricks_kit)
>>> next(white_piece_of_paper)
1
>>> next(white_piece_of_paper)
2
>>>
Don’t know if it helps, but it helped me. If someone could confirm/correct visualization of the concept, I would be grateful. It would help me to learn more.
I don't think that you can get it much simpler than the documentation, however I'll try:
An iterable is something that can be iterated over. In practice it usually means a sequence, i.e. something that has a beginning and an end and some way to go through all the items in it.
You can think of an iterator as a helper pseudo-method (or pseudo-attribute) that gives (or holds) the next (or first) item in the iterable. (In practice it is just an object that defines the method next(), or __next__() in Python 3.)
Iteration is probably best explained by the Merriam-Webster definition of the word :
b : the repetition of a sequence of computer instructions a specified
number of times or until a condition is met — compare recursion
Iterables have a __iter__ method that instantiates a new iterator every time.
Iterators implement a __next__ method that returns individual items, and a __iter__ method that returns self .
Therefore, iterators are also iterable, but iterables are not iterators.
Luciano Ramalho, Fluent Python.
Iterable:- an object that can be iterated over, such as the sequence types (lists, strings, etc.).
It has either a __getitem__ method or an __iter__ method. If we call the iter() function on such an object, we get an iterator.
Iterator:- the object we get back from the iter() function. We call its __next__() method (in Python 3) or simply next() (in Python 2) to get the elements one by one. An instance of a class that provides this method is called an iterator.
From docs:-
The use of iterators pervades and unifies Python. Behind the scenes, the for statement calls iter() on the container object. The function returns an iterator object that defines the method __next__() which accesses elements in the container one at a time. When there are no more elements, __next__() raises a StopIteration exception which tells the for loop to terminate. You can call the __next__() method using the next() built-in function; this example shows how it all works:
>>> s = 'abc'
>>> it = iter(s)
>>> it
<iterator object at 0x00A1DB50>
>>> next(it)
'a'
>>> next(it)
'b'
>>> next(it)
'c'
>>> next(it)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
next(it)
StopIteration
Ex of a class:-
class Reverse:
    """Iterator for looping over a sequence backwards."""
    def __init__(self, data):
        self.data = data
        self.index = len(data)
    def __iter__(self):
        return self
    def __next__(self):
        if self.index == 0:
            raise StopIteration
        self.index = self.index - 1
        return self.data[self.index]
>>> rev = Reverse('spam')
>>> iter(rev)
<__main__.Reverse object at 0x00A1DB50>
>>> for char in rev:
... print(char)
...
m
a
p
s
Iterators are objects that implement the __iter__ and __next__ methods. If those methods are defined, we can use the object in a for loop or a comprehension.
class Squares:
    def __init__(self, length):
        self.length = length
        self.i = 0
    def __iter__(self):
        print('calling __iter__') # this will be called first and only once
        return self
    def __next__(self):
        print('calling __next__') # this will be called for each iteration
        if self.i >= self.length:
            raise StopIteration
        else:
            result = self.i ** 2
            self.i += 1
            return result
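A usage sketch for a class like Squares, assuming Python 3. The class is repeated here without the print calls so that the snippet runs on its own; the variable names are illustrative.

```python
class Squares:
    def __init__(self, length):
        self.length = length
        self.i = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.i >= self.length:
            raise StopIteration
        result = self.i ** 2
        self.i += 1
        return result


sq = Squares(4)
from_loop = [x for x in sq]   # the comprehension calls iter(sq), then next() repeatedly
second_try = list(sq)         # sq is its own iterator and is already used up
```

Here from_loop holds the four squares, while second_try is empty because the same object was consumed by the first pass.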
Iterators get exhausted: after you have iterated over the items, you cannot iterate again; you have to create a new object. Say you have a class that holds a list of cities and you want to iterate over it.
class Cities:
    def __init__(self):
        self._cities = ['Brooklyn', 'Manhattan', 'Prag', 'Madrid', 'London']
        self._index = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self._index >= len(self._cities):
            raise StopIteration
        else:
            item = self._cities[self._index]
            self._index += 1
            return item
An instance of class Cities is an iterator. However, if you want to iterate over the cities again, you have to create a new object, which can be an expensive operation. You can split the class into two classes: one holds the cities, and the other is an iterator that receives the cities object as an init parameter.
class Cities:
    def __init__(self):
        self._cities = ['New York', 'Newark', 'Istanbul', 'London']
    def __len__(self):
        return len(self._cities)
class CityIterator:
    def __init__(self, city_obj):
        # city_obj is an instance of Cities
        self._city_obj = city_obj
        self._index = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self._index >= len(self._city_obj):
            raise StopIteration
        else:
            item = self._city_obj._cities[self._index]
            self._index += 1
            return item
Now if we need a new iterator, we do not have to create the data (the cities) again. We create the Cities object once and pass it to the iterator. But we are still doing extra work: we could implement this with a single outer class.
An iterable is a Python object that implements the iterable protocol. It requires only an __iter__() method that returns a new instance of an iterator object.
class Cities:
    def __init__(self):
        self._cities = ['New York', 'Newark', 'Istanbul', 'Paris']
    def __len__(self):
        return len(self._cities)
    def __iter__(self):
        return self.CityIterator(self)
    class CityIterator:
        def __init__(self, city_obj):
            self._city_obj = city_obj
            self._index = 0
        def __iter__(self):
            return self
        def __next__(self):
            if self._index >= len(self._city_obj):
                raise StopIteration
            else:
                item = self._city_obj._cities[self._index]
                self._index += 1
                return item
Iterators have __iter__ and __next__; iterables have __iter__. So we can say iterators are also iterables, but they are iterables that get exhausted. Iterables like Cities, on the other hand, never become exhausted, because they always return a new iterator that is then used to iterate.
Notice that the main part of the iterable code is in the iterator, and the iterable itself is nothing more than an extra layer that allows us to create and access the iterator.
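A shorter variant, offered as a sketch rather than as what the code above does: if __iter__ is written as a generator function, Python creates the iterator object for you and no separate CityIterator class is needed. The city names are just sample data.

```python
class Cities:
    def __init__(self):
        self._cities = ['New York', 'Newark', 'Istanbul', 'Paris']

    def __len__(self):
        return len(self._cities)

    def __iter__(self):
        # Each call creates a new generator iterator over the same list.
        yield from self._cities


cities = Cities()
first_pass = list(cities)
second_pass = list(cities)   # the iterable is not exhausted by the first pass

one_shot = iter(cities)
consumed = list(one_shot)
leftover = list(one_shot)    # the iterator, by contrast, is exhausted
```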
Iterating over an iterable
Python has a built-in function iter() which calls __iter__(). When we iterate over an iterable, Python calls iter(), which returns an iterator, and then uses the iterator's __next__() to iterate over the data.
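The two steps just described can be written out by hand. This is only a rough sketch of what a for loop does, not the interpreter's actual implementation; the helper name manual_for is made up.

```python
def manual_for(iterable, body):
    """Roughly what `for item in iterable: body(item)` does."""
    iterator = iter(iterable)          # calls iterable.__iter__()
    while True:
        try:
            item = next(iterator)      # calls iterator.__next__()
        except StopIteration:
            break                      # the loop ends quietly
        body(item)


seen = []
manual_for(['New York', 'Newark'], seen.append)
```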
Note that in the Cities example above, Cities is an iterable but not a sequence type, which means we cannot get a city by index. To fix this we just add __getitem__ to the Cities class.
class Cities:
    def __init__(self):
        self._cities = ['New York', 'Newark', 'Budapest', 'Newcastle']
    def __len__(self):
        return len(self._cities)
    def __getitem__(self, s): # now a sequence type
        return self._cities[s]
    def __iter__(self):
        return self.CityIterator(self)
    class CityIterator:
        def __init__(self, city_obj):
            self._city_obj = city_obj
            self._index = 0
        def __iter__(self):
            return self
        def __next__(self):
            if self._index >= len(self._city_obj):
                raise StopIteration
            else:
                item = self._city_obj._cities[self._index]
                self._index += 1
                return item
iterable = [1, 2]
iterator = iter(iterable)
print(iterator.__next__())
print(iterator.__next__())
So:
An iterable is an object that can be looped over, e.g. a list, string, tuple, etc.
Using the iter function on our iterable object returns an iterator object.
This iterator object has a method named __next__ (in Python 3, or just next in Python 2) with which you can access each element of the iterable.
The output of the above code will be:
1
2
An iterable is an object that has an __iter__() method which returns an iterator. It is something that can be looped over.
Example: a list is iterable because we can loop over it, BUT it is not an iterator.
An iterator is the object you get from an iterable by calling iter() on it. It is an object with state, so that it remembers where it is during iteration.
To see whether an object has the __iter__() method, we can use the dir() function:
ls = ['hello','bye']
print(dir(ls))
Output
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
As you can see, the list has the __iter__() method, which means that it is an iterable object, but it does not have the __next__() method, which is a feature of an iterator object.
Whenever you use a for loop, map, or a list comprehension in Python, the __next__ method is called automatically to get each item from the iterator.
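Instead of reading through the whole dir() listing, the same check can be made directly with hasattr. A small sketch, assuming Python 3 method names; the variable names are illustrative.

```python
ls = ['hello', 'bye']
it = iter(ls)

list_has_iter = hasattr(ls, '__iter__')   # True: a list is iterable
list_has_next = hasattr(ls, '__next__')   # False: a list is not an iterator
iter_has_iter = hasattr(it, '__iter__')   # True
iter_has_next = hasattr(it, '__next__')   # True: the list iterator is an iterator
```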
Before dealing with iterables and iterators, it helps to start from the sequence.
Sequence: a sequence is an ordered collection of data.
Iterable: iterables are objects, such as the sequence types, that support the __iter__ method.
iter function: the iter function takes an iterable (here, a sequence) as input and creates an object known as an iterator.
Iterator: an iterator is the object whose next method is called to traverse the sequence. Each call to the next method returns the element it has just reached.
example:
x=[1,2,3,4]
x is a sequence which consists of collection of data
y=iter(x)
Calling iter(x) returns an iterator only when the x object has an __iter__ method (or a suitable __getitem__ method); otherwise it raises a TypeError. If it returns an iterator, then y is an iterator over the same data (y is not itself a list):
y -> 1, 2, 3, 4
As y is an iterator, it supports the next() method (Python 2; in Python 3 call next(y) instead).
Calling the next method returns the individual elements of the list one by one.
After the last element of the sequence has been returned, calling the next method again raises a StopIteration exception.
example:
>>> y.next()
1
>>> y.next()
2
>>> y.next()
3
>>> y.next()
4
>>> y.next()
StopIteration
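The session above uses the Python 2 spelling y.next(). A rough Python 3 equivalent, as a sketch, uses the built-in next(); its optional second argument is returned instead of raising StopIteration.

```python
x = [1, 2, 3, 4]
y = iter(x)                           # y is a list iterator, not a list

first = next(y)                       # 1
rest = [next(y), next(y), next(y)]    # [2, 3, 4]

try:
    next(y)
    exhausted = False
except StopIteration:
    exhausted = True                  # nothing left to return

fallback = next(y, 'no more items')   # the default is returned instead of raising
```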
Other people already explained comprehensively, what is iterable and iterator, so I will try to do the same thing with generators.
IMHO the main problem for understanding generators is a confusing use of the word “generator”, because this word is used in 2 different meanings:
as a tool for creating (generating) iterators,
in the form of a function returning an iterator (i.e. with the yield statement(s) in its body),
in the form of a generator expression
as a result of the use of that tool, i.e. the resulting iterator.
(In this meaning a generator is a special form of an iterator — the word “generator” points out how this iterator was created.)
Generator as a tool of the 1st type:
In[2]: def my_generator():
...: yield 100
...: yield 200
In[3]: my_generator
Out[3]: <function __main__.my_generator()>
In[4]: type(my_generator)
Out[4]: function
Generator as a result (i.e. an iterator) of the use of this tool:
In[5]: my_iterator = my_generator()
In[6]: my_iterator
Out[6]: <generator object my_generator at 0x00000000053EAE48>
In[7]: type(my_iterator)
Out[7]: generator
Generator as a tool of the 2nd type — indistinguishable from the resulting iterator of this tool:
In[8]: my_gen_expression = (2 * i for i in (10, 20))
In[9]: my_gen_expression
Out[9]: <generator object <genexpr> at 0x000000000542C048>
In[10]: type(my_gen_expression)
Out[10]: generator
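The two meanings can also be told apart programmatically. The sketch below uses inspect.isgeneratorfunction and inspect.isgenerator from the standard library; the variable names on the left are only illustrative.

```python
import inspect


def my_generator():
    yield 100
    yield 200


my_iterator = my_generator()
my_gen_expression = (2 * i for i in (10, 20))

tool_is_genfunc = inspect.isgeneratorfunction(my_generator)   # True: the tool
tool_is_gen = inspect.isgenerator(my_generator)               # False: not the result
result_is_gen = inspect.isgenerator(my_iterator)              # True: the result
genexpr_is_gen = inspect.isgenerator(my_gen_expression)       # True: also a result

values = list(my_iterator) + list(my_gen_expression)
```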
Here's another view using collections.abc. This view may be useful the second time around or later.
From collections.abc we can see the following hierarchy:
builtins.object
Iterable
Iterator
Generator
i.e. Generator is derived from Iterator is derived from Iterable is derived from the base object.
Hence,
Every iterator is an iterable, but not every iterable is an iterator. For example, [1, 2, 3] and range(10) are iterables, but not iterators. x = iter([1, 2, 3]) is an iterator and an iterable.
A similar relationship exists between Iterator and Generator.
Calling iter() on an iterator or a generator returns itself. Thus, if it is an iterator, then iter(it) is it is True.
Under the hood, a list comprehension like [2 * x for x in nums] or a for loop like for x in nums:, acts as though iter() is called on the iterable (nums) and then iterates over nums using that iterator. Hence, all of the following are functionally equivalent (with, say, nums=[1, 2, 3]):
for x in nums:
for x in iter(nums):
for x in iter(iter(nums)):
for x in iter(iter(iter(iter(iter(nums))))):
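These claims are easy to check. A small sketch using the abstract base classes from collections.abc; the names results and hierarchy_holds are made up.

```python
from collections.abc import Generator, Iterable, Iterator

# Generator derives from Iterator, which derives from Iterable.
hierarchy_holds = (issubclass(Generator, Iterator)
                   and issubclass(Iterator, Iterable))

nums = [1, 2, 3]
range_is_iterator = isinstance(range(10), Iterator)   # False: iterable only

# Wrapping the iterable in extra iter() calls does not change the result.
results = [
    [2 * x for x in nums],
    [2 * x for x in iter(nums)],
    [2 * x for x in iter(iter(nums))],
]
```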
For me, Python's glossary was most helpful for these questions, e.g. for iterable it says:
An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an __iter__() method or with a __getitem__() method that implements Sequence semantics.
Iterables can be used in a for loop and in many other places where a sequence is needed (zip(), map(), …). When an iterable object is passed as an argument to the built-in function iter(), it returns an iterator for the object. This iterator is good for one pass over the set of values. When using iterables, it is usually not necessary to call iter() or deal with iterator objects yourself. The for statement does that automatically for you, creating a temporary unnamed variable to hold the iterator for the duration of the loop. See also iterator, sequence, and generator.
I have a dummy example of an iterator container below (the real one reads a file too large to fit in memory):
class DummyIterator:
    def __init__(self, max_value):
        self.max_value = max_value
    def __iter__(self):
        for i in range(self.max_value):
            yield i
def regular_dummy_iterator(max_value):
    for i in range(max_value):
        yield i
This allows me to iterate over the value more than once so that I can implement something like this:
def normalise(data):
    total = sum(i for i in data)
    for val in data:
        yield val / total
# this works when I call next()
normalise(DummyIterator(100))
# this doesn't work when I call next()
normalise(regular_dummy_iterator(100))
How do I check in the normalise function that I am being passed an iterator container rather than a normal generator?
First of all: There is no such thing as an iterator container. You have an iterable.
An iterable produces an iterator. Any iterator is also an iterable, but produces itself as the iterator:
>>> list_iter = iter([])
>>> iter(list_iter) is list_iter
True
You don't have an iterator if the iter(ob) is ob test is false.
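Applied to the normalise function from the question, the test could look like the sketch below. The choice of TypeError and its message are assumptions, and DummyIterable is a stand-in for the questioner's class.

```python
class DummyIterable:
    def __init__(self, max_value):
        self.max_value = max_value

    def __iter__(self):
        yield from range(1, self.max_value + 1)


def normalise(data):
    # Because this is a generator function, the check runs when the
    # first value is requested, not when normalise() is called.
    if iter(data) is data:
        raise TypeError('need a re-iterable object, got a one-shot iterator')
    total = sum(data)
    for val in data:
        yield val / total


shares = list(normalise(DummyIterable(4)))   # the values 1..4 sum to 10

try:
    list(normalise(iter([1, 2, 3])))
    rejected = False
except TypeError:
    rejected = True                          # the iterator was refused
```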
You can test whether you have an iterator (which is consumed once next raises the StopIteration exception) or just an iterable (which can probably be iterated over multiple times) by using the collections.abc module. Here is an example:
from collections.abc import Iterable, Iterator
def my_iterator():
    yield 1
i = my_iterator()
a = []
isinstance(i, Iterator) # True
isinstance(a, Iterator) # False
What makes my_iterator() an Iterator is the presence of both the __next__ and __iter__ magic methods (and by the way, basically what is happening behind the scenes when you call isinstance on a collections.abc abstract base class is a test for the presence of certain magic methods).
Note that an iterator is also an Iterable, as is the empty list (i.e., both have the __iter__ magic method):
isinstance(i, Iterable) # True
isinstance(a, Iterable) # True
Also note, as was pointed out in Martijn Pieters' answer, that when you apply the generic iter() function to both, you get an iterator:
isinstance(iter(my_iterator()), Iterator) # True
isinstance(iter([]), Iterator) # True
The difference here between [] and my_iterator() is that iter(my_iterator()) returns itself as the iterator, whereas iter([]) produces a new iterator every time.
As was already mentioned in MP's same answer, your object above is not an "iterator container." It is an iterable object, i.e., "an iterable". Whether or not it "contains" something isn't really related; the concept of containing is represented by the abstract base class Container. A Container may be iterable, but it doesn't necessarily have to be.
I've learnt from the official documentation of python 2.7.8 how to work with iterators and generators. I've got a question related on a curiosity.
it = iter("abcde")
print it
>>> <iterator object at 0x7ff4c2b3bad0>
class example1():
    def __init__(self, word):
        self.word = word
        self.index = len(word)
    def __iter__(self):
        for x in range(self.index - 1, -1, -1):
            yield self.word[x]
a = example1("altalena")
print iter(a)
>>> <generator object __iter__ at 0x7f24712000a0>
In the above examples, when I print the iterators, I read "generator" or "iterator" object and the hexadecimal ID. Why can't I get the same with the following code?
class example2():
    def __init__(self, word):
        self.word = word
        self.index = len(word)
    def __iter__(self):
        return self
    def next(self):
        if self.index == 0:
            raise StopIteration
        self.index = self.index - 1
        return self.word[self.index]
a = example2("altalena")
print iter(a)
>>> <__main__.example2 instance at 0x7f89ee2de440>
I think it is caused by "return self" in __iter__, which leads back to the class instance, but I don't know how to get a more fitting output. It may be useless, but I don't know why it happens and how to avoid it.
Generators are a type of iterator. Your example1 class is returning a generator because you used yield in the __iter__; it is a separate iterator object, returned from __iter__. This makes your example1 class iterable, not an iterator itself.
In your second example your class __iter__ returns self. It is not just iterable, it is its own iterator. Because it is returning self, there is no separate object here.
Python makes an explicit distinction between iterable and iterator. An iterable can potentially be iterated over. An iterator does the actual work when iterating. By using iter() you ask Python to produce an iterator for a given iterable.
This is why iter(stringobject) returns a new object; you have produced an iterator from the iterable string.
You need this distinction because the process of iterating requires something that keeps state; namely where in the process are we now. An iterator is the object that keeps track of that, so that each time you call the .next() method on the iterator, you get the next value, or StopIteration is raised if there is no next value to produce anymore.
So in your example, the string and example1 are both just iterable, and calling iter() on them produced a new separate iterator.
However, your example2 is its own iterator. Calling iter() on it does not produce a separate object. You cannot create separate iterators from something that already is an iterator.
If you want to produce a new independent iterator for your example2 class, you need to return a distinct, separate object from __iter__:
class ExampleIterator(object):
    def __init__(self, source):
        self.source = source
        self.position = len(source)
    def __iter__(self):
        return self
    def next(self):
        if self.position <= 0:
            raise StopIteration
        self.position -= 1
        return self.source[self.position]
class Example2():
    def __init__(self, word):
        self.word = word
    def __iter__(self):
        # pass the word itself: the iterator calls len() and indexes into it
        return ExampleIterator(self.word)
Here Example2 is just an iterable again, not an iterator. The __iter__ method returns a new, distinct ExampleIterator() instance, and it is responsible for keeping track of the iteration state.
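For readers on Python 3, here is a sketch of the same pair of classes with next() renamed to __next__(), followed by a short check that two iterators obtained from one Example2 object keep separate positions. The variable names are illustrative, and the iterator is given the word itself as its source.

```python
class ExampleIterator:
    def __init__(self, source):
        self.source = source
        self.position = len(source)

    def __iter__(self):
        return self

    def __next__(self):               # spelled next() in Python 2
        if self.position <= 0:
            raise StopIteration
        self.position -= 1
        return self.source[self.position]


class Example2:
    def __init__(self, word):
        self.word = word

    def __iter__(self):
        return ExampleIterator(self.word)


a = Example2("abc")
it1 = iter(a)
it2 = iter(a)
first_from_it1 = next(it1)     # 'c'
second_from_it1 = next(it1)    # 'b'
first_from_it2 = next(it2)     # 'c': it2 keeps its own position
```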
I think you don't have a problem, you just don't understand what happens here.
In general, iter(object) returns an iterator for this iterable object.
This is obtained either via calling __iter__() or, if that doesn't exist, by providing a wrapper object which calls __getitem__() until it is exhausted (raises IndexError).
The object returned by __iter__() can either be the iterable itself (like in your 2nd example) or something else. In particular, it can be a generator object if you make __iter__() a generator function as you do in your 1st example.
The iteration itself happens with the iterator object and its next() method (__next__() in Python 3).
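The __getitem__() fallback mentioned above can be seen with a class that defines no __iter__() at all. A minimal sketch in Python 3; the class name Letters is made up.

```python
class Letters:
    """No __iter__ here, only __getitem__ with 0-based integer indexes."""

    def __init__(self, text):
        self._text = text

    def __getitem__(self, index):
        return self._text[index]   # raises IndexError past the end


letters = Letters("abc")
wrapper = iter(letters)            # Python supplies a wrapper iterator
as_list = list(letters)            # ['a', 'b', 'c']
wrapper_is_own_iterator = iter(wrapper) is wrapper
```

The wrapper calls __getitem__ with 0, 1, 2, ... and turns the eventual IndexError into the end of the iteration.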