For the iterator protocol you create both an __iter__ and __next__ method. However, what about the following case:
class Item:
    def __init__(self):
        self.name = 'James'
    def __iter__(self):
        return self
Now I can do:
>>> i=Item()
>>> iter(i)
<__main__.Item instance at 0x10bfe6e18>
But not:
>>> next(i)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: instance has no next() method
As far as I'm aware, the definition of iterator/iterable is:
Iterable has the method __iter__
Iterator has the method __next__
Would this then mean that my item above is an Iterable but not an Iterator? Or would it be neither because doing the following wouldn't work:
>>> for item in i:
...     print(item)
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: instance has no next() method
Note this would be the full class that has the iterator methods defined:
class Item:
    def __init__(self):
        self.name = 'James'
        self.i = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self.i >= len(self.name):
            raise StopIteration
        value = self.name[self.i]
        self.i += 1
        return value
You're mostly right with your definitions, but not quite.
An object is iterable if it has an __iter__ method that returns an iterator.
An object is an iterator if it has a __next__ method to get the next value while iterating. But iterators in Python are also expected to be iterable. They should all have an __iter__ method that returns self.
Your first example has an __iter__ method, but because it returns self and the object is not an iterator (since it has no __next__ method), it's not really a valid iterable either.
To make a non-iterator iterable, you need to return some other object that is a valid iterator. One sneaky way to do it is to make __iter__ a generator method (by using yield in its implementation). But if you have some sequence of values to return, you could also just return an iterator over that sequence.
The class in your last code block is indeed an iterator. But if you wanted to make it an iterable that is not its own iterator (perhaps because you want to be able to iterate over it several times), you would probably want something more like this:
class Item:
    def __init__(self):
        self.name = 'James'
    def __iter__(self):
        return iter(self.name)
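The generator-method approach mentioned above could look roughly like the following sketch. The class name NameLetters is made up for illustration. Because __iter__ contains a yield, every call returns a fresh generator, so the object can be looped over more than once.

```python
class NameLetters:  # hypothetical name, not from the question
    def __init__(self):
        self.name = 'James'

    def __iter__(self):
        # The yield makes __iter__ a generator function: each call
        # returns a new generator object, which is a valid iterator.
        for letter in self.name:
            yield letter


letters = NameLetters()
first_pass = list(letters)
second_pass = list(letters)  # a second loop starts from the beginning again
```

Here the instance is iterable but is not its own iterator: iter(letters) hands back a separate generator object each time.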
__iter__ and __next__ belong to different things: the iterable and the iterator. Although it is possible to have both methods on the same class, with __iter__ returning self, this works only as a proof of concept, not in production code.
An iterable will have an __iter__ method that returns an object that has __next__. If both are the same instances as in
# this is just a demo, with faulty code
class Item:
    def __init__(self, data):
        self.data = data
    def __iter__(self):
        self.counter = 0
        return self
    def __next__(self):
        self.counter += 1
        if self.counter > len(self.data):
            raise StopIteration()
        return self.data[self.counter - 1]
This will work - but if you try to create two independent iterators on the same instance of Item, they won't work as desired - since both would share the same counter - the attribute counter in the instance.
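To make the shared-counter problem concrete, here is a small sketch that reuses the same logic under a hypothetical class name (SharedCounterItem) and nests two loops over one instance. With independent iterators, the nested loops would produce four pairs.

```python
class SharedCounterItem:  # hypothetical name; same logic as the demo class above
    def __init__(self, data):
        self.data = data

    def __iter__(self):
        self.counter = 0  # the iteration state lives on the instance itself
        return self

    def __next__(self):
        self.counter += 1
        if self.counter > len(self.data):
            raise StopIteration()
        return self.data[self.counter - 1]


item = SharedCounterItem("ab")

# The inner loop resets and then exhausts the one shared counter,
# so the outer loop stops after its first element.
pairs = [(x, y) for x in item for y in item]

# What independent iterators would give, using a plain string for comparison.
expected = [(x, y) for x in "ab" for y in "ab"]

print(pairs)     # [('a', 'a'), ('a', 'b')]
print(expected)  # [('a', 'a'), ('a', 'b'), ('b', 'a'), ('b', 'b')]
```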
It is rare, however, that one needs to implement __next__: if __iter__ is written as a generator function, using yield instead of returning self, it will just work. Python will call __next__ on the generator that is created automatically by each call to __iter__:
# this works
class Item:
    def __init__(self, data):
        self.data = data
    def __iter__(self):
        for item in self.data:
            yield item
As you can see, the correct way is "nextless" and much simpler, and can also be implemented, in this case, by returning an independent iterator for the data. (The yield implementation is needed if getting to each item requires some custom computation)
# this also works
class Item:
    def __init__(self, data):
        self.data = data
    def __iter__(self):
        return iter(self.data)
(In this case, Python will call __next__ on the iterator created for self.data.)
If you really want to implement __next__, the object with that method has to keep track of any counters or pointers needed to retrieve the next items, and that state must be independent of the host instance. The most straightforward way to do that is to have a second class, related to your first one, and have __iter__ return an instance of that instead:
# working "full" example
class Item:
    def __init__(self, data):
        self.data = data
    def __iter__(self):
        return ItemIterator(self)
class ItemIterator:
    def __init__(self, item):
        self.item = item
        self.counter = 0
    def __iter__(self):
        # iterators are expected to be iterable too, returning themselves
        return self
    def __next__(self):
        self.counter += 1
        if self.counter > len(self.item.data):
            raise StopIteration()
        return self.item.data[self.counter - 1]
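As a rough check of the two-class design, the sketch below repeats the classes so it runs on its own and pulls values from two iterators created from the same Item. The sample data is arbitrary.

```python
class Item:
    def __init__(self, data):
        self.data = data

    def __iter__(self):
        return ItemIterator(self)  # a new iterator object on every call


class ItemIterator:
    def __init__(self, item):
        self.item = item
        self.counter = 0  # the state belongs to the iterator, not to the Item

    def __iter__(self):
        return self

    def __next__(self):
        self.counter += 1
        if self.counter > len(self.item.data):
            raise StopIteration()
        return self.item.data[self.counter - 1]


item = Item([10, 20, 30])
first, second = iter(item), iter(item)

a = next(first)   # first iterator advances to 10
b = next(first)   # ... then to 20
c = next(second)  # second iterator is unaffected and starts at 10

pairs = [(x, y) for x in item for y in item]  # nested loops see all 9 pairs
```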
Would this then mean that my item above is an Iterable but not an Iterator?
No; to be iterable, the __iter__ method should return an iterator. Yours doesn't.
According to the glossary in the official Python docs:
Iterable
An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an __iter__() method or with a __getitem__() method that implements Sequence semantics.
Iterables can be used in a for loop and in many other places where a sequence is needed (zip(), map(), …).
Since your item is not "capable of returning its members one at a time", and cannot "be used in a for loop [or] other places where a sequence is needed", it is not iterable.
Note also that the __next__ method is not sufficient to be an iterator; an iterator must also have an __iter__ method which returns itself:
Iterator objects also need to implement this method; they are required to return themselves.
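In Python 3 these two rules can be probed with the abstract base classes in collections.abc. The sketch below uses made-up class names. Note that the isinstance checks are purely structural: they only look for the method names, so a class whose __iter__ returns a non-iterator still passes the Iterable check even though iter() rejects it.

```python
from collections.abc import Iterable, Iterator


class OnlyIter:  # hypothetical: has __iter__ but no __next__
    def __iter__(self):
        return self


class OnlyNext:  # hypothetical: has __next__ but no __iter__
    def __next__(self):
        raise StopIteration


class Both:  # hypothetical: a complete (empty) iterator
    def __iter__(self):
        return self

    def __next__(self):
        raise StopIteration


print(isinstance(OnlyIter(), Iterable))  # True (structural check only)
print(isinstance(OnlyIter(), Iterator))  # False
print(isinstance(OnlyNext(), Iterator))  # False: __next__ alone is not enough
print(isinstance(Both(), Iterator))      # True

# iter() itself verifies that __iter__ returned something with __next__.
try:
    iter(OnlyIter())
    iter_failed = False
except TypeError:
    iter_failed = True
```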
I want to ask how iterators in Python are designed. I have read that an iterator returns the values of the iterable it represents one at a time, each time __next__ or next() is called, so it does not need to copy the full iterable and occupy extra memory. How exactly can it return single values of a sequence object such as a list or string, or of a mapping object such as a dictionary? Does it store pointers to the contents of the original data structure and have a method called next that increments the pointer?
Thanks
In Python 3, however, zip() returns an iterator. This object yields
tuples on demand and can be traversed only once. The iteration ends
with a StopIteration exception once the shortest input iterable is
exhausted. If you supply no arguments to zip(), then the function
returns an empty iterator:
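A short sketch of the three behaviours just described, run under Python 3 with arbitrary sample values:

```python
pairs = zip([1, 2, 3], "ab")

first_pass = list(pairs)   # stops as soon as the shorter input, "ab", runs out
second_pass = list(pairs)  # the zip object is already exhausted
empty = list(zip())        # no arguments: an empty iterator

print(first_pass)   # [(1, 'a'), (2, 'b')]
print(second_pass)  # []
print(empty)        # []
```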
An iterator is just an object whose class defines __next__. That's it (though all iterators should also be iterable, which means defining __iter__ = lambda self: self as well). It's entirely how __next__ is defined that determines the behavior of the iterator.
You can create an iterator over a constant sequence
class Ones:
    def __iter__(self):
        return self
    def __next__(self):
        return 1
or you can create an iterator that walks through some iterable value. It's the iterator that keeps track of which value to return next, from the values supplied by the iterable. Here's a simplified version of list_iterator, the built-in type of the objects that list.__iter__ returns.
class list_iterator:
    def __init__(self, lst):
        self.lst = lst
        self.i = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self.i == len(self.lst):
            raise StopIteration
        x = self.lst[self.i]
        self.i += 1
        return x
x = [1,2,3]
# prints 1, then 2, then 3
for v in list_iterator(x):
    print(v)
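One way to see that the built-in list iterator keeps a reference and a position rather than a copy is to change the list while an iterator over it is still alive. This is a small experiment assuming CPython's behaviour; the variable names are illustrative.

```python
data = [1, 2, 3]
it = iter(data)      # no copy is made; the iterator refers to the same list

first = next(it)     # consumes the element at position 0
data.append(4)       # mutate the underlying list afterwards

rest = list(it)      # the iterator picks up the appended element as well

print(first)  # 1
print(rest)   # [2, 3, 4]
```

Relying on this in real code is fragile, since mutating a list during iteration is easy to get wrong, but it does show where the iterator gets its values from.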
I am learning python these days and have a small question as follows:
Define a class drop_first that has only two methods, __init__ and __iter__.
The constructor of drop_first has parameter iterable, to which an iterable is given.
Method __init__ applies function iter to iterable and records the resulting iterator in some attribute. It then calls function next on the iterator once.
Method __iter__ simply returns the iterator recorded in the attribute.
For example, the instance created by drop_first([1,2,3]) allows iteration skipping the first element 1.
for x in drop_first([1,2,3]): print(x)
2 and 3 will be printed.
I answered with the following code:
class drop_first:
    def __init__(self, iterable):
        self.iterable = iterable
        self.index = 1
    def __iter__(self):
        return self
    def __next__(self):
        try:
            result = self.iterable[self.index]
        except IndexError:
            raise StopIteration
        self.index += 1
        return result
Nevertheless, I found that the class was required to have only two methods, and I am a little confused about the specifications of those two methods. So I am wondering if anyone could explain the requirements of the class. Thank you.
Define a class drop_first that has only two methods, __init__ and __iter__
Bad instruction. This class should be called DropFirst. But so be it.
The constructor of drop_first has parameter iterable, to which an iterable is given.
def __init__(self, iterable):
Method __init__ applies function iter to iterable and records the resulting iterator in some attribute. It then calls function next on the iterator once.
    self.iterator = iter(iterable)
    next(self.iterator)
Method __iter__ simply returns the iterator recorded in the attribute.
def __iter__(self):
    return self.iterator
This results in
class DropFirst:  # or, if your instructor insists on it, drop_first
    def __init__(self, iterable):
        self.iterator = iter(iterable)
        next(self.iterator)
    def __iter__(self):
        return self.iterator
But, to be honest, I'd take the following approach (which breaks the specifications, but is more versatile):
class DropFirst:
    def __init__(self, iterable):
        self.iterable = iterable
    def __iter__(self):
        iterator = iter(self.iterable)
        next(iterator)
        return iterator
This works in all cases the original one also does, but can be iterated over several times.
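The difference between the two versions shows up on the second pass. In the sketch below they are given distinct, made-up names (DropFirstOnce and DropFirstReusable) so that both can live in one file.

```python
class DropFirstOnce:  # hypothetical name for the to-the-letter version
    def __init__(self, iterable):
        self.iterator = iter(iterable)
        next(self.iterator)  # the first element is skipped once, at construction

    def __iter__(self):
        return self.iterator  # always the same, possibly exhausted, iterator


class DropFirstReusable:  # hypothetical name for the more versatile version
    def __init__(self, iterable):
        self.iterable = iterable

    def __iter__(self):
        iterator = iter(self.iterable)  # a fresh iterator for every loop
        next(iterator)
        return iterator


once = DropFirstOnce([1, 2, 3])
once_passes = [list(once), list(once)]

reusable = DropFirstReusable([1, 2, 3])
reusable_passes = [list(reusable), list(reusable)]

print(once_passes)      # [[2, 3], []]
print(reusable_passes)  # [[2, 3], [2, 3]]
```

The reusable version only helps when the wrapped object is itself re-iterable, such as a list. If it is handed a one-shot iterator, the second pass has nothing left to restart from.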
In Python 3, it is standard procedure to make a class an iterable and an iterator at the same time by defining both the __iter__ and __next__ methods. But I have trouble wrapping my head around this. Take this example, which creates an iterator that produces only even numbers:
class EvenNumbers:
    def __init__(self, max_):
        self.max_ = max_
    def __iter__(self):
        self.n = 0
        return self
    def __next__(self):
        if self.n <= self.max_:
            result = 2 * self.n
            self.n += 1
            return result
        raise StopIteration
instance = EvenNumbers(4)
for entry in instance:
    print(entry)
To my knowledge (correct me if I'm wrong), when I create the loop, an iterator is created by calling something like itr = iter(instance) which internally calls the __iter__ method. This is expected to return an iterator object (which the instance is due to defining __next__ and therefore I can just return self). To get an element from it, next(itr) is called until the exception is raised.
My question here is now: if and how can __iter__ and __next__ be separated, so that the content of the latter function is defined somewhere else? And when could this be useful? I know that I have to change __iter__ so that it returns an iterator.
Btw the idea to do this comes from this site (LINK), which does not state how to implement this.
It sounds like you're confusing iterators and iterables. Iterables have an __iter__ method which returns an iterator. Iterators have a __next__ method which either returns their next value or raises StopIteration. Now in Python, it is stated that iterators are also iterables (but not vice versa) and that iter(iterator) is iterator, so an iterator, itr, should return only itself from its __iter__ method.
Iterators are required to have an __iter__() method that returns the iterator object itself so every iterator is also iterable and may be used in most places where other iterables are accepted
In code:
class MyIter:
    def __iter__(self):
        return self
    def __next__(self):
        # actual iterator logic
        ...
If you want to make a custom iterator class, the easiest way is to inherit from collections.abc.Iterator which you can see defines __iter__ as above (it is also a subclass of collections.abc.Iterable). Then all you need is
import collections.abc

class MyIter(collections.abc.Iterator):
    def __next__(self):
        ...
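For instance, a concrete subclass might look like the following sketch. Countdown is a made-up example class; only __next__ is written by hand, and __iter__ is inherited from collections.abc.Iterator.

```python
import collections.abc


class Countdown(collections.abc.Iterator):  # hypothetical example class
    def __init__(self, start):
        self.current = start

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


countdown = Countdown(3)
same_object = iter(countdown) is countdown  # inherited __iter__ returns self
values = list(countdown)

print(same_object)  # True
print(values)       # [3, 2, 1]
```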
There is of course a much easier way to make an iterator, and that's with a generator function
import itertools

def fib():
    a = 1
    b = 1
    yield a
    yield b
    while True:
        b, a = a + b, b
        yield b
list(itertools.takewhile(lambda x: x < 100, fib()))
# --> [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
Just for reference, this is (simplified) code for an abstract iterator and iterable
from abc import ABC, abstractmethod
class Iterable(ABC):
    @abstractmethod
    def __iter__(self):
        'Returns an instance of Iterator'
        pass
class Iterator(Iterable, ABC):
    @abstractmethod
    def __next__(self):
        'Return the next item from the iterator. When exhausted, raise StopIteration'
        pass
    # overrides Iterable.__iter__
    def __iter__(self):
        return self
I think I have grasped the concept now, even if I do not fully understand the passage from the documentation by @FHTMitchell. I came across an example on how to separate the two methods and wanted to document this.
What I found is a very basic tutorial that clearly distinguishes between the iterable and the iterator (which is the cause of my confusion).
Basically, you define your iterable first as a separate class:
class EvenNumbers:
    def __init__(self, max_):
        self.max = max_
    def __iter__(self):
        self.n = 0
        return EvenNumbersIterator(self)
The __iter__ method only needs to return an object that has a __next__ method defined. Therefore, you can do this:
class EvenNumbersIterator:
    def __init__(self, source):
        self.source = source
    def __next__(self):
        if self.source.n <= self.source.max:
            result = 2 * self.source.n
            self.source.n += 1
            return result
        else:
            raise StopIteration
This separates the iterator part from the iterable class. It now makes sense that if I define __next__ within the iterable class, I have to return the reference to the instance itself as it basically does 2 jobs at once.
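One caveat with the split above is that the counter n still lives on the EvenNumbers instance, so two loops over the same instance would share it. A possible variation, offered as a sketch rather than as the tutorial's own code, moves the counter into the iterator so that every loop gets independent state:

```python
class EvenNumbers:
    def __init__(self, max_):
        self.max = max_

    def __iter__(self):
        # Pass only what the iterator needs; no counter is kept here.
        return EvenNumbersIterator(self.max)


class EvenNumbersIterator:
    def __init__(self, max_):
        self.max = max_
        self.n = 0  # per-iterator state

    def __iter__(self):
        return self  # iterators are expected to return themselves

    def __next__(self):
        if self.n > self.max:
            raise StopIteration
        result = 2 * self.n
        self.n += 1
        return result


evens = EvenNumbers(2)
values = list(evens)
pairs = [(a, b) for a in evens for b in evens]  # nested loops do not interfere

print(values)      # [0, 2, 4]
print(len(pairs))  # 9
```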
I have a class that looks like this
class myClass:
    def __init__(self, key, value):
        self.key = key
        self.value = value
where key is a string and value is always a list of elements of myClass, possibly empty.
I want to define my own __iter__ method that returns item.key for each item in value. I tried
    def __iter__(self):
        return self
    def __next__(self):
        try:
            self.value.next().key
        except:
            raise StopIteration
but it's looping forever. What am I doing wrong?
Also if I want to have Python 2 / 3 compatibility, should I add the method
    def next(self):
        return self.__next__()
There's no reason for you to implement __next__. You can use __iter__ to return a generator which will do what you want.
class Pair(object):
    def __init__(self, key, value):
        self.key = key
        self.value = value
    def __iter__(self):
        return (v.key for v in self.value)
    # alternative iter function, that allows more complicated logic
    def __iter__(self):
        for v in self.value:
            yield v.key
p = Pair("parent", [Pair("child0", "value0"), Pair("child1", "value1")])
assert list(p) == ["child0", "child1"]
This way of doing things is compatible with both python2 and python3 as the returned generator will have the required next function in python2, and __next__ in python3.
You need to extract and preserve an iterator on list self.value -- you can't just call next on a list, you need an iterator on such a list.
So, you need an auxiliary iterator class:
class myClassIter(object):
    def __init__(self, theiter):
        self.theiter = theiter
    def __next__(self):
        return next(self.theiter).key
    next = __next__
which I've also made Py 2/3 compatible with the object base and appropriate aliasing.
Here, I'm assuming every item in the list has a key attribute (so the only expected exception is StopIteration, which you can just propagate). If that is not the case, and you want to just stop the iteration when an item is met without the attribute, the try/except is needed, but keep it tight! -- a crucial design aspect of good exception handling. I.e., if these are indeed your specs:
    def __next__(self):
        try:
            return next(self.theiter).key
        except AttributeError:
            raise StopIteration
don't catch all exceptions -- only the ones you specifically expect!
Now, in myClass, you'll want:
    def __iter__(self):
        return myClassIter(iter(self.value))
This means that myClass is an iterable, not an iterator, so you can e.g properly have more than one loop on a myClass instance:
mc = myClass(somekey, funkylist)
for ka in mc:
    for kb in mc:
        whatever(ka, kb)
If mc was itself an iterator, the inner loop would exhaust it and the semantics of the nested loops would therefore be completely different.
If you do indeed want such completely different semantics (i.e you want mc to be itself an iterator, not just an iterable) then you must dispense with the auxiliary class (but still need to store the iterator on self.value as an instance attribute for myClass) -- that would be a strange, uncomfortable arrangement, but it is (just barely) possible that it is indeed the arrangement your application needs...
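Putting the pieces of the iterable-plus-auxiliary-iterator design together gives something like the sketch below. The sample keys are invented, and the __iter__ method on myClassIter is an addition here so that the iterator is itself iterable.

```python
class myClassIter(object):
    def __init__(self, theiter):
        self.theiter = theiter

    def __iter__(self):
        return self  # added for this sketch: iterators should be iterable

    def __next__(self):
        # StopIteration from the underlying list iterator simply propagates.
        return next(self.theiter).key

    next = __next__  # Python 2 spelling of the same method


class myClass(object):
    def __init__(self, key, value):
        self.key = key
        self.value = value

    def __iter__(self):
        return myClassIter(iter(self.value))  # a new iterator per loop


mc = myClass("parent", [myClass("a", []), myClass("b", [])])
pairs = [(ka, kb) for ka in mc for kb in mc]

print(pairs)  # [('a', 'a'), ('a', 'b'), ('b', 'a'), ('b', 'b')]
```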
I've learnt from the official documentation of Python 2.7.8 how to work with iterators and generators. I have a question, mostly out of curiosity.
it = iter("abcde")
print it
>>> <iterator object at 0x7ff4c2b3bad0>
class example1():
    def __init__(self, word):
        self.word = word
        self.index = len(word)
    def __iter__(self):
        for x in range(self.index - 1, -1, -1):
            yield self.word[x]
a = example1("altalena")
print iter(a)
>>> <generator object __iter__ at 0x7f24712000a0>
In the above examples, when I print the iterators, I see a "generator" or "iterator" object and its hexadecimal address. Why can't I get the same with the following code?
class example2():
    def __init__(self, word):
        self.word = word
        self.index = len(word)
    def __iter__(self):
        return self
    def next(self):
        if self.index == 0:
            raise StopIteration
        self.index = self.index - 1
        return self.word[self.index]
a = example2("altalena")
print iter(a)
>>> <__main__.example2 instance at 0x7f89ee2de440>
I think it is caused by "return self" in __iter__, which returns the class instance, but I don't know how to get a more appropriate output. It may be pointless, but I don't know why it happens or how to avoid it.
Generators are a type of iterator. Your example1 class is returning a generator because you used yield in the __iter__; it is a separate iterator object, returned from __iter__. This makes your example1 class iterable, not an iterator itself.
In your second example your class __iter__ returns self. It is not just iterable, it is its own iterator. Because it is returning self, there is no separate object here.
Python makes an explicit distinction between iterable and iterator. An iterable can potentially be iterated over. An iterator does the actual work when iterating. By using iter() you ask Python to produce an iterator for a given iterable.
This is why iter(stringobject) returns a new object; you have produced an iterator from the iterable string.
You need this distinction because the process of iterating requires something that keeps state; namely where in the process are we now. An iterator is the object that keeps track of that, so that each time you call the .next() method on the iterator, you get the next value, or StopIteration is raised if there is no next value to produce anymore.
So in your example, the string and example1 are both just iterable, and calling iter() on them produced a new separate iterator.
However, your example2 is its own iterator. Calling iter() on it does not produce a separate object. You cannot create separate iterators from something that already is an iterator.
If you want to produce a new independent iterator for your example2 class, you need to return a distinct, separate object from __iter__:
class ExampleIterator(object):
    def __init__(self, source):
        self.source = source
        self.position = len(source)
    def __iter__(self):
        return self
    def next(self):
        if self.position <= 0:
            raise StopIteration
        self.position -= 1
        return self.source[self.position]
class Example2():
    def __init__(self, word):
        self.word = word
    def __iter__(self):
        # pass the word itself: the iterator calls len() on it and indexes it
        return ExampleIterator(self.word)
Here Example2 is just an iterable again, not an iterator. The __iter__ method returns a new, distinct ExampleIterator() instance, and it is responsible for keeping track of the iteration state.
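For readers on Python 3, roughly the same design can be sketched with __next__ in place of next. This is an adaptation for illustration, not the code from the answer, and it also checks the identity relationships described above.

```python
class ExampleIterator:
    def __init__(self, source):
        self.source = source
        self.position = len(source)  # walk the word from the end

    def __iter__(self):
        return self

    def __next__(self):  # Python 3 name for Python 2's next()
        if self.position <= 0:
            raise StopIteration
        self.position -= 1
        return self.source[self.position]


class Example2:
    def __init__(self, word):
        self.word = word

    def __iter__(self):
        return ExampleIterator(self.word)  # a new, separate iterator


a = Example2("abc")
it = iter(a)

separate = it is not a          # the iterable is not its own iterator
fresh = iter(a) is not it       # every iter() call creates another iterator
self_iter = iter(it) is it      # an iterator returns itself
letters = list(a)

print(letters)  # ['c', 'b', 'a']
```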
I think you don't have a problem, you just don't understand what happens here.
In general, iter(object) returns an iterator for this iterable object.
This is obtained either via calling __iter__() or, if that doesn't exist, by providing a wrapper object which calls __getitem__() until it is exhausted (raises IndexError).
The object returned by __iter__() can either be the iterable itself (like in your 2nd example) or something else. Especially, it can be a generator object if you make __iter__() a generator function as you do in your 1st example.
The iteration itself happens with the iterator object and its next() (Python 2) or __next__() (Python 3) method.
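The __getitem__() fallback mentioned above can be tried with a class that defines neither __iter__() nor __next__(). Squares is a made-up example, and the sketch is written for Python 3.

```python
class Squares:  # hypothetical class: no __iter__, only __getitem__
    def __getitem__(self, index):
        if index >= 5:
            raise IndexError(index)  # signals the end of the sequence
        return index * index


squares = Squares()
has_iter = hasattr(squares, "__iter__")

# iter() wraps the object and calls __getitem__(0), __getitem__(1), ...
# until IndexError is raised.
wrapper = iter(squares)
first = next(wrapper)
values = list(squares)

print(has_iter)  # False
print(values)    # [0, 1, 4, 9, 16]
```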