If I understand correctly, in Python we have:
Iterables = __iter__() is implemented
Iterators = __iter__() returns self & __next__() is implemented
Generators = an iterator created with a yield statement or a generator expression.
Question: Are there categories above that are always/never consumable?
By consumable I mean iterating through them "destroys" the iterable; like zip() (consumable) vs range() (not consumable).
All iterators are consumed; the reason you might not think so is that when you use an iterable with something like
for x in [1,2,3]:
the for loop is creating a new iterator for you behind the scenes. In fact, a list is not an iterator; iter([1,2,3]) returns something of type list_iterator, not the list itself.
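A quick check makes the distinction concrete. This is only a minimal sketch (the variable names are illustrative): the list hands out a fresh iterator on every iter() call, while each iterator is used up after a single pass.

```python
numbers = [1, 2, 3]

# A list is iterable but not an iterator: iter() builds a separate list_iterator.
it = iter(numbers)
print(type(it).__name__)              # list_iterator
print(it is numbers)                  # False

# The iterator is consumed by one full pass ...
first_pass = list(it)
second_pass = list(it)
print(first_pass, second_pass)        # [1, 2, 3] []

# ... while the list can be looped over again, because each loop gets a new iterator.
print(list(numbers), list(numbers))   # [1, 2, 3] [1, 2, 3]

# zip() returns an iterator (consumable); range() is a re-iterable sequence.
z = zip("ab", "cd")
r = range(2)
print(iter(z) is z, iter(r) is r)     # True False
```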
Regarding the example you linked to in a comment, instead of
class PowTwo:
    def __init__(self, max=0):
        self.max = max
    def __iter__(self):
        self.n = 0
        return self
    def __next__(self):
        if self.n <= self.max:
            result = 2 ** self.n
            self.n += 1
            return result
        else:
            raise StopIteration
which has the side effect of modifying the iterator in the act of returning it, I would do something like
class PowTwoIterator:
    def __init__(self, max=0):
        self.max = max
        self._restart()
    def _restart(self):
        self._n = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self._n <= self.max:
            result = 2 ** self._n
            self._n += 1
            return result
        else:
            raise StopIteration
Now, the only way you can modify the state of the object is to do so explicitly (and even that should not be done lightly, since both _n and _restart are marked as not being part of the public interface).
The change in the name reminds you that this is first and foremost an iterator, not an iterable that can provide independent iterators.
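If independent iterations are wanted as well, one option is a separate iterable that hands out a new iterator on each call. A minimal sketch, assuming a hypothetical PowTwo iterable with a generator-based __iter__ (this is not taken from the linked example):

```python
class PowTwo:
    """Hypothetical iterable counterpart: each iter() call gives a fresh iterator."""
    def __init__(self, max=0):
        self.max = max

    def __iter__(self):
        # A generator function: every call creates a new, independent generator.
        for n in range(self.max + 1):
            yield 2 ** n

powers = PowTwo(3)
print(list(powers))   # [1, 2, 4, 8]
print(list(powers))   # [1, 2, 4, 8] again, nothing was "used up"
```

Here the iteration state lives in the generator object, so the PowTwo instance itself is never modified by looping over it.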
Related
So I get generator functions for lazy evaluation, and generator expressions (aka generator comprehensions) as their syntactic-sugar equivalent.
I understand classes like
class Itertest1:
    def __init__(self):
        self.count = 0
        self.max_repeats = 100
    def __iter__(self):
        print("in __iter__()")
        return self
    def __next__(self):
        if self.count >= self.max_repeats:
            raise StopIteration
        self.count += 1
        print(self.count)
        return self.count
as a way of implementing the iterator interface, i.e. iter() and next() in one and the same class.
But what then is
class Itertest2:
    def __init__(self):
        self.data = list(range(100))
    def __iter__(self):
        print("in __iter__()")
        for i, dp in enumerate(self.data):
            print("idx:", i)
            yield dp
which uses the yield statement within the iter member function?
Also I noticed that upon calling the iter member function
it = Itertest2().__iter__()
batch = it.__next__()
the print statement is only executed when calling next() for the first time.
Is this due to this weird mixture of yield and iter? I think this is quite counter intuitive...
Something equivalent to Itertest2 could be written using a separate iterator class.
class Itertest3:
    def __init__(self):
        self.data = list(range(100))
    def __iter__(self):
        return Itertest3Iterator(self.data)

class Itertest3Iterator:
    def __init__(self, data):
        self.state = enumerate(data)
    def __iter__(self):
        return self
    def __next__(self):
        print("in __iter__()")
        i, dp = next(self.state)  # Let StopIteration exception propagate
        print("idx:", i)
        return dp
Compare this to Itertest1, where the instance of Itertest1 itself carried the state of the iteration around in it. Each call to Itertest1.__iter__ returned the same object (the instance of Itertest1), so they couldn't iterate over the data independently.
Notice I put print("in __iter__()") in __next__, not __iter__. As you observed, nothing in a generator function actually executes until the first call to __next__. The generator function itself only creates a generator; it does not actually start executing the code in it.
Having the yield statement anywhere in any function wraps the function code in a (native) generator object, and replaces the function with a stub that gives you said generator object.
So, here, calling __iter__ will give you an anonymous generator object that executes the code you want.
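The delayed start is easy to observe in isolation. A minimal sketch, assuming a hypothetical generator function numbers() that records in a list (instead of printing) when its body begins to run:

```python
log = []

def numbers():
    log.append("body started")   # stands in for the print call
    yield 1
    yield 2

gen = numbers()                  # creates the generator; the body has not run yet
print(type(gen).__name__)        # generator
print(log)                       # []
print(next(gen))                 # 1
print(log)                       # ['body started']
```

The same thing happens when the yield sits inside __iter__: calling __iter__ only builds the generator, and the first next() runs the body up to the first yield.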
The main use case for __next__ is to provide a way to write an iterator without relying on (native) generators.
The use case of __iter__ is to distinguish between an object and an iteration state over said object. Consider code like
c = some_iterable()
for a in c:
    for b in c:
        # do something with a and b
You would not want the two interleaved iterations to interfere with each other's state. This is why such a loop would desugar to something like
c = some_iterable()
_iter1 = iter(c)
try:
    while True:
        a = next(_iter1)
        _iter2 = iter(c)
        try:
            while True:
                b = next(_iter2)
                # do something with a and b
        except StopIteration:
            pass
except StopIteration:
    pass
Typically, custom iterators implement a stub __iter__ that returns self, so that iter(iter(x)) is equivalent to iter(x). This is important when writing iterator wrappers.
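To illustrate why that matters for wrappers, here is a minimal sketch. skip_first is a hypothetical helper invented for this example; it calls iter() on whatever it receives, so it works the same for a list and for an iterator:

```python
def skip_first(iterable):
    """Hypothetical wrapper: yield every item except the first."""
    it = iter(iterable)      # works for lists and for iterators alike
    next(it, None)           # discard the first item, if there is one
    for item in it:          # the for loop calls iter(it) and gets `it` back
        yield item

data = [10, 20, 30]
it = iter(data)
print(iter(it) is it)                 # True
print(list(skip_first(data)))         # [20, 30]
print(list(skip_first(iter(data))))   # [20, 30]
```

If iter(it) returned a fresh iterator instead of it itself, the for loop inside the wrapper would start over and the skipped item would reappear.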
In Python 3, it is standard procedure to make a class an iterable and iterator at the same time by defining both the __iter__ and __next__ methods. But I have trouble wrapping my head around this. Take this example, which creates an iterator that produces only even numbers:
class EvenNumbers:
    def __init__(self, max_):
        self.max_ = max_
    def __iter__(self):
        self.n = 0
        return self
    def __next__(self):
        if self.n <= self.max_:
            result = 2 * self.n
            self.n += 1
            return result
        raise StopIteration

instance = EvenNumbers(4)
for entry in instance:
    print(entry)
To my knowledge (correct me if I'm wrong), when I create the loop, an iterator is created by calling something like itr = iter(instance) which internally calls the __iter__ method. This is expected to return an iterator object (which the instance is due to defining __next__ and therefore I can just return self). To get an element from it, next(itr) is called until the exception is raised.
My question here is now: if and how can __iter__ and __next__ be separated, so that the content of the latter function is defined somewhere else? And when could this be useful? I know that I have to change __iter__ so that it returns an iterator.
Btw the idea to do this comes from this site (LINK), which does not state how to implement this.
It sounds like you're confusing iterators and iterables. Iterables have an __iter__ method which returns an iterator. Iterators have a __next__ method which either returns their next value or raises StopIteration. Now in Python, it is stated that iterators are also iterables (but not vice versa) and that iter(iterator) is iterator, so an iterator, itr, should return only itself from its __iter__ method.
Iterators are required to have an __iter__() method that returns the iterator object itself so every iterator is also iterable and may be used in most places where other iterables are accepted
In code:
class MyIter:
    def __iter__(self):
        return self
    def __next__(self):
        # actual iterator logic
        ...
If you want to make a custom iterator class, the easiest way is to inherit from collections.abc.Iterator which you can see defines __iter__ as above (it is also a subclass of collections.abc.Iterable). Then all you need is
import collections.abc

class MyIter(collections.abc.Iterator):
    def __next__(self):
        ...
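As a concrete instance of that pattern, here is a minimal sketch. Countdown is a hypothetical class made up for this illustration; only __next__ is written by hand, and __iter__ comes from the abstract base class:

```python
from collections.abc import Iterable, Iterator

class Countdown(Iterator):
    """Hypothetical example: counts down from start to 1."""
    def __init__(self, start):
        self.current = start

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

c = Countdown(3)
print(iter(c) is c)              # True: __iter__ is inherited and returns self
print(isinstance(c, Iterable))   # True
print(list(c))                   # [3, 2, 1]
```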
There is of course a much easier way to make an iterator, and that's with a generator function:
import itertools

def fib():
    a = 1
    b = 1
    yield a
    yield b
    while True:
        b, a = a + b, b
        yield b

list(itertools.takewhile(lambda x: x < 100, fib()))
# --> [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
Just for reference, this is (simplified) code for an abstract iterator and iterable
from abc import ABC, abstractmethod

class Iterable(ABC):
    @abstractmethod
    def __iter__(self):
        'Returns an instance of Iterator'
        pass

class Iterator(Iterable, ABC):
    @abstractmethod
    def __next__(self):
        'Return the next item from the iterator. When exhausted, raise StopIteration'
        pass

    # overrides Iterable.__iter__
    def __iter__(self):
        return self
I think I have grasped the concept now, even if I do not fully understand the passage from the documentation quoted by @FHTMitchell. I came across an example of how to separate the two methods and wanted to document it.
What I found is a very basic tutorial that clearly distinguishes between the iterable and the iterator (which is the cause of my confusion).
Basically, you define your iterable first as a separate class:
class EvenNumbers:
    def __init__(self, max_):
        self.max = max_
    def __iter__(self):
        self.n = 0
        return EvenNumbersIterator(self)
The __iter__ method only needs to return an object that has a __next__ method defined. Therefore, you can do this:
class EvenNumbersIterator:
    def __init__(self, source):
        self.source = source
    def __next__(self):
        if self.source.n <= self.source.max:
            result = 2 * self.source.n
            self.source.n += 1
            return result
        else:
            raise StopIteration
This separates the iterator part from the iterable class. It now makes sense that if I define __next__ within the iterable class, I have to return the reference to the instance itself as it basically does 2 jobs at once.
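One caveat with the version above: the counter n still lives on the iterable, so two loops over the same EvenNumbers object would share it. A minimal sketch of a variant in which each iterator keeps its own counter (this restructuring is an assumption on top of the tutorial, not something it shows):

```python
class EvenNumbers:
    def __init__(self, max_):
        self.max = max_

    def __iter__(self):
        # Hand out a brand-new iterator; the iterable itself stays unchanged.
        return EvenNumbersIterator(self.max)

class EvenNumbersIterator:
    def __init__(self, max_):
        self.max = max_
        self.n = 0               # iteration state lives on the iterator

    def __iter__(self):
        return self              # iterators return themselves

    def __next__(self):
        if self.n <= self.max:
            result = 2 * self.n
            self.n += 1
            return result
        raise StopIteration

evens = EvenNumbers(2)
print(list(evens))               # [0, 2, 4]
pairs = [(a, b) for a in evens for b in evens]
print(len(pairs))                # 9, the nested loops do not disturb each other
```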
I'm trying to figure out how to make an iterator. Below is an iterator that works fine.
class DoubleIt:
    def __init__(self):
        self.start = 1
    def __iter__(self):
        self.max = 10
        return self
    def __next__(self):
        if self.start < self.max:
            self.start *= 2
            return self.start
        else:
            raise StopIteration

obj = DoubleIt()
i = iter(obj)
print(next(i))
However, when I try to pass 16 as the second argument to iter() (I expect the iterator to stop when it returns 16)
i = iter(DoubleIt(), 16)
print(next(i))
It throws TypeError: iter(v, w): v must be callable
So I tried this instead:
i = iter(DoubleIt, 16)
print(next(i))
It prints <__main__.DoubleIt object at 0x7f4dcd4459e8>, which is not what I expected.
I checked the programiz website, https://www.programiz.com/python-programming/methods/built-in/iter
It says that a callable object must be passed as the first argument in order to use the second argument, but it doesn't mention whether a user-defined object can be passed.
So my question is: is there a way to do so? Can the second argument be used with a user-defined object?
The documentation could be a bit clearer on this, it only states
iter(object[, sentinel])
...
The iterator created in this case will call object with no arguments
for each call to its __next__() method; if the value returned is equal to sentinel, StopIteration will be raised, otherwise the value will be returned.
What is maybe not said perfectly clearly is that what the iterator yields is whatever the callable returns. And since your callable is a class (with no arguments), it returns a new instance of the class every iteration.
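This can be reproduced with any class. A minimal sketch, using a hypothetical empty class Token as a stand-in for DoubleIt:

```python
class Token:
    """Hypothetical stand-in for a user-defined class."""

it = iter(Token, 16)     # the class itself is the callable
first = next(it)         # Token() is called; the new instance is the item
second = next(it)        # and again: another new instance
print(type(first).__name__, first is second)   # Token False
```

Since no Token instance ever compares equal to 16, this particular iterator would never stop on its own.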
One way around this is to make your class callable and delegate it to the __next__ method:
class DoubleIt:
    def __init__(self):
        self.start = 1
    def __iter__(self):
        return self
    def __next__(self):
        self.start *= 2
        return self.start
    __call__ = __next__

i = iter(DoubleIt(), 16)
print(next(i))
# 2
print(list(i))
# [4, 8]
This has the dis-/advantage that it is an infinite generator that is only stopped by the sentinel value of iter.
Another way is to make the maximum an argument of the class:
class DoubleIt:
    def __init__(self, max=10):
        self.start = 1
        self.max = max
    def __iter__(self):
        return self
    def __next__(self):
        if self.start < self.max:
            self.start *= 2
            return self.start
        else:
            raise StopIteration

i = iter(DoubleIt(max=16))
print(next(i))
# 2
print(list(i))
# [4, 8, 16]
One difference to note is that iter stops when it encounters the sentinel value (and does not yield the item), whereas this second way uses <, instead of <= comparison (like your code) and will thus yield the maximum item.
Here's an example of a doubler routine that would work with the two argument mode of iter:
count = 1
def nextcount():
    global count
    count *= 2
    return count
print(list(iter(nextcount, 16)))
# Produces [2, 4, 8]
This mode involves iter creating the iterator for us. Note that we need to reset count before it can work again; it only works given a callable (such as a function or bound method) that has side effects (changing the counter), and the iterator will only stop upon encountering exactly the sentinel value.
Your DoubleIt class provided no particular protocol for setting a max value, and iter doesn't expect or use any such protocol either. The alternate mode of iter creates an iterator from a callable and a sentinel value, quite independent of the iterable or iterator protocols.
The behaviour you expected is more akin to what itertools.takewhile or itertools.islice do, manipulating one iterator to create another.
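For comparison, a minimal sketch of both itertools helpers applied to an endless doubling generator (doubles is a hypothetical helper written for this illustration, not part of the question's code):

```python
import itertools

def doubles():
    """Hypothetical endless generator: 2, 4, 8, ..."""
    value = 1
    while True:
        value *= 2
        yield value

# Stop as soon as the condition fails (here: the first value above 16).
print(list(itertools.takewhile(lambda x: x <= 16, doubles())))   # [2, 4, 8, 16]

# Take a fixed number of items, regardless of their values.
print(list(itertools.islice(doubles(), 3)))                      # [2, 4, 8]
```

Unlike the sentinel form of iter, takewhile accepts an arbitrary condition, so it does not depend on hitting one exact value.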
Another way to make an iterable object is to implement the sequence protocol:
class DoubleSeq:
    def __init__(self, steps):
        self.steps = steps
    def __len__(self):
        return self.steps
    def __getitem__(self, iteration):
        if iteration >= self.steps:
            raise IndexError()
        return 2**iteration
print(list(iter(DoubleSeq(4))))
# Produces [1, 2, 4, 8]
Note that DoubleSeq isn't an iterator at all; iter created one for us using the sequence protocol. DoubleSeq doesn't hold the iteration counter, the iterator does.
I would like to compare all elements in my iterable object combinatorically with each other. The following reproducible example just mimics the functionality of a plain list, but demonstrates my problem. In this example with a list of ["A","B","C","D"], I would like to get the following 16 lines of output, every combination of each item with each other. A list of 100 items should generate 100*100=10,000 lines.
A A True
A B False
A C False
... 10 more lines ...
D B False
D C False
D D True
The following code seemed like it should do the job.
class C():
    def __init__(self):
        self.stuff = ["A","B","C","D"]
    def __iter__(self):
        self.idx = 0
        return self
    def __next__(self):
        self.idx += 1
        if self.idx > len(self.stuff):
            raise StopIteration
        else:
            return self.stuff[self.idx - 1]

thing = C()
for x in thing:
    for y in thing:
        print(x, y, x==y)
But after finishing the y-loop, the x-loop seems done, too, even though it's only used the first item in the iterable.
A A True
A B False
A C False
A D False
After much searching, I eventually tried the following code, hoping that itertools.tee would allow me two independent iterators over the same data:
import itertools
thing = C()
thing_one, thing_two = itertools.tee(thing)
for x in thing_one:
    for y in thing_two:
        print(x, y, x==y)
But I got the same output as before.
The real-world object this represents is a model of a directory and file structure with varying numbers of files and subdirectories, at varying depths into the tree. It has nested links to thousands of members and iterates correctly over them once, just like this example. But it also does expensive processing within its many internal objects on-the-fly as needed for comparisons, which would end up doubling the workload if I had to make a complete copy of it prior to iterating. I would really like to use multiple iterators, pointing into a single object with all the data, if possible.
Edit on answers: The critical flaw in the question code, pointed out in all answers, is the single internal self.idx variable being unable to handle multiple callers independently. The accepted answer is the best for my real class (oversimplified in this reproducible example), another answer presents a simple, elegant solution for simpler data structures like the list presented here.
It's actually impossible to make a container class that is its own iterator and still iterates correctly in nested loops. The container shouldn't know about the state of the iterator, and the iterator doesn't need to know the contents of the container; it just needs to know which object is the corresponding container and "where" it is. If you mix iterator and container, different iterators will share state with each other (in your case the self.idx), which will not give the correct results (they read and modify the same variable).
That's the reason why all built-in types have a separate iterator class (and some even have a reverse-iterator class):
>>> l = [1, 2, 3]
>>> iter(l)
<list_iterator at 0x15e360c86d8>
>>> reversed(l)
<list_reverseiterator at 0x15e360a5940>
>>> t = (1, 2, 3)
>>> iter(t)
<tuple_iterator at 0x15e363fb320>
>>> s = '123'
>>> iter(s)
<str_iterator at 0x15e363fb438>
So, basically you could just return iter(self.stuff) in __iter__ and drop the __next__ altogether because list_iterator knows how to iterate over the list:
class C:
    def __init__(self):
        self.stuff = ["A","B","C","D"]
    def __iter__(self):
        return iter(self.stuff)

thing = C()
for x in thing:
    for y in thing:
        print(x, y, x==y)
This prints 16 lines, as expected.
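For the pairwise comparison itself, itertools.product is another option worth knowing about. A minimal sketch on the plain list from the question; note that product first collects its input into a tuple, so whether it suits the real, expensive data structure is an open question:

```python
import itertools

stuff = ["A", "B", "C", "D"]

# product(stuff, repeat=2) yields every ordered pair, like the two nested loops.
rows = [(x, y, x == y) for x, y in itertools.product(stuff, repeat=2)]
print(len(rows))   # 16
print(rows[0])     # ('A', 'A', True)
print(rows[1])     # ('A', 'B', False)
```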
If your goal is to make your own iterator class, you need two classes (or 3 if you want to implement the reversed-iterator yourself).
class C:
    def __init__(self):
        self.stuff = ["A","B","C","D"]
    def __iter__(self):
        return C_iterator(self)
    def __reversed__(self):
        return C_reversed_iterator(self)

class C_iterator:
    def __init__(self, parent):
        self.idx = 0
        self.parent = parent
    def __iter__(self):
        return self
    def __next__(self):
        self.idx += 1
        if self.idx > len(self.parent.stuff):
            raise StopIteration
        else:
            return self.parent.stuff[self.idx - 1]

thing = C()
for x in thing:
    for y in thing:
        print(x, y, x==y)
works as well.
For completeness, here's one possible implementation of the reversed-iterator:
class C_reversed_iterator:
    def __init__(self, parent):
        self.parent = parent
        self.idx = len(parent.stuff) + 1
    def __iter__(self):
        return self
    def __next__(self):
        self.idx -= 1
        if self.idx <= 0:
            raise StopIteration
        else:
            return self.parent.stuff[self.idx - 1]

thing = C()
for x in reversed(thing):
    for y in reversed(thing):
        print(x, y, x==y)
Instead of defining your own iterators you could use generators. One way was already shown in the other answer:
class C:
    def __init__(self):
        self.stuff = ["A","B","C","D"]
    def __iter__(self):
        yield from self.stuff
    def __reversed__(self):
        yield from self.stuff[::-1]
or explicitly delegate to a generator function (that's actually equivalent to the above but maybe more clear that it's a new object that is produced):
def C_iterator(obj):
    for item in obj.stuff:
        yield item

def C_reverse_iterator(obj):
    for item in obj.stuff[::-1]:
        yield item

class C:
    def __init__(self):
        self.stuff = ["A","B","C","D"]
    def __iter__(self):
        return C_iterator(self)
    def __reversed__(self):
        return C_reverse_iterator(self)
Note: You don't have to implement the __reversed__ iterator. That was just meant as additional "feature" of the answer.
Your __iter__ is completely broken. Instead of actually making a fresh iterator on every call, it just resets some state on self and returns self. That means you can't actually have more than one iterator at a time over your object, and any call to __iter__ while another loop over the object is active will interfere with the existing loop.
You need to actually make a new object. The simplest way to do that is to use yield syntax to write a generator function. The generator function will automatically return a new iterator object every time:
class C(object):
    def __init__(self):
        self.stuff = ['A', 'B', 'C', 'D']
    def __iter__(self):
        for thing in self.stuff:
            yield thing
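A short check, as a sketch, that this version really hands out independent iterators (the class is repeated so the snippet runs on its own):

```python
class C:
    def __init__(self):
        self.stuff = ['A', 'B', 'C', 'D']

    def __iter__(self):
        for thing in self.stuff:
            yield thing

thing = C()
first, second = iter(thing), iter(thing)
print(first is second)             # False: two separate generator objects
next(first)                        # advance only the first iterator past 'A'
print(next(first), next(second))   # B A

lines = [(x, y, x == y) for x in thing for y in thing]
print(len(lines))                  # 16
```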
My code runs incorrectly:
class a(object):
    def __iter(self):
        return 33
b={'a':'aaa','b':'bbb'}
c=a()
print b.itervalues()
print c.itervalues()
Please try to use the code, rather than text, because my English is not very good, thank you
a. Spell it right: not
def __iter(self):
but:
def __iter__(self):
with __ before and after iter.
b. Make the body right: not
return 33
but:
yield 33
or
return iter([33])
If you return a value from __iter__, return an iterator (an iterable, as in return [33], is almost as good but not quite...); or else, yield 1+ values, making __iter__ into a generator function (so it intrinsically returns a generator iterator).
c. Call it right: not
a().itervalues()
but, e.g.:
for x in a(): print x
or
print list(a())
itervalues is a method of dict, and has nothing to do with __iter__.
If you fix all three (!) mistakes, the code works better;-).
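For readers on Python 3, here is a sketch of the same three fixes put together. It assumes Python 3, where print is a function and dict.itervalues() no longer exists (values() is used instead); the class is renamed A only for readability:

```python
class A:
    def __iter__(self):
        yield 33      # generator function: returns a generator iterator

b = {'a': 'aaa', 'b': 'bbb'}
c = A()

print(list(b.values()))   # ['aaa', 'bbb']
print(list(c))            # [33]
for x in c:
    print(x)              # 33
```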
A few things about your code:
__iter should be __iter__
You're returning 33 in the __iter__ function. You should actually be returning an iterator object. An iterator is an object which keeps returning different values when its next() function is called (maybe a sequence of values like [0,1,2,3 etc]).
Here's a working example of an iterator:
class a(object):
    def __init__(self,x=10):
        self.x = x
    def __iter__(self):
        return self
    def next(self):
        if self.x > 0:
            self.x-=1
            return self.x
        else:
            raise StopIteration

c=a()
for x in c:
    print x
Any object of class a is an iterator object. Calling the __iter__ function is supposed to return the iterator, so it returns itself – as you can see, the a class has a next() function, so this is an iterator object.
When the next function is called, it keeps returning consecutive values until it hits zero, and then it raises the StopIteration exception, which (appropriately) stops the iteration.
If this seems a little hazy, I would suggest experimenting with the code and then checking out the documentation here: http://docs.python.org/library/stdtypes.html
Here is a code example that implements the xrange builtin:
class my_xrange(object):
    def __init__(self, start, end, skip=1):
        self.curval = int(start)
        self.lastval = int(end)
        self.skip = int(skip)
        assert(int(skip) != 0)
    def __iter__(self):
        return self
    def next(self):
        if (self.skip > 0) and (self.curval >= self.lastval):
            raise StopIteration()
        elif (self.skip < 0) and (self.curval <= self.lastval):
            raise StopIteration()
        else:
            oldval = self.curval
            self.curval += self.skip
            return oldval

for i in my_xrange(0, 10):
    print i
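The code above is Python 2, where the iterator method is called next. A rough Python 3 port, as a sketch: the method becomes __next__, print becomes a function, and the class name MyRange and the ValueError check are choices made for this illustration only.

```python
class MyRange:
    def __init__(self, start, end, skip=1):
        if int(skip) == 0:
            raise ValueError("skip must not be zero")
        self.curval = int(start)
        self.lastval = int(end)
        self.skip = int(skip)

    def __iter__(self):
        return self

    def __next__(self):          # Python 3 name for the Python 2 next() method
        if self.skip > 0 and self.curval >= self.lastval:
            raise StopIteration
        if self.skip < 0 and self.curval <= self.lastval:
            raise StopIteration
        oldval = self.curval
        self.curval += self.skip
        return oldval

print(list(MyRange(0, 5)))       # [0, 1, 2, 3, 4]
print(list(MyRange(5, 0, -2)))   # [5, 3, 1]
```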
You are using this language feature incorrectly.
http://docs.python.org/library/stdtypes.html#iterator-types
This above link will explain what the function should be used for.
You can try to see documentation in your native language here: http://wiki.python.org/moin/Languages