Related
How would one create an iterative function (or iterator object) in python?
Iterator objects in python conform to the iterator protocol, which basically means they provide two methods: __iter__() and __next__().
The __iter__ returns the iterator object and is implicitly called
at the start of loops.
The __next__() method returns the next value and is implicitly called at each loop increment. This method raises a StopIteration exception when there are no more value to return, which is implicitly captured by looping constructs to stop iterating.
Here's a simple example of a counter:
class Counter:
def __init__(self, low, high):
self.current = low - 1
self.high = high
def __iter__(self):
return self
def __next__(self): # Python 2: def next(self)
self.current += 1
if self.current < self.high:
return self.current
raise StopIteration
for c in Counter(3, 9):
print(c)
This will print:
3
4
5
6
7
8
This is easier to write using a generator, as covered in a previous answer:
def counter(low, high):
current = low
while current < high:
yield current
current += 1
for c in counter(3, 9):
print(c)
The printed output will be the same. Under the hood, the generator object supports the iterator protocol and does something roughly similar to the class Counter.
David Mertz's article, Iterators and Simple Generators, is a pretty good introduction.
There are four ways to build an iterative function:
create a generator (uses the yield keyword)
use a generator expression (genexp)
create an iterator (defines __iter__ and __next__ (or next in Python 2.x))
create a class that Python can iterate over on its own (defines __getitem__)
Examples:
# generator
def uc_gen(text):
for char in text.upper():
yield char
# generator expression
def uc_genexp(text):
return (char for char in text.upper())
# iterator protocol
class uc_iter():
def __init__(self, text):
self.text = text.upper()
self.index = 0
def __iter__(self):
return self
def __next__(self):
try:
result = self.text[self.index]
except IndexError:
raise StopIteration
self.index += 1
return result
# getitem method
class uc_getitem():
def __init__(self, text):
self.text = text.upper()
def __getitem__(self, index):
return self.text[index]
To see all four methods in action:
for iterator in uc_gen, uc_genexp, uc_iter, uc_getitem:
for ch in iterator('abcde'):
print(ch, end=' ')
print()
Which results in:
A B C D E
A B C D E
A B C D E
A B C D E
Note:
The two generator types (uc_gen and uc_genexp) cannot be reversed(); the plain iterator (uc_iter) would need the __reversed__ magic method (which, according to the docs, must return a new iterator, but returning self works (at least in CPython)); and the getitem iteratable (uc_getitem) must have the __len__ magic method:
# for uc_iter we add __reversed__ and update __next__
def __reversed__(self):
self.index = -1
return self
def __next__(self):
try:
result = self.text[self.index]
except IndexError:
raise StopIteration
self.index += -1 if self.index < 0 else +1
return result
# for uc_getitem
def __len__(self)
return len(self.text)
To answer Colonel Panic's secondary question about an infinite lazily evaluated iterator, here are those examples, using each of the four methods above:
# generator
def even_gen():
result = 0
while True:
yield result
result += 2
# generator expression
def even_genexp():
return (num for num in even_gen()) # or even_iter or even_getitem
# not much value under these circumstances
# iterator protocol
class even_iter():
def __init__(self):
self.value = 0
def __iter__(self):
return self
def __next__(self):
next_value = self.value
self.value += 2
return next_value
# getitem method
class even_getitem():
def __getitem__(self, index):
return index * 2
import random
for iterator in even_gen, even_genexp, even_iter, even_getitem:
limit = random.randint(15, 30)
count = 0
for even in iterator():
print even,
count += 1
if count >= limit:
break
print
Which results in (at least for my sample run):
0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54
0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38
0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30
0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32
How to choose which one to use? This is mostly a matter of taste. The two methods I see most often are generators and the iterator protocol, as well as a hybrid (__iter__ returning a generator).
Generator expressions are useful for replacing list comprehensions (they are lazy and so can save on resources).
If one needs compatibility with earlier Python 2.x versions use __getitem__.
I see some of you doing return self in __iter__. I just wanted to note that __iter__ itself can be a generator (thus removing the need for __next__ and raising StopIteration exceptions)
class range:
def __init__(self,a,b):
self.a = a
self.b = b
def __iter__(self):
i = self.a
while i < self.b:
yield i
i+=1
Of course here one might as well directly make a generator, but for more complex classes it can be useful.
First of all the itertools module is incredibly useful for all sorts of cases in which an iterator would be useful, but here is all you need to create an iterator in python:
yield
Isn't that cool? Yield can be used to replace a normal return in a function. It returns the object just the same, but instead of destroying state and exiting, it saves state for when you want to execute the next iteration. Here is an example of it in action pulled directly from the itertools function list:
def count(n=0):
while True:
yield n
n += 1
As stated in the functions description (it's the count() function from the itertools module...) , it produces an iterator that returns consecutive integers starting with n.
Generator expressions are a whole other can of worms (awesome worms!). They may be used in place of a List Comprehension to save memory (list comprehensions create a list in memory that is destroyed after use if not assigned to a variable, but generator expressions can create a Generator Object... which is a fancy way of saying Iterator). Here is an example of a generator expression definition:
gen = (n for n in xrange(0,11))
This is very similar to our iterator definition above except the full range is predetermined to be between 0 and 10.
I just found xrange() (suprised I hadn't seen it before...) and added it to the above example. xrange() is an iterable version of range() which has the advantage of not prebuilding the list. It would be very useful if you had a giant corpus of data to iterate over and only had so much memory to do it in.
This question is about iterable objects, not about iterators. In Python, sequences are iterable too so one way to make an iterable class is to make it behave like a sequence, i.e. give it __getitem__ and __len__ methods. I have tested this on Python 2 and 3.
class CustomRange:
def __init__(self, low, high):
self.low = low
self.high = high
def __getitem__(self, item):
if item >= len(self):
raise IndexError("CustomRange index out of range")
return self.low + item
def __len__(self):
return self.high - self.low
cr = CustomRange(0, 10)
for i in cr:
print(i)
If you looking for something short and simple, maybe it will be enough for you:
class A(object):
def __init__(self, l):
self.data = l
def __iter__(self):
return iter(self.data)
example of usage:
In [3]: a = A([2,3,4])
In [4]: [i for i in a]
Out[4]: [2, 3, 4]
All answers on this page are really great for a complex object. But for those containing builtin iterable types as attributes, like str, list, set or dict, or any implementation of collections.Iterable, you can omit certain things in your class.
class Test(object):
def __init__(self, string):
self.string = string
def __iter__(self):
# since your string is already iterable
return (ch for ch in self.string)
# or simply
return self.string.__iter__()
# also
return iter(self.string)
It can be used like:
for x in Test("abcde"):
print(x)
# prints
# a
# b
# c
# d
# e
Include the following code in your class code.
def __iter__(self):
for x in self.iterable:
yield x
Make sure that you replace self.iterablewith the iterable which you iterate through.
Here's an example code
class someClass:
def __init__(self,list):
self.list = list
def __iter__(self):
for x in self.list:
yield x
var = someClass([1,2,3,4,5])
for num in var:
print(num)
Output
1
2
3
4
5
Note: Since strings are also iterable, they can also be used as an argument for the class
foo = someClass("Python")
for x in foo:
print(x)
Output
P
y
t
h
o
n
This is an iterable function without yield. It make use of the iter function and a closure which keeps it's state in a mutable (list) in the enclosing scope for python 2.
def count(low, high):
counter = [0]
def tmp():
val = low + counter[0]
if val < high:
counter[0] += 1
return val
return None
return iter(tmp, None)
For Python 3, closure state is kept in an immutable in the enclosing scope and nonlocal is used in local scope to update the state variable.
def count(low, high):
counter = 0
def tmp():
nonlocal counter
val = low + counter
if val < high:
counter += 1
return val
return None
return iter(tmp, None)
Test;
for i in count(1,10):
print(i)
1
2
3
4
5
6
7
8
9
class uc_iter():
def __init__(self):
self.value = 0
def __iter__(self):
return self
def __next__(self):
next_value = self.value
self.value += 2
return next_value
Improving previous answer, one of the advantage of using class is that you can add __call__ to return self.value or even next_value.
class uc_iter():
def __init__(self):
self.value = 0
def __iter__(self):
return self
def __next__(self):
next_value = self.value
self.value += 2
return next_value
def __call__(self):
next_value = self.value
self.value += 2
return next_value
c = uc_iter()
print([c() for _ in range(10)])
print([next(c) for _ in range(5)])
# [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
# [20, 22, 24, 26, 28]
Other example of a class based on Python Random that can be both called and iterated could be seen on my implementation here
I can't figure out how to look ahead one element in a Python generator. As soon as I look it's gone.
Here is what I mean:
gen = iter([1,2,3])
next_value = gen.next() # okay, I looked forward and see that next_value = 1
# but now:
list(gen) # is [2, 3] -- the first value is gone!
Here is a more real example:
gen = element_generator()
if gen.next_value() == 'STOP':
quit_application()
else:
process(gen.next())
Can anyone help me write a generator that you can look one element forward?
See also: Resetting generator object in Python
For sake of completeness, the more-itertools package (which should probably be part of any Python programmer's toolbox) includes a peekable wrapper that implements this behavior. As the code example in the documentation shows:
>>> p = peekable(['a', 'b'])
>>> p.peek()
'a'
>>> next(p)
'a'
However, it's often possible to rewrite code that would use this functionality so that it doesn't actually need it. For example, your realistic code sample from the question could be written like this:
gen = element_generator()
command = gen.next_value()
if command == 'STOP':
quit_application()
else:
process(command)
(reader's note: I've preserved the syntax in the example from the question as of when I'm writing this, even though it refers to an outdated version of Python)
The Python generator API is one way: You can't push back elements you've read. But you can create a new iterator using the itertools module and prepend the element:
import itertools
gen = iter([1,2,3])
peek = gen.next()
print list(itertools.chain([peek], gen))
Ok - two years too late - but I came across this question, and did not find any of the answers to my satisfaction. Came up with this meta generator:
class Peekorator(object):
def __init__(self, generator):
self.empty = False
self.peek = None
self.generator = generator
try:
self.peek = self.generator.next()
except StopIteration:
self.empty = True
def __iter__(self):
return self
def next(self):
"""
Return the self.peek element, or raise StopIteration
if empty
"""
if self.empty:
raise StopIteration()
to_return = self.peek
try:
self.peek = self.generator.next()
except StopIteration:
self.peek = None
self.empty = True
return to_return
def simple_iterator():
for x in range(10):
yield x*3
pkr = Peekorator(simple_iterator())
for i in pkr:
print i, pkr.peek, pkr.empty
results in:
0 3 False
3 6 False
6 9 False
9 12 False
...
24 27 False
27 None False
i.e. you have at any moment during iteration access to the next item in the list.
Using itertools.tee will produce a lightweight copy of the generator; then peeking ahead at one copy will not affect the second copy. Thus:
import itertools
def process(seq):
peeker, items = itertools.tee(seq)
# initial peek ahead
# so that peeker is one ahead of items
if next(peeker) == 'STOP':
return
for item in items:
# peek ahead
if next(peeker) == "STOP":
return
# process items
print(item)
The items generator is unaffected by modifications to peeker. However, modifying seq after the call to tee may cause problems.
That said: any algorithm that requires looking an item ahead in a generator could instead be written to use the current generator item and the previous item. This will result in simpler code - see my other answer to this question.
An iterator that allows peeking at the next element and also further ahead. It reads ahead as needed and remembers the values in a deque.
from collections import deque
class PeekIterator:
def __init__(self, iterable):
self.iterator = iter(iterable)
self.peeked = deque()
def __iter__(self):
return self
def __next__(self):
if self.peeked:
return self.peeked.popleft()
return next(self.iterator)
def peek(self, ahead=0):
while len(self.peeked) <= ahead:
self.peeked.append(next(self.iterator))
return self.peeked[ahead]
Demo:
>>> it = PeekIterator(range(10))
>>> it.peek()
0
>>> it.peek(5)
5
>>> it.peek(13)
Traceback (most recent call last):
File "<pyshell#68>", line 1, in <module>
it.peek(13)
File "[...]", line 15, in peek
self.peeked.append(next(self.iterator))
StopIteration
>>> it.peek(2)
2
>>> next(it)
0
>>> it.peek(2)
3
>>> list(it)
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>>
>>> gen = iter(range(10))
>>> peek = next(gen)
>>> peek
0
>>> gen = (value for g in ([peek], gen) for value in g)
>>> list(gen)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Just for fun, I created an implementation of a lookahead class based on the suggestion by
Aaron:
import itertools
class lookahead_chain(object):
def __init__(self, it):
self._it = iter(it)
def __iter__(self):
return self
def next(self):
return next(self._it)
def peek(self, default=None, _chain=itertools.chain):
it = self._it
try:
v = self._it.next()
self._it = _chain((v,), it)
return v
except StopIteration:
return default
lookahead = lookahead_chain
With this, the following will work:
>>> t = lookahead(xrange(8))
>>> list(itertools.islice(t, 3))
[0, 1, 2]
>>> t.peek()
3
>>> list(itertools.islice(t, 3))
[3, 4, 5]
With this implementation it is a bad idea to call peek many times in a row...
While looking at the CPython source code I just found a better way which is both shorter and more efficient:
class lookahead_tee(object):
def __init__(self, it):
self._it, = itertools.tee(it, 1)
def __iter__(self):
return self._it
def peek(self, default=None):
try:
return self._it.__copy__().next()
except StopIteration:
return default
lookahead = lookahead_tee
Usage is the same as above but you won't pay a price here to use peek many times in a row. With a few more lines you can also look ahead more than one item in the iterator (up to available RAM).
A simple solution is to use a function like this:
def peek(it):
first = next(it)
return first, itertools.chain([first], it)
Then you can do:
>>> it = iter(range(10))
>>> x, it = peek(it)
>>> x
0
>>> next(it)
0
>>> next(it)
1
If anybody is interested, and please correct me if I am wrong, but I believe it is pretty easy to add some push back functionality to any iterator.
class Back_pushable_iterator:
"""Class whose constructor takes an iterator as its only parameter, and
returns an iterator that behaves in the same way, with added push back
functionality.
The idea is to be able to push back elements that need to be retrieved once
more with the iterator semantics. This is particularly useful to implement
LL(k) parsers that need k tokens of lookahead. Lookahead or push back is
really a matter of perspective. The pushing back strategy allows a clean
parser implementation based on recursive parser functions.
The invoker of this class takes care of storing the elements that should be
pushed back. A consequence of this is that any elements can be "pushed
back", even elements that have never been retrieved from the iterator.
The elements that are pushed back are then retrieved through the iterator
interface in a LIFO-manner (as should logically be expected).
This class works for any iterator but is especially meaningful for a
generator iterator, which offers no obvious push back ability.
In the LL(k) case mentioned above, the tokenizer can be implemented by a
standard generator function (clean and simple), that is completed by this
class for the needs of the actual parser.
"""
def __init__(self, iterator):
self.iterator = iterator
self.pushed_back = []
def __iter__(self):
return self
def __next__(self):
if self.pushed_back:
return self.pushed_back.pop()
else:
return next(self.iterator)
def push_back(self, element):
self.pushed_back.append(element)
it = Back_pushable_iterator(x for x in range(10))
x = next(it) # 0
print(x)
it.push_back(x)
x = next(it) # 0
print(x)
x = next(it) # 1
print(x)
x = next(it) # 2
y = next(it) # 3
print(x)
print(y)
it.push_back(y)
it.push_back(x)
x = next(it) # 2
y = next(it) # 3
print(x)
print(y)
for x in it:
print(x) # 4-9
This will work -- it buffers an item and calls a function with each item and the next item in the sequence.
Your requirements are murky on what happens at the end of the sequence. What does "look ahead" mean when you're at the last one?
def process_with_lookahead( iterable, aFunction ):
prev= iterable.next()
for item in iterable:
aFunction( prev, item )
prev= item
aFunction( item, None )
def someLookaheadFunction( item, next_item ):
print item, next_item
Instead of using items (i, i+1), where 'i' is the current item and i+1 is the 'peek ahead' version, you should be using (i-1, i), where 'i-1' is the previous version from the generator.
Tweaking your algorithm this way will produce something that is identical to what you currently have, apart from the extra needless complexity of trying to 'peek ahead'.
Peeking ahead is a mistake, and you should not be doing it.
Although itertools.chain() is the natural tool for the job here, beware of loops like this:
for elem in gen:
...
peek = next(gen)
gen = itertools.chain([peek], gen)
...Because this will consume a linearly growing amount of memory, and eventually grind to a halt. (This code essentially seems to create a linked list, one node per chain() call.) I know this not because I inspected the libs but because this just resulted in a major slowdown of my program - getting rid of the gen = itertools.chain([peek], gen) line sped it up again. (Python 3.3)
Python3 snippet for #jonathan-hartley answer:
def peek(iterator, eoi=None):
iterator = iter(iterator)
try:
prev = next(iterator)
except StopIteration:
return iterator
for elm in iterator:
yield prev, elm
prev = elm
yield prev, eoi
for curr, nxt in peek(range(10)):
print((curr, nxt))
# (0, 1)
# (1, 2)
# (2, 3)
# (3, 4)
# (4, 5)
# (5, 6)
# (6, 7)
# (7, 8)
# (8, 9)
# (9, None)
It'd be straightforward to create a class that does this on __iter__ and yields just the prev item and put the elm in some attribute.
w.r.t #David Z's post, the newer seekable tool can reset a wrapped iterator to a prior position.
>>> s = mit.seekable(range(3))
>>> s.next()
# 0
>>> s.seek(0) # reset iterator
>>> s.next()
# 0
>>> s.next()
# 1
>>> s.seek(1)
>>> s.next()
# 1
>>> next(s)
# 2
cytoolz has a peek function.
>> from cytoolz import peek
>> gen = iter([1,2,3])
>> first, continuation = peek(gen)
>> first
1
>> list(continuation)
[1, 2, 3]
In my case, I need a generator where I could queue back to generator the data I have just got via next() call.
The way I handle this problem, is to create a queue. In the implementation of the generator, I would first check the queue: if queue is not empty, the "yield" will return the values in queue, or otherwise the values in normal way.
import queue
def gen1(n, q):
i = 0
while True:
if not q.empty():
yield q.get()
else:
yield i
i = i + 1
if i >= n:
if not q.empty():
yield q.get()
break
q = queue.Queue()
f = gen1(2, q)
i = next(f)
print(i)
i = next(f)
print(i)
q.put(i) # put back the value I have just got for following 'next' call
i = next(f)
print(i)
running
python3 gen_test.py
0
1
1
This concept is very useful when I was writing a parser, which needs to look the file line by line, if the line appears to belong to next phase of parsing, I could just queue back to the generator so that the next phase of code could parse it correctly without handling complex state.
For those of you who embrace frugality and one-liners, I present to you a one-liner that allows one to look ahead in an iterable (this only works in Python 3.8 and above):
>>> import itertools as it
>>> peek = lambda iterable, n=1: it.islice(zip(it.chain((t := it.tee(iterable))[0], [None] * n), it.chain([None] * n, t[1])), n, None)
>>> for lookahead, element in peek(range(10)):
... print(lookahead, element)
1 0
2 1
3 2
4 3
5 4
6 5
7 6
8 7
9 8
None 9
>>> for lookahead, element in peek(range(10), 2):
... print(lookahead, element)
2 0
3 1
4 2
5 3
6 4
7 5
8 6
9 7
None 8
None 9
This method is space-efficient by avoiding copying the iterator multiple times. It is also fast due to how it lazily generates elements. Finally, as a cherry on top, you can look ahead an arbitrary number of elements.
An algorithm that works by "peeking" at the next element in a generator could equivalently be one that works by remembering the previous element, treating that element as the one to operate upon, and treating the "current" element as simply "peeked at".
Either way, what is really happening is that the algorithm considers overlapping pairs from the generator. The itertools.tee recipe will work fine - and it is not hard to see that it is essentially a refactored version of Jonathan Hartley's approach:
from itertools import tee
# From https://docs.python.org/3/library/itertools.html#itertools.pairwise
# In 3.10 and up, this is directly supplied by the `itertools` module.
def pairwise(iterable):
# pairwise('ABCDEFG') --> AB BC CD DE EF FG
a, b = tee(iterable)
next(b, None)
return zip(a, b)
def process(seq):
for to_process, lookahead in pairwise(seq):
# peek ahead
if lookahead == "STOP":
return
# process items
print(to_process)
Using iter in our for loop makes programming an efficient one in python. How does it actually works?
Tried to visualize iter(iterables) in "http://www.pythontutor.com/visualize.html#mode=display". Here iter helps to create an instance.
Doesn't it actually refer to internal numerical objects?
val = [1,2,3,4,5]
val = iter(val)
for item in val:
print(item)
val = [1,2,3,4,5]
for item in val:
print(item)
Both returns same output. But how iter identifies the values?
What you're doing is redundant. A for loop is essentially syntactic sugar for:
val = [1,2,3,4,5]
iterator = iter(val)
while True:
try:
item = next(iterator)
except StopIteration:
break
print(item)
All that you're doing by calling iter() on your val is replacing
iterator = iter(val)
with
iterator = iter(iter(val))
Calling iter() on an iterator is a no-op; returning the same iterator.
Lets pretend that Python lists were not iterable and we wanted to create a class, MyList, that could be constructed with a list instance and iterate it. MyList would need to implement a __iter__ method that would return an iterator object that implements the __next__ method. Each successive call to this __next__ method should return the next element of the list. When there are no more elements to be returned, it needs to raise a StopIteration exception. This iterator object clearly needs two pieces of information:
A reference to the list being iterated.
The index of the last element that was output.
Let's call this iterator class MyListIterator. Its implementation might be:
class MyListIterator:
def __init__(self, l):
self.l = l # the list being iterated
self.index = -1 # the index of the last element outputted
def __next__(self):
self.index += 1 # next index to output
if self.index < len(self.l):
return self.l[self.index]
raise StopIteration
Class MyList then would be:
class MyList:
def __init__(self, the_list):
self.l = the_list # we are constructed with a list
def __iter__(self):
return MyListIterator(self.l) # pass to the iterator our list
An example of use:
l = MyList([0, 1, 2])
for i in l:
for j in l:
print (i, j)
Prints:
0 0
0 1
0 2
1 0
1 1
1 2
2 0
2 1
2 2
Say we wish to process an iterator and want to handle it by chunks.
The logic per chunk depends on previously-calculated chunks, so groupby() does not help.
Our friend in this case is itertools.takewhile():
while True:
chunk = itertools.takewhile(getNewChunkLogic(), myIterator)
process(chunk)
The problem is that takewhile() needs to go past the last element that meets the new chunk logic, thus 'eating' the first element for the next chunk.
There are various solutions to that, including wrapping or à la C's ungetc(), etc..
My question is: is there an elegant solution?
takewhile() indeed needs to look at the next element to determine when to toggle behaviour.
You could use a wrapper that tracks the last seen element, and that can be 'reset' to back up one element:
_sentinel = object()
class OneStepBuffered(object):
def __init__(self, it):
self._it = iter(it)
self._last = _sentinel
self._next = _sentinel
def __iter__(self):
return self
def __next__(self):
if self._next is not _sentinel:
next_val, self._next = self._next, _sentinel
return next_val
try:
self._last = next(self._it)
return self._last
except StopIteration:
self._last = self._next = _sentinel
raise
next = __next__ # Python 2 compatibility
def step_back(self):
if self._last is _sentinel:
raise ValueError("Can't back up a step")
self._next, self._last = self._last, _sentinel
Wrap your iterator in this one before using it with takewhile():
myIterator = OneStepBuffered(myIterator)
while True:
chunk = itertools.takewhile(getNewChunkLogic(), myIterator)
process(chunk)
myIterator.step_back()
Demo:
>>> from itertools import takewhile
>>> test_list = range(10)
>>> iterator = OneStepBuffered(test_list)
>>> list(takewhile(lambda i: i < 5, iterator))
[0, 1, 2, 3, 4]
>>> iterator.step_back()
>>> list(iterator)
[5, 6, 7, 8, 9]
I had the same problem. You might wish to use itertools.tee or itertools.pairwise (new in Python 3.10) to deal with this, but I didn't think those solutions were very elegant.
The best I found is to just rewrite takewhile. Based heavily on the documentation:
def takewhile_inclusive(predicate, it):
for x in it:
if predicate(x):
yield x
else:
yield x
break
In your loop you can elegantly treat the final element separately using unpacking:
*chunk,lastPiece = takewhile_inclusive(getNewChunkLogic(), myIterator)
You can then chain the last piece:
lastPiece = None
while True:
*chunk,lastPiece = takewhile_inclusive(getNewChunkLogic(), myIterator)
if lastPiece is not None:
myIterator = itertools.chain([lastPiece], myIterator))
Given the callable GetNewChunkLogic() will report True along first chunk and False afterward.
The following snippet
solves the 'additional next step' problem of takewhile .
is elegant because you don't have to implement the back-one-step logic .
def partition(pred, iterable):
'Use a predicate to partition entries into true entries and false entries'
# partition(is_odd, range(10)) --> 1 3 5 7 9 and 0 2 4 6 8
t1, t2 = tee(iterable)
return filter(pred, t1), filterfalse(pred, t2)
while True:
head, tail = partition(GetNewChunkLogic(), myIterator)
process(head)
myIterator = tail
However, the most elegant way is to modify your GetNewChunkLogic into a generator and remove the while loop.
Here's another way you can do it. Yield a value (sentinel) when the predicate fails, but before yielding the value itself. Then group by values which aren't the sentinel.
Here, group_by_predicate requires a function that returns a predicate (pred_gen). This is recreated every time the predicate fails:
from itertools import groupby
def group_by_predicate(predicate_gen, _iter):
sentinel = object()
def _group_with_sentinel():
pred = predicate_gen()
for n in _iter:
while not pred(n):
yield sentinel
pred = predicate_gen()
yield n
g = _group_with_sentinel()
for k, g in groupby(g, lambda s: s!=sentinel):
if k:
yield g
This can then be used like:
def less_than_gen(maxn):
"""Return a predicate that returns true while the sum of inputs is < maxn"""
def pred(i):
pred.count += i
return pred.count < maxn
pred.count = 0
return pred
data = iter(list(range(9)) * 3)
for g in group_by_predicate(lambda: less_than_gen(15), data):
print(list(g))
Which outputs groups of numbers whose sum is all less than 15:
[0, 1, 2, 3, 4]
[5, 6]
[7]
[8, 0, 1, 2, 3]
[4, 5]
[6, 7]
[8, 0, 1, 2, 3]
[4, 5]
[6, 7]
[8]
I can't figure out how to look ahead one element in a Python generator. As soon as I look it's gone.
Here is what I mean:
gen = iter([1,2,3])
next_value = gen.next() # okay, I looked forward and see that next_value = 1
# but now:
list(gen) # is [2, 3] -- the first value is gone!
Here is a more real example:
gen = element_generator()
if gen.next_value() == 'STOP':
quit_application()
else:
process(gen.next())
Can anyone help me write a generator that you can look one element forward?
See also: Resetting generator object in Python
For sake of completeness, the more-itertools package (which should probably be part of any Python programmer's toolbox) includes a peekable wrapper that implements this behavior. As the code example in the documentation shows:
>>> p = peekable(['a', 'b'])
>>> p.peek()
'a'
>>> next(p)
'a'
However, it's often possible to rewrite code that would use this functionality so that it doesn't actually need it. For example, your realistic code sample from the question could be written like this:
gen = element_generator()
command = gen.next_value()
if command == 'STOP':
quit_application()
else:
process(command)
(reader's note: I've preserved the syntax in the example from the question as of when I'm writing this, even though it refers to an outdated version of Python)
The Python generator API is one way: You can't push back elements you've read. But you can create a new iterator using the itertools module and prepend the element:
import itertools
gen = iter([1,2,3])
peek = gen.next()
print list(itertools.chain([peek], gen))
Ok - two years too late - but I came across this question, and did not find any of the answers to my satisfaction. Came up with this meta generator:
class Peekorator(object):
def __init__(self, generator):
self.empty = False
self.peek = None
self.generator = generator
try:
self.peek = self.generator.next()
except StopIteration:
self.empty = True
def __iter__(self):
return self
def next(self):
"""
Return the self.peek element, or raise StopIteration
if empty
"""
if self.empty:
raise StopIteration()
to_return = self.peek
try:
self.peek = self.generator.next()
except StopIteration:
self.peek = None
self.empty = True
return to_return
def simple_iterator():
for x in range(10):
yield x*3
pkr = Peekorator(simple_iterator())
for i in pkr:
print i, pkr.peek, pkr.empty
results in:
0 3 False
3 6 False
6 9 False
9 12 False
...
24 27 False
27 None False
i.e. you have at any moment during iteration access to the next item in the list.
Using itertools.tee will produce a lightweight copy of the generator; then peeking ahead at one copy will not affect the second copy. Thus:
import itertools
def process(seq):
peeker, items = itertools.tee(seq)
# initial peek ahead
# so that peeker is one ahead of items
if next(peeker) == 'STOP':
return
for item in items:
# peek ahead
if next(peeker) == "STOP":
return
# process items
print(item)
The items generator is unaffected by modifications to peeker. However, modifying seq after the call to tee may cause problems.
That said: any algorithm that requires looking an item ahead in a generator could instead be written to use the current generator item and the previous item. This will result in simpler code - see my other answer to this question.
An iterator that allows peeking at the next element and also further ahead. It reads ahead as needed and remembers the values in a deque.
from collections import deque
class PeekIterator:
def __init__(self, iterable):
self.iterator = iter(iterable)
self.peeked = deque()
def __iter__(self):
return self
def __next__(self):
if self.peeked:
return self.peeked.popleft()
return next(self.iterator)
def peek(self, ahead=0):
while len(self.peeked) <= ahead:
self.peeked.append(next(self.iterator))
return self.peeked[ahead]
Demo:
>>> it = PeekIterator(range(10))
>>> it.peek()
0
>>> it.peek(5)
5
>>> it.peek(13)
Traceback (most recent call last):
File "<pyshell#68>", line 1, in <module>
it.peek(13)
File "[...]", line 15, in peek
self.peeked.append(next(self.iterator))
StopIteration
>>> it.peek(2)
2
>>> next(it)
0
>>> it.peek(2)
3
>>> list(it)
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>>
>>> gen = iter(range(10))
>>> peek = next(gen)
>>> peek
0
>>> gen = (value for g in ([peek], gen) for value in g)
>>> list(gen)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Just for fun, I created an implementation of a lookahead class based on the suggestion by
Aaron:
import itertools
class lookahead_chain(object):
def __init__(self, it):
self._it = iter(it)
def __iter__(self):
return self
def next(self):
return next(self._it)
def peek(self, default=None, _chain=itertools.chain):
it = self._it
try:
v = self._it.next()
self._it = _chain((v,), it)
return v
except StopIteration:
return default
lookahead = lookahead_chain
With this, the following will work:
>>> t = lookahead(xrange(8))
>>> list(itertools.islice(t, 3))
[0, 1, 2]
>>> t.peek()
3
>>> list(itertools.islice(t, 3))
[3, 4, 5]
With this implementation it is a bad idea to call peek many times in a row...
While looking at the CPython source code I just found a better way which is both shorter and more efficient:
class lookahead_tee(object):
def __init__(self, it):
self._it, = itertools.tee(it, 1)
def __iter__(self):
return self._it
def peek(self, default=None):
try:
return self._it.__copy__().next()
except StopIteration:
return default
lookahead = lookahead_tee
Usage is the same as above but you won't pay a price here to use peek many times in a row. With a few more lines you can also look ahead more than one item in the iterator (up to available RAM).
A simple solution is to use a function like this:
def peek(it):
first = next(it)
return first, itertools.chain([first], it)
Then you can do:
>>> it = iter(range(10))
>>> x, it = peek(it)
>>> x
0
>>> next(it)
0
>>> next(it)
1
If anybody is interested, and please correct me if I am wrong, but I believe it is pretty easy to add some push back functionality to any iterator.
class Back_pushable_iterator:
"""Class whose constructor takes an iterator as its only parameter, and
returns an iterator that behaves in the same way, with added push back
functionality.
The idea is to be able to push back elements that need to be retrieved once
more with the iterator semantics. This is particularly useful to implement
LL(k) parsers that need k tokens of lookahead. Lookahead or push back is
really a matter of perspective. The pushing back strategy allows a clean
parser implementation based on recursive parser functions.
The invoker of this class takes care of storing the elements that should be
pushed back. A consequence of this is that any elements can be "pushed
back", even elements that have never been retrieved from the iterator.
The elements that are pushed back are then retrieved through the iterator
interface in a LIFO-manner (as should logically be expected).
This class works for any iterator but is especially meaningful for a
generator iterator, which offers no obvious push back ability.
In the LL(k) case mentioned above, the tokenizer can be implemented by a
standard generator function (clean and simple), that is completed by this
class for the needs of the actual parser.
"""
def __init__(self, iterator):
self.iterator = iterator
self.pushed_back = []
def __iter__(self):
return self
def __next__(self):
if self.pushed_back:
return self.pushed_back.pop()
else:
return next(self.iterator)
def push_back(self, element):
self.pushed_back.append(element)
it = Back_pushable_iterator(x for x in range(10))
x = next(it) # 0
print(x)
it.push_back(x)
x = next(it) # 0
print(x)
x = next(it) # 1
print(x)
x = next(it) # 2
y = next(it) # 3
print(x)
print(y)
it.push_back(y)
it.push_back(x)
x = next(it) # 2
y = next(it) # 3
print(x)
print(y)
for x in it:
print(x) # 4-9
This will work -- it buffers an item and calls a function with each item and the next item in the sequence.
Your requirements are murky on what happens at the end of the sequence. What does "look ahead" mean when you're at the last one?
def process_with_lookahead( iterable, aFunction ):
prev= iterable.next()
for item in iterable:
aFunction( prev, item )
prev= item
aFunction( item, None )
def someLookaheadFunction( item, next_item ):
print item, next_item
Instead of using items (i, i+1), where 'i' is the current item and i+1 is the 'peek ahead' version, you should be using (i-1, i), where 'i-1' is the previous version from the generator.
Tweaking your algorithm this way will produce something that is identical to what you currently have, apart from the extra needless complexity of trying to 'peek ahead'.
Peeking ahead is a mistake, and you should not be doing it.
Although itertools.chain() is the natural tool for the job here, beware of loops like this:
for elem in gen:
...
peek = next(gen)
gen = itertools.chain([peek], gen)
...Because this will consume a linearly growing amount of memory, and eventually grind to a halt. (This code essentially seems to create a linked list, one node per chain() call.) I know this not because I inspected the libs but because this just resulted in a major slowdown of my program - getting rid of the gen = itertools.chain([peek], gen) line sped it up again. (Python 3.3)
Python3 snippet for #jonathan-hartley answer:
def peek(iterator, eoi=None):
iterator = iter(iterator)
try:
prev = next(iterator)
except StopIteration:
return iterator
for elm in iterator:
yield prev, elm
prev = elm
yield prev, eoi
for curr, nxt in peek(range(10)):
print((curr, nxt))
# (0, 1)
# (1, 2)
# (2, 3)
# (3, 4)
# (4, 5)
# (5, 6)
# (6, 7)
# (7, 8)
# (8, 9)
# (9, None)
It'd be straightforward to create a class that does this on __iter__ and yields just the prev item and put the elm in some attribute.
w.r.t #David Z's post, the newer seekable tool can reset a wrapped iterator to a prior position.
>>> s = mit.seekable(range(3))
>>> s.next()
# 0
>>> s.seek(0) # reset iterator
>>> s.next()
# 0
>>> s.next()
# 1
>>> s.seek(1)
>>> s.next()
# 1
>>> next(s)
# 2
cytoolz has a peek function.
>> from cytoolz import peek
>> gen = iter([1,2,3])
>> first, continuation = peek(gen)
>> first
1
>> list(continuation)
[1, 2, 3]
In my case, I need a generator where I could queue back to generator the data I have just got via next() call.
The way I handle this problem, is to create a queue. In the implementation of the generator, I would first check the queue: if queue is not empty, the "yield" will return the values in queue, or otherwise the values in normal way.
import queue
def gen1(n, q):
i = 0
while True:
if not q.empty():
yield q.get()
else:
yield i
i = i + 1
if i >= n:
if not q.empty():
yield q.get()
break
q = queue.Queue()
f = gen1(2, q)
i = next(f)
print(i)
i = next(f)
print(i)
q.put(i) # put back the value I have just got for following 'next' call
i = next(f)
print(i)
running
python3 gen_test.py
0
1
1
This concept is very useful when I was writing a parser, which needs to look the file line by line, if the line appears to belong to next phase of parsing, I could just queue back to the generator so that the next phase of code could parse it correctly without handling complex state.
For those of you who embrace frugality and one-liners, I present to you a one-liner that allows one to look ahead in an iterable (this only works in Python 3.8 and above):
>>> import itertools as it
>>> peek = lambda iterable, n=1: it.islice(zip(it.chain((t := it.tee(iterable))[0], [None] * n), it.chain([None] * n, t[1])), n, None)
>>> for lookahead, element in peek(range(10)):
... print(lookahead, element)
1 0
2 1
3 2
4 3
5 4
6 5
7 6
8 7
9 8
None 9
>>> for lookahead, element in peek(range(10), 2):
... print(lookahead, element)
2 0
3 1
4 2
5 3
6 4
7 5
8 6
9 7
None 8
None 9
This method is space-efficient by avoiding copying the iterator multiple times. It is also fast due to how it lazily generates elements. Finally, as a cherry on top, you can look ahead an arbitrary number of elements.
An algorithm that works by "peeking" at the next element in a generator could equivalently be one that works by remembering the previous element, treating that element as the one to operate upon, and treating the "current" element as simply "peeked at".
Either way, what is really happening is that the algorithm considers overlapping pairs from the generator. The itertools.tee recipe will work fine - and it is not hard to see that it is essentially a refactored version of Jonathan Hartley's approach:
from itertools import tee
# From https://docs.python.org/3/library/itertools.html#itertools.pairwise
# In 3.10 and up, this is directly supplied by the `itertools` module.
def pairwise(iterable):
# pairwise('ABCDEFG') --> AB BC CD DE EF FG
a, b = tee(iterable)
next(b, None)
return zip(a, b)
def process(seq):
for to_process, lookahead in pairwise(seq):
# peek ahead
if lookahead == "STOP":
return
# process items
print(to_process)