Discarding a tee element

Discarding a tee element - python

I want to "fork" a stream of a large amount of data, in order to look-ahead just a couple of elements.
I was hoping to write something like this:
from itertools import tee
stream = # a generator of a very large data stream
while True:
try:
element= stream.next()
process_element( element )
if some_condition( element ):
stream, fork= tee(stream)
process_fork( fork )
except StopIteration:
break
Reading the documentation for tee, though, I'm left with the impression that the deque of fork will keep growing, even after fork has gone out of scope.
Is this the case? If so, is there a way to tell tee to "discard" the fork? Or is there another more obvious way of doing this?

You could avoid the implementation-dependent behavior #goncalopp mentions by making aTeeclass and giving it adiscard()method:
class Tee(object):
def __init__(self, iterable, n=2):
it = iter(iterable)
self.deques = [collections.deque() for _ in range(n)]
def gen(mydeque):
while True:
if not mydeque: # when the local deque is empty
newval = next(it) # fetch a new value and
for d in self.deques: # load it to all the active deques
d.append(newval)
yield mydeque.popleft()
self.generators = [gen(d) for d in self.deques]
def __call__(self):
return self.generators
def discard(gen):
index = self.generators.index(gen)
del self.deques[index]
del self.generators[index]
Note that since it would now be a class, utilizing it would be slightly different. However when you're done withfork, you could get rid of it by callingtee.discard(fork). Here's an example:
tee = None
while True:
try:
element = stream.next()
process_element(element)
if some_condition(element):
if not tee:
tee = Tee(stream)
stream, fork = tee()
process_fork(fork)
except StopIteration:
break
if tee:
tee.discard(fork)
fork = None

Here's a simple test script:
from itertools import tee
def natural_numbers():
i=0
while True:
yield i
i+=1
stream = natural_numbers() #Don't use xrange, cpython optimizes it away
stream, fork= tee(stream)
del fork
for e in stream:
pass
It seems that, at least in CPython, the process' memory doesn't keep growing. There seems to be a mechanism that detects this situation.
If, however, you substitute tee for the python code the documentation states is equivalent...
def tee(iterable, n=2):
it = iter(iterable)
deques = [collections.deque() for i in range(n)]
def gen(mydeque):
while True:
if not mydeque: # when the local deque is empty
newval = next(it) # fetch a new value and
for d in deques: # load it to all the deques
d.append(newval)
yield mydeque.popleft()
return tuple(gen(d) for d in deques)
...memory does keep growing, as expected.
So, my guess is that this will be implementation-dependent behaviour

Related

Pythonic way to get next iterable value without incrementing iterator [duplicate]

I can't figure out how to look ahead one element in a Python generator. As soon as I look it's gone.
Here is what I mean:
gen = iter([1,2,3])
next_value = gen.next() # okay, I looked forward and see that next_value = 1
# but now:
list(gen) # is [2, 3] -- the first value is gone!
Here is a more real example:
gen = element_generator()
if gen.next_value() == 'STOP':
quit_application()
else:
process(gen.next())
Can anyone help me write a generator that you can look one element forward?
See also: Resetting generator object in Python

For sake of completeness, the more-itertools package (which should probably be part of any Python programmer's toolbox) includes a peekable wrapper that implements this behavior. As the code example in the documentation shows:
>>> p = peekable(['a', 'b'])
>>> p.peek()
'a'
>>> next(p)
'a'
However, it's often possible to rewrite code that would use this functionality so that it doesn't actually need it. For example, your realistic code sample from the question could be written like this:
gen = element_generator()
command = gen.next_value()
if command == 'STOP':
quit_application()
else:
process(command)
(reader's note: I've preserved the syntax in the example from the question as of when I'm writing this, even though it refers to an outdated version of Python)

The Python generator API is one way: You can't push back elements you've read. But you can create a new iterator using the itertools module and prepend the element:
import itertools
gen = iter([1,2,3])
peek = gen.next()
print list(itertools.chain([peek], gen))

Ok - two years too late - but I came across this question, and did not find any of the answers to my satisfaction. Came up with this meta generator:
class Peekorator(object):
def __init__(self, generator):
self.empty = False
self.peek = None
self.generator = generator
try:
self.peek = self.generator.next()
except StopIteration:
self.empty = True
def __iter__(self):
return self
def next(self):
"""
Return the self.peek element, or raise StopIteration
if empty
"""
if self.empty:
raise StopIteration()
to_return = self.peek
try:
self.peek = self.generator.next()
except StopIteration:
self.peek = None
self.empty = True
return to_return
def simple_iterator():
for x in range(10):
yield x*3
pkr = Peekorator(simple_iterator())
for i in pkr:
print i, pkr.peek, pkr.empty
results in:
0 3 False
3 6 False
6 9 False
9 12 False
...
24 27 False
27 None False
i.e. you have at any moment during iteration access to the next item in the list.

Using itertools.tee will produce a lightweight copy of the generator; then peeking ahead at one copy will not affect the second copy. Thus:
import itertools
def process(seq):
peeker, items = itertools.tee(seq)
# initial peek ahead
# so that peeker is one ahead of items
if next(peeker) == 'STOP':
return
for item in items:
# peek ahead
if next(peeker) == "STOP":
return
# process items
print(item)
The items generator is unaffected by modifications to peeker. However, modifying seq after the call to tee may cause problems.
That said: any algorithm that requires looking an item ahead in a generator could instead be written to use the current generator item and the previous item. This will result in simpler code - see my other answer to this question.

An iterator that allows peeking at the next element and also further ahead. It reads ahead as needed and remembers the values in a deque.
from collections import deque
class PeekIterator:
def __init__(self, iterable):
self.iterator = iter(iterable)
self.peeked = deque()
def __iter__(self):
return self
def __next__(self):
if self.peeked:
return self.peeked.popleft()
return next(self.iterator)
def peek(self, ahead=0):
while len(self.peeked) <= ahead:
self.peeked.append(next(self.iterator))
return self.peeked[ahead]
Demo:
>>> it = PeekIterator(range(10))
>>> it.peek()
0
>>> it.peek(5)
5
>>> it.peek(13)
Traceback (most recent call last):
File "<pyshell#68>", line 1, in <module>
it.peek(13)
File "[...]", line 15, in peek
self.peeked.append(next(self.iterator))
StopIteration
>>> it.peek(2)
2
>>> next(it)
0
>>> it.peek(2)
3
>>> list(it)
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>>

>>> gen = iter(range(10))
>>> peek = next(gen)
>>> peek
0
>>> gen = (value for g in ([peek], gen) for value in g)
>>> list(gen)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Just for fun, I created an implementation of a lookahead class based on the suggestion by
Aaron:
import itertools
class lookahead_chain(object):
def __init__(self, it):
self._it = iter(it)
def __iter__(self):
return self
def next(self):
return next(self._it)
def peek(self, default=None, _chain=itertools.chain):
it = self._it
try:
v = self._it.next()
self._it = _chain((v,), it)
return v
except StopIteration:
return default
lookahead = lookahead_chain
With this, the following will work:
>>> t = lookahead(xrange(8))
>>> list(itertools.islice(t, 3))
[0, 1, 2]
>>> t.peek()
3
>>> list(itertools.islice(t, 3))
[3, 4, 5]
With this implementation it is a bad idea to call peek many times in a row...
While looking at the CPython source code I just found a better way which is both shorter and more efficient:
class lookahead_tee(object):
def __init__(self, it):
self._it, = itertools.tee(it, 1)
def __iter__(self):
return self._it
def peek(self, default=None):
try:
return self._it.__copy__().next()
except StopIteration:
return default
lookahead = lookahead_tee
Usage is the same as above but you won't pay a price here to use peek many times in a row. With a few more lines you can also look ahead more than one item in the iterator (up to available RAM).

A simple solution is to use a function like this:
def peek(it):
first = next(it)
return first, itertools.chain([first], it)
Then you can do:
>>> it = iter(range(10))
>>> x, it = peek(it)
>>> x
0
>>> next(it)
0
>>> next(it)
1

If anybody is interested, and please correct me if I am wrong, but I believe it is pretty easy to add some push back functionality to any iterator.
class Back_pushable_iterator:
"""Class whose constructor takes an iterator as its only parameter, and
returns an iterator that behaves in the same way, with added push back
functionality.
The idea is to be able to push back elements that need to be retrieved once
more with the iterator semantics. This is particularly useful to implement
LL(k) parsers that need k tokens of lookahead. Lookahead or push back is
really a matter of perspective. The pushing back strategy allows a clean
parser implementation based on recursive parser functions.
The invoker of this class takes care of storing the elements that should be
pushed back. A consequence of this is that any elements can be "pushed
back", even elements that have never been retrieved from the iterator.
The elements that are pushed back are then retrieved through the iterator
interface in a LIFO-manner (as should logically be expected).
This class works for any iterator but is especially meaningful for a
generator iterator, which offers no obvious push back ability.
In the LL(k) case mentioned above, the tokenizer can be implemented by a
standard generator function (clean and simple), that is completed by this
class for the needs of the actual parser.
"""
def __init__(self, iterator):
self.iterator = iterator
self.pushed_back = []
def __iter__(self):
return self
def __next__(self):
if self.pushed_back:
return self.pushed_back.pop()
else:
return next(self.iterator)
def push_back(self, element):
self.pushed_back.append(element)
it = Back_pushable_iterator(x for x in range(10))
x = next(it) # 0
print(x)
it.push_back(x)
x = next(it) # 0
print(x)
x = next(it) # 1
print(x)
x = next(it) # 2
y = next(it) # 3
print(x)
print(y)
it.push_back(y)
it.push_back(x)
x = next(it) # 2
y = next(it) # 3
print(x)
print(y)
for x in it:
print(x) # 4-9

This will work -- it buffers an item and calls a function with each item and the next item in the sequence.
Your requirements are murky on what happens at the end of the sequence. What does "look ahead" mean when you're at the last one?
def process_with_lookahead( iterable, aFunction ):
prev= iterable.next()
for item in iterable:
aFunction( prev, item )
prev= item
aFunction( item, None )
def someLookaheadFunction( item, next_item ):
print item, next_item

Instead of using items (i, i+1), where 'i' is the current item and i+1 is the 'peek ahead' version, you should be using (i-1, i), where 'i-1' is the previous version from the generator.
Tweaking your algorithm this way will produce something that is identical to what you currently have, apart from the extra needless complexity of trying to 'peek ahead'.
Peeking ahead is a mistake, and you should not be doing it.

Although itertools.chain() is the natural tool for the job here, beware of loops like this:
for elem in gen:
...
peek = next(gen)
gen = itertools.chain([peek], gen)
...Because this will consume a linearly growing amount of memory, and eventually grind to a halt. (This code essentially seems to create a linked list, one node per chain() call.) I know this not because I inspected the libs but because this just resulted in a major slowdown of my program - getting rid of the gen = itertools.chain([peek], gen) line sped it up again. (Python 3.3)

Python3 snippet for #jonathan-hartley answer:
def peek(iterator, eoi=None):
iterator = iter(iterator)
try:
prev = next(iterator)
except StopIteration:
return iterator
for elm in iterator:
yield prev, elm
prev = elm
yield prev, eoi
for curr, nxt in peek(range(10)):
print((curr, nxt))
# (0, 1)
# (1, 2)
# (2, 3)
# (3, 4)
# (4, 5)
# (5, 6)
# (6, 7)
# (7, 8)
# (8, 9)
# (9, None)
It'd be straightforward to create a class that does this on __iter__ and yields just the prev item and put the elm in some attribute.

w.r.t #David Z's post, the newer seekable tool can reset a wrapped iterator to a prior position.
>>> s = mit.seekable(range(3))
>>> s.next()
# 0
>>> s.seek(0) # reset iterator
>>> s.next()
# 0
>>> s.next()
# 1
>>> s.seek(1)
>>> s.next()
# 1
>>> next(s)
# 2

cytoolz has a peek function.
>> from cytoolz import peek
>> gen = iter([1,2,3])
>> first, continuation = peek(gen)
>> first
1
>> list(continuation)
[1, 2, 3]

In my case, I need a generator where I could queue back to generator the data I have just got via next() call.
The way I handle this problem, is to create a queue. In the implementation of the generator, I would first check the queue: if queue is not empty, the "yield" will return the values in queue, or otherwise the values in normal way.
import queue
def gen1(n, q):
i = 0
while True:
if not q.empty():
yield q.get()
else:
yield i
i = i + 1
if i >= n:
if not q.empty():
yield q.get()
break
q = queue.Queue()
f = gen1(2, q)
i = next(f)
print(i)
i = next(f)
print(i)
q.put(i) # put back the value I have just got for following 'next' call
i = next(f)
print(i)
running
python3 gen_test.py
0
1
1
This concept is very useful when I was writing a parser, which needs to look the file line by line, if the line appears to belong to next phase of parsing, I could just queue back to the generator so that the next phase of code could parse it correctly without handling complex state.

For those of you who embrace frugality and one-liners, I present to you a one-liner that allows one to look ahead in an iterable (this only works in Python 3.8 and above):
>>> import itertools as it
>>> peek = lambda iterable, n=1: it.islice(zip(it.chain((t := it.tee(iterable))[0], [None] * n), it.chain([None] * n, t[1])), n, None)
>>> for lookahead, element in peek(range(10)):
... print(lookahead, element)
1 0
2 1
3 2
4 3
5 4
6 5
7 6
8 7
9 8
None 9
>>> for lookahead, element in peek(range(10), 2):
... print(lookahead, element)
2 0
3 1
4 2
5 3
6 4
7 5
8 6
9 7
None 8
None 9
This method is space-efficient by avoiding copying the iterator multiple times. It is also fast due to how it lazily generates elements. Finally, as a cherry on top, you can look ahead an arbitrary number of elements.

An algorithm that works by "peeking" at the next element in a generator could equivalently be one that works by remembering the previous element, treating that element as the one to operate upon, and treating the "current" element as simply "peeked at".
Either way, what is really happening is that the algorithm considers overlapping pairs from the generator. The itertools.tee recipe will work fine - and it is not hard to see that it is essentially a refactored version of Jonathan Hartley's approach:
from itertools import tee
# From https://docs.python.org/3/library/itertools.html#itertools.pairwise
# In 3.10 and up, this is directly supplied by the `itertools` module.
def pairwise(iterable):
# pairwise('ABCDEFG') --> AB BC CD DE EF FG
a, b = tee(iterable)
next(b, None)
return zip(a, b)
def process(seq):
for to_process, lookahead in pairwise(seq):
# peek ahead
if lookahead == "STOP":
return
# process items
print(to_process)

Custom iterator and itertools.tee problem

My custom iterator should call a specific method when next is called on it. It works that way initially, but after itertools.tee is called on the iterator for the second time the method is not called.
I actually already have a solution/workaround but I'd like to understand the root cause of the problem.
class MyIterator(object):
def __init__(self, elements):
self._elements = iter(elements)
def __iter__(self):
return self
def next(self):
element = (self._elements)
if isinstance(element, HwState):
element.el_method()
return element
elements = list(...)
iterator1, iterator2 = itertools.tee(MyIterator(elements))
element1 = next(iterator2) # ok
element2 = next(iterator2) # ok
iterator1, iterator2 = itertools.tee(MyIterator(iterator1))
element1 = next(iterator2) # el_method() is not called but correct element is returned
element2 = next(iterator2) # el_method() is not called but correct element is returned
I "solved" the issue this way:
elements = list(...)
iterator = MyIterator(elements)
element1 = next(iterator)
element2 = next(iterator)
iterator = MyIterator(elements)
element1 = next(iterator) # el_method() is called, correct element is returned
element2 = next(iterator) # el_method() is called, correct element is returned

See the "roughly equivalent" implementation of itertools.tee included in the documentation:
def tee(iterable, n=2):
it = iter(iterable)
deques = [collections.deque() for i in range(n)]
def gen(mydeque):
while True:
if not mydeque: # when the local deque is empty
try:
newval = next(it) # fetch a new value and
except StopIteration:
return
for d in deques: # load it to all the deques
d.append(newval)
yield mydeque.popleft()
return tuple(gen(d) for d in deques)
Essentially, tee keeps a queue for every generated iterator. When a new value is requested, if there is something in the iterator queue it takes the next value from there, and if the queue is empty it calls next on the original iterator once and adds the result to every queue. What this means is that the generated values are "cached" and returned by each iterator, instead of duplicating the work of producing the element.
Moreover, it would be in general impossible for tee to behave as you expect, since tee cannot know how to make copies of an iterator in general. Think for example on a text file. Once you read one line in principle you cannot go back (in simple sequential access), and there is no such thing as "duplicating the file iterator" as such (to emulate something like that you would need multiple file handlers or seeking), so you just save the lines you read and return them later in the other iterators.

Using generator send() within a for loop

I implemented graph traversal as a generator function which yields the node being visited.
Sometimes the user needs to tell the traversal function that the edges outgoing from a particular node shouldn't be followed; in order to support that, the traversal checks the value sent back to it (using generator send() method), and if it's True, regards the node as a leaf for traversal purposes.
The problem is that the simplest user loop is kinda long:
# simplified thanks to #tobias_k
# bfs is the traversal generator function
traversal = bfs(g, start_node)
try:
n = next(traversal)
while True:
# process(n) returns True if don't want to follow edges out of n
n = traversal.send(process(n))
except StopIteration:
pass
Is there any way to improve this?
I thought something like this should work:
for n in bfs(g, start_node):
???.send(process(n))
but I feel I'm missing the knowledge of some python syntax.

I don't see a way to do this in a regular for loop. However, you could create another generator, that iterates another generator, using some "follow-function" to determine whether to follow the current element, thus encapsulating the tricky parts of your code into a separate function.
def checking_generator(generator, follow_function):
try:
x = next(generator)
while True:
yield x
x = generator.send(follow_function(x))
except StopIteration:
pass
for n in checking_generator(bfs(g, start_node), process):
print(n)

I discovered that my question would have had a one-line answer, using the extended "continue" statement proposed in the earlier version of PEP 342:
for n in bfs(g, start_node):
continue process(n)
However, while PEP 342 was accepted, that particular feature was withdrawn after this June 2005 discussion between Raymond and Guido:
Raymond Hettinger said:
Let me go on record as a strong -1 for "continue EXPR". The
for-loop is our most basic construct and is easily understood in its
present form. The same can be said for "continue" and "break" which
have the added advantage of a near zero learning curve for people
migrating from other languages.
Any urge to complicate these basic statements should be seriously
scrutinized and held to high standards of clarity, explainability,
obviousness, usefulness, and necessity. IMO, it fails most of those
tests.
I would not look forward to explaining "continue EXPR" in the tutorial
and think it would stand out as an anti-feature.
[...] The correct argument against "continue EXPR" is that there
are no use cases yet; if there were a good use case, the explanation
would follow easily.
Guido
If python core developers have since changed their mind about the usefulness of extended "continue", perhaps this could be reintroduced into a future PEP. But, given a nearly identical use case as in this question was already discussed in the quoted thread, and wasn't found persuasive, it seems unlikely.

To simplify the client code, you could use an ordinary bsf() generator and check node.isleaf attribute in it:
for node in bfs(g, start_node):
node.isleaf = process(node) # don't follow if `process()` returns True
The disadvantage is that node is mutable. Or you have to pass a shared data structure that tracks leaf nodes: leaf[node] = process(node) where leaf dictionary is passed into bfs() earlier.
If you want to use .send() method explicitly; you have to handle StopIteration. See PEP 479 -- Change StopIteration handling inside generators. You could hide it in a helper function:
def traverse(tree_generator, visitor):
try:
node = next(tree_generator)
while True:
node = tree_generator.send(visitor(node))
except StopIteration:
pass
Example:
traverse(bfs(g, start_node), process)

I don't see this as a frequent use case, consider this as the original generator:
def original_gen():
for x in range(10):
should_break = yield x
if should_break:
break
If the value of should_break is always calculated based on some function call with x then why not just write the generator like this:
def processing_gen(check_f):
for x in range(10):
yield x
should_break = check_f(x)
if should_break:
break
However I usually think of the code that processes the generated values as being written inside the loop (otherwise what is the point of having a loop at all?)
What it really seems you want to do is create a generator where calling the __next__ method really implies send(process(LAST_VALUE)) which can be implemented with a class:
class Followup_generator(): #feel free to use a better name
def __init__(self,generator,following_function):
self.gen = generator
self.process_f = following_function
def __iter__(self):
return self
def __next__(self):
if hasattr(self,"last_value"):
return self.send(self.process_f(self.last_value))
else:
self.last_value = next(self.gen)
return self.last_value
def send(self,arg):
self.last_value = self.gen.send(arg)
return self.last_value
def __getattr__(self,attr):
"forward other lookups to the generator (.throw etc.)"
return getattr(self.gen, attr)
# call signature is the exact same as #tobias_k's checking_generator
traversal = Followup_generator(bfs(g, start_node), process)
for n in traversal:
print(n)
n = traversal.send(DATA) #you'd be able to send extra values to it
However this still doesn't see this as frequently used, I'd be perfectly fine with a while loop, although I'd put the .send call at the top:
traversal = bfs(g, start_node)
send_value = None
while True:
n = traversal.send(send_value)
#code for loop, ending in calculating the next send_value
send_value = process(n)
And you might wrap that in a try: ... except StopIteration:pass although I find that simply waiting for an error to raise is better expressed with a context manager:
class Catch:
def __init__(self,exc_type):
if issubclass(exc_type,BaseException):
self.catch_type = exc_type
else:
raise TypeError("can only catch Exceptions")
def __enter__(self):
return self
def __exit__(self,exc_type,err, tb):
if issubclass(exc_type, self.catch_type):
self.err = err
return True
with Catch(StopIteration):
traversal = bfs(g, start_node)
send_value = None
while True:
n = traversal.send(send_value)
#code for loop, ending in calculating the next send_value
send_value = process(n)

Probably this is the answer to the question from the thread's topic.
Take a look at the additional empty yields statements inside the traversal function and custom send function, that does the magical job.
# tested with Python 3.7
def traversal(n):
for i in range(n):
yield i, '%s[%s] %s' % (' ' * (4 - n), n, i)
stop = yield
if stop:
yield # here's the first part of the magic
else:
yield # the same as above
yield from traversal(int(n / 2))
def send(generator, value):
next(generator) # here's the second part of the magic
generator.send(value)
g = traversal(4)
for i, (num, msg) in enumerate(g):
print('>', i, msg)
stop = num % 2 == 0
send(g, stop)

I've written a small class SettableGenerator which uses a method to receive the value to be send and then forwards it to the actual generator when __next__ is invoked.
With this you can write:
gen = SettableGenerator(bfs(g, start_node))
for n in gen:
gen.set(process(n))

Let's consider the following generator. It generates numbers from 0 to 9. For every generated number, it gets an input and stores it into ret:
def count_to_nine():
# Output: numbers from 0 to 9
# Input: converted numbers
ret = []
for i in range(10):
# Yield a number, get something back
val = (yield i)
# Remember that "something"
ret.append(val)
return ret
You can, indeed, iterate it using next() + send(),
but the best way is to iterate using send() alone:
g = count_to_nine()
value = None # to make sure that the first send() gives a None
while True:
value = g.send(value) # send the previously generated value, get a new one
value = f'#{value}'
Here's the result:
StopIteration: ['#0', '#1', '#2', '#3', '#4', '#5', '#6', '#7', '#8', '#9']
If you want that output, catch the StopIteration and get the result from it.
Cheers!

What is a good way to decorate an iterator to alter the value before next is called in python?

I am working on a problem that involves validating a format from within unified diff patch.
The variables within the inner format can span multiple lines at a time, so I wrote a generator that pulls each line and yields the variable when it is complete.
To avoid having to rewrite this function when reading from a unified diff file, I created a generator to strip the unified diff characters from the line before passing it to the inner format validator. However, I am getting stuck in an infinite loop (both in the code and in my head). I have abstracted to problem to the following code. I'm sure there is a better way to do this. I just don't know what it is.
from collections import Iterable
def inner_format_validator(inner_item):
# Do some validation to inner items
return inner_item[0] != '+'
def inner_gen(iterable):
for inner_item in iterable:
# Operates only on inner_info type data
yield inner_format_validator(inner_item)
def outer_gen(iterable):
class DecoratedGenerator(Iterable):
def __iter__(self):
return self
def next(self):
# Using iterable from closure
for outer_item in iterable:
self.outer_info = outer_item[0]
inner_item = outer_item[1:]
return inner_item
decorated_gen = DecoratedGenerator()
for inner_item in inner_gen(decorated_gen):
yield inner_item, decorated_gen.outer_info
if __name__ == '__main__':
def wrap(string):
# The point here is that I don't know what the first character will be
pseudo_rand = len(string)
if pseudo_rand * pseudo_rand % 2 == 0:
return '+' + string
else:
return '-' + string
inner_items = ["whatever"] * 3
# wrap screws up inner_format_validator
outer_items = [wrap("whatever")] * 3
# I need to be able to
# iterate over inner_items
for inner_info in inner_gen(inner_items):
print(inner_info)
# and iterate over outer_items
for outer_info, inner_info in outer_gen(outer_items):
# This is an infinite loop
print(outer_info)
print(inner_info)
Any ideas as to a better, more pythonic way to do this?

I would do something simpler, like this:
def outer_gen(iterable):
iterable = iter(iterable)
first_item = next(iterable)
info = first_item[0]
yield info, first_item[1:]
for item in iterable:
yield info, item
This will execute the 4 first lines only once, then enter the loop and yield what you want.
You probably want to add some try/except to cacth IndexErrors here and there.
If you want to take values while they start with something or the contrary, remember you can use a lot of stuff from the itertools toolbox, and in particular dropwhile, takewhile and chain:
>>> import itertools
>>> l = ['+foo', '-bar', '+foo']
>>> list(itertools.takewhile(lambda x: x.startswith('+'), l))
['+foo']
>>> list(itertools.dropwhile(lambda x: x.startswith('+'), l))
['-bar', '+foo']
>>> a = itertools.takewhile(lambda x: x.startswith('+'), l)
>>> b = itertools.dropwhile(lambda x: x.startswith('+'), l)
>>> list(itertools.chain(a, b))
['+foo', '-bar', '+foo']
And remember that you can create generators like comprehension lists, store them in variables and chain them, just like you would pipe linux commands:
import random
def create_item():
return random.choice(('+', '-')) + random.choice(('foo', 'bar'))
random_items = (create_item() for s in xrange(10))
added_items = ((i[0], i[1:]) for i in random_items if i.startswith('+'))
valid_items = ((prefix, line) for prefix, line in added_items if 'foo' in line)
print list(valid_items)
With all this, you should be able to find some pythonic way to solve your problem :-)

I still don't like this very much, but at least it's shorter and a tad more pythonic:
from itertools import imap, izip
from functools import partial
def inner_format_validator(inner_item):
return not inner_item.startswith('+')
inner_gen = partial(imap, inner_format_validator)
def split(astr):
return astr[0], astr[1:]
def outer_gen(iterable):
outer_stuff, inner_stuff = izip(*imap(split, iterable))
return izip(inner_gen(inner_stuff), outer_stuff)
[EDIT] inner_gen() and outer_gen() without imap and partial:
def inner_gen(iterable):
for each in iterable:
yield inner_format_validator(each)
def outer_gen(iterable):
outer_stuff, inner_stuff = izip(*(split(each) for each in iterable))
return izip(inner_gen(inner_stuff), outer_stuff)
Maybe this is a better, though different, solution:
def transmogrify(iter_of_iters, *transmogrifiers):
for iters in iter_of_iters:
yield (
trans(each) if trans else each
for trans, each in izip(transmogrifiers, iters)
)
for outer, inner in transmogrify(imap(split, stuff), inner_format_validator, None):
print inner, outer

I think it will do what you intended if you change the definition of DecoratedGenerator to this:
class DecoratedGenerator(Iterable):
def __iter__(self):
# Using iterable from closure
for outer_item in iterable:
self.outer_info = outer_item[0]
inner_item = outer_item[1:]
yield inner_item
Your original version never terminated because its next() method was stateless and would return the same value every time it was called. You didn't need to have a next() method at all, though--you can implement __iter__() yourself (as I did), and then it all works fine.

Python twisted: iterators and yields/inlineCallbacks

Folks,
Am thoroughly confused, so it's possible I am not even asking things correctly, but here goes:
I have a twisted application using inlineCallbacks. Now I need to define an iterator which will mean a generator is returned to the caller. However, the iterator cannot be inlineCallbacks decorated, can it be? If not, then how I do I code something like this.
Just to clarify: the goal is process_loop needs to be called every, say 5, seconds, it can process only ONE chunk of, say 10, and then it has to let go. However, to know that chunk of 10 (stored in cached, which is a dict of a dict), it needs to call a function that returns deferred.
#inlineCallbacks ### can\'t have inlineCallbacks here, right?
def cacheiter(cached):
for cachename,cachevalue in cached.items():
result = yield (call func here which returns deferred)
if result is True:
for k,v in cachedvalue.items():
yield cachename, k, v
#inlineCallbacks
def process_chunk(myiter, num):
try:
for i in xrange(num):
nextval = myiter.next()
yield some_processing(nextval)
returnValue(False)
except StopIteration:
returnValue(True)
#inlineCallbacks
def process_loop(cached):
myiter = cacheiter(cached)
result = yield process_chunk(myiter, 10)
if not result:
print 'More left'
reactor.callLater(5, process_loop, cached)
else:
print 'All done'

You're right that you can't express what you want to express in cacheiter. The inlineCallbacks decorator won't let you have a function that returns an iterator. If you decorate a function with it, then the result is a function that always returns a Deferred. That's what it is for.
Part of what makes this difficult is that iterators don't work well with asynchronous code. If there's a Deferred involved in producing the elements of your iterator, then the elements that come out of your iterator are going to be Deferreds first.
You might do something like this to account for that:
#inlineCallbacks
def process_work():
for element_deferred in some_jobs:
element = yield element_deferred
work_on(element)
This can work, but it looks particularly weird. Since generators can only yield to their caller (not, for example, to their caller's caller), the some_jobs iterator can't do anything about this; only code lexically within process_work can yield a Deferred to the inlineCallbacks-provided trampoline to wait on.
If you don't mind this pattern, then we could imaging your code being written something like:
from twisted.internet.task import deferLater
from twisted.internet.defer import inlineCallbacks, returnValue
from twisted.internet import reactor
class cacheiter(object):
def __init__(self, cached):
self._cached = iter(cached.items())
self._remaining = []
def __iter__(self):
return self
#inlineCallbacks
def next(self):
# First re-fill the list of synchronously-producable values if it is empty
if not self._remaining:
for name, value in self._cached:
# Wait on this Deferred to determine if this cache item should be included
if (yield check_condition(name, value)):
# If so, put all of its values into the value cache so the next one
# can be returned immediately next time this method is called.
self._remaining.extend([(name, k, v) for (k, v) in value.items()])
# Now actually give out a value, if there is one.
if self._remaining:
returnValue(self._remaining.pop())
# Otherwise the entire cache has been visited and the iterator is complete.
# Sadly we cannot signal completion with StopIteration, because the iterator
# protocol isn't going to add an errback to this Deferred and check for
# StopIteration. So signal completion with a simple None value.
returnValue(None)
#inlineCallbacks
def process_chunk(myiter, num):
for i in xrange(num):
nextval = yield myiter.next()
if nextval is None:
# The iterator signaled completion via the special None value.
# Processing is complete.
returnValue(True)
# Otherwise process the value.
yield some_processing(nextval)
# Indicate there is more processing to be done.
returnValue(False)
def sleep(sec):
# Simple helper to delay asynchronously for some number of seconds.
return deferLater(reactor, sec, lambda: None)
#inlineCallbacks
def process_loop(cached):
myiter = cacheiter(cached)
while True:
# Loop processing 10 items from myiter at a time, until process_chunk signals
# there are no values left.
result = yield process_chunk(myiter, 10)
if result:
print 'All done'
break
print 'More left'
# Insert the 5 second delay before starting on the next chunk.
yield sleep(5)
d = process_loop(cached)
Another approach you might be able to take, though, is to use twisted.internet.task.cooperate. cooperate takes an iterator and consumes it, assuming that consuming it is potentially costly, and splitting up the job over multiple reactor iterations. Taking the definition of cacheiter from above:
from twisted.internet.task import cooperate
def process_loop(cached):
finished = []
def process_one(value):
if value is None:
finished.append(True)
else:
return some_processing(value)
myiter = cacheiter(cached)
while not finished:
value_deferred = myiter.next()
value_deferred.addCallback(process_one)
yield value_deferred
task = cooperate(process_loop(cached))
d = task.whenDone()

I think you're trying to do this:
#inlineCallbacks
def cacheiter(cached):
for cachename,cachevalue in cached.items():
result = yield some_deferred() # some deferred you'd like evaluated
if result is True:
# here you want to return something, so you have to use returnValue
# the generator you want to return can be written as a generator expression
gen = ((cachename, k, v) for k,v in cachedvalue.items())
returnValue(gen)
When a genexp can't express what you're trying to return you can write a closure:
#inlineCallbacks
def cacheiter(cached):
for cachename,cachevalue in cached.items():
result = yield some_deferred()
if result is True:
# define the generator, saving the current values of the cache
def gen(cachedvalue=cachedvalue, cachename=cachename):
for k,v in cachedvalue.items():
yield cachename, k, v
returnValue(gen()) # return it

Try writing your iterator as a DeferredGenerator.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Discarding a tee element - python

Related

Pythonic way to get next iterable value without incrementing iterator [duplicate]

Custom iterator and itertools.tee problem

Using generator send() within a for loop

What is a good way to decorate an iterator to alter the value before next is called in python?

Python twisted: iterators and yields/inlineCallbacks

Categories

Resources