My custom iterator should call a specific method when next is called on it. It works that way initially, but after itertools.tee is called on the iterator for the second time the method is not called.
I actually already have a solution/workaround but I'd like to understand the root cause of the problem.
class MyIterator(object):
    def __init__(self, elements):
        self._elements = iter(elements)

    def __iter__(self):
        return self

    def next(self):  # Python 2; in Python 3 this method is named __next__
        element = next(self._elements)
        if isinstance(element, HwState):
            element.el_method()
        return element
elements = list(...)
iterator1, iterator2 = itertools.tee(MyIterator(elements))
element1 = next(iterator2) # ok
element2 = next(iterator2) # ok
iterator1, iterator2 = itertools.tee(MyIterator(iterator1))
element1 = next(iterator2) # el_method() is not called but correct element is returned
element2 = next(iterator2) # el_method() is not called but correct element is returned
I "solved" the issue this way:
elements = list(...)
iterator = MyIterator(elements)
element1 = next(iterator)
element2 = next(iterator)
iterator = MyIterator(elements)
element1 = next(iterator) # el_method() is called, correct element is returned
element2 = next(iterator) # el_method() is called, correct element is returned
See the "roughly equivalent" implementation of itertools.tee included in the documentation:
import collections

def tee(iterable, n=2):
    it = iter(iterable)
    deques = [collections.deque() for i in range(n)]
    def gen(mydeque):
        while True:
            if not mydeque:             # when the local deque is empty
                try:
                    newval = next(it)   # fetch a new value and
                except StopIteration:
                    return
                for d in deques:        # load it to all the deques
                    d.append(newval)
            yield mydeque.popleft()
    return tuple(gen(d) for d in deques)
Essentially, tee keeps a queue for every generated iterator. When a new value is requested, if there is something in the iterator queue it takes the next value from there, and if the queue is empty it calls next on the original iterator once and adds the result to every queue. What this means is that the generated values are "cached" and returned by each iterator, instead of duplicating the work of producing the element.
Moreover, it would in general be impossible for tee to behave as you expect, since tee cannot know how to make a copy of an arbitrary iterator. Think, for example, of a text file: once you have read a line you cannot, with simple sequential access, go back, and there is no such thing as "duplicating the file iterator" (to emulate that you would need multiple file handles, or seeking), so tee just saves the values it reads and returns them later from the other iterators.
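To make the caching visible, here is a minimal sketch (the `counting` helper is a made-up stand-in for your method-calling iterator): a generator counts how many times a value is actually pulled from the source, and the two tee'd branches are consumed one after the other.

```python
import itertools

calls = 0

def counting(elements):
    """Yield elements, counting how often a value is actually pulled."""
    global calls
    for el in elements:
        calls += 1
        yield el

it1, it2 = itertools.tee(counting([10, 20]))
first = list(it1)   # pulls from the source iterator: calls becomes 2
second = list(it2)  # served entirely from tee's internal deque: calls stays 2
print(first, second, calls)  # [10, 20] [10, 20] 2
```

The second branch never touches the original iterator, which is exactly why `el_method()` is not called for the values it returns.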
Why does
yield [cand]
return
lead to different output/behavior than
return [[cand]]
Minimal viable example
uses recursion
the output of the version using yield [1]; return is different from the output of the version using return [[1]]
def foo(i):
    if i != 1:
        yield [1]
        return
    yield from foo(i-1)

def bar(i):
    if i != 1:
        return [[1]]
    yield from bar(i-1)

print(list(foo(1)))  # [[1]]
print(list(bar(1)))  # []
Minimal viable counterexample
does not use recursion
the output of the version using yield [1]; return is the same as the output of the version using return [[1]]
def foo():
    yield [1]
    return

def foofoo():
    yield from foo()

def bar():
    return [[1]]

def barbar():
    yield from bar()

print(list(foofoo()))  # [[1]]
print(list(barbar()))  # [[1]]
Full context
I'm solving Leetcode #39: Combination Sum and was wondering why one solution works, but not the other:
Working solution
from functools import cache  # requires Python 3.9+

class Solution:
    def combinationSum(self, candidates: List[int], target: int) -> List[List[int]]:
        @cache
        def helper(targ, i=0):
            if i == N or targ < (cand := candidates[i]):
                return
            if targ == cand:
                yield [cand]
                return
            for comb in helper(targ - cand, i):
                yield comb + [cand]
            yield from helper(targ, i+1)
        N = len(candidates)
        candidates.sort()
        yield from helper(target)
Non-working solution
from functools import cache  # requires Python 3.9+

class Solution:
    def combinationSum(self, candidates: List[int], target: int) -> List[List[int]]:
        @cache
        def helper(targ, i=0):
            if i == N or targ < (cand := candidates[i]):
                return
            if targ == cand:
                return [[cand]]
            for comb in helper(targ - cand, i):
                yield comb + [cand]
            yield from helper(targ, i+1)
        N = len(candidates)
        candidates.sort()
        yield from helper(target)
Output
On the following input
candidates = [2,3,6,7]
target = 7
print(Solution().combinationSum(candidates, target))
the working solution correctly prints
[[3,2,2],[7]]
while the non-working solution prints
[]
I'm wondering why yield [cand]; return works, but return [[cand]] doesn't.
In a generator function, return just defines the value associated with the StopIteration exception implicitly raised to indicate an iterator is exhausted. It's not produced during iteration, and most iterating constructs (e.g. for loops) intentionally ignore the StopIteration exception (it means the loop is over, you don't care if someone attached random garbage to a message that just means "we're done").
For example, try:
>>> def foo():
... yield 'onlyvalue' # Existence of yield keyword makes this a generator
... return 'returnvalue'
...
>>> f = foo() # Makes a generator object, stores it in f
>>> next(f) # Pull one value from generator
'onlyvalue'
>>> next(f) # There is no other yielded value, so this hits the return; iteration over
--------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
...
StopIteration: 'returnvalue'
As you can see, your return value does get "returned" in a sense (it's not completely discarded), but it's never seen by anything iterating normally, so it's largely useless. Outside of rare cases involving using generators as coroutines (where you're using .send() and .throw() on instances of the generator and manually advancing it with next(genobj)), the return value of a generator won't be seen.
In short, you have to pick one:
Use yield anywhere in a function, and it's a generator (whether or not the code path of a particular call ever reaches a yield) and return just ends generation (while maybe hiding some data in the StopIteration exception). No matter what you do, calling the generator function "returns" a new generator object (which you can loop over until exhausted), it can never return a raw value computed inside the generator function (which doesn't even begin running until you loop over it at least once).
Don't use yield, and return works as expected (because it's not a generator function).
As an example to explain what happens to the return value in normal looping constructs, this is what for x in gen(): effectively expands to a C optimized version of:
__unnamed_iterator = iter(gen())
while True:
    try:
        x = next(__unnamed_iterator)
    except StopIteration:  # StopIteration caught here without inspecting it
        break  # loop ends; the exception is cleaned even from sys.exc_info() to avoid possible reference cycles
    # body of loop goes here
# outside the loop, there is no StopIteration object left
As you can see, the expanded form of the for loop has to look for a StopIteration to know the loop is over, but it doesn't use it. And for anything that's not a generator, the StopIteration never has an associated value; the for loop has no way to report one even if it did (it has to end the loop when it's told iteration is over, and the arguments to StopIteration are explicitly not part of the values iterated anyway).

Anything else that consumes the generator (e.g. calling list on it) does roughly the same thing as the for loop, ignoring the StopIteration in the same way; nothing except code that specifically expects generators (as opposed to more generalized iterables and iterators) will ever bother to inspect the StopIteration object. At the C layer there are optimizations such that most iterators don't even produce StopIteration objects: they return NULL and leave no exception set, which everything using the iterator protocol treats as equivalent to setting a StopIteration, so for anything but a generator there often isn't even an exception to inspect.
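The one place the return value does surface during ordinary generator use is yield from delegation, where it becomes the value of the yield from expression. A minimal sketch:

```python
def inner():
    yield 1
    yield 2
    return 'done'  # stored on the implicit StopIteration as its value

def outer():
    # "yield from" delegates to inner() and evaluates to inner's return value
    result = yield from inner()
    yield 'inner returned %r' % (result,)

print(list(outer()))  # [1, 2, "inner returned 'done'"]
```

This is the mechanism that makes generator return values useful when composing coroutine-style generators, even though plain for loops never see them.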
I have the following code for an iterator. Since I created a separate class meant to iterate through my list, I expected that two consecutive for statements over my object would not require resetting the index manually. But on a global run it only displays the elements of my list once, and on further iterations it stops displaying anything, so I think the resetting of the index does not happen properly somewhere. Please use simple code and simple explanations, because I am a beginner.
my_list = [1, 2, 3]

class Iterator:
    def __init__(self, seq):
        self.seq = seq

    def __next__(self):
        if len(self.seq) > 0:
            return self.seq.pop(0)
        raise StopIteration
class Iterating:
    def __init__(self):
        pass

    def __iter__(self):
        return Iterator(my_list)

i_1 = Iterating()
for element in i_1:
    print(element)
print()
for element in i_1:
    print(element)
# why does this not work?
It's because while you're iterating over your Iterator you're deleting all the elements from self.seq, so when you try to iterate over it a second time you're iterating over an empty list. You could pop from a temporary list instead and "refill" it every time you finish iterating, ready for further use.
class Iterator:
    def __init__(self, seq):
        self.seq = seq
        self.temp_seq = self.seq.copy()

    def __next__(self):
        if len(self.temp_seq) > 0:
            return self.temp_seq.pop(0)
        self.temp_seq = self.seq.copy()  # refill so the next iteration starts fresh
        raise StopIteration
The problem is that the pop method modifies the list (see the documentation):
def __next__(self):
    if len(self.seq) > 0:
        return self.seq.pop(0)  # pop(0) removes the item from the list
    raise StopIteration
When you create a new instance with Iterator(my_list), the list is passed by reference: the iterator's self.seq and my_list are the same object.
To fix your code, you can create a copy of my_list:
Iterator(my_list.copy())
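A quick illustration of that aliasing: assigning a list to another name creates no copy, so a pop through either name is visible through both, while .copy() produces an independent list.

```python
my_list = [1, 2, 3]
alias = my_list        # no copy: both names refer to the same list object
alias.pop(0)
print(my_list)         # [2, 3] -- the "original" shrank too

safe = my_list.copy()  # shallow copy: an independent top-level list
safe.pop(0)
print(my_list)         # [2, 3] -- unchanged this time
print(safe)            # [3]
```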
My question is a very simple one.
Does a for loop evaluate the argument it uses every time?
Such as:
for i in range(300):
Does python create a list of 300 items for every iteration of this loop?
If so, is this a way to avoid it?
lst = range(300)
for i in lst:
#loop body
Same goes for code examples like this.
for i in reversed(lst):
for k in range(len(lst)):
Is the reversing applied every single time, or is the length calculated at every iteration? (I ask for both Python 2 and Python 3.)
If not, how does Python evaluate the changes on the iterable while iterating over it ?
No fear, the iterator will only be evaluated once. It ends up being roughly equivalent to code like this:
it = iter(range(300))
while True:
    try:
        i = next(it)
    except StopIteration:
        break
    ... body of loop ...
Note that it's not quite equivalent, because break will work differently. Remember that you can add an else to a for loop, but that won't work in the above code.
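This is easy to verify: put the iterable behind a function call and count how often it runs. A minimal sketch (make_iterable is a made-up name):

```python
calls = 0

def make_iterable():
    global calls
    calls += 1
    return range(3)

total = 0
for i in make_iterable():  # the header expression runs once, before the loop starts
    total += i

print(calls, total)  # 1 3
```

If the header were re-evaluated on every pass, calls would equal the number of iterations instead of 1.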
What objects are created depends on what the __iter__ method of the Iterable you are looping over returns.
Usually Python creates one Iterator when iterating over an Iterable which itself is not an Iterator. In Python2, range returns a list, which is an Iterable and has an __iter__ method which returns an Iterator.
>>> from collections.abc import Iterable, Iterator  # plain "collections" in Python 2
>>> isinstance(range(300), Iterable)
True
>>> isinstance(range(300), Iterator)
False
>>> isinstance(iter(range(300)), Iterator)
True
The for item in sequence: syntax is basically shorthand for this:
it = iter(some_iterable)  # get an Iterator from the Iterable; if some_iterable is already an Iterator, its __iter__ returns self by convention
while True:
    try:
        next_item = next(it)
    except StopIteration:
        break
    # do something with next_item (the body runs outside the try block,
    # so a StopIteration raised by the body itself is not swallowed)
Here is a demo with some print statements to clarify what's happening when using a for loop:
class CapitalIterable(object):
    'when iterated over, yields the capitalized words of the string it was initialized with'
    def __init__(self, stri):
        self.stri = stri

    def __iter__(self):
        print('__iter__ has been called')
        return CapitalIterator(self.stri)
        # instead of returning a custom CapitalIterator, we could also
        # return iter(self.stri.title().split())
        # because the built-in list has an __iter__ method

class CapitalIterator(object):
    def __init__(self, stri):
        self.items = stri.title().split()
        self.index = 0

    def __next__(self):  # in Python 2 this method is named next
        print('__next__ has been called')
        try:
            item = self.items[self.index]
            self.index += 1
            return item
        except IndexError:
            raise StopIteration

    def __iter__(self):
        return self

c = CapitalIterable('The quick brown fox jumps over the lazy dog.')
for x in c:
    print(x)
Output:
__iter__ has been called
__next__ has been called
The
__next__ has been called
Quick
__next__ has been called
Brown
__next__ has been called
Fox
__next__ has been called
Jumps
__next__ has been called
Over
__next__ has been called
The
__next__ has been called
Lazy
__next__ has been called
Dog.
__next__ has been called
As you can see, __iter__ is being called only once, therefore only one Iterator object is created.
range creates a list of 300 ints in that case either way (in Python 2). It does NOT create that list 300 times, once per iteration, but it is not very memory-efficient. If you use xrange (Python 2 only), it creates an iterable object with a much smaller memory footprint; in Python 3, range itself already behaves this way. https://docs.python.org/2/library/functions.html#xrange
example.py
for i in xrange(300):  # low memory footprint, similar to a normal loop
    print(i)
How do I check if my loop never ran at all?
This somehow looks too complicated to me:
x = _empty = object()
for x in data:
    ...  # process x
if x is _empty:
    raise ValueError("Empty data iterable: {!r:100}".format(data))
Isn't there an easier solution?
The above solution is from curiousefficiency.org
Update
data can contain None items.
data is an iterator, and I don't want to use it twice.
By "never ran", do you mean that data had no elements?
If so, the simplest solution is to check it before running the loop:
if not data:
    raise Exception('Empty iterable')
for x in data:
    ...
However, as mentioned in the comments below, it will not work with some iterables, like files, generators, etc., so should be applied carefully.
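That caveat is easy to demonstrate: a generator object is always truthy, even when it will yield nothing, so the bool check cannot detect emptiness.

```python
def empty_gen():
    return
    yield  # unreachable, but its presence makes this a generator function

g = empty_gen()
print(bool(g))   # True -- generator objects are always truthy
print(list(g))   # []   -- yet it produces nothing
```

The same applies to file objects and most other lazily evaluated iterators, which is why the sentinel or flag approaches below are needed in the general case.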
The original code is best.
x = _empty = object()
_empty is called a sentinel value. In Python it's common to create a sentinel with object(), since that makes it obvious that the only purpose of _empty is to serve as a dummy value. But you could have used any freshly created object, for instance an empty list [].
Two distinct objects are never identical under is, so a newly created object is safe to use as a sentinel, unlike shared immutables such as None or 0, which may be the very values your data contains.
>>> None is None
True
>>> object() is object()
False
>>> [] is []
False
I propose the following:
loop_has_run = False
for x in data:
    loop_has_run = True
    ...  # process x
if not loop_has_run:
    raise ValueError("Empty data iterable: {!r:100}".format(data))
I contend that this is better than the example in the question, because:
The intent is clearer (since the variable name specifies its meaning directly).
No objects are created or destroyed (which can have a negative performance impact).
It doesn't require paying attention to the subtle point that object() always returns a unique value.
Note that the loop_has_run = True assignment should be put at the start of the loop, in case (for example) the loop body contains break.
The following simple solution works with any iterable. It is based on the idea that we can check if there is a (first) element, and then keep iterating if there was one. The result is much clearer:
import itertools

try:
    first_elmt = next(data)
except StopIteration:
    raise ValueError("Empty data iterator: {!r:100}".format(data))
for x in itertools.chain([first_elmt], data):
    …
PS: Note that it assumes that data is an iterator (as in the question). If it is merely an iterable, the code should be run on data_iter = iter(data) instead of on data (otherwise, say if data is a list, the loop would duplicate the first element).
The intent of that code isn't immediately obvious. Sure people would understand it after a while, but the code could be made clearer.
The solution I offer requires more lines of code, but that code is in a class that can be stored elsewhere. In addition this solution will work for iterables and iterators as well as sized containers.
Your code would be changed to:
it = HadItemsIterable(data)
for x in it:
    ...
if it.had_items:
    ...
The code for the class is as follows:
from collections.abc import Iterable

class HadItemsIterable(Iterable):
    def __init__(self, iterable):
        self._iterator = iter(iterable)

    @property
    def had_items(self):
        try:
            return self._had_items
        except AttributeError:
            raise ValueError("Not iterated over items yet")

    def __iter__(self):
        try:
            first = next(self._iterator)
        except StopIteration:
            # Don't re-raise StopIteration inside a generator: since PEP 479
            # (Python 3.7) that becomes a RuntimeError. Returning ends the
            # generator cleanly instead.
            if not hasattr(self, "_had_items"):
                self._had_items = False
            return
        self._had_items = True
        yield first
        yield from self._iterator
You can add a loop_flag that defaults to False; when the loop body executes, set it to True:
loop_flag = False
for x in data:
    loop_flag = True
    ...  # process x
if loop_flag:
    print("loop executed...")
What about this solution?
data = []
count = None
for count, item in enumerate(data):
    print(item)
if count is None:
    raise ValueError('data is empty')
I have a generator function getElements in a class Reader() that yields all the elements out of an xml file. I also want to have a function getFeatures that only yields the elements with a feature tag.
How I tried it is to have a flag featuresOnly that is set to True when getFeatures is called, and in getFeatures call self.getElements, like this:
def getFeatures(self):
    self.getFeaturesOnly = True
    self.getElements()
This way in getElements() I only have to do
def getElements(self):
    inFile = open(self.path)
    for element in cElementTree.iterparse(inFile):
        if self.getFeaturesOnly == True:
            if element.tag == 'feature':
                yield element
        else:
            yield element
    inFile.close()
However, when I do this and run it
features = parseFeatureXML.Reader(filePath)
for element in features.getFeatures():
    print element
I get: TypeError: 'NoneType' object is not iterable
This is because getFeatures doesn't contain a yield. Now, the way I know how to solve this is to copy the code of getElements into getFeatures and only use
if elementFunctions.getElmentTag(element) == 'feature':
in the getFeatures() function, but I'd rather not duplicate any code. So how can I keep one generator function and have a different function where I only specify which tag I would like to get?
First things first: You have that error because you don't return the generator
Meaning that you have to change:
def getFeatures(self):
    self.getFeaturesOnly = True
    self.getElements()
with:
def getFeatures(self):
    self.getFeaturesOnly = True
    return self.getElements()  # returning the generator
With that cleared up: to be honest, I wouldn't design my Reader() class like this.
I'd let getElements yield all the elements:
def getElements(self):
    inFile = open(self.path)
    for event, element in cElementTree.iterparse(inFile):  # iterparse yields (event, element) pairs
        yield element
    inFile.close()
And then have getFeatures() do the filtering:
def getFeatures(self):
    for element in self.getElements():
        if element.tag == 'feature':
            yield element
The reason you get the TypeError is not that getFeatures doesn't contain a yield, it's because getFeatures doesn't return anything. If you want getFeatures to return the iterator returned by getElements, you have to use return:
def getFeatures(self):
    self.getFeaturesOnly = True
    return self.getElements()
While you're at it, you really shouldn't do if expr == True; just do if expr, which works even if expr is true (the concept) but not True (the object.) That said, instead of hoisting the features-only support into getElements, a more common approach is to do it in getFeatures itself, like so:
def getFeatures(self):
    for element in self.getElements():
        if element.tag == 'feature':
            yield element

def getElements(self):
    inFile = open(self.path)
    for event, element in cElementTree.iterparse(inFile):  # iterparse yields (event, element) pairs
        yield element
    inFile.close()
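For completeness, here is a self-contained sketch of the same filtering pattern (the XML content and function names are hypothetical), using the standard-library xml.etree.ElementTree.iterparse over an in-memory document instead of a file on disk:

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical in-memory document standing in for the XML file
XML = "<root><feature>a</feature><other>b</other><feature>c</feature></root>"

def get_elements(source):
    # iterparse yields (event, element) pairs; by default only 'end' events
    for _event, element in ET.iterparse(source):
        yield element

def get_features(source):
    # compose generators: filter the full element stream lazily
    for element in get_elements(source):
        if element.tag == 'feature':
            yield element

texts = [el.text for el in get_features(io.StringIO(XML))]
print(texts)  # ['a', 'c']
```

Because both functions are generators, nothing is parsed until the caller iterates, and get_features adds no buffering of its own on top of get_elements.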