Annoying generator bug - python

The original context of this bug is a piece of code too large to post in a question like this. I had to whittle this code down to a minimal snippet that still exhibits the bug. This is why the code shown below is somewhat bizarre-looking.
In the code below, the class Foo may be thought of as a convoluted way to get something like xrange.
class Foo(object):
    def __init__(self, n):
        self.generator = (x for x in range(n))
    def __iter__(self):
        for e in self.generator:
            yield e
Indeed, Foo seems to behave very much like xrange:
for c in Foo(3):
    print c
# 0
# 1
# 2
print list(Foo(3))
# [0, 1, 2]
Now, the subclass Bar of Foo adds only a __len__ method:
class Bar(Foo):
    def __len__(self):
        return sum(1 for _ in self.generator)
Bar behaves just like Foo when used in a for-loop:
for c in Bar(3):
    print c
# 0
# 1
# 2
BUT:
print list(Bar(3))
# []
My guess is that, in the evaluation of list(Bar(3)), the __len__ method of Bar(3) is getting called, thereby using up the generator.
(If this guess is correct, the call to Bar(3).__len__ is unnecessary; after all, list(Foo(3)) produces the correct result even though Foo has no __len__ method.)
This situation is annoying: there's no good reason for list(Foo(3)) and list(Bar(3)) to produce different results.
Is it possible to fix Bar (without, of course, getting rid of its __len__ method) such that list(Bar(3)) returns [0, 1, 2]?

Your problem is that Foo does not behave the same as xrange: xrange gives you a new iterator each time you ask for one, while Foo always gives you the same one, meaning that once it is exhausted the object is too:
>>> a = Foo(3)
>>> list(a)
[0, 1, 2]
>>> list(a)
[]
>>> a = range(3)
>>> list(a)
[0, 1, 2]
>>> list(a)
[0, 1, 2]
I could easily confirm that the __len__ method is called by list by adding spies to your methods:
class Bar(Foo):
    def __len__(self):
        print "LEN"
        return sum(1 for _ in self.generator)
(and I added a print "ITERATOR" in Foo.__iter__). It yields:
>>> list(Bar(3))
LEN
ITERATOR
[]
I can only imagine two workarounds:
my preferred one: return a new iterator on each call to __iter__ at Foo level to mimic xrange:
class Foo(object):
    def __init__(self, n):
        self.n = n
    def __iter__(self):
        print "ITERATOR"
        return (x for x in range(self.n))

class Bar(Foo):
    def __len__(self):
        print "LEN"
        return sum(1 for _ in iter(self))  # count over a fresh iterator; there is no stored generator anymore
we get correctly:
>>> list(Bar(3))
ITERATOR
LEN
ITERATOR
[0, 1, 2]
the alternative: change __len__ so that it does not consume the iterator, and leave Foo untouched:
class Bar(Foo):
    def __init__(self, n):
        self.len = n
        super(Bar, self).__init__(n)
    def __len__(self):
        print "LEN"
        return self.len
Here again we get:
>>> list(Bar(3))
LEN
ITERATOR
[0, 1, 2]
but Foo and Bar objects are still exhausted once the first iterator reaches its end.
But I must admit that I do not know the context of your real classes...

This behaviour might be annoying but it's actually quite understandable. Internally a list is backed by an array, and an array has a fixed size. When a list outgrows its array, a whole new array has to be allocated and the old contents copied over; CPython over-allocates so that append stays amortized O(1), but growing still costs work.
To avoid needless resizing, list() asks its input for a size hint (via __len__ or __length_hint__) so it can guess how big the array needs to be. That is why Bar(3).__len__ gets called, and why the generator is consumed before iteration even starts.
So one solution for this problem is to hide __len__ from list() by passing a plain iterator instead:
list(iter(Bar(3)))
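To make the size hint visible, here is a small sketch in Python 3 (Hinted is a made-up stand-in for Bar; operator.length_hint exposes the same hint machinery that list() consults):

import operator

class Hinted:
    def __init__(self, n):
        self.gen = (x for x in range(n))
    def __iter__(self):
        return self.gen
    def __len__(self):
        print("LEN consulted")
        return sum(1 for _ in self.gen)   # counting empties the stored generator

print(operator.length_hint(Hinted(3)))  # "LEN consulted", then 3
print(list(Hinted(3)))                  # "LEN consulted", then [] - __len__ emptied the generator
print(list(iter(Hinted(3))))            # [0, 1, 2] - a bare generator offers no hint, so __len__ is never called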

Related

How can I have multiple iterators over a single python iterable at the same time?

I would like to compare all elements in my iterable object combinatorially with each other. The following reproducible example just mimics the functionality of a plain list, but demonstrates my problem. In this example with a list of ["A","B","C","D"], I would like to get the following 16 lines of output, every combination of each item with each other. A list of 100 items should generate 100*100=10,000 lines.
A A True
A B False
A C False
... 10 more lines ...
D B False
D C False
D D True
The following code seemed like it should do the job.
class C():
    def __init__(self):
        self.stuff = ["A","B","C","D"]
    def __iter__(self):
        self.idx = 0
        return self
    def __next__(self):
        self.idx += 1
        if self.idx > len(self.stuff):
            raise StopIteration
        else:
            return self.stuff[self.idx - 1]

thing = C()
for x in thing:
    for y in thing:
        print(x, y, x==y)
But after finishing the y-loop, the x-loop seems done, too, even though it's only used the first item in the iterable.
A A True
A B False
A C False
A D False
After much searching, I eventually tried the following code, hoping that itertools.tee would allow me two independent iterators over the same data:
import itertools

thing = C()
thing_one, thing_two = itertools.tee(thing)
for x in thing_one:
    for y in thing_two:
        print(x, y, x==y)
But I got the same output as before.
The real-world object this represents is a model of a directory and file structure with varying numbers of files and subdirectories, at varying depths into the tree. It has nested links to thousands of members and iterates correctly over them once, just like this example. But it also does expensive processing within its many internal objects on-the-fly as needed for comparisons, which would end up doubling the workload if I had to make a complete copy of it prior to iterating. I would really like to use multiple iterators, pointing into a single object with all the data, if possible.
Edit on answers: The critical flaw in the question code, pointed out in all answers, is the single internal self.idx variable being unable to handle multiple callers independently. The accepted answer is the best for my real class (oversimplified in this reproducible example), another answer presents a simple, elegant solution for simpler data structures like the list presented here.
It's actually impossible to make a container class that is its own iterator and still behaves correctly. The container shouldn't know about the state of the iterator, and the iterator doesn't need to know the contents of the container; it just needs to know which object is the corresponding container and "where" it is. If you mix iterator and container, different iterators will share state with each other (in your case the self.idx), which will not give the correct results (they read and modify the same variable).
That's the reason why all built-in types have a separate iterator class (and some even have a reverse-iterator class):
>>> l = [1, 2, 3]
>>> iter(l)
<list_iterator at 0x15e360c86d8>
>>> reversed(l)
<list_reverseiterator at 0x15e360a5940>
>>> t = (1, 2, 3)
>>> iter(t)
<tuple_iterator at 0x15e363fb320>
>>> s = '123'
>>> iter(s)
<str_iterator at 0x15e363fb438>
So, basically you could just return iter(self.stuff) in __iter__ and drop the __next__ altogether because list_iterator knows how to iterate over the list:
class C:
    def __init__(self):
        self.stuff = ["A","B","C","D"]
    def __iter__(self):
        return iter(self.stuff)

thing = C()
for x in thing:
    for y in thing:
        print(x, y, x==y)
prints 16 lines, as expected.
If your goal is to make your own iterator class, you need two classes (or 3 if you want to implement the reversed-iterator yourself).
class C:
    def __init__(self):
        self.stuff = ["A","B","C","D"]
    def __iter__(self):
        return C_iterator(self)
    def __reversed__(self):
        return C_reversed_iterator(self)

class C_iterator:
    def __init__(self, parent):
        self.idx = 0
        self.parent = parent
    def __iter__(self):
        return self
    def __next__(self):
        self.idx += 1
        if self.idx > len(self.parent.stuff):
            raise StopIteration
        else:
            return self.parent.stuff[self.idx - 1]

thing = C()
for x in thing:
    for y in thing:
        print(x, y, x==y)
works as well.
For completeness, here's one possible implementation of the reversed-iterator:
class C_reversed_iterator:
    def __init__(self, parent):
        self.parent = parent
        self.idx = len(parent.stuff) + 1
    def __iter__(self):
        return self
    def __next__(self):
        self.idx -= 1
        if self.idx <= 0:
            raise StopIteration
        else:
            return self.parent.stuff[self.idx - 1]

thing = C()
for x in reversed(thing):
    for y in reversed(thing):
        print(x, y, x==y)
Instead of defining your own iterators you could use generators. One way was already shown in the other answer:
class C:
    def __init__(self):
        self.stuff = ["A","B","C","D"]
    def __iter__(self):
        yield from self.stuff
    def __reversed__(self):
        yield from self.stuff[::-1]
or explicitly delegate to a generator function (this is actually equivalent to the above, but it perhaps makes it clearer that a new object is produced on each call):
def C_iterator(obj):
    for item in obj.stuff:
        yield item

def C_reverse_iterator(obj):
    for item in obj.stuff[::-1]:
        yield item

class C:
    def __init__(self):
        self.stuff = ["A","B","C","D"]
    def __iter__(self):
        return C_iterator(self)
    def __reversed__(self):
        return C_reverse_iterator(self)
Note: You don't have to implement the __reversed__ iterator. That was just meant as additional "feature" of the answer.
Your __iter__ is completely broken. Instead of actually making a fresh iterator on every call, it just resets some state on self and returns self. That means you can't actually have more than one iterator at a time over your object, and any call to __iter__ while another loop over the object is active will interfere with the existing loop.
You need to actually make a new object. The simplest way to do that is to use yield syntax to write a generator function. The generator function will automatically return a new iterator object every time:
class C(object):
    def __init__(self):
        self.stuff = ['A', 'B', 'C', 'D']
    def __iter__(self):
        for thing in self.stuff:
            yield thing
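As a quick check (a sketch, not part of the original answer): every call to __iter__ on this version builds a fresh generator, so nested loops over the same instance no longer interfere.

thing = C()
pairs = [(x, y) for x in thing for y in thing]
print(len(pairs))   # 16 - the inner loop gets its own generator on every pass
print(pairs[:3])    # [('A', 'A'), ('A', 'B'), ('A', 'C')]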

Python dynamic function attribute

I came across an interesting issue while trying to achieve dynamic sort.
Given the following code:
>>> l = []
>>> for i in range(2):
...     def f():
...         return f.v
...     f.v = i
...     l.append(f)
You have to be careful about how to use the functions in l:
>>> l[0]()
1
>>> l[1]()
1
>>> [h() for h in l]
[1, 1]
>>> [f() for f in l]
[0, 1]
>>> f = l[0]
>>> f()
0
>>> k = l[1]
>>> k()
0
>>> f = l[1]
>>> k()
1
>>> del f
>>> k()
NameError: global name 'f' is not defined
The behavior of the function depends on what f currently is.
What should I do to avoid this issue? How can I set a function attribute that does not depend on the function's name?
Update
Reading your comments and answers, here is my actual problem.
I have some data that I want to sort according to user input (so I don't know sorting criteria in advance). User can choose on which part of the data to apply successive sorts, and these sorts can be ascending or descending.
So my first try was to loop over the user inputs, define a function for each criterion, store this function in a list and then use this list for sorted's key like this: key=lambda x: [f(x) for f in functions]. To avoid multiplying conditions into functions themselves, I was computing some needed values before the function definition and binding them to the function (different functions with different pre-computed values).
While debugging, I understood that function attribute was not the solution here, so I indeed wrote a class with a __call__ method.
The issue is due to the fact that return f.v loads the global f, and not the one you intend [1]. You can see this by disassembling the code:
>>> dis.dis(l[0])
3 0 LOAD_GLOBAL 0 (f)
3 LOAD_ATTR 1 (v)
6 RETURN_VALUE
After the loop that populates l, f is a reference to the last closure created, as you can see here:
>>> l
[<function f at 0x02594170>, <function f at 0x02594130>]
>>> f
<function f at 0x02594130>
Thus, when you call l[0](), it still loads the f that points to the last function created, and it returns 1. When you redefined f by doing f = l[0], then the global f now points to the first function.
What you seem to want is a function that has a state, which really is a class. You could therefore do something like this:
class MyFunction:
    def __init__(self, v):
        self.v = v
    def __call__(self):
        return self.v

l = [MyFunction(i) for i in range(2)]
l[0]() # 0
l[1]() # 1
Though it may be a good idea to explain your actual problem first, as there might be a better solution.
[1]: You may ask: why does it load the global f instead of the current instance?
Recall that when you create a class, you need to pass a self argument, like so:
# ...
def my_method(self):
    return self.value
self is actually a reference to the current instance of your object. That's how Python knows where to load the attribute value. It knows it has to look into the instance referenced by self. So when you do:
a.value = 1
a.my_method()
self is now a reference to a.
So when you do:
def f():
    return f.v
There's no way for Python to know what f actually is. It's not a parameter, so it has to load it from elsewhere. In your case, it's loaded from the global variables.
Thus, when you do f.v = i, while you do set an attribute v for the instance of f, there's no way to know which instance you are referring to in the body of your function.
Note that what you are doing here:
def f():
    return f.v
is not making a function which returns whatever its own v attribute is. It's returning whatever the f object's v attribute is. So it necessarily depends on the value of f. It's not that your v attribute "depends on the function's name". It really has nothing at all to do with the function's name.
Later, when you do
>>> f = l[0]
>>> k = l[1]
>>> k()
0
What you have done is bound k to the function at l[1]. When you call it, you of course get f.v, because that's what the function does.
But notice:
>>> k.v
1
>>> [h.v for h in l]
[0, 1]
So, a function is an object, and just like most objects, it can have attributes assigned to it (which you can access using dot notation, or the getattr() function, or inspecting the object's dictionary, etc.). But a function is not designed to access its own attributes from within its own code. For that, you want to use a class (as demonstrated by @VincentSavard).
In your particular case, the effect you seem to be after doesn't really need an "attribute" per se; you are apparently looking for a closure. You can implement a closure using a class, but a lighter-weight way is a nested function (one form of which is demonstrated by @TomKarzes; you could also use a named inner function instead of lambda - see the sketch after the next example).
Try this:
l = []
for i in range(2):
    def f(n):
        return lambda: n
    l.append(f(i))
This doesn't use attributes, but creates a closure for each value of i. The value of n is then locked once f returns. Here's some sample output:
>>> [f() for f in l]
[0, 1]
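For reference, here is the same closure written with a named inner function instead of a lambda (a sketch of the variant mentioned above; make_f is a made-up helper name):

l = []
for i in range(2):
    def make_f(n):
        def g():
            return n        # n is fixed at the moment make_f(i) is called
        return g
    l.append(make_f(i))

print([h() for h in l])     # [0, 1]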
As others said, return f.v looks up the name f in the enclosing (global) scope, and by the time you call the functions that name refers to the last function defined.
To work around this you can simulate functions:
>>> class Function(object):
...     def __init__(self, return_value):
...         self.return_value = return_value
...     def __call__(self):
...         return self.return_value
...
>>> l = []
>>> for i in range(2):
...     l.append(Function(i))
...
>>> l[0]()
0
>>> l[1]()
1

restartable generator available in Python Standard Library?

I can't imagine that I'm the first to write a class like this:
class RestartableGenerator:
    def __init__(self, g):
        self.g = g
    def __iter__(self):
        return self.g().__iter__()

if __name__ == '__main__':
    def gen():
        print 'Generating'
        for i in range(5):
            yield i

    i = RestartableGenerator(gen)
    print 'Using'
    print list(i)
    print list(i)
The test produces this output:
Using
Generating
[0, 1, 2, 3, 4]
Generating
[0, 1, 2, 3, 4]
But I didn't find it in the Standard Library. I looked in itertools and functools.
Is it really not there? If it is, where?
Was it considered unnecessary, because when you want to evaluate a sequence multiple times, you better store it in a list?
Edit 1:
My use case is that I want it to be transparent to the consumer that the sequence is, for memory-consumption reasons, a generator instead of a list.
Edit 2: If there's no such class in the Standard Library, what name do you think is appropriate? ParenthesisRemover? MultipleTimesIterable? Anything else? Why?
You could simplify it a little:
class RestartableGenerator:
    def __init__(self, g):
        self.g = g
    def __iter__(self):
        return self.g()
Calling gen() returns a generator object, which has a next method; that is exactly the kind of object __iter__ must return.
There is no need for RestartableGenerator, however, since it does nothing that gen itself cannot do. Instead of holding gen in an instance attribute, just hold on to gen itself.
def gen():
    print 'Generating'
    for i in range(5):
        yield i

print 'Using'
print list(gen())
print list(gen())
This isn't exactly restartable in the sense of going back to the beginning of the sequence. Each __iter__ call creates a new generator that will rerun the generator code, potentially reexecuting side effects and producing different results. If you want independent iterators over a generated sequence, that's what list or itertools.tee are for. Otherwise, it's clearer to explicitly call the generator function again, so this isn't very useful. You save a pair of parentheses at the cost of less explicit, more bug-prone code.
Note that if you want a lazy sequence type, where iterating over it generates elements on the fly, but you can iterate over it repeatedly, you should define its __iter__ method as a generator:
import itertools

class Primes(object):
    def __iter__(self):
        for i in itertools.count():
            if is_prime(i):  # assumes an is_prime() helper defined elsewhere
                yield i
This isn't a "restartable generator", but it sounds like what you want.
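Here is a runnable sketch of that idea (the naive trial-division is_prime below is a stand-in, not part of the original answer); the instance can be iterated repeatedly because every __iter__ call starts a fresh generator:

import itertools

def is_prime(n):
    # naive trial division, good enough for a demo
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

class Primes(object):
    def __iter__(self):
        for i in itertools.count():
            if is_prime(i):
                yield i

primes = Primes()
print(list(itertools.islice(primes, 5)))  # [2, 3, 5, 7, 11]
print(list(itertools.islice(primes, 5)))  # [2, 3, 5, 7, 11] again - each pass restarts from scratch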

How to make a repeating generator in Python

How do you make a repeating generator, like xrange, in Python? For instance, if I do:
>>> m = xrange(5)
>>> print list(m)
>>> print list(m)
I get the same result both times — the numbers 0..4. However, if I try the same with yield:
>>> def myxrange(n):
...     i = 0
...     while i < n:
...         yield i
...         i += 1
>>> m = myxrange(5)
>>> print list(m)
>>> print list(m)
The second time I try to iterate over m, I get nothing back — an empty list.
Is there a simple way to create a repeating generator like xrange with yield, or generator comprehensions? I found a workaround on a Python tracker issue, which uses a decorator to transform a generator into an iterator. This restarts every time you start using it, even if you didn't use all the values last time through, just like xrange. I also came up with my own decorator, based on the same idea, which actually returns a generator, but one which can restart after throwing a StopIteration exception:
@decorator.decorator
def eternal(genfunc, *args, **kwargs):
    class _iterable:
        iter = None
        def __iter__(self):
            return self
        def next(self, *nargs, **nkwargs):
            self.iter = self.iter or genfunc(*args, **kwargs)
            try:
                return self.iter.next(*nargs, **nkwargs)
            except StopIteration:
                self.iter = None
                raise
    return _iterable()
Is there a better way to solve the problem, using only yield and/or generator comprehensions? Or something built into Python? So I don't need to roll my own classes and decorators?
Update
The comment by u0b34a0f6ae nailed the source of my misunderstanding:
xrange(5) does not return an iterator, it creates an xrange object. xrange objects can be iterated, just like dictionaries, more than once.
My "eternal" function was barking up the wrong tree entirely, by acting like an iterator/generator (__iter__ returns self) rather than like a collection/xrange (__iter__ returns a new iterator).
Not directly. Part of the flexibility that allows generators to be used for implementing co-routines, resource management, etc, is that they are always one-shot. Once run, a generator cannot be re-run. You would have to create a new generator object.
However, you can create your own class which overrides __iter__(). It will act like a reusable generator:
def multigen(gen_func):
    class _multigen(object):
        def __init__(self, *args, **kwargs):
            self.__args = args
            self.__kwargs = kwargs
        def __iter__(self):
            return gen_func(*self.__args, **self.__kwargs)
    return _multigen

@multigen
def myxrange(n):
    i = 0
    while i < n:
        yield i
        i += 1

m = myxrange(5)
print list(m)
print list(m)
Using itertools it's super easy.
import itertools

alist = [1, 2, 3]
repeatingGenerator = itertools.cycle(alist)

print(next(repeatingGenerator)) #=> yields 1
print(next(repeatingGenerator)) #=> yields 2
print(next(repeatingGenerator)) #=> yields 3
print(next(repeatingGenerator)) #=> yields 1 again!
If you write a lot of these, John Millikin's answer is the cleanest it gets.
But if you don't mind adding 3 lines and some indentation, you can do it without a custom decorator. This composes 2 tricks:
1. (Generally useful.) You can easily make a class iterable without implementing .next() - just use a generator for __iter__(self)!
2. Instead of bothering with a constructor, you can define a one-off class inside a function.
=>
def myxrange(n):
    class Iterable(object):
        def __iter__(self):
            i = 0
            while i < n:
                yield i
                i += 1
    return Iterable()
Small print: I didn't test performance, spawning classes like this might be wasteful. But awesome ;-)
I think the answer to that is "No". I'm possibly wrong. It may be that some of the funky new things you can do with generators in 2.6, involving arguments and exception handling, would allow something like what you want, but those features are mostly intended for implementing semi-continuations.
Why do you want to avoid having your own classes or decorators? And why did you want to create a decorator that returned a generator instead of a class instance?
You can reset iterators with more_itertools.seekable, a third-party tool.
Install via > pip install more_itertools.
import more_itertools as mit

def myxrange(n):
    """Yield integers."""
    i = 0
    while i < n:
        yield i
        i += 1

m = mit.seekable(myxrange(5))
print(list(m))
m.seek(0)    # reset iterator
print(list(m))
# [0, 1, 2, 3, 4]
# [0, 1, 2, 3, 4]
Note: memory consumption grows while advancing an iterator, so be wary wrapping large iterables.
use this solution:
>>> myxrange_ = lambda x: myxrange(x)
>>> print list(myxrange_(5))
[0, 1, 2, 3, 4]
>>> print list(myxrange_(5))
[0, 1, 2, 3, 4]
>>> for number in myxrange_(5):
...     print number
...
0
1
2
3
4
>>>
and with a decorator:
>>> def decorator(generator):
...     return lambda x: generator(x)
...
>>> @decorator
... def myxrange(n):
...     i = 0
...     while i < n:
...         yield i
...         i += 1
...
>>> print list(myxrange(5))
[0, 1, 2, 3, 4]
>>> print list(myxrange(5))
[0, 1, 2, 3, 4]
>>>
Simple.

Resetting generator object in Python

I have a generator object returned by a function with multiple yields. Preparing to call this generator is a rather time-consuming operation. That is why I want to reuse the generator several times.
y = FunctionWithYield()
for x in y: print(x)
#here must be something to reset 'y'
for x in y: print(x)
Of course, I am aware that I could copy the contents into a simple list. Is there a way to reset my generator instead?
See also: How to look ahead one element (peek) in a Python generator?
Generators can't be rewound. You have the following options:
Run the generator function again, restarting the generation:
y = FunctionWithYield()
for x in y: print(x)
y = FunctionWithYield()
for x in y: print(x)
Store the generator results in a data structure on memory or disk which you can iterate over again:
y = list(FunctionWithYield())
for x in y: print(x)
# can iterate again:
for x in y: print(x)
The downside of option 1 is that it computes the values again. If that's CPU-intensive you end up calculating twice. On the other hand, the downside of option 2 is the storage. The entire list of values will be stored in memory. If there are too many values, that can be impractical.
So you have the classic memory vs. processing tradeoff. I can't imagine a way of rewinding the generator without either storing the values or calculating them again.
You could also use tee as suggested by other answers, however that would still store the entire list in memory in your case, so it would be the same results and similar performance to option 2.
Another option is to use the itertools.tee() function to create a second version of your generator:
import itertools

y = FunctionWithYield()
y, y_backup = itertools.tee(y)
for x in y:
    print(x)
for x in y_backup:
    print(x)
This could be beneficial from memory usage point of view if the original iteration might not process all the items.
>>> def gen():
...     def init():
...         return 0
...     i = init()
...     while True:
...         val = (yield i)
...         if val == 'restart':
...             i = init()
...         else:
...             i += 1
>>> g = gen()
>>> g.next()
0
>>> g.next()
1
>>> g.next()
2
>>> g.next()
3
>>> g.send('restart')
0
>>> g.next()
1
>>> g.next()
2
Probably the most simple solution is to wrap the expensive part in an object and pass that to the generator:
data = ExpensiveSetup()
for x in FunctionWithYield(data): pass
for x in FunctionWithYield(data): pass
This way, you can cache the expensive calculations.
If you can keep all results in RAM at the same time, then use list() to materialize the results of the generator in a plain list and work with that.
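A runnable sketch of the wrap-the-expensive-part pattern above (the helper names are made up for illustration): pay for the setup once, then rebuild the cheap generator as often as needed.

def expensive_setup():
    print("expensive setup runs once")
    return [1, 2, 3]

def function_with_yield(data):
    for item in data:
        yield item * 10

data = expensive_setup()                 # the costly part, done once
print(list(function_with_yield(data)))   # [10, 20, 30]
print(list(function_with_yield(data)))   # [10, 20, 30] - rebuilding the generator is cheap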
I want to offer a different solution to an old problem
class IterableAdapter:
    def __init__(self, iterator_factory):
        self.iterator_factory = iterator_factory
    def __iter__(self):
        return self.iterator_factory()

squares = IterableAdapter(lambda: (x * x for x in range(5)))
for x in squares: print(x)
for x in squares: print(x)
The benefit of this when compared to something like list(iterator) is that this is O(1) space complexity and list(iterator) is O(n). The disadvantage is that, if you only have access to the iterator, but not the function that produced the iterator, then you cannot use this method. For example, it might seem reasonable to do the following, but it will not work.
g = (x * x for x in range(5))
squares = IterableAdapter(lambda: g)
for x in squares: print(x)
for x in squares: print(x)
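To see why (a small sketch, not from the original answer): the lambda closes over one already-created generator object and keeps handing back that same, eventually exhausted, object.

g = (x * x for x in range(5))
factory = lambda: g          # always returns the *same* generator object
print(list(factory()))       # [0, 1, 4, 9, 16]
print(list(factory()))       # [] - that one generator is already exhausted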
Using a wrapper function to handle StopIteration
You could write a simple wrapper function to your generator-generating function that tracks when the generator is exhausted. It will do so using the StopIteration exception a generator throws when it reaches end of iteration.
import types

def generator_wrapper(function=None, **kwargs):
    assert function is not None, "Please supply a function"
    def inner_func(function=function, **kwargs):
        generator = function(**kwargs)
        assert isinstance(generator, types.GeneratorType), "Invalid function"
        while True:
            try:
                yield next(generator)
            except StopIteration:
                # exhausted: re-create the generator and keep going
                generator = function(**kwargs)
                yield next(generator)
    return inner_func
As you can spot above, when our wrapper function catches a StopIteration exception, it simply re-initializes the generator object (using another instance of the function call).
And then, assuming you define your generator-supplying function somewhere as below, you could use the Python function decorator syntax to wrap it implicitly:
@generator_wrapper
def generator_generating_function(**kwargs):
    for item in ["a value", "another value"]:
        yield item
If GrzegorzOledzki's answer won't suffice, you could probably use send() to accomplish your goal. See PEP-0342 for more details on enhanced generators and yield expressions.
UPDATE: Also see itertools.tee(). It involves some of that memory vs. processing tradeoff mentioned above, but it might save some memory over just storing the generator results in a list; it depends on how you're using the generator.
If your generator is pure in the sense that its output depends only on the passed arguments and the step number, and you want the resulting generator to be restartable, here's a short snippet that might be handy:
import copy

def generator(i):
    yield from range(i)

g = generator(10)
print(list(g))
print(list(g))

class GeneratorRestartHandler(object):
    def __init__(self, gen_func, argv, kwargv):
        self.gen_func = gen_func
        self.argv = copy.copy(argv)
        self.kwargv = copy.copy(kwargv)
        self.local_copy = iter(self)
    def __iter__(self):
        return self.gen_func(*self.argv, **self.kwargv)
    def __next__(self):
        return next(self.local_copy)

def restartable(g_func: callable) -> callable:
    def tmp(*argv, **kwargv):
        return GeneratorRestartHandler(g_func, argv, kwargv)
    return tmp

@restartable
def generator2(i):
    yield from range(i)

g = generator2(10)
print(next(g))
print(list(g))
print(list(g))
print(next(g))
outputs:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[]
0
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1
From official documentation of tee:
In general, if one iterator uses most or all of the data before
another iterator starts, it is faster to use list() instead of tee().
So it's best to use list(iterable) instead in your case.
You can define a function that returns your generator
def f():
    def FunctionWithYield(generator_args):
        ...  # generator code here
    return FunctionWithYield
Now you can just do as many times as you like:
for x in f()(generator_args): print(x)
for x in f()(generator_args): print(x)
I'm not sure what you meant by expensive preparation, but I guess you actually have
data = ... # Expensive computation
y = FunctionWithYield(data)
for x in y: print(x)
#here must be something to reset 'y'
# this is expensive - data = ... # Expensive computation
# y = FunctionWithYield(data)
for x in y: print(x)
If that's the case, why not reuse data?
There is no option to reset iterators. An iterator is consumed as you step through it with the next() function. The only way is to take a backup before you iterate over the iterator object. Check below.
Creating an iterator object with items 0 to 9:
i = iter(range(10))
Iterating with the next() function, which pops an item out:
print(next(i))
Converting the iterator object to a list:
L = list(i)
print(L)
output: [1, 2, 3, 4, 5, 6, 7, 8, 9]
So item 0 has already been popped out. Also, all the remaining items are popped as we convert the iterator to a list.
next(i)
Traceback (most recent call last):
  File "<pyshell#129>", line 1, in <module>
    next(i)
StopIteration
So you need to convert the iterator to a list as a backup before you start iterating.
A list can be turned back into an iterator with iter(<list-object>).
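A small sketch of that back-up-to-a-list idea: materialise the data once, then create as many independent iterators from the list as you need.

backup = list(range(10))       # materialise the data once
it1 = iter(backup)
it2 = iter(backup)             # independent iterators over the same backup
print(next(it1), next(it1))    # 0 1
print(list(it2))               # [0, 1, 2, ..., 9], unaffected by it1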
You can now use more_itertools.seekable (a third-party tool) which enables resetting iterators.
Install via > pip install more_itertools
import more_itertools as mit

y = mit.seekable(FunctionWithYield())
for x in y:
    print(x)
y.seek(0)    # reset iterator
for x in y:
    print(x)
Note: memory consumption grows while advancing the iterator, so be wary of large iterables.
You can do that by using itertools.cycle(). You create an iterator with this method and then run a for loop over the iterator, which will loop over its values.
For example:
from itertools import cycle

def generator():
    for j in cycle([i for i in range(5)]):
        yield j

gen = generator()
for i in range(20):
    print(next(gen))
will generate 20 numbers, 0 to 4 repeatedly.
A note from the docs:
Note, this member of the toolkit may require significant auxiliary storage (depending on the length of the iterable).
Here is how it works for me:
csv_rows = my_generator()
for _ in range(10):
    for row in csv_rows:
        print(row)
    csv_rows = my_generator()
Ok, you say you want to call a generator multiple times, but initialization is expensive... What about something like this?
class InitializedFunctionWithYield(object):
    def __init__(self):
        # do expensive initialization
        self.start = 5
    def __call__(self, *args, **kwargs):
        # do cheap iteration
        for i in xrange(5):
            yield self.start + i

y = InitializedFunctionWithYield()
for x in y():
    print x
for x in y():
    print x
Alternatively, you could just make your own class that follows the iterator protocol and defines some sort of 'reset' function.
class MyIterator(object):
    def __init__(self):
        self.reset()
    def reset(self):
        self.i = 5
    def __iter__(self):
        return self
    def next(self):
        i = self.i
        if i > 0:
            self.i -= 1
            return i
        else:
            raise StopIteration()

my_iterator = MyIterator()
for x in my_iterator:
    print x
print 'resetting...'
my_iterator.reset()
for x in my_iterator:
    print x
https://docs.python.org/2/library/stdtypes.html#iterator-types
http://anandology.com/python-practice-book/iterators.html
My answer solves a slightly different problem: the generator is expensive to initialize and each generated object is expensive to generate, but we need to consume the generator multiple times in multiple functions. In order to call the generator and generate each object exactly once, we can use threads and run each of the consuming methods in a different thread. We may not achieve true parallelism due to the GIL, but we will achieve our goal.
This approach did a good job in the following case: a deep learning model processes a lot of images, and the result is a lot of masks for a lot of objects on each image. Each mask consumes memory. We have around 10 methods which compute different statistics and metrics, but they take all the images at once, and all the images cannot fit in memory. The methods can easily be rewritten to accept an iterator.
import threading
from typing import List

class GeneratorSplitter:
    '''
    Split a generator object into multiple generators which will be synchronised. Each call to each of the sub-generators will cause only one call in the input generator. This way multiple methods on threads can iterate the input generator, and the generator will be cycled only once.
    '''
    def __init__(self, gen):
        self.gen = gen
        self.consumers: List[GeneratorSplitter.InnerGen] = []
        self.thread: threading.Thread = None
        self.value = None
        self.finished = False
        self.exception = None

    def GetConsumer(self):
        # Returns a generator object.
        cons = self.InnerGen(self)
        self.consumers.append(cons)
        return cons

    def _Work(self):
        try:
            for d in self.gen:
                for cons in self.consumers:
                    cons.consumed.wait()
                    cons.consumed.clear()
                self.value = d
                for cons in self.consumers:
                    cons.readyToRead.set()
            for cons in self.consumers:
                cons.consumed.wait()
            self.finished = True
            for cons in self.consumers:
                cons.readyToRead.set()
        except Exception as ex:
            self.exception = ex
            for cons in self.consumers:
                cons.readyToRead.set()

    def Start(self):
        self.thread = threading.Thread(target=self._Work)
        self.thread.start()

    class InnerGen:
        def __init__(self, parent: "GeneratorSplitter"):
            self.parent: "GeneratorSplitter" = parent
            self.readyToRead: threading.Event = threading.Event()
            self.consumed: threading.Event = threading.Event()
            self.consumed.set()

        def __iter__(self):
            return self

        def __next__(self):
            self.readyToRead.wait()
            self.readyToRead.clear()
            if self.parent.finished:
                raise StopIteration()
            if self.parent.exception:
                raise self.parent.exception
            val = self.parent.value
            self.consumed.set()
            return val
Usage:
from concurrent.futures import ThreadPoolExecutor

genSplitter = GeneratorSplitter(expensiveGenerator)
metrics = {}
executor = ThreadPoolExecutor(max_workers=3)
f1 = executor.submit(mean, genSplitter.GetConsumer())
f2 = executor.submit(max, genSplitter.GetConsumer())
f3 = executor.submit(someFancyMetric, genSplitter.GetConsumer())
genSplitter.Start()
metrics.update(f1.result())
metrics.update(f2.result())
metrics.update(f3.result())
If you want to reuse this generator multiple times with a predefined set of arguments, you can use functools.partial.
from functools import partial

func_with_yield = partial(FunctionWithYield, arg0, arg1)

for i in range(100):
    for x in func_with_yield():
        print(x)
This wraps the generator function in another callable, so each time you call func_with_yield() it creates a fresh generator with the same predefined arguments.
It can be done with a code object. Here is an example.
code_str = "y = (a for a in [1, 2, 3, 4])"
code1 = compile(code_str, '<string>', 'single')
exec(code1)
for i in y: print i
# 1
# 2
# 3
# 4
for i in y: print i
# (nothing - the generator is exhausted)
exec(code1)
for i in y: print i
# 1
# 2
# 3
# 4
