python design custom iterator

I am designing a custom iterator in python:
class Iterator():
    def __init__(self):
        return

    def fit(self, n):
        self.n = n
        return self

    def __iter__(self):
        for i in range(self.n):
            yield i
        return

it = Iterator().fit(10)
for i in it:
    print(i)

it.fit(20)
for i in it:
    print(i)
It is working fine, but I am wondering whether a new fit could be called before the previous one has finished, leading to strange behaviour of the class.
If so, how should I design it to make it more robust?
It is important that some parameters can be passed in through the fit method.
EDIT: I will introduce an example that is similar to my original problem.
The iterator class is designed to be used by a User class. It is important that when the evaluate method is called, all the numbers up to n/k are printed, without any exception.
Maybe the use of an iterator.fit(n) method solves the problem?
class Iterator():
    def __init__(self, k):
        self.k = k
        return

    def fit(self, n):
        for i in range(int(n/self.k)):
            yield i
        return

class User():
    def __init__(self, iterator):
        self.iterator = iterator
        return

    def evaluate(self, n):
        for i in self.iterator.fit(n):
            print(i)
        return

it = Iterator(2)
u = User(it)
u.evaluate(10) # I want to be sure that all the numbers until 9 are printed
u.evaluate(20) # I want to be sure that all the numbers until 20 are printed

Because each call to range creates a new iterator, there will be no conflicts if you make multiple calls to fit.
Your class is a bit weird. You could either remove the __init__, as it does nothing, or put the fit method in there.
it = Iterator()
it1 = iter(it.fit(10))
it2 = iter(it.fit(5))
print(next(it1))
print(next(it1))
print(next(it2))
print(next(it1))

Output:

0
1
0
2
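As a sketch of the restructuring suggested above (moving the parameter into __init__ so each instance is configured once; the class name is illustrative, not from the question):

class RangeIterable:
    """Illustrative variant: the parameter is fixed once in __init__ instead of via fit()."""
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # Every call to __iter__ creates a fresh, independent generator,
        # so nested or repeated loops cannot interfere with each other.
        for i in range(self.n):
            yield i

for i in RangeIterable(3):
    print(i)  # prints 0, 1, 2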

You haven't actually written an iterator -- you've written a normal class that can return an iterator. The iterator that you are returning is a generator.
What this means is that calling fit() during iteration will have no effect -- at least, not until you iterate over your object again. For example:
>>> it = Iterator()
>>> for x in it.fit(7):
...     it.fit(3)
...     print(x)
...
0
1
2
3
4
5
6
>>> for x in it:
...     print(x)
...
0
1
2


python iterators, generators and in between

So I understand generator functions for lazy evaluation, and generator expressions (aka generator comprehensions) as their syntactic-sugar equivalent.
I understand classes like
class Itertest1:
    def __init__(self):
        self.count = 0
        self.max_repeats = 100

    def __iter__(self):
        print("in __iter__()")
        return self

    def __next__(self):
        if self.count >= self.max_repeats:
            raise StopIteration
        self.count += 1
        print(self.count)
        return self.count
as a way of implementing the iterator interface, i.e. __iter__() and __next__() in one and the same class.
But what then is
class Itertest2:
    def __init__(self):
        self.data = list(range(100))

    def __iter__(self):
        print("in __iter__()")
        for i, dp in enumerate(self.data):
            print("idx:", i)
            yield dp
which uses the yield statement within the __iter__ member function?
Also, I noticed that upon calling the __iter__ member function
it = Itertest2().__iter__()
batch = it.__next__()
the print statement is only executed when calling __next__() for the first time.
Is this due to this weird mixture of yield and __iter__? I think this is quite counter-intuitive...
Something equivalent to Itertest2 could be written using a separate iterator class.
class Itertest3:
    def __init__(self):
        self.data = list(range(100))

    def __iter__(self):
        return Itertest3Iterator(self.data)

class Itertest3Iterator:
    def __init__(self, data):
        self.state = enumerate(data)

    def __iter__(self):
        return self

    def __next__(self):
        print("in __iter__()")
        i, dp = next(self.state)  # Let StopIteration exception propagate
        print("idx:", i)
        return dp
Compare this to Itertest1, where the instance of Itertest1 itself carried the state of the iteration around in it. Each call to Itertest1.__iter__ returned the same object (the instance of Itertest1), so they couldn't iterate over the data independently.
Notice I put print("in __iter__()") in __next__, not __iter__. As you observed, nothing in a generator function actually executes until the first call to __next__. The generator function itself only creates a generator; it does not actually start executing the code in it.
Having the yield statement anywhere in any function wraps the function code in a (native) generator object, and replaces the function with a stub that gives you said generator object.
So, here, calling __iter__ will give you an anonymous generator object that executes the code you want.
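A minimal illustration of that lazy behaviour (the function name is made up for the example):

def count_up_to(n):
    print("generator body starts")  # runs on the first __next__ call, not when the function is called
    for i in range(n):
        yield i

g = count_up_to(3)   # nothing is printed yet; we only received a generator object
print(next(g))       # now "generator body starts" is printed, followed by 0
print(next(g))       # 1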
The main use case for __next__ is to provide a way to write an iterator without relying on (native) generators.
The use case of __iter__ is to distinguish between an object and an iteration state over said object. Consider code like
c = some_iterable()
for a in c:
    for b in c:
        # do something with a and b
        ...
You would not want the two interleaved iterations to interfere with each other's state. This is why such a loop would desugar to something like
c = some_iterable()
_iter1 = iter(c)
try:
    while True:
        a = next(_iter1)
        _iter2 = iter(c)
        try:
            while True:
                b = next(_iter2)
                # do something with a and b
        except StopIteration:
            pass
except StopIteration:
    pass
Typically, custom iterators implement a stub __iter__ that returns self, so that iter(iter(x)) is equivalent to iter(x). This is important when writing iterator wrappers.
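For example, a quick check in the interpreter:

>>> it = iter([1, 2, 3])
>>> iter(it) is it    # the iterator's stub __iter__ just returns itself
True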

How can I have multiple iterators over a single python iterable at the same time?

I would like to compare all elements in my iterable object combinatorially with each other. The following reproducible example just mimics the functionality of a plain list, but demonstrates my problem. In this example with a list of ["A","B","C","D"], I would like to get the following 16 lines of output, every combination of each item with each other. A list of 100 items should generate 100*100 = 10,000 lines.
A A True
A B False
A C False
... 10 more lines ...
D B False
D C False
D D True
The following code seemed like it should do the job.
class C():
    def __init__(self):
        self.stuff = ["A","B","C","D"]

    def __iter__(self):
        self.idx = 0
        return self

    def __next__(self):
        self.idx += 1
        if self.idx > len(self.stuff):
            raise StopIteration
        else:
            return self.stuff[self.idx - 1]

thing = C()
for x in thing:
    for y in thing:
        print(x, y, x==y)
But after finishing the y-loop, the x-loop seems done, too, even though it's only used the first item in the iterable.
A A True
A B False
A C False
A D False
After much searching, I eventually tried the following code, hoping that itertools.tee would allow me two independent iterators over the same data:
import itertools

thing = C()
thing_one, thing_two = itertools.tee(thing)
for x in thing_one:
    for y in thing_two:
        print(x, y, x==y)
But I got the same output as before.
The real-world object this represents is a model of a directory and file structure with varying numbers of files and subdirectories, at varying depths into the tree. It has nested links to thousands of members and iterates correctly over them once, just like this example. But it also does expensive processing within its many internal objects on-the-fly as needed for comparisons, which would end up doubling the workload if I had to make a complete copy of it prior to iterating. I would really like to use multiple iterators, pointing into a single object with all the data, if possible.
Edit on answers: The critical flaw in the question code, pointed out in all answers, is the single internal self.idx variable being unable to handle multiple callers independently. The accepted answer is the best for my real class (oversimplified in this reproducible example), another answer presents a simple, elegant solution for simpler data structures like the list presented here.
It's actually impossible to make a container class that is its own iterator. The container shouldn't know about the state of the iterator, and the iterator doesn't need to know the contents of the container; it just needs to know which object is the corresponding container and "where" it is. If you mix iterator and container, different iterators will share state with each other (in your case self.idx), which will not give the correct results (they read and modify the same variable).
That's the reason why all built-in types have a separate iterator class (and some even have a reverse-iterator class):
>>> l = [1, 2, 3]
>>> iter(l)
<list_iterator at 0x15e360c86d8>
>>> reversed(l)
<list_reverseiterator at 0x15e360a5940>
>>> t = (1, 2, 3)
>>> iter(t)
<tuple_iterator at 0x15e363fb320>
>>> s = '123'
>>> iter(s)
<str_iterator at 0x15e363fb438>
So, basically you could just return iter(self.stuff) in __iter__ and drop the __next__ altogether because list_iterator knows how to iterate over the list:
class C:
    def __init__(self):
        self.stuff = ["A","B","C","D"]

    def __iter__(self):
        return iter(self.stuff)

thing = C()
for x in thing:
    for y in thing:
        print(x, y, x==y)
prints 16 lines, as expected.
If your goal is to make your own iterator class, you need two classes (or 3 if you want to implement the reversed-iterator yourself).
class C:
    def __init__(self):
        self.stuff = ["A","B","C","D"]

    def __iter__(self):
        return C_iterator(self)

    def __reversed__(self):
        return C_reversed_iterator(self)

class C_iterator:
    def __init__(self, parent):
        self.idx = 0
        self.parent = parent

    def __iter__(self):
        return self

    def __next__(self):
        self.idx += 1
        if self.idx > len(self.parent.stuff):
            raise StopIteration
        else:
            return self.parent.stuff[self.idx - 1]

thing = C()
for x in thing:
    for y in thing:
        print(x, y, x==y)
works as well.
For completeness, here's one possible implementation of the reversed-iterator:
class C_reversed_iterator:
    def __init__(self, parent):
        self.parent = parent
        self.idx = len(parent.stuff) + 1

    def __iter__(self):
        return self

    def __next__(self):
        self.idx -= 1
        if self.idx <= 0:
            raise StopIteration
        else:
            return self.parent.stuff[self.idx - 1]

thing = C()
for x in reversed(thing):
    for y in reversed(thing):
        print(x, y, x==y)
Instead of defining your own iterators you could use generators. One way was already shown in the other answer:
class C:
    def __init__(self):
        self.stuff = ["A","B","C","D"]

    def __iter__(self):
        yield from self.stuff

    def __reversed__(self):
        yield from self.stuff[::-1]
or explicitly delegate to a generator function (that's actually equivalent to the above, but it may make it clearer that a new object is produced each time):
def C_iterator(obj):
    for item in obj.stuff:
        yield item

def C_reverse_iterator(obj):
    for item in obj.stuff[::-1]:
        yield item

class C:
    def __init__(self):
        self.stuff = ["A","B","C","D"]

    def __iter__(self):
        return C_iterator(self)

    def __reversed__(self):
        return C_reverse_iterator(self)
Note: You don't have to implement the __reversed__ iterator. That was just meant as additional "feature" of the answer.
Your __iter__ is completely broken. Instead of actually making a fresh iterator on every call, it just resets some state on self and returns self. That means you can't actually have more than one iterator at a time over your object, and any call to __iter__ while another loop over the object is active will interfere with the existing loop.
You need to actually make a new object. The simplest way to do that is to use yield syntax to write a generator function. The generator function will automatically return a new iterator object every time:
class C(object):
    def __init__(self):
        self.stuff = ['A', 'B', 'C', 'D']

    def __iter__(self):
        for thing in self.stuff:
            yield thing

Radix sorting, "Queue" object not iterable

I've come to a dead end with my assignment; I don't know where to go from where I am right now. The code currently looks like this:
import random

def radixsorting1(n,m):
    div=1
    mod=10
    bin_list=[]
    alist=[]
    r=[]
    s=[]
    for bins in range(0,10):
        bin_list.append(Queue())
    for k in range(0,m):
        r.append(random.randint(1,10**n))
    if not len(r)==0:
        o=max(r)
        y=len(str(o))
    for p in range(y):
        for num in r:
            minsta_tal=num%mod
            minsta_tal=int(minsta_tal//div)
            bin_list[minsta_tal].put(num)
        new_list=[]
        for bins in bin_list:
            while not bins.isempty():
                new_list.append(bins.dequeue())
        alist=new_list
    return alist
What I've been trying to do is create 10 queues and put them in a list, then draw m random numbers from 1 to 10^n. Let's say I get 66 and 72; then I first sort them by the "small number", that is 6 and 2 in my numbers, then put them in a list, and then do the process all over again but for the numbers 6 and 7 (the bigger number). In its current shape I get the error "Queue" object is not iterable.
My Queue class looks like this; I think this one is okay.
class Queue:
    def __init__(self):
        self.lista=[]
    def put(self,x):
        self.lista.append(x)
    def get(self):
        if not len(self.lista)==0:
            return self.lista.pop(0)
    def isempty(self):
        if len(self.lista)==0:
            return True
        else:
            return False
    def length(self):
        return len(self.lista)
    def dequeue(self):
        if not len(self.lista)==0:
            n=self.lista.pop(0)
            return n
You need to add a bit more code to make it an iterable. __iter__ should return an iterator. The iterator should have a __next__ method (next in Python 2).
Take a look at this:
Build a Basic Python Iterator
So it is my understanding that the thing you want to iterate over is the contents of self.lista... Why not just return lista's iterator.
Here is the easiest way to do that:
class Queue:
    ...
    def __iter__(self):
        return self.lista.__iter__()
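With that in place a Queue instance becomes iterable; a quick illustrative check (assuming the Queue class above plus this __iter__):

q = Queue()
q.put(66)
q.put(72)
for x in q:      # no longer raises "'Queue' object is not iterable"
    print(x)     # 66, then 72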
It's a bit hard to see what exactly it is that you want. If what you are trying to do is empty lista as you iterate over it (Queue is a FIFO kind of deal), then rather do this:
class Queue:
    ...
    def __iter__(self):
        return self

    def __next__(self):  # 'next' in Python 2
        if self.lista:  # since empty lists are falsey
            return self.lista.pop(0)
        raise StopIteration

Static variable in Python?

In C++ we have the static keyword, which in loops works something like this:
for(int x=0; x<10; x++)
{
    for(int y=0; y<10; y++)
    {
        static int number_of_times = 0;
        number_of_times++;
    }
}
static here makes number_of_times initialized only once. How can I do the same thing in Python 3.x?
EDIT: Since most people got confused, I would like to point out that the code I gave is just an example of static usage in C++. My real problem is that I want to initialize a variable in a function only ONE time, since I don't want it to be global (blah!) or a default parameter..
Assuming what you want is "a variable that is initialised only once on first function call", there's no such thing in Python syntax. But there are ways to get a similar result:
1 - Use a global. Note that in Python, 'global' really means 'global to the module', not 'global to the process':
_number_of_times = 0

def yourfunc(x, y):
    global _number_of_times
    for i in range(x):
        for j in range(y):
            _number_of_times += 1
2 - Wrap your code in a class and use a class attribute (i.e. an attribute that is shared by all instances):
class Foo(object):
    _number_of_times = 0

    @classmethod
    def yourfunc(cls, x, y):
        for i in range(x):
            for j in range(y):
                cls._number_of_times += 1
Note that I used a classmethod since this code snippet doesn't need anything from an instance
3 - Wrap your code in a class, use an instance attribute and provide a shortcut for the method:
class Foo(object):
    def __init__(self):
        self._number_of_times = 0

    def yourfunc(self, x, y):
        for i in range(x):
            for j in range(y):
                self._number_of_times += 1

yourfunc = Foo().yourfunc
4 - Write a callable class and provide a shortcut:
class Foo(object):
    def __init__(self):
        self._number_of_times = 0

    def __call__(self, x, y):
        for i in range(x):
            for j in range(y):
                self._number_of_times += 1

yourfunc = Foo()
4 bis - use a class attribute and a metaclass
class Callable(type):
    def __call__(self, *args, **kw):
        return self._call(*args, **kw)

class yourfunc(object, metaclass=Callable):  # Python 3 metaclass syntax; Python 2 used __metaclass__ = Callable
    _number_of_times = 0

    @classmethod
    def _call(cls, x, y):
        for i in range(x):
            for j in range(y):
                cls._number_of_times += 1
5 - Make a "creative" use of function's default arguments being instantiated only once on module import:
def yourfunc(x, y, _hack=[0]):
    for i in range(x):
        for j in range(y):
            _hack[0] += 1
There are still some other possible solutions / hacks, but I think you get the big picture now.
EDIT: given the OP's clarifications, i.e. "Lets say you have a recursive function with default parameter but if someone actually tries to give one more argument to your function it could be catastrophic", it looks like what the OP really wants is something like:
# private recursive function using a default param the caller shouldn't set
def _walk(tree, callback, level=0):
    callback(tree, level)
    for child in tree.children:
        _walk(child, callback, level+1)

# public wrapper without the default param
def walk(tree, callback):
    _walk(tree, callback)
Which, BTW, proves we really had Yet Another XY Problem...
You can create a closure and use nonlocal to make its variables editable (Python 3.x only). Here's an example of a recursive function to calculate the length of a list.
def recursive_len(l):
    res = 0
    def inner(l2):
        nonlocal res
        if l2:
            res += 1
            inner(l2[1:])
    inner(l)
    return res
Or, you can assign an attribute to the function itself. Using the trick from here:
def fn(self):
    self.number_of_times += 1

fn.__defaults__ = (fn,)  # 'fn.func_defaults' in Python 2
fn.number_of_times = 0

fn()
fn()
fn()
print(fn.number_of_times)
Python doesn't have static variables by design. For your example, and use within loop blocks etc. in general, you just use a variable in an outer scope; if that makes it too long-lived, it might be time to consider breaking up that function into smaller ones.
For a variable that continues to exist between calls to a function, that's just reimplementing the basic idea of an object and a method on that object, so you should make one of those instead.
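A minimal sketch of that object-based replacement for a static variable (names are illustrative, not from the answer):

class Counter:
    """Holds the state a C++ static local would have held."""
    def __init__(self):
        self.number_of_times = 0

    def run(self, x, y):
        for i in range(x):
            for j in range(y):
                self.number_of_times += 1

counter = Counter()
counter.run(10, 10)
print(counter.number_of_times)  # 100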
Another function-based way of doing this in Python is:
def f(arg, static_var=[0]):
    static_var[0] += arg
As the static_var object is initialised at the function definition, and then reused for all the calls, it will act like a static variable. Note that you can't just use an int, as they are immutable.
>>> def f(arg, static_var=[0]):
...     static_var[0] += arg
...     print(static_var[0])
...
>>> f(1)
1
>>> f(2)
3
>>> f(3)
6
You can also use the global keyword:
def main(args):
    global tmp
    for i in range(10):
        print(i)
        tmp = i
But be careful... In most cases it will add more issues than it solves.
Use defaultdict:
from collections import defaultdict

static = defaultdict(lambda: 0)

def myfunc():
    for x in range(10):
        for y in range(10):
            static['number_of_times'] += 1

Resetting generator object in Python

I have a generator object returned by a function with multiple yields. The preparation needed before calling this generator is a rather time-consuming operation. That is why I want to reuse the generator several times.
y = FunctionWithYield()
for x in y: print(x)
#here must be something to reset 'y'
for x in y: print(x)
Of course, copying the content into a simple list comes to mind. Is there a way to reset my generator?
See also: How to look ahead one element (peek) in a Python generator?
Generators can't be rewound. You have the following options:
Run the generator function again, restarting the generation:
y = FunctionWithYield()
for x in y: print(x)
y = FunctionWithYield()
for x in y: print(x)
Store the generator results in a data structure on memory or disk which you can iterate over again:
y = list(FunctionWithYield())
for x in y: print(x)
# can iterate again:
for x in y: print(x)
The downside of option 1 is that it computes the values again. If that's CPU-intensive, you end up calculating twice. On the other hand, the downside of 2 is the storage. The entire list of values will be stored in memory. If there are too many values, that can be impractical.
So you have the classic memory vs. processing tradeoff. I can't imagine a way of rewinding the generator without either storing the values or calculating them again.
You could also use tee as suggested by other answers; however, that would still store the entire list in memory in your case, so it would give the same results and similar performance to option 2.
Another option is to use the itertools.tee() function to create a second version of your generator:
import itertools
y = FunctionWithYield()
y, y_backup = itertools.tee(y)
for x in y:
    print(x)
for x in y_backup:
    print(x)
This could be beneficial from a memory usage point of view if the original iteration might not process all the items.
>>> def gen():
...     def init():
...         return 0
...     i = init()
...     while True:
...         val = (yield i)
...         if val == 'restart':
...             i = init()
...         else:
...             i += 1
>>> g = gen()
>>> next(g)
0
>>> next(g)
1
>>> next(g)
2
>>> next(g)
3
>>> g.send('restart')
0
>>> next(g)
1
>>> next(g)
2
Probably the most simple solution is to wrap the expensive part in an object and pass that to the generator:
data = ExpensiveSetup()
for x in FunctionWithYield(data): pass
for x in FunctionWithYield(data): pass
This way, you can cache the expensive calculations.
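A hedged sketch of what that separation might look like; ExpensiveSetup and FunctionWithYield are the stand-in names from the snippet above, filled in with illustrative bodies:

class ExpensiveSetup:
    """Stand-in for the costly preparation: done once, shared by every generator."""
    def __init__(self):
        self.items = list(range(5))  # imagine this takes a long time to compute

def FunctionWithYield(data):
    # Cheap generator: it only walks over the already-prepared data.
    for item in data.items:
        yield item

data = ExpensiveSetup()                # the expensive part happens once
print(list(FunctionWithYield(data)))   # [0, 1, 2, 3, 4]
print(list(FunctionWithYield(data)))   # works again without repeating the setup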
If you can keep all results in RAM at the same time, then use list() to materialize the results of the generator in a plain list and work with that.
I want to offer a different solution to an old problem
class IterableAdapter:
    def __init__(self, iterator_factory):
        self.iterator_factory = iterator_factory

    def __iter__(self):
        return self.iterator_factory()

squares = IterableAdapter(lambda: (x * x for x in range(5)))

for x in squares: print(x)
for x in squares: print(x)
The benefit of this when compared to something like list(iterator) is that this is O(1) space complexity and list(iterator) is O(n). The disadvantage is that, if you only have access to the iterator, but not the function that produced the iterator, then you cannot use this method. For example, it might seem reasonable to do the following, but it will not work.
g = (x * x for x in range(5))
squares = IterableAdapter(lambda: g)
for x in squares: print(x)
for x in squares: print(x)
Using a wrapper function to handle StopIteration
You could write a simple wrapper function for your generator-generating function that tracks when the generator is exhausted. It will do so using the StopIteration exception a generator throws when it reaches the end of iteration.
import types

def generator_wrapper(function=None, **kwargs):
    assert function is not None, "Please supply a function"
    def inner_func(function=function, **kwargs):
        generator = function(**kwargs)
        assert isinstance(generator, types.GeneratorType), "Invalid function"
        try:
            yield next(generator)
        except StopIteration:
            generator = function(**kwargs)
            yield next(generator)
    return inner_func
As you can spot above, when our wrapper function catches a StopIteration exception, it simply re-initializes the generator object (using another instance of the function call).
And then, assuming you define your generator-supplying function somewhere as below, you could use the Python function decorator syntax to wrap it implicitly:
@generator_wrapper
def generator_generating_function(**kwargs):
    for item in ["a value", "another value"]:
        yield item
If GrzegorzOledzki's answer won't suffice, you could probably use send() to accomplish your goal. See PEP-0342 for more details on enhanced generators and yield expressions.
UPDATE: Also see itertools.tee(). It involves some of that memory vs. processing tradeoff mentioned above, but it might save some memory over just storing the generator results in a list; it depends on how you're using the generator.
If your generator is pure in the sense that its output only depends on the passed arguments and the step number, and you want the resulting generator to be restartable, here's a short snippet that might be handy:
import copy

def generator(i):
    yield from range(i)

g = generator(10)
print(list(g))
print(list(g))

class GeneratorRestartHandler(object):
    def __init__(self, gen_func, argv, kwargv):
        self.gen_func = gen_func
        self.argv = copy.copy(argv)
        self.kwargv = copy.copy(kwargv)
        self.local_copy = iter(self)

    def __iter__(self):
        return self.gen_func(*self.argv, **self.kwargv)

    def __next__(self):
        return next(self.local_copy)

def restartable(g_func: callable) -> callable:
    def tmp(*argv, **kwargv):
        return GeneratorRestartHandler(g_func, argv, kwargv)
    return tmp

@restartable
def generator2(i):
    yield from range(i)

g = generator2(10)
print(next(g))
print(list(g))
print(list(g))
print(next(g))
outputs:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[]
0
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1
From the official documentation of tee:
In general, if one iterator uses most or all of the data before
another iterator starts, it is faster to use list() instead of tee().
So it's best to use list(iterable) instead in your case.
You can define a function that returns your generator
def f():
    def FunctionWithYield(generator_args):
        ...  # code here
    return FunctionWithYield
Now you can just do this as many times as you like:
for x in f()(generator_args): print(x)
for x in f()(generator_args): print(x)
I'm not sure what you meant by expensive preparation, but I guess you actually have
data = ... # Expensive computation
y = FunctionWithYield(data)
for x in y: print(x)
#here must be something to reset 'y'
# this is expensive - data = ... # Expensive computation
# y = FunctionWithYield(data)
for x in y: print(x)
If that's the case, why not reuse data?
There is no option to reset iterators. An iterator pops items out as you iterate through it with the next() function. The only way is to take a backup before iterating over the iterator object. Check below.
Creating an iterator object with items 0 to 9:
i = iter(range(10))
Iterating with the next() function, which will pop an item out:
print(next(i))
Converting the iterator object to a list:
L = list(i)
print(L)
output: [1, 2, 3, 4, 5, 6, 7, 8, 9]
So item 0 has already been popped out. Also, all the remaining items were popped when we converted the iterator to a list.
next(i)
Traceback (most recent call last):
  File "<pyshell#129>", line 1, in <module>
    next(i)
StopIteration
So you need to convert the iterator to a list as a backup before you start iterating.
The list can be converted back to an iterator with iter(<list-object>).
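A compact version of that backup pattern (illustrative):

backup = list(iter(range(10)))   # materialize the iterator once
it1 = iter(backup)               # fresh iterator over the backed-up data
it2 = iter(backup)               # a second, independent iterator
print(next(it1), next(it2))      # 0 0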
You can now use more_itertools.seekable (a third-party tool) which enables resetting iterators.
Install via > pip install more_itertools
import more_itertools as mit

y = mit.seekable(FunctionWithYield())
for x in y:
    print(x)

y.seek(0)  # reset iterator
for x in y:
    print(x)
Note: memory consumption grows while advancing the iterator, so be wary of large iterables.
You can do that by using itertools.cycle(): you can create an iterator with this method and then execute a for loop over the iterator, which will loop over its values.
For example:
from itertools import cycle

def generator():
    for j in cycle([i for i in range(5)]):
        yield j

gen = generator()
for i in range(20):
    print(next(gen))
will generate 20 numbers, 0 to 4 repeatedly.
A note from the docs:
Note, this member of the toolkit may require significant auxiliary storage (depending on the length of the iterable).
Here is how it works for me:
csv_rows = my_generator()
for _ in range(10):
    for row in csv_rows:
        print(row)
    csv_rows = my_generator()
Ok, you say you want to call a generator multiple times, but initialization is expensive... What about something like this?
class InitializedFunctionWithYield(object):
    def __init__(self):
        # do expensive initialization
        self.start = 5

    def __call__(self, *args, **kwargs):
        # do cheap iteration
        for i in range(5):
            yield self.start + i

y = InitializedFunctionWithYield()

for x in y():
    print(x)
for x in y():
    print(x)
Alternatively, you could just make your own class that follows the iterator protocol and defines some sort of 'reset' function.
class MyIterator(object):
    def __init__(self):
        self.reset()

    def reset(self):
        self.i = 5

    def __iter__(self):
        return self

    def __next__(self):
        i = self.i
        if i > 0:
            self.i -= 1
            return i
        else:
            raise StopIteration()

my_iterator = MyIterator()
for x in my_iterator:
    print(x)

print('resetting...')
my_iterator.reset()
for x in my_iterator:
    print(x)
https://docs.python.org/2/library/stdtypes.html#iterator-types
http://anandology.com/python-practice-book/iterators.html
My answer solves a slightly different problem: the generator is expensive to initialize and each generated object is expensive to generate, but we need to consume the generator multiple times in multiple functions. In order to call the generator and produce each generated object exactly once, we can use threads and run each of the consuming methods in a different thread. We may not achieve true parallelism due to the GIL, but we will achieve our goal.
This approach did a good job in the following case: a deep learning model processes a lot of images. The result is a lot of masks for a lot of objects in the image. Each mask consumes memory. We have around 10 methods which compute different statistics and metrics, but they take all the images at once. All the images cannot fit in memory. The methods can easily be rewritten to accept an iterator.
import threading
from typing import List

class GeneratorSplitter:
    '''
    Split a generator object into multiple generators which will be synchronised. Each call to each of the sub generators will cause only one call in the input generator. This way multiple methods on threads can iterate the input generator, and the generator will be cycled only once.
    '''
    def __init__(self, gen):
        self.gen = gen
        self.consumers: List["GeneratorSplitter.InnerGen"] = []
        self.thread: threading.Thread = None
        self.value = None
        self.finished = False
        self.exception = None

    def GetConsumer(self):
        # Returns a generator object.
        cons = self.InnerGen(self)
        self.consumers.append(cons)
        return cons

    def _Work(self):
        try:
            for d in self.gen:
                for cons in self.consumers:
                    cons.consumed.wait()
                    cons.consumed.clear()
                self.value = d
                for cons in self.consumers:
                    cons.readyToRead.set()
            for cons in self.consumers:
                cons.consumed.wait()
            self.finished = True
            for cons in self.consumers:
                cons.readyToRead.set()
        except Exception as ex:
            self.exception = ex
            for cons in self.consumers:
                cons.readyToRead.set()

    def Start(self):
        self.thread = threading.Thread(target=self._Work)
        self.thread.start()

    class InnerGen:
        def __init__(self, parent: "GeneratorSplitter"):
            self.parent: "GeneratorSplitter" = parent
            self.readyToRead: threading.Event = threading.Event()
            self.consumed: threading.Event = threading.Event()
            self.consumed.set()

        def __iter__(self):
            return self

        def __next__(self):
            self.readyToRead.wait()
            self.readyToRead.clear()
            if self.parent.finished:
                raise StopIteration()
            if self.parent.exception:
                raise self.parent.exception
            val = self.parent.value
            self.consumed.set()
            return val
Usage:
from concurrent.futures import ThreadPoolExecutor

genSplitter = GeneratorSplitter(expensiveGenerator)

metrics = {}
executor = ThreadPoolExecutor(max_workers=3)
f1 = executor.submit(mean, genSplitter.GetConsumer())
f2 = executor.submit(max, genSplitter.GetConsumer())
f3 = executor.submit(someFancyMetric, genSplitter.GetConsumer())
genSplitter.Start()

metrics.update(f1.result())
metrics.update(f2.result())
metrics.update(f3.result())
If you want to reuse this generator multiple times with a predefined set of arguments, you can use functools.partial.
from functools import partial

func_with_yield = partial(FunctionWithYield, arg0, arg1)

for i in range(100):
    for x in func_with_yield():
        print(x)
This will wrap the generator function in another function, so each time you call func_with_yield() it creates a fresh generator from the same function and arguments.
It can be done with a code object. Here is an example.
code_str = "y = (a for a in [1, 2, 3, 4])"
code1 = compile(code_str, '<string>', 'single')
exec(code1)
for i in y: print(i)
1
2
3
4
for i in y: print(i)  # prints nothing, the generator is exhausted
exec(code1)
for i in y: print(i)
1
2
3
4
