Turning any single-argument function into a generator function?

Turning any single-argument function into a generator function? - python

In David Beazley's talk on generators, he shows how to create a generator function from any single-argument function thus:
def generate(func):
def gen_func(s):
for item in s:
yield func(item)
return gen_func
and illustrates it with math.sqrt:
gen_sqrt = generate(math.sqrt)
for x in gen_sqrt(xrange(100)):
print x
So why does:
gen_sum = generate(sum)
for x in gen_sum([1,2,3,4,5]):
print x
produce:
TypeError Traceback (most recent call last)
<ipython-input-73-ef521f2bbfc8> in <module>()
1 gen_sum = generate(sum)
----> 2 for x in gen_sum(1):
3 print x
<ipython-input-50-3c0ba12c2429> in gen_func(s)
1 def generate(func):
2 def gen_func(s): # closure
----> 3 for item in s:
4 yield func(item)
5 return gen_func
TypeError: 'int' object is not iterable
Is it more accurate to say that the function being a single-argument function is a necessary but insufficient condition for this approach to work? And that the other necessary condition is that the single argument must be a single item (and not a sequence)?

You're passing a list whose elements are the wrong type. Of course it's not going to work, for the same reason that gen_sqrt(['a', 's', 'd', 'f']) wouldn't have worked.
You need to pass gen_sum a list of things it makes sense to call sum on, such as other lists:
for x in gen_sum([[1, 2], [3, 4]]):
print x
Output:
3
7

You are correct, both are necessary requirements:
def generate(func):
def gen_func(s):
for item in s: # <--- requires s to be interable
yield func(item) # <--- requires func to accept a single argument
return gen_func
So in
gen_sum = generate(func)
for x in gen_sum(s):
print x
func must accept a single argument and s must be iterable.

generate is the generator version of map.
map(lambda x: x*2, range(5)) [0, 2, 4, 6, 8]
It takes the input range and applies the function the each of the element in the range.
generate does the same, but it doesn't return a list. It yields the result of each transformation.
Now take a look at your example. What would be the first result? sum(1).
But sum expects a list, not an integer, hence the error message.

Related

"TypeError: <lambda>() takes 1 positional argument but 2 were given" using reduce()

I want to return sum of square of numbers passed in list.
from functools import reduce
def square_sum(numbers):
return reduce(lambda x: x ** 2, numbers)
print(square_sum([1, 2, 2]))
However i am getting the error: TypeError: <lambda>() takes 1 positional argument but 2 were given.
I couldn't understand reason behind it.

Here's how you might define sum if it didn't exist:
from functools import reduce
def sum(it):
return reduce(lambda acc, val: acc + val, it)
Or:
from functools import reduce
import operator
def sum(it):
return reduce(operator.add, it)
functools.reduce reduces the values produced by an iterator to a single value by repeatedly combining consecutive values using the function you provide. So the function needs to be able to combine two values and therefore must take two arguments.
So you could define sum_of_squares using reduce, like this, although there are a lot of corner cases to cope with:
from functools import reduce
def sum_of_squares(it):
it = iter(it)
try:
first = next(it)
except StopIteration:
return 0
return reduce(lambda acc, val: acc + val * val,
it,
first * first)
Personally, I think the following is clearer:
def sum_of_squares(it):
return sum(map(lambda x: x ** 2, it))

The function parameter to reduce() should take two arguments: an old one and a new one. To sum, you'd just need to add them together:
lambda r, x: x**2 + r
However, that doesn't actually do what you want because the first element never gets squared (so it doesn't get the right answer if the first element is >1). You might be thinking reduce() is like sum(map()):
def square_sum(numbers):
return sum(map(lambda x: x**2, numbers))
But it's more readable to replace the map with a generator expression:
def square_sum(numbers):
return sum(x**2 for x in numbers)
print(square_sum([1, 2, 2])) # -> 9

How do I resue a function in another function?

I have created a function in python to print the square value of a list's elements as shown below:
def square_list(list1):
lst = []
for i in list1:
lst.append(i*i)
return lst
x = [2,4,6,8]
print(square_list(x))
This is the output [4, 16, 36, 64]
I would like to reuse this function again as shown below:
n = [2,4,6,8]
print(list(map(square_list,n)))
But it shows this type of error as shown below:
TypeError Traceback (most recent call last)
<ipython-input-80-9c542410d2fa> in <module>
1 # list=[m,n,p] + f()==> Map ==> modified list = [f(m),f(n),f(p)]
2 n = [2,4,6,8]
----> 3 print(list(map(square_list,n)))
<ipython-input-79-51bec1661935> in square_list(list1)
1 def square_list(list1):
2 lst = []
----> 3 for i in list1:
4 lst.append(i*i)
5 return lst
TypeError: 'int' object is not iterable
May you please tell me where is my mistake and explain to me?

map() iterates over the list and passes each element to the function. That means the function needs to accept the individual elements of the list, not the list itself. You can't use the same function for both. map() simplifies the function to just:
def square_list(n):
return n * n
n = [2,4,6,8]
print(list(map(square_list,n)))
# [4, 16, 36, 64]
because it takes care of the iteration as part of the process.

If you want to use the same function for both cases, you can check the type of the parameter.
def square(x):
if type(x) is list:
lst = []
for i in x:
lst.append(i*i)
return lst
return x * x
Demo

optional multiple return values

I have a function that most of the time should return a single value, but sometimes I need a second value returned from the function. Here I found how to return multiple values, but as most of the time I need only one of them I would like to write something like this:
def test_fun():
return 1,2
def test_call():
x = test_fun()
print x
but calling this results in
>>> test_call()
(1,2)
and when trying to return more than two, as in
def test_fun2():
return 1,2,3
def test_call2():
x,y = test_fun2()
print x,y
I get an error
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "my_module.py", line 47, in test_call2
x,y = test_fun2()
ValueError: too many values to unpack
I am thinking about something like in matlab, where x = test_fun() would result in x == 1 (while [x y] = test_fun() would also work as expected). Is something like that possible in python?

You can use star unpacking to gather all additional return values into a list:
x, *y = fun()
x will contain the first return value. y will be a list of the remaining values. y will be empty if there is only one return value. This particular example will only work if the function returns a tuple, even if there is only one value.
When fun always returns 1 or 2 values, you can just do
if y:
print(y[0])
else:
print('only one value')
If, on the other hand, you want to completely ignore the number of return values, do
*x = fun()
Now all the arguments will be gathered into the list. You can then print it with either
print(x)
or
print(*x)
The latter will pass each element as a separate argument, exactly as if you did
x, y, z = fun()
print(x, y, z)
The reason to use *x = fun() instead of just x = fun() is to get an error immediately when a function returns something that isn't a tuple. Think of it as an assertion to remind you to write fun properly.
Since this form of star unpacking only works in Python 3, your only option in Python 2 is to do
x = fun()
and to inspect the result manually.

There are several ways to get multiple return values.
Example 1:
def test_fun():
return 1,2
def test_call():
x, y = test_fun()
print x
print y
you will get correct output:
1
2
When you would like to ignore several return values, you can use * before a variable in python3.
Example 2:
def test_fun2():
return 1,2,3
def test_call2():
x, *y = test_fun2()
print x
print y
you will get the result:
1
(2, 3)

How list unpacking works when passed as argument to function with variable parameters?

I have a function which calculates arithmetic mean of variable parameter list. It has one positional parameter and rest variable parameter. So the code is shown below.
def mean(x, *l):
sum = x
for i in l:
sum += i
return sum / (1.0 + len(l))
Now I define a list containing var arguments
z = [1,2,3,4]
Now I call my function
mean(*z) which prints 2.5 which is correct.
So what happened here? My understanding is when I did *z it unpacked the list z.
But then how it picked the 1st positional parameter from the list and kept the rest of the list intact to get the length of 'l' as defined in the function mean. Did it unpack list z just to extract the 1st element from z then keeping rest of z as it is? If that can be done with a list then how?
Also, If I call the function with z alone as argument, it throws error
ari_mean(z)
Traceback (most recent call last):
File "", line 1, in
File "", line 5, in ari_mean
TypeError: unsupported operand type(s) for /: 'list' and 'float'
Thanks.

When you call mean(*z), you are correct in that Python unpacks z, which means that your function call is equivalent (in this case), to calling mean(1, 2, 3, 4)
Now, on to the second part of your question:
Did it unpack list z just to extract the 1st element from z then keeping rest of z as it is?
Not really. First, z was unpacked and each argument passed in separately (as mentioned above). So now you have we look at the definition of mean: def mean(x, *l):. This definition is expecting at least one positional argument (x), and than any number of extra arguments. So, because your initial call of mean(*z) got turned into mean(1, 2, 3, 4), then inside your mean, x is equal to 1 and *l becomes the tuple (2, 3, 4).
Also, If I call the function with z alone as argument, it throws error
If you just call the function with z alone (mean(z)), then, going back to your function definition, x will be the list [1,2,3,4] and l will be an empty tuple. Because l is an empty tuple, nothing happens in the for-loop, and you get to the last line, return sum / (1.0 + len(l)). Now, because x is a list, Python raises an exception because it does not know how to compute [1,2,3,4] / 1.0

Running the code below can show you how it works, the comments present the way the arguments are passed in the function.
def mean(x, *l):
sum=x
print("x:",x)
print("l:",l)
print("len(l):",len(l))
for i in l:
sum += i
return sum / (1.0+len(l))
z = [1,2,3,4]
print("1st call:")
print("result:",mean(*z)) ##x = z[0] and l=tuple(z[1:])
print("\n\n2nd call:")
print("result:",mean(z)) ##x = z and l = tuple()

Resetting generator object in Python

I have a generator object returned by multiple yield. Preparation to call this generator is rather time-consuming operation. That is why I want to reuse the generator several times.
y = FunctionWithYield()
for x in y: print(x)
#here must be something to reset 'y'
for x in y: print(x)
Of course, I'm taking in mind copying content into simple list. Is there a way to reset my generator?
See also: How to look ahead one element (peek) in a Python generator?

Generators can't be rewound. You have the following options:
Run the generator function again, restarting the generation:
y = FunctionWithYield()
for x in y: print(x)
y = FunctionWithYield()
for x in y: print(x)
Store the generator results in a data structure on memory or disk which you can iterate over again:
y = list(FunctionWithYield())
for x in y: print(x)
# can iterate again:
for x in y: print(x)
The downside of option 1 is that it computes the values again. If that's CPU-intensive you end up calculating twice. On the other hand, the downside of 2 is the storage. The entire list of values will be stored on memory. If there are too many values, that can be unpractical.
So you have the classic memory vs. processing tradeoff. I can't imagine a way of rewinding the generator without either storing the values or calculating them again.
You could also use tee as suggested by other answers, however that would still store the entire list in memory in your case, so it would be the same results and similar performance to option 2.

Another option is to use the itertools.tee() function to create a second version of your generator:
import itertools
y = FunctionWithYield()
y, y_backup = itertools.tee(y)
for x in y:
print(x)
for x in y_backup:
print(x)
This could be beneficial from memory usage point of view if the original iteration might not process all the items.

>>> def gen():
... def init():
... return 0
... i = init()
... while True:
... val = (yield i)
... if val=='restart':
... i = init()
... else:
... i += 1
>>> g = gen()
>>> g.next()
0
>>> g.next()
1
>>> g.next()
2
>>> g.next()
3
>>> g.send('restart')
0
>>> g.next()
1
>>> g.next()
2

Probably the most simple solution is to wrap the expensive part in an object and pass that to the generator:
data = ExpensiveSetup()
for x in FunctionWithYield(data): pass
for x in FunctionWithYield(data): pass
This way, you can cache the expensive calculations.
If you can keep all results in RAM at the same time, then use list() to materialize the results of the generator in a plain list and work with that.

I want to offer a different solution to an old problem
class IterableAdapter:
def __init__(self, iterator_factory):
self.iterator_factory = iterator_factory
def __iter__(self):
return self.iterator_factory()
squares = IterableAdapter(lambda: (x * x for x in range(5)))
for x in squares: print(x)
for x in squares: print(x)
The benefit of this when compared to something like list(iterator) is that this is O(1) space complexity and list(iterator) is O(n). The disadvantage is that, if you only have access to the iterator, but not the function that produced the iterator, then you cannot use this method. For example, it might seem reasonable to do the following, but it will not work.
g = (x * x for x in range(5))
squares = IterableAdapter(lambda: g)
for x in squares: print(x)
for x in squares: print(x)

Using a wrapper function to handle StopIteration
You could write a simple wrapper function to your generator-generating function that tracks when the generator is exhausted. It will do so using the StopIteration exception a generator throws when it reaches end of iteration.
import types
def generator_wrapper(function=None, **kwargs):
assert function is not None, "Please supply a function"
def inner_func(function=function, **kwargs):
generator = function(**kwargs)
assert isinstance(generator, types.GeneratorType), "Invalid function"
try:
yield next(generator)
except StopIteration:
generator = function(**kwargs)
yield next(generator)
return inner_func
As you can spot above, when our wrapper function catches a StopIteration exception, it simply re-initializes the generator object (using another instance of the function call).
And then, assuming you define your generator-supplying function somewhere as below, you could use the Python function decorator syntax to wrap it implicitly:
#generator_wrapper
def generator_generating_function(**kwargs):
for item in ["a value", "another value"]
yield item

If GrzegorzOledzki's answer won't suffice, you could probably use send() to accomplish your goal. See PEP-0342 for more details on enhanced generators and yield expressions.
UPDATE: Also see itertools.tee(). It involves some of that memory vs. processing tradeoff mentioned above, but it might save some memory over just storing the generator results in a list; it depends on how you're using the generator.

If your generator is pure in a sense that its output only depends on passed arguments and the step number, and you want the resulting generator to be restartable, here's a sort snippet that might be handy:
import copy
def generator(i):
yield from range(i)
g = generator(10)
print(list(g))
print(list(g))
class GeneratorRestartHandler(object):
def __init__(self, gen_func, argv, kwargv):
self.gen_func = gen_func
self.argv = copy.copy(argv)
self.kwargv = copy.copy(kwargv)
self.local_copy = iter(self)
def __iter__(self):
return self.gen_func(*self.argv, **self.kwargv)
def __next__(self):
return next(self.local_copy)
def restartable(g_func: callable) -> callable:
def tmp(*argv, **kwargv):
return GeneratorRestartHandler(g_func, argv, kwargv)
return tmp
#restartable
def generator2(i):
yield from range(i)
g = generator2(10)
print(next(g))
print(list(g))
print(list(g))
print(next(g))
outputs:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[]
0
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1

From official documentation of tee:
In general, if one iterator uses most or all of the data before
another iterator starts, it is faster to use list() instead of tee().
So it's best to use list(iterable) instead in your case.

You can define a function that returns your generator
def f():
def FunctionWithYield(generator_args):
code here...
return FunctionWithYield
Now you can just do as many times as you like:
for x in f()(generator_args): print(x)
for x in f()(generator_args): print(x)

I'm not sure what you meant by expensive preparation, but I guess you actually have
data = ... # Expensive computation
y = FunctionWithYield(data)
for x in y: print(x)
#here must be something to reset 'y'
# this is expensive - data = ... # Expensive computation
# y = FunctionWithYield(data)
for x in y: print(x)
If that's the case, why not reuse data?

There is no option to reset iterators. Iterator usually pops out when it iterate through next() function. Only way is to take a backup before iterate on the iterator object. Check below.
Creating iterator object with items 0 to 9
i=iter(range(10))
Iterating through next() function which will pop out
print(next(i))
Converting the iterator object to list
L=list(i)
print(L)
output: [1, 2, 3, 4, 5, 6, 7, 8, 9]
so item 0 is already popped out. Also all the items are popped as we converted the iterator to list.
next(L)
Traceback (most recent call last):
File "<pyshell#129>", line 1, in <module>
next(L)
StopIteration
So you need to convert the iterator to lists for backup before start iterating.
List could be converted to iterator with iter(<list-object>)

You can now use more_itertools.seekable (a third-party tool) which enables resetting iterators.
Install via > pip install more_itertools
import more_itertools as mit
y = mit.seekable(FunctionWithYield())
for x in y:
print(x)
y.seek(0) # reset iterator
for x in y:
print(x)
Note: memory consumption grows while advancing the iterator, so be wary of large iterables.

You can do that by using itertools.cycle()
you can create an iterator with this method and then execute a for loop over the iterator which will loop over its values.
For example:
def generator():
for j in cycle([i for i in range(5)]):
yield j
gen = generator()
for i in range(20):
print(next(gen))
will generate 20 numbers, 0 to 4 repeatedly.
A note from the docs:
Note, this member of the toolkit may require significant auxiliary storage (depending on the length of the iterable).

How it's work for me.
csv_rows = my_generator()
for _ in range(10):
for row in csv_rows:
print(row)
csv_rows = my_generator()

Ok, you say you want to call a generator multiple times, but initialization is expensive... What about something like this?
class InitializedFunctionWithYield(object):
def __init__(self):
# do expensive initialization
self.start = 5
def __call__(self, *args, **kwargs):
# do cheap iteration
for i in xrange(5):
yield self.start + i
y = InitializedFunctionWithYield()
for x in y():
print x
for x in y():
print x
Alternatively, you could just make your own class that follows the iterator protocol and defines some sort of 'reset' function.
class MyIterator(object):
def __init__(self):
self.reset()
def reset(self):
self.i = 5
def __iter__(self):
return self
def next(self):
i = self.i
if i > 0:
self.i -= 1
return i
else:
raise StopIteration()
my_iterator = MyIterator()
for x in my_iterator:
print x
print 'resetting...'
my_iterator.reset()
for x in my_iterator:
print x
https://docs.python.org/2/library/stdtypes.html#iterator-types
http://anandology.com/python-practice-book/iterators.html

My answer solves slightly different problem: If the generator is expensive to initialize and each generated object is expensive to generate. But we need to consume the generator multiple times in multiple functions. In order to call the generator and each generated object exactly once we can use threads and Run each of the consuming methods in different thread. We may not achieve true parallelism due to GIL, but we will achieve our goal.
This approach did a good job in the following case: deep learning model processes a lot of images. The result is a lot of masks for a lot of objects on the image. Each mask consumes memory. We have around 10 methods which make different statistics and metrics, but they take all the images at once. All the images cannot fit in memory. The moethods can easily be rewritten to accept iterator.
class GeneratorSplitter:
'''
Split a generator object into multiple generators which will be sincronised. Each call to each of the sub generators will cause only one call in the input generator. This way multiple methods on threads can iterate the input generator , and the generator will cycled only once.
'''
def __init__(self, gen):
self.gen = gen
self.consumers: List[GeneratorSplitter.InnerGen] = []
self.thread: threading.Thread = None
self.value = None
self.finished = False
self.exception = None
def GetConsumer(self):
# Returns a generator object.
cons = self.InnerGen(self)
self.consumers.append(cons)
return cons
def _Work(self):
try:
for d in self.gen:
for cons in self.consumers:
cons.consumed.wait()
cons.consumed.clear()
self.value = d
for cons in self.consumers:
cons.readyToRead.set()
for cons in self.consumers:
cons.consumed.wait()
self.finished = True
for cons in self.consumers:
cons.readyToRead.set()
except Exception as ex:
self.exception = ex
for cons in self.consumers:
cons.readyToRead.set()
def Start(self):
self.thread = threading.Thread(target=self._Work)
self.thread.start()
class InnerGen:
def __init__(self, parent: "GeneratorSplitter"):
self.parent: "GeneratorSplitter" = parent
self.readyToRead: threading.Event = threading.Event()
self.consumed: threading.Event = threading.Event()
self.consumed.set()
def __iter__(self):
return self
def __next__(self):
self.readyToRead.wait()
self.readyToRead.clear()
if self.parent.finished:
raise StopIteration()
if self.parent.exception:
raise self.parent.exception
val = self.parent.value
self.consumed.set()
return val
Ussage:
genSplitter = GeneratorSplitter(expensiveGenerator)
metrics={}
executor = ThreadPoolExecutor(max_workers=3)
f1 = executor.submit(mean,genSplitter.GetConsumer())
f2 = executor.submit(max,genSplitter.GetConsumer())
f3 = executor.submit(someFancyMetric,genSplitter.GetConsumer())
genSplitter.Start()
metrics.update(f1.result())
metrics.update(f2.result())
metrics.update(f3.result())

If you want to reuse this generator multiple times with a predefined set of arguments, you can use functools.partial.
from functools import partial
func_with_yield = partial(FunctionWithYield, arg0, arg1)
for i in range(100):
for x in func_with_yield():
print(x)
This will wrap the generator function in another function so each time you call func_with_yield() it creates the same generator function.

It can be done by code object. Here is the example.
code_str="y=(a for a in [1,2,3,4])"
code1=compile(code_str,'<string>','single')
exec(code1)
for i in y: print i
1
2
3
4
for i in y: print i
exec(code1)
for i in y: print i
1
2
3
4

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Turning any single-argument function into a generator function? - python

Related

"TypeError: <lambda>() takes 1 positional argument but 2 were given" using reduce()

How do I resue a function in another function?

optional multiple return values

How list unpacking works when passed as argument to function with variable parameters?

Resetting generator object in Python

Categories

Resources