I'm reading about generators at http://www.dabeaz.com/generators/
(which is a very fine, informative article, even if it's a set of presentation slides).
It has the following section about creating generators:
Any single-argument function is easy to turn
into a generator function
def generate(func):
    def gen_func(s):
        for item in s:
            yield func(item)
    return gen_func

• Example:

gen_sqrt = generate(math.sqrt)
for x in gen_sqrt(range(100)):
    print(x)
I don't see the point of this slide (it's on p. 114 of the slides).
Isn't it just (math.sqrt(e) for e in range(100))?
What is he accomplishing with the generate function?
The point of such higher-order functions is to allow multiple inputs to a function to be chosen at different times/places:
def filter_lines(f, filt):
    with open(f) as f:
        for l in f:
            print(' '.join(map(str, filt(map(float, l.split())))))
This can accept any kind of iterable-transformer as filt, like
def ints(it):
    for f in it:
        if f == int(f): yield f
or the result of generate:
filter_lines("…",ints)
filter_lines("…",list) # the identity: print all
filter_lines("…",generate(math.sqrt))
filter_lines("…",generate(abs))
Therefore we can see that generate transforms a function of one element into a function of iterables of elements. (This is what is meant by “turn into a generator function”.) We can go one further:
import functools
filter_lines("…",functools.partial(map,math.sqrt))
from which we can conclude that generate itself is equivalent to functools.partial(functools.partial,map). Applying partial twice like that splits a parameter list in two, changing a normal function into a higher-order function.
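To check that equivalence concretely, here is a minimal sketch (my own demonstration; generate2 is a made-up name) comparing generate with the doubly-applied partial:

import functools, math

def generate(func):
    def gen_func(s):
        for item in s:
            yield func(item)
    return gen_func

generate2 = functools.partial(functools.partial, map)  # same shape: func -> (iterable -> iterator)

print(list(generate(math.sqrt)(range(4))))   # [0.0, 1.0, 1.414..., 1.732...]
print(list(generate2(math.sqrt)(range(4))))  # the same values, produced by map instead of an explicit generator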
Related
I have several test functions named test0 through test{n} which provide outputs for the script I am testing.
Instead of going through the cumbersome process of adding them all to a list and iterating through the list, I was wondering if there is a way to iterate through them in a way similar to an f-string.
E.g:
def t0()
...
def t9()

numTest = 10

def trueFunc(tn())

for i in range(numTest):
    trueFunc(t{i}())   # pseudocode: t{i} stands for the i-th test function
While you could use eval to get the function from a string representing its name, a much better approach is to put the functions into a list, and iterate over the list:
def f0(): return 0
def f1(): return 1

list_of_functions = [f0, f1]
for func in list_of_functions:
    print(func())
This will work with functions without numbers in their names, and even for anonymous lambda functions that don't ever get a name.
I have a series of connected generators and I want to create a filter that can be used to wrap one of the generators. This filter wrapper should take a generator and a function as parameters. If a data item in the incoming stream does not pass the requirements of the filter, it should be passed downstream to the next generator without going through the wrapped generator. I have made a working example here that should make it more clear as to what I am trying to achieve:
import functools

is_less_than_three = lambda x: True if x < 3 else False

def add_one(numbers):
    print("new generator created")
    for number in numbers:
        yield number + 1

def wrapper(generator1, filter_):
    #functools.wraps(generator1)
    def wrapped(generator2):
        for data in generator2:
            if filter_(data):
                yield from generator1([data])
            else:
                yield data
    return wrapped

add_one_to_numbers_less_than_three = wrapper(add_one, is_less_than_three)
answers = add_one_to_numbers_less_than_three(range(6))
for answer in answers:
    print(answer)
#new generator created
#1
#new generator created
#2
#new generator created
#3
#3
#4
#5
The problem with this is that it requires creating a new generator for each data item. There must be a better way? I have also tried using itertools.tee and splitting the generator, but this causes memory problems when the generators yield values at different rates (they do). How can I accomplish what the above code does without re-creating generators and without causing memory problems?
edited to add background information below
As input I will receive large video streams. The video streams may or may not end (could be a webcam). Users are able to choose which image processing steps are carried out on the video frames, thus the order and number of functions will change. Subsequently, the functions should be able to take each other's outputs as inputs.
I have accomplished this by using a series of generators. The input:output ratio of the generators/functions is variable - it could be 1:n, 1:1, or n:1 (for example, extracting several objects (subimages) from an image to be processed separately).
Currently these generators take a few parameters that are repeated among them (not DRY) and I am trying to decrease the number of parameters by refactoring them into separate generators or wrappers. One of the more difficult ones is a filter on the data stream to determine whether or not a function should be applied to the frame (the function could be cpu-intensive and not needed on all frames).
The number of parameters makes the usage of the function more difficult for the user to understand. It also makes it more difficult for me in that whenever I want to make a change to one of the common parameters, I have to edit it for all functions.
edit2 renamed function to generator in example code to make it more clear
edit3 the solution
Thank you @Blckknght. This can be solved by creating an infinite iterator that passes the value of a local variable to the generator. I modified my example slightly, changing add_one from a 1:1 generator to a 1:n generator, to show how this solution can also work for 1:n generators.
import functools

is_less_than_three = lambda x: True if x < 3 else False

def add_one(numbers):
    print("new generator created")
    for number in numbers:
        if number == 0:
            yield number - 1
            yield number
        else:
            yield number

def wrapper(generator1, filter_):
    #functools.wraps(generator1)
    def wrapped(generator2):
        local_variable_passer = generator1(iter(lambda: data, object()))
        for data in generator2:
            if filter_(data):
                next_data = next(local_variable_passer)
                if data == 0:
                    yield next_data
                    next_data = next(local_variable_passer)
                    yield next_data
                else:
                    yield next_data
            else:
                yield data
    return wrapped

add_one_to_numbers_less_than_three = wrapper(add_one, is_less_than_three)
answers = add_one_to_numbers_less_than_three(range(6))
for answer in answers:
    print(answer)
#new generator created
#-1
#0
#1
#2
#3
#3
#4
#5
As I understand your problem, you have a stream of video frames, and you're trying to create a pipeline of processing functions that modify the stream. Different processing functions might change the number of frames, so a single input frame could result in multiple output frames, or multiple input frames could be consumed before a single output frame is produced. Some functions might be 1:1, but that's not something you can count on.
Your current implementation uses generator functions for all the processing. The output function iterates on the chain, and each processing step in the pipeline requests frames from the one before it using iteration.
The function you're trying to write right now is a sort of selective bypass. You want some frames (those meeting some condition) to be passed into an already existing generator function, while other frames skip the processing and go directly to the output. Unfortunately, that's probably not possible to do with Python generators. The iteration protocol is just not sophisticated enough to support it.
First off, it is possible to do this for 1:1 with generators, but you can't easily generalize to n:1 or 1:n cases. Here's what it might look like for 1:1:
def selective_processing_1to1(processing_func, condition, input_iterable):
    processing_iterator = processing_func(iter(lambda: input_value, object()))
    for input_value in input_iterable:
        if condition(input_value):
            yield next(processing_iterator)
        else:
            yield input_value
There's a lot of work being done in the processing_iterator creation step. By using the two-argument form of iter with a lambda function and a sentinel object (which will never be yielded), I'm creating an infinite iterator that always yields the current value of the local variable input_value. Then I pass that iterator to the processing_func function. I can selectively call next on the generator object if I want to apply the processing the filter represents to the current value, or I can just yield the value myself without processing it.
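To make the two-argument iter trick concrete, here is a tiny standalone sketch (my addition, using a made-up variable name current):

current = 1
watcher = iter(lambda: current, object())  # a fresh object() can never be returned by the lambda, so this never stops
print(next(watcher))  # 1
current = 99
print(next(watcher))  # 99 -- the lambda re-reads the variable on every next()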
But because this only works on one frame at a time, it won't do for n:1 or 1:n filters (and I don't even want to think about m:n kinds of scenarios).
A "peekable" iterator that lets you see what the next value is going to be before you iterate onto it might let you support a limited form of selective filtering for n:1 processes (that is, where a possibly-variable n input frames go into one output frame). The limitation is that you can only do the selective filtering on the first of the n frames that is going to be consumed by the processing, the others will get taken without you getting a chance to check them first. Maybe that's good enough?
Anyway, here's what that looks like:
_sentinel = object()

class PeekableIterator:
    def __init__(self, input_iterable):
        self.iterator = iter(input_iterable)
        self.next_value = next(self.iterator, _sentinel)

    def __iter__(self):
        return self

    def __next__(self):
        if self.next_value != _sentinel:
            return_value = self.next_value
            self.next_value = next(self.iterator, _sentinel)
            return return_value
        raise StopIteration

    def peek(self):  # this is not part of the iteration protocol!
        if self.next_value != _sentinel:
            return self.next_value
        raise ValueError("input exhausted")

def selective_processing_Nto1(processing_func, condition, input_iterable):
    peekable = PeekableIterator(input_iterable)
    processing_iter = processing_func(peekable)
    while True:
        try:
            value = peekable.peek()
        except ValueError:
            return
        try:
            yield next(processing_iter) if condition(value) else next(peekable)
        except StopIteration:
            return
This is as good as we can practically do when the processing function is a generator. If we wanted to do more, such as supporting 1:n processing, we'd need some way to know how large the n was going to be, so we could get that many values before deciding if we will pass on the next input value or not. While you could write a custom class for the processing that would report that, it is probably less convenient than just calling the processing function repeatedly as you do in the question.
The architecture is a conditional map - as such, each item must be mapped individually. This means the function should receive one number, not many numbers.
As long as there is a stateless 1:1 connection, use a function instead of a generator.
def add_one(number):  # takes one number
    return number + 1  # provides one number

def conditional_map(function, condition):
    #functools.wraps(function)
    def wrapped(generator):
        return (
            function(item) if condition(item)
            else item for item in generator
        )
    return wrapped

for answer in conditional_map(add_one, lambda x: x < 3)(range(6)):
    print(answer)
If data must be passed to a stateful "generator", it is a coroutine and should be designed as such. This means that yield is used both to receive and provide data.
from itertools import count

def add_increment(start=0):
    # initially receive data
    number = yield
    for increment in count(start):
        # provide and receive data
        number = yield number + increment
Since this is still a 1:1 connection, it can be used with the previous conditional_map.
mapper = add_increment()
next(mapper)  # prime the coroutine - this could be done with a decorator
for answer in conditional_map(mapper.send, lambda x: x < 3)(range(6)):
    print(answer)
If 1:n connections are needed, expect to receive a generator for each input.
def add_some(number):  # takes one number
    yield number - 1
    yield number
    yield number + 1

def conditional_map(function, condition):
    #functools.wraps(function)
    def wrapped(generator):
        for data in generator:
            if condition(data):
                yield from function(data)  # passes only *one* item
            else:
                yield data
    return wrapped
If a stateful 1:n connection is required, a coroutine that produces a generator/iterable can be used.
def add_increments(start=0):
    # initially receive data
    number = yield
    for increment in count(start):
        # provide and receive data
        number = yield (number + increment + i for i in (-1, 0, 1))
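As a follow-up, here is a minimal sketch (my own wiring, not part of the answer) of how add_increments could be driven through the 1:n conditional_map above; it assumes those definitions are in scope:

mapper = add_increments()
next(mapper)  # prime the coroutine

# each mapper.send(item) returns an iterable, which the 1:n conditional_map
# flattens with yield from before the next item is sent
for answer in conditional_map(mapper.send, lambda x: x < 3)(range(6)):
    print(answer)
# -1, 0, 1   (0 with increment 0, spread over -1/0/+1)
# 1, 2, 3    (1 with increment 1)
# 3, 4, 5    (2 with increment 2)
# 3, 4, 5    (3, 4 and 5 pass through unchanged)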
It really seems like you all are making this too complicated.
If you think of a data processing pipeline as

source -> transform -> filter -> sink

where source, transform, and filter are all generators, it is similar to a Unix pipeline

cat f | tr 'a' 'A' | grep 'word' > /dev/null

and you can see how a pipeline works (conceptually).
One big difference is that Unix pipelines push data, whereas with Python generators you pull data.
Using some of your functions:
# this is a source
def add_one(numbers):
    print("new generator created")
    # this is the output that becomes the next function's input
    for number in numbers:
        yield number + 1

# this is a transform
def filter(input, predicate):  # note: this shadows the built-in filter()
    for item in input:
        if predicate(item):
            yield item

# this is the sink
def save(input, filename):
    with open(filename, 'w') as f:
        for item in input:
            f.write(f"{item}\n")  # write() needs a string, so format the item
To put the pipeline of generators together in Python, you start with the source, then pass it to a transform or filter as a parameter that can be iterated over. Of course each of the generators has a yield statement. Finally, the outermost function is the sink, and it consumes the values while it iterates.
It looks like this. You can see how the predicate function is passed to the filter function in addition to the "source" of its data.
# now run the pipeline
save(filter(add_one(range(20)), is_less_than_three), 'myfile')
Some find that this looks awkward but if you think of mathematical notation it is easier. I am sure you have seen f(g(x)) which is exactly the same notation.
You could also write it as:

save(filter(add_one(range(20)),
            is_less_than_three),
     'myfile')
which shows better how the parameters are used.
To recap
The pipeline is a generator. In this case it won't have a generator as its source. It may have non-generator input such as a list of numbers (your example), or create them some other way such as reading a file.
Transform generators always have a generator for their source, and they yield output to their "sink". In other words a transform acts like a sink to its source and like a source to its sink.
The sink is the final part of the pipeline; it just iterates over its source input. It consumes its input and doesn't yield any output. Its job is to consume the items, by processing, saving, printing or whatever.
A transform is an m:n function that for m inputs produces n outputs, meaning it can filter out some inputs and not pass them on, or produce multiple outputs by creating new items. An example might be transforming a video stream from 60fps to 30fps: for every two input frames it produces one output frame.
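To make the m:n idea concrete, here is a minimal sketch (my own, not from the answer) of a 2:1 transform in that 60fps-to-30fps spirit:

def halve_framerate(frames):
    # a 2:1 transform: consume two input frames for every output frame
    it = iter(frames)
    for frame in it:
        next(it, None)  # pull and drop the second frame of each pair
        yield frame

print(list(halve_framerate(range(8))))  # [0, 2, 4, 6]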
I don't even know if this is the proper way to put it, but I recently had trouble while trying to use a method of an object both as a map engine (mapping a closure onto the elements of an iterator) and as a generator of generators.
It is probably much simpler to explain this through a code example:
class maybe_generator():
    def __init__(self, doer):
        self.doer = doer
    def give(self):
        for i in [1, 2, 3]:
            self.doer(i)

def printer(x):
    print('This is {}'.format(x))

def gener(x):
    yield(x)

p = maybe_generator(printer)
p.give()

g = maybe_generator(gener)
print('Type of result is {}'.format(g.give()))
Output is
This is 1
This is 2
This is 3
Type of result is None
I would have expected the result of g.give() to be of type generator instead of NoneType. So I wonder how it is possible to implement a function that can potentially produce a generator, or directly perform some side effect on the iterable.
Thank you in advance for your help
OK, I finally found what I was looking for. Having a function that works both as a mapping engine and as a generator may be possible with some hacks/tricks, but what I wanted in my use case was essentially a recursive generator.
This can be easily done with the keyword
yield from
The code now looks something like this:
class maybe_generator():
    def __init__(self, doer):
        self.doer = doer
    def give(self):
        for i in [1, 2, 3]:
            yield from self.doer(i)

def gener(x):
    yield(x)

g = maybe_generator(gener)
gen = g.give()
print('Type of result is {}'.format(gen))
for k in gen:
    print('value is {}'.format(k))
It was actually also worth taking a look at this advanced course on generators and coroutines: http://dabeaz.com/coroutines/
I am a C++ guy learning the lambda function in Python, and I want to know it inside out. I did some searches before posting here. Anyway, this piece of code came up.
<1> I don't quite understand the purpose of the lambda function here. Are we trying to get a function template? If so, why don't we just set up 2 parameters in the function input?
<2> Also, make_incrementor(42) at this moment is equivalent to return x + 42, and x is the 0 and 1 in f(0) and f(1)?
<3> For f(0), does it not have the same effect as >>> f = make_incrementor(42)? For f(0), what are the values of x and n respectively?
Any comments are welcome! Thanks.
>>> def make_incrementor(n):
... return lambda x: x + n
...
>>> f = make_incrementor(42)
>>> f(0)
42
>>> f(1)
43
Yes, this is similar to a C++ int template. However, instead of being created at compile time (yes, Python, at least CPython, is "compiled"), the function is created at run time. Why the lambda is used in this specific case is unclear; it is probably only to demonstrate that functions can be returned from other functions rather than for practical use. Sometimes, however, statements like this may be necessary if you need a function taking a specified number of arguments (e.g. for map, the function must take the same number of arguments as the number of iterables given to map) but the behaviour of the function should depend on other arguments.
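For instance, here is a small sketch (my own, reusing make_incrementor from the question) of the map case just mentioned; map passes exactly one argument per item, so the 42 has to be baked into the function beforehand:

def make_incrementor(n):
    return lambda x: x + n

# map() calls its function with one argument per item, so a two-parameter
# add(x, n) would not fit here; the closure carries n instead
print(list(map(make_incrementor(42), range(3))))  # [42, 43, 44]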
make_incrementor returns a function that adds n (here, 42) to any x passed to that function. In your case the x values you tried are 0 and 1.
f = make_incrementor(42) sets f to a function that returns x + 42. f(0), however, returns 0 + 42, which is 42 - the returned types and values are both different, so the different expressions don't have the same effect.
The purpose is to show a toy lambda return. It lets you create a function with data baked in. I have used a similar approach in this less trivial example.
def startsWithFunc(testString):
    return lambda x: x.find(testString) == 0

Then when I am parsing, I create some functions:

startsDescription = startsWithFunc("!Sample_description")
startMatrix = startsWithFunc("!series_matrix_table_begin")

Then in code I use:

while line:
    # .... other stuff
    if startsDescription(line):
        ...  # do description work
    if startMatrix(line):
        ...  # do matrix start work
    # other stuff ... increment line ... etc
Still perhaps trivial, but it shows creating general functions with data baked in.
I need to run several functions in a module as follows:
mylist = open('filing2.txt').read()
noTables = remove_tables(mylist)
newPassage = clean_text_passage(noTables)
replacement = replace(newPassage)
ncount = count_words(replacement)
riskcount = risk_count(ncount)
Is there any way that I can run all the functions at once? Should I make all the functions into a big function and run that big function?
Thanks.
You should make a new function in the module which executes the common sequence being used. This will require you to figure out what input arguments are required and what results to return. So given the code you posted, the new function might look something like this -- I just guessed as to what final results you might be interested in. Also note that I opened the file within a with statement to ensure that it gets closed after reading it.
def do_combination(file_name):
    with open(file_name) as input:
        mylist = input.read()
    noTables = remove_tables(mylist)
    newPassage = clean_text_passage(noTables)
    replacement = replace(newPassage)
    ncount = count_words(replacement)
    riskcount = risk_count(ncount)
    return replacement, riskcount
Example of usage:
replacement, riskcount = do_combination('filing2.txt')
If you simply store these lines in a Python (.py) file, you can just execute them.
Or am I missing something here?
Creating a function also makes them easy to call, though:
def main():
    mylist = open('filing2.txt').read()
    noTables = remove_tables(mylist)
    newPassage = clean_text_passage(noTables)
    replacement = replace(newPassage)
    ncount = count_words(replacement)
    riskcount = risk_count(ncount)

main()
As far as I understand, you need function composition. There is no special function for this in the Python stdlib, but you can do it with the reduce function:
from functools import reduce  # reduce lives in functools on Python 3

funcs = [remove_tables, clean_text_passage, replace, count_words, risk_count]
do_all = lambda args: reduce(lambda prev, f: f(prev), funcs, args)
Use it as:

with open('filing2.txt') as f:
    riskcount = do_all(f.read())
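Since remove_tables and the other functions aren't shown in the question, here is a quick sketch (with stand-in functions of my own) showing that the reduce-based do_all applies the functions left to right:

from functools import reduce

# stand-ins for the real module functions, just to show the order of application
funcs = [lambda s: s.strip(), lambda s: s.split(), len]
do_all = lambda args: reduce(lambda prev, f: f(prev), funcs, args)

print(do_all("  a b c  "))  # 3 -- strip first, then split, then count the words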
Here's another approach.
You could write a general function somewhat like the one shown in the First-class composition section of the Wikipedia article on Function composition. Note that, unlike in the article, the functions are applied in the order they are listed in the call to compose().
try:
    from functools import reduce  # Python 3 compatibility
except ImportError:
    pass

def compose(*funcs, **kwargs):
    """Compose a series of functions (...(f3(f2(f1(*args, **kwargs))))) into
    a single composite function which passes the result of each
    function as the argument to the next, from the first to last
    given.
    """
    return reduce(lambda f, g:
                      lambda *args, **kwargs: f(g(*args, **kwargs)),
                  reversed(funcs))
Here's a trivial example illustrating what it does:
f = lambda x: 'f({!r})'.format(x)
g = lambda x: 'g({})'.format(x)
h = lambda x: 'h({})'.format(x)

my_composition = compose(f, g, h)
print(my_composition('X'))
Output:
h(g(f('X')))
Here's how it could be applied to the series of functions in your module:
my_composition = compose(remove_tables, clean_text_passage, replace,
                         count_words, risk_count)

with open('filing2.txt') as input:
    riskcount = my_composition(input.read())