Expressive way to compose generators in Python - python

I really like Python generators. In particular, I find that they are just the right tool for connecting to REST endpoints - my client code only has to iterate on the generator that is connected to the endpoint. However, I am finding one area where Python's generators are not as expressive as I would like. Typically, I need to filter the data I get out of the endpoint. In my current code, I pass a predicate function to the generator, which applies the predicate to the data it is handling and only yields data if the predicate is True.
I would like to move toward composition of generators - like data_filter(datasource()). Here is some demonstration code that shows what I have tried. It is pretty clear why it does not work; what I am trying to figure out is the most expressive way of arriving at the solution:
# Mock of Rest Endpoint: In actual code, the generator is
# connected to a Rest endpoint which returns a dictionary (from JSON).
def mock_datasource():
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula", "short-circuit", "generate", "comedy"]
    for d in mock_data:
        yield d

# Mock of a filter: simplification, in reality I am filtering on some
# aspect of the data, like data['type'] == "external"
def data_filter(d):
    if len(d) < 8:
        yield d
# First Try:
# for w in data_filter(mock_datasource()):
#     print(w)
# >> TypeError: object of type 'generator' has no len()

# Second Try
# for w in (data_filter(d) for d in mock_datasource()):
#     print(w)
# I don't get words out,
# rather <generator object data_filter at 0x101106a40>

# Using a predicate to filter works, but is not the expressive
# composition I am after
for w in (d for d in mock_datasource() if len(d) < 8):
    print(w)

data_filter should apply len to the elements of d, not to d itself, like this:
def data_filter(d):
    for x in d:
        if len(x) < 8:
            yield x
now your code:
for w in data_filter(mock_datasource()):
    print(w)
returns
liberty
seminar
formula
comedy

More concisely, you can do this with a generator expression directly:
def length_filter(d, minlen=0, maxlen=8):
    return (x for x in d if minlen <= len(x) < maxlen)
Apply the filter to your generator just like a regular function:
for element in length_filter(endpoint_data()):
    ...
If your predicate is really simple, the built-in function filter may also meet your needs.
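For instance, a minimal sketch reusing the question's mock_datasource: the built-in filter wraps the generator lazily, just like the hand-written version above.

```python
def mock_datasource():
    # stand-in for the REST endpoint generator from the question
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula", "short-circuit", "generate", "comedy"]
    for d in mock_data:
        yield d

# filter() returns a lazy iterator over the items passing the predicate
short_words = filter(lambda d: len(d) < 8, mock_datasource())
print(list(short_words))  # ['liberty', 'seminar', 'formula', 'comedy']
```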

You could pass a filter function that you apply to each item:
def mock_datasource(filter_function):
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula", "short-circuit", "generate", "comedy"]
    for d in mock_data:
        yield filter_function(d)

def filter_function(d):
    # filter
    return filtered_data

What I would do is define filter(data_filter) to receive a generator as input and return a generator whose values are filtered by the data_filter predicate (a regular predicate, not aware of the generator interface).
The code is:
def filter(pred):
    """Filter, for composition with generators that take coll as an argument."""
    def generator(coll):
        for x in coll:
            if pred(x):
                yield x
    return generator

def mock_datasource():
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula", "short-circuit", "generate", "comedy"]
    for d in mock_data:
        yield d

def data_filter(d):
    return len(d) < 8

gen1 = mock_datasource()
filtering = filter(data_filter)
gen2 = filtering(gen1)  # or filter(data_filter)(mock_datasource())
print(list(gen2))
If you want to improve further, you may use compose, which I think was the whole intent:
from functools import reduce

def compose(*fns):
    """Compose functions left to right - allows generators to compose in the same
    order as Clojure-style transducers in the first argument to transduce."""
    return reduce(lambda f, g: lambda *x, **kw: g(f(*x, **kw)), fns)

gen_factory = compose(mock_datasource,
                      filter(data_filter))
gen = gen_factory()
print(list(gen))
PS: I used some code found here, where the Clojure guys expressed composition of generators inspired by the way they do composition generically with transducers.
PS2: filter may be written in a more pythonic way:
def filter(pred):
    """Filter, for composition with generators that take coll as an argument."""
    return lambda coll: (x for x in coll if pred(x))
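Put together, a minimal end-to-end sketch of this compose-plus-filter idea (filter_gen is my name here, to avoid shadowing the built-in filter):

```python
from functools import reduce

def compose(*fns):
    # left-to-right composition, as in the answer above
    return reduce(lambda f, g: lambda *a, **kw: g(f(*a, **kw)), fns)

def filter_gen(pred):
    # returns a generator-transforming function, ready to compose
    return lambda coll: (x for x in coll if pred(x))

def mock_datasource():
    yield from ["sanctuary", "movement", "liberty", "comedy"]

pipeline = compose(mock_datasource, filter_gen(lambda w: len(w) < 8))
print(list(pipeline()))  # ['liberty', 'comedy']
```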

Here is a function I have been using to compose generators together.
from functools import reduce

def compose(*funcs):
    """ Compose generators together to make a pipeline.
    e.g.
    pipe = compose(func1, func2, func3)
    result = pipe(range(0, 5))
    """
    return lambda x: reduce(lambda f, g: g(f), list(funcs), x)

Where funcs is a list of generator functions, each taking an iterable as its argument. Since mock_datasource takes no arguments, feed its output into the pipeline rather than composing it:
pipe = compose(data_filter)
print(list(pipe(mock_datasource())))
This is not original

Related

one-liner reduce in Python3

In Python 3, I am looking for a way to compute in one line a lambda function called on elements two by two. Let's say I want to compute the LCM of a list of integers; this can be done in one line in Python 2:
print reduce(lambda a, b: a * b // gcd(a, b), mylist)
Is it possible to do the same in one line in Python 3 (implied: without functools.reduce)?
In Python 3 I know that reduce is no longer a built-in (it moved to functools), and filter and map now return iterators. I don't feel I need filter and map anymore because they can be written in Python 3 in a shorter and clearer fashion, but I thought I could find a nice replacement for reduce as well, except I haven't found any. I have seen many articles that suggest using functools.reduce or "writing out the accumulation loop explicitly", but I'd like to do it without importing functools and in one line.
If it makes it any easier, I should mention I use functions that are both associative and commutative. For instance with a function f on the list [1,2,3,4], the result will be good if it either computes:
f(1,f(2,f(3,4)))
f(f(1,2),f(3,4))
f(f(3,f(1,4)),2)
or any other order
So I actually did come up with something. I do not guarantee the performance, though, but it is a one-liner using exclusively lambda functions - nothing from functools or itertools, not even a single loop.
my_reduce = lambda l, f: (lambda u, a: u(u, a))((lambda v, m: None if len(m) == 0 else (m[0] if len(m) == 1 else v(v, [f(m[0], m[1])] + m[2:]))), l)
This is somewhat unreadable, so here it is expanded:
my_reduce = lambda l, f: (
    lambda u, a: u(u, a))(
        (lambda v, m: None if len(m) == 0
            else (m[0] if len(m) == 1
                else v(v, [f(m[0], m[1])] + m[2:])
            )
        ),
        l
    )
Test:
>>> f = lambda a,b: a+b
>>> my_reduce([1, 2, 3, 4], f)
10
>>> my_reduce(['a', 'b', 'c', 'd'], f)
'abcd'
Please check this other post for a deeper explanation of how this works.
The principle is to emulate a recursive function by using a lambda function whose first parameter is a function, and which will be itself.
This recursive function is embedded inside a function that effectively triggers the recursive calling: lambda u, a: u(u, a).
Finally, everything is wrapped in a function whose parameters are a list and a binary function.
Using my_reduce with your code:
my_reduce(mylist, lambda a,b: a * b // gcd(a, b))
Assuming you have a sequence that is at least one item long, you can simply define reduce recursively like this:
def reduce(func, seq): return seq[0] if len(seq) == 1 else func(reduce(func, seq[:-1]), seq[-1])
The long version would be slightly more readable:
def reduce(func, seq):
    if len(seq) == 1:
        return seq[0]
    else:
        return func(reduce(func, seq[:-1]), seq[-1])
However, that's recursive, and Python isn't very good at recursive calls (it is slow, and the recursion limit prevents processing long sequences). A much faster implementation would be:
def reduce(func, seq):
    tmp = seq[0]
    for item in seq[1:]:
        tmp = func(tmp, item)
    return tmp
But because of the loop it can't be put on one line. It could be solved using side effects:
def reduce(func, seq): d = {}; [d.__setitem__('last', func(d['last'], i)) if 'last' in d else d.__setitem__('last', i) for i in seq]; return d['last']
or:
def reduce(func, seq): d = {'last': seq[0]}; [d.__setitem__('last', func(d['last'], i)) for i in seq[1:]]; return d['last']
Which is the equivalent of:
def reduce(func, seq):
    d = {}
    for item in seq:
        if 'last' in d:
            d['last'] = func(d['last'], item)
        else:
            d['last'] = item
    return d['last']  # or "d.get('last', 0)"
That should be faster, but it's not exactly pythonic, because the list comprehension in the one-line implementation is used only for its side effects.
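For comparison, the functools.reduce version the asker wanted to avoid is a one-liner once imported; a sketch using math.gcd (available since Python 3.5):

```python
from functools import reduce
from math import gcd

mylist = [4, 6, 10]
# lcm(a, b) == a * b // gcd(a, b); folding it over the list gives the list's LCM
print(reduce(lambda a, b: a * b // gcd(a, b), mylist))  # 60
```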

Nest multiple yield functions without eval

I have the following structure (which might need a rework but to me this feels natural):
from types import GeneratorType

def get(baseVar):
    if type(baseVar) == GeneratorType:
        yield from baseVar
    else:
        yield baseVar

def multiply(baseVar):
    if type(baseVar) == GeneratorType:
        for item in baseVar:
            yield item*2
    else:
        yield baseVar*2

funcs = {'get': get, 'multiply': multiply}

result = 10
for f in funcs:
    result = funcs[f](result)
print(list(result))
Another approach (though it isn't dynamic at all) that performance-wise works like I want it to is to pass an iterator object to each function, thus (theoretically) getting more momentum out of the functions:
for result in multiply(get(10)):
    ...
How can I nest multiple yield functions in a row and pass the generator object along without hard-coding the function names or using getattr?
I'm not sure what you want to do. If you have different functions that work on single elements, use map:
def get(x):
    return x

def multiply(x):
    return x*2

print(list(map(multiply, map(get, [10]))))
How would you like to get the names of your functions? If from an external source, then your dict is the correct way; if from internal, you can use the functions directly:
funcs = (get, multiply)
result = [10]
for f in funcs:
    result = map(f, result)
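The dynamic chaining the asker is after can also be sketched with generator stages: keep each stage a plain generator function taking an iterable, and fold the previous generator into the next without type checks or getattr (names here mirror the question):

```python
def get(items):
    for x in items:
        yield x

def multiply(items):
    for x in items:
        yield x * 2

stages = (get, multiply)  # ordered, so no reliance on dict iteration order
result = iter([10])
for stage in stages:
    result = stage(result)  # each stage lazily wraps the previous generator
print(list(result))  # [20]
```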

What is a good way to decorate an iterator to alter the value before next is called in python?

I am working on a problem that involves validating a format from within unified diff patch.
The variables within the inner format can span multiple lines at a time, so I wrote a generator that pulls each line and yields the variable when it is complete.
To avoid having to rewrite this function when reading from a unified diff file, I created a generator to strip the unified diff characters from the line before passing it to the inner format validator. However, I am getting stuck in an infinite loop (both in the code and in my head). I have abstracted the problem to the following code. I'm sure there is a better way to do this. I just don't know what it is.
from collections import Iterable

def inner_format_validator(inner_item):
    # Do some validation to inner items
    return inner_item[0] != '+'

def inner_gen(iterable):
    for inner_item in iterable:
        # Operates only on inner_info type data
        yield inner_format_validator(inner_item)

def outer_gen(iterable):
    class DecoratedGenerator(Iterable):
        def __iter__(self):
            return self
        def next(self):
            # Using iterable from closure
            for outer_item in iterable:
                self.outer_info = outer_item[0]
                inner_item = outer_item[1:]
                return inner_item
    decorated_gen = DecoratedGenerator()
    for inner_item in inner_gen(decorated_gen):
        yield inner_item, decorated_gen.outer_info

if __name__ == '__main__':
    def wrap(string):
        # The point here is that I don't know what the first character will be
        pseudo_rand = len(string)
        if pseudo_rand * pseudo_rand % 2 == 0:
            return '+' + string
        else:
            return '-' + string

    inner_items = ["whatever"] * 3
    # wrap screws up inner_format_validator
    outer_items = [wrap("whatever")] * 3
    # I need to be able to
    # iterate over inner_items
    for inner_info in inner_gen(inner_items):
        print(inner_info)
    # and iterate over outer_items
    for outer_info, inner_info in outer_gen(outer_items):
        # This is an infinite loop
        print(outer_info)
        print(inner_info)
Any ideas as to a better, more pythonic way to do this?
I would do something simpler, like this:
def outer_gen(iterable):
    iterable = iter(iterable)
    first_item = next(iterable)
    info = first_item[0]
    yield info, first_item[1:]
    for item in iterable:
        yield info, item
This will execute the first 4 lines only once, then enter the loop and yield what you want.
You probably want to add some try/except to catch IndexErrors here and there.
If you want to take values while they start with something, or the contrary, remember you can use a lot of stuff from the itertools toolbox, in particular dropwhile, takewhile and chain:
>>> import itertools
>>> l = ['+foo', '-bar', '+foo']
>>> list(itertools.takewhile(lambda x: x.startswith('+'), l))
['+foo']
>>> list(itertools.dropwhile(lambda x: x.startswith('+'), l))
['-bar', '+foo']
>>> a = itertools.takewhile(lambda x: x.startswith('+'), l)
>>> b = itertools.dropwhile(lambda x: x.startswith('+'), l)
>>> list(itertools.chain(a, b))
['+foo', '-bar', '+foo']
And remember that you can create generators like list comprehensions, store them in variables and chain them, just like you would pipe Linux commands:
import random

def create_item():
    return random.choice(('+', '-')) + random.choice(('foo', 'bar'))

random_items = (create_item() for s in xrange(10))
added_items = ((i[0], i[1:]) for i in random_items if i.startswith('+'))
valid_items = ((prefix, line) for prefix, line in added_items if 'foo' in line)
print list(valid_items)
With all this, you should be able to find some pythonic way to solve your problem :-)
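The same chained pipeline in Python 3 syntax (range instead of xrange, print as a function); since the items are random, only the shape of the output is predictable:

```python
import random

def create_item():
    return random.choice(('+', '-')) + random.choice(('foo', 'bar'))

# each generator expression lazily wraps the previous one, like a shell pipe
random_items = (create_item() for _ in range(10))
added_items = ((i[0], i[1:]) for i in random_items if i.startswith('+'))
valid_items = ((prefix, line) for prefix, line in added_items if 'foo' in line)
print(list(valid_items))
```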
I still don't like this very much, but at least it's shorter and a tad more pythonic:
from itertools import imap, izip
from functools import partial

def inner_format_validator(inner_item):
    return not inner_item.startswith('+')

inner_gen = partial(imap, inner_format_validator)

def split(astr):
    return astr[0], astr[1:]

def outer_gen(iterable):
    outer_stuff, inner_stuff = izip(*imap(split, iterable))
    return izip(inner_gen(inner_stuff), outer_stuff)
[EDIT] inner_gen() and outer_gen() without imap and partial:
def inner_gen(iterable):
    for each in iterable:
        yield inner_format_validator(each)

def outer_gen(iterable):
    outer_stuff, inner_stuff = izip(*(split(each) for each in iterable))
    return izip(inner_gen(inner_stuff), outer_stuff)
Maybe this is a better, though different, solution:
def transmogrify(iter_of_iters, *transmogrifiers):
    for iters in iter_of_iters:
        yield (
            trans(each) if trans else each
            for trans, each in izip(transmogrifiers, iters)
        )

for outer, inner in transmogrify(imap(split, stuff), inner_format_validator, None):
    print inner, outer
I think it will do what you intended if you change the definition of DecoratedGenerator to this:
class DecoratedGenerator(Iterable):
    def __iter__(self):
        # Using iterable from closure
        for outer_item in iterable:
            self.outer_info = outer_item[0]
            inner_item = outer_item[1:]
            yield inner_item
Your original version never terminated because its next() method was stateless and would return the same value every time it was called. You didn't need to have a next() method at all, though--you can implement __iter__() yourself (as I did), and then it all works fine.
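For reference, a Python 3 sketch of the same split-then-validate idea using the built-in map, with no classes needed (function names follow the answers above):

```python
def inner_format_validator(inner_item):
    return not inner_item.startswith('+')

def split(astr):
    # separate the unified-diff prefix from the rest of the line
    return astr[0], astr[1:]

def outer_gen(iterable):
    # pair each stripped line's validation result with its diff prefix
    for prefix, rest in map(split, iterable):
        yield inner_format_validator(rest), prefix

print(list(outer_gen(['-foo', '++bar'])))  # [(True, '-'), (False, '+')]
```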

Can you dynamically combine multiple conditional functions into one in Python?

I'm curious if it's possible to take several conditional functions and create one function that checks them all (e.g. the way a generator takes a procedure for iterating through a series and creates an iterator).
The basic usage case would be when you have a large number of conditional parameters (e.g. "max_a", "min_a", "max_b", "min_b", etc.), many of which could be blank. They would all be passed to this "function creating" function, which would then return one function that checked them all. Below is an example of a naive way of doing what I'm asking:
def combining_function(max_a, min_a, max_b, min_b, ...):
    f_array = []
    if max_a is not None:
        f_array.append(lambda x: x.a < max_a)
    if min_a is not None:
        f_array.append(lambda x: x.a > min_a)
    ...
    return lambda x: all([f(x) for f in f_array])
What I'm wondering is: what is the most efficient way to achieve what's being done above? It seems like executing a function call for every function in f_array would create a decent amount of overhead, but perhaps I'm engaging in premature/unnecessary optimization. Regardless, I'd be interested to see if anyone else has come across usage cases like this and how they proceeded.
Also, if this isn't possible in Python, is it possible in other (perhaps more functional) languages?
EDIT: It looks like the consensus solution is to compose a string containing the full collection of conditions and then use exec or eval to generate a single function. #doublep suggests this is pretty hackish. Any thoughts on how bad this is? Is it plausible to check the arguments closely enough when composing the function that a solution like this could be considered safe? After all, whatever rigorous checking is required only needs to be performed once whereas the benefit from a faster combined conditional can be accrued over a large number of calls. Are people using stuff like this in deployment scenarios or is this mainly a technique to play around with?
Replacing
return lambda x: all( [ f(x) for f in f_array ] )
with
return lambda x: all( f(x) for f in f_array )
will give a more efficient lambda as it will stop early if any f returns a false value and doesn't need to create unnecessary list. This is only possible on Python 2.4 or 2.5 and up, though. If you need to support ancient values, do the following:
def check(x):
    for f in f_array:
        if not f(x):
            return False
    return True
return check
Finally, if you really need to make this very efficient and are not afraid of bounding-on-hackish solutions, you could try compilation at runtime:
def combining_function(max_a, min_a):
    constants = {}
    checks = []
    if max_a is not None:
        constants['max_a'] = max_a
        checks.append('x.a < max_a')
    if min_a is not None:
        constants['min_a'] = min_a
        checks.append('x.a > min_a')
    if not checks:
        return lambda x: True
    else:
        func = 'def check(x): return (%s)' % ') and ('.join(checks)
        exec func in constants, constants
        return constants['check']

class X:
    def __init__(self, a):
        self.a = a

check = combining_function(3, 1)
print check(X(0)), check(X(2)), check(X(4))
Note that in Python 3.x exec becomes a function, so the above code is not portable.
Based on your example, if your list of possible parameters is just a sequence of max,min,max,min,max,min,... then here's an easy way to do it:
def combining_function(*args):
    maxs, mins = zip(*zip(*[iter(args)]*2))
    minv = max(m for m in mins if m is not None)
    maxv = min(m for m in maxs if m is not None)
    return lambda x: minv < x.a < maxv
But this kind of "cheats" a bit: it precomputes the smallest maximum value and the largest minimum value. If your tests can be something more complicated than just max/min testing, the code will need to be modified.
The combining_function() interface is horrible, but if you can't change it then you could use:
def combining_function(min_a, max_a, min_b, max_b):
    conditions = []
    for name, value in locals().items():
        if value is None:
            continue
        kind, sep, attr = name.partition("_")
        op = {"min": ">", "max": "<"}.get(kind, None)
        if op is None:
            continue
        conditions.append("x.%(attr)s %(op)s %(value)r" % dict(
            attr=attr, op=op, value=value))
    if conditions:
        return eval("lambda x: " + " and ".join(conditions), {})
    else:
        return lambda x: True
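A non-eval alternative sketch: bind each bound into its own closure and short-circuit in a plain loop. It keeps the original interface idea without compiling strings (names follow the question's example):

```python
def combining_function(max_a=None, min_a=None):
    checks = []
    if max_a is not None:
        checks.append(lambda x, m=max_a: x.a < m)  # default arg captures the bound
    if min_a is not None:
        checks.append(lambda x, m=min_a: x.a > m)
    def combined(x):
        for f in checks:
            if not f(x):
                return False  # short-circuit, like the generator-based all()
        return True
    return combined

class X:
    def __init__(self, a):
        self.a = a

check = combining_function(max_a=3, min_a=1)
print([check(X(v)) for v in (0, 2, 4)])  # [False, True, False]
```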

Python, lambda, find minimum

I have a foreach function which calls a specified function on every element it contains. I want to get the minimum of these elements, but I have no idea how to write a lambda or function or even a class that would manage that.
Thanks for any help.
I use my foreach function like this:
o.foreach( lambda i: i.call() )
or
o.foreach( I.call )
I don't want to make lists or other objects. I want to iterate through it and find the min.
I managed to write a class that does the job, but there should be some better solution than that:
class Min:
    def __init__(self, i):
        self.i = i
    def get_min(self):
        return self.i
    def set_val(self, o):
        if o.val < self.i:
            self.i = o.val

m = Min(xmin)
self.foreach(m.set_val)
xmin = m.get_min()
Ok, so I suppose that my .foreach method is a non-Python idea. I should make my class iterable, because all your solutions are based on lists, and then everything will become easier.
In C# there would be no problem with a lambda function like that, so I thought that Python would be just as powerful.
Python has built-in support for finding minimums:
>>> min([1, 2, 3])
1
If you need to process the list with a function first, you can do that with map:
>>> def double(x):
... return x * 2
...
>>> min(map(double, [1, 2, 3]))
2
Or you can get fancy with list comprehensions and generator expressions, for example:
>>> min(double(x) for x in [1, 2, 3])
2
You can't do this with foreach and a lambda. If you want to do this in a functional style without actually using min, you'll find reduce is pretty close to the function you were trying to define.
l = [5,2,6,7,9,8]
reduce(lambda a,b: a if a < b else b, l[1:], l[0])
Writing a foreach method is not very pythonic. You would do better to make your object an iterator so that it works with standard Python functions like min.
Instead of writing something like this:
def foreach(self, f):
    for d in self._data:
        f(d)
write this:
def __iter__(self):
    for d in self._data:
        yield d
Now you can call min as min(myobj).
I have foreach function which calls specified function on every element which it contains
It sounds, from the comment you subsequently posted, that you have re-invented the built-in map function.
It sounds like you're looking for something like this:
min(map(f, seq))
where f is the function that you want to call on every item in the list.
As gnibbler shows, if you want to find the value x in the sequence for which f(x) returns the lowest value, you can use:
min(seq, key=f)
...unless you want to find all of the items in seq for which f returns the lowest value. For instance, if seq is a list of dictionaries,
min(seq, key=len)
will return the first dictionary in the list with the smallest number of items, not all dictionaries that contain that number of items.
To get a list of all items in a sequence for which the function f returns the smallest value, do this:
values = map(f, seq)
result = [seq[i] for (i, v) in enumerate(values) if v == min(values)]
Okay, one thing you need to understand: lambda creates a function object for you. But so does plain, ordinary def. Look at this example:
lst = range(10)
print filter(lambda x: x % 2 == 0, lst)

def is_even(x):
    return x % 2 == 0

print filter(is_even, lst)
Both of these work. They produce identical results. lambda makes an unnamed function object; def makes a named function object. filter() doesn't care whether the function object has a name or not.
So, if your only problem with lambda is that you can't use = in a lambda, you can just make a function using def.
Now, that said, I don't suggest you use your .foreach() method to find a minimum value. Instead, make your main object return a list of values, and simply call the Python min() function.
lst = range(10)
print min(lst)
EDIT: I agree that the answer that was accepted is better. Rather than returning a list of values, it is better to define __iter__() and make the object iterable.
Suppose you have
>>> seq = range(-4,4)
>>> def f(x):
... return x*x-2
for the minimum value of f
>>> min(f(x) for x in seq)
-2
for the value of x at the minimum
>>> min(seq, key=f)
0
of course you can use lambda too
>>> min((lambda x:x*x-2)(x) for x in range(-4,4))
-2
but that is a little ugly, map looks better here
>>> min(map(lambda x:x*x-2, seq))
-2
>>> min(seq,key=lambda x:x*x-2)
0
You can use this:
x = lambda x, y, z: min(x, y, z)
print(x(3, 2, 1))
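Putting the accepted __iter__ suggestion together in one runnable sketch (Container is an illustrative name for the asker's object):

```python
class Container:
    def __init__(self, data):
        self._data = data
    def __iter__(self):
        # makes the object work with min(), max(), sum(), for-loops, ...
        return iter(self._data)

o = Container([5, 2, 6, 7, 9, 8])
print(min(o))  # 2
```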
