Split filter function? - python

Is there some function that acts as filter but also returns the list of rejected values? For example, I might want to split a list into values greater than and lower than 5. So if I have a list a, I might apply this hypothetical function splitfilter as follows:
lower, higher = splitfilter(lambda x: x<5, a)

This function is not built-in, to my knowledge. If we want to eagerly consume the input iterable and produce lists as output, then this is fairly straightforward to write.
def splitfilter(p, xs):
    t, f = [], []
    for x in xs:
        if p(x):
            t.append(x)
        else:
            f.append(x)
    return t, f
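A quick sanity check, with a made-up sample list:

a = [1, 7, 3, 9, 4]
lower, higher = splitfilter(lambda x: x < 5, a)
print(lower)   # [1, 3, 4]
print(higher)  # [7, 9]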
On the other hand, if our input is a multi-pass iterable and we want lazy output, we can do something like this, keeping in mind that this will iterate the input list twice.
from itertools import filterfalse

def splitfilter_lazy(f, xs):
    return filter(f, xs), filterfalse(f, xs)
I suspect that's exactly why it's not provided. Most of the built-in itertools functionality (and map and filter are itertools in spirit; they're just important enough to be included in builtins) takes input and produces output in the form of lazy iterators, and with this function there's a tradeoff to choose: do you want eager output but only one pass over the input, or proper iterators as output at the cost of forcing the input to be multi-pass?
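For what it's worth, the itertools documentation has long shipped a partition recipe that takes a middle road: tee buffers the input, so even a single-pass iterator can feed both outputs (note that it returns the rejected entries first):

from itertools import filterfalse, tee

def partition(pred, iterable):
    # tee lets both output iterators draw from a single-pass input;
    # items consumed by one branch are buffered internally for the other
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)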


How to make reduce function take three parameters in Python?

In Python, reduce takes a function that accepts only two arguments.
Is there any elegant way to have a reduce that can take more than two arguments?
For example:
from operator import add
list = [1, add, 2, add, 3, add, 4]
func = lambda operand1, operator, operand2: operator(operand1, operand2)
reduce(func, list)  # mock reduce; expected result is 10 (1+2+3+4)
EDIT:
The reduce3 function is used to calculate the result of an abstract syntax tree (AST).
Each child of the AST can be either a node or another tree.
For simplicity, assume that leaf nodes can only be a number, an addition symbol (+), or a subtraction symbol (-). To calculate the result, one intuitive idea is to first work out the result of every node (which can be implemented by calling the method recursively), and then combine them together.
For example, here's a simple AST:
Tree(
    Node(1),
    Node("+"),
    Tree(
        Node(2),
        Node("+"),
        Node(3)
    ),
    Node("-"),
    Node(4)
)
After recursion, we get the following result.
If there were a reduce that takes 3 arguments, we could easily combine the results.
Tree(
    Node(1),
    Node("+"),
    Node(5),
    Node("-"),
    Node(4)
)
This gives the expected result without a 3-argument reduce. It creates a partial function when the arguments look like (1, add), and calls that partial when the arguments look like (partial, 2):
from operator import add
from functools import reduce, partial

lst = [1, add, 2, add, 3, add, 4]

def operation_reduce(x, y):
    if callable(x):
        return x(y)           # x is partial(op, operand); apply it to the next number
    else:
        return partial(y, x)  # x is a number, y is an operator; defer until the next number

print(reduce(operation_reduce, lst))  # 10
reduce can't be changed, but you could provide a callable that handles the differing inputs:
from functools import reduce
from operator import add, sub

L = [1, add, 2, add, 3, add, 4, sub]

class Reducer:
    def __init__(self):
        self.pending = 0

    def __call__(self, total, value):
        if callable(value):
            # an operation arrived: apply it to the total and the stashed operand
            total = value(total, self.pending)
        else:
            # an operand arrived: stash it until its operation follows
            self.pending = value
        return total

func = Reducer()
result = reduce(func, L, 0)
print(result)  # 2, i.e. 0 + 1 + 2 + 3 - 4
Or you could batch the inputs into tuples and use a callable that can handle those:
def reducer(total, next_):
    value, operation = next_
    return operation(total, value)

# zip(L[::2], L[1::2]) pairs each value with the operation that follows it:
# (1, add), (2, add), (3, add), (4, sub)
result = reduce(reducer, zip(L[::2], L[1::2]), 0)
print(result)  # 2
I'd be interested to hear what your desired use case is, as I believe a reduce taking more than 2 arguments is not elegant, period. Here's my rationale (though it's a bit hard to argue in the abstract, with no concrete use case):
Assume the reducing operation takes 3 arguments, as in your example. Let's also make the initial accumulator explicit: reduce3(func, list, acc). Then in every call to func:
the first argument is the accumulator, as usual
the second argument is always an element at an odd position in the list
the third argument is always an element at an even position in the list
So elements at even positions are treated by func differently than elements at odd positions. We could only really need such a reduce3 function if the elements at odd and even positions are of different types!*
That is the case in your example. Mixing elements of different types in one list requires extra caution (mess up the order of just two elements and the reduce3 breaks) and IMO should be avoided in general.
Modifying your example:
nums = [2, 3, 4]  # we'll include 1 as the initial accumulator
ops = [add, add, add]
func = lambda acc, pair: pair[0](acc, pair[1])
reduce(func, zip(ops, nums), 1)  # result is 10 (1+2+3+4)
Reducing over a list of tuples is a general solution to your question. If mathematical expressions are what you want to evaluate, you should consider their tree structure, which gets lost when collapsed to a list.
Edit:
Indeed, you want to evaluate mathematical expressions. Consider changing the way you parse these expressions so that, for the example, the AST looks like:
Tree("-",
Tree("+",
Node(1),
Tree("+",
Node(2),
Node(3))),
Node(4))
# -
# / \
# + 4
# / \
# 1 +
# / \
# 2 3
That is, internal nodes are always tagged with a binary operation and only the leaves carry numbers. Now reducing the tree to a value is trivial, but at the cost of the parser doing most of the work.
# Here assuming some specifics of the Tree and Node classes.
# strToOp is a mapping of operation symbols to functions.
def evaluate(expr):  # expr: Tree or Node
    if isinstance(expr, Node):
        return expr.x
    else:  # expr: Tree
        return strToOp[expr.op](evaluate(expr.left), evaluate(expr.right))
If you allow for non-binary tree nodes, then swap the else branch for a reduce, like reduce(strToOp[expr.op], map(evaluate, expr.children)).
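For completeness, here is a minimal, self-contained version of that sketch; the Node and Tree class definitions are invented here for illustration, since the answer only assumes their shape:

from operator import add, sub

strToOp = {"+": add, "-": sub}

class Node:
    def __init__(self, x):
        self.x = x

class Tree:
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

def evaluate(expr):  # expr: Tree or Node
    if isinstance(expr, Node):
        return expr.x
    return strToOp[expr.op](evaluate(expr.left), evaluate(expr.right))

# (1 + (2 + 3)) - 4 == 2
expr = Tree("-", Tree("+", Node(1), Tree("+", Node(2), Node(3))), Node(4))
print(evaluate(expr))  # 2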
I understand that this is a very different approach. A reduce3 shouldn't be that hard to implement if you want to stick to it, right? For some help on parsing math from scratch in Python you could look at an old project of mine, BUT it's amateur code written by a student (me). On the other hand, it would be unfair not to share, so here you go: parsing
*Not necessarily of different types in a programming sense, but certainly of a different type or meaning to us.

Can the walrus operator be used to avoid multiple function calls within a list comprehension?

Let's say I have a list of lists like this
lol = [[1, 'e_r_i'], [2, 't_u_p']]
and I want to apply a function to the string elements which returns several values from which I need only a subset (which ones differ per use-case). For illustration purposes, I just make a simple split() operation:
def dummy(s):
    return s.split('_')
Now, let's say I only want the last two letters and concatenate those; there is the straightforward option
positions = []
for _, s in lol:
    stuff = dummy(s)
    positions.append(f"{stuff[1]}{stuff[2]}")
and doing the same in a list comprehension
print([f"{dummy(s)[1]}{dummy(s)[2]}" for _, s in lol])
both give the identical, desired outcome
['ri', 'up']
Is there a way to use the walrus operator here in the list comprehension to avoid calling dummy twice?
PS: Needless to say, in reality the dummy function is far more complex, so I'm not looking for a better solution to the split; this is fully about the structure and potential usage of the walrus operator.
I have to say that your first explicit loop is the best option here: it is clear, readable code, and you're not repeating any calls.
Still, as you asked for it, you could always do:
print([f"{(y:=dummy(s))[1]}{y[2]}" for _, s in lol])
You could also wrap the processing in another function:
def dummy2(l):
    return f"{l[1]}{l[2]}"
This removes the need for the walrus altogether and simplifies the code further:
print([dummy2(dummy(s)) for _, s in lol])  # ['ri', 'up']
Yes, this is what you want:
output = [f"{(stuff := dummy(s))[1]}{stuff[2]}" for _, s in lol]
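One caveat worth knowing with either spelling: unlike the comprehension's loop variable, a name bound with := inside a comprehension leaks into the enclosing scope (per PEP 572), so stuff stays bound afterwards:

lol = [[1, 'e_r_i'], [2, 't_u_p']]

def dummy(s):
    return s.split('_')

output = [f"{(stuff := dummy(s))[1]}{stuff[2]}" for _, s in lol]
print(stuff)  # ['t', 'u', 'p'] -- still visible after the comprehension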

When and why to map a lambda function to a list

I am working through a preparatory course for a Data Science bootcamp and it goes over the lambda keyword and map and filter functions fairly early on in the course. It gives you syntax and how to use it, but I am looking for why and when for context. Here is a sample of their solutions:
def error_line_traces(x_values, y_values, m, b):
    return list(map(lambda x_value: error_line_trace(x_values, y_values, m, b, x_value), x_values))
I feel as if every time I go over their solutions to the labs, I've turned a single-line return into a multi-part function. Is this a matter of style, or is it something I should be doing?
I'm not aware of any situation where it makes sense to use a map of a lambda, since it's shorter and clearer to use a generator expression instead. And a list of a map of a lambda is even worse, because it could be a list comprehension:
def error_line_traces(x_values, y_values, m, b):
    return [error_line_trace(x_values, y_values, m, b, x) for x in x_values]
Look how much shorter and clearer that is!
A filter of a lambda can also be rewritten as a comprehension. For example:
list(filter(lambda x: x>5, range(10)))
[x for x in range(10) if x>5]
That said, there are good uses for lambda, map, and filter, but usually not in combination. Even list(map(...)) can be OK depending on the context, for example converting a list of strings to a list of integers:
[int(x) for x in list_of_strings]
list(map(int, list_of_strings))
These are about as clear and concise, so really the only thing to consider is whether people reading your code will be familiar with map, and whether you want to give a meaningful name to the elements of the iterable (here x, which, admittedly, is not a great example).
Once you get past the bootcamp, keep in mind that map and filter return iterators and evaluate lazily, so if you're only looping over them and not building a list, they're often preferable for performance reasons, though a generator expression will usually perform just as well.
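For instance, both of these consume the strings lazily, converting one element at a time as the loop pulls values (list_of_strings here is just a stand-in):

list_of_strings = ["1", "2", "3"]

# map is lazy: nothing is converted until the loop asks for the next value
total = 0
for n in map(int, list_of_strings):
    total += n

# a generator expression behaves the same way
total = sum(int(s) for s in list_of_strings)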

Python Idiom for applying sequential steps to an iterable

When doing data processing tasks I often find myself applying a series of compositions, vectorized functions, etc. to some input iterable of data to generate a final result. Ideally I would like something that works for both lists and generators (in addition to any other iterable). I can think of a number of approaches to structuring the code to accomplish this, but every one of them feels unclean/unidiomatic to me in one way or another. Below I have outlined the different methods I can think of, but my question is: is there a recommended, idiomatic way to do this?
Here are the methods I can think of, illustrated with a simple, generally representative example:
Write it as one large expression
result = [sum(group)
          for key, group in itertools.groupby(
              filter(lambda x: x <= 2, [x**2 for x in input]),
              keyfunc=lambda x: x % 3)]
This is often quite difficult to read for any non-trivial sequence of steps. When reading through the code one also encounters each step in reverse order.
Save each step into a different variable name
squared = [x**2 for x in input]
filtered = filter(lambda x: x < 2, squared)
grouped = itertools.groupby(filtered, keyfunc=lambda x: x % 3)
result = [sum(group) for key, group in grouped]
This introduces a number of local variables that can often be hard to name descriptively; additionally, if the result of some or all of the intermediate steps is especially large keeping them around could be very wasteful of memory. If one wants to add a step to this process, care must be taken that all variable names get updated correctly—for example, if we wished to divide every number by two we would add the line halved = [x / 2.0 for x in filtered], but would also have to remember to change filtered to halved in the following line.
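Concretely, the middle of the pipeline would then read as follows (keeping the question's keyfunc= spelling, which an answer below corrects):

squared = [x**2 for x in input]
filtered = filter(lambda x: x < 2, squared)
halved = [x / 2.0 for x in filtered]  # the newly added step
grouped = itertools.groupby(halved, keyfunc=lambda x: x % 3)  # 'filtered' changed to 'halved'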
Store each step into the same variable name
tmp = [x**2 for x in input]
tmp = filter(lambda x: x < 2, tmp)
tmp = itertools.groupby(tmp, keyfunc=lambda x: x % 3)
result = [sum(group) for key, group in tmp]
I guess this seems to me the least bad of these options, but storing things in a generically named placeholder variable feels un-Pythonic to me and makes me suspect that there is some better way out there.
Code Review is often a better place for style questions; SO is more for problem solving. But CR can be picky about the completeness of the example.
Still, I can make a few observations:
if you wrap this calculation in a function, naming isn't such a big deal. The names don't have to be globally meaningful.
a number of your expressions are generators. The itertools functions tend to produce generators or generator expressions, so memory use shouldn't be much of an issue.
import itertools

def better_name(input):
    squared = (x**2 for x in input)  # generator expression
    filtered = filter(lambda x: x < 2, squared)
    grouped = itertools.groupby(filtered, lambda x: x % 3)
    result = (sum(group) for key, group in grouped)
    return result

list(better_name(input))
Using def functions instead of lambdas can also make the code clearer. There's a trade-off; your lambdas are simple enough that I'd probably keep them.
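For example, the filtering lambda could become a named helper (is_small is a made-up name; hardly worth it at this size, but it scales as predicates grow):

def is_small(x):
    return x < 2

squared = (x**2 for x in [3, 1, 0, 2])
filtered = filter(is_small, squared)
print(list(filtered))  # [1, 0]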
Your 2nd option is much more readable than the 1st. The order of the expressions guides my reading and mental evaluation. In the 1st it's hard to identify the innermost, or first, evaluation. And groupby is a complex operation, so any help in compartmentalizing the action is welcome.
Following the filter docs, these are equivalent:
filtered = filter(lambda x: x < 2, squared)
filtered = (x for x in squared if x<2)
I was missing the return. The function could return a generator as I show, or an evaluated list.
groupby's keyfunc is not a keyword argument, but rather a positional one.
groupby is a complex function. It returns a generator that produces tuples, one element of which is itself a generator. Returning something like
((key, list(group)) for key, group in grouped)
makes that structure more obvious. So a code style that clarifies its use is desirable.
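A small demonstration of that structure (note that groupby only groups consecutive equal keys, which is why inputs are normally sorted by the key function first):

from itertools import groupby

data = [1, 4, 2, 5, 8]
grouped = groupby(data, lambda x: x % 3)
print([(key, list(group)) for key, group in grouped])
# [(1, [1, 4]), (2, [2, 5, 8])]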

python input for itertools.product

Looking for a way to simulate nested loops (or a cartesian product), I came across the itertools.product function.
I need a function or piece of code that receives a list of integers as input and returns a specific generator.
example:
input = [3,2,4] -> gen = product(xrange(3),xrange(2),xrange(4))
or
input = [2,4,5,6] -> gen = product(xrange(2),xrange(4),xrange(5),xrange(6))
As the size of the list varies, I am very confused about how to do that without a lot of pre-coding based on a crazy number of ifs and the size of the list.
Also, is there a difference between calling product(range(3)) and product(xrange(3))?
import itertools

def bigproduct(*args):
    newargs = [xrange(x) for x in args]
    return itertools.product(*newargs)

for i in bigproduct(3, 2, 4):
    ...  # use each tuple here
range() generates a list up front, so it uses more time and space initially, but takes less time to get each element. xrange() generates each element on the fly, so it takes less space and initial time, but more time to return each element.
This can be easily accomplished using map:
from itertools import product

# shape is the input list of integers, e.g. [3, 2, 4]
for i in product(*map(range, shape)):
    print i
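Incidentally, in Python 3 xrange is gone and range itself is lazy, so the same idea becomes:

from itertools import product

def bigproduct(*args):
    # Python 3's range is lazy, like Python 2's xrange
    return product(*(range(x) for x in args))

for i in bigproduct(3, 2, 4):
    print(i)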
