Python list comprehension - want to avoid repeated evaluation

I have a list comprehension which approximates to:
[f(x) for x in l if f(x)]
Where l is a list and f(x) is an expensive function which returns a list.
I want to avoid evaluating f(x) twice for every non-empty occurrence of f(x). Is there some way to save its output within the list comprehension?
I could remove the final condition, generate the whole list and then prune it, but that seems wasteful.
Edit:
Two basic approaches have been suggested:
An inner generator comprehension:
[y for y in (f(x) for x in l) if y]
or memoization.
I think the inner generator comprehension is elegant for the problem as stated. In actual fact, I simplified the question to make it clear; what I really want is:
[g(x, f(x)) for x in l if f(x)]
For this more complicated situation, I think memoization produces a cleaner end result.

[y for y in (f(x) for x in l) if y]
Will do.

Starting with Python 3.8 and the introduction of assignment expressions (PEP 572) (the := operator), it's possible to use a local variable within a list comprehension to avoid calling the same function twice.
In our case, we can name the evaluation of f(x) as a variable y, using the result of the expression both to filter the list and as the mapped value:
[y for x in l if (y := f(x))]
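The same assignment expression also handles the updated problem, since the name bound in the if clause is visible in the output expression; a sketch, reusing g and f from the question:
[g(x, y) for x in l if (y := f(x))]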

A solution (the best if you have repeated values of x) would be to memoize the function f, i.e. to create a wrapper that computes the result for each new argument, saves it, and returns the saved result when the same value is asked for again.
A really simple implementation is the following:
storage = {}

def memoized(value):
    if value not in storage:
        storage[value] = f(value)
    return storage[value]

[memoized(x) for x in l if memoized(x)]
and then use this function in the list comprehension. This approach is valid under two conditions, one theoretical and one practical. The first is that the function f should be deterministic, i.e. return the same result given the same input; the other is that the object x can be used as a dictionary key. If the first does not hold, then you should recompute f each time by definition, while if the second fails, it is possible to use some slightly more robust approaches.
You can find a lot of memoization implementations around the net, and I think that newer versions of Python have something for this included in them too.
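For instance, the standard library's functools.lru_cache (available since Python 3.2) implements exactly this kind of caching; a minimal sketch, assuming f takes hashable arguments:
from functools import lru_cache

@lru_cache(maxsize=None)  # cache the result for every distinct argument
def f(x):
    ...  # expensive computation here

# now the double call is harmless: the second lookup hits the cache
[f(x) for x in l if f(x)]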
On a side note, never use a lowercase L as a variable name; it's a bad habit, as it can be confused with an i or a 1 on some terminals.
EDIT:
As commented, a possible solution using a generator comprehension (to avoid creating useless duplicate temporaries) would be this expression:
[g(x, fx) for x, fx in ((x,f(x)) for x in l) if fx]
You need to weigh your choice given the computational cost of f, the number of duplicates in the original list, and the memory at your disposal. Memoization makes a space-speed tradeoff, meaning that it keeps track of each result by saving it, so if you have huge lists it can become costly on the memory front.

You should use a memoize decorator. Here is an interesting link.
Using memoization from the link and your 'code':
def memoize(f):
    """ Memoization decorator for functions taking one or more arguments. """
    class memodict(dict):
        def __init__(self, f):
            self.f = f
        def __call__(self, *args):
            return self[args]
        def __missing__(self, key):
            ret = self[key] = self.f(*key)
            return ret
    return memodict(f)

@memoize
def f(x):
    # your code
    ...

[f(x) for x in l if f(x)]

[y for y in [f(x) for x in l] if y]
For your updated problem, this might be useful:
[g(x,y) for x in l for y in [f(x)] if y]
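A quick way to convince yourself that f runs only once per element with this trick is to count the calls; a sketch with toy stand-ins for f and g:
calls = 0

def f(x):
    global calls
    calls += 1
    return [x] if x % 2 else []  # stand-in for the expensive function

def g(x, fx):
    return (x, fx)

print([g(x, y) for x in range(6) for y in [f(x)] if y])
# [(1, [1]), (3, [3]), (5, [5])]
print(calls)  # 6 -- one call per element, never two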

Nope. There's no (clean) way to do this. There's nothing wrong with a good-old-fashioned loop:
output = []
for x in l:
result = f(x)
if result:
output.append(result)
If you find that hard to read, you can always wrap it in a function.
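For example, a sketch of such a wrapper (the name is made up):
def keep_truthy_results(f, items):
    """Apply f to each item and keep only the truthy results."""
    output = []
    for x in items:
        result = f(x)
        if result:
            output.append(result)
    return output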

As the previous answers have shown, you can use a double comprehension or use memoization. For reasonably-sized problems it's a matter of taste (and I agree that memoization looks cleaner, since it hides the optimization). But if you're examining a very large list, there's a huge difference: Memoization will store every single value you've calculated, and can quickly blow out your memory. A double comprehension with a generator (round parens, not square brackets) only stores what you want to keep.
To come to your actual problem:
[g(x, f(x)) for x in series if f(x)]
To calculate the final value you need both x and f(x). No problem, pass them both like this:
[g(x, y) for (x, y) in ( (x, f(x)) for x in series ) if y ]
Again: this should be using a generator (round parens), not a list comprehension (square brackets). Otherwise you will build the whole list before you start filtering the results. This is the list comprehension version:
[g(x, y) for (x, y) in [ (x, f(x)) for x in series ] if y ] # DO NOT USE THIS

There have been a lot of answers regarding memoizing. The Python 3 standard library now has lru_cache, which is a Least Recently Used cache. So you can:
from functools import lru_cache

@lru_cache()
def f(x):
    # function body here
    ...
This way your function will only be called once for each distinct value. You can also specify the size of the lru_cache; by default this is 128. The problem with the memoize decorators shown above is that the size of the cache can grow well out of hand.
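For example, a bounded cache keeps memory in check (a sketch; the maxsize value here is arbitrary):
from functools import lru_cache

@lru_cache(maxsize=1024)  # least recently used entries are evicted beyond 1024
def f(x):
    ...  # function body here

# f.cache_info() reports hits, misses, and the current cache size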

You can use memoization. It is a technique that avoids doing the same computation twice, by saving the result for each calculated value.
I saw that there is already an answer that uses memoization, but I would like to propose a generic implementation, using Python decorators:
def memoize(func):
    def wrapper(*args):
        if args in wrapper.d:
            return wrapper.d[args]
        ret_val = func(*args)
        wrapper.d[args] = ret_val
        return ret_val
    wrapper.d = {}
    return wrapper

@memoize
def f(x):
    ...
Now f is a memoized version of itself.
With this implementation you can memoize any function using the @memoize decorator.
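A quick sanity check that the decorator behaves as intended; a sketch with a toy f and a call counter:
calls = 0

@memoize
def f(x):
    global calls
    calls += 1
    return x * x

print([f(x) for x in [1, 2, 1, 3] if f(x)])  # [1, 4, 1, 9]
print(calls)  # 3 -- one real evaluation per distinct argument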

Use map() !!
comp = [x for x in map(f, l) if x]
f is the function f(x), l is the list.
map() will return the result of f(x) for each x in the list.

Here is my solution:
filter(None, [f(x) for x in l])
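Note that on Python 3, filter returns a lazy iterator rather than a list, so wrap it if you need a list; using a generator inside also avoids building the intermediate list:
list(filter(None, (f(x) for x in l)))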

How about defining:
def truths(L):
    """Return the elements of L that test true"""
    return [x for x in L if x]
So that, for example
>>> [wife.children for wife in henry8.wives]
[[Mary1], [Elizabeth1], [Edward6], [], [], []]
>>> truths(wife.children for wife in henry8.wives)
[[Mary1], [Elizabeth1], [Edward6]]

Related

Powerset recursive, list comprehension python3

I'm new to Python 3 and am trying to write a recursive powerset function. It should use a list comprehension.
I wrote:
def powerset(seq):
    if not seq:
        return [[]]
    return powerset(seq[1:]) + [[seq[0]] + n for n in powerset(seq[1:])]
This function works, but I got feedback that it was unnecessary to call the function twice; it did too much computing. It should easily be able to handle sequences of up to 20 elements. So what should I do? I can't get it to work without calling the function twice. Thanks.
Just calculate powerset(seq[1:]) once, store it in a variable, and use it twice:
def powerset(seq):
    if not seq:
        return [[]]
    ps = powerset(seq[1:])
    return ps + [[seq[0]] + n for n in ps]
The difference from yours is that this way you use ps twice, but you compute it just once.
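For example:
>>> powerset([1, 2, 3])
[[], [3], [2], [2, 3], [1], [1, 3], [1, 2], [1, 2, 3]]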
Alternatively, you could use a double list-comprehension (if you like that sort of thing...)
def powerset(seq):
    return [x for ps in powerset(seq[1:]) for x in ([seq[0]] + ps, ps)] if seq else [[]]
Here, the same temporary variable ps is defined inside the list comprehension. Note, however, that the results will be in a slightly different order this way.
I'm still unclear. I actually don't understand how just assigning it to a variable can change anything? Doesn't it mean the same thing?
You seem to think too much in terms of pure math here. In programming, y = f(x) does not mean "y is the same as/synonymous for f(x)", but "assign the result of f(x) to y".

Python generator conflicting with list comprehension

I've been messing around in Python with generator functions. I want to write a function that takes a generator whose values are tuples, and returns a list of generators, where each generator's values correspond to one index in the original tuples.
Currently, I have a function which accomplishes this for a hardcoded number of elements in the tuple. Here is my code:
import itertools

def tee_pieces(generator):
    copies = itertools.tee(generator)
    dropped_copies = [(x[0] for x in copies[0]), (x[1] for x in copies[1])]
    # dropped_copies = [(x[i] for x in copies[i]) for i in range(2)]
    return dropped_copies

def gen_words():
    for i in "Hello, my name is Fred!".split():
        yield i

def split_words(words):
    for word in words:
        yield (word[:len(word)//2], word[len(word)//2:])

def print_words(words):
    for word in words:
        print(word)

init_words = gen_words()
right_left_words = split_words(init_words)
left_words, right_words = tee_pieces(right_left_words)
print("Left halves:")
print_words(left_words)
print("Right halves:")
print_words(right_words)
This correctly splits the generator, leading to left_words containing the left halves and right_words containing the right halves.
The problem comes when I try to parameterize the number of generators to be created, using the commented-out line above. As far as I know it should be equivalent, but when I use that line instead, both left_words and right_words end up containing the right half of each word, giving an output like this:
Left halves:
lo,
y
me
s
ed!
Right halves:
lo,
y
me
s
ed!
Why is this happening? How can I accomplish the desired result, namely parameterize the number of pieces to split the generator into?
This has to do with Python's lexical scoping rules. The classical "surprising" example for demonstrating it:
funcs = [ lambda: i for i in range(3) ]
print(funcs[0]())
=> 2 #??
print(funcs[1]())
=> 2 #??
print(funcs[2]())
=> 2
Your example is another result of the same rules.
To fix, you can "break" the scoping with an additional function:
def make_gen(i):
    return (x[i] for x in copies[i])

dropped_copies = [make_gen(i) for i in range(2)]
This binds the value of i to the specific value passed in a specific call to make_gen, which achieves the desired behavior. Without it, it is bound to "the current value of the variable named i", which ends up as the same value for all the generators you create (as there's only one variable named i).
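For the lambda example above, another common fix is a default argument, which is evaluated once at definition time rather than looked up when the lambda is called:
funcs = [lambda i=i: i for i in range(3)]
print(funcs[0]())
# => 0, as expected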
To add to shx2's answer, you could also substitute the additional function with a lambda:
dropped_copies = [(lambda j: (x[j] for x in copies[j]))(i) for i in range(2)]
This too creates a new scope when the lambda gets called, as is abundantly clear by the different variable name. It would however also work with using the same name, since the parameter inside the lambda shadows the one inside the generator:
dropped_copies = [(lambda i: (x[i] for x in copies[i]))(i) for i in range(2)]
This sort of scoping seems very confusing but becomes more intuitive if you rewrite the generator as a for loop:
dropped_copies = []
for i in range(2):
    dropped_copies.append((x[i] for x in copies[i]))
Note that this is broken in the same way the original list comprehension version is.
This is because dropped_copies is a pair of iterators, and when the iterators are evaluated, i has already been incremented to 1.
Try using a list comprehension instead, and you can see the difference:
dropped_copies = [[x[i] for x in copies[i]] for i in range(2)]

Three lines to find the greatest product in a string of numbers in Python

Full disclosure: this is for an assignment. Simply getting working code is enough, but doing this in three lines gets me extra credit.
I'm trying to take a 1000-digit string and find the largest product of 5 consecutive digits. You may recognize this as Project Euler's Problem #8.
I've tried a lot of options, but I seem to be stuck. I'm working on figuring out if I can make a lambda statement that will work, but I have no experience with lambda so it's evading me.
Here's what I have so far:
for i in range(1, 996):
    max = int(number[i+0]) * int(number[i+1]) * int(number[i+2]) * int(number[i+3]) * int(number[i+4]) if max < int(number[i+0]) * int(number[i+1]) * int(number[i+2]) * int(number[i+3]) * int(number[i+4]) else max = max
return max
That doesn't work and triggers SyntaxError: can't assign to conditional expression.
I don't want outright code, or at least not a complete function, but just a little help understanding how I can move forward.
This isn't legal python:
x = y if z else x = w
This is:
x = y if z else w
So is this:
if z: x = y
By the way, there is a one-line solution that is much shorter and clearer than your three.
= appears twice in your (very long) line. Effectively you have this:
max = something if something else max = max
which Python parses as:
max = (something if something else max) = max
And, indeed, you can't assign to a conditional expression, which is that whole thing in the middle.
You probably didn't intend to have the final = max at the end.
In [15]: def myinput(l, n):
    ...:     for x in l:
    ...:         yield l[x:x+n]
    ...:
In [16]: max([reduce(lambda a, b: a*b, x) for x in myinput(range(1000), 5) if len(x) == 5])
Out[16]: 985084775273880L
Like recursive mentioned, there is a simple one-liner solution. It involves using the max function - always bad to name variables after builtins!
In Python 2 it looks something like this:
max(reduce(lambda x, y: x*y, map(int, num[i:i+5])) for i in xrange(996))
In Python 3, reduce was moved into functools, so you have to import it from there:
from functools import reduce
max(reduce(lambda x, y: x*y, map(int, num[i:i+5])) for i in range(996))
Look into:
the built-in max function to find the greatest number in a sequence,
the built-in map function to apply a function to all elements in a list,
the built-in reduce function to obtain a single object as a result of applying a function that returns a single object repeatedly to two elements in a list,
lambda definitions to be able to define function objects that you can pass to map() and reduce(),
and list comprehensions (and generators, which are very similar) to compose the above functions in a one-liner.

Is it possible to add a where clause with list comprehension?

Consider the following list comprehension
[ (x,f(x)) for x in iterable if f(x) ]
This filters the iterable based on a condition f and returns the pairs (x, f(x)). The problem with this approach is that f(x) is calculated twice.
It would be great if we could write like
[ (x,fx) for x in iterable if fx where fx = f(x) ]
or
[ (x,fx) for x in iterable if fx with f(x) as fx ]
But in Python we have to write nested comprehensions to avoid the duplicate call to f(x), and that makes the comprehension less clear:
[ (x, fx) for x, fx in ( (y, f(y)) for y in iterable ) if fx ]
Is there any other way to make it more pythonic and readable?
Update
Coming soon in Python 3.8! PEP 572
# Share a subexpression between a comprehension filter clause and its output
filtered_data = [y for x in data if (y := f(x)) is not None]
There is no where statement but you can "emulate" it using for:
a = [0]

def f(x):
    a[0] += 1
    return 2*x

print [(x, y) for x in range(5) for y in [f(x)] if y != 2]
print "The function was executed %s times" % a[0]
Execution:
$ python 2.py
[(0, 0), (2, 4), (3, 6), (4, 8)]
The function was executed 5 times
As you can see, the function is executed 5 times, not 10 or 9.
This for construction:
for y in [f(x)]
imitates a where clause.
You seek to have let-statement semantics in Python list comprehensions, whose scope is available to both the map part (the ___ for .. in) and the filter part (the if ___) of the comprehension, and whose value depends on the for ___ in ... binding.
Your solution, modified:
Your (as you admit unreadable) solution of [ (x, fx) for x, fx in ( (y, f(y)) for y in iterable ) if fx ] is the most straightforward way to write the optimization.
Main idea: lift x into the tuple (x,f(x)).
Some would argue the most "pythonic" way to do things would be the original [(x,f(x)) for x in iterable if f(x)] and accept the inefficiencies.
You can however factor out the ((y,fy) for y in iterable) into a function, if you plan to do this a lot. This is bad because if you ever wish to have access to more variables than x,fx (e.g. x,fx,ffx), then you will need to rewrite all your list comprehensions. Therefore this isn't a great solution unless you know for sure you only need x,fx and plan to reuse this pattern.
Generator expression:
Main idea: use a more complicated alternative to generator expressions: one where python will let you write multiple lines.
You could just use a generator expression, which python plays nicely with:
def xfx(iterable):
    for x in iterable:
        fx = f(x)
        if fx:
            yield (x, fx)

xfx(exampleIterable)
This is how I would personally do it.
Memoization/caching:
Main idea: You could also use (abuse?) side-effects and make f have a global memoization cache, so you don't repeat operations.
This can have a bit of overhead, and requires a policy of how large the cache should be and when it should be garbage-collected. Thus this should only be used if you'd have other uses for memoizing f, or if f is very expensive. But it would let you write...
[ (x,f(x)) for x in iterable if f(x) ]
...like you originally wanted without the performance hit of doing the expensive operations in f twice, even if you technically call it twice. You can add a @memoized decorator to f: example (without maximum cache size). This will work as long as x is hashable (e.g. a number, a tuple, a frozenset, etc.).
Dummy values:
Main idea: capture fx=f(x) in a closure and modify the behavior of the list comprehension.
filterTrue(
    (lambda fx=f(x): (x, fx) if fx else None)() for x in iterable
)
where filterTrue(iterable) is filter(None, iterable). You would have to modify this if your element type (a 2-tuple here) was actually capable of being None.
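For reference, a minimal filterTrue matching that description:
def filterTrue(iterable):
    return filter(None, iterable)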
Nothing says you must use comprehensions. In fact most style guides I've seen request that you limit them to simple constructs, anyway.
You could use a generator expression, instead.
def fun(iterable):
    for x in iterable:
        y = f(x)
        if y:
            yield x, y

print(list(fun(iterable)))
Map and zip?
fnRes = map(f, iterable)
[(x, fx) for x, fx in zip(iterable, fnRes) if fx]

Expanding elements in a list

I'm looking for a "nice" way to process a list where some elements need to be expanded into more elements (only once, no expansion on the results).
Standard iterative way would be to do:
i = 0
while i < len(l):
    if needs_expanding(l[i]):
        new_is = expand(l[i])
        l[i:i+1] = new_is  # replace the element with its expansion
        i += len(new_is)
    else:
        i += 1
which is pretty ugly. I could rewrite the contents into a new list with:
nl = []
for x in l:
    if needs_expanding(x):
        nl += expand(x)
    else:
        nl.append(x)
But they both seem too long. Or I could simply do 2 passes and flatten the list later:
flatten(expand(x) if needs_expanding(x) else x for x in l)
# or
def try_expanding(x)....
flatten(try_expanding(x) for x in l)
but this doesn't feel "right" either.
Are there any other clear ways of doing this?
Your last two examples are what I would do. I'm not familiar with flatten() though, but if you have such a function then that looks ideal. You can also use the built-in sum():
sum((expand(x) if needs_expanding(x) else [x] for x in l), [])
sum((needs_expanding(x) and expand(x) or [x] for x in l), [])
If you do not need random access in the list you are generating, you could also write a generator.
def iter_new_list(old_list):
    for x in old_list:
        if needs_expanding(x):
            for y in expand(x):
                yield y
        else:
            yield x

new_list = list(iter_new_list(old_list))
This is functionally equivalent to your second example, but it might be more readable in your real-world situation.
Also, Python coding standards forbid the use of lowercase-L as a variable name, as it is nearly indistinguishable from the numeral one.
The last one is probably your most Pythonic, but you could try an implied loop (or, in Python 3, a generator) with map:
flatten(map(lambda x: expand(x) if needs_expanding(x) else x, l))
flatten(map(try_expanding, l))
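Note that flatten is not defined anywhere in this thread; a plausible sketch of its assumed behavior here (splice list elements in one level, pass other items through):
from itertools import chain

def flatten(iterable):
    """Splice in list elements one level deep; pass other items through."""
    return list(chain.from_iterable(
        x if isinstance(x, list) else [x] for x in iterable
    ))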
