I'm looking for a "nice" way to process a list where some elements need to be expanded into more elements (only once, no expansion on the results).
Standard iterative way would be to do:
i = 0
while i < len(l):
    if needs_expanding(l[i]):
        new_is = expand(l[i])
        l[i:i+1] = new_is   # replace the original element with its expansion
        i += len(new_is)
    else:
        i += 1
which is pretty ugly. I could rewrite the contents into a new list with:
nl = []
for x in l:
    if needs_expanding(x):
        nl += expand(x)
    else:
        nl.append(x)
But they both seem too long. Or I could simply do 2 passes and flatten the list later:
flatten(expand(x) if needs_expanding(x) else x for x in l)
# or
def try_expanding(x): ...
flatten(try_expanding(x) for x in l)
but this doesn't feel "right" either.
Are there any other clear ways of doing this?
Your last two examples are what I would do. I'm not familiar with flatten(), but if you have such a function then that looks ideal. You can also use the built-in sum():
sum((expand(x) if needs_expanding(x) else [x] for x in l), [])
sum((needs_expanding(x) and expand(x) or [x] for x in l), [])
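If you don't have a flatten() lying around, one level of flattening is all that's needed here, and itertools.chain.from_iterable can stand in for it. A minimal sketch (note it expects the non-expanding items wrapped in a one-element list, as in the sum() versions above):

from itertools import chain

def flatten(iterable_of_lists):
    # one-level flatten; enough here, since expansion happens only once
    return list(chain.from_iterable(iterable_of_lists))

nl = flatten(expand(x) if needs_expanding(x) else [x] for x in l)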
If you do not need random access in the list you are generating, you could also write a generator.
def iter_new_list(old_list):
    for x in old_list:
        if needs_expanding(x):
            for y in expand(x):
                yield y
        else:
            yield x

new_list = list(iter_new_list(old_list))
This is functionally equivalent to your second example, but it might be more readable in your real-world situation.
Also, the Python style guide (PEP 8) says never to use lowercase L as a variable name, as it is nearly indistinguishable from the numeral one.
The last one is probably the most Pythonic, but you could also try an implied loop with map() (which in Python 3 returns a lazy iterator):
flatten(map(lambda x: expand(x) if needs_expanding(x) else x, l))
flatten(map(try_expanding, l))
I'm kinda new to programming and Python and I'm self-learning before going to uni, so please be gentle, I'm a newbie. I hope my English won't have too many grammatical errors.
Basically, I had this exercise in a book I'm currently reading: take a list of tuples as a function parameter, then raise every item in each tuple to the 2nd power and sum them up.
My code looks like this and works fine as long as every tuple has exactly the number of values the for loop unpacks:
def summary(xs):
    for x, y, z in xs:
        print(x*x + y*y + z*z)

xs = [(2,3,4), (2,-3,4), (1,2,3)]
summary(xs)
However, if one of the tuples has fewer values than the unpacking expects (for example, an empty tuple), I get an error: ValueError: not enough values to unpack (expected 3, got 0):
xs =[(2,3,4), (), (1,2,3)]
I would like to know how to make the function accept the empty tuple () shown above and return 0 for it. I have been trying to solve this for 2 days and googling as well, but it seems I'm either missing something or not aware of a function I could use. Thank you all for the help.
One way is to iterate over the tuple values, this would also be the way to tackle this problem in nearly every programming language:
def summary(xs):
    for item in xs:
        s = 0
        for value in item:
            s += value**2
        print(s)
Or using a list comprehension:
def summary(xs):
    for item in xs:
        result = sum([x**2 for x in item])
        print(result)
Also note that sum([]) returns 0 for an empty iterable.
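For example, with the question's input, either version prints one total per tuple, and the empty tuple yields 0:

xs = [(2,3,4), (), (1,2,3)]
summary(xs)
# prints:
# 29
# 0
# 14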
Well, the issue is that your inner tuple doesn't have enough values to unpack into three variables. The simplest way around it is to unpack manually after checking that you have enough values, i.e.:
def summary(xs):
    for values in xs:
        if values and len(values) == 3:
            x, y, z = values  # or don't unpack; refer to them by index, i.e. values[0], values[1], ...
            print(x*x + y*y + z*z)
        else:
            print(0)
Or use a try..except block:
def summary(xs):
    for values in xs:
        try:
            x, y, z = values  # or don't unpack; refer to them by index, i.e. values[0], values[1], ...
            print(x*x + y*y + z*z)
        except ValueError:  # catch IndexError instead if not unpacking
            print(0)
One way is to use try/except. In the example below, we use a generator and yield 0 whenever a ValueError is raised.
While you are learning, I highly recommend you practice writing functions which return or yield rather than using them to print values.
def summary(xs):
    for item in xs:
        try:
            yield sum(i**2 for i in item)
        except ValueError:
            yield 0

xs = [(2,3,4), (), (1,2,3)]
res = list(summary(xs))
print(res)
[29, 0, 14]
Or to actually utilise the generator in a lazy fashion:
for i in summary(xs):
    print(i)
29
0
14
You should use the "len > 0" condition. This code should work for any list or tuple length:
def summary(xs):
    for tup in xs:
        prod = [a*a for a in tup if len(tup) > 0]
        print(sum(prod))
Note that I defined a "prod" list in order to use "sum" so that it is not calculated the hard way. It replaces your "x*x + y*y + z*z" and works for any tuple length.
It often pays to separate your algorithm into functions that each do just one thing: in this case, a function to sum the squares of a list of values and a function to print the results. It is also very helpful to keep your variable names meaningful. Here your xs is a list of tuples, so it might be better named xss.
def sum_of_squares(xs):
    return sum(x*x for x in xs)   # note: there is no math.sqr, so square directly

def summary(xss):
    for xs in xss:
        print(sum_of_squares(xs))

xss = [(2,3,4), (), (1,2,3)]
summary(xss)
or
list(map(print, map(sum_of_squares, xss)))
I have a list comprehension which approximates to:
[f(x) for x in l if f(x)]
Where l is a list and f(x) is an expensive function which returns a list.
I want to avoid evaluating f(x) twice for every non-empty occurrence of f(x). Is there some way to save its output within the list comprehension?
I could remove the final condition, generate the whole list and then prune it, but that seems wasteful.
Edit:
Two basic approaches have been suggested:
An inner generator comprehension:
[y for y in (f(x) for x in l) if y]
or memoization.
I think the inner generator comprehension is elegant for the problem as stated. In actual fact I simplified the question to make it clear; what I really want is:
[g(x, f(x)) for x in l if f(x)]
For this more complicated situation, I think memoization produces a cleaner end result.
[y for y in (f(x) for x in l) if y]
Will do.
Starting with Python 3.8 and the introduction of assignment expressions (PEP 572, the := operator), it's possible to use a local variable within a list comprehension to avoid calling the same function twice.
In our case, we can name the evaluation of f(x) as a variable y while using the result of the expression both to filter the list and as the mapped value:
[y for x in l if (y := f(x))]
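The same trick covers the fuller g(x, f(x)) case from the edit, since the name bound in the if clause is visible in the output expression:

[g(x, y) for x in l if (y := f(x))]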
A solution (the best if you have repeated values of x) would be to memoize the function f, i.e. to create a wrapper function that saves the argument the function is called with together with its result, and returns the saved result when the same value is asked for again.
A really simple implementation is the following:
storage = {}

def memoized(value):
    if value not in storage:
        storage[value] = f(value)
    return storage[value]

[memoized(x) for x in l if memoized(x)]
and then use this function in the list comprehension. This approach is valid under two conditions, one theoretical and one practical. The first is that the function f should be deterministic, i.e. return the same result given the same input; the second is that the object x can be used as a dictionary key. If the first doesn't hold, then you should recompute f each time by definition, while if the second fails there are slightly more robust approaches available.
You can find a lot of implementations of memoization around the net, and newer versions of Python include one in the standard library too (functools.lru_cache).
On a side note, never use a lowercase L as a variable name; it's a bad habit, as it can be confused with an i or a 1 in some fonts.
EDIT:
As commented, a possible solution using a generator comprehension (to avoid creating useless duplicate temporaries) would be this expression:
[g(x, fx) for x, fx in ((x,f(x)) for x in l) if fx]
You need to weigh your choice against the computational cost of f, the number of duplicates in the original list and the memory at your disposal. Memoization makes a space-speed tradeoff, meaning that it keeps track of each result by saving it, so if you have huge lists it can become costly in terms of memory.
You should use a memoize decorator. Here is an interesting link.
Using memoization from the link and your 'code':
def memoize(f):
    """ Memoization decorator for functions taking one or more arguments. """
    class memodict(dict):
        def __init__(self, f):
            self.f = f
        def __call__(self, *args):
            return self[args]
        def __missing__(self, key):
            ret = self[key] = self.f(*key)
            return ret
    return memodict(f)

@memoize
def f(x):
    # your code
[f(x) for x in l if f(x)]
[y for y in [f(x) for x in l] if y]
For your updated problem, this might be useful:
[g(x,y) for x in l for y in [f(x)] if y]
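Spelled out as ordinary loops, the single-element inner loop is just a way of binding a name inside the comprehension; a rough equivalent:

result = []
for x in l:
    for y in [f(x)]:  # binds y = f(x) exactly once per x
        if y:
            result.append(g(x, y))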
Nope. There's no (clean) way to do this. There's nothing wrong with a good-old-fashioned loop:
output = []
for x in l:
    result = f(x)
    if result:
        output.append(result)
If you find that hard to read, you can always wrap it in a function.
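The same loop adapts directly to the updated g(x, f(x)) version of the problem; a sketch:

output = []
for x in l:
    result = f(x)
    if result:
        output.append(g(x, result))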
As the previous answers have shown, you can use a double comprehension or use memoization. For reasonably-sized problems it's a matter of taste (and I agree that memoization looks cleaner, since it hides the optimization). But if you're examining a very large list, there's a huge difference: Memoization will store every single value you've calculated, and can quickly blow out your memory. A double comprehension with a generator (round parens, not square brackets) only stores what you want to keep.
To come to your actual problem:
[g(x, f(x)) for x in series if f(x)]
To calculate the final value you need both x and f(x). No problem, pass them both like this:
[g(x, y) for (x, y) in ( (x, f(x)) for x in series ) if y ]
Again: this should be using a generator (round parens), not a list comprehension (square brackets). Otherwise you will build the whole list before you start filtering the results. This is the list comprehension version:
[g(x, y) for (x, y) in [ (x, f(x)) for x in series ] if y ] # DO NOT USE THIS
There have been a lot of answers regarding memoizing. The Python 3 standard library now has lru_cache in functools, which is a Least Recently Used cache. So you can:
from functools import lru_cache

@lru_cache()
def f(x):
    # function body here
This way repeated calls with the same argument only run the function body once. You can also specify the size of the lru_cache; by default this is 128. The problem with the memoize decorators shown above is that the cache can grow well out of hand.
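A self-contained usage sketch (the list-returning f below is just a toy stand-in for the expensive function): the comprehension still calls f twice per element, but the second call is a cache hit, so the body runs only once per distinct argument.

from functools import lru_cache

@lru_cache(maxsize=None)           # unbounded cache, just for the sketch
def f(x):
    print("computing", x)          # shows how often the body actually runs
    return [x] if x % 2 else []    # toy stand-in for an expensive list-returning f

l = [1, 2, 3, 3]
result = [f(x) for x in l if f(x)]
print(result)          # [[1], [3], [3]]
print(f.cache_info())  # the second call per element (and the repeated 3) are hits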
You can use memoization. It is a technique which is used in order to avoid doing the same computation twice by saving somewhere the result for each calculated value.
I saw that there is already an answer that uses memoization, but I would like to propose a generic implementation, using python decorators:
def memoize(func):
    def wrapper(*args):
        if args in wrapper.d:
            return wrapper.d[args]
        ret_val = func(*args)
        wrapper.d[args] = ret_val
        return ret_val
    wrapper.d = {}
    return wrapper

@memoize
def f(x):
    ...
Now f is a memoized version of itself.
With this implementation you can memoize any function using the @memoize decorator.
Use map() !!
comp = [x for x in map(f, l) if x]
Here f is the function and l is the list; map() returns f(x) for each x in the list, and the comprehension keeps only the truthy results.
Here is my solution:
filter(None, [f(x) for x in l])
How about defining:
def truths(L):
    """Return the elements of L that test true"""
    return [x for x in L if x]
So that, for example
> [wife.children for wife in henry8.wives]
[[Mary1], [Elizabeth1], [Edward6], [], [], []]
> truths(wife.children for wife in henry8.wives)
[[Mary1], [Elizabeth1], [Edward6]]
for x in [temp for temp in xlist if temp<=xmax]:
This code works, but looks like an unnecessarily foreign stuttering way of starting a for loop.
Is there a cleaner syntax?
What are you trying to do here?
for x in xlist:
    if x > xmax:
        continue

will work. (What does the rest of your for loop do?) If it can be accomplished using only a list comprehension, that may be the way to go. If it can't, then you probably want the idiom above, or some variant that you'll see in the other answers posted here.
for x in xlist:
    if x <= xmax:
        # do stuff
As an aside, if possible you'd want to use a generator expression in your original version, as that won't pre-create an unnecessary list.
for x in (temp for temp in xlist if temp <= xmax):
    # etc.
for x in filter(lambda x: x <= xmax, xlist):
    pass

# or with itertools on Python 2 (itertools.ifilter does not exist on Python 3,
# where filter() is already lazy):
import itertools
for x in itertools.ifilter(lambda x: x <= xmax, xlist):
    pass
But also look at @mgilson's answer, which suggests that you may be able to rewrite the whole code as a list comprehension.
Why not filter?
filter(function, iterable)
In Python 2, filter(function, iterable) is equivalent to [item for item in iterable if function(item)]; in Python 3 it returns an equivalent lazy iterator instead.
In your case:
filter(lambda item: item <= xmax, xlist)
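Since Python 3's filter() is lazy, wrap it in list() if you actually need a list. A quick self-contained example with sample values standing in for xlist and xmax:

xlist, xmax = [1, 5, 12, 7, 20], 10
result = list(filter(lambda item: item <= xmax, xlist))
print(result)   # [1, 5, 7]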
I recently started learning Python, and the concept of for loops is still a little confusing for me. I understand that it generally follows the format for x in y, where y is just some list.
The for-each loop for (int n: someArray)
becomes for n in someArray,
And the for loop for (i = 0; i < 9; i += 2) can be represented by for i in range(0, 9, 2).
Suppose instead of a constant increment, I wanted i*=2, or even i*=i. Is this possible, or would I have to use a while loop instead?
As you say, a for loop iterates through the elements of a list. The list can contain anything you like, so you can construct a list beforehand that contains each step.
A for loop can also iterate over a "generator", which is a small piece of code that produces values instead of an actual list. In Python 3, range() is such a lazy object (in Python 2, range() returned a list while xrange() was the lazy version).
For example:
def doubler(x):
    while True:
        yield x
        x *= 2

for i in doubler(1):
    print(i)
The above for loop will print
1
2
4
8
and so on, until you press Ctrl+C.
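If you'd rather have it stop at a bound instead of interrupting it, itertools.takewhile can cap such a generator. A small sketch (doubler redefined here so the snippet is self-contained):

from itertools import takewhile

def doubler(x):
    while True:
        yield x
        x *= 2

for i in takewhile(lambda v: v < 100, doubler(1)):
    print(i)   # 1 2 4 8 16 32 64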
You can use a generator expression to do this efficiently and with little excess code:
for i in (2**x for x in range(10)):  # In Python 2.x, use xrange().
    ...
Generator expressions work just like defining a manual generator (as in Greg Hewgill's answer), with a syntax similar to a list comprehension. They are evaluated lazily, meaning that they don't build the whole list at the start of the operation, which can give much better performance on large iterables.
So this generator works by waiting until it is asked for a value, asking range(10) for a value, raising 2 to that power, and passing the result back to the for loop. It does this repeatedly until the range() iterator yields no more values.
Bear in mind that the "list" part of the for statement can be any iterable.
Examples:
A string:
for c in 'abcdefg':
    # deal with the string on a character-by-character basis
A file:
with open('somefile', 'r') as f:
    for line in f:
        # deal with the file line by line
A dictionary:
d = {1: 'one', 2: 'two', 3: 'three'}
for key, value in d.items():
    # deal with the key:value pairs from a dict
A slice of a list:
l = range(100)
for e in l[10:20:2]:
    # every other element between indices 10 and 20 in l
etc etc etc etc
So it really is a lot deeper than 'just some list'
As others have stated, just set the iterable to be what you want it to be for your example questions:
for e in (i*i for i in range(10)):
    # the squares of the sequence 0-9

l = [1, 5, 10, 15]
for i in (i*2 for i in l):
    # each element of the list l times 2
You will want to use list comprehensions for this
print([x**2 for x in range(10)])  # x to the 2nd power
and
print([x**x for x in range(10)])  # x to the xth power
The list comprehension syntax is as follows:
[EXPRESSION for VARIABLE in ITERABLE if CONDITION]
Under the hood, it acts similarly to the map and filter functions (the condition is applied to the variable first, then the expression is mapped over the items that pass):
def f(VARIABLE): return EXPRESSION
def c(VARIABLE): return CONDITION

map(f, filter(c, ITERABLE))
Example given:
def square(x): return x**2
print(list(map(square, range(10))))
and
def hypercube(x): return x**x
print(list(map(hypercube, range(10))))
Which can be used as alternative approach if you don't like list comprehensions.
You could as well use a for loop, but that would step away from being Python idiomatic...
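For the non-constant steps the question asks about (i *= 2 and the like), a plain while loop is also perfectly acceptable; a minimal sketch:

i = 1
while i < 9:
    print(i)   # prints 1 2 4 8
    i *= 2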
Just for an alternative, how about generalizing the iterate/increment operation to a lambda function so you can do something like this:
for i in seq(1, 9, lambda x: x*2):
    print i
...
1
2
4
8
Where seq is defined below:
#!/bin/python
from timeit import timeit

def seq(a, b, f):
    x = a
    while x < b:
        yield x
        x = f(x)

def testSeq():
    l = tuple(seq(1, 100000000, lambda x: x*2))
    #print l

def testGen():
    l = tuple((2**x for x in range(27)))
    #print l

testSeq()
testGen()

print "seq", timeit('testSeq()', 'from __main__ import testSeq', number=1000000)
print "gen", timeit('testGen()', 'from __main__ import testGen', number=1000000)
The difference in performance isn't that much:
seq 7.98655080795
gen 6.19856786728
[EDIT]
To support reverse iteration and with a default argument...
def seq(a, b, f=None):
    x = a
    if b > a:
        if f is None:
            f = lambda x: x+1
        while x < b:
            yield x
            x = f(x)
    else:
        if f is None:
            f = lambda x: x-1
        while x > b:
            yield x
            x = f(x)
for i in seq(8, 0, lambda x: x/2):
    print i
Note: this behaves differently from range/xrange, where the direction of the </> test is chosen by the sign of the step rather than by the relative order of the start and end values.
(This is a professional best-practice/pattern interest, not a homework request.)
INPUT: any unordered sequence or generator of items, plus a function myfilter(item) that returns True if the filter condition is fulfilled.
OUTPUT: a (filter_true, filter_false) tuple of sequences of the original type, containing the elements partitioned according to the filter, in the original sequence order.
How would you express this without filtering twice, or should I just use double filtering? Maybe filter and a loop/generator/list comprehension with next() could be the answer?
Should I drop the requirement of keeping the type, or change the requirement to return a tuple of tuples/generators? I cannot easily return a generator for generator input, or can I? (The requirements are self-made.)
Here is a test of the best candidate at the moment, offering two streams instead of a tuple:
import itertools as it
from sympy.ntheory import isprime as myfilter

mylist = xrange(1000001, 1010000, 2)
left, right = it.tee((myfilter(x), x) for x in mylist)
filter_true = (x for p, x in left if p)
filter_false = (x for p, x in right if not p)

print 'Hundred primes and non-primes odd numbers'
print '\n'.join(" Prime %i, not prime %i" %
                (next(filter_true), next(filter_false))
                for i in range(100))
Here is a way to do it which only calls myfilter once for each item and will also work if mylist is a generator
import itertools as it

left, right = it.tee((myfilter(x), x) for x in mylist)
filter_true = (x for p, x in left if p)
filter_false = (x for p, x in right if not p)
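A small self-contained demo of the same pattern, with a toy predicate standing in for myfilter:

import itertools as it

mylist = [1, 2, 3, 4, 5, 6]
myfilter = lambda x: x % 2 == 0        # toy stand-in predicate

left, right = it.tee((myfilter(x), x) for x in mylist)
filter_true = (x for p, x in left if p)
filter_false = (x for p, x in right if not p)

print(list(filter_true))    # [2, 4, 6]
print(list(filter_false))   # [1, 3, 5]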
Let's suppose that your problem is not memory but CPU: myfilter is heavy and you don't want to iterate over and filter the original dataset twice. Here are some single-pass ideas:
The simple and versatile version (memory-hungry):
filter_true = []
filter_false = []
for item in items:
    if myfilter(item):
        filter_true.append(item)
    else:
        filter_false.append(item)
The memory-friendly version (doesn't work with generators unless used with list(items)):
while items:
    item = items.pop()
    if myfilter(item):
        filter_true.append(item)
    else:
        filter_false.append(item)
The generator friendly version :
while True:
    try:
        item = next(items)
        if myfilter(item):
            filter_true.append(item)
        else:
            filter_false.append(item)
    except StopIteration:
        break
The easy way (but less efficient) is to tee the iterable and filter both of them:
import itertools

left, right = itertools.tee(mylist)
filter_true = (x for x in left if myfilter(x))
filter_false = (x for x in right if not myfilter(x))
This is less efficient than the optimal solution, because myfilter will be called repeatedly for each element. That is, if you have tested an element in left, you shouldn't have to re-test it in right because you already know the answer. If you require this optimisation, it shouldn't be hard to implement: have a look at the implementation of tee for clues. You'll need a deque for each returned iterable which you stock with the elements of the original sequence that should go in it but haven't been asked for yet.
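A sketch of that deque-based idea (a hypothetical helper, not a standard-library function), which calls the predicate only once per element while both outputs stay lazy:

from collections import deque

def partition(pred, iterable):
    """Lazily split iterable into (true_items, false_items), calling pred once per item."""
    it = iter(iterable)
    true_q, false_q = deque(), deque()

    def side(my_q, other_q, want):
        while True:
            if my_q:                      # something already classified for this side
                yield my_q.popleft()
            else:
                try:
                    item = next(it)
                except StopIteration:
                    return
                if bool(pred(item)) == want:
                    yield item
                else:
                    other_q.append(item)  # stash it for the other side

    return side(true_q, false_q, True), side(false_q, true_q, False)

trues, falses = partition(lambda x: x % 2 == 0, range(10))
print(list(trues))    # [0, 2, 4, 6, 8]
print(list(falses))   # [1, 3, 5, 7, 9]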
I think your best bet will be constructing two separate generators:
filter_true = (x for x in mylist if myfilter(x))
filter_false = (x for x in mylist if not myfilter(x))