Flattening nested generator expressions

Flattening nested generator expressions - python

I'm trying to flatten a nested generator of generators but I'm getting an unexpected result:
>>> g = ((3*i + j for j in range(3)) for i in range(3))
>>> list(itertools.chain(*g))
[6, 7, 8, 6, 7, 8, 6, 7, 8]
I expected the result to look like this:
[0, 1, 2, 3, 4, 5, 6, 7, 8]
I think I'm getting the unexpected result because the inner generators are not being evaluated until the outer generator has already been iterated over, setting i to 2. I can hack together a solution by forcing evaluation of the inner generators by using a list comprehension instead of a generator expression:
>>> g = ([3*i + j for j in range(3)] for i in range(3))
>>> list(itertools.chain(*g))
[0, 1, 2, 3, 4, 5, 6, 7, 8]
Ideally, I would like a solution that's completely lazy and doesn't force evaluation of the inner nested elements until they're used.
Is there a way to flatten nested generator expressions of arbitrary depth (maybe using something other than itertools.chain)?
Edit:
No, my question is not a duplicate of Variable Scope In Generators In Classes. I honestly can't tell how these two questions are related at all. Maybe the moderator could explain why he thinks this is a duplicate.
Also, both answers to my question are correct in that they can be used to write a function that flattens nested generators correctly.
def flattened1(iterable):
iter1, iter2 = itertools.tee(iterable)
if isinstance(next(iter1), collections.Iterable):
return flattened1(x for y in iter2 for x in y)
else:
return iter2
def flattened2(iterable):
iter1, iter2 = itertools.tee(iterable)
if isinstance(next(iter1), collections.Iterable):
return flattened2(itertools.chain.from_iterable(iter2))
else:
return iter2
As far as I can tell with timeit, they both perform identically.
>>> timeit(test1, setup1, number=1000000)
18.173431718023494
>>> timeit(test2, setup2, number=1000000)
17.854709611972794
I'm not sure which one is better from a style standpoint either, since x for y in iter2 for x in y is a bit of a brain twister, but arguably more elegant than itertools.chain.from_iterable(iter2). Input is appreciated.
Regrettably, I was only able to mark one of the two equally good answers correct.

Instead of using chain(*g), you can use chain.from_iterable:
>>> g = ((3*i + j for j in range(3)) for i in range(3))
>>> list(itertools.chain(*g))
[6, 7, 8, 6, 7, 8, 6, 7, 8]
>>> g = ((3*i + j for j in range(3)) for i in range(3))
>>> list(itertools.chain.from_iterable(g))
[0, 1, 2, 3, 4, 5, 6, 7, 8]

How about this:
[x for y in g for x in y]
Which yields:
[0, 1, 2, 3, 4, 5, 6, 7, 8]

Guess you already have your answer, but here's another perspective.
The problem is that when each inner generator is created, the value-generating expression is closed over the outer variable i so even when the first inner generator starts generating values, it's using the "current" value of i. This will have value i=2 if the outer generator has been fully consumed (and that's exactly the case right after the argument in the chain(*g) call is evaluated, before chain is actually called).
The following devious trick will work around the problem:
g = ((3*i1 + j for i1 in [i] for j in range(3)) for i in range(3))
Note that these inner generators aren't closed over i because the for clauses are evaluated at generator creation time so the singleton list [i] is evaluated and its value "frozen" in the face of further changes to the value of i.
This approach has the advantage over the from_iterable answer that it's a little more general if you want to use it outside a chain.from_iterable call -- it will always produce the "correct" inner generators, whether the outer generator is partially or fully consumed before the inner generators are used. For example, in the following code:
g = ((3*i1 + j for i1 in [i] for j in range(3)) for i in range(3))
g1 = next(g)
g2 = next(g)
g3 = next(g)
you can insert the lines:
list(g1)
list(g2)
list(g3)
in any order at any point after the respective inner generator has been defined, and you'll get the correct results.

Related

How to calculate a cumulative product of a list using list comprehension

I'm trying my hand at converting the following loop to a comprehension.
Problem is given an input_list = [1, 2, 3, 4, 5]
return a list with each element as multiple of all elements till that index starting from left to right.
Hence return list would be [1, 2, 6, 24, 120].
The normal loop I have (and it's working):
l2r = list()
for i in range(lst_len):
if i == 0:
l2r.append(lst_num[i])
else:
l2r.append(lst_num[i] * l2r[i-1])

Python 3.8+ solution:
:= Assignment Expressions
lst = [1, 2, 3, 4, 5]
curr = 1
out = [(curr:=curr*v) for v in lst]
print(out)
Prints:
[1, 2, 6, 24, 120]
Other solution (with itertools.accumulate):
from itertools import accumulate
out = [*accumulate(lst, lambda a, b: a*b)]
print(out)

Well, you could do it like this(a):
import math
orig = [1, 2, 3, 4, 5]
print([math.prod(orig[:pos]) for pos in range(1, len(orig) + 1)])
This generates what you wanted:
[1, 2, 6, 24, 120]
and basically works by running a counter from 1 to the size of the list, at each point working out the product of all terms before that position:
pos values prod
=== ========= ====
1 1 1
2 1,2 2
3 1,2,3 6
4 1,2,3,4 24
5 1,2,3,4,5 120
(a) Just keep in mind that's less efficient at runtime since it calculates the full product for every single element (rather than caching the most recently obtained product). You can avoid that while still making your code more compact (often the reason for using list comprehensions), with something like:
def listToListOfProds(orig):
curr = 1
newList = []
for item in orig:
curr *= item
newList.append(curr)
return newList
print(listToListOfProds([1, 2, 3, 4, 5]))
That's obviously not a list comprehension but still has the advantages in that it doesn't clutter up your code where you need to calculate it.
People seem to often discount the function solution in Python, simply because the language is so expressive and allows things like list comprehensions to do a lot of work in minimal source code.
But, other than the function itself, this solution has the same advantages of a one-line list comprehension in that it, well, takes up one line :-)
In addition, you're free to change the function whenever you want (if you find a better way in a later Python version, for example), without having to change all the different places in the code that call it.

This should not be made into a list comprehension if one iteration depends on the state of an earlier one!
If the goal is a one-liner, then there are lots of solutions with #AndrejKesely's itertools.accumulate() being an excellent one (+1). Here's mine that abuses functools.reduce():
from functools import reduce
lst = [1, 2, 3, 4, 5]
print(reduce(lambda x, y: x + [x[-1] * y], lst, [lst.pop(0)]))
But as far as list comprehensions go, #AndrejKesely's assignment-expression-based solution is the wrong thing to do (-1). Here's a more self contained comprehension that doesn't leak into the surrounding scope:
lst = [1, 2, 3, 4, 5]
seq = [a.append(a[-1] * b) or a.pop(0) for a in [[lst.pop(0)]] for b in [*lst, 1]]
print(seq)
But it's still the wrong thing to do! This is based on a similar problem that also got upvoted for the wrong reasons.

A recursive function could help.
input_list = [ 1, 2, 3, 4, 5]
def cumprod(ls, i=None):
i = len(ls)-1 if i is None else i
if i == 0:
return 1
return ls[i] * cumprod(ls, i-1)
output_list = [cumprod(input_list, i) for i in range(len(input_list))]
output_list has value [1, 2, 6, 24, 120]
This method can be compressed in python3.8 using the walrus operator
input_list = [ 1, 2, 3, 4, 5]
def cumprod_inline(ls, i=None):
return 1 if (i := len(ls)-1 if i is None else i) == 0 else ls[i] * cumprod_inline(ls, i-1)
output_list = [cumprod_inline(input_list, i) for i in range(len(input_list))]
output_list has value [1, 2, 6, 24, 120]
Because you plan to use this in list comprehension, there's no need to provide a default for the i argument. This removes the need to check if i is None.
input_list = [ 1, 2, 3, 4, 5]
def cumprod_inline_nodefault(ls, i):
return 1 if i == 0 else ls[i] * cumprod_inline_nodefault(ls, i-1)
output_list = [cumprod_inline_nodefault(input_list, i) for i in range(len(input_list))]
output_list has value [1, 2, 6, 24, 120]
Finally, if you really wanted to keep it to a single , self-contained list comprehension line, you can follow the approach note here to use recursive lambda calls
input_list = [ 1, 2, 3, 4, 5]
output_list = [(lambda func, x, y: func(func,x,y))(lambda func, ls, i: 1 if i == 0 else ls[i] * func(func, ls, i-1),input_list,i) for i in range(len(input_list))]
output_list has value [1, 2, 6, 24, 120]
It's entirely over-engineered, and barely legible, but hey! it works and its just for fun.

For your list, it might not be intentional that the numbers are consecutive, starting from 1. But for cases that that pattern is intentional, you can use the built in method, factorial():
from math import factorial
input_list = [1, 2, 3, 4, 5]
l2r = [factorial(i) for i in input_list]
print(l2r)
Output:
[1, 2, 6, 24, 120]

The package numpy has a number of fast implementations of list comprehensions built into it. To obtain, for example, a cumulative product:
>>> import numpy as np
>>> np.cumprod([1, 2, 3, 4, 5])
array([ 1, 2, 6, 24, 120])
The above returns a numpy array. If you are not familiar with numpy, you may prefer to obtain just a normal python list:
>>> list(np.cumprod([1, 2, 3, 4, 5]))
[1, 2, 6, 24, 120]

using itertools and operators:
from itertools import accumulate
import operator as op
ip_lst = [1,2,3,4,5]
print(list(accumulate(ip_lst, func=op.mul)))

Why does using "and" between two lists give me the second list (as opposed to an error)? [duplicate]

This question already has answers here:
Python AND operator on two boolean lists - how?
(10 answers)
Closed 5 years ago.
I encountered a bug in some code. The (incorrect) line was similar to:
[x for x in range(3, 6) and range(0, 10)]
print x
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
(the correct way of writing this statement is not part of the question)
Wondering what someList and someOtherList does, I experimented. It seems to only ever set the result to the last parameter passed:
x = range(0,3) # [0, 1, 2]
y = range(3, 10) # [3, 4, 5, 6, 7, 8, 9]
z = range(4, 8) # [4, 5, 6, 7]
print x and y # [3, 4, 5, 6, 7, 8, 9]
print y and x # [0, 1, 2]
print z and y and x # [0, 1, 2]
I would assume that this is an unintentional consequence of being able to write something that is useful, but I'm not really seeing how the semantics of the "and" operator are being applied here.
From experience, python won't apply operators to things that don't support those operators (i.e. it spits out a TypeError), so I'd expect an error for and-ing something that should never be and-ed. The fact I don't get an error is telling me that I'm missing... something.
What am I missing? Why is list and list an allowed operation? And is there anything "useful" that can be done with this behaviour?

The and operator checks the truthiness of the first operand. If the truthiness is False, then the first argument is returned, otherwise the second.
So if both range(..)s contain elements, the last operand is returned. So your expression:
[x for x in range(3, 6) and range(0, 10)]
is equivalent to:
[x for x in range(0, 10)]
You can however use an if to filter:
[x for x in range(3, 6) if x in range(0, 10)]
In python-2.7 range(..) however constructs a list, making it not terribly efficient. Since we know that x is an int, we can do the bounds check like:
[x for x in range(3, 6) if 0 <= x < 10]
Of course in this case the check is useless since every element in range(3,6) is in the range of range(0,10).

concatenate an arbitrary number of lists in a function in Python

I hope to write the join_lists function to take an arbitrary number of lists and concatenate them. For example, if the inputs are
m = [1, 2, 3]
n = [4, 5, 6]
o = [7, 8, 9]
then we I call print join_lists(m, n, o), it will return [1, 2, 3, 4, 5, 6, 7, 8, 9]. I realize I should use *args as the argument in join_lists, but not sure how to concatenate an arbitrary number of lists. Thanks.

Although you can use something which invokes __add__ sequentially, that is very much the wrong thing (for starters you end up creating as many new lists as there are lists in your input, which ends up having quadratic complexity).
The standard tool is itertools.chain:
def concatenate(*lists):
return itertools.chain(*lists)
or
def concatenate(*lists):
return itertools.chain.from_iterable(lists)
This will return a generator which yields each element of the lists in sequence. If you need it as a list, use list: list(itertools.chain.from_iterable(lists))
If you insist on doing this "by hand", then use extend:
def concatenate(*lists):
newlist = []
for l in lists: newlist.extend(l)
return newlist
Actually, don't use extend like that - it's still inefficient, because it has to keep extending the original list. The "right" way (it's still really the wrong way):
def concatenate(*lists):
lengths = map(len,lists)
newlen = sum(lengths)
newlist = [None]*newlen
start = 0
end = 0
for l,n in zip(lists,lengths):
end+=n
newlist[start:end] = list
start+=n
return newlist
http://ideone.com/Mi3UyL
You'll note that this still ends up doing as many copy operations as there are total slots in the lists. So, this isn't any better than using list(chain.from_iterable(lists)), and is probably worse, because list can make use of optimisations at the C level.
Finally, here's a version using extend (suboptimal) in one line, using reduce:
concatenate = lambda *lists: reduce((lambda a,b: a.extend(b) or a),lists,[])

One way would be this (using reduce) because I currently feel functional:
import operator
from functools import reduce
def concatenate(*lists):
return reduce(operator.add, lists)
However, a better functional method is given in Marcin's answer:
from itertools import chain
def concatenate(*lists):
return chain(*lists)
although you might as well use itertools.chain(*iterable_of_lists) directly.
A procedural way:
def concatenate(*lists):
new_list = []
for i in lists:
new_list.extend(i)
return new_list
A golfed version: j=lambda*x:sum(x,[]) (do not actually use this).

You can use sum() with an empty list as the start argument:
def join_lists(*lists):
return sum(lists, [])
For example:
>>> join_lists([1, 2, 3], [4, 5, 6])
[1, 2, 3, 4, 5, 6]

Another way:
>>> m = [1, 2, 3]
>>> n = [4, 5, 6]
>>> o = [7, 8, 9]
>>> p = []
>>> for (i, j, k) in (m, n, o):
... p.append(i)
... p.append(j)
... p.append(k)
...
>>> p
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>>

This seems to work just fine:
def join_lists(*args):
output = []
for lst in args:
output += lst
return output
It returns a new list with all the items of the previous lists. Is using + not appropriate for this kind of list processing?

Or you could be logical instead, making a variable (here 'z') equal to the first list passed to the 'join_lists' function
then assigning the items in the list (not the list itself) to a new list to which you'll then be able add the elements of the other lists:
m = [1, 2, 3]
n = [4, 5, 6]
o = [7, 8, 9]
def join_lists(*x):
z = [x[0]]
for i in range(len(z)):
new_list = z[i]
for item in x:
if item != z:
new_list += (item)
return new_list
then
print (join_lists(m, n ,o)
would output:
[1, 2, 3, 4, 5, 6, 7, 8, 9]

Remove all values within one list from another list? [duplicate]

This question already has answers here:
Remove all the elements that occur in one list from another
(13 answers)
Closed 6 years ago.
I am looking for a way to remove all values within a list from another list.
Something like this:
a = range(1,10)
a.remove([2,3,7])
print a
a = [1,4,5,6,8,9]

>>> a = range(1, 10)
>>> [x for x in a if x not in [2, 3, 7]]
[1, 4, 5, 6, 8, 9]

I was looking for fast way to do the subject, so I made some experiments with suggested ways. And I was surprised by results, so I want to share it with you.
Experiments were done using pythonbenchmark tool and with
a = range(1,50000) # Source list
b = range(1,15000) # Items to remove
Results:
def comprehension(a, b):
return [x for x in a if x not in b]
5 tries, average time 12.8 sec
def filter_function(a, b):
return filter(lambda x: x not in b, a)
5 tries, average time 12.6 sec
def modification(a,b):
for x in b:
try:
a.remove(x)
except ValueError:
pass
return a
5 tries, average time 0.27 sec
def set_approach(a,b):
return list(set(a)-set(b))
5 tries, average time 0.0057 sec
Also I made another measurement with bigger inputs size for the last two functions
a = range(1,500000)
b = range(1,100000)
And the results:
For modification (remove method) - average time is 252 seconds
For set approach - average time is 0.75 seconds
So you can see that approach with sets is significantly faster than others. Yes, it doesn't keep similar items, but if you don't need it - it's for you.
And there is almost no difference between list comprehension and using filter function. Using 'remove' is ~50 times faster, but it modifies source list.
And the best choice is using sets - it's more than 1000 times faster than list comprehension!

If you don't have repeated values, you could use set difference.
x = set(range(10))
y = x - set([2, 3, 7])
# y = set([0, 1, 4, 5, 6, 8, 9])
and then convert back to list, if needed.

a = range(1,10)
itemsToRemove = set([2, 3, 7])
b = filter(lambda x: x not in itemsToRemove, a)
or
b = [x for x in a if x not in itemsToRemove]
Don't create the set inside the lambda or inside the comprehension. If you do, it'll be recreated on every iteration, defeating the point of using a set at all.

The simplest way is
>>> a = range(1, 10)
>>> for x in [2, 3, 7]:
... a.remove(x)
...
>>> a
[1, 4, 5, 6, 8, 9]
One possible problem here is that each time you call remove(), all the items are shuffled down the list to fill the hole. So if a grows very large this will end up being quite slow.
This way builds a brand new list. The advantage is that we avoid all the shuffling of the first approach
>>> removeset = set([2, 3, 7])
>>> a = [x for x in a if x not in removeset]
If you want to modify a in place, just one small change is required
>>> removeset = set([2, 3, 7])
>>> a[:] = [x for x in a if x not in removeset]

Others have suggested ways to make newlist after filtering e.g.
newl = [x for x in l if x not in [2,3,7]]
or
newl = filter(lambda x: x not in [2,3,7], l)
but from your question it looks you want in-place modification for that you can do this, this will also be much much faster if original list is long and items to be removed less
l = range(1,10)
for o in set([2,3,7,11]):
try:
l.remove(o)
except ValueError:
pass
print l
output:
[1, 4, 5, 6, 8, 9]
I am checking for ValueError exception so it works even if items are not in orginal list.
Also if you do not need in-place modification solution by S.Mark is simpler.

>>> a=range(1,10)
>>> for i in [2,3,7]: a.remove(i)
...
>>> a
[1, 4, 5, 6, 8, 9]
>>> a=range(1,10)
>>> b=map(a.remove,[2,3,7])
>>> a
[1, 4, 5, 6, 8, 9]

Transforming nested Python loops into list comprehensions

I've started working on some Project Euler problems, and have solved number 4 with a simple brute force solution:
def mprods(a,b):
c = range(a,b)
f = []
for d in c:
for e in c:
f.append(d*e)
return f
max([z for z in mprods(100,1000) if str(z)==(''.join([str(z)[-i] for i in range(1,len(str(z))+1)]))])
After solving, I tried to make it as compact as possible, and came up with that horrible bottom line!
Not to leave something half-done, I am trying to condense the mprods function into a list comprehension. So far, I've come up with these attempts:
[d*e for d,e in (range(a,b), range(a,b))]
Obviously completely on the wrong track. :-)
[d*e for x in [e for e in range(1,5)] for d in range(1,5)]
This gives me [4, 8, 12, 16, 4, 8, 12, 16, 4, 8, 12, 16, 4, 8, 12, 16], where I expect
[1, 2, 3, 4, 2, 4, 6, 8, 3, 6, 9, 12, 4, 8, 12, 16] or similar.
Any Pythonistas out there that can help? :)

c = range(a, b)
print [d * e for d in c for e in c]

from itertools import product
def palindrome(i):
return str(i) == str(i)[::-1]
x = xrange(900,1000)
max(a*b for (a,b) in (product(x,x)) if palindrome(a*b))
xrange(900,1000) is like range(900,1000) but instead of returning a list it returns an object that generates the numbers in the range on demand. For looping, this is slightly faster than range() and more memory efficient.
product(xrange(900,1000),xrange(900,1000)) gives the Cartesian product of the input iterables. It is equivalent to nested for-loops. For example, product(A, B) returns the same as: ((x,y) for x in A for y in B). The leftmost iterators are in the outermost for-loop, so the output tuples cycle in a manner similar to an odometer (with the rightmost element changing on every iteration).
product('ab', range(3)) --> ('a',0) ('a',1) ('a',2) ('b',0) ('b',1) ('b',2)
product((0,1), (0,1), (0,1)) --> (0,0,0) (0,0,1) (0,1,0) (0,1,1) (1,0,0) ...
str(i)[::-1] is list slicing shorthand to reverse a list.
Note how everything is wrapped in a generator expression, a high performance, memory efficient generalization of list comprehensions and generators.
Also note that the largest palindrome made from the product of two 2-digit numbers is made from the numbers 91 99, two numbers in the range(90,100). Extrapolating to 3-digit numbers you can use range(900,1000).

I think you'll like this one-liner (formatted for readability):
max(z for z in (d*e
for d in xrange(100, 1000)
for e in xrange(100, 1000))
if str(z) == str(z)[::-1])
Or slightly changed:
c = range(100, 1000)
max(z for z in (d*e for d in c for e in c) if str(z) == str(z)[::-1])
Wonder how many parens that would be in Lisp...

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Flattening nested generator expressions - python

Instead of using chain(g), you can use chain.from_iterable: >>> g = ((3i + j for j in range(3)) for i in range(3)) >>> list(itertools.chain(g)) [6, 7, 8, 6, 7, 8, 6, 7, 8] >>> g = ((3i + j for j in range(3)) for i in range(3)) >>> list(itertools.chain.from_iterable(g)) [0, 1, 2, 3, 4, 5, 6, 7, 8]

How about this: [x for y in g for x in y] Which yields: [0, 1, 2, 3, 4, 5, 6, 7, 8]

Related

How to calculate a cumulative product of a list using list comprehension

Why does using "and" between two lists give me the second list (as opposed to an error)? [duplicate]

concatenate an arbitrary number of lists in a function in Python

Remove all values within one list from another list? [duplicate]

Transforming nested Python loops into list comprehensions

Categories

Resources