Related
By nested 2-tuples, I mean something like this: ((a,b),(c,(d,e))) where all tuples have two elements. I don't need different orderings of the elements, just the different ways of putting parentheses around them. For items = [a, b, c, d], there are 5 unique pairings, which are:
(((a,b),c),d)
((a,(b,c)),d)
(a,((b,c),d))
(a,(b,(c,d)))
((a,b),(c,d))
In a perfect world I'd also like to have control over the maximum depth of the returned tuples, so that if I generated all pairings of items = [a, b, c, d] with max_depth=2, it would only return ((a,b),(c,d)).
This problem turned up because I wanted to find a way to generate the results of addition on non-commutative, non-associative numbers. If a+b doesn't equal b+a, and a+(b+c) doesn't equal (a+b)+c, what are all the possible sums of a, b, and c?
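For illustration, each pairing corresponds to one possible sum once every 2-tuple is read as an addition. A minimal sketch of that correspondence (the to_sum helper here is hypothetical, purely for illustration):

def to_sum(pairing):
    # Hypothetical helper: render a nested 2-tuple as a fully parenthesized sum.
    if isinstance(pairing, tuple):
        return '(' + to_sum(pairing[0]) + ' + ' + to_sum(pairing[1]) + ')'
    return str(pairing)

print(to_sum((('a', 'b'), ('c', 'd'))))   # ((a + b) + (c + d))
print(to_sum(('a', ('b', ('c', 'd')))))   # (a + (b + (c + d)))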
I have made a function that generates all pairings, but it also returns duplicates.
import itertools

def all_pairings(items):
    if len(items) == 2:
        yield (*items,)
    else:
        for i, pair in enumerate(itertools.pairwise(items)):
            for pairing in all_pairings(items[:i] + [pair] + items[i+2:]):
                yield pairing
For example, it returns ((a,b),(c,d)) twice for items=[a, b, c, d], since it pairs up (a,b) first in one case and (c,d) first in the second case.
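A quick check of that duplication, using the all_pairings generator above (the nested tuples are hashable, so a set can deduplicate them):

pairings = list(all_pairings(['a', 'b', 'c', 'd']))
print(len(pairings))       # 6 -> (n-1)! = 3! pairings in total, duplicates included
print(len(set(pairings)))  # 5 -> only 5 of them are unique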
Returning duplicates becomes a bigger and bigger problem for larger numbers of items. With duplicates, the number of pairings grows factorially, and without duplicates it grows exponentially, according to the Catalan Numbers (https://oeis.org/A000108).
n    With duplicates: (n-1)!    Without duplicates: (2(n-1))!/(n!(n-1)!)
1    1                          1
2    1                          1
3    2                          2
4    6                          5
5    24                         14
6    120                        42
7    720                        132
8    5040                       429
9    40320                      1430
10   362880                     4862
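These counts can be reproduced directly from the two formulas; a small sketch using math.factorial and math.comb (Python 3.8+):

import math

for n in range(1, 11):
    with_duplicates = math.factorial(n - 1)
    # Catalan number C(n-1) = (2(n-1))! / (n! * (n-1)!)
    without_duplicates = math.comb(2 * (n - 1), n - 1) // n
    print(n, with_duplicates, without_duplicates)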
Because of this, I have been trying to come up with an algorithm that doesn't need to search through all the possibilities, only the unique ones. Again, it would also be nice to have control over the maximum depth, but that could probably be added to an existing algorithm. So far I've been unsuccessful in coming up with an approach, and I also haven't found any resources that cover this specific problem. I'd appreciate any help or links to helpful resources.
Using a recursive generator:
items = ['a', 'b', 'c', 'd']

def split(l):
    if len(l) == 1:
        yield l[0]
    for i in range(1, len(l)):
        for a in split(l[:i]):
            for b in split(l[i:]):
                yield (a, b)

list(split(items))
Output:
[('a', ('b', ('c', 'd'))),
('a', (('b', 'c'), 'd')),
(('a', 'b'), ('c', 'd')),
(('a', ('b', 'c')), 'd'),
((('a', 'b'), 'c'), 'd')]
Check of uniqueness:
assert len(list(split(list(range(10))))) == 4862
Reversed order of the items:
items = ['a', 'b', 'c', 'd']

def split(l):
    if len(l) == 1:
        yield l[0]
    for i in range(len(l)-1, 0, -1):
        for a in split(l[:i]):
            for b in split(l[i:]):
                yield (a, b)

list(split(items))
[((('a', 'b'), 'c'), 'd'),
(('a', ('b', 'c')), 'd'),
(('a', 'b'), ('c', 'd')),
('a', (('b', 'c'), 'd')),
('a', ('b', ('c', 'd')))]
With maxdepth:
items = ['a', 'b', 'c', 'd']

def split(l, maxdepth=None):
    if len(l) == 1:
        yield l[0]
    elif maxdepth is not None and maxdepth <= 0:
        yield tuple(l)
    else:
        for i in range(1, len(l)):
            for a in split(l[:i], maxdepth=maxdepth and maxdepth-1):
                for b in split(l[i:], maxdepth=maxdepth and maxdepth-1):
                    yield (a, b)

list(split(items))
# or
list(split(items, maxdepth=3))
# or
list(split(items, maxdepth=2))
[('a', ('b', ('c', 'd'))),
('a', (('b', 'c'), 'd')),
(('a', 'b'), ('c', 'd')),
(('a', ('b', 'c')), 'd'),
((('a', 'b'), 'c'), 'd')]
list(split(items, maxdepth=1))
[('a', ('b', 'c', 'd')),
(('a', 'b'), ('c', 'd')),
(('a', 'b', 'c'), 'd')]
list(split(items, maxdepth=0))
[('a', 'b', 'c', 'd')]
Full credit to mozway for the algorithm; my original idea was to represent the pairing in reverse Polish notation, which would not have lent itself to the following optimizations:
First, we replace the two nested loops:
for a in split(l[:i]):
    for b in split(l[i:]):
        yield (a, b)
with itertools.product, which will itself cache the results of the inner split(...) call, as well as produce the pairings in internal C code, which runs much faster:
yield from product(split(l[:i]), split(l[i:]))
Next, we cache the results of previous split(...) calls. To do this we must sacrifice the laziness of generators and ensure that our function parameters are hashable. Explicitly, this means creating a wrapper that casts the input list to a tuple, and modifying the function body to return lists instead of yielding.
def split(l):
    return _split(tuple(l))

def _split(l):
    if len(l) == 1:
        return l[:1]
    res = []
    for i in range(1, len(l)):
        res.extend(product(_split(l[:i]), _split(l[i:])))
    return res
We then decorate the function with functools.cache, to perform the caching. So putting it all together:
from itertools import product
from functools import cache

def split(l):
    return _split(tuple(l))

@cache
def _split(l):
    if len(l) == 1:
        return l[:1]
    res = []
    for i in range(1, len(l)):
        res.extend(product(_split(l[:i]), _split(l[i:])))
    return res
Testing with the following input:
test = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n']
produces the following timings:
Original: 5.922573089599609
Revised: 0.08888077735900879
I also verified that the results matched the original exactly, order and all.
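For reference, a rough sketch of how such timings could be taken (the exact numbers depend on the machine; split_original and split_cached are placeholder names for the plain generator version and the cached version above):

import time

test = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n']

# split_original / split_cached: placeholder names for the two versions above
start = time.perf_counter()
list(split_original(test))
print('Original:', time.perf_counter() - start)

start = time.perf_counter()
split_cached(test)
print('Revised:', time.perf_counter() - start)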
Again, full credit to mozway for the algorithm. I've just applied a few optimizations to speed it up a bit.
I currently want all permutations of a set of elements with replacement.
Example:
elements = ['a', 'b']
permutations with replacement =
[('a', 'a', 'a'),
('a', 'a', 'b'),
('a', 'b', 'a'),
('a', 'b', 'b'),
('b', 'a', 'a'),
('b', 'a', 'b'),
('b', 'b', 'a'),
('b', 'b', 'b')]
The only way I have been able to do this so far is with itertools.product, as follows:
import itertools as it
sample_space = ['a', 'b']
outcomes = it.product(sample_space, sample_space, sample_space)
list(outcomes)
I am just wondering if there is a better way to do this, as it is obvious that this can get unwieldy and error-prone as the sample space and required length get larger.
I was expecting to find something along the lines of itertools.permutations(['a', 'b'], length=3, replace=True), maybe?
I tried itertools.permutations, but the only arguments are iterable and r, which is the required length.
The output for the above example using it.permutations(sample_space, 3) would be an empty list [].
If you're sampling with replacement, what you get is by definition not a permutation (which just means "rearrangement") of the set. So I wouldn't look to the permutations function for this. product is the right thing; e.g. itertools.product(['a','b'], repeat=3).
Note that if you sample from a two-element set with replacement N times, then you have effectively created an N-digit binary number, with all the possibilities that go with it. You've just swapped in 'a' and 'b' for 0 and 1.
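A minimal sketch of that suggestion, including the correspondence with 3-digit binary numbers:

from itertools import product

outcomes = list(product(['a', 'b'], repeat=3))
print(len(outcomes))  # 8 outcomes, i.e. 2**3

# Mapping 'a' -> 0 and 'b' -> 1 turns each outcome into a 3-digit binary number:
for o in outcomes:
    print(''.join(o), '->', ''.join('0' if x == 'a' else '1' for x in o))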
I have an array. I want to generate all permutations from that array, including single elements, repeated elements, changes of order, etc. For example, say I have this array:
arr = ['A', 'B', 'C']
And if I use the itertools module by doing this:
from itertools import permutations
perms = [''.join(p) for p in permutations(['A','B','C'])]
print(perms)
Or using a loop like this:
def permutations(head, tail=''):
    if len(head) == 0:
        print(tail)
    else:
        for i in range(len(head)):
            permutations(head[:i] + head[i+1:], tail + head[i])

arr = ['A', 'B', 'C']
permutations(arr)
I only get:
['ABC', 'ACB', 'BAC', 'BCA', 'CAB', 'CBA']
But what I want is:
['A', 'B', 'C',
'AA', 'AB', 'AC', 'BB', 'BA', 'BC', 'CA', 'CB', 'CC',
'AAA', 'AAB', 'AAC', 'ABA', 'ABB', 'ACA', 'ACC', 'BBB', 'BAA', 'BAB', 'BAC', 'CCC', 'CAA', 'CCA', ...]
The result is all permutations from the given array. Since the array has 3 elements and every element can be repeated, it generates 3^3 (27) ways. I know there must be a way to do this, but I can't quite get the logic right.
A generator that generates all sequences as you describe (and which is infinite, so you can never exhaust it):
from itertools import product

def sequence(xs):
    n = 1
    while True:
        yield from (product(xs, repeat=n))
        n += 1

# example use: print first 100 elements from the sequence
s = sequence('ABC')
for _ in range(100):
    print(next(s))
Output:
('A',)
('B',)
('C',)
('A', 'A')
('A', 'B')
('A', 'C')
('B', 'A')
('B', 'B')
('B', 'C')
('C', 'A')
('C', 'B')
('C', 'C')
('A', 'A', 'A')
('A', 'A', 'B')
('A', 'A', 'C')
('A', 'B', 'A')
...
Of course, if you don't want tuples, but strings, just replace the next(s) with ''.join(next(s)), i.e.:
print(''.join(next(s)))
If you don't want the sequences to exceed the length of the original collection:
from itertools import product

def sequence(xs):
    n = 1
    while n <= len(xs):
        yield from (product(xs, repeat=n))
        n += 1

for element in sequence('ABC'):
    print(''.join(element))
Of course, in that limited case, this will do as well:
from itertools import product

xs = 'ABC'
for s in (''.join(x) for n in range(len(xs)) for x in product(xs, repeat=n+1)):
    print(s)
Edit: In the comments, OP asked for an explanation of the yield from (product(xs, repeat=n)) part.
product() is a function in itertools that generates the cartesian product of iterables, which is a fancy way to say that you get all possible combinations of elements from the first iterable, with elements from the second etc.
Play around with it a bit to get a better feel for it, but for example:
list(product([1, 2], [3, 4])) == [(1, 3), (1, 4), (2, 3), (2, 4)]
If you take the product of an iterable with itself, the same happens, for example:
list(product('AB', 'AB')) == [('A', 'A'), ('A', 'B'), ('B', 'A'), ('B', 'B')]
Note that I keep calling product() with list() around it here; that's because product() returns a generator, and passing the generator to list() exhausts it into a list for printing.
The final step with product() is that you can also give it an optional repeat argument, which tells product() to do the same thing, but just repeat the iterable a certain number of times. For example:
list(product('AB', repeat=2)) == [('A', 'A'), ('A', 'B'), ('B', 'A'), ('B', 'B')]
So, you can see how calling product(xs, repeat=n) will generate all the sequences you're after, if you start at n=1 and keep exhausting it for ever greater n.
Finally, yield from is a way to yield results from another generator one at a time in your own generator. For example, yield from some_gen is the same as:
for x in some_gen:
    yield x
So, yield from (product(xs, repeat=n)) is the same as:
for p in (product(xs, repeat=n)):
    yield p
I am trying to use the permutations feature from itertools, and I noticed something: if I try to unpack/read the data from the permutations object, it changes some of its state.
from itertools import permutations

a = permutations('abc')

print(('a', 'b', 'c') in a)
for x in a:
    print(x)

print(('a', 'b', 'c') in a)
for x in a:
    print(x)
Output:
True
('a', 'c', 'b')
('b', 'a', 'c')
('b', 'c', 'a')
('c', 'a', 'b')
('c', 'b', 'a')
False
How does this happen? I checked the official documentation page and cannot find any clue.
My environment is PyCharm with Python 3.7.4.
As others already said, the problem is that a is not a list but a generator; that is, a sequence that gets used up as you iterate over it, so you can only iterate over it once.
If you look carefully, you'll see that your first print loop only printed five of the six permutations; the first permutation disappeared when you checked it against ('a', 'b', 'c') in your first print statement. The for-loop then prints out what's left, and the rest of your code is trying to drink from an empty cup.
To get the behavior you expect, make a into a list like this:
a = list(permutations('abc'))
And when you get a chance, read up on generators, iterators, and "comprehensions"; they're everywhere in Python (often hidden in plain sight), and they're great.
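A short demonstration of that consumption: the membership test advances the iterator past the first match, and iteration resumes from there.

from itertools import permutations

a = permutations('abc')
print(('a', 'b', 'c') in a)  # True; ('a', 'b', 'c') has now been consumed
print(next(a))               # ('a', 'c', 'b'): iteration resumes after the match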
Get rid of the generator and convert the output to a list, since the values vanish from the comparison once they have been used. itertools.permutations returns an iterator, which shifts to the next value every time you consume one in a comparison.
CODE:
from itertools import permutations

a = list(permutations('abc'))

print(('a', 'b', 'c') in a)
for x in a:
    print(x)

print(('a', 'b', 'c') in a)
for x in a:
    print(x)
OUTPUT:
True
('a', 'b', 'c')
('a', 'c', 'b')
('b', 'a', 'c')
('b', 'c', 'a')
('c', 'a', 'b')
('c', 'b', 'a')
True
('a', 'b', 'c')
('a', 'c', 'b')
('b', 'a', 'c')
('b', 'c', 'a')
('c', 'a', 'b')
('c', 'b', 'a')
I wish to have a function, subset(("A","b","C","D"),3), which gives the following output:
("A","b","C")
("A","b","D")
("A","C","D")
("b","C","D")
How might I do this in python 3?
The itertools.combinations function was built explicitly for this purpose:
>>> from itertools import combinations
>>> list(combinations(("A","b","C","D"), 3))
[('A', 'b', 'C'), ('A', 'b', 'D'), ('A', 'C', 'D'), ('b', 'C', 'D')]
From the docs:
itertools.combinations(iterable, r)
Return r length subsequences of elements from the input iterable.
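If you want it wrapped in the subset(...) form from the question, a minimal sketch:

from itertools import combinations

def subset(iterable, r):
    # Thin wrapper around itertools.combinations, returning a list of r-length tuples.
    return list(combinations(iterable, r))

print(subset(("A", "b", "C", "D"), 3))
# [('A', 'b', 'C'), ('A', 'b', 'D'), ('A', 'C', 'D'), ('b', 'C', 'D')]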