Cartesian products of lists without duplicates


Given a list a = ['a', 'b', 'c'], how would you return the Cartesian product of the list with itself without duplicates? Example:
[['a', 'a' , 'a' ,'a']
['a', 'a' , 'a' ,'b']
['a', 'a' , 'a' ,'c']
['a', 'a' , 'b' ,'b']
['a', 'a' , 'b' ,'c']
['a', 'a' , 'c' ,'c']
...etc..]
Following "How to generate all permutations of a list in Python", I tried:
print(list(itertools.permutations(['a', 'b', 'c'], 4)))
[]
print(list(itertools.product(['a', 'b', 'c'], repeat=4)))
But I get the Cartesian product with duplicates. For example, the list contains both ['a','a','b','b'] and ['a','b','b','a'], which are equivalent for my purposes.
Note: my 'a', 'b', 'c' are variables that store numbers, say 1, 2, 3. So after getting the list of combinations of the letters, I would need to compute, say:
['a','b','c','c'] ----> a*b*c*c = 1*2*3*3 = 18
What is the fastest way of doing this in Python? Would it be possible, or faster, to do it with NumPy?
Thanks!

Maybe you actually want combinations_with_replacement?
>>> from itertools import combinations_with_replacement
>>> a = ['a', 'b', 'c']
>>> c = combinations_with_replacement(a, 4)
>>> for x in c:
...     print(x)
...
('a', 'a', 'a', 'a')
('a', 'a', 'a', 'b')
('a', 'a', 'a', 'c')
('a', 'a', 'b', 'b')
('a', 'a', 'b', 'c')
('a', 'a', 'c', 'c')
('a', 'b', 'b', 'b')
('a', 'b', 'b', 'c')
('a', 'b', 'c', 'c')
('a', 'c', 'c', 'c')
('b', 'b', 'b', 'b')
('b', 'b', 'b', 'c')
('b', 'b', 'c', 'c')
('b', 'c', 'c', 'c')
('c', 'c', 'c', 'c')
Without more information about how you're mapping strings to numbers I can't comment on your second question, but writing your own product function or using numpy's isn't too difficult.
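As a minimal sketch of the second part of the question, assuming the numbers live in a dictionary (the name `values` and its contents are hypothetical, chosen to match the 1, 2, 3 example in the question), each combination can be reduced to a product with `math.prod`, available in Python 3.8 and later:

```python
from itertools import combinations_with_replacement
from math import prod  # available in Python 3.8+

# Hypothetical mapping from variable names to their numeric values.
values = {'a': 1, 'b': 2, 'c': 3}

# Map each order-insensitive combination to the product of its values.
products = {
    combo: prod(values[name] for name in combo)
    for combo in combinations_with_replacement(sorted(values), 4)
}

print(products[('a', 'b', 'c', 'c')])  # 1*2*3*3 = 18
```

This avoids building and evaluating expression strings. Whether NumPy would be faster depends on the input size and is not measured here.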

Edit: Don't use this; use the other answer
If the elements of your original list are guaranteed to be unique, then the `combinations_with_replacement` solution will work. If not, you can first pass the list through `set()` to reduce it to unique variables. Regarding the product, assuming the values are stored in a dictionary `values` and all the variable names are valid Python identifiers, you can do something like the following:
combos = combinations_with_replacement(a, 4)
product_strings = ['*'.join(c) for c in combos]
products = [eval(s, globals(), values) for s in product_strings]
Needless to say, be very careful with eval. Only use this solution if you create the list a yourself.
Example exploit: a = ['from os import', '; system("rm -rf .");']

Related

Need to permute list with order mattering after first two elements

Suppose l = ['a', 'b', 'c', 'd'] ...
I need to generate the following combinations/permutations from this list (in general, the list could have more elements):
['a', 'b', 'c', 'd']
['a', 'b', 'd', 'c']
['a', 'c', 'b', 'd']
['a', 'c', 'd', 'b']
['a', 'd', 'b', 'c']
['a', 'd', 'c', 'b']
['b', 'c', 'a', 'd']
['b', 'c', 'd', 'a']
['b', 'd', 'a', 'c']
['b', 'd', 'c', 'a']
['c', 'd', 'a', 'b']
['c', 'd', 'b', 'a']
So, for the first two positions in the list order does not matter, although I need to take all combinations of list elements, while in the last two (or n) positions of the list order does matter. I've tried various combinations of using permutations and combinations from itertools, all with no success (I dare not post my code for fear of embarrassment).

The most direct solution using the existing itertools library functions is to select the first two elements as a combination, and then the rest as a permutation of the remaining elements:
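import itertools
def partly_unordered_permutations(lst, k):
    for c in itertools.combinations(lst, k):
        # Keep the remaining elements in their original order; iterating
        # over a set difference would not guarantee any particular order.
        rest = [x for x in lst if x not in c]
        for d in itertools.permutations(rest):
            yield c + d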
Usage:
>>> for p in partly_unordered_permutations('abcd', 2):
...     print(p)
...
('a', 'b', 'c', 'd')
('a', 'b', 'd', 'c')
('a', 'c', 'b', 'd')
('a', 'c', 'd', 'b')
('a', 'd', 'b', 'c')
('a', 'd', 'c', 'b')
('b', 'c', 'a', 'd')
('b', 'c', 'd', 'a')
('b', 'd', 'a', 'c')
('b', 'd', 'c', 'a')
('c', 'd', 'a', 'b')
('c', 'd', 'b', 'a')
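A quick sanity check on the approach, as a sketch: choosing the first k elements as a combination and permuting the rest should give C(n, k) * (n - k)! results, which is 6 * 2 = 12 for 'abcd' with k = 2. The function name `partly_unordered` is made up for this self-contained example, and the code assumes the input elements are distinct.

```python
from itertools import combinations, permutations
from math import comb, factorial  # math.comb needs Python 3.8+

def partly_unordered(seq, k):
    """Yield tuples whose first k items form a combination and whose
    remaining items form a permutation of the leftover elements."""
    for head in combinations(seq, k):
        # Assumes distinct elements: everything not chosen is left over.
        rest = [x for x in seq if x not in head]
        for tail in permutations(rest):
            yield head + tail

results = list(partly_unordered('abcd', 2))
expected_count = comb(4, 2) * factorial(4 - 2)
print(len(results), expected_count)  # 12 12
```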

Combinations of single elements from respective sublists [duplicate]

This question already has answers here:
How to get the cartesian product of multiple lists
(17 answers)
Closed 3 years ago.
If I have the list my_list = [['a', 'b'], ['c', 'd', 'e']], how can I create a list of all possible tuples where each tuple contains a single element from each sublist? For example:
('a', 'e') is valid
('a', 'b') is invalid
('b', 'c') is valid
('c', 'd') is invalid
Importantly, my_list can contain any number of elements (sublists) and each sublist can be of any length. I tried to get a recursive generator off the ground, but it's not quite there.
I’d like to try and use recursion rather than itertools.
The logic was to iterate through 2 sublists at a time, and store those results as input to the next sublist's expansion.
def foil(lis=l):
    if len(lis) == 2:
        for x in l[0]:
            for y in l[1]:
                yield x + y
    else:
        for p in foil(l[:-1]):
            yield p
for i in foil():
    print(i)
However, this obviously only works when len(my_list) == 2. It also needs to work with, say, my_list = [['a'], ['b'], ['c', 'd']], which would return:
('a', 'b', 'c')
('a', 'b', 'd')
Cheers!

Use itertools.product:
import itertools
my_list = [['a', 'b'], ['c', 'd', 'e']]
print(list(itertools.product(*my_list)))
# [('a', 'c'), ('a', 'd'), ('a', 'e'), ('b', 'c'), ('b', 'd'), ('b', 'e')]
my_list = [['a'], ['b'], ['c', 'd']]
print(list(itertools.product(*my_list)))
#[('a', 'b', 'c'), ('a', 'b', 'd')]
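Since the question asks for recursion rather than itertools, here is one possible recursive generator, offered as a sketch rather than a drop-in replacement. The name `product_recursive` is made up for this example. It picks each element of the first sublist in turn and prepends it to every tuple produced from the remaining sublists.

```python
def product_recursive(lists):
    """Yield one tuple per way of picking a single element from each sublist."""
    if not lists:
        # Base case: no sublists left, so there is exactly one (empty) choice.
        yield ()
        return
    for head in lists[0]:
        for tail in product_recursive(lists[1:]):
            yield (head,) + tail

for combo in product_recursive([['a'], ['b'], ['c', 'd']]):
    print(combo)
# ('a', 'b', 'c')
# ('a', 'b', 'd')
```

The recursion depth equals the number of sublists, so very long outer lists would hit Python's recursion limit; itertools.product does not have that restriction.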

Fastest way to find unique combinations of list

I'm trying to solve the general problem of getting the unique combinations from a list in Python.
Mathematically, from https://www.mathsisfun.com/combinatorics/combinations-permutations-calculator.html I can see that the formula for the number of combinations is n! / (r! (n-r)!), where n is the length of the sequence and r is the number to choose.
As shown by the following Python, where n is 4 and r is 2:
import itertools
lst = 'ABCD'
result = list(itertools.combinations(lst, len(lst) // 2))
print(len(result))
6
The following is a helper function to show the issue I have:
def C(lst):
    l = list(itertools.combinations(sorted(lst), len(lst) // 2))
    s = set(l)
    print('actual', len(l), l)
    print('unique', len(s), list(s))
If I run this from IPython I can call it thus:
In [41]: C('ABCD')
actual 6 [('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]
unique 6 [('B', 'C'), ('C', 'D'), ('A', 'D'), ('A', 'B'), ('A', 'C'), ('B', 'D')]
In [42]: C('ABAB')
actual 6 [('A', 'A'), ('A', 'B'), ('A', 'B'), ('A', 'B'), ('A', 'B'), ('B', 'B')]
unique 3 [('A', 'B'), ('A', 'A'), ('B', 'B')]
In [43]: C('ABBB')
actual 6 [('A', 'B'), ('A', 'B'), ('A', 'B'), ('B', 'B'), ('B', 'B'), ('B', 'B')]
unique 2 [('A', 'B'), ('B', 'B')]
In [44]: C('AAAA')
actual 6 [('A', 'A'), ('A', 'A'), ('A', 'A'), ('A', 'A'), ('A', 'A'), ('A', 'A')]
unique 1 [('A', 'A')]
What I want is the unique count shown above, but generating all combinations and then building a set doesn't scale: as n, the length of lst, grows, the number of combinations grows and the approach slows down.
Is there a way of using math or Python tricks to count the unique combinations?

Here's some Python code based on the generating function approach outlined in this Math Forum article. For each letter appearing in the input we create a polynomial 1 + x + x^2 + ... + x^k, where k is the number of times that the letter appears. We then multiply those polynomials together: the nth coefficient of the resulting polynomial then tells you how many combinations of length n there are.
We'll represent a polynomial simply as a list of its (integer) coefficients, with the first coefficient representing the constant term, the next coefficient representing the coefficient of x, and so on. We'll need to be able to multiply such polynomials, so here's a function for doing so:
def polymul(p, q):
    """
    Multiply two polynomials, represented as lists of coefficients.
    """
    r = [0]*(len(p) + len(q) - 1)
    for i, c in enumerate(p):
        for j, d in enumerate(q):
            r[i+j] += c*d
    return r
With the above in hand, the following function computes the number of combinations:
from collections import Counter
from functools import reduce
def ncombinations(it, k):
    """
    Number of combinations of length *k* of the elements of *it*.
    """
    counts = Counter(it).values()
    prod = reduce(polymul, [[1]*(count+1) for count in counts], [1])
    return prod[k] if k < len(prod) else 0
Testing this on your examples:
>>> ncombinations("abcd", 2)
6
>>> ncombinations("abab", 2)
3
>>> ncombinations("abbb", 2)
2
>>> ncombinations("aaaa", 2)
1
And on some longer examples, demonstrating that this approach is feasible even for long-ish inputs:
>>> ncombinations("abbccc", 3) # the math forum example
6
>>> ncombinations("supercalifragilisticexpialidocious", 10)
334640
>>> from itertools import combinations # double check ...
>>> len(set(combinations(sorted("supercalifragilisticexpialidocious"), 10)))
334640
>>> ncombinations("supercalifragilisticexpialidocious", 20)
1223225
>>> ncombinations("supercalifragilisticexpialidocious", 34)
1
>>> ncombinations("supercalifragilisticexpialidocious", 35)
0
>>> from string import printable
>>> ncombinations(printable, 50) # len(printable)==100
100891344545564193334812497256
>>> from math import factorial
>>> factorial(100)//factorial(50)**2 # double check the result
100891344545564193334812497256
>>> ncombinations("abc"*100, 100)
5151
>>> factorial(102)//factorial(2)//factorial(100) # double check (bars and stars)
5151
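To make the generating function idea concrete, here is a small self-contained sketch that returns the whole coefficient list instead of a single entry. The helper names `multiply` and `combination_counts` are made up for this example. For "abab" the polynomial is (1 + x + x^2)^2 = 1 + 2x + 3x^2 + 2x^3 + x^4, so there are 3 distinct combinations of length 2, matching the result above.

```python
from collections import Counter

def multiply(p, q):
    """Multiply two polynomials given as coefficient lists (constant term first)."""
    r = [0] * (len(p) + len(q) - 1)
    for i, c in enumerate(p):
        for j, d in enumerate(q):
            r[i + j] += c * d
    return r

def combination_counts(items):
    """Return a list whose k-th entry is the number of distinct k-combinations."""
    coeffs = [1]
    for count in Counter(items).values():
        # A letter appearing `count` times contributes 1 + x + ... + x^count.
        coeffs = multiply(coeffs, [1] * (count + 1))
    return coeffs

print(combination_counts("abab"))  # [1, 2, 3, 2, 1]
```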

Start with a regular recursive definition of combinations() but add a test to only recurse when the lead value at that level hasn't been used before:
def uniq_comb(pool, r):
    """ Return an iterator over all distinct r-length
    combinations taken from a pool of values that
    may contain duplicates.
    Unlike itertools.combinations(), element uniqueness
    is determined by value rather than by position.
    """
    if r:
        seen = set()
        for i, item in enumerate(pool):
            if item not in seen:
                seen.add(item)
                for tail in uniq_comb(pool[i+1:], r-1):
                    yield (item,) + tail
    else:
        yield ()
if __name__ == '__main__':
    from itertools import combinations
    pool = 'ABRACADABRA'
    for r in range(len(pool) + 1):
        assert set(uniq_comb(pool, r)) == set(combinations(pool, r))
        assert dict.fromkeys(uniq_comb(pool, r)) == dict.fromkeys(combinations(pool, r))

It seems that this is called a multiset combination. I faced the same problem and ended up rewriting a function from sympy.
Instead of passing your iterable to something like itertools.combinations(p, r), you pass collections.Counter(p).most_common() to the following function to retrieve the distinct combinations directly. It's a lot faster than filtering all combinations, and it avoids holding them all in memory.
def counter_combinations(g, n):
    if sum(v for k, v in g) < n or not n:
        yield []
    else:
        for i, (k, v) in enumerate(g):
            if v >= n:
                yield [k]*n
                v = n - 1
            for v in range(min(n, v), 0, -1):
                for j in counter_combinations(g[i + 1:], n - v):
                    rv = [k]*v + j
                    if len(rv) == n:
                        yield rv
Here is an example:
from collections import Counter
p = Counter('abracadabra').most_common()
print(p)
c = [_ for _ in counter_combinations(p, 4)]
print(c)
print(len(c))
Output:
[('a', 5), ('b', 2), ('r', 2), ('c', 1), ('d', 1)]
[['a', 'a', 'a', 'a'], ['a', 'a', 'a', 'b'], ['a', 'a', 'a', 'r'], ['a', 'a', 'a', 'c'], ['a', 'a', 'a', 'd'], ['a', 'a', 'b', 'b'], ['a', 'a', 'b', 'r'], ['a', 'a', 'b', 'c'], ['a', 'a', 'b', 'd'], ['a', 'a', 'r', 'r'], ['a', 'a', 'r', 'c'], ['a', 'a', 'r', 'd'], ['a', 'a', 'c', 'd'], ['a', 'b', 'b', 'r'], ['a', 'b', 'b', 'c'], ['a', 'b', 'b', 'd'], ['a', 'b', 'r', 'r'], ['a', 'b', 'r', 'c'], ['a', 'b', 'r', 'd'], ['a', 'b', 'c', 'd'], ['a', 'r', 'r', 'c'], ['a', 'r', 'r', 'd'], ['a', 'r', 'c', 'd'], ['b', 'b', 'r', 'r'], ['b', 'b', 'r', 'c'], ['b', 'b', 'r', 'd'], ['b', 'b', 'c', 'd'], ['b', 'r', 'r', 'c'], ['b', 'r', 'r', 'd'], ['b', 'r', 'c', 'd'], ['r', 'r', 'c', 'd']]
31

List of possible combinations of characters with OR statement

I managed to generate a list of all possible combinations of characters 'a', 'b' and 'c' (code below). Now I want to add a fourth character, which can be either 'd' or 'f' but NOT both in the same combination. How could I achieve this ?
items = ['a', 'b', 'c']
from itertools import permutations
for p in permutations(items):
    print(p)
('a', 'b', 'c')
('a', 'c', 'b')
('b', 'a', 'c')
('b', 'c', 'a')
('c', 'a', 'b')
('c', 'b', 'a')

I created a new list, items2, for 'd' and 'f', assuming that the OP needs all permutations of [a,b,c,d] and of [a,b,c,f]:
items1 = ['a', 'b', 'c']
items2 = ['d', 'f']
from itertools import permutations
for x in items2:
    for p in permutations(items1 + [x]):
        print(p)

A variation on @Van Peer's solution. You can modify the extended list in-place:
from itertools import permutations
items = list('abc_')
for items[3] in 'df':
    for p in permutations(items):
        print(p)

itertools.product is suitable for representing these distinct groups in a way which generalizes well. Just let the exclusive elements belong to the same iterable passed to the Cartesian product.
For instance, to get a list with the items you're looking for,
from itertools import chain, permutations, product
list(chain.from_iterable(map(permutations, product(*items, 'df'))))
# [('a', 'b', 'c', 'd'),
# ('a', 'b', 'd', 'c'),
# ('a', 'c', 'b', 'd'),
# ('a', 'c', 'd', 'b'),
# ('a', 'd', 'b', 'c'),
# ('a', 'd', 'c', 'b'),
# ('b', 'a', 'c', 'd'),
# ('b', 'a', 'd', 'c'),
# ('b', 'c', 'a', 'd'),
# ('b', 'c', 'd', 'a'),
# ('b', 'd', 'a', 'c'),
# ('b', 'd', 'c', 'a'),
# ('c', 'a', 'b', 'd'),
# ('c', 'a', 'd', 'b'),
# ...
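The same idea can be spelled out a little more explicitly. In this sketch the names `fixed` and `exclusive` are made up for illustration: each fixed item becomes a one-element group, the mutually exclusive options form one more group, and `product` picks exactly one element from every group before `permutations` reorders each selection.

```python
from itertools import chain, permutations, product

fixed = ['a', 'b', 'c']   # items that always appear
exclusive = 'df'          # exactly one of these appears per result

# Each fixed item is its own single-element group, so product() can only
# vary the choice made from the exclusive group.
selections = product(*([x] for x in fixed), exclusive)
results = list(chain.from_iterable(permutations(s) for s in selections))

print(len(results))  # 2 selections * 4! orderings = 48
```

Adding a second exclusive group, for instance another string passed as a further argument to `product`, should generalize in the same way.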

For example, run the permutations once for each of the two lists:
from itertools import permutations
items = ['a', 'b', 'c', 'd']
for p in permutations(items):
    print(p)
items = ['a', 'b', 'c', 'f']
for p in permutations(items):
    print(p)

How to turn multiple lists into a list of sublists where each sublist is made up of the same index items across all lists?

How to turn multiple lists into one list of sublists, where each sublist is made up of the items at the same index across the original lists?
lsta = ['a','b','c','d']
lstb = ['a','b','c','d']
lstc = ['a','b','c','d']
Desired_List = [['a','a','a'],['b','b','b'],['c','c','c'],['d','d','d']]
I can't seem to use zip here, so how would I do this?

A list comprehension over zip gives a list of lists:
>>> [list(x) for x in zip(lsta, lstb, lstc)]
[['a', 'a', 'a'], ['b', 'b', 'b'], ['c', 'c', 'c'], ['d', 'd', 'd']]

Using zip, under duress:
>>> zip(lsta, lstb, lstc)
[('a', 'a', 'a'), ('b', 'b', 'b'), ('c', 'c', 'c'), ('d', 'd', 'd')]
In Python 3, zip returns an iterator, so you'll need to convert it to a list:
>>> list(zip(lsta, lstb, lstc))
[('a', 'a', 'a'), ('b', 'b', 'b'), ('c', 'c', 'c'), ('d', 'd', 'd')]
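If the lists are themselves stored in a list, so that their number is not known in advance, argument unpacking with `zip(*...)` handles any count. This is a sketch; the name `lists` is made up for the example.

```python
# Hypothetical container holding any number of equal-length lists.
lists = [
    ['a', 'b', 'c', 'd'],
    ['a', 'b', 'c', 'd'],
    ['a', 'b', 'c', 'd'],
]

# zip(*lists) groups the items found at the same index in every list.
transposed = [list(group) for group in zip(*lists)]
print(transposed)
# [['a', 'a', 'a'], ['b', 'b', 'b'], ['c', 'c', 'c'], ['d', 'd', 'd']]
```

Note that zip stops at the shortest input, so lists of unequal length would be truncated silently.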
