Generate list of all combinations and maintain index position - python

I'm looking for a way to generate a list of all combinations in which each element keeps its original index position.
So far I've been using this method:
import itertools
stuff = ['A', 'B', 'C']
result = []  # named "result" to avoid shadowing the built-in list
for L in range(0, len(stuff)+1):
    for subset in itertools.combinations(stuff, L):
        result.append(subset)
which gives:
[(),
('A',),
('B',),
('C',),
('A', 'B'),
('A', 'C'),
('B', 'C'),
('A', 'B', 'C')]
What I'm looking for is a solution that gives, or can be converted to, the list below:
[(0, 0, 0),
('A', 0, 0),
(0, 'B', 0),
(0, 0, 'C'),
('A', 'B', 0),
('A', 0, 'C'),
(0, 'B', 'C'),
('A', 'B', 'C')]
Best,
Christian

Instead of appending each subset directly, you can build a new list in the desired format with a list comprehension: [i if i in subset else 0 for i in stuff].
Here you go:
import itertools
stuff, result = ['A', 'B', 'C'], []
for L in range(0, len(stuff)+1):
    for subset in itertools.combinations(stuff, L):
        result.append([i if i in subset else 0 for i in stuff])
And now if you check the result,
>>> print(result)
[[0, 0, 0], ['A', 0, 0], [0, 'B', 0], [0, 0, 'C'], ['A', 'B', 0], ['A', 0, 'C'], [0, 'B', 'C'], ['A', 'B', 'C']]
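One caveat with the membership test i in subset: it compares values, so it cannot tell duplicate entries in stuff apart. A minimal sketch that combines index positions instead is shown below. The names positional_combinations and fill are hypothetical, introduced here for illustration only, and are not part of the original answer.

```python
from itertools import combinations

def positional_combinations(items, fill=0):
    """Return every combination of items, each padded with `fill`
    so that chosen elements stay at their original index."""
    n = len(items)
    result = []
    for r in range(n + 1):
        # Combine index positions rather than values, so duplicates are safe.
        for idxs in combinations(range(n), r):
            chosen = set(idxs)
            result.append([items[i] if i in chosen else fill for i in range(n)])
    return result

print(positional_combinations(['A', 'A']))
# [[0, 0], ['A', 0], [0, 'A'], ['A', 'A']]
```

For input without duplicates this should produce the same rows as the loop above.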


Calculating total and relative frequency of values in a dict representing a Markov-chain rule

I have made a function make_rule(text, scope=1) that simply goes over a string and generates a dictionary that serves as a rule for a Markovian text-generator (where the scope is the number of linked characters, not words).
>>> rule = make_rule("abbcad", 1)
>>> rule
{'a': ['b', 'd'], 'b': ['b', 'c'], 'c': ['a']}
I have been tasked with calculating the entropy of this system. In order to do that I think I would need to know:
How often a value appears in the dictionary in total, i.e. its total frequency.
How often a value appears given a key in the dictionary, i.e. its relative frequency.
Is there a quick way to get both of these numbers for each of the values in the dictionary?
For the above example I would need this output:
'a' total: 1, 'a'|'a': 0, 'a'|'b': 0, 'a'|'c': 1
'b' total: 2, 'b'|'a': 1, 'b'|'b': 1, 'b'|'c': 0
'c' total: 1, 'c'|'a': 0, 'c'|'b': 1, 'c'|'c': 0
'd' total: 1, 'd'|'a': 1, 'd'|'b': 0, 'd'|'c': 0
I guess the 'a' total is easily inferred, so maybe instead just output a list of triples for every unique item that appears in the dictionary:
[[('a', 'a', 0), ('a', 'b', 0), ('a', 'c', 1)], [('b', 'a', 1), ('b', 'b', 1), ('b', 'c', 0)], ...]
I'll just deal with "How often a value appears given a key in the dictionary", since you've said that "How often a value appears in the dictionary in total" is easily inferred.
If you just want to be able to look up the relative frequency of a value for a given key, it's easy to get that with a dict of Counter objects:
from collections import Counter
rule = {'a': ['b', 'd'], 'b': ['b', 'c'], 'c': ['a']}
freq = {k: Counter(v) for k, v in rule.items()}
… which gives you a freq like this:
{
    'a': Counter({'b': 1, 'd': 1}),
    'b': Counter({'b': 1, 'c': 1}),
    'c': Counter({'a': 1})
}
… so that you can get the relative frequency of 'a' given the key 'c' like this:
>>> freq['c']['a']
1
Because Counter objects return 0 for nonexistent keys, you'll also get zero frequencies as you would expect:
>>> freq['a']['c']
0
If you need a list of 3-tuples as specified in your question, you can get that with a little extra work. Here's a function to do it:
def triples(rule):
    freq = {k: Counter(v) for k, v in rule.items()}
    all_values = sorted(set().union(*rule.values()))
    sorted_keys = sorted(rule)
    return [(v, k, freq[k][v]) for v in all_values for k in sorted_keys]
The only thing here which I think may not be self-explanatory is the all_values = ... line, which:
creates an empty set()
produces the union() of that set with all the individual elements of the lists in rule.values() (note the use of the argument-unpacking * operator)
converts the result into a sorted() list.
If you still have the original text, you can avoid all that work by using e.g. all_values = sorted(set(original_text)) instead.
Here it is in action:
>>> triples({'a': ['b', 'd'], 'b': ['b', 'c'], 'c': ['a']})
[
('a', 'a', 0), ('a', 'b', 0), ('a', 'c', 1),
('b', 'a', 1), ('b', 'b', 1), ('b', 'c', 0),
('c', 'a', 0), ('c', 'b', 1), ('c', 'c', 0),
('d', 'a', 1), ('d', 'b', 0), ('d', 'c', 0)
]
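For the other half of the question, the total frequency of each value, one possible sketch is to flatten all the lists and count them in one pass. The variable name totals is just an illustrative choice.

```python
from collections import Counter
from itertools import chain

rule = {'a': ['b', 'd'], 'b': ['b', 'c'], 'c': ['a']}

# Flatten every successor list and count each value across the whole dict.
totals = Counter(chain.from_iterable(rule.values()))

print(totals['b'])  # 2
print(totals['z'])  # 0 (Counter returns 0 for values it has never seen)
```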
I cannot think of a quicker way than iterating over the word's characters, counting the occurrences in each list of the dictionary, and summing them at the end:
alphabet = sorted(set("abbcad"))
rule = {'a': ['b', 'd'], 'b': ['b', 'c'], 'c': ['a']}
totalMatrix = []
for elem in alphabet:
    total = 0
    occurrences = []
    for key in rule.keys():
        currentCount = rule[key].count(elem)
        total += currentCount
        occurrences.append((elem, key, currentCount))
    totalMatrix.append([elem, total] + occurrences)
for elem in totalMatrix:
    print(elem)
The content of totalMatrix will be:
['a', 1, ('a', 'a', 0), ('a', 'b', 0), ('a', 'c', 1)]
['b', 2, ('b', 'a', 1), ('b', 'b', 1), ('b', 'c', 0)]
['c', 1, ('c', 'a', 0), ('c', 'b', 1), ('c', 'c', 0)]
['d', 1, ('d', 'a', 1), ('d', 'b', 0), ('d', 'c', 0)]
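Since the stated goal is entropy, counts like these can be normalised into probabilities. Here is a rough sketch, assuming the quantity wanted is the Shannon entropy (in bits) of the successor distribution for each key. The question does not say which definition is required, and successor_entropy is a hypothetical helper name.

```python
from collections import Counter
from math import log2

def successor_entropy(successors):
    """Shannon entropy, in bits, of the empirical distribution of a list."""
    counts = Counter(successors)
    n = len(successors)
    # Each value has probability count / n; sum -p * log2(p) over all values.
    return -sum((c / n) * log2(c / n) for c in counts.values())

rule = {'a': ['b', 'd'], 'b': ['b', 'c'], 'c': ['a']}
entropies = {key: successor_entropy(values) for key, values in rule.items()}
```

With this rule, 'a' and 'b' each have two equally likely successors (1 bit), while 'c' has a single certain successor (0 bits).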

Fastest way to find unique combinations of list

I'm trying to solve the general problem of getting the unique combinations from a list in Python.
From https://www.mathsisfun.com/combinatorics/combinations-permutations-calculator.html I can see that the formula for the number of combinations is n!/(r!(n-r)!), where n is the length of the sequence and r is the number of items to choose.
This is shown by the following Python, where n is 4 and r is 2:
import itertools
lst = 'ABCD'
result = list(itertools.combinations(lst, len(lst)//2))
print(len(result))
6
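As a quick sanity check of that formula, the factorial expression can be compared against math.comb, which is available in Python 3.8 and later. The variable names here are illustrative only.

```python
from math import comb, factorial

n, r = 4, 2

# n! / (r! * (n - r)!) written out with integer division.
by_formula = factorial(n) // (factorial(r) * factorial(n - r))

print(by_formula)   # 6
print(comb(n, r))   # 6
```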
The following is a helper function to show the issue I have:
def C(lst):
    l = list(itertools.combinations(sorted(lst), len(lst)//2))
    s = set(l)
    print('actual', len(l), l)
    print('unique', len(s), list(s))
If I run this from iPython I can call it thus:
In [41]: C('ABCD')
actual 6 [('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]
unique 6 [('B', 'C'), ('C', 'D'), ('A', 'D'), ('A', 'B'), ('A', 'C'), ('B', 'D')]
In [42]: C('ABAB')
actual 6 [('A', 'A'), ('A', 'B'), ('A', 'B'), ('A', 'B'), ('A', 'B'), ('B', 'B')]
unique 3 [('A', 'B'), ('A', 'A'), ('B', 'B')]
In [43]: C('ABBB')
actual 6 [('A', 'B'), ('A', 'B'), ('A', 'B'), ('B', 'B'), ('B', 'B'), ('B', 'B')]
unique 2 [('A', 'B'), ('B', 'B')]
In [44]: C('AAAA')
actual 6 [('A', 'A'), ('A', 'A'), ('A', 'A'), ('A', 'A'), ('A', 'A'), ('A', 'A')]
unique 1 [('A', 'A')]
What I want is the unique count shown above, but generating all combinations and then building a set doesn't scale.
As the length of lst (n) grows, the number of combinations grows rapidly and the approach slows down.
Is there a way to use math or Python tricks to count the unique combinations?
Here's some Python code based on the generating function approach outlined in this Math Forum article. For each letter appearing in the input we create a polynomial 1 + x + x^2 + ... + x^k, where k is the number of times that the letter appears. We then multiply those polynomials together: the nth coefficient of the resulting polynomial then tells you how many combinations of length n there are.
We'll represent a polynomial simply as a list of its (integer) coefficients, with the first coefficient representing the constant term, the next coefficient representing the coefficient of x, and so on. We'll need to be able to multiply such polynomials, so here's a function for doing so:
def polymul(p, q):
    """
    Multiply two polynomials, represented as lists of coefficients.
    """
    r = [0]*(len(p) + len(q) - 1)
    for i, c in enumerate(p):
        for j, d in enumerate(q):
            r[i+j] += c*d
    return r
With the above in hand, the following function computes the number of combinations:
from collections import Counter
from functools import reduce
def ncombinations(it, k):
    """
    Number of combinations of length *k* of the elements of *it*.
    """
    counts = Counter(it).values()
    prod = reduce(polymul, [[1]*(count+1) for count in counts], [1])
    return prod[k] if k < len(prod) else 0
Testing this on your examples:
>>> ncombinations("abcd", 2)
6
>>> ncombinations("abab", 2)
3
>>> ncombinations("abbb", 2)
2
>>> ncombinations("aaaa", 2)
1
And on some longer examples, demonstrating that this approach is feasible even for long-ish inputs:
>>> ncombinations("abbccc", 3) # the math forum example
6
>>> ncombinations("supercalifragilisticexpialidocious", 10)
334640
>>> from itertools import combinations # double check ...
>>> len(set(combinations(sorted("supercalifragilisticexpialidocious"), 10)))
334640
>>> ncombinations("supercalifragilisticexpialidocious", 20)
1223225
>>> ncombinations("supercalifragilisticexpialidocious", 34)
1
>>> ncombinations("supercalifragilisticexpialidocious", 35)
0
>>> from string import printable
>>> ncombinations(printable, 50) # len(printable)==100
100891344545564193334812497256
>>> from math import factorial
>>> factorial(100)//factorial(50)**2 # double check the result
100891344545564193334812497256
>>> ncombinations("abc"*100, 100)
5151
>>> factorial(102)//factorial(2)//factorial(100) # double check (bars and stars)
5151
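If only one value of k is needed, the full product polynomial is more than you have to compute: coefficients above x^k never affect the answer. A possible variant that truncates at degree k is sketched below. The name ncombinations_truncated is hypothetical, and this is an illustration of the same generating-function idea, assuming k is a non-negative integer, rather than a tuned implementation.

```python
from collections import Counter

def ncombinations_truncated(it, k):
    """Count distinct k-combinations, keeping only coefficients up to x^k."""
    coeffs = [1] + [0] * k  # the polynomial "1", truncated at degree k
    for count in Counter(it).values():
        new = [0] * (k + 1)
        for i, c in enumerate(coeffs):
            if c:
                # Multiply by 1 + x + ... + x^count, dropping terms above x^k.
                for j in range(min(count, k - i) + 1):
                    new[i + j] += c
        coeffs = new
    return coeffs[k]

print(ncombinations_truncated("abab", 2))    # 3
print(ncombinations_truncated("abbccc", 3))  # 6
```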
Start with a regular recursive definition of combinations() but add a test to only recurse when the lead value at that level hasn't been used before:
def uniq_comb(pool, r):
    """ Return an iterator over all distinct r-length
    combinations taken from a pool of values that
    may contain duplicates.
    Unlike itertools.combinations(), element uniqueness
    is determined by value rather than by position.
    """
    if r:
        seen = set()
        for i, item in enumerate(pool):
            if item not in seen:
                seen.add(item)
                for tail in uniq_comb(pool[i+1:], r-1):
                    yield (item,) + tail
    else:
        yield ()
if __name__ == '__main__':
    from itertools import combinations
    pool = 'ABRACADABRA'
    for r in range(len(pool) + 1):
        assert set(uniq_comb(pool, r)) == set(combinations(pool, r))
        assert dict.fromkeys(uniq_comb(pool, r)) == dict.fromkeys(combinations(pool, r))
It seems that this is called a multiset combination. I faced the same problem and ended up rewriting a function from sympy (here).
Instead of passing your iterable to something like itertools.combinations(p, r), you pass collections.Counter(p).most_common() to the following function to retrieve the distinct combinations directly. It's a lot faster than filtering all combinations, and because it is a generator it never has to hold every combination in memory.
def counter_combinations(g, n):
    if sum(v for k, v in g) < n or not n:
        yield []
    else:
        for i, (k, v) in enumerate(g):
            if v >= n:
                yield [k]*n
                v = n - 1
            for v in range(min(n, v), 0, -1):
                for j in counter_combinations(g[i + 1:], n - v):
                    rv = [k]*v + j
                    if len(rv) == n:
                        yield rv
Here is an example:
from collections import Counter
p = Counter('abracadabra').most_common()
print(p)
c = [_ for _ in counter_combinations(p, 4)]
print(c)
print(len(c))
Output:
[('a', 5), ('b', 2), ('r', 2), ('c', 1), ('d', 1)]
[['a', 'a', 'a', 'a'], ['a', 'a', 'a', 'b'], ['a', 'a', 'a', 'r'], ['a', 'a', 'a', 'c'], ['a', 'a', 'a', 'd'], ['a', 'a', 'b', 'b'], ['a', 'a', 'b', 'r'], ['a', 'a', 'b', 'c'], ['a', 'a', 'b', 'd'], ['a', 'a', 'r', 'r'], ['a', 'a', 'r', 'c'], ['a', 'a', 'r', 'd'], ['a', 'a', 'c', 'd'], ['a', 'b', 'b', 'r'], ['a', 'b', 'b', 'c'], ['a', 'b', 'b', 'd'], ['a', 'b', 'r', 'r'], ['a', 'b', 'r', 'c'], ['a', 'b', 'r', 'd'], ['a', 'b', 'c', 'd'], ['a', 'r', 'r', 'c'], ['a', 'r', 'r', 'd'], ['a', 'r', 'c', 'd'], ['b', 'b', 'r', 'r'], ['b', 'b', 'r', 'c'], ['b', 'b', 'r', 'd'], ['b', 'b', 'c', 'd'], ['b', 'r', 'r', 'c'], ['b', 'r', 'r', 'd'], ['b', 'r', 'c', 'd'], ['r', 'r', 'c', 'd']]
31
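For cross-checking results like this on small inputs, a deliberately simple brute-force sketch is to enumerate how many copies of each distinct element to take and keep the choices that add up to the requested length. The function name multiset_combinations_by_counts is hypothetical. It visits every count vector, so it is meant as a readable reference and not as a fast replacement.

```python
from collections import Counter
from itertools import product

def multiset_combinations_by_counts(items, r):
    """Yield each distinct r-combination of items as a tuple."""
    counts = Counter(items)
    keys = list(counts)
    # Try every way of taking 0..count copies of each distinct element.
    for picks in product(*(range(counts[k] + 1) for k in keys)):
        if sum(picks) == r:
            yield tuple(x for k, n in zip(keys, picks) for x in [k] * n)

print(len(list(multiset_combinations_by_counts('abracadabra', 4))))  # 31
```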

Find combinations with arrays and a combination pattern

I have arrays such as these, and each pattern designates a combination shape with each number representing the size of the combination.
pattern 0: [1, 1, 1, 1]
pattern 1: [2, 1, 1]
pattern 2: [3, 1]
pattern 3: [4]
...
I also have a list of characters like the one below. len(chars) equals the sum of the values in each pattern above.
chars = ['A', 'B', 'C', 'D']
I want to find all combinations of chars following a given pattern. For example, for pattern 1, 4C2 * 2C1 * 1C1 is the number of combinations.
[['A', 'B'], ['C'], ['D']]
[['A', 'B'], ['D'], ['C']]
[['A', 'C'], ['B'], ['D']]
[['A', 'C'], ['D'], ['B']]
[['A', 'D'], ['B'], ['C']]
[['A', 'D'], ['C'], ['B']]
...
I don't know how to create such arrays of combinations. I know Python has a lot of useful functions for combinations, but I don't know how to use them to build a combination of combinations.
EDITED
I'm sorry that my explanation was confusing. Here is a simpler example.
pattern 0: [1, 1]
pattern 1: [2]
chars = ['A', 'B']
Then the result should be as shown below. The first dimension should be a permutation, but the second dimension should be a combination.
pat0: [['A'], ['B']]
pat0: [['B'], ['A']]
pat1: [['A', 'B']] # NOTE: [['B', 'A']] is same in my problem
You can use a recursive function that takes the first number in the pattern and generates all combinations of that length from the remaining items. It then recurses with the rest of the pattern, the remaining items, and the prefix built so far. Once all the numbers in the pattern have been consumed, it yields the prefix back up to the caller:
from itertools import combinations
pattern = [2, 1, 1]
chars = ['A', 'B', 'C', 'D']
def patterns(shape, items, prefix=None):
    if not shape:
        yield prefix
        return
    prefix = prefix or []
    for comb in combinations(items, shape[0]):
        child_items = items[:]
        for char in comb:
            child_items.remove(char)
        yield from patterns(shape[1:], child_items, prefix + [comb])
for pat in patterns(pattern, chars):
    print(pat)
Output:
[('A', 'B'), ('C',), ('D',)]
[('A', 'B'), ('D',), ('C',)]
[('A', 'C'), ('B',), ('D',)]
[('A', 'C'), ('D',), ('B',)]
[('A', 'D'), ('B',), ('C',)]
[('A', 'D'), ('C',), ('B',)]
[('B', 'C'), ('A',), ('D',)]
[('B', 'C'), ('D',), ('A',)]
[('B', 'D'), ('A',), ('C',)]
[('B', 'D'), ('C',), ('A',)]
[('C', 'D'), ('A',), ('B',)]
[('C', 'D'), ('B',), ('A',)]
Note that the above works only with Python 3 (3.3 or later), since it uses yield from.
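The count mentioned in the question (4C2 * 2C1 * 1C1 for pattern 1) can be used to check how many results to expect. A small sketch follows, assuming Python 3.8+ for math.comb; count_arrangements is a hypothetical helper name.

```python
from math import comb

def count_arrangements(shape):
    """Number of ways to fill the groups in shape from sum(shape) distinct items."""
    remaining = sum(shape)
    total = 1
    for size in shape:
        # Choose `size` items for this group from those not yet used.
        total *= comb(remaining, size)
        remaining -= size
    return total

print(count_arrangements([2, 1, 1]))  # 12
```

That agrees with the 12 rows printed for pattern [2, 1, 1].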

How to turn multiple lists into a list of sublists where each sublist is made up of the same index items across all lists?

How to turn multiple lists into one list of sublists, where each sublist is made up of the items at the same index across the original lists?
lsta = ['a','b','c','d']
lstb = ['a','b','c','d']
lstc = ['a','b','c','d']
Desired_List = [['a','a','a'],['b','b','b'],['c','c','c'],['d','d','d']]
I can't seem to use zip here, so how would I do this?
A list comprehension over zip gives a list of lists:
>>> [list(x) for x in zip(lsta, lstb, lstc)]
[['a', 'a', 'a'], ['b', 'b', 'b'], ['c', 'c', 'c'], ['d', 'd', 'd']]
Using zip, under duress:
>>> zip(lsta, lstb, lstc)
[('a', 'a', 'a'), ('b', 'b', 'b'), ('c', 'c', 'c'), ('d', 'd', 'd')]
In Python 3, zip returns an iterator, so you'll need to convert it to a list:
>>> list(zip(lsta, lstb, lstc))
[('a', 'a', 'a'), ('b', 'b', 'b'), ('c', 'c', 'c'), ('d', 'd', 'd')]
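If the lists are themselves stored in a list, so that their number is not known in advance, the same idea works with argument unpacking. A minimal sketch with made-up sample data (the values in lists are hypothetical, chosen so the regrouping is easy to see):

```python
# Hypothetical sample data: three lists of equal length.
lists = [['a', 'b', 'c'], [1, 2, 3], ['x', 'y', 'z']]

# zip(*lists) passes each inner list as a separate argument to zip,
# which then groups the items found at the same index.
transposed = [list(group) for group in zip(*lists)]

print(transposed)  # [['a', 1, 'x'], ['b', 2, 'y'], ['c', 3, 'z']]
```

Note that zip stops at the shortest input. If the lists can differ in length, itertools.zip_longest pads the missing positions instead.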

Given an iterable, how to apply a function in every possible combination?

Given the iterable [A, B, C] and the function f(x) I want to get the following:
[ A, B, C]
[ A, B, f(C)]
[ A, f(B), C]
[ A, f(B), f(C)]
[f(A), B, C]
[f(A), B, f(C)]
[f(A), f(B), C]
[f(A), f(B), f(C)]
Unfortunately I didn't find anything suitable in the itertools module.
>>> from itertools import product
>>> L = ["A", "B", "C"]
>>> def f(c): return c.lower()
...
>>> fL = [f(x) for x in L]
>>> for i in product(*zip(L, fL)):
...     print(i)
...
('A', 'B', 'C')
('A', 'B', 'c')
('A', 'b', 'C')
('A', 'b', 'c')
('a', 'B', 'C')
('a', 'B', 'c')
('a', 'b', 'C')
('a', 'b', 'c')
Explanation:
Call f for each item in L to generate fL
>>> fL
['a', 'b', 'c']
Use zip to zip the two lists into pairs
>>> list(zip(L, fL))
[('A', 'a'), ('B', 'b'), ('C', 'c')]
Take the cartesian product of those tuples using itertools.product
product(*zip(L, fL))
is equivalent to
product(*[('A', 'a'), ('B', 'b'), ('C', 'c')])
and that is equivalent to
product(('A', 'a'), ('B', 'b'), ('C', 'c'))
Looping over that product gives exactly the result we need.
You can use itertools.combinations, like this:
from itertools import combinations
def f(char):
    return char.lower()
iterable = ["A", "B", "C"]
indices = range(len(iterable))
for i in range(len(iterable) + 1):
    for items in combinations(indices, i):
        print([f(iterable[j]) if j in items else iterable[j] for j in range(len(iterable))])
Output
['A', 'B', 'C']
['a', 'B', 'C']
['A', 'b', 'C']
['A', 'B', 'c']
['a', 'b', 'C']
['a', 'B', 'c']
['A', 'b', 'c']
['a', 'b', 'c']
import itertools
def func_combinations(f, l):
    return itertools.product(*zip(l, map(f, l)))
Demo:
>>> for combo in func_combinations(str, range(3)):
...     print(combo)
...
(0, 1, 2)
(0, 1, '2')
(0, '1', 2)
(0, '1', '2')
('0', 1, 2)
('0', 1, '2')
('0', '1', 2)
('0', '1', '2')
This function first computes f once for every element of the input. Then, it uses zip to turn the input and the list of f values into a list of input-output pairs. Finally, it uses itertools.product to produce each possible way to select either input or output.
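Another way to look at the same problem is as a product of on/off flags, one per element, saying whether f is applied at that position. A sketch of that view is below; apply_in_all_combinations is a hypothetical name. Unlike the versions above it calls f again for every row instead of computing each f(x) once, so it is only a reasonable choice when f is cheap.

```python
from itertools import product

def apply_in_all_combinations(f, items):
    """Yield one list per on/off mask, applying f where the mask is True."""
    items = list(items)
    for mask in product((False, True), repeat=len(items)):
        yield [f(x) if use else x for x, use in zip(items, mask)]

for row in apply_in_all_combinations(str.lower, ["A", "B", "C"]):
    print(row)
```

The rows come out in the order listed in the question, starting with ['A', 'B', 'C'] and ending with ['a', 'b', 'c'].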
