I have a list of numbers, and I need all rearrangements of the list in which no member keeps its original position.
Example: if I have {1, 2, 3}, then {3, 2, 1} is unacceptable because "2" is in the same position. The only good results are {3, 1, 2} and {2, 3, 1}. I need this on a larger scale; so far I have this:
import itertools
x = [1, 2, 3, 4, 5, 6, 7]
y = 5
comb = []
comb.extend(itertools.combinations(x,y))
print(comb)
This gives me all the combinations there are, but I need to eliminate those that have the same member at the same position, preferably not just when printing but in the comb list itself.
Not sure if I understood ... but maybe this is what you want ...
from collections import deque
from itertools import islice
def function(x, y):
    Q = deque(x)
    states = set()
    for _ in range(len(x)):
        states.add(tuple(islice(Q, 0, y)))
        Q.appendleft(Q.pop())
    return states
x = [1, 2, 3, 4, 5, 6, 7]
y = 5
resp = function(x, y)
print(resp)
>>> {(5, 6, 7, 1, 2), (1, 2, 3, 4, 5), (7, 1, 2, 3, 4), (2, 3, 4, 5, 6), (4, 5, 6, 7, 1), (6, 7, 1, 2, 3), (3, 4, 5, 6, 7)}
From the expected output you've written, it seems you are looking for permutations rather than combinations.
import itertools
import numpy as np
x = [1, 2, 3]
y = 3
no_repeat_permutations = []
for candidate_perm in itertools.permutations(x, y):
    for p in map(np.array, no_repeat_permutations):
        if any(np.array(candidate_perm) == p):
            break
    else:
        no_repeat_permutations.append(candidate_perm)

print(no_repeat_permutations)
>>> [(1, 2, 3), (2, 3, 1), (3, 1, 2)]
I am using numpy for element-wise comparison: if any element-wise comparison against a previously kept result is True, we skip this permutation.
If we never find such a match, we fall through to the else clause and save the permutation.
For further explanation about for-else, see this SO question.
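If you'd rather not pull in numpy just for this, the same position check can be done with plain zip; a minimal sketch (assuming equal-length tuples):

def shares_position(candidate, kept):
    # True if any element of candidate equals the element of kept at the same index
    return any(a == b for a, b in zip(candidate, kept))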
Given a list and excluded elements, is it possible to skip the computation of combinations that contain these elements?
Example 1
Given l = [1, 2, 3, 4, 5], I want to calculate all combinations of size 4, excluding combinations that contain (1, 3), before they are even generated.
The results would be:
All results: Wanted results:
[1, 2, 3, 4] [1, 2, 4, 5]
[1, 2, 3, 5] [2, 3, 4, 5]
[1, 2, 4, 5]
[1, 3, 4, 5]
[2, 3, 4, 5]
All combinations that contained 1 and 3 have been removed.
Example 2
Suggested by @Eric Duminil:
the result for l = [1, 2, 3, 4, 5, 6], size 4, and
excluding (1, 2, 3) in the second column
excluding (1, 2) in the third column
All results: Wanted results 1 Wanted results 2
(Excluding [1, 2, 3]): (Excluding [1, 2])
[1, 2, 3, 4] [1, 2, 4, 5] [1, 3, 4, 5]
[1, 2, 3, 5] [1, 2, 4, 6] [1, 3, 4, 6]
[1, 2, 3, 6] [1, 2, 5, 6] [1, 3, 5, 6]
[1, 2, 4, 5] [1, 3, 4, 5] [1, 4, 5, 6]
[1, 2, 4, 6] [1, 3, 4, 6] [2, 3, 4, 5]
[1, 2, 5, 6] [1, 3, 5, 6] [2, 3, 4, 6]
[1, 3, 4, 5] [1, 4, 5, 6] [2, 3, 5, 6]
[1, 3, 4, 6] [2, 3, 4, 5] [2, 4, 5, 6]
[1, 3, 5, 6] [2, 3, 4, 6] [3, 4, 5, 6]
[1, 4, 5, 6] [2, 3, 5, 6]
[2, 3, 4, 5] [2, 4, 5, 6]
[2, 3, 4, 6] [3, 4, 5, 6]
[2, 3, 5, 6]
[2, 4, 5, 6]
[3, 4, 5, 6]
All combinations that contained 1, 2 and 3 have been removed from wanted results 1.
All combinations that contained 1 and 2 have been removed from wanted results 2.
I have much bigger combinations to compute, but it takes a lot of time, and I want to reduce that time using these exclusions.
Tried solutions
With method 1, the combinations are still generated before being filtered out.
With method 2, I tried to modify the combinations function, but I could not find a proper way to skip my exclusion list before it is generated.
Method 1:

def main():
    l = list(range(1, 6))
    comb = combinations(l, 4)

    for i in comb:
        if set([1, 3]).issubset(i):
            continue
        else:
            process()

Method 2:

def combinations(iterable, r):
    pool = tuple(iterable)
    n = len(pool)
    if r > n:
        return
    indices = list(range(r))
    yield tuple(pool[i] for i in indices)
    while True:
        for i in reversed(range(r)):
            if indices[i] != i + n - r:
                break
        else:
            return
        indices[i] += 1
        for j in range(i+1, r):
            indices[j] = indices[j-1] + 1
        yield tuple(pool[i] for i in indices)
EDIT:
First of all, thank you all for your help; I forgot to give more details about the constraints.
The order of the outputs is not relevant. For example, whether the result is [1, 2, 4, 5] [2, 3, 4, 5] or [2, 3, 4, 5] [1, 2, 4, 5] is not important.
The elements of each combination should be (if possible) sorted: [1, 2, 4, 5] [2, 3, 4, 5] and not [2, 1, 5, 4] [3, 2, 4, 5], but it is not important since the combinations could be sorted afterwards.
The exclusion list is a list of items that should not all appear together in a combination. E.g. if my exclusion list is (1, 2, 3), all combinations that contain 1 and 2 and 3 should not be computed. However, combinations with 1 and 2 but not 3 are allowed. In that case, excluding combinations with both (1, 2) and (1, 2, 3) is completely useless, since all combinations that would be filtered by (1, 2, 3) are already filtered by (1, 2).
Multiple exclusion lists must be possible, because I use multiple constraints on my combinations.
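To illustrate the redundancy rule above, here is a small hypothetical helper (prune_exclusions is not part of the question) that drops any exclusion set that is a superset of another:

def prune_exclusions(excludes):
    # with [(1, 2), (1, 2, 3)], the set {1, 2, 3} is redundant and is dropped
    sets = [set(e) for e in excludes]
    return [e for e in sets if not any(o < e for o in sets)]

print(prune_exclusions([(1, 2), (1, 2, 3)]))  # [{1, 2}]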
Tested answers
@tobias_k
This solution treats the exclusion list (1, 2, 3) as an OR exclusion, meaning (1, 2), (2, 3) and (1, 3) will all be excluded, if I understood well. This is useful in some cases, but not in my current problem; I modified the question to give more details, sorry for the confusion. In your answer, I can't use just the lists (1, 2) and (1, 3) as exclusions, as you pointed out. However, the big advantage of this solution is that it permits multiple exclusions.
@Kasramvd and @mikuszefski
Your solution is really close to what I want; if it supported multiple exclusion lists, it would be the answer.
Thanks
(Turns out this does not do exactly what OP wants. Still leaving this here as it might help others.)
To include mutually exclusive elements, you could wrap those in lists within the list, get the combinations of those, and then the product of the combinations of sub-lists:
>>> from itertools import combinations, product
>>> l = [[1, 3], [2], [4], [5]]
>>> [c for c in combinations(l, 4)]
[([1, 3], [2], [4], [5])]
>>> [p for c in combinations(l, 4) for p in product(*c)]
[(1, 2, 4, 5), (3, 2, 4, 5)]
A more complex example:
>>> l = [[1, 3], [2, 4, 5], [6], [7]]
>>> [c for c in combinations(l, 3)]
[([1, 3], [2, 4, 5], [6]),
([1, 3], [2, 4, 5], [7]),
([1, 3], [6], [7]),
([2, 4, 5], [6], [7])]
>>> [p for c in combinations(l, 3) for p in product(*c)]
[(1, 2, 6),
(1, 4, 6),
... 13 more ...
(4, 6, 7),
(5, 6, 7)]
This does not generate any "junk" combinations to be filtered out afterwards. However, it assumes that you want at most one element from each "exclusive" group, e.g. in the second example, it not only prevents combinations with 2,4,5, but also those with 2,4, 4,5, or 2,5. Also, it is not possible (or at least not easy) to have exclusively one of 1,3, and 1,5, but allow for 3,5. (It might be possible to extend it to those cases, but I'm not yet sure if and how.)
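To see the at-most-one-per-group behavior concretely (same approach, smaller example; note that no result contains two elements of the exclusive group [2, 4, 5]):

>>> l = [[2, 4, 5], [6], [7], [8]]
>>> [p for c in combinations(l, 3) for p in product(*c)]
[(2, 6, 7), (4, 6, 7), (5, 6, 7),
 (2, 6, 8), (4, 6, 8), (5, 6, 8),
 (2, 7, 8), (4, 7, 8), (5, 7, 8),
 (6, 7, 8)]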
You can wrap this in a function, deriving the slightly different input format from your (presumed) format and returning an accordant generator expression. Here, lst is the list of elements, r the number of items per combination, and exclude_groups a list of groups of mutually-exclusive elements:
from itertools import combinations, product
def comb_with_excludes(lst, r, exclude_groups):
    ex_set = {e for es in exclude_groups for e in es}
    tmp = exclude_groups + [[x] for x in lst if x not in ex_set]
    return (p for c in combinations(tmp, r) for p in product(*c))
lst = [1, 2, 3, 4, 5, 6, 7]
excludes = [[1, 3], [2, 4, 5]]
for x in comb_with_excludes(lst, 3, excludes):
    print(x)
(As it turned out that my previous answer does not really satisfy the constraints of the question, here's another one. I'm posting this as a separate answer, as the approach is vastly different and the original answer may still help others.)
You can implement this recursively, each time before recursing to add another element to the combination checking whether that would violate one of the exclude-sets. This does not generate any invalid combinations, and it works with overlapping exclude-sets (like (1,3), (1,5)) and with exclude-sets of more than two elements (like (2,4,5), allowing any combination except all of them together).
def comb_with_excludes(lst, n, excludes, i=0, taken=()):
    if n == 0:
        yield taken  # no more needed
    elif i <= len(lst) - n:
        t2 = taken + (lst[i],)  # add current element
        if not any(e.issubset(t2) for e in excludes):
            yield from comb_with_excludes(lst, n-1, excludes, i+1, t2)
        if i < len(lst) - n:  # skip current element
            yield from comb_with_excludes(lst, n, excludes, i+1, taken)
Example:
>>> lst = [1, 2, 3, 4, 5, 6]
>>> excludes = [{1, 3}, {1, 5}, {2, 4, 5}]
>>> list(comb_with_excludes(lst, 4, excludes))
[(1, 2, 4, 6), (2, 3, 4, 6), (2, 3, 5, 6), (3, 4, 5, 6)]
Well, I did time it now, and it turns out this is considerably slower than naively using itertools.combinations in a generator expression with a filter, much like you already do:
import itertools

def comb_naive(lst, r, excludes):
    return (comb for comb in itertools.combinations(lst, r)
            if not any(e.issubset(comb) for e in excludes))
Calculating the combinations in Python is just slower than using the library (which is probably implemented in C) and filtering the results afterwards. Depending on the amount of combinations that can be excluded, this might be faster in some cases, but to be honest, I have my doubts.
You could get better results if you can use itertools.combinations for subproblems, as in Kasramvd's answer, but for multiple, non-disjunct exclude-sets that's more difficult. One way might be to separate the elements of the list into two sets: those that have constraints, and those that don't. Then use itertools.combinations for both, but check the constraints only for the combinations of those elements where they matter. You still have to check and filter the results, but only a part of them. (One caveat, though: the results are not generated in order, and the order of the elements within the yielded combinations is somewhat messed up, too.)
def comb_with_excludes2(lst, n, excludes):
    wout_const = [x for x in lst if not any(x in e for e in excludes)]
    with_const = [x for x in lst if any(x in e for e in excludes)]
    k_min, k_max = max(0, n - len(wout_const)), min(n, len(with_const))
    return (c1 + c2 for k in range(k_min, k_max + 1)  # k_max itself must be included
            for c1 in itertools.combinations(with_const, k)
            if not any(e.issubset(c1) for e in excludes)
            for c2 in itertools.combinations(wout_const, n - k))
This is already much better than the recursive pure-Python solution, but still not as good as the "naive" approach for the above example:
>>> lst = [1, 2, 3, 4, 5, 6]
>>> excludes = [{1, 3}, {1, 5}, {2, 4, 5}]
>>> %timeit list(comb_with_excludes(lst, 4, excludes))
10000 loops, best of 3: 42.3 µs per loop
>>> %timeit list(comb_with_excludes2(lst, 4, excludes))
10000 loops, best of 3: 22.6 µs per loop
>>> %timeit list(comb_naive(lst, 4, excludes))
10000 loops, best of 3: 16.4 µs per loop
However, the results depend very much on the input. For a larger list, with constraints only applying to a few of those elements, this approach is in fact faster than the naive one:
>>> lst = list(range(20))
>>> %timeit list(comb_with_excludes(lst, 4, excludes))
10 loops, best of 3: 15.1 ms per loop
>>> %timeit list(comb_with_excludes2(lst, 4, excludes))
1000 loops, best of 3: 558 µs per loop
>>> %timeit list(comb_naive(lst, 4, excludes))
100 loops, best of 3: 5.9 ms per loop
From an algorithmic perspective, you can separate the excluded items from the rest of the valid items, calculate the combinations of each set separately, and just concatenate the results based on the desired length. This approach entirely avoids including all the excluded items at once in a combination, but it does not preserve the usual output order.
from itertools import combinations
def comb_with_exclude(iterable, comb_num, excludes):
    iterable = tuple(iterable)
    ex_len = len(excludes)
    n = len(iterable)
    if comb_num < ex_len or comb_num > n:
        yield from combinations(iterable, comb_num)
    else:
        rest = [i for i in iterable if not i in excludes]
        ex_comb_range = range(0, ex_len)
        rest_comb_range = range(comb_num, comb_num - ex_len, -1)
        # the sum of each pair is equal to comb_num
        pairs = zip(ex_comb_range, rest_comb_range)
        for i, j in pairs:
            for p in combinations(excludes, i):
                for k in combinations(rest, j):
                    yield k + p
        """
        Note that instead of those nested loops you could wrap the combinations
        within a product function like the following:

        for p, k in product(combinations(excludes, i), combinations(rest, j)):
            yield k + p
        """
Demo:
l = [1, 2, 3, 4, 5, 6, 7, 8]
ex = [2, 5, 6]
print(list(comb_with_exclude(l, 6, ex)))
[(1, 3, 4, 7, 8, 2), (1, 3, 4, 7, 8, 5), (1, 3, 4, 7, 8, 6), (1, 3, 4, 7, 2, 5), (1, 3, 4, 8, 2, 5), (1, 3, 7, 8, 2, 5), (1, 4, 7, 8, 2, 5), (3, 4, 7, 8, 2, 5), (1, 3, 4, 7, 2, 6), (1, 3, 4, 8, 2, 6), (1, 3, 7, 8, 2, 6), (1, 4, 7, 8, 2, 6), (3, 4, 7, 8, 2, 6), (1, 3, 4, 7, 5, 6), (1, 3, 4, 8, 5, 6), (1, 3, 7, 8, 5, 6), (1, 4, 7, 8, 5, 6), (3, 4, 7, 8, 5, 6)]
l = [1, 2, 3, 4, 5]
ex = [1, 3]
print(list(comb_with_exclude(l, 4, ex)))
[(2, 4, 5, 1), (2, 4, 5, 3)]
Benchmark with other answers:
Results: this approach is faster than the others.
# this answer
In [169]: %timeit list(comb_with_exclude(lst, 3, excludes[0]))
100000 loops, best of 3: 6.47 µs per loop
# tobias_k
In [158]: %timeit list(comb_with_excludes(lst, 3, excludes))
100000 loops, best of 3: 13.1 µs per loop
# Vikas Damodar
In [166]: %timeit list(combinations_exc(lst, 3))
10000 loops, best of 3: 148 µs per loop
# mikuszefski
In [168]: %timeit list(sub_without(lst, 3, excludes[0]))
100000 loops, best of 3: 12.52 µs per loop
I have made an attempt to edit combinations according to your requirements:
def combinations(iterable, r):
    # combinations('ABCD', 2) --> AB AC AD BC BD CD
    # combinations(range(4), 3) --> 012 013 023 123
    pool = tuple(iterable)
    n = len(pool)
    if r > n:
        return
    indices = list(range(r))
    first = tuple(pool[i] for i in indices)
    if not (1 in first and 3 in first):  # the first combination needs the same check
        yield first
    while True:
        for i in reversed(range(r)):
            if indices[i] != i + n - r:
                break
        else:
            return
        indices[i] += 1
        for j in range(i+1, r):
            indices[j] = indices[j-1] + 1
        comb = tuple(pool[i] for i in indices)
        if not (1 in comb and 3 in comb):
            yield comb

d = combinations(list(range(1, 6)), 4)
for i in d:
    print(i)
It will return something like this:
(1, 2, 4, 5)
(2, 3, 4, 5)
I did the exclusion during the combination generation using the following code, to save the time of a second loop. You just need to pass the indices of the excluded elements as a set.
Update: working fiddle
from itertools import permutations

def combinations(iterable, r, combIndicesExclusions=set()):
    pool = tuple(iterable)
    n = len(pool)
    for indices in permutations(range(n), r):
        if ((len(combIndicesExclusions) == 0
                or not combIndicesExclusions.issubset(indices))
                and sorted(indices) == list(indices)):
            yield tuple(pool[i] for i in indices)

l = list(range(1, 6))
comb = combinations(l, 4, set([0, 2]))
print(list(comb))
I guess my answer is similar to some others here, but this is what I fiddled together in parallel:
from itertools import combinations, product
"""
with help from
https://stackoverflow.com/questions/374626/how-can-i-find-all-the-subsets-of-a-set-with-exactly-n-elements
https://stackoverflow.com/questions/32438350/python-merging-two-lists-with-all-possible-permutations
https://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python
"""
def sub_without(S, m, forbidden):
    out = []
    allowed = [s for s in S if s not in forbidden]
    N = len(allowed)
    for k in range(len(forbidden)):
        addon = [list(x) for x in combinations(forbidden, k)]
        if N + k >= m:
            base = [list(x) for x in combinations(allowed, m - k)]
            leveltotal = [[item for sublist in x for item in sublist] for x in product(base, addon)]
            out += leveltotal
    return out

val = sub_without(range(6), 4, [1, 3, 5])
for x in val:
    print(sorted(x))
>>>
[0, 1, 2, 4]
[0, 2, 3, 4]
[0, 2, 4, 5]
[0, 1, 2, 3]
[0, 1, 2, 5]
[0, 2, 3, 5]
[0, 1, 3, 4]
[0, 1, 4, 5]
[0, 3, 4, 5]
[1, 2, 3, 4]
[1, 2, 4, 5]
[2, 3, 4, 5]
Algorithmically, you have to calculate the combinations of the items in your list that are not among the excluded ones, then add the respective combinations of the excluded items to the combinations of the rest. This approach requires a lot of checking and needs to keep track of indices, so even if you do it in Python it won't give you a notable performance difference over just calculating all combinations and filtering the unwanted items out (this is a known drawback of constraint satisfaction problems).
Therefore, I think this is the best way to go in most cases:
In [77]: from itertools import combinations, filterfalse
In [78]: list(filterfalse({1, 3}.issubset, combinations(l, 4)))
Out[78]: [(1, 2, 4, 5), (2, 3, 4, 5)]
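If you need several exclusion sets at once, the same filtering idea extends naturally; this is a sketch, not benchmarked:

from itertools import combinations

l = [1, 2, 3, 4, 5]
excludes = [{1, 3}, {2, 5}]
result = [c for c in combinations(l, 3)
          if not any(e.issubset(c) for e in excludes)]
print(result)  # [(1, 2, 4), (1, 4, 5), (2, 3, 4), (3, 4, 5)]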
I used this way to make all combinations:
import itertools
lst = [[1, 2, 3], [1,2,2,4]]
combs = []
for i in xrange(1, len(lst) + 1):
    combs.append(i)
    els = [list(x) for x in itertools.combinations(lst, i)]
    combs.append(els)
But what I want is for each list to include every possible combination of its own elements. With the solution above, the element pairs are scattered. How can I solve this?
Are you looking for something like this?
import itertools
lst = [1, 2, 3], [1, 2, 2, 4]
combs = []
for i in range(len(lst)):
    els = [list(x) for x in itertools.combinations(lst[i], 2)]
    combs.append(els)
print(combs)
Output
[[[1, 2], [1, 3], [2, 3]], [[1, 2], [1, 2], [1, 4], [2, 2], [2, 4], [2, 4]]]
Since you said you want each element in its own list made into pairs and put back in that list, I am assuming that [1, 2, 3] should be converted to [[1, 2], [1, 3], [2, 3]]. The above example does exactly that!
Update
If you want to generate all possible combinations (of length greater than 1) from those lists, then you can do something like this.
import itertools
lst = [1, 2, 3], [1, 2, 2, 4]
combs = []
for i in range(len(lst)):
    temp_list = []
    for j in range(len(lst[i]) + 1):
        if j < 2:  # skipping zero- and one-length combinations
            continue
        els = [x for x in itertools.combinations(lst[i], j)]
        temp_list.extend(els)
    # remove duplicate combinations
    new_list = []
    [new_list.append(i) for i in temp_list if not new_list.count(i)]
    combs.append(new_list)
print(combs)
Output
[[(1, 2), (1, 3), (2, 3), (1, 2, 3)], [(1, 2), (1, 4), (2, 2), (2, 4),
(1, 2, 2), (1, 2, 4), (2, 2, 4), (1, 2, 2, 4)]]
I'm trying to find a performant solution in Python that works like so:
>>> func([1,2,3], [1,2])
[(1,1), (1,2), (1,3), (2,2), (2,3)]
This is similar to itertools.combinations_with_replacement, except that it can take multiple iterables. It's also similar to itertools.product, except that it omits order-independent duplicate results.
All of the inputs will be prefixes of the same series (i.e. they all start with the same element and follow the same pattern, but might have different lengths).
The function must be able to take any number of iterables as input.
Given a set of lists A, B, C, ..., here is a sketch of an algorithm that generates those results.
assert len(A) <= len(B) <= len(C) <= ...
for i in 0..len(A):
    for j in i..len(B):
        for k in j..len(C):
            ...
            yield A[i], B[j], C[k], ...
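For concreteness, here is a runnable translation of that sketch for exactly two prefix lists (func2 is just an illustration; the shorter list comes first):

def func2(A, B):
    # assumes A and B are prefixes of the same series and len(A) <= len(B)
    return [(A[i], B[j]) for i in range(len(A)) for j in range(i, len(B))]

print(func2([1, 2], [1, 2, 3]))
# [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3)]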
Things I can't do
Use itertools.product and filter the results. This has to be performant.
Use recursion. The function overhead would make it slower than using itertools.product and filtering for a reasonable number of iterables.
I suspect there's a way to do this with itertools, but I have no idea what it is.
EDIT: I'm looking for the solution that takes the least time.
EDIT 2: There seems to be some confusion about what I'm trying to optimize. I'll illustrate with an example.
>>> len(list(itertools.product( *[range(8)] * 5 )))
32768
>>> len(list(itertools.combinations_with_replacement(range(8), 5)))
792
The first line gives the number of order-dependent possibilities for rolling 5 8-sided dice. The second gives the number of order-independent possibilities. Regardless of how performant itertools.product is, it'll take 2 orders of magnitude more iterations to get a result than itertools.combinations_with_replacement. I'm trying to find a way to do something similar to itertools.combinations_with_replacement, but with multiple iterables, that minimizes the number of iterations, or time performance. (product runs in O(M^N), whereas combinations_with_replacement runs in O(C(M+N-1, N)), the binomial coefficient "M+N-1 choose N", where M is the number of sides on the die and N is the number of dice.)
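As a sanity check on those counts (a throwaway snippet, not part of the question):

from math import factorial

def multiset_count(M, N):
    # number of order-independent outcomes: C(M + N - 1, N)
    return factorial(M + N - 1) // (factorial(N) * factorial(M - 1))

print(8 ** 5, multiset_count(8, 5))  # 32768 792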
This solution uses neither recursion nor filtering. It produces only ascending sequences of indices, so it is usable only for prefixes of the same collection. It also identifies elements purely by their indices, so it does not require the elements of the series to be comparable or even hashable.
def prefixCombinations(coll, prefixes):
    "produces combinations of elements of the same collection prefixes"
    prefixes = sorted(prefixes)  # does not impact the result, since combinations are unordered
    n = len(prefixes)
    indices = [0] * n
    while True:
        yield tuple(coll[indices[i]] for i in range(n))
        # searching backwards for a non-maximum index
        for i in range(n - 1, -1, -1):
            if indices[i] < prefixes[i] - 1:
                break
        # if all indices hit their maximum - leave
        else:
            break
        level = indices[i] + 1
        for i in range(i, n):
            indices[i] = level
Examples:
>>> list(prefixCombinations([1,2,3,4,5], (3,2)))
[(1, 1), (1, 2), (1, 3), (2, 2), (2, 3)]
>>> list(prefixCombinations([1,2,3,4,5], (3,2,5)))
[(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 1, 4), (1, 1, 5), (1, 2, 2), (1, 2, 3), (1, 2, 4), (1, 2, 5), (1, 3, 3), (1, 3, 4), (1, 3, 5), (2, 2, 2), (2, 2, 3), (2, 2, 4), (2, 2, 5), (2, 3, 3), (2, 3, 4), (2, 3, 5)]
>>> from itertools import combinations_with_replacement
>>> tuple(prefixCombinations(range(10),[10]*4)) == tuple(combinations_with_replacement(range(10),4))
True
Since this is a generator, it doesn't meaningfully change the performance; it just wraps an O(n) filtering pass around itertools.product:
import itertools
def product(*args):
    for a, b in itertools.product(*args):
        if a >= b:
            yield b, a

print(list(product([1, 2, 3], [1, 2])))
Output:
[(1, 1), (1, 2), (2, 2), (1, 3), (2, 3)]
Or even:
product = lambda a, b: ((y, x) for x in a for y in b if x >= y)
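For example (assuming the same two inputs as above), list(product([1, 2, 3], [1, 2])) with this lambda yields the same [(1, 1), (1, 2), (2, 2), (1, 3), (2, 3)].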
Here is an implementation.
The idea is to use sorted containers to impose a canonical order and avoid duplicates that way, so I'm not generating duplicates at any step and avoid the need for filtering later.
It relies on the "sortedcontainers" library, which provides sorted containers as fast as a C implementation. [I'm not affiliated with this library in any manner]
from sortedcontainers import SortedList as SList
#see at http://www.grantjenks.com/docs/sortedcontainers/
def order_independant_combination(*args):
    filtered = 0
    previous = set()
    current = set()
    for iterable in args:
        if not previous:
            for elem in iterable:
                current.add(tuple([elem]))
        else:
            for elem in iterable:
                for combination in previous:
                    newCombination = SList(combination)
                    newCombination.add(elem)
                    newCombination = tuple(newCombination)
                    if not newCombination in current:
                        current.add(newCombination)
                    else:
                        filtered += 1
        previous = current
        current = set()
    if filtered != 0:
        print("{0} duplicates have been filtered during generation process".format(filtered))
    return list(SList(previous))

if __name__ == "__main__":
    result = order_independant_combination([1, 2, 3], [1, 2])
    print("Generated a result of length {0} that is {1}".format(len(result), result))
Execution gives:
[(1, 1), (1, 2), (1, 3), (2, 2), (2, 3)]
You can test adding more iterables as parameters; it works.
Hope it can at least help you, if not solve your problem.
EDIT: to answer the comment: this is not a good analysis. Filtering duplicates during generation is far more effective than using itertools.product and then filtering the duplicate results. In fact, eliminating a duplicate result at one step avoids generating duplicate solutions in all the following steps.
Executing this:
if __name__ == "__main__":
    result = order_independant_combination([1, 2, 3], [1, 2], [1, 2], [1, 2])
    print("Generated a result of length {0} that is {1}".format(len(result), result))
I got the following result:
9 duplicates have been filtered during generation process
Generated a result of length 9 that is [(1, 1, 1, 1), (1, 1, 1, 2), (1, 1, 1, 3), (1, 1, 2, 2), (1, 1, 2, 3), (1, 2, 2, 2), (1, 2, 2, 3), (2, 2, 2, 2), (2, 2, 2, 3)]
While using itertools I got this:
>>> import itertools
>>> c = list(itertools.product([1,2,3],[1,2],[1,2],[1,2]))
>>> c
[(1, 1, 1, 1), (1, 1, 1, 2), (1, 1, 2, 1), (1, 1, 2, 2), (1, 2, 1, 1), (1, 2, 1, 2), (1, 2, 2, 1), (1, 2, 2, 2), (2, 1, 1, 1), (2, 1, 1, 2), (2, 1, 2, 1), (2, 1, 2, 2), (2, 2, 1, 1), (2, 2, 1, 2), (2, 2, 2, 1), (2, 2, 2, 2), (3, 1, 1, 1), (3, 1, 1, 2), (3, 1, 2, 1), (3, 1, 2, 2), (3, 2, 1, 1), (3, 2, 1, 2), (3, 2, 2, 1), (3, 2, 2, 2)]
>>> len(c)
24
A simple calculation gives this:
pruned generation: 9 results + 9 filtered elements -> 18 elements generated.
itertools: 24 elements generated.
And the more iterables you pass, and the longer they are, the more important the difference will be.
Example:
result = order_independant_combination([1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5])
print("Generated a result of length {0} that is {1}".format(len(result), result))
Result:
155 duplicates have been filtered during generation process
Generated a result of length 70 ...
Itertools:
>>> len(list(itertools.product([1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5])))
625
Difference of 400 elements.
EDIT 2: with *[range(8)] * 5 it gives: 2674 duplicates have been filtered during generation process. Generated a result of length 792...
I have a list similar to:
[1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]
I want to create a list of x random elements from this list where none of the chosen elements are the same. The difficult part is that I would like to do this using a list comprehension...
So possible results if x = 3 would be:
[1, 2, 3]
[2, 4, 5]
[3, 1, 4]
[4, 5, 1]
etc...
Thanks!
I should have specified that I cannot convert the list to a set. Sorry!
I need the randomly selected numbers to be weighted. So if 1 appears 4 times in the list and 3 appears 2 times in the list, then 1 is twice as likely to be selected...
Disclaimer: the "use a list comprehension" requirement is absurd.
Moreover, if you want to use the weights, there are many excellent approaches listed at Eli Bendersky's page on weighted random sampling.
The following is inefficient, doesn't scale, etc., etc.
That said, it has not one but two (TWO!) list comprehensions, returns a list, never duplicates elements, and respects the weights in a sense:
>>> s = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]
>>> [x for x in random.choice([p for c in itertools.combinations(s, 3) for p in itertools.permutations(c) if len(set(c)) == 3])]
[3, 1, 2]
>>> [x for x in random.choice([p for c in itertools.combinations(s, 3) for p in itertools.permutations(c) if len(set(c)) == 3])]
[5, 3, 4]
>>> [x for x in random.choice([p for c in itertools.combinations(s, 3) for p in itertools.permutations(c) if len(set(c)) == 3])]
[1, 5, 2]
.. or, as simplified by FMc:
>>> [x for x in random.choice([p for p in itertools.permutations(s, 3) if len(set(p)) == 3])]
[3, 5, 2]
(I'll leave the x for x in there, even though it hurts not to simply write list(random.choice(..)) or just leave it as a tuple..)
Generally, you don't want to do this sort of thing in a list comprehension -- it'll lead to much harder-to-read code. However, if you really must, we can write a completely horrible 1-liner:
>>> values = [random.randint(0,10) for _ in xrange(12)]
>>> values
[1, 10, 6, 6, 3, 9, 0, 1, 8, 9, 1, 2]
>>> # This is the 1 liner -- The other line was just getting us a list to work with.
>>> [(lambda x=random.sample(values,3):any(values.remove(z) for z in x) or x)() for _ in xrange(4)]
[[6, 1, 8], [1, 6, 10], [1, 0, 2], [9, 3, 9]]
Please never use this code -- I only post it for fun/academic reasons.
Here's how it works:
I create a function inside the list comprehension with a default argument of 3 randomly selected elements from the input list. Inside the function, I remove the elements from values so that they aren't available to be picked again. Since list.remove returns None, I can use any(lst.remove(x) for x in ...) to remove the values and return False. Since any returns False, we hit the or clause, which just returns x (the default value with 3 randomly selected items) when the function is called. All that is left then is to call the function and let the magic happen.
The one catch here is that you need to make sure that the number of groups you request (here I chose 4) multiplied by the number of items per group (here I chose 3) is less than or equal to the number of values in your input list. It may seem obvious, but it's probably worth mentioning anyway...
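That precondition could be checked up front; num_groups and group_size here are hypothetical names for the two numbers just mentioned:

num_groups, group_size = 4, 3
assert num_groups * group_size <= len(values), "not enough values to sample from"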
Here's another version where I pull shuffle into the list comprehension:
>>> lst = [random.randint(0,10) for _ in xrange(12)]
>>> lst
[3, 5, 10, 9, 10, 1, 6, 10, 4, 3, 6, 5]
>>> [lst[i*3:i*3+3] for i in xrange(shuffle(lst) or 4)]
[[6, 10, 6], [3, 4, 10], [1, 3, 5], [9, 10, 5]]
This is significantly better than my first attempt; however, most people would still need to stop and scratch their heads a bit before figuring out what this code is doing. I still assert that it would be much better to do this in multiple lines.
If I'm understanding your question properly, this should work:

import random

def weighted_sample(L, x):
    # might consider raising some kind of exception if len(set(L)) < x
    while True:
        ans = random.sample(L, x)
        if len(set(ans)) == x:
            return ans
Then if you want many such samples you can just do something like:
[weighted_sample(L, x) for _ in range(num_samples)]
I have a hard time conceiving of a comprehension for the sampling logic that isn't just obfuscated. The logic is a bit too complicated. It sounds like something randomly tacked on to a homework assignment to me.
If you don't like infinite looping, I haven't tried it but I think this will work:
import collections
import random

def weighted_sample(L, x):
    ans = []
    c = collections.Counter(L)
    while len(ans) < x:
        r = random.randrange(sum(c.values()))  # randrange keeps r strictly below the total
        for k in c:
            if r < c[k]:
                ans.append(k)
                del c[k]
                break
            else:
                r -= c[k]
        else:
            # maybe throw an exception, since this should never happen on valid input
            pass
    return ans
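A quick usage sketch (assuming the imports above; the output is random, so this is just one possible run):

>>> weighted_sample([1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2], 3)
[2, 1, 4]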
First of all, I assume your list looks like
[1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]
If you want to print the permutations of size 3 from the given list, you can do the following.
import itertools

l = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]
for permutation in itertools.permutations(list(set(l)), 3):
    print permutation,
Output:
(1, 2, 3) (1, 2, 4) (1, 2, 5) (1, 3, 2) (1, 3, 4) (1, 3, 5) (1, 4, 2) (1, 4, 3) (1, 4, 5) (1, 5, 2) (1, 5, 3) (1, 5, 4) (2, 1, 3) (2, 1, 4) (2, 1, 5) (2, 3, 1) (2, 3, 4) (2, 3, 5) (2, 4, 1) (2, 4, 3) (2, 4, 5) (2, 5, 1) (2, 5, 3) (2, 5, 4) (3, 1, 2) (3, 1, 4) (3, 1, 5) (3, 2, 1) (3, 2, 4) (3, 2, 5) (3, 4, 1) (3, 4, 2) (3, 4, 5) (3, 5, 1) (3, 5, 2) (3, 5, 4) (4, 1, 2) (4, 1, 3) (4, 1, 5) (4, 2, 1) (4, 2, 3) (4, 2, 5) (4, 3, 1) (4, 3, 2) (4, 3, 5) (4, 5, 1) (4, 5, 2) (4, 5, 3) (5, 1, 2) (5, 1, 3) (5, 1, 4) (5, 2, 1) (5, 2, 3) (5, 2, 4) (5, 3, 1) (5, 3, 2) (5, 3, 4) (5, 4, 1) (5, 4, 2) (5, 4, 3)
Hope this helps. :)
>>> from random import shuffle
>>> L = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]
>>> x=3
>>> shuffle(L)
>>> zip(*[L[i::x] for i in range(x)])
[(1, 3, 2), (2, 2, 1), (4, 5, 3), (1, 4, 4)]
You could also use a generator expression instead of the list comprehension
>>> zip(*(L[i::x] for i in range(x)))
[(1, 3, 2), (2, 2, 1), (4, 5, 3), (1, 4, 4)]
Starting with a way to do it without list comprehensions:
import random
import itertools

alphabet = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]

def alphas():
    while True:
        yield random.choice(alphabet)

def filter_unique(iter):
    found = set()
    for a in iter:
        if a not in found:
            found.add(a)
            yield a

def dice(x):
    while True:
        yield itertools.islice(filter_unique(alphas()), x)

for i, output in enumerate(dice(3)):
    print list(output)
    if i > 10:
        break
The part where list comprehensions have trouble is filter_unique(), since a list comprehension has no 'memory' of what it has already output. A possible solution would be to keep generating outputs until one of good quality is found, as @DSM suggested.
The slow, naive approach is:
import random

def pick_n_unique(l, n):
    res = set()
    while len(res) < n:
        res.add(random.choice(l))
    return list(res)
This will pick elements and only quit when it has n unique ones:
>>> pick_n_unique([1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2], 3)
[2, 3, 4]
>>> pick_n_unique([1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2], 3)
[3, 4, 5]
However, it can get slow if, for example, you have a list with thirty 1s and one 2, since once it has a 1 it'll keep spinning until it finally hits a 2. The better approach is to count the number of occurrences of each unique element, choose a random one weighted by the occurrence counts, remove that element from the count list, and repeat until you have the desired number of elements:
import collections
import random

def weighted_choice(item__counts):
    total_counts = sum(count for item, count in item__counts.items())
    which_count = random.random() * total_counts
    for item, count in item__counts.items():
        which_count -= count
        if which_count < 0:
            return item
    raise ValueError("Should never get here")

def pick_n_unique(items, n):
    item__counts = collections.Counter(items)
    if len(item__counts) < n:
        raise ValueError(
            "Can't pick %d values with only %d unique values" % (
                n, len(item__counts)))
    res = []
    for i in xrange(n):
        choice = weighted_choice(item__counts)
        res.append(choice)
        del item__counts[choice]
    return tuple(res)
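Usage might look like this (the output is random, so this is just an illustrative run):

>>> pick_n_unique([1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2], 3)
(1, 4, 2)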
Either way, this is a problem not well-suited to list comprehensions.
def sample(self, population, k):
    n = len(population)
    if not 0 <= k <= n:
        raise ValueError("sample larger than population")
    result = [None] * k
    try:
        selected = set()
        selected_add = selected.add
        for i in xrange(k):
            j = int(random.random() * n)
            while j in selected:
                j = int(random.random() * n)
            selected_add(j)
            result[i] = population[j]
    except (TypeError, KeyError):  # handle (at least) sets
        if isinstance(population, list):
            raise
        return self.sample(tuple(population), k)
    return result
Above is a simplified version of the sample function from Lib/random.py. I only removed some optimization code for small data sets. The code tells us directly how to implement a customized sample function:
get a random number
if the number has appeared before, abandon it and get a new one
repeat the above steps until you get all the sample numbers you want
Then the real problem turns out to be how to get a random value from a list by weight. This could be done with the original random.sample(population, 1) from the Python standard library (a little overkill here, but simple).
Below is an implementation. Because duplicates represent weight in your given list, we can use int(random.random() * array_length) to get a random index into your array.
import random
arr = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]
def sample_by_weight(population, k):
    n = len(population)
    if not 0 <= k <= len(set(population)):
        raise ValueError("sample larger than population")
    result = [None] * k
    try:
        selected = set()
        selected_add = selected.add
        for i in xrange(k):
            j = population[int(random.random() * n)]
            while j in selected:
                j = population[int(random.random() * n)]
            selected_add(j)
            result[i] = j
    except (TypeError, KeyError):  # handle (at least) sets
        if isinstance(population, list):
            raise
        return sample_by_weight(tuple(population), k)  # plain recursion; no self here
    return result
[sample_by_weight(arr,3) for i in range(10)]
With the setup:
from random import shuffle
from collections import deque
l = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]
This code:
def getSubLists(l, n):
    shuffle(l)  # shuffle l so the elements are in 'random' order
    l = deque(l, len(l))  # create a structure with O(1) insert/pop at both ends
    while l:  # while there are still elements to choose
        sample = set()  # use a set (O(1) membership test) to check for duplicates
        while len(sample) < n and l:  # until the sample is n long or l is exhausted
            top = l.pop()  # get the top value in l
            if top in sample:
                l.appendleft(top)  # add it to the back of l for a later sample
            else:
                sample.add(top)  # it isn't in sample already, so use it
        yield sample  # yield the sample
You end up with:
for s in getSubLists(l, 3):
    print s
>>>
set([1, 2, 5])
set([1, 2, 3])
set([2, 4, 5])
set([2, 3, 4])
set([1, 4])