Remove combinations that contains some values before even calculated - python

given a list and exclusions elements, is it possible to ignore calculation of combinations that contains these elements ?
Example 1
Given l = [1, 2, 3, 4, 5], I want to calculate all combinations of size 4 and excluding combinations that contains (1, 3) before even calculated.
The results would be :
All results: Wanted results:
[1, 2, 3, 4] [1, 2, 4, 5]
[1, 2, 3, 5] [2, 3, 4, 5]
[1, 2, 4, 5]
[1, 3, 4, 5]
[2, 3, 4, 5]
All combinations that contained 1 and 3 have been removed.
Example 2
suggested by #Eric Duminil
the result for l = [1, 2, 3, 4, 5, 6], size 4 and
excluding (1, 2, 3) in second column
excluding (1, 2) in third column
All results: Wanted results 1 Wanted results 2
(Excluding [1, 2, 3]): (Excluding [1, 2])
[1, 2, 3, 4] [1, 2, 4, 5] [1, 3, 4, 5]
[1, 2, 3, 5] [1, 2, 4, 6] [1, 3, 4, 6]
[1, 2, 3, 6] [1, 2, 5, 6] [1, 3, 5, 6]
[1, 2, 4, 5] [1, 3, 4, 5] [1, 4, 5, 6]
[1, 2, 4, 6] [1, 3, 4, 6] [2, 3, 4, 5]
[1, 2, 5, 6] [1, 3, 5, 6] [2, 3, 4, 6]
[1, 3, 4, 5] [1, 4, 5, 6] [2, 3, 5, 6]
[1, 3, 4, 6] [2, 3, 4, 5] [2, 4, 5, 6]
[1, 3, 5, 6] [2, 3, 4, 6] [3, 4, 5, 6]
[1, 4, 5, 6] [2, 3, 5, 6]
[2, 3, 4, 5] [2, 4, 5, 6]
[2, 3, 4, 6] [3, 4, 5, 6]
[2, 3, 5, 6]
[2, 4, 5, 6]
[3, 4, 5, 6]
All combinations that contained 1 and 2 and 3 have been removed from wanted results 1
All combinations that contained 1 and 2 have been removed from wanted results 2
I have a much bigger combinations to compute but it takes a lot of time and I want to reduce this time using these exclusions.
Tried solutions
With method 1, the combinations are still calculated
With method 2, I tried to modify the combinations function but I could not find a proper way to ignore my exclusion list before calculated.
Method 1 | Method 2
|
def main(): | def combinations(iterable, r):
l = list(range(1, 6)) | pool = tuple(iterable)
comb = combinations(l, 4) | n = len(pool)
| if r > n:
for i in comb: | return
if set([1, 3]).issubset(i): | indices = list(range(r))
continue | yield tuple(pool[i] for i in indices)
else | while True:
process() | for i in reversed(range(r)):
| if indices[i] != i + n - r:
| break
| else:
| return
| indices[i] += 1
| for j in range(i+1, r):
| indices[j] = indices[j-1] + 1
| yield tuple(pool[i] for i in indices)
EDIT:
First of all, thank you all for your help, I forgot to give more details about constraints.
The order of the ouputs is not relevant, from example, if result is [1, 2, 4, 5] [2, 3, 4, 5] or [2, 3, 4, 5] [1, 2, 4, 5], it is not important.
The elements of the combinations should be (if possible) sorted, [1, 2, 4, 5] [2, 3, 4, 5] and not [2, 1, 5, 4] [3, 2, 4, 5] but it is not important since the combinations could be sorted after.
The exclusions list is a list of all items that should not appear in the combinations together. e.g If my exclusion list is (1, 2, 3), all combinations that contains 1 and 2 and 3 should not be calculated. However, combinations with 1 and 2 and not 3 are allowed. In that case, if I exclude combinations that contains (1, 2) and (1, 2, 3) it is completely useless since all combinations that will be filtered by (1, 2, 3) are already filtered by (1, 2)
Multiple exclude lists must be possible because I use multiple constraints on my combinations.
Tested answers
#tobias_k
This solution considers the exclusion list (1, 2, 3) as OR exclusion meaning (1, 2), (2, 3) and (1, 3) will be excluded if I understood well, this is useful in a case but not in my current problem, I modified the question to give more details, sorry for confusion. In your answer, I can't use only lists (1, 2) and (1, 3) as exclusion as you specified it. However the big advantage of this solution is to permit multiple exclusions.
#Kasramvd and #mikuszefski
Your solution is really close to what I want, if it does include multiple exclusion lists, it would be the answer.
Thanks

(Turns out this does not do exactly what OP wants. Still leaving this here as it might help others.)
To include mutually exclusive elements, you could wrap those in lists within the list, get the combinations of those, and then the product of the combinations of sub-lists:
>>> from itertools import combinations, product
>>> l = [[1, 3], [2], [4], [5]]
>>> [c for c in combinations(l, 4)]
[([1, 3], [2], [4], [5])]
>>> [p for c in combinations(l, 4) for p in product(*c)]
[(1, 2, 4, 5), (3, 2, 4, 5)]
A more complex example:
>>> l = [[1, 3], [2, 4, 5], [6], [7]]
>>> [c for c in combinations(l, 3)]
[([1, 3], [2, 4, 5], [6]),
([1, 3], [2, 4, 5], [7]),
([1, 3], [6], [7]),
([2, 4, 5], [6], [7])]
>>> [p for c in combinations(l, 3) for p in product(*c)]
[(1, 2, 6),
(1, 4, 6),
... 13 more ...
(4, 6, 7),
(5, 6, 7)]
This does not generate any "junk" combinations to be filtered out afterwards. However, it assumes that you want at most one element from each "exclusive" group, e.g. in the second example, it not only prevents combinations with 2,4,5, but also those with 2,4, 4,5, or 2,5. Also, it is not possible (or at least not easy) to have exclusively one of 1,3, and 1,5, but allow for 3,5. (It might be possible to extend it to those cases, but I'm not yet sure if and how.)
You can wrap this in a function, deriving the slightly different input format from your (presumed) format and returning an accordant generator expression. Here, lst is the list of elements, r the number of items per combinations, and exclude_groups a list of groups of mutually-exclusive elements:
from itertools import combinations, product
def comb_with_excludes(lst, r, exclude_groups):
ex_set = {e for es in exclude_groups for e in es}
tmp = exclude_groups + [[x] for x in lst if x not in ex_set]
return (p for c in combinations(tmp, r) for p in product(*c))
lst = [1, 2, 3, 4, 5, 6, 7]
excludes = [[1, 3], [2, 4, 5]]
for x in comb_with_excludes(lst, 3, excludes):
print(x)

(As it turned out that my previous answer does not really satisfy the constraints of the question, here's another one. I'm posting this as a separate answer, as the approach is vastly different and the original answer may still help others.)
You can implement this recursively, each time before recursing to add another element to the combinations checking whether that would violate one of the exclude-sets. This does not generate and invalid combinations, and it works with overlapping exclude-sets (like (1,3), (1,5)), and exclude-sets with more than two elements (like (2,4,5), allowing any combinations except all of them together).
def comb_with_excludes(lst, n, excludes, i=0, taken=()):
if n == 0:
yield taken # no more needed
elif i <= len(lst) - n:
t2 = taken + (lst[i],) # add current element
if not any(e.issubset(t2) for e in excludes):
yield from comb_with_excludes(lst, n-1, excludes, i+1, t2)
if i < len(lst) - n: # skip current element
yield from comb_with_excludes(lst, n, excludes, i+1, taken)
Example:
>>> lst = [1, 2, 3, 4, 5, 6]
>>> excludes = [{1, 3}, {1, 5}, {2, 4, 5}]
>>> list(comb_with_excludes(lst, 4, excludes))
[[1, 2, 4, 6], [2, 3, 4, 6], [2, 3, 5, 6], [3, 4, 5, 6]]
Well, I did time it now, and it turns out this is considerably slower than naively using itertools.combination in a generator expression with a filter, much like you already do:
def comb_naive(lst, r, excludes):
return (comb for comb in itertools.combinations(lst, r)
if not any(e.issubset(comb) for e in excludes))
Calculating the combinations in Python is just slower than using the library (which is probably implemented in C) and filtering the results afterwards. Depending on the amount of combinations that can be excluded, this might be faster in some cases, but to be honest, I have my doubts.
You could get better results if you can use itertools.combinations for subproblems, as in Kasramvd's answer, but for multiple, non-disjunct exclude sets that's more difficult. One way might be to separate the elements in the list into two sets: Those that have constraints, and those that don't. Then, use itertoolc.combinations for both, but check the constraints only for the combinations of those elements where they matter. You still have to check and filter the results, but only a part of them. (One caveat, though: The results are not generated in order, and the order of the elements within the yielded combinations is somewhat messed up, too.)
def comb_with_excludes2(lst, n, excludes):
wout_const = [x for x in lst if not any(x in e for e in excludes)]
with_const = [x for x in lst if any(x in e for e in excludes)]
k_min, k_max = max(0, n - len(wout_const)), min(n, len(with_const))
return (c1 + c2 for k in range(k_min, k_max)
for c1 in itertools.combinations(with_const, k)
if not any(e.issubset(c1) for e in excludes)
for c2 in itertools.combinations(wout_const, n - k))
This is already much better then the recursive pure-Python solution, but still not as good as the "naive" approach for the above example:
>>> lst = [1, 2, 3, 4, 5, 6]
>>> excludes = [{1, 3}, {1, 5}, {2, 4, 5}]
>>> %timeit list(comb_with_excludes(lst, 4, excludes))
10000 loops, best of 3: 42.3 µs per loop
>>> %timeit list(comb_with_excludes2(lst, 4, excludes))
10000 loops, best of 3: 22.6 µs per loop
>>> %timeit list(comb_naive(lst, 4, excludes))
10000 loops, best of 3: 16.4 µs per loop
However, the results depend very much on the input. For a larger list, with constraints only applying to a few of those elements, this approach is in fact faster then the naive one:
>>> lst = list(range(20))
>>> %timeit list(comb_with_excludes(lst, 4, excludes))
10 loops, best of 3: 15.1 ms per loop
>>> %timeit list(comb_with_excludes2(lst, 4, excludes))
1000 loops, best of 3: 558 µs per loop
>>> %timeit list(comb_naive(lst, 4, excludes))
100 loops, best of 3: 5.9 ms per loop

From an algorithmic perspective you can separate the excluded and the reset of the valid items and calculate the combinations of each set separately and just concatenate the result based on the desire length. This approach will entirely refuse of including all the excluded items at once in combination but will omit the actual order.
from itertools import combinations
def comb_with_exclude(iterable, comb_num, excludes):
iterable = tuple(iterable)
ex_len = len(excludes)
n = len(iterable)
if comb_num < ex_len or comb_num > n:
yield from combinations(iterable, comb_num)
else:
rest = [i for i in iterable if not i in excludes]
ex_comb_rang = range(0, ex_len)
rest_comb_range = range(comb_num, comb_num - ex_len, -1)
# sum of these pairs is equal to the comb_num
pairs = zip(ex_comb_rang, rest_comb_range)
for i, j in pairs:
for p in combinations(excludes, i):
for k in combinations(rest, j):
yield k + p
"""
Note that instead of those nested loops you could wrap the combinations within a product function like following:
for p, k in product(combinations(excludes, i), combinations(rest, j)):
yield k + p
"""
Demo:
l = [1, 2, 3, 4, 5, 6, 7, 8]
ex = [2, 5, 6]
print(list(comb_with_exclude(l, 6, ex)))
[(1, 3, 4, 7, 8, 2), (1, 3, 4, 7, 8, 5), (1, 3, 4, 7, 8, 6), (1, 3, 4, 7, 2, 5), (1, 3, 4, 8, 2, 5), (1, 3, 7, 8, 2, 5), (1, 4, 7, 8, 2, 5), (3, 4, 7, 8, 2, 5), (1, 3, 4, 7, 2, 6), (1, 3, 4, 8, 2, 6), (1, 3, 7, 8, 2, 6), (1, 4, 7, 8, 2, 6), (3, 4, 7, 8, 2, 6), (1, 3, 4, 7, 5, 6), (1, 3, 4, 8, 5, 6), (1, 3, 7, 8, 5, 6), (1, 4, 7, 8, 5, 6), (3, 4, 7, 8, 5, 6)]
l = [1, 2, 3, 4, 5]
ex = [1, 3]
print(list(comb_with_exclude(l, 4, ex)))
[(2, 4, 5, 1), (2, 4, 5, 3)]
Benckmark with other answers:
Results: this approach is faster that the others
# this answer
In [169]: %timeit list(comb_with_exclude(lst, 3, excludes[0]))
100000 loops, best of 3: 6.47 µs per loop
# tobias_k
In [158]: %timeit list(comb_with_excludes(lst, 3, excludes))
100000 loops, best of 3: 13.1 µs per loop
# Vikas Damodar
In [166]: %timeit list(combinations_exc(lst, 3))
10000 loops, best of 3: 148 µs per loop
# mikuszefski
In [168]: %timeit list(sub_without(lst, 3, excludes[0]))
100000 loops, best of 3: 12.52 µs per loop

I have made an attempt to edit combinations according to your requirements :
def combinations(iterable, r):
# combinations('ABCD', 2) --> AB AC AD BC BD CD
# combinations(range(4), 3) --> 012 013 023 123
pool = tuple(iterable)
n = len(pool)
if r > n:
return
indices = list(range(r))
# yield tuple(pool[i] for i in indices)
while True:
for i in reversed(range(r)):
if indices[i] != i + n - r:
break
else:
return
indices[i] += 1
for j in range(i+1, r):
indices[j] = indices[j-1] + 1
# print(tuple(pool[i] for i in indices ), "hai")
if 1 in tuple(pool[i] for i in indices ) and 3 in tuple(pool[i] for i in indices ):
pass
else:
yield tuple(pool[i] for i in indices)
d = combinations(list(range(1, 6)),4)
for i in d:
print(i)
It will return something like this :
(1, 2, 4, 5)
(2, 3, 4, 5)

I did the exclusion during the combination using the following code to save the second loop time. you just need to pass the indices of the excluded elements as a set.
update: working fiddle
from itertools import permutations
def combinations(iterable, r, combIndeciesExclusions=set()):
pool = tuple(iterable)
n = len(pool)
for indices in permutations(range(n), r):
if ( len(combIndeciesExclusions)==0 or not combIndeciesExclusions.issubset(indices)) and sorted(indices) == list(indices):
yield tuple(pool[i] for i in indices)
l = list(range(1, 6))
comb = combinations(l, 4, set([0,2]))
print list(comb)

I guess my answer is similar to some others here, but this was what I fiddled together in parallel
from itertools import combinations, product
"""
with help from
https://stackoverflow.com/questions/374626/how-can-i-find-all-the-subsets-of-a-set-with-exactly-n-elements
https://stackoverflow.com/questions/32438350/python-merging-two-lists-with-all-possible-permutations
https://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python
"""
def sub_without( S, m, forbidden ):
out = []
allowed = [ s for s in S if s not in forbidden ]
N = len( allowed )
for k in range( len( forbidden ) ):
addon = [ list( x ) for x in combinations( forbidden, k) ]
if N + k >= m:
base = [ list( x ) for x in combinations( allowed, m - k ) ]
leveltotal = [ [ item for sublist in x for item in sublist ] for x in product( base, addon ) ]
out += leveltotal
return out
val = sub_without( range(6), 4, [ 1, 3, 5 ] )
for x in val:
print sorted(x)
>>
[0, 1, 2, 4]
[0, 2, 3, 4]
[0, 2, 4, 5]
[0, 1, 2, 3]
[0, 1, 2, 5]
[0, 2, 3, 5]
[0, 1, 3, 4]
[0, 1, 4, 5]
[0, 3, 4, 5]
[1, 2, 3, 4]
[1, 2, 4, 5]
[2, 3, 4, 5]

Algorithmically you have to calculate the combination of the items within your list that are not among the excluded ones then add the respective combinations of the excluded items to the combination of the rest of items. This approach of course requires a lot of checking and need to keep track of the indexes which even if you do it in python it won't give you a notable difference in performance (known as drawbacks of Constraint satisfaction problem). (rather than just calculating them using combination and filtering the unwanted items out).
Therefor, I think this is the best way to go in most of the cases:
In [77]: from itertools import combinations, filterfalse
In [78]: list(filterfalse({1, 3}.issubset, combinations(l, 4)))
Out[78]: [(1, 2, 4, 5), (2, 3, 4, 5)]

Related

How to split a list into two based on a value?

I am trying to create a function which can separate a list into two new lists based on a value (in this case 3.5).
The code I have tried to make so far makes a list of the first values in the main list (the ones I want to compare). This list is [1,1,3,1,4,4,5,1] I now want to create two lists. This would be [1,1,3,1,1] and [4,4,5]. However, I cannot use > to compare the different values in the list and am unsure how to do so.
As I said above, I'm confused about what your code is trying to do, but you can split a list like this.
my_list=[1,1,3,1,4,4,5,1]
my_val=3.5 #value to split on
list1=[x for x in my_list if x>my_val]
list2=[x for x in my_list if x<my_val]
EDIT
To make this work for a list of tuples, based on their first value, you can do the same but with a slight modification
my_list = [(1, 4, 3, 0),
(1, 7, 6, 0),
(3, 8, 7, 0),
(1, 1, 9, 0),
(4, 1, 1, 0),
(4, 3, 8, 1),
(5, 4, 2, 1),
(1, 7, 7, 1)]
list1=[x for x in my_list if x[0]>my_val]
list2=[x for x in my_list if x[0]<my_val]
The itertools module documentation provides a series of recipes for common tasks. One of them is a partition function.
from itertools import tee, filterfalse
def partition(pred, iterable):
"Use a predicate to partition entries into false entries and true entries"
# partition(is_odd, range(10)) --> 0 2 4 6 8 and 1 3 5 7 9
t1, t2 = tee(iterable)
return filterfalse(pred, t1), filter(pred, t2)
my_list=[1,1,3,1,4,4,5,1]
t1, t2 = partition(lambda x: x > 3.5, my_list)
list1 = list(t1) # [1, 1, 3, 1, 1]
list2 = list(t2) # [4, 4, 5]
You can have a function to split your list for a value.
def split_list(val, _list):
list1 = []
list2 = []
for _x in _list:
(list1 if _x <= val else list2).append(_x)
return list1, list2
# Your example
print(split_list(3.5, [1, 1, 3, 1, 4, 4, 5, 1]))
#brings output
#([1, 1, 3, 1, 1], [4, 4, 5])
And call it repeated for a data set as you said
# For a data set like yours
my_list = [(1, 4, 3, 0),
(1, 7, 6, 0),
(3, 8, 7, 0),
(1, 1, 9, 0),
(4, 1, 1, 0),
(4, 3, 8, 1),
(5, 4, 2, 1),
(1, 7, 7, 1)]
my_list_1 = []
my_list_2 = []
for x in my_list:
g, le = split_list(3.5, x)
my_list_1.append(g)
my_list_2.append(le)
print(my_list_1)
print(my_list_2)
# Separates the list to
[[1, 3, 0], [1, 0], [3, 0], [1, 1, 0], [1, 1, 0], [3, 1], [2, 1], [1, 1]]
[[4], [7, 6], [8, 7], [9], [4], [4, 8], [5, 4], [7, 7]]

Generating all the increasing subsequences

Given an array of integers, how can we generate all the increasing subsequences such that all of them have same length ?
Example: given this list
l = [1, 2, 4, 5, 3, 6]
The answer should be if we consider subsequences of length 4:
[1, 2, 3, 6]
[1, 2, 4, 5]
[1, 2, 4, 6]
[1, 2, 5, 6]
[1, 4, 5, 6]
[2, 4, 5, 6]
from itertools import combinations
# the second item of the tuple is the position of the element in the array
zeta = [(1, 1), (2, 2), (4, 3), (5, 4), (3, 5), (6, 6)]
comb = combinations(sorted(zeta, key=lambda x: x[0]), 4)
def verif(x):
l = []
for k in x:
l.append(k[1])
for i in range(len(l)-1):
if l[i+1]-l[i] < 0:
return 0
return 1
for i in comb:
if verif(list(i)):
print(i)
I want a better approach like dynamic programming solution, because obviously my solution is very slow for bigger list of integers. Is LIS problem helpful in this situation?

Swap two values randomly in list

I have the following list:
a = [1, 2, 5, 4, 3, 6]
And I want to know how I can swap any two values at a time randomly within a list regardless of position within the list. Below are a few example outputs on what I'm thinking about:
[1, 4, 5, 2, 3, 6]
[6, 2, 3, 4, 5, 1]
[1, 6, 5, 4, 3, 2]
[1, 3, 5, 4, 2, 6]
Is there a way to do this in Python 2.7? My current code is like this:
import random
n = len(a)
if n:
i = random.randint(0,n-1)
j = random.randint(0,n-1)
a[i] += a[j]
a[j] = a[i] - a[j]
a[i] -= a[j]
The issue with the code I currently have, however, is that it starts setting all values to zero given enough swaps and iterations, which I do not want; I want the values to stay the same in the array, but do something like 2opt and only switch around two with each swap.
You are over-complicating it, it seems. Just randomly sample two indices from the list, then swap the values at those indicies:
>>> def swap_random(seq):
... idx = range(len(seq))
... i1, i2 = random.sample(idx, 2)
... seq[i1], seq[i2] = seq[i2], seq[i1]
...
>>> a
[1, 2, 5, 4, 3, 6]
>>> swap_random(a)
>>> a
[1, 2, 3, 4, 5, 6]
>>> swap_random(a)
>>> a
[1, 2, 6, 4, 5, 3]
>>> swap_random(a)
>>> a
[1, 2, 6, 5, 4, 3]
>>> swap_random(a)
>>> a
[6, 2, 1, 5, 4, 3]
Note, I used the Python swap idiom, which doesn't require an intermediate variable. It is equivalent to:
temp = seq[i1]
seq[i1] = seq[i2]
seq[i2] = temp

Creating a new list based on their position

So there is a list which contains
'3,2,5,4,1','3,1,2,5,4','2,5,1,4,3'
These numbers are part of the same list, HOWEVER they are strings in a list(ie. list 1)
and from this, you say that for the "first row", 3 occurs at position 0, 2 occurs at position 1, 5 at 2 etc.
For the "second row", 3 occurs at position 0, 1 occurs at position 1, 2 occurs at position 2 etc.
I would like to create a loop or anything at all (besides using imported functions) to create a final list which looks like
0: [3, 3, 2]
1: [2, 1, 5]
2: [5, 2, 1]
3: [4, 5, 4]
4: [1, 4, 3]
Transposition of a two-dimensional list can simply be done using zip()
In [1]: l = [[3,2,5,4,1],
...: [3,1,2,5,4],
...: [2,5,1,4,3]]
In [2]: t = list(zip(*l))
In [3]: t
Out[3]: [(3, 3, 2), (2, 1, 5), (5, 2, 1), (4, 5, 4), (1, 4, 3)]
To output that in the format described above:
In [4]: for n,line in enumerate(t):
...: print("{}: {}".format(n, list(line)))
...:
0: [3, 3, 2]
1: [2, 1, 5]
2: [5, 2, 1]
3: [4, 5, 4]
4: [1, 4, 3]
Single line of code using Dictionary comprehension and List comprehension :
>>> { col:[row[col] for row in l] for col in range(len(l[0])) }
=> {0: [3, 3, 2], 1: [2, 1, 5], 2: [5, 2, 1], 3: [4, 5, 4], 4: [1, 4, 3]}
#driver values :
IN : l = [[3,2,5,4,1],
[3,1,2,5,4],
[2,5,1,4,3]]
NOTE to OP : what your output suggests is a Dictionary by looking at its structure. A list cannot be defined in the same manner.
EDIT : Since the OP's list is a list of strings, first convert that to a list of int using map and then continue as above.
>>> l = ['3,2,5,4,1', '3,1,2,5,4', '2,5,1,4,3']
>>> l = [list(map(int,s.split(','))) for s in l]
>>> l
=> [[3, 2, 5, 4, 1], [3, 1, 2, 5, 4], [2, 5, 1, 4, 3]]

Python Random List Comprehension

I have a list similar to:
[1 2 1 4 5 2 3 2 4 5 3 1 4 2]
I want to create a list of x random elements from this list where none of the chosen elements are the same. The difficult part is that I would like to do this by using list comprehension...
So possible results if x = 3 would be:
[1 2 3]
[2 4 5]
[3 1 4]
[4 5 1]
etc...
Thanks!
I should have specified that I cannot convert the list to a set. Sorry!
I need the randomly selected numbers to be weighted. So if 1 appears 4 times in the list and 3 appears 2 times in the list, then 1 is twice as likely to be selected...
Disclaimer: the "use a list comprehension" requirement is absurd.
Moreover, if you want to use the weights, there are many excellent approaches listed at Eli Bendersky's page on weighted random sampling.
The following is inefficient, doesn't scale, etc., etc.
That said, it has not one but two (TWO!) list comprehensions, returns a list, never duplicates elements, and respects the weights in a sense:
>>> s = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]
>>> [x for x in random.choice([p for c in itertools.combinations(s, 3) for p in itertools.permutations(c) if len(set(c)) == 3])]
[3, 1, 2]
>>> [x for x in random.choice([p for c in itertools.combinations(s, 3) for p in itertools.permutations(c) if len(set(c)) == 3])]
[5, 3, 4]
>>> [x for x in random.choice([p for c in itertools.combinations(s, 3) for p in itertools.permutations(c) if len(set(c)) == 3])]
[1, 5, 2]
.. or, as simplified by FMc:
>>> [x for x in random.choice([p for p in itertools.permutations(s, 3) if len(set(p)) == 3])]
[3, 5, 2]
(I'll leave the x for x in there, even though it hurts not to simply write list(random.choice(..)) or just leave it as a tuple..)
Generally, you don't want to do this sort of thing in a list comprehension -- It'll lead to much harder to read code. However, if you really must, we can write a completely horrible 1 liner:
>>> values = [random.randint(0,10) for _ in xrange(12)]
>>> values
[1, 10, 6, 6, 3, 9, 0, 1, 8, 9, 1, 2]
>>> # This is the 1 liner -- The other line was just getting us a list to work with.
>>> [(lambda x=random.sample(values,3):any(values.remove(z) for z in x) or x)() for _ in xrange(4)]
[[6, 1, 8], [1, 6, 10], [1, 0, 2], [9, 3, 9]]
Please never use this code -- I only post it for fun/academic reasons.
Here's how it works:
I create a function inside the list comprehension with a default argument of 3 randomly selected elements from the input list. Inside the function, I remove the elements from values so that they aren't available to be picked again. since list.remove returns None, I can use any(lst.remove(x) for x in ...) to remove the values and return False. Since any returns False, we hit the or clause which just returns x (the default value with 3 randomly selected items) when the function is called. All that is left then is to call the function and let the magic happen.
The one catch here is that you need to make sure that the number of groups you request (here I chose 4) multiplied by the number of items per group (here I chose 3) is less than or equal to the number of values in your input list. It may seem obvious, but it's probably worth mentioning anyway...
Here's another version where I pull shuffle into the list comprehension:
>>> lst = [random.randint(0,10) for _ in xrange(12)]
>>> lst
[3, 5, 10, 9, 10, 1, 6, 10, 4, 3, 6, 5]
>>> [lst[i*3:i*3+3] for i in xrange(shuffle(lst) or 4)]
[[6, 10, 6], [3, 4, 10], [1, 3, 5], [9, 10, 5]]
This is significantly better than my first attempt, however, most people would still need to stop, scratch their head a bit before they figured out what this code was doing. I still assert that it would be much better to do this in multiple lines.
If I'm understanding your question properly, this should work:
def weighted_sample(L, x):
# might consider raising some kind of exception of len(set(L)) < x
while True:
ans = random.sample(L, x)
if len(set(ans)) == x:
return ans
Then if you want many such samples you can just do something like:
[weighted_sample(L, x) for _ in range(num_samples)]
I have a hard time conceiving of a comprehension for the sampling logic that isn't just obfuscated. The logic is a bit too complicated. It sounds like something randomly tacked on to a homework assignment to me.
If you don't like infinite looping, I haven't tried it but I think this will work:
def weighted_sample(L, x):
ans = []
c = collections.Counter(L)
while len(ans) < x:
r = random.randint(0, sum(c.values())
for k in c:
if r < c[k]:
ans.append(k)
del c[k]
break
else:
r -= c[k]
else:
# maybe throw an exception since this should never happen on valid input
return ans
First of all, I hope your list might be like
[1,2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]
so if you want to print the permutation from the given list as size 3, you can do as the following.
import itertools
l = [1,2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]
for permutation in itertools.permutations(list(set(l)),3):
print permutation,
Output:
(1, 2, 3) (1, 2, 4) (1, 2, 5) (1, 3, 2) (1, 3, 4) (1, 3, 5) (1, 4, 2) (1, 4, 3) (1, 4, 5) (1, 5, 2) (1, 5, 3) (1, 5, 4) (2, 1, 3) (2, 1, 4) (2, 1, 5) (2, 3, 1) (2, 3, 4) (2, 3, 5) (2, 4, 1) (2, 4, 3) (2, 4, 5) (2, 5, 1) (2, 5, 3) (2, 5, 4) (3, 1, 2) (3, 1, 4) (3, 1, 5) (3, 2, 1) (3, 2, 4) (3, 2, 5) (3, 4, 1) (3, 4, 2) (3, 4, 5) (3, 5, 1) (3, 5, 2) (3, 5, 4) (4, 1, 2) (4, 1, 3) (4, 1, 5) (4, 2, 1) (4, 2, 3) (4, 2, 5) (4, 3, 1) (4, 3, 2) (4, 3, 5) (4, 5, 1) (4, 5, 2) (4, 5, 3) (5, 1, 2) (5, 1, 3) (5, 1, 4) (5, 2, 1) (5, 2, 3) (5, 2, 4) (5, 3, 1) (5, 3, 2) (5, 3, 4) (5, 4, 1) (5, 4, 2) (5, 4, 3)
Hope this helps. :)
>>> from random import shuffle
>>> L = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]
>>> x=3
>>> shuffle(L)
>>> zip(*[L[i::x] for i in range(x)])
[(1, 3, 2), (2, 2, 1), (4, 5, 3), (1, 4, 4)]
You could also use a generator expression instead of the list comprehension
>>> zip(*(L[i::x] for i in range(x)))
[(1, 3, 2), (2, 2, 1), (4, 5, 3), (1, 4, 4)]
Starting with a way to do it without list compehensions:
import random
import itertools
alphabet = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]
def alphas():
while True:
yield random.choice(alphabet)
def filter_unique(iter):
found = set()
for a in iter:
if a not in found:
found.add(a)
yield a
def dice(x):
while True:
yield itertools.islice(
filter_unique(alphas()),
x
)
for i, output in enumerate(dice(3)):
print list(output)
if i > 10:
break
The part, where list comprehensions have troubles is filter_unique() since list comprehension does not have 'memory' of what it did output. The possible solution would be to generate many outputs while the one of good quality is not found as #DSM suggested.
The slow, naive approach is:
import random
def pick_n_unique(l, n):
res = set()
while len(res) < n:
res.add(random.choice(l))
return list(res)
This will pick elements and only quit when it has n unique ones:
>>> pick_n_unique([1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2], 3)
[2, 3, 4]
>>> pick_n_unique([1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2], 3)
[3, 4, 5]
However it can get slow if, for example, you have a list with thirty 1s and one 2, since once it has a 1 it'll keep spinning until it finally hits a 2. The better is to count the number of occurrences of each unique element, choose a random one weighted by their occurrence count, remove that element from the count list, and repeat until you have the desired number of elements:
def weighted_choice(item__counts):
total_counts = sum(count for item, count in item__counts.items())
which_count = random.random() * total_counts
for item, count in item__counts.items():
which_count -= count
if which_count < 0:
return item
raise ValueError("Should never get here")
def pick_n_unique(items, n):
item__counts = collections.Counter(items)
if len(item__counts) < n:
raise ValueError(
"Can't pick %d values with only %d unique values" % (
n, len(item__counts))
res = []
for i in xrange(n):
choice = weighted_choice(item__counts)
res.append(choice)
del item__counts[choice]
return tuple(res)
Either way, this is a problem not well-suited to list comprehensions.
def sample(self, population, k):
n = len(population)
if not 0 <= k <= n:
raise ValueError("sample larger than population")
result = [None] * k
try:
selected = set()
selected_add = selected.add
for i in xrange(k):
j = int(random.random() * n)
while j in selected:
j = int(random.random() * n)
selected_add(j)
result[i] = population[j]
except (TypeError, KeyError): # handle (at least) sets
if isinstance(population, list):
raise
return self.sample(tuple(population), k)
return result
Above is a simplied version of the sample function Lib/random.py. I only removed some optimization code for small data sets. The codes tell us straightly how to implement a customized sample function:
get a random number
if the number have appeared before just abandon it and get a new one
repeat the above steps until you get all the sample numbers you want.
Then the real problem turns out to be how to get a random value from a list by weight.This could be by the original random.sample(population, 1) in the Python standard library (a little overkill here, but simple).
Below is an implementation, because duplicates represent weight in your given list, we can use int(random.random() * array_length) to get a random index of your array.
import random
arr = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]
def sample_by_weight( population, k):
n = len(population)
if not 0 <= k <= len(set(population)):
raise ValueError("sample larger than population")
result = [None] * k
try:
selected = set()
selected_add = selected.add
for i in xrange(k):
j = population[int(random.random() * n)]
while j in selected:
j = population[int(random.random() * n)]
selected_add(j)
result[i] = j
except (TypeError, KeyError): # handle (at least) sets
if isinstance(population, list):
raise
return self.sample(tuple(population), k)
return result
[sample_by_weight(arr,3) for i in range(10)]
With the setup:
from random import shuffle
from collections import deque
l = [1, 2, 1, 4, 5, 2, 3, 2, 4, 5, 3, 1, 4, 2]
This code:
def getSubLists(l,n):
shuffle(l) #shuffle l so the elements are in 'random' order
l = deque(l,len(l)) #create a structure with O(1) insert/pop at both ends
while l: #while there are still elements to choose
sample = set() #use a set O(1) to check for duplicates
while len(sample) < n and l: #until the sample is n long or l is exhausted
top = l.pop() #get the top value in l
if top in sample:
l.appendleft(top) #add it to the back of l for a later sample
else:
sample.add(top) #it isn't in sample already so use it
yield sample #yield the sample
You end up with:
for s in getSubLists(l,3):
print s
>>>
set([1, 2, 5])
set([1, 2, 3])
set([2, 4, 5])
set([2, 3, 4])
set([1, 4])

Categories

Resources