Generate all combinations of success sets in Python

Question:
There are k treatments and N total tests to distribute among the treatments; an assignment of tests to treatments is called a plan. For a fixed plan, I want to output all the possible success sets in Python.
For example, if doctors are testing headache medicine with k=2 types of treatments (i.e. Aspirin and Ibuprofen) and N=3 total tests, one plan could be (1 test for Aspirin, 2 tests for Ibuprofen). For that plan, how do I output all possible combinations of 0-1 successful tests of Aspirin and 0-2 successful tests of Ibuprofen? One successful test means that when a patient with a headache is given Aspirin, the Aspirin cures their headache.
Please post an answer with Python code, NOT a math answer.
Desired output is a list within a list of the form [# successes for treatment 1, # successes for treatment 2]:
[ [0,0], [0,1], [0,2], [1,0], [1,1], [1,2] ]
It would be great if yield could be used, because the list above could be really long and I don't want to store the whole list in memory.
Below is my code for enumerating all possible ways to place N balls in A boxes, which I think should be similar to creating all possible success sets, but I'm not sure how.
Code
# Return tuples for all possible plans (n1, ..., nk),
# where balls = N = total number of tests and boxes = k = number of treatments.
# Code: Glyph, http://stackoverflow.com/questions/996004/enumeration-of-combinations-of-n-balls-in-a-boxes
def ballsAndBoxes(balls, boxes, boxIndex=0, sumThusFar=0):
    if boxIndex < (boxes - 1):
        for counter in range(balls + 1 - sumThusFar):
            for rest in ballsAndBoxes(balls, boxes,
                                      boxIndex + 1,
                                      sumThusFar + counter):
                yield (counter,) + rest
    else:
        yield (balls - sumThusFar,)
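As a quick sanity check, here is what this generator yields for the example above (N = 3 tests, k = 2 treatments):

print(list(ballsAndBoxes(3, 2)))
# [(0, 3), (1, 2), (2, 1), (3, 0)]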

Generating the plans is an integer-composition problem, but generating the success sets for a given plan only requires the Cartesian product of a set of ranges.
from itertools import product

def success_sets(plan):
    return product(*map(lambda n: range(n + 1), plan))

plan = [1, 2]
for s in success_sets(plan):
    print(s)
# (0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)
Since itertools.product returns a lazy iterator, the entire list is never stored in memory, as requested.
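If you also want to cover every plan, the ballsAndBoxes generator from the question composes directly with success_sets; a minimal sketch, still fully lazy:

def all_success_sets(N, k):
    # For every plan (a tuple of test counts), yield every success set.
    for plan in ballsAndBoxes(N, k):
        yield from success_sets(plan)

for s in all_success_sets(3, 2):
    print(s)  # one tuple per success set, for every plan in turn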

I am not sure exactly what you're trying to achieve, but combinations can be generated using itertools.

from itertools import combinations

# You could add an extra loop over all treatments.
for j in range(1, N):  # N is the number of tests
    for i in combinations(tests, r=j):
        indexes = set(i)
        df_cur = tests[indexes]  # for tests I am using a pandas DataFrame
        if success_condition:  # placeholder: your condition for success
            pass  # actions
        else:
            pass  # other actions

Related

Get a certain number of elements from itertools.combinations

I need to split the total number of elements in the iterator
tot = itertools.combinations(dict1.keys(), 2) into 3 parts.
The size of dict1 = 285056.
Total combinations possible = ~40 billion.
My goal is to divide these 40 billion into 3 parts of about 13.5 billion elements each, to process on different processors in parallel. At the moment I am naively iterating over the 40 billion and dumping pickle files when I reach 13.5 billion, which isn't efficient, as each 13.5-billion pickle is 160 GB on disk (and much larger when loaded into memory).
So is there any way I could iterate up to the 13.5 billionth element in one program, have program 2 start where program 1 left off, and so on, without iterating from the beginning as I do now?
Below is the code I use to get a certain number of elements from a combinations iterable.
import itertools

def grouper(n, iterable):
    it = iter(iterable)
    while True:
        chunk = tuple(itertools.islice(it, n))
        if not chunk:
            return
        yield chunk

for first_chunk in grouper(13_500_000_000, tot):
    ...  # process one 13.5-billion-element chunk
It is easy to create this kind of split with itertools. Given a set of elements, we can test whether the first part of a generated combination belongs to the computation for machine i.
In the code below, I show a crude solution for this, with the code in the for loop intended to be split over 3 machines. Machine i will run the code for the i-th segment of the keys for the first element of the combination, combined with the full set for the second element. The combinations are supposed to be processed in the line where cnt2 is calculated; replace that with whatever loop you want to process your combinations with.
Compared with generating and storing all combinations, this solution does not store any, but it will (internally) generate them all. But what's a couple of billion combinations between friends?
import itertools

def is_not_for_machine(i, t):
    """Return False iff t's first element lies in my_set_prefix[i],
    i.e. the combination t belongs to machine i."""
    if my_set_prefix[i][0] <= t[0] < my_set_prefix[i][1]:
        return False
    return True

my_keys = range(12)  # must be defined before the prefixes are computed

my_set_prefix = []
for i in range(3):
    my_set_prefix.append((len(my_keys) * i // 3, len(my_keys) * (i + 1) // 3))
print(f"== partition: {my_set_prefix}")

all_combos = itertools.combinations(my_keys, 2)
cnt = len([_ for _ in all_combos])
print(f"== total set size {cnt}")

for i in range(3):
    all_combos = itertools.combinations(my_keys, 2)
    cnt2 = len([_ for _ in itertools.filterfalse(
        lambda t: is_not_for_machine(i, t), all_combos)])
    print(f"== set size for prefix {my_set_prefix[i]}: {cnt2}")
The output shows that some load balancing might be necessary, since this partition is "triangular descending", with the highest count for the first combinations.
== partition: [(0, 4), (4, 8), (8, 12)]
== total set size 66
== set size for prefix (0, 4): 38
== set size for prefix (4, 8): 22
== set size for prefix (8, 12): 6
Why not directly use math.comb to get the number of combinations?
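For the numbers in this question, math.comb (available in Python 3.8+) gives the totals directly; a quick illustration:

import math

total = math.comb(285056, 2)  # 2-element combinations of dict1's keys
print(total)                  # 40628319040, i.e. the "40 billion" above
print(total // 3)             # ~13.5 billion combinations per machine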

Find ways to get to a number of levels

I got this problem on CoderByte. The requirement was to find the number of ways, and I found solutions for that on StackOverflow and other sites. But moving ahead, I also need all the possible ways to reach the Nth step.
Problem description: There is a staircase of N steps and you can climb either 1 or 2 steps at a time. You need to count and return the total number of unique ways to climb the staircase. The order of steps taken matters.
For Example,
Input: N = 3
Output: 3
Explanation: There are 3 unique ways of climbing a staircase of 3 steps :{1,1,1}, {2,1} and {1,2}
Note: there might be another case where a person can take 2 or 3 or 4 steps at a time (I know that's realistically not possible, but I am trying to make the step sizes in the code configurable).
I'm unable to find the right logic to get all the possible ways. A solution in Python would be most useful, but it's not a strict requirement.
Here's a minimal solution using itertools library:
from itertools import permutations, chain
solve = lambda n: [(1,)*n] + list(set(chain(*[permutations((2,)*i + (1,)*(n-2*i)) for i in range(1, n//2+1)])))
For your example input:
> solve(3)
[(1, 1, 1), (1, 2), (2, 1)]
How does it work?
It's easier to see what's happening if we take a step backwards:

def solve(n):
    combinations = [(1,)*n]
    for i in range(1, n//2 + 1):
        combinations.extend(permutations((2,)*i + (1,)*(n - 2*i)))
    return list(set(combinations))
The most trivial case is the one where you take one step at a time, so n steps: (1,)*n. Then we can look at how many double steps we could take at most, which is the floor of n divided by 2: n//2. We then iterate over the possible numbers of double steps: in iteration i we place i double steps, (2,)*i, and fill the remaining space with single steps, (1,)*(n-2*i).
The function permutations from itertools generates all the possible orderings of single and double steps for that iteration. With an input of (1,1,2), it will generate (1,1,2), (1,2,1) and (2,1,1), each of them more than once, since permutations treats the repeated single steps as distinct elements. At the end we use the trick of converting the result to a set in order to remove duplicates, then convert it back into a list.
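A quick demonstration of why the set is needed:

from itertools import permutations

print(list(permutations((1, 1, 2))))
# [(1, 1, 2), (1, 2, 1), (1, 1, 2), (1, 2, 1), (2, 1, 1), (2, 1, 1)]
print(sorted(set(permutations((1, 1, 2)))))
# [(1, 1, 2), (1, 2, 1), (2, 1, 1)]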
Generalization for any amount and length of steps (not optimal!)
One liner:
from itertools import permutations, chain, combinations_with_replacement
solve = lambda n, steps: list(set(chain(*[permutations(sequence) for sequence in chain(*[combinations_with_replacement(steps, r) for r in range(n//min(steps)+1)]) if sum(sequence) == n])))
Example output:
> solve(8, [2,3])
[(3, 2, 3), (2, 3, 3), (2, 2, 2, 2), (3, 3, 2)]
Easier-to-read version:

from itertools import permutations, combinations_with_replacement

def solve(n, steps):
    result = []
    for sequence_length in range(n//min(steps) + 1):
        sequences = combinations_with_replacement(steps, sequence_length)
        for sequence in sequences:
            if sum(sequence) == n:
                result.extend(permutations(sequence))
    return list(set(result))
def solve(n):
    if n == 0:
        return [[]]
    else:
        left_results = []
        right_results = []
        if n > 0:
            left_results = solve(n - 1)
            for res in left_results:  # add the current step to every result
                res.append(1)
        if n > 1:
            right_results = solve(n - 2)
            for res in right_results:  # same as above
                res.append(2)
        return left_results + right_results
I think there is a better way to do this using dynamic programming, but I don't know how to do that. Hope it helps anyway.
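For counting the ways (rather than listing them), a standard dynamic-programming sketch, with the step sizes generalized as in the note above, could look like this:

def count_ways(n, steps=(1, 2)):
    # ways[i] = number of distinct ordered step sequences reaching step i
    ways = [0] * (n + 1)
    ways[0] = 1
    for i in range(1, n + 1):
        ways[i] = sum(ways[i - s] for s in steps if i - s >= 0)
    return ways[n]

print(count_ways(3))          # 3
print(count_ways(8, (2, 3)))  # 4, matching solve(8, [2,3]) above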

Efficient enumeration of non-negative integer compositions

I would like to write a function my_func(n,l) that, for some positive integer n, efficiently enumerates the ordered non-negative integer compositions* of length l (where l is greater than n). For example, I want my_func(2,3) to return [[0,0,2],[0,2,0],[2,0,0],[1,1,0],[1,0,1],[0,1,1]].
My initial idea was to use existing code for positive integer partitions (e.g. accel_asc() from this post), pad the positive integer partitions with zeros, and return all permutations.
import itertools
import numpy

# accel_asc(n) yields the positive integer partitions of n (see the linked post)
def my_func(n, l):
    for ip in accel_asc(n):
        nic = numpy.zeros(l, dtype=int)
        nic[:len(ip)] = ip
        for p in itertools.permutations(nic):
            yield p
The output of this function is wrong, because every non-negative integer composition in which a number appears two or more times shows up several times in the output of my_func. For example, list(my_func(2,3)) returns [(1, 1, 0), (1, 0, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1), (0, 1, 1), (2, 0, 0), (2, 0, 0), (0, 2, 0), (0, 0, 2), (0, 2, 0), (0, 0, 2)].
I could correct this by generating a list of all non-negative integer compositions, removing repeated entries, and then returning the remaining list (instead of a generator). But this seems incredibly inefficient and would likely run into memory issues. What is a better way to fix this?
EDIT
I did a quick comparison of the solutions offered in answers to this post and to another post that cglacet has pointed out in the comments.
On the left we have l = 2n, and on the right we have l = n + 1. In these two cases, user2357112's second solution is faster than the others when n <= 5. For n > 5, the solutions proposed by user2357112, Nathan Verzemnieks, and AndyP are more or less tied. But the conclusions could be different for other relationships between l and n.
*I originally asked for non-negative integer partitions. Joseph Wood correctly pointed out that I am in fact looking for integer compositions, because the order of numbers in a sequence matters to me.
Use the stars and bars concept: pick positions to place l-1 bars between n stars, and count how many stars end up in each section:
import itertools

def diff(seq):
    return [seq[i+1] - seq[i] for i in range(len(seq)-1)]

def generator(n, l):
    for combination in itertools.combinations_with_replacement(range(n+1), l-1):
        yield [combination[0]] + diff(combination) + [n - combination[-1]]
I've used combinations_with_replacement instead of combinations here, so the index handling is a bit different from what you'd need with combinations. The code with combinations would more closely match a standard treatment of stars and bars.
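For the example in the question, this yields:

print(list(generator(2, 3)))
# [[0, 0, 2], [0, 1, 1], [0, 2, 0], [1, 0, 1], [1, 1, 0], [2, 0, 0]]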
Alternatively, a different way to use combinations_with_replacement: start with a list of l zeros, pick n positions with replacement from l possible positions, and add 1 to each of the chosen positions to produce an output:
def generator2(n, l):
    for combination in itertools.combinations_with_replacement(range(l), n):
        output = [0]*l
        for i in combination:
            output[i] += 1
        yield output
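The same example through the alternative generator (the compositions come out in a different order):

print(list(generator2(2, 3)))
# [[2, 0, 0], [1, 1, 0], [1, 0, 1], [0, 2, 0], [0, 1, 1], [0, 0, 2]]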
Starting from a simple recursive solution, which has the same problem as yours:
def nn_partitions(n, l):
    if n == 0:
        yield [0] * l
    else:
        for part in nn_partitions(n - 1, l):
            for i in range(l):
                new = list(part)
                new[i] += 1
                yield new
That is, for each partition for the next lower number, for each place in that partition, add 1 to the element in that place. It yields the same duplicates yours does. I remembered a trick for a similar problem, though: when you alter a partition p for n into one for n+1, fix all the elements of p to the left of the element you increase. That is, keep track of where p was modified, and never modify any of p's "descendants" to the left of that. Here's the code for that:
def _nn_partitions(n, l):
    if n == 0:
        yield [0] * l, 0
    else:
        for part, start in _nn_partitions(n - 1, l):
            for i in range(start, l):
                new = list(part)
                new[i] += 1
                yield new, i

def nn_partitions(n, l):
    for part, _ in _nn_partitions(n, l):
        yield part
It's very similar - there's just the extra parameter passed along at each step, so I added a wrapper to remove that for the caller.
I haven't tested it extensively, but this appears to be reasonably fast - about 35 microseconds for nn_partitions(3, 5) and about 18s for nn_partitions(10, 20) (which yields just over 20 million partitions). (The very elegant solution from user2357112 takes about twice as long for the smaller case and about four times as long for the larger one. Edit: this refers to the first solution from that answer; the second one is faster than mine under some circumstances and slower under others.)
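The figures above are machine-dependent, of course; a rough harness along these lines can be used to take such timings:

import timeit

# Seconds per call, averaged over 1000 runs for the small case
print(timeit.timeit(lambda: list(nn_partitions(3, 5)), number=1000) / 1000)
# One run of the large case (slow: just over 20 million partitions)
print(timeit.timeit(lambda: sum(1 for _ in nn_partitions(10, 20)), number=1))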

Python Get Random Unique N Pairs

Say I have a range(1, n + 1). I want to get m unique pairs.
What I found is that if the number of pairs is close to n(n-1)/2 (the maximum number of pairs), one can't simply generate random pairs every time, because new draws will keep colliding with pairs already taken. I'm looking for a somewhat lazy solution that will be very efficient (by Python's standards).
My attempt so far:
import random

def get_input(n, m):
    res = str(n) + "\n" + str(m) + "\n"
    buffet = range(1, n + 1)
    points = set()
    while len(points) < m:
        x, y = random.sample(buffet, 2)
        points.add((x, y) if x > y else (y, x))  # meeh
    for (x, y) in points:
        res += "%d %d\n" % (x, y)
    return res
You can use combinations to generate all pairs, and use sample to choose randomly. Admittedly it is only lazy in the "not much to type" sense, and not in the "use a generator, not a list" sense :-)
from itertools import combinations
from random import sample

n = 100
sample(list(combinations(range(1, n + 1), 2)), 5)
If you want to improve performance, you can make it lazy by studying this:
Python random sample with a generator / iterable / iterator
The generator you want to sample from is this: combinations(range(1, n + 1), 2)
Here is an approach which works by taking a number in the range 0 to n*(n-1)/2 - 1 and decoding it into a unique pair of items in the range 0 to n-1. I used 0-based math for convenience, but you could of course add 1 to all of the returned pairs if you want:
import math
import random

def decode(i):
    k = math.floor((1 + math.sqrt(1 + 8*i)) / 2)
    return k, i - k*(k-1)//2

def rand_pair(n):
    return decode(random.randrange(n*(n-1)//2))

def rand_pairs(n, m):
    return [decode(i) for i in random.sample(range(n*(n-1)//2), m)]
For example:
>>> rand_pairs(5,8)
[(2, 1), (3, 1), (4, 2), (2, 0), (3, 2), (4, 1), (1, 0), (4, 0)]
The math is hard to explain briefly, but the k in the definition of decode is obtained by solving a quadratic equation that gives the number of triangular numbers which are <= i, and where i falls within the sequence of triangular numbers tells you how to decode a unique pair from it. The interesting thing about this decoding is that it doesn't use n at all and implements a one-to-one correspondence from the set of natural numbers (starting at 0) to the set of all pairs of natural numbers.
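A quick way to convince yourself that decode is one-to-one over a full range, for example with n = 5:

n = 5
pairs = [decode(i) for i in range(n*(n-1)//2)]
assert len(set(pairs)) == n*(n-1)//2          # all pairs distinct
assert all(0 <= y < x < n for x, y in pairs)  # each is a valid 0-based pair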
I don't think anything along the lines of your approach can be improved much. After all, as m gets closer and closer to the limit n(n-1)/2, you have a thinner and thinner chance of finding an unseen pair.
I would suggest splitting into two cases: if m is small, use your random approach. But if m is large enough, try:

pairs = list(itertools.combinations(buffet, 2))
points = random.sample(pairs, m)

Now you have to determine the threshold for m that decides which code path to take. You need some math here to find the right trade-off.
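A sketch of what that hybrid might look like, with an arbitrary threshold of half the total number of pairs (the right cutoff would need measurement):

import itertools
import random

def rand_unique_pairs(n, m):
    total = n * (n - 1) // 2
    if m > total // 2:  # arbitrary threshold; tune experimentally
        # Dense case: materialize all pairs and sample without replacement.
        pairs = list(itertools.combinations(range(1, n + 1), 2))
        return random.sample(pairs, m)
    # Sparse case: rejection sampling; collisions are rare here.
    points = set()
    while len(points) < m:
        x, y = random.sample(range(1, n + 1), 2)
        points.add((x, y) if x > y else (y, x))
    return list(points)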

Python: Given a set of N elements, choose k at random, m times

Given a set of N elements, I want to choose m random, non-repeating subsets of k elements.
If I were looking to generate all the N choose k combinations, I could have used itertools.combinations, so one way to do what I'm asking would be:
import numpy as np
import itertools

n = 10
A = np.arange(n)
k = 4
m = 5

result = np.random.permutation([x for x in itertools.combinations(A, k)])[:m]
print(result)
The problem is of course that this code first generates all the possible combinations, which can be quite expensive.
Another suboptimal solution would be to choose a single combination at random each time (e.g. pick k elements at random, then sort them to get a canonical form) and discard it if it has already been selected.
Is there a better way to do this?
Your second solution seems to be the only practical way to do it. It will work well unless k is close to n and m is "large", in which case there will be more repetitions.
I added a count of the tries needed to get the samples we need. For m=50, with n=10 and k=4, it usually takes less than 60 tries. You can see how it goes with the size of your population and your samples.
You can use random.sample to get a list of k values without replacement, then sort it and turn it into a tuple, so we can use a set for keeping only unique results.
import random

n = 10
A = list(range(n))
k = 4
m = 5

samples = set()
tries = 0
while len(samples) < m:
    samples.add(tuple(sorted(random.sample(A, k))))
    tries += 1

print(samples)
print(tries)
# {(1, 4, 5, 9), (0, 3, 6, 8), (0, 4, 7, 8), (3, 5, 7, 9), (1, 2, 3, 4)}
# 6
# 6 tries this time!
The simplest way to do it is to random.shuffle a list of the values, then take the first k elements (repeated until m valid samples are collected).
Of course this procedure cannot guarantee unique samples. You have to check each new sample against your history of samples if you really need uniqueness.
Since Python 2.3, random.sample(population, k) can be used to produce a sample in a more efficient way.
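For example:

import random

print(random.sample(range(1, 11), 4))  # e.g. [7, 2, 10, 5] -- order is random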
