How to find peaks of minimal length efficiently? - python

I have a list/array of integers. Call a subarray a peak if it goes up and then goes down. For example:
[5,5,4,5,4]
contains
[4,5,4]
which is a peak.
Also consider
[6,5,4,4,4,4,4,5,6,7,7,7,7,7,6]
which contains
[6,7,7,7,7,7,6]
which is a peak.
The problem
Given an input list, I would like to find all the peaks of minimal length contained in it and report them. In the example above, [5,6,7,7,7,7,7,6] is also a peak, but if we remove its first element it is still a peak, so we don't report it.
So for input list:
L = [5,5,5,5,4,5,4,5,6,7,8,8,8,8,8,9,9,8]
we would return
[4,5,4] and [8,9,9,8] only.
I am having problems devising a nice algorithm for this. Any help would be hugely appreciated.
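To pin down the definition, and to have a reference to test faster solutions against, here is a deliberately slow brute-force sketch (not from any answer below; the name `minimal_peaks_bruteforce` is hypothetical). It assumes that a minimal peak is exactly a window whose interior is one constant plateau strictly higher than both endpoints, which is why [5,6,7,7,7,7,7,6] is excluded while [6,7,7,7,7,7,6] is kept.

```python
def minimal_peaks_bruteforce(lst):
    """Return every window lst[i:j+1] whose interior lst[i+1:j] is a single
    constant plateau strictly higher than both endpoints.

    Simple and slow (it re-slices every window); meant only as a reference.
    """
    found = []
    n = len(lst)
    for i in range(n):
        for j in range(i + 2, n):
            plateau = lst[i + 1:j]
            top = plateau[0]
            if any(v != top for v in plateau):
                break  # interior is no longer flat; longer windows won't be either
            if lst[i] < top > lst[j]:
                found.append(lst[i:j + 1])
    return found


print(minimal_peaks_bruteforce([5, 5, 5, 5, 4, 5, 4, 5, 6, 7, 8, 8, 8, 8, 8, 9, 9, 8]))
# [[4, 5, 4], [8, 9, 9, 8]]
```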

Using itertools
Here is a short solution using itertools.groupby to detect peaks. The groups identifying peaks are then unpacked to yield the actual sequence.
from itertools import groupby, islice
l = [1, 2, 1, 2, 2, 0, 0]
fst, mid, nxt = groupby(l), islice(groupby(l), 1, None), islice(groupby(l), 2, None)
peaks = [[f[0], *m[1], n[0]] for f, m, n in zip(fst, mid, nxt) if f[0] < m[0] > n[0]]
print(peaks)
Output
[[1, 2, 1], [1, 2, 2, 0]]
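If you want to keep the groupby idea but group the input only once, one option (a sketch, not part of the original answer; `minimal_peaks_runs` is a hypothetical name) is to run-length encode the list a single time and then compare each run with its two neighbours:

```python
from itertools import groupby


def minimal_peaks_runs(lst):
    # One groupby pass turns the list into runs: [(value, run_length), ...]
    runs = [(key, sum(1 for _ in grp)) for key, grp in groupby(lst)]
    peaks = []
    # Slide a window of three consecutive runs over the encoding
    for (left, _), (top, width), (right, _) in zip(runs, runs[1:], runs[2:]):
        if left < top > right:
            # one element from each outer run plus the whole middle plateau
            peaks.append([left] + [top] * width + [right])
    return peaks


print(minimal_peaks_runs([1, 2, 1, 2, 2, 0, 0]))
# [[1, 2, 1], [1, 2, 2, 0]]
```

The trade-off is the extra `runs` list, whose size is the number of runs rather than the number of elements.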
Using a loop (faster)
The above solution is elegant but since three instances of groupby are created, the list is traversed three times.
Here is a solution using a single traversal.
def peaks(lst):
    first = 0
    last = 1
    while last < len(lst) - 1:
        if lst[first] < lst[last] == lst[last+1]:
            last += 1
        elif lst[first] < lst[last] > lst[last+1]:
            yield lst[first:last+2]
            first = last + 1
            last += 2
        else:
            first = last
            last += 1
l = [1, 2, 1, 2, 2, 0, 0]
print(list(peaks(l)))
Output
[[1, 2, 1], [1, 2, 2, 0]]
Notes on benchmark
Upon benchmarking with timeit, I noticed an increase in performance of about 20% for the solution using a loop. For short lists the overhead of groupby could bring that number up to 40%. The benchmark was done on Python 3.6.

Related

lookup function for repeated elements in an array

I have a list-like Python object of non-negative integers, and I want to find which positions in that list hold repeated values. For example,
if the input is [0,1,1], the function should return [[1, 2]] because the value 1, which is the element at positions 1 and 2 of the input array, appears twice. Similarly:
[0,13,13] should return [[1, 2]]
[0,1,2,1,3,4,2,2] should return [[1, 3], [2, 6, 7]] because 1 appears twice, at positions [1, 3] of the input array and 2 appears 3 times at positions [2, 6, 7]
[1, 2, 3] should return an empty array []
What I have written is this:
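import numpy as np
def get_locations(labels):
    out = []
    label_set = set(labels)
    for label in list(label_set):
        # scans the whole input once per distinct label
        temp = [i for i, j in enumerate(labels) if j == label]
        if len(temp) > 1:
            out.append(np.array(temp))
    # dtype=object because the index arrays can have different lengths
    return np.array(out, dtype=object)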
While it works fine for small input arrays, it gets too slow as the size grows. For instance, on my PC the code below goes from 0.14 s when n = 1000 to 12 s when n = 10000:
from timeit import default_timer as timer
start = timer()
n = 10000
a = np.arange(n)
b = np.append(a, a[-1]) # append the last element to the end
out = get_locations(b)
end = timer()
print(out)
print(end - start) # Time in seconds
How can I speed this up, please? Any ideas are highly appreciated.
Your nested loop results in O(n^2) time complexity. You can instead create a dict of lists that maps each label to its indices, and keep only the sub-lists whose length is greater than 1, which reduces the time complexity to O(n):
def get_locations(labels):
    positions = {}
    for index, label in enumerate(labels):
        positions.setdefault(label, []).append(index)
    return [indices for indices in positions.values() if len(indices) > 1]
so that get_locations([0, 1, 2, 1, 3, 4, 2, 2]) returns:
[[1, 3], [2, 6, 7]]
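Since the question works with NumPy arrays, a vectorized alternative is also possible. This is a sketch (the name `get_locations_np` is hypothetical) that swaps the dictionary for a stable argsort, so it runs in O(n log n) and returns the groups ordered by value rather than by first appearance:

```python
import numpy as np


def get_locations_np(labels):
    labels = np.asarray(labels)
    # A stable sort keeps the original indices ascending within each value
    order = np.argsort(labels, kind="stable")
    sorted_labels = labels[order]
    # Positions in the sorted array where a new value starts
    starts = np.flatnonzero(np.r_[True, sorted_labels[1:] != sorted_labels[:-1]])
    # Cut the index permutation into one chunk per distinct value
    groups = np.split(order, starts[1:])
    return [g.tolist() for g in groups if len(g) > 1]


print(get_locations_np([0, 1, 2, 1, 3, 4, 2, 2]))
# [[1, 3], [2, 6, 7]]
```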
Your code is slow because of the nested for-loop. You can solve this in a more efficient way by using another data structure:
from collections import defaultdict
mylist = [0,1,2,1,3,4,2,2]
output = defaultdict(list)
# Loop once over mylist, store the indices of all unique elements
for i, el in enumerate(mylist):
    output[el].append(i)
# Filter out elements that occur only once
output = {k:v for k, v in output.items() if len(v) > 1}
This produces the following output for your example list:
{1: [1, 3], 2: [2, 6, 7]}
You can turn this result into the desired format:
list(output.values())
> [[1, 3], [2, 6, 7]]
Note, however, that the order of the result relies on dictionaries preserving insertion order, which is an implementation detail of CPython 3.6 and a language guarantee from Python 3.7 onward.
Here's code I implemented. It runs in linear time:
l = [0,1,2,1,3,4,2,2]
dict1 = {}
for j,i in enumerate(l): # O(n)
    temp = dict1.get(i) # O(1) most cases
    if not temp:
        dict1[i] = [j]
    else:
        dict1[i].append(j) # O(1)
print([item for item in dict1.values() if len(item) > 1]) # O(n)
Output:
[[1, 3], [2, 6, 7]]
This is essentially a time-complexity issue. Your algorithm has nested loops: for each distinct label it scans the whole list again, so when most labels are distinct the time complexity is of the order of n^2, where n is the size of the list. So when you multiply the size of the list by 10 (from 1,000 to 10,000), you should expect the time to grow by roughly 10^2 = 100. This is why it goes from 0.14 s to 12 s.
Here is a simple solution with no extra libraries required:
def get_locations(labels):
    locations = {}
    for index, label in enumerate(labels):
        if label in locations:
            locations[label].append(index)
        else:
            locations[label] = [index]
    return [locations[i] for i in locations if len(locations[i]) > 1]
Since the loops are not nested, the time complexity is linear (roughly 2n steps: one pass to build the dictionary and one over its values), so you should see roughly a 2-times increase in time whenever the problem size is doubled.
You can try using the Counter class from the collections module:
from collections import Counter
list1 = [1,1,2,3,4,4,4]
Counter(list1)
you will get an output similar to this
Counter({4: 3, 1: 2, 2: 1, 3: 1})
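Counter on its own only reports how often each value occurs, not where it occurs. A minimal sketch (hypothetical function name, not from the original answer) that uses those counts to collect the positions of repeated values only:

```python
from collections import Counter


def get_locations_counter(labels):
    counts = Counter(labels)  # value -> number of occurrences
    positions = {}
    for index, label in enumerate(labels):
        if counts[label] > 1:  # skip values that occur only once
            positions.setdefault(label, []).append(index)
    return list(positions.values())


print(get_locations_counter([0, 1, 2, 1, 3, 4, 2, 2]))
# [[1, 3], [2, 6, 7]]
```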

Find the permutations that sums to the three smallest numbers

I asked the same thing yesterday but had a hard time finding the right words to describe my problem, so I deleted it. Here it is again.
Let us say that we have 3 lists:
list1 = [1, 2]
list2 = [2, 3]
list3 = [1]
Let us say I want to find the 3 permutations of these lists which, when their elements are added together, give the smallest sums possible. So here, the permutations that we want would be:
1,2,1
2,2,1
1,3,1
Because the sums of the numbers in these permutations are the smallest possible.
2,3,1
will not be part of the solution, since its sum is larger than those of the other three and thus not among the three smallest.
Of course, using itertools to list all the permutations and adding up the numbers in each would be the most obvious solution, but I was wondering if there is a more efficient algorithm for this, considering it should be able to handle 1000 lists.
NOTE: If the number of lists is N, then I need to find N permutations. Thus, if there are 3 lists, I find the 3 smallest permutations.
PRECONDITIONS:
-A part of the precondition is that all of these lists are sorted.
-The number of elements in each list is 2N-1, to deal with the case where only one list has more than 1 element.
-All of the lists are sorted from smallest.
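For reference, the brute-force baseline the question mentions can be sketched with itertools.product and heapq.nsmallest (the function name is hypothetical). It enumerates every choice of one element per list, so its cost grows with the product of the list lengths and it is only usable for small inputs, but it is handy for checking faster approaches:

```python
import heapq
from itertools import product


def smallest_sum_choices_bruteforce(lists, n=None):
    """Pick one element from each list and return the n choices with the
    smallest sums (ties keep the order in which product() yields them)."""
    if n is None:
        n = len(lists)  # the question asks for N results for N lists
    return heapq.nsmallest(n, product(*lists), key=sum)


print(smallest_sum_choices_bruteforce([[1, 2], [2, 3], [1]]))
# [(1, 2, 1), (1, 3, 1), (2, 2, 1)]
```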
Since the lists are sorted, the smallest element in each list is the first one, and the sum of these gives us the "minimal sum permutation". Picking any element other than the first one can only increase the sum (or leave it unchanged if there are duplicates).
We start off by calculating, for each list, the difference between element i and the first one. For example, for the lists [1, 3, 4, 8] and [3, 9, 12, 15], these differences would be [2, 3, 7] and [6, 9, 12] respectively. We keep them separate in cost_lists (cost_local in the code below), because they will be needed later on. But in cost_global, we pool them all together, and by sorting them in ascending order we find the solutions in which we choose the minimal value for all lists but one. To keep track of which element from which list will give us the next minimum sum, we group each difference value with both the index of the list it comes from and the index of the element within that list.
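A small worked example of this first step, using the two lists from the paragraph. This is a sketch with a hypothetical helper name; it mirrors the (difference, list index, element index) tuples used by the answer's code but is not that code:

```python
def difference_costs(lists):
    """For every sorted list, the extra cost of choosing element j instead of
    element 0, tagged with (list index i, element index j)."""
    cost_lists = [
        [(num - lst[0], i, j) for j, num in enumerate(lst[1:], 1)]
        for i, lst in enumerate(lists)
    ]
    # Pool all single-change costs and sort them, cheapest first
    cost_global = sorted(c for costs in cost_lists for c in costs)
    return cost_lists, cost_global


cost_lists, cost_global = difference_costs([[1, 3, 4, 8], [3, 9, 12, 15]])
print(cost_lists)
# [[(2, 0, 1), (3, 0, 2), (7, 0, 3)], [(6, 1, 1), (9, 1, 2), (12, 1, 3)]]
print(cost_global)
# [(2, 0, 1), (3, 0, 2), (6, 1, 1), (7, 0, 3), (9, 1, 2), (12, 1, 3)]
```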
However, this is not a complete approach. It is possible, for example, that taking the next value from two lists incurs a smaller cost than taking the next value from one list. So, we have to search the product of the combinations for k = 2, 3, ..., N. Doing that naively would result in N**N complexity, but we can take some really good shortcuts.
From the partial solution above, we have a list of the minimal costs in order. Since we want only the first N minimal sums, we check what the cost of the Nth permutation is (the threshold). So, when we search for a group of two next values, we can safely ignore their sum if it exceeds our current threshold. And since the difference values within each list are in ascending order, once we cross the threshold we can immediately exit the loop. Similarly, if we haven't found any new combinations within the threshold for k = 2, it is pointless to look for k > 2. Considering that the smallest sum costs will most likely come from a single nonminimal value, or a few small ones (unless most lists have massive differences between sequential values), we are bound to exit these loops rather quickly. The code I came up with to achieve this is fairly ugly, but it effectively does the same as
for k in xrange(2, len(lists)):
    for comb in itertools.combinations(cost_lists, k):
        for group in itertools.product(*comb):
            if sum(g[0] for g in group) <= threshold:
                cost_global.append(group)
except that we exit the loops as soon as we are guaranteed not to find any more results, so we don't pointlessly sift through a huge number of combinations/products that are over the threshold.
def filter_cost(cost_lists, threshold):
    cost = [[i for i in ilist if i[0] <= threshold] for ilist in cost_lists]
    # the algorithm requires that we remove any lists that have become empty
    return [ilist for ilist in cost if ilist]
def _combi(cost_lists, k, start, depth, subtotal, threshold):
    if depth == k:
        for i in xrange(start, len(cost_lists)):
            for value in cost_lists[i]:
                if value[0] + subtotal > threshold:
                    break
                yield (value,)
    else:
        for i in xrange(start, len(cost_lists)):
            for value in cost_lists[i]:
                if value[0] + subtotal > threshold:
                    break
                for c in _combi(cost_lists, k, i+1, depth+1,
                                value[0]+subtotal, threshold):
                    yield (value,) + c
def combinations_product(cost_lists, k, threshold):
    for i in xrange(len(cost_lists)-k+1):
        for value in cost_lists[i]:
            if value[0] > threshold:
                break
            for comb in _combi(cost_lists, k, i+1, 2, value[0], threshold):
                temp = (value,) + comb
                cost, ilists, ith_items = zip(*temp)
                yield sum(cost), ilists, ith_items
def find_smallest_sum_permutations(lists):
    N = len(lists)  # number of lists = number of permutations to return
    minima = [min(x) for x in lists]
    cost_local = []
    cost_global = []
    for i, ilist in enumerate(lists):
        if len(ilist) > 1:
            first = ilist[0]
            diff = [(num-first, i, j) for j, num in enumerate(ilist[1:], 1)]
            cost_local.append(diff)
            cost_global.extend(diff)
    cost_global.sort()
    threshold_index = len(lists) - 2
    cost_threshold = cost_global[threshold_index][0]
    cost_local = filter_cost(cost_local, cost_threshold)
    for k in xrange(2, len(lists)):
        group_combinations = tuple(combinations_product(cost_local, k,
                                                        cost_threshold))
        if group_combinations:
            cost_global.extend(group_combinations)
            cost_global.sort()
            cost_threshold = cost_global[threshold_index][0]
            cost_local = filter_cost(cost_local, cost_threshold)
        else:
            break
    permutations = [minima]
    for k in xrange(N-1):
        _, ilist, ith_item = cost_global[k]
        if type(ilist) == int:
            permutation = [minima[i]
                           if i != ilist else lists[ilist][ith_item]
                           for i in xrange(N)]
        else:
            # multiple nonminimal values combination
            mapping = dict(zip(ilist, ith_item))
            permutation = [minima[i]
                           if i not in mapping else lists[i][mapping[i]]
                           for i in xrange(N)]
        permutations.append(permutation)
    return permutations
Examples
Example in the question.
>>> lists = [
[1, 2],
[2, 3],
[1],
]
>>> for p in find_smallest_sum_permutations(lists):
... print p, sum(p)
[1, 2, 1] 4
[2, 2, 1] 5
[1, 3, 1] 5
Example I had generated with random lists.
>>> import random
>>> N = 5
>>> random.seed(1024)
>>> lists = [sorted(random.sample(range(10*N), 2*N-1)) for _ in xrange(N)]
>>> for p in find_smallest_sum_permutations(lists):
... print p, sum(p)
[4, 4, 1, 6, 0] 15
[4, 6, 1, 6, 0] 17
[4, 4, 3, 6, 0] 17
[4, 4, 1, 6, 4] 19
[4, 6, 3, 6, 0] 19
Example by user2357112, who caught a glaring error in my previous iteration.
>>> lists = [
[1, 2, 30, 40],
[1, 2, 30, 40],
[10, 20, 30, 40],
[10, 20, 30, 40],
]
>>> for p in find_smallest_sum_permutations(lists):
... print p, sum(p)
[1, 1, 10, 10] 22
[2, 1, 10, 10] 23
[1, 2, 10, 10] 23
[2, 2, 10, 10] 24
The trick is to only generate the combinations that might possibly be needed, and store them in a heap. Each one that you pull out is the smallest one you have not yet seen. And the fact that THAT combination has been pulled out tells you that there are new ones which might also be small.
See https://docs.python.org/2/library/heapq.html for how to use a heap. We also need code for generating combinations. And with that, here is working code for getting the first n combinations for any list of lists:
import heapq
import itertools
# Helper class for storing combinations.
class ListSelector:
    def __init__(self, lists, indexes):
        self.lists = lists
        self.indexes = indexes
    def value(self):
        answer = 0
        for i in range(0, len(self.lists)):
            answer = answer + self.lists[i][self.indexes[i]]
        return answer
    def values(self):
        return [self.lists[i][self.indexes[i]] for i in range(0, len(self.lists))]
    # These are the next combinations. We are willing to increment any
    # leading 0, or the first non-zero value. This will provide one and
    # only one path to each possible combination.
    def next_selectors(self):
        lists = self.lists
        indexes = self.indexes
        selectors = []
        for i in range(0, len(lists)):
            if len(lists[i]) <= indexes[i] + 1:
                if 0 == indexes[i]:
                    continue
                else:
                    break
            new_indexes = [
                indexes[j] + (0 if j != i else 1)
                for j in range(0, len(lists))]
            selectors.append(ListSelector(lists, new_indexes))
            if 0 < indexes[i]:
                break
        return selectors
# This will just return an iterator over all combinations, from smallest
# to largest. It does NOT generate them until needed.
def combinations(lists):
    # The counter breaks ties between equal sums, so heapq never compares
    # two ListSelector objects (which raises TypeError in Python 3).
    tiebreak = itertools.count()
    sel = ListSelector(lists, [0 for _ in range(len(lists))])
    upcoming = [(sel.value(), next(tiebreak), sel)]
    while len(upcoming):
        value, _, sel = heapq.heappop(upcoming)
        yield sel
        for next_sel in sel.next_selectors():
            heapq.heappush(upcoming, (next_sel.value(), next(tiebreak), next_sel))
# This just gets the first n of them. (It will return less if less.)
def smallest_n_combinations(n, lists):
    i = 0
    for sel in combinations(lists):
        yield sel
        i = i + 1
        if i == n:
            break
# Example usage
lists = [
    [1, 2, 5],
    [2, 3, 4],
    [1]]
for sel in smallest_n_combinations(3, lists):
    print(sel.value(), sel.values(), sel.indexes)
(This could be made more efficient for a long list of lists with tricks like caching the value inside of ListSelector and calculating it incrementally for new ones.)
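Following that suggestion, here is a sketch (hypothetical function name; not the original author's code) that keeps each combination's total in the heap entry and updates it incrementally, using the same "bump a leading zero or the first non-zero index" rule as next_selectors. It assumes every list is non-empty and sorted ascending, and uses a counter so that entries with equal totals are popped in insertion order:

```python
import heapq
import itertools


def smallest_n_sums(n, lists):
    """Yield (total, indexes) for the n smallest sums that pick one element
    from each list. Assumes every list is non-empty and sorted ascending."""
    tiebreak = itertools.count()  # equal totals are popped in insertion order
    start = tuple(0 for _ in lists)
    heap = [(sum(lst[0] for lst in lists), next(tiebreak), start)]
    produced = 0
    while heap and produced < n:
        total, _, indexes = heapq.heappop(heap)
        yield total, indexes
        produced += 1
        # Same successor rule as next_selectors: bump any leading zero index,
        # or the first non-zero index, so each combination is reached once.
        for i, idx in enumerate(indexes):
            if idx + 1 >= len(lists[i]):
                if idx == 0:
                    continue  # single-element list: nothing to bump, keep going
                break  # first non-zero index is exhausted: stop here
            # Update the cached total instead of re-summing every list
            new_total = total - lists[i][idx] + lists[i][idx + 1]
            new_indexes = indexes[:i] + (idx + 1,) + indexes[i + 1:]
            heapq.heappush(heap, (new_total, next(tiebreak), new_indexes))
            if idx > 0:
                break  # only the first non-zero index may be bumped


for total, indexes in smallest_n_sums(3, [[1, 2, 5], [2, 3, 4], [1]]):
    print(total, indexes)
# 4 (0, 0, 0)
# 5 (1, 0, 0)
# 5 (0, 1, 0)
```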

Python Itertools questions [duplicate]

I've got the following "bars and stars" algorithm, implemented in Python, which prints out all decompositions of a sum into 3 bins, for sums from 0 up to (but not including) 5.
I'd like to generalise my code so it works with N bins (where N is less than the maximum sum, i.e. 5 here).
The pattern is if you have 3 bins you need 2 nested loops, if you have N bins you need N-1 nested loops.
Can someone think of a generic way of writing this, possibly not using loops?
# bars and stars algorithm
N=5
for n in range(0,N):
    x=[1]*n
    for i in range(0,(len(x)+1)):
        for j in range(i,(len(x)+1)):
            print sum(x[0:i]), sum(x[i:j]), sum(x[j:len(x)])
If this isn't simply a learning exercise, then it's not necessary for you to roll your own algorithm to generate the partitions: Python's standard library already has most of what you need, in the form of the itertools.combinations function.
From Theorem 2 on the Wikipedia page you linked to, there are n+k-1 choose k-1 ways of partitioning n items into k bins, and the proof of that theorem gives an explicit correspondence between the combinations and the partitions. So all we need is (1) a way to generate those combinations, and (2) code to translate each combination to the corresponding partition. The itertools.combinations function already provides the first ingredient. For the second, each combination gives the positions of the dividers; the differences between successive divider positions (minus one) give the partition sizes. Here's the code:
import itertools
def partitions(n, k):
    for c in itertools.combinations(range(n+k-1), k-1):
        yield [b-a-1 for a, b in zip((-1,)+c, c+(n+k-1,))]
# Example usage
for p in partitions(5, 3):
    print(p)
And here's the output from running the above code.
[0, 0, 5]
[0, 1, 4]
[0, 2, 3]
[0, 3, 2]
[0, 4, 1]
[0, 5, 0]
[1, 0, 4]
[1, 1, 3]
[1, 2, 2]
[1, 3, 1]
[1, 4, 0]
[2, 0, 3]
[2, 1, 2]
[2, 2, 1]
[2, 3, 0]
[3, 0, 2]
[3, 1, 1]
[3, 2, 0]
[4, 0, 1]
[4, 1, 0]
[5, 0, 0]
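A quick sanity check, not part of the original answer (it assumes Python 3.8+ for math.comb): the generator should produce exactly the n+k-1 choose k-1 partitions promised by Theorem 2, each with k parts summing to n and none repeated.

```python
import itertools
import math


def partitions(n, k):
    # Same construction as above: k-1 divider positions among n+k-1 slots
    for c in itertools.combinations(range(n + k - 1), k - 1):
        yield [b - a - 1 for a, b in zip((-1,) + c, c + (n + k - 1,))]


for n, k in [(5, 3), (4, 4), (0, 2), (6, 1)]:
    parts = list(partitions(n, k))
    assert len(parts) == math.comb(n + k - 1, k - 1)  # count from Theorem 2
    assert all(len(p) == k and sum(p) == n for p in parts)  # each is a valid split
    assert len(set(map(tuple, parts))) == len(parts)  # no duplicates
print("ok")
```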
Another recursive variant, using a generator function, i.e. instead of right away printing the results, it yields them one after another, to be printed by the caller.
The way to convert your loops into a recursive algorithm is as follows:
identify the "base case": when there are no more bars, just print the stars
for any number of stars in the first segment, recursively determine the possible partitions of the rest, and combine them
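Following those two steps directly, a minimal generator sketch (the name `compositions` is hypothetical) that yields the number of stars in each bin rather than printing, assuming at least one bin:

```python
def compositions(stars, bins):
    """Yield every way to put `stars` identical stars into `bins` ordered
    bins (bins >= 1), as lists of counts, in lexicographic order."""
    if bins == 1:
        # Base case: no bars left, so the last bin takes all remaining stars
        yield [stars]
        return
    for first in range(stars + 1):  # stars placed before the first bar
        # Recursively partition the remaining stars among the remaining bins
        for rest in compositions(stars - first, bins - 1):
            yield [first] + rest


for c in compositions(2, 3):
    print(c)
# [0, 0, 2]
# [0, 1, 1]
# [0, 2, 0]
# [1, 0, 1]
# [1, 1, 0]
# [2, 0, 0]
```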
You can also turn this into an algorithm to partition arbitrary sequences into chunks:
def partition(seq, n, min_size=0):
    if n == 0:
        yield [seq]
    else:
        for i in range(min_size, len(seq) - min_size * n + 1):
            for res in partition(seq[i:], n-1, min_size):
                yield [seq[:i]] + res
Example usage:
for res in partition("*****", 2):
    print("|".join(res))
Take it one step at a time.
First, remove the sum() calls. We don't need them:
N=5
for n in range(0,N):
    x=[1]*n
    for i in range(0,(n+1)): # len(x) == n
        for j in range(i,(n+1)):
            print i, j - i, n - j
Notice that x is an unused variable:
N=5
for n in range(0,N):
    for i in range(0,(n+1)):
        for j in range(i,(n+1)):
            print i, j - i, n - j
Time to generalize. The above algorithm is correct for any number of stars and two bars (three bins), so we just need to generalize the bars.
Do this recursively. For the base case, we have either zero bars or zero stars, which are both trivial. For the recursive case, run through all the possible positions of the leftmost bar and recurse in each case:
from __future__ import print_function
def bars_and_stars(bars=3, stars=5, _prefix=''):
    if stars == 0:
        print(_prefix + ', '.join('0'*(bars+1)))
        return
    if bars == 0:
        print(_prefix + str(stars))
        return
    for i in range(stars+1):
        bars_and_stars(bars-1, stars-i, '{}{}, '.format(_prefix, i))
For bonus points, we could change range() to xrange(), but that will just give you trouble when you port to Python 3.
This can be solved recursively in the following approach:
#n bins, k stars,
def F(n,k):
    #n bins, k stars, list holds how many elements in current assignment
    def aux(n,k,list):
        if n == 0: #stop clause
            print list
        elif n==1: #making sure all stars are distributed
            list[0] = k
            aux(0,0,list)
        else: #"regular" recursion:
            for i in range(k+1):
                #the last bin has i stars, set them and recurse
                list[n-1] = i
                aux(n-1,k-i,list)
    aux(n,k,[0]*n)
The idea is to "guess" how many stars are in the last bin, assign them, and recurse on a smaller problem with fewer stars (minus the ones just assigned) and one bin fewer.
Note: It is easy to replace the line
print list
with any output format you desire when the number of stars in each bin is set.
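One way to act on that note is to turn aux into a generator that yields each finished assignment to the caller. This is a Python 3 sketch (hypothetical name `bins_assignments`) of the same "fill the last bin, then recurse" idea; it yields a copy because the shared list keeps being mutated:

```python
def bins_assignments(n, k):
    """Yield each assignment of k stars to n bins (n >= 1) as a new list."""
    current = [0] * n  # shared working list, filled from the last bin down

    def aux(n, k):
        if n == 0:
            yield list(current)  # copy, because `current` keeps being mutated
        elif n == 1:
            current[0] = k  # the first bin takes all remaining stars
            yield from aux(0, 0)
        else:
            for i in range(k + 1):
                current[n - 1] = i  # the last unfilled bin gets i stars
                yield from aux(n - 1, k - i)

    yield from aux(n, k)


for assignment in bins_assignments(3, 2):
    print(assignment)
# [2, 0, 0]
# [1, 1, 0]
# [0, 2, 0]
# [1, 0, 1]
# [0, 1, 1]
# [0, 0, 2]
```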
Here is a nonrecursive algorithm that replicates the "bars and stars" nested loop approach. This assumes the bars all start on the right, and finish on the left (bins going from [x,0,0,...] to [0,0,..,x]). There will always be a zero in the first bin when a loop finishes, so you can follow the logic and match it to "bars and stars."
def combos(nbins, qty):
    bins = [0]*nbins
    bins[0] = qty #starting bin quantities
    while True:
        yield bins[:] #yield a copy, since bins is modified in place below
        if bins[-1] == qty:
            return #last combo, we're done!
        #leftmost bar movement (inner loop)
        if bins[0] > 0:
            bins[0] -= 1
            bins[1] += 1
        else:
            #bump next bar in nested loops
            #i.e., find first nonzero entry, and split it
            nz = 1
            while bins[nz] == 0:
                nz +=1
            bins[0]=bins[nz]-1
            bins[nz+1] += 1
            bins[nz] = 0
Here is the result of 4 bins, quantity 3:
for m in combos(4, 3):
    print(m)
[3, 0, 0, 0]
[2, 1, 0, 0]
[1, 2, 0, 0]
[0, 3, 0, 0]
[2, 0, 1, 0]
[1, 1, 1, 0]
[0, 2, 1, 0]
[1, 0, 2, 0]
[0, 1, 2, 0]
[0, 0, 3, 0]
[2, 0, 0, 1]
[1, 1, 0, 1]
[0, 2, 0, 1]
[1, 0, 1, 1]
[0, 1, 1, 1]
[0, 0, 2, 1]
[1, 0, 0, 2]
[0, 1, 0, 2]
[0, 0, 1, 2]
[0, 0, 0, 3]
I needed to solve the same problem and found this post, but I really wanted a non-recursive, general-purpose algorithm that didn't rely on itertools. I couldn't find one, so I came up with this.
By default, the generator produces the sequence in lexical order (like the earlier recursive example), but it can also produce the reverse-order sequence by setting the "reversed" flag.
def StarsAndBars(bins, stars, reversed=False):
    if bins < 1 or stars < 1:
        raise ValueError("Number of bins and objects must both be greater than or equal to 1.")
    if bins == 1:
        yield stars,
        return
    bars = [ ([0] * bins + [ stars ], 1) ]
    if reversed:
        while len(bars)>0:
            b = bars.pop()
            if b[1] == bins:
                yield tuple(b[0][y] - b[0][y-1] for y in range(1, bins+1))
            else:
                bar = b[0][:b[1]]
                for x in range(b[0][b[1]], stars+1):
                    newBar = bar + [ x ] * (bins - b[1]) + [ stars ]
                    bars.append( (newBar, b[1]+1) )
        bars = [ ([0] * bins + [ stars ], 1) ]
    else:
        while len(bars)>0:
            newBars = []
            for b in bars:
                for x in range(b[0][-2], stars+1):
                    newBar = b[0][1:bins] + [ x, stars ]
                    if b[1] < bins-1 and x > 0:
                        newBars.append( (newBar, b[1]+1) )
                    yield tuple(newBar[y] - newBar[y-1] for y in range(1, bins+1))
            bars = newBars
This problem can also be solved somewhat less verbosely than the previous answers with a list comprehension:
from numpy import array as ar
from itertools import product
number_of_stars = M = 3  # example values; the output below is for 3 stars in 3 bins
number_of_bins = N = 3
decompositions = ar([ar(i) for i in product(range(M+1), repeat=N) if sum(i)==M])
Here itertools.product() lazily generates the Cartesian product of range(M+1) with itself, taken N times (repeat=N). The if clause keeps only the tuples whose entries add up to the number of stars; for example, (0, 0, 0) is discarded. Because all (M+1)^N tuples are generated and checked, this is only practical for small inputs.
If we're happy with a list of tuples, then we can simply remove the ar(...) calls (ar is just numpy.array under a short alias for brevity). Here's an example output for 3 stars in 3 bins:
array([[0, 0, 3],
[0, 1, 2],
[0, 2, 1],
[0, 3, 0],
[1, 0, 2],
[1, 1, 1],
[1, 2, 0],
[2, 0, 1],
[2, 1, 0],
[3, 0, 0]])
I hope this answer helps!
Since I found the code in most answers quite hard to follow (I kept asking myself how the algorithms shown relate to the actual stars-and-bars problem), let's do this step by step:
First we define a function to insert a bar | into a string stars at a given position p:
def insert_bar(stars, p):
    head, tail = stars[:p], stars[p:]
    return head + '|' + tail
Usage:
insert_bar('***', 1) # returns '*|**'
To insert multiple bars at different positions, e.g. (1,3), a simple way is to use reduce (from functools):
from functools import reduce
reduce(insert_bar, (1,3), '***') # returns '*|*|*'
If we branch the definition of insert_bar to handle both cases, we get a nice, reusable function to insert any number of bars into a string of stars:
from functools import reduce
def insert_bars(stars, p):
    if type(p) is int:
        head, tail = stars[:p], stars[p:]
        return head + '|' + tail
    else:
        return reduce(insert_bar, p, stars)
As Mark Dickinson explained in his answer, itertools.combinations lets us produce the n+k-1 choose k-1 combinations of bar positions.
What is left to do is to create a string of n '*' characters, insert the bars at the given positions, split the string at the bars, and calculate the length of each resulting bin. The implementation below is thus a literal translation of the problem statement into code:
import itertools
def partitions(n, k):
    for positions in itertools.combinations(range(n+k-1), k-1):
        yield [len(bin) for bin in insert_bars(n*"*", positions).split('|')]
Anyone looking for the specific case of k=2 can save a lot of time by simply creating a range and stacking it with its reverse. Compared with the accepted answer:
n = 500000
%timeit np.array([[i,j] for i,j in partitions(n,2)])
>>> 396 ms ± 13.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
rng = np.arange(n+1)
np.vstack([rng, rng[::-1]]).T
>>> 2.91 ms ± 190 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
And they are indeed equivalent.
it2k = np.array([[i,j] for i,j in partitions(n,2)])
rng = np.arange(n+1)
np2k = np.vstack([rng, rng[::-1]]).T
(np2k == it2k).all()
>>> True

General bars and stars

I've got a the following "bars and stars" algorithm, implemented in Python, which prints out all decomposition of a sum into 3 bins, for sums going from 0 to 5.
I'd like to generalise my code so it works with N bins (where N less than the max sum i.e 5 here).
The pattern is if you have 3 bins you need 2 nested loops, if you have N bins you need N-1 nested loops.
Can someone think of a generic way of writing this, possibly not using loops?
# bars and stars algorithm
N=5
for n in range(0,N):
x=[1]*n
for i in range(0,(len(x)+1)):
for j in range(i,(len(x)+1)):
print sum(x[0:i]), sum(x[i:j]), sum(x[j:len(x)])
If this isn't simply a learning exercise, then it's not necessary for you to roll your own algorithm to generate the partitions: Python's standard library already has most of what you need, in the form of the itertools.combinations function.
From Theorem 2 on the Wikipedia page you linked to, there are n+k-1 choose k-1 ways of partitioning n items into k bins, and the proof of that theorem gives an explicit correspondence between the combinations and the partitions. So all we need is (1) a way to generate those combinations, and (2) code to translate each combination to the corresponding partition. The itertools.combinations function already provides the first ingredient. For the second, each combination gives the positions of the dividers; the differences between successive divider positions (minus one) give the partition sizes. Here's the code:
import itertools
def partitions(n, k):
for c in itertools.combinations(range(n+k-1), k-1):
yield [b-a-1 for a, b in zip((-1,)+c, c+(n+k-1,))]
# Example usage
for p in partitions(5, 3):
print(p)
And here's the output from running the above code.
[0, 0, 5]
[0, 1, 4]
[0, 2, 3]
[0, 3, 2]
[0, 4, 1]
[0, 5, 0]
[1, 0, 4]
[1, 1, 3]
[1, 2, 2]
[1, 3, 1]
[1, 4, 0]
[2, 0, 3]
[2, 1, 2]
[2, 2, 1]
[2, 3, 0]
[3, 0, 2]
[3, 1, 1]
[3, 2, 0]
[4, 0, 1]
[4, 1, 0]
[5, 0, 0]
Another recursive variant, using a generator function, i.e. instead of right away printing the results, it yields them one after another, to be printed by the caller.
The way to convert your loops into a recursive algorithm is as follows:
identify the "base case": when there are no more bars, just print the stars
for any number of stars in the first segment, recursively determine the possible partitions of the rest, and combine them
You can also turn this into an algorithm to partition arbitrary sequences into chunks:
def partition(seq, n, min_size=0):
if n == 0:
yield [seq]
else:
for i in range(min_size, len(seq) - min_size * n + 1):
for res in partition(seq[i:], n-1, min_size):
yield [seq[:i]] + res
Example usage:
for res in partition("*****", 2):
print "|".join(res)
Take it one step at a time.
First, remove the sum() calls. We don't need them:
N=5
for n in range(0,N):
x=[1]*n
for i in range(0,(n+1)): # len(x) == n
for j in range(i,(n+1)):
print i, j - i, n - j
Notice that x is an unused variable:
N=5
for n in range(0,N):
for i in range(0,(n+1)):
for j in range(i,(n+1)):
print i, j - i, n - j
Time to generalize. The above algorithm is correct for N stars and three bars, so we just need to generalize the bars.
Do this recursively. For the base case, we have either zero bars or zero stars, which are both trivial. For the recursive case, run through all the possible positions of the leftmost bar and recurse in each case:
from __future__ import print_function
def bars_and_stars(bars=3, stars=5, _prefix=''):
if stars == 0:
print(_prefix + ', '.join('0'*(bars+1)))
return
if bars == 0:
print(_prefix + str(stars))
return
for i in range(stars+1):
bars_and_stars(bars-1, stars-i, '{}{}, '.format(_prefix, i))
For bonus points, we could change range() to xrange(), but that will just give you trouble when you port to Python 3.
This can be solved recursively in the following approach:
#n bins, k stars,
def F(n,k):
#n bins, k stars, list holds how many elements in current assignment
def aux(n,k,list):
if n == 0: #stop clause
print list
elif n==1: #making sure all stars are distributed
list[0] = k
aux(0,0,list)
else: #"regular" recursion:
for i in range(k+1):
#the last bin has i stars, set them and recurse
list[n-1] = i
aux(n-1,k-i,list)
aux(n,k,[0]*n)
The idea is to "guess" how many stars are in the last bin, assign them, and recurse to a smaller problem with less stars (as much that were assigned) and one less bin.
Note: It is easy to replace the line
print list
with any output format you desire when the number of stars in each bin is set.
Here is a nonrecursive algorithm that replicates the "bars and stars" nested loop approach. This assumes the bars all start on the right, and finish on the left (bins going from [x,0,0,...] to [0,0,..,x]). There will always be a zero in the first bin when a loop finishes, so you can follow the logic and match it to "bars and stars."
def combos(nbins, qty):
bins = [0]*nbins
bins[0] = qty #starting bin quantities
while True:
yield bins
if bins[-1] == qty:
return #last combo, we're done!
#leftmost bar movement (inner loop)
if bins[0] > 0:
bins[0] -= 1
bins[1] += 1
else:
#bump next bar in nested loops
#i.e., find first nonzero entry, and split it
nz = 1
while bins[nz] == 0:
nz +=1
bins[0]=bins[nz]-1
bins[nz+1] += 1
bins[nz] = 0
Here is the result of 4 bins, quantity 3:
for m in combos(4, 3):
print(m)
[3, 0, 0, 0]
[2, 1, 0, 0]
[1, 2, 0, 0]
[0, 3, 0, 0]
[2, 0, 1, 0]
[1, 1, 1, 0]
[0, 2, 1, 0]
[1, 0, 2, 0]
[0, 1, 2, 0]
[0, 0, 3, 0]
[2, 0, 0, 1]
[1, 1, 0, 1]
[0, 2, 0, 1]
[1, 0, 1, 1]
[0, 1, 1, 1]
[0, 0, 2, 1]
[1, 0, 0, 2]
[0, 1, 0, 2]
[0, 0, 1, 2]
[0, 0, 0, 3]
I needed to solve the same problem and found this post, but I really wanted a non-recursive general-purpose algorithm that didn't rely on itertools and couldn't find one, so came up with this.
By default, the generator produces the sequence in either lexical order (as the earlier recursive example) but can also produce the reverse-order sequence by setting the "reversed" flag.
def StarsAndBars(bins, stars, reversed=False):
if bins < 1 or stars < 1:
raise ValueError("Number of bins and objects must both be greater than or equal to 1.")
if bins == 1:
yield stars,
return
bars = [ ([0] * bins + [ stars ], 1) ]
if reversed:
while len(bars)>0:
b = bars.pop()
if b[1] == bins:
yield tuple(b[0][y] - b[0][y-1] for y in range(1, bins+1))
else:
bar = b[0][:b[1]]
for x in range(b[0][b[1]], stars+1):
newBar = bar + [ x ] * (bins - b[1]) + [ stars ]
bars.append( (newBar, b[1]+1) )
bars = [ ([0] * bins + [ stars ], 1) ]
else:
while len(bars)>0:
newBars = []
for b in bars:
for x in range(b[0][-2], stars+1):
newBar = b[0][1:bins] + [ x, stars ]
if b[1] < bins-1 and x > 0:
newBars.append( (newBar, b[1]+1) )
yield tuple(newBar[y] - newBar[y-1] for y in range(1, bins+1))
bars = newBars
This problem can also be solved somewhat less verbosely than the previous answers with a list comprehension:
from numpy import array as ar
from itertools import product
number_of_stars = M
number_of_bins = N
decompositions = ar([ar(i) for i in product(range(M+1), repeat=N) if sum(i)==M])
Here the itertools.product() produces a list containing the Cartesian product of the list range(M+1) with itself, where the product has been applied (repeats=)N times. The if statement removes the combinations where the number don't add up to the number of stars, for example one of the combinations is of 0 with 0 with 0 or [0,0,0].
If we're happy with a list of tuples, we can simply remove the calls to np.array() (imported as ar for brevity in the example). Here's the output for 3 stars in 3 bins:
array([[0, 0, 3],
       [0, 1, 2],
       [0, 2, 1],
       [0, 3, 0],
       [1, 0, 2],
       [1, 1, 1],
       [1, 2, 0],
       [2, 0, 1],
       [2, 1, 0],
       [3, 0, 0]])
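As a rough check of what the filter costs: product() builds all (M+1)^N candidate tuples, while only C(M+N-1, N-1) of them survive (the stars-and-bars count). The sketch below is a plain-Python variant (a list of tuples, no numpy) that prints both numbers; the name decompositions_list is hypothetical, and math.comb needs Python 3.8+.

```python
from itertools import product
from math import comb


def decompositions_list(stars, bins):
    # Keep every bins-tuple with entries 0..stars whose entries sum to stars.
    return [t for t in product(range(stars + 1), repeat=bins) if sum(t) == stars]


for M, N in [(3, 3), (10, 5)]:
    kept = decompositions_list(M, N)
    candidates = (M + 1) ** N  # tuples product() generates before filtering
    print(M, N, len(kept), comb(M + N - 1, N - 1), candidates)
# prints:
# 3 3 10 10 64
# 10 5 1001 1001 161051
```

The gap between the last two columns grows quickly with M and N, so the filtering approach is best kept for small inputs.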
I hope this answer helps!
Since I found the code in most answers quite hard to follow (I kept asking myself how the algorithms shown relate to the actual stars-and-bars problem), let's do this step by step:
First we define a function to insert a bar | into a string stars at a given position p:
def insert_bar(stars, p):
    head, tail = stars[:p], stars[p:]
    return head + '|' + tail
Usage:
insert_bar('***', 1) # returns '*|**'
To insert multiple bars at different positions, e.g. (1, 3), a simple way is to use reduce (from functools):
from functools import reduce
reduce(insert_bar, (1,3), '***') # returns '*|*|*'
If we extend insert_bar to handle both cases (a single position or a sequence of positions), we get a nice, reusable function that inserts any number of bars into a string of stars:
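from functools import reduce

def insert_bars(stars, p):
    if isinstance(p, int):
        # a single position: insert one bar
        head, tail = stars[:p], stars[p:]
        return head + '|' + tail
    else:
        # a sequence of positions: insert the bars one after another
        return reduce(insert_bar, p, stars)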
As @Mark Dickinson explained in his answer, itertools.combinations lets us produce the (n+k-1 choose k-1) combinations of bar positions.
All that's left is to create a string of n '*' characters, insert the bars at the given positions, split the string at the bars, and compute the length of each resulting bin. The implementation below is thus a direct translation of the problem statement into code:
import itertools

def partitions(n, k):
    # choose k-1 bar positions among the n+k-1 characters of the final string
    for positions in itertools.combinations(range(n+k-1), k-1):
        yield [len(bin) for bin in insert_bars(n*"*", positions).split('|')]
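The string is only a visual aid: a bin's size is simply the number of slots strictly between two consecutive bar positions. A minimal sketch of the same generator that skips building and splitting strings (partitions_from_positions is a hypothetical name; it yields the bins in the same order as partitions(n, k) above):

```python
from itertools import combinations


def partitions_from_positions(n, k):
    """Stars and bars without strings: pick k-1 bar positions among the
    n+k-1 slots and count the stars between neighbouring bars."""
    for positions in combinations(range(n + k - 1), k - 1):
        # Sentinel "bars" just outside both ends of the row of slots.
        bounds = (-1,) + positions + (n + k - 1,)
        # Stars in bin i = slots strictly between bar i and bar i+1.
        yield [bounds[i + 1] - bounds[i] - 1 for i in range(k)]


print(list(partitions_from_positions(3, 2)))  # [[0, 3], [1, 2], [2, 1], [3, 0]]
```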
Anyone looking for the specific case of k=2 can save a lot of time by simply creating a range and stacking it with its reverse. Here is a comparison with the accepted answer:
n = 500000
%timeit np.array([[i,j] for i,j in partitions(n,2)])
>>> 396 ms ± 13.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
rng = np.arange(n+1)
np.vstack([rng, rng[::-1]]).T
>>> 2.91 ms ± 190 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
And they are indeed equivalent.
it2k = np.array([[i,j] for i,j in partitions(n,2)])
rng = np.arange(n+1)
np2k = np.vstack([rng, rng[::-1]]).T
(np2k == it2k).all()
>>> True

Find all possible subsets that sum up to a given number

I'm learning Python and I have a problem with what seems to be a simple task.
I want to find all possible combinations of numbers that sum up to a given number.
For example: 4 -> [1,1,1,1] [1,1,2] [2,2] [1,3]
I chose a solution that generates all possible subsets (2^n) and then yields only those whose sum equals the number. I'm having trouble with the condition. Code:
def allSum(number):
    #mask = [0] * number
    for i in xrange(2**number):
        subSet = []
        for j in xrange(number):
            #if :
            subSet.append(j)
        if sum(subSet) == number:
            yield subSet

for i in allSum(4):
    print i
BTW, is this a good approach?
Here's some code I saw a few years ago that does the trick:
>>> def partitions(n):
...     if n:
...         for subpart in partitions(n-1):
...             yield [1] + subpart
...             if subpart and (len(subpart) < 2 or subpart[1] > subpart[0]):
...                 yield [subpart[0] + 1] + subpart[1:]
...     else:
...         yield []
...
>>> print(list(partitions(4)))
[[1, 1, 1, 1], [1, 1, 2], [2, 2], [1, 3], [4]]
Additional References:
http://mathworld.wolfram.com/Partition.html
http://en.wikipedia.org/wiki/Partition_(number_theory)
http://www.site.uottawa.ca/~ivan/F49-int-part.pdf
Here is an alternative approach that starts from a list of all 1s and recursively collapses it by adding adjacent elements. This should be more efficient than generating all possible subsets:
def allSum(number):
    def _collapse(lst):
        yield lst
        while len(lst) > 1:
            lst = lst[:-2] + [lst[-2] + lst[-1]]
            for prefix in _collapse(lst[:-1]):
                if not prefix or prefix[-1] <= lst[-1]:
                    yield prefix + [lst[-1]]
    return list(_collapse([1] * number))
>>> allSum(4)
[[1, 1, 1, 1], [1, 1, 2], [2, 2], [1, 3], [4]]
>>> allSum(5)
[[1, 1, 1, 1, 1], [1, 1, 1, 2], [1, 2, 2], [1, 1, 3], [2, 3], [1, 4], [5]]
You can strip off the last value if you don't want the trivial case. If you will just be looping over the results, remove the list() call and return the generator directly.
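A minimal sketch of both tweaks combined: the same _collapse recursion, returned lazily as a generator, with the trivial partition [number] optionally skipped. The name all_sums and its include_trivial parameter are hypothetical, not part of the answer above.

```python
def all_sums(number, include_trivial=True):
    """Generator version of allSum(): yields partitions lazily;
    include_trivial=False (hypothetical option) skips [number] itself."""
    def _collapse(lst):
        yield lst
        while len(lst) > 1:
            lst = lst[:-2] + [lst[-2] + lst[-1]]
            for prefix in _collapse(lst[:-1]):
                if not prefix or prefix[-1] <= lst[-1]:
                    yield prefix + [lst[-1]]

    for partition in _collapse([1] * number):
        # [number] is the only partition with a single part.
        if include_trivial or len(partition) > 1:
            yield partition


for p in all_sums(4, include_trivial=False):
    print(p)
```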
This is equivalent to the problem described in this question and can use a similar solution.
To elaborate:
def allSum(number):
    # possibilites() is the solver from the linked question (not shown here)
    for solution in possibilites(range(1, number+1), number):
        expanded = []
        for value, qty in zip(range(1, number+1), solution):
            expanded.extend([value]*qty)
        yield expanded
That translates this question into that question and back again.
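The linked question's possibilites function isn't reproduced here. As a hypothetical stand-in (my own recursive version, not the code from that question), the sketch below assumes it takes a sequence of values and a target and yields one list of quantities per solution, with sum(value * qty) == target; with that in place, allSum runs unchanged.

```python
def possibilites(values, target):
    """Hypothetical stand-in: yield every list of non-negative quantities,
    one per value, such that sum(value * qty) == target."""
    if not values:
        if target == 0:
            yield []
        return
    first, rest = values[0], values[1:]
    # Try every possible count of the first value, then solve the rest.
    for qty in range(target // first + 1):
        for tail in possibilites(rest, target - first * qty):
            yield [qty] + tail


def allSum(number):
    for solution in possibilites(range(1, number + 1), number):
        expanded = []
        for value, qty in zip(range(1, number + 1), solution):
            expanded.extend([value] * qty)
        yield expanded


print(sorted(allSum(4)))  # [[1, 1, 1, 1], [1, 1, 2], [1, 3], [2, 2], [4]]
```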
That solution doesn't work, right? It will never add a number to a subset more than once, so you will never get, for example, [1,1,2]. It will never skip a number, either, so you will never get, for example, [1,3].
So the problem with your solution is twofold. First, you are not actually generating all possible subsets of the range 1..number. Second, the set of all subsets will exclude things that you should be including, because it will not allow a number to appear more than once.
This kind of problem can be generalized as a search problem. Imagine that the numbers you want to try are nodes on a tree, and then you can use depth-first search to find all paths through the tree that represent a solution. It's an infinitely large tree, but luckily, you never need to search all of it.
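A minimal sketch of that depth-first search, assuming a path is only ever extended with a number at least as large as the previous one (so each partition is found exactly once, in non-decreasing order). The name partitions_dfs is hypothetical. The pruning is just the loop bound: a child is explored only if it doesn't overshoot the remaining amount, which is why the infinite tree never has to be searched in full.

```python
def partitions_dfs(target, smallest=1, path=None):
    """Depth-first search for all ways to write target as a sum of
    positive integers, each path kept in non-decreasing order."""
    if path is None:
        path = []
    if target == 0:
        yield list(path)        # this root-to-node path is a solution
        return
    # Children: every part from `smallest` up to what is left.
    for part in range(smallest, target + 1):
        path.append(part)
        yield from partitions_dfs(target - part, part, path)
        path.pop()              # backtrack


print(list(partitions_dfs(4)))  # [[1, 1, 1, 1], [1, 1, 2], [1, 3], [2, 2], [4]]
```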
