Generating random numbers to obtain a fixed sum(python) [duplicate] - python

This question already has answers here:
Generate random numbers summing to a predefined value
(7 answers)
Closed 4 years ago.
I have the following list:
Sum=[54,1536,36,14,9,360]
I need to generate 4 other lists, where each list will consist of 6 random numbers starting from 0, and the numbers will add up to the values in Sum. For example:
l1=[a,b,c,d,e,f] where a+b+c+d+e+f=54
l2=[g,h,i,j,k,l] where g+h+i+j+k+l=1536
and so on up to l6. I need to do this in Python. Can it be done?

Generating a list of random numbers that sum to a certain integer is trickier than it looks. Keeping track of the remaining quantity and generating items sequentially from what is left results in a non-uniform distribution, where the first numbers in the series are generally much larger than the others. On top of that, the last one will always be different from zero, because the previous items in the list will never sum up to the desired total (random generators usually draw from a half-open interval that excludes the maximum). Shuffling the list after generation might help a bit but won't generally give good results either.
A solution could be to generate random numbers and then normalize the result, rounding afterwards if you need them to be integers.
import numpy as np
totals = np.array([54,1536,36,14]) # don't call it Sum: it is too easily confused with the built-in sum()
a = np.random.random((6, 4)) # create random numbers
a = a/np.sum(a, axis=0) * totals # force them to sum to totals
# Ignore the following if you don't need integers
a = np.round(a) # round to whole numbers (still stored as floats)
remainings = totals - np.sum(a, axis=0) # check if there are corrections to be done
for j, r in enumerate(remainings): # implement the correction
    step = 1 if r > 0 else -1
    while r != 0:
        i = np.random.randint(6)
        if a[i,j] + step >= 0:
            a[i, j] += step
            r -= step
Each column of a represents one of the lists you want.
Hope this helps.
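If an unbiased draw matters, one alternative (not part of the answer above) is the stars-and-bars construction: pick 5 distinct "bar" positions among total + 5 slots and read the six gap sizes as the numbers. The following is a minimal sketch, assuming non-negative integers are wanted; the name random_fixed_sum is made up for illustration.

```python
import random

def random_fixed_sum(total, k):
    """Return k non-negative integers that sum to total.

    Hypothetical helper: every such list is equally likely, because each
    one corresponds to exactly one choice of k - 1 bar positions.
    """
    slots = total + k - 1
    # Choose k - 1 distinct bar positions among the slots, in sorted order.
    bars = sorted(random.sample(range(slots), k - 1))
    # Add sentinels so every gap has a left and a right boundary.
    bounds = [-1] + bars + [slots]
    # The number of "stars" between neighbouring bars is one part.
    return [bounds[i + 1] - bounds[i] - 1 for i in range(k)]

totals = [54, 1536, 36, 14, 9, 360]
lists = [random_fixed_sum(t, 6) for t in totals]
for t, parts in zip(totals, lists):
    print(t, parts)
```

Each list has exactly six entries, none negative, and the sum is exact by construction, so no rounding correction is needed.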

This might not be the most efficient way but it will work
import numpy as np
totals = [54, 1536, 36, 14]
nums = []
for i in totals:
    # redraw until the six values happen to sum to the target
    x = np.random.randint(0, i, size=(6,))
    while sum(x) != i:
        x = np.random.randint(0, i, size=(6,))
    nums.append(x)
print(nums)
[array([ 3, 19, 21, 11, 0, 0]), array([111, 155, 224, 511, 457,
78]), array([ 8, 5, 4, 12, 2, 5]), array([3, 1, 3, 2, 1, 4])]
This is a much more efficient way to do it:
totals = [54,1536,36,14,9,360, 0]
nums = []
for i in totals:
    if i == 0:
        nums.append([0 for _ in range(6)])
        continue
    total = i
    temp = []
    for _ in range(5):
        val = np.random.randint(0, total)
        temp.append(val)
        total -= val
    temp.append(total)
    nums.append(temp)
print(nums)
[[22, 4, 16, 0, 2, 10], [775, 49, 255, 112, 185, 160], [2, 10, 18, 2,
0, 4], [10, 2, 1, 0, 0, 1], [8, 0, 0, 0, 0, 1], [330, 26, 1, 0, 2, 1],
[0, 0, 0, 0, 0, 0]]

Related

How to get index of multiple, possibly different, elements in numpy?

I have a numpy array with many rows in it that look roughly as follows:
0, 50, 50, 2, 50, 1, 50, 99, 50, 50
50, 2, 1, 50, 50, 50, 98, 50, 50, 50
0, 50, 50, 98, 50, 1, 50, 50, 50, 50
0, 50, 50, 50, 50, 99, 50, 50, 2, 50
2, 50, 50, 0, 98, 1, 50, 50, 50, 50
I am given a variable n<50. Each row, of length 10, has the following in it:
Every number from 0 to n, with one possibly missing. In the example above, n=2.
Possibly a 98, which will be in the place of the missing number, if there is a number missing.
Possibly a 99, which will be in the place of the missing number, if there is a number missing, and there is not already a 98.
Many 50's.
What I want to get is an array with all the indices of the 0s in the first row, all the indices of the 1s in the second row, all the indices of the 2s in the third row, etc. For the above example, my desired output is this:
0, 6, 0, 0, 3
5, 2, 5, 5, 5
3, 1, 3, 8, 0
You may have noticed the catch: sometimes, exactly one of the numbers is replaced either by a 98, or a 99. It's pretty easy to write a for loop which determines which number, if any, was replaced, and uses that to get the array of indices.
Is there a way to do this with numpy?
The following numpy solution rather aggressively uses the assumptions listed in the question. If they are not 100% guaranteed, some more checks may be in order.
The mildly clever bit (even if I say so myself) here is to use the data array itself for finding the right destinations of their indices. For example, all the 2's need their indices stored in row 2 of the output array. Using this we can bulk store most of the indices in a single operation.
Example input is in array data:
n = 2
y,x = data.shape
out = np.empty((y,n+1),int)
# find 98 falling back to 99 if necessary
# and fill output array with their indices
# if neither exists some nonsense will be written but that does no harm
# most of this will be overwritten later
out.T[...] = ((data-98)&127).argmin(axis=1)
# find n+1 lowest values in each row
idx = data.argpartition(n,axis=1)[:,:n+1]
# construct auxiliary indexer
yr = np.arange(y)[:,None]
# put indices of low values where they belong
out[yr,data[yr,idx[:,:-1]]] = idx[:,:-1]
#      ^^^^^^^^^^^^^^^^^^^^
#      the clever bit
# rows with no missing number still need the last value
nomiss, = (data[range(y),idx[:,n]] == n).nonzero()
out[nomiss,n] = idx[nomiss,n]
# admire
print(out.T)
outputs:
[[0 6 0 0 3]
[5 2 5 5 5]
[3 1 3 8 0]]
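For comparison, the plain for-loop that the question alludes to might look like the sketch below. It is only a reference implementation for checking a vectorized result against; the function name indices_by_loop is made up, and the code assumes the guarantees stated in the question (a 98 or 99 marks the position of a missing number).

```python
import numpy as np

data = np.array([
    [0, 50, 50, 2, 50, 1, 50, 99, 50, 50],
    [50, 2, 1, 50, 50, 50, 98, 50, 50, 50],
    [0, 50, 50, 98, 50, 1, 50, 50, 50, 50],
    [0, 50, 50, 50, 50, 99, 50, 50, 2, 50],
    [2, 50, 50, 0, 98, 1, 50, 50, 50, 50],
])
n = 2

def indices_by_loop(data, n):
    """Row v of the result holds, for each input row, the column of value v."""
    out = np.empty((n + 1, len(data)), dtype=int)
    for r, row in enumerate(data.tolist()):
        # Position of the placeholder: prefer 98, fall back to 99.
        if 98 in row:
            placeholder = row.index(98)
        elif 99 in row:
            placeholder = row.index(99)
        else:
            placeholder = None
        for v in range(n + 1):
            if v in row:
                out[v, r] = row.index(v)
            elif placeholder is not None:
                out[v, r] = placeholder
            else:
                raise ValueError(f"row {r}: {v} is missing and has no placeholder")
    return out

print(indices_by_loop(data, n))
```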
I don't think you're getting away without a for-loop here. But here's how you could go about it.
For each number in n, find all of the locations where it is known. Example:
locations = np.argwhere(data == 1)
print(locations)
[[0 5]
[1 2]
[2 5]
[4 5]]
You can then turn this into a map for easy lookup per number in n:
known = {
    i: dict(np.argwhere(data == i))
    for i in range(n + 1)
}
pprint(known)
{0: {0: 0, 2: 0, 3: 0, 4: 3},
1: {0: 5, 1: 2, 2: 5, 4: 5},
2: {0: 3, 1: 1, 3: 8, 4: 0}}
Do the same for the unknown numbers:
unknown = dict(np.argwhere((data == 98) | (data == 99)))
pprint(unknown)
{0: 7, 1: 6, 2: 3, 3: 5, 4: 4}
And now for each location in the result, you can lookup the index in the known list and fallback to the unknown.
result = np.array(
    [
        [known[i].get(j, unknown.get(j)) for j in range(len(data))]
        for i in range(n + 1)
    ]
)
print(result)
[[0 6 0 0 3]
[5 2 5 5 5]
[3 1 3 8 0]]
Bonus: Getting fancy with dictionary constructor and unpacking:
from collections import OrderedDict
unknown = np.argwhere((data == 98) | (data == 99))
results = np.array([
    [*OrderedDict((*unknown, *np.argwhere(data == i))).values()]
    for i in range(n + 1)
])
print(results)

Clustering a list with nearest values without sorting

I have a list like this
tst = [1,3,4,6,8,22,24,25,26,67,68,70,72,0,0,0,0,0,0,0,4,5,6,36,38,36,31]
I want to group the elements from above list into separate groups/lists based on the difference between the consecutive elements in the list (differing by 1 or 2 or 3).
I have tried following code
def slice_when(predicate, iterable):
    i, x, size = 0, 0, len(iterable)
    while i < size-1:
        if predicate(iterable[i], iterable[i+1]):
            yield iterable[x:i+1]
            x = i + 1
        i += 1
    yield iterable[x:size]
tst = [1,3,4,6,8,22,24,25,26,67,68,70,72,0,0,0,0,0,0,0,4,5,6,36,38,36,31]
slices = slice_when(lambda x,y: (y - x > 2), tst)
whola=(list(slices))
I got this results
[[1, 3, 4, 6, 8], [22, 24, 25, 26], [67, 68, 70, 72, 0, 0, 0, 0, 0, 0, 0], [4, 5, 6], [36, 38, 36, 31]]
In the 3rd list it doesn't separate the sequence of zeros into another list. Any kind of help is highly appreciated. Thank you.
I guess this is what you want?
tst = [1,3,4,6,8,22,24,25,26,67,68,70,72,0,0,0,0,0,0,0,4,5,6,36,38,36,31]
slices = slice_when(lambda x,y: (abs(y - x) > 2), tst) # Use abs!
whola=(list(slices))
print(whola)
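The reason abs matters: going from 72 to 0 gives y - x = -72, which is not greater than 2, so the original predicate never splits on a downward jump. As a rough, self-contained sketch of the same idea without the slicing helper (the name group_by_gap and the max_gap parameter are made up for illustration):

```python
def group_by_gap(values, max_gap=2):
    """Split values into runs whose neighbours differ by at most max_gap."""
    groups = []
    for v in values:
        # Extend the current group if v is close to its last element,
        # whether the step goes up or down; otherwise start a new group.
        if groups and abs(v - groups[-1][-1]) <= max_gap:
            groups[-1].append(v)
        else:
            groups.append([v])
    return groups

tst = [1,3,4,6,8,22,24,25,26,67,68,70,72,0,0,0,0,0,0,0,4,5,6,36,38,36,31]
print(group_by_gap(tst))
# [[1, 3, 4, 6, 8], [22, 24, 25, 26], [67, 68, 70, 72], [0, 0, 0, 0, 0, 0, 0], [4, 5, 6], [36, 38, 36], [31]]
```

Unlike slice_when, this version only needs to iterate once and does not index into the input, so it also accepts generators.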

Map an element in a multi-dimension array to its index

I am using the function get_tuples(length, total) from here
to generate an array of all tuples of given length and sum, an example and the function are shown below. After I have created the array I need to find a way to return the indices of a given number of elements in the array. I was able to do that using .index() by changing the array to a list, as shown below. However, this solution or another solution that is also based on searching (for example using np.where) takes a lot of time to find the indices. Since all elements in the array (array s in the example) are different, I was wondering if we can construct a one-to-one mapping, i.e., a function such that given the element in the array it returns the index of the element by doing some addition and multiplication on the values of this element. Any ideas if that is possible? Thanks!
import numpy as np
def get_tuples(length, total):
    if length == 1:
        yield (total,)
        return
    for i in range(total + 1):
        for t in get_tuples(length - 1, total - i):
            yield (i,) + t
#example
s = np.array(list(get_tuples(4, 20)))
# array s
In [1]: s
Out[1]:
array([[ 0, 0, 0, 20],
[ 0, 0, 1, 19],
[ 0, 0, 2, 18],
...,
[19, 0, 1, 0],
[19, 1, 0, 0],
[20, 0, 0, 0]])
#example of element to find the index for. (Note in reality this is 1000+ elements)
elements_to_find =np.array([[ 0, 0, 0, 20],
[ 0, 0, 7, 13],
[ 0, 5, 5, 10],
[ 0, 0, 5, 15],
[ 0, 2, 4, 14]])
#change array to list
s_list = s.tolist()
#find the indices
indx=[s_list.index(i) for i in elements_to_find.tolist()]
#output
In [2]: indx
Out[2]: [0, 7, 100, 5, 45]
Here is a formula that calculates the index based on the tuple alone, i.e. it needn't see the full array. To compute the index of an N-tuple it needs to evaluate N-1 binomial coefficients. The following implementation is (part-) vectorized, it accepts ND-arrays but the tuples must be in the last dimension.
import numpy as np
from scipy.special import comb
# unfortunately, comb with option exact=True is not vectorized
def bc(N,k):
    return np.round(comb(N,k)).astype(int)
def get_idx(s):
    N = s.shape[-1] - 1
    R = np.arange(1,N)
    ps = s[...,::-1].cumsum(-1)
    B = bc(ps[...,1:-1]+R,1+R)
    return bc(ps[...,-1]+N,N) - ps[...,0] - 1 - B.sum(-1)
# OP's generator
def get_tuples(length, total):
    if length == 1:
        yield (total,)
        return
    for i in range(total + 1):
        for t in get_tuples(length - 1, total - i):
            yield (i,) + t
#example
s = np.array(list(get_tuples(4, 20)))
# compute each index
r = get_idx(s)
# expected: 0,1,2,3,...
assert (r == np.arange(len(r))).all()
print("all ok")
#example of element to find the index for. (Note in reality this is 1000+ elements)
elements_to_find =np.array([[ 0, 0, 0, 20],
[ 0, 0, 7, 13],
[ 0, 5, 5, 10],
[ 0, 0, 5, 15],
[ 0, 2, 4, 14]])
print(get_idx(elements_to_find))
Sample run:
all ok
[ 0 7 100 5 45]
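For a single tuple, the same formula can be evaluated with exact integer arithmetic, which sidesteps the rounding of the floating-point values that scipy.special.comb returns by default. The sketch below is a scalar transcription of get_idx using math.comb (Python 3.8+); the name rank is made up, and like get_idx it assumes tuples of length 2 or more.

```python
from itertools import accumulate
from math import comb

def get_tuples(length, total):
    """The generator from the question, repeated so the sketch is self-contained."""
    if length == 1:
        yield (total,)
        return
    for i in range(total + 1):
        for t in get_tuples(length - 1, total - i):
            yield (i,) + t

def rank(t):
    """Index of tuple t in the order produced by get_tuples(len(t), sum(t))."""
    n = len(t) - 1
    # Partial sums taken from the back of the tuple, as in get_idx.
    ps = list(accumulate(reversed(t)))
    b = sum(comb(ps[j] + j, j + 1) for j in range(1, n))
    return comb(ps[n] + n, n) - ps[0] - 1 - b

elements_to_find = [(0, 0, 0, 20), (0, 0, 7, 13), (0, 5, 5, 10), (0, 0, 5, 15), (0, 2, 4, 14)]
print([rank(t) for t in elements_to_find])
# [0, 7, 100, 5, 45]
```

This trades vectorization for exactness, so it is mainly useful for spot checks or for totals large enough that float precision becomes a concern.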
How to derive formula:
1. Use stars and bars to express the full partition count #part(N,k) (N is total, k is length) as a single binomial coefficient, (N + k - 1) choose (k - 1).
2. Count back-to-front: It is not hard to verify that after the i-th full iteration of the outer loop of OP's generator exactly #part(N-i,k) have not yet been enumerated. Indeed, what's left are all partitions p1+p2+... = N with p1>=i; we can write p1=q1+i such that q1+p2+... = N-i, and this latter partition is constraint-free, so we can use 1. to count.
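Both counting claims above can be checked numerically by brute force. The sketch below does so for the length 4, total 20 example; the helper name n_part is made up and simply stands for #part(N,k).

```python
from math import comb

def get_tuples(length, total):
    """The generator from the question."""
    if length == 1:
        yield (total,)
        return
    for i in range(total + 1):
        for t in get_tuples(length - 1, total - i):
            yield (i,) + t

def n_part(total, length):
    """Stars and bars: number of tuples of the given length and total."""
    return comb(total + length - 1, length - 1)

tuples = list(get_tuples(4, 20))

# Claim 1: the full count is a single binomial coefficient.
assert len(tuples) == n_part(20, 4) == 1771

# Claim 2: once the outer loop has finished i iterations, the tuples still
# to come are those with first entry >= i, and there are n_part(20 - i, 4).
for i in range(21):
    remaining = sum(1 for t in tuples if t[0] >= i)
    assert remaining == n_part(20 - i, 4)

print("both counts agree")
```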
You can use binary search to make the search a lot faster.
Binary search makes the search O(log(n)) rather than O(n) (using Index)
We do not need to sort the tuples since they are already sorted by the generator
import bisect
def get_tuples(length, total):
    " Generates tuples "
    if length == 1:
        yield (total,)
        return
    yield from ((i,) + t for i in range(total + 1) for t in get_tuples(length - 1, total - i))
def find_indexes(x, indexes):
    if len(indexes) > 100:
        # Faster to generate all indexes when we have a large
        # number to check
        d = dict(zip(x, range(len(x))))
        return [d[tuple(i)] for i in indexes]
    else:
        return [bisect.bisect_left(x, tuple(i)) for i in indexes]
# Generate tuples (in this case 4, 20)
x = list(get_tuples(4, 20))
# Tuples are generated in sorted order [(0,0,0,20), ...(20,0,0,0)]
# which allows binary search to be used
indexes = [[ 0, 0, 0, 20],
[ 0, 0, 7, 13],
[ 0, 5, 5, 10],
[ 0, 0, 5, 15],
[ 0, 2, 4, 14]]
y = find_indexes(x, indexes)
print('Found indexes:', *y)
print('Indexes & Tuples:')
for i in y:
    print(i, x[i])
Output
Found indexes: 0 7 100 5 45
Indexes & Tuples:
0 (0, 0, 0, 20)
7 (0, 0, 7, 13)
100 (0, 5, 5, 10)
5 (0, 0, 5, 15)
45 (0, 2, 4, 14)
Performance
Scenario 1--Tuples already computed and we just want to find the index of certain tuples
For instance, x = list(get_tuples(4, 20)) has already been performed.
Search for
indexes = [[ 0, 0, 0, 20],
[ 0, 0, 7, 13],
[ 0, 5, 5, 10],
[ 0, 0, 5, 15],
[ 0, 2, 4, 14]]
Binary Search
%timeit find_indexes(x, indexes)
100000 loops, best of 3: 11.2 µs per loop
Calculate the index based on the tuple alone (courtesy of @PaulPanzer's approach)
%timeit get_idx(indexes)
10000 loops, best of 3: 92.7 µs per loop
In this scenario, binary search is ~8x faster when tuples have already been pre-computed.
Scenario 2--the tuples have not been pre-computed.
%%timeit
import bisect
def find_indexes(x, t):
    " finds the index of each tuple in list t (assumes x is sorted) "
    return [bisect.bisect_left(x, tuple(i)) for i in t]
# Generate tuples (in this case 4, 20)
x = list(get_tuples(4, 20))
indexes = [[ 0, 0, 0, 20],
[ 0, 0, 7, 13],
[ 0, 5, 5, 10],
[ 0, 0, 5, 15],
[ 0, 2, 4, 14]]
y = find_indexes(x, indexes)
100 loops, best of 3: 2.69 ms per loop
@PaulPanzer's approach has the same timing in this scenario (92.97 µs)
=> @PaulPanzer's approach is ~29 times faster when the tuples have not been pre-computed, because it never needs to generate them
Scenario 3--Large number of indexes (@PJORR)
A large number of random indexes is generated
x = list(get_tuples(4, 20))
xnp = np.array(x)
indices = xnp[np.random.randint(0,len(xnp), 2000)]
indexes = indices.tolist()
%timeit find_indexes(x, indexes)
#Result: 1000 loops, best of 3: 1.1 ms per loop
%timeit get_idx(indices)
#Result: 1000 loops, best of 3: 716 µs per loop
In this case, @PaulPanzer's approach is ~53% faster.

Eliminating Consecutive Numbers

If you have a range of numbers from 1-49 with 6 numbers to choose from, there are nearly 14 million combinations. Using my current script, I currently have only 7.2 million combinations remaining. Of the 7.2 million remaining combinations, I want to eliminate all 3, 4, 5, 6, dual, and triple consecutive numbers.
Example:
3 consecutive: 1, 2, 3, x, x, x
4 consecutive: 3, 4, 5, 6, x, x
5 consecutive: 4, 5, 6, 7, 8, x
6 consecutive: 5, 6, 7, 8, 9, 10
double separate consecutive: 1, 2, 5, 6, 14, 18
triple separate consecutive: 1, 2, 9, 10, 22, 23
Note: combinations such as 1, 2, 12, 13, 14, 15 must also be eliminated or else they conflict with the rule that double and triple consecutive combinations to be eliminated.
I'm looking to find how many combinations of the 7.2 million remaining combinations have zero consecutive numbers (all mixed) and only 1 consecutive pair.
Thank you!
import functools
_MIN_SUM = 120
_MAX_SUM = 180
_MIN_NUM = 1
_MAX_NUM = 49
_NUM_CHOICES = 6
_MIN_ODDS = 2
_MAX_ODDS = 4
@functools.lru_cache(maxsize=None)
def f(n, l, s = 0, odds = 0):
    if s > _MAX_SUM or odds > _MAX_ODDS:
        return 0
    if n == 0 :
        return int(s >= _MIN_SUM and odds >= _MIN_ODDS)
    return sum(f(n-1, i+1, s+i, odds + i % 2) for i in range(l, _MAX_NUM+1))
result = f(_NUM_CHOICES, _MIN_NUM)
print('Number of choices = {}'.format(result))
While my answer should work, I think someone might be able to offer a faster solution.
Consider the following code:
not_allowed = []
for x in range(48):
    not_allowed.append([x, x+1, x+2])
# not_allowed = [ [0,1,2], [1,2,3], ... [11,12,13], ... [47,48,49] ]
my_numbers = [[1, 2, 5, 9, 11, 33], [1, 3, 7, 8, 9, 31], [12, 13, 14, 15, 23, 43]]
for x in my_numbers:
    for y in not_allowed:
        if set(y) <= set(x): # if [1,2,3] is a subset of [1,2,5,9,11,33], etc.
            pass # drop x
This code will flag every combination that contains a run of three consecutive numbers, which is all you really need to check for the longer runs, because runs of four, five, or six all contain a run of three. Try implementing this and let me know how it works.
The easiest approach is probably to generate and filter. I used numpy to try to vectorize as much of this as I could:
import numpy as np
from itertools import combinations
combos = np.array(list(combinations(range(1, 50), 6))) # build all combos
# combos is shape (13983816, 6)
filt = np.where(np.bincount(np.where(np.abs(
np.subtract(combos[:, :-1], combos[:, 1:])) == 1)[0]) <= 1)[0] # magic!
filtered = combos[filt]
# filtered is shape (12489092, 6)
Breaking down that "magic" line
First we subtract the first five items in the list from the last five items to get the differences between them. We do this for the entire set of combinations in one shot with np.subtract(combos[:, :-1], combos[:, 1:]). Note that itertools.combinations produces sorted combinations, on which this depends.
Next we take the absolute value of these differences to make sure we only look at positive distances between numbers with np.abs(...).
Next we grab the indices from this operation for the entire dataset that indicate a difference of 1 (consecutive numbers) with np.where(... == 1)[0]. Note that np.where returns a tuple where the first item holds all of the rows, and the second item holds all of the corresponding columns for our condition. This is important because any row value that shows up more than once tells us that we have more than one consecutive pair in that row!
So we count how many times each row shows up in our results with np.bincount(...), which will return something like [5, 4, 4, 4, 3, 2, 1, 0] indicating how many consecutive pairs are in each row of our combinations dataset.
Finally we grab only the row numbers where there are 0 or 1 consecutive values with np.where(... <= 1)[0].
I am returning way more combinations than you seem to indicate, but I feel fairly confident that this is working. By all means, poke holes in it in the comments and I will see if I can find fixes!
Bonus, because it's all vectorized, it's super fast!
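The filtered row count can be cross-checked without building the 14-million-row array. A 6-number combination with no adjacent pair can be chosen in C(44, 6) ways, and one with exactly one adjacent pair in 5 * C(44, 5) ways (a standard count of subsets by number of runs). The sketch below compares that closed form with a brute-force count on a small range first; the function names are made up for illustration.

```python
from itertools import combinations
from math import comb

def count_adjacent_pairs(combo):
    """Number of neighbouring entries in a sorted combo that differ by 1."""
    return sum(1 for a, b in zip(combo, combo[1:]) if b - a == 1)

def brute_force(n, k):
    """Count k-combinations of 1..n with at most one adjacent pair."""
    return sum(
        1 for c in combinations(range(1, n + 1), k)
        if count_adjacent_pairs(c) <= 1
    )

def closed_form(n, k):
    no_pair = comb(n - k + 1, k)                 # all numbers isolated
    one_pair = (k - 1) * comb(n - k + 1, k - 1)  # exactly one run of two
    return no_pair + one_pair

# The two counts agree on a range small enough to enumerate quickly.
assert brute_force(12, 4) == closed_form(12, 4)

print(closed_form(49, 6))
# 12489092
```

The result matches the shape of filtered reported above, which supports the claim that the vectorized filter keeps exactly the combinations with zero or one consecutive pair.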

Finding consecutive duplicates and listing their indexes of where they occur in python

I have a list in python for example:
mylist = [1,1,1,1,1,1,1,1,1,1,1,
0,0,1,1,1,1,0,0,0,0,0,
1,1,1,1,1,1,1,1,0,0,0,0,0,0]
my goal is to find where there are five or more zeros in a row and then list the indexes of where this happens, for example the output for this would be:
[17,21][30,35]
here is what i have tried/seen in other questions asked on here:
def zero_runs(a):
    # Create an array that is 1 where a is 0, and pad each end with an extra 0.
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    # Runs start and end where absdiff is 1.
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    return ranges
runs = zero_runs(mylist)
this gives output:
[0,10]
[11,12]
...
which is basically just listing the indexes of all duplicates. How would I go about separating this data into what I need?
You could use itertools.groupby, it will identify the contiguous groups in the list:
from itertools import groupby
lst = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
groups = [(k, sum(1 for _ in g)) for k, g in groupby(lst)]
cursor = 0
result = []
for k, l in groups:
    if not k and l >= 5:
        result.append([cursor, cursor + l - 1])
    cursor += l
print(result)
Output
[[17, 21], [30, 35]]
Your current attempt is very close. It returns all of the runs of consecutive zeros in an array, so all you need to accomplish is adding a check to filter runs of less than 5 consecutive zeros out.
def threshold_zero_runs(a, threshold):
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    m = (np.diff(ranges, 1) >= threshold).ravel()
    return ranges[m]
array([[17, 22],
       [30, 36]], dtype=int64)
Use the shift operator on the array. Compare the shifted version with the original. Where they do not match, you have a transition. You then need only to identify adjacent transitions that are at least 5 positions apart.
Can you take it from there?
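One way to flesh out that hint, as a sketch only: compare the array with a copy of itself shifted by one position, treat every mismatch as the start of a new run, and keep the runs of zeros that are long enough. The function name long_zero_runs and the min_len parameter are made up for illustration.

```python
import numpy as np

def long_zero_runs(values, min_len=5):
    """Return [start, end] (inclusive) for each run of >= min_len zeros."""
    a = np.asarray(values)
    if a.size == 0:
        return []
    # A transition sits at position p when a[p] differs from a[p - 1].
    change = np.flatnonzero(a[1:] != a[:-1]) + 1
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [len(a)])) - 1  # inclusive end of each run
    # Keep runs that consist of zeros and are long enough.
    keep = (a[starts] == 0) & (ends - starts + 1 >= min_len)
    return np.column_stack((starts[keep], ends[keep])).tolist()

mylist = [1,1,1,1,1,1,1,1,1,1,1,
          0,0,1,1,1,1,0,0,0,0,0,
          1,1,1,1,1,1,1,1,0,0,0,0,0,0]
print(long_zero_runs(mylist))
# [[17, 21], [30, 35]]
```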
Another way using itertools.groupby and enumerate.
First find the zeros and the indices:
from operator import itemgetter
from itertools import groupby
zerosList = [
    list(map(itemgetter(0), g))
    for i, g in groupby(enumerate(mylist), key=itemgetter(1))
    if not i
]
print(zerosList)
#[[11, 12], [17, 18, 19, 20, 21], [30, 31, 32, 33, 34, 35]]
Now just filter zerosList:
runs = [[x[0], x[-1]] for x in zerosList if len(x) >= 5]
print(runs)
#[[17, 21], [30, 35]]
