Problem Description:
I'm working on making a function which gives me a definition for a particular combination of several descriptors based on a single index. My inputs are a set of raw features X = [feat0,feat1,feat2,feat3,feat4], a list of powers to be used pow = [1,2,3], and a list of group sizes sizes = [1,3,5]. A valid output might look like the following:
feat0^2 * feat4^3 * feat1^1
This output is valid because feat0, feat4, and feat1 exist within X, their powers exist within pow, and the number of features being combined is in sizes.
Invalid edge cases include:
values which don't exist in X, powers not in pow, and combination sizes not in sizes
combinations that are identical to another are invalid: feat0^2 * feat1^3 and feat1^3 * feat0^2 are the same
combinations that include multiples of the same feature are invalid: feat0^1 * feat0^3 * feat2^2 is invalid
under the hood I'm encoding these groupings as lists of tuples. So feat0^2 * feat4^3 * feat1^1 would be represented as [(0,2), (4,3), (1,1)], where the first element in the tuple is the feature index, and the second is the power.
Question:
My question is: how can I create a one-to-one mapping between a particular combination and an index i? I would like to get the number of possible combinations, and be able to plug an integer i into a function and have that function generate a particular combination. Something like this:
import random

X = [0.123, 0.111, 11, -5]
pow = [1, 2, 3]
sizes = [1, 3]
# getting total number of combinations
numCombos = get_num_combos(X, pow, sizes)
# getting a random index corresponding to a grouping
# (randrange excludes the upper bound, so i is always a valid index)
i = random.randrange(numCombos)
# getting grouping
grouping = generate_grouping(i, X, pow, sizes)
print(grouping)
Resulting in something like
[(0,1), (1,2), (3,1)]
So far, figuring out the generation when not accounting for the various edge cases wasn't too hard, but I'm at a loss for how to account for edge cases 2 and 3; making it guaranteed that no value of i is algebraically equivalent to any other value of i, and that the same feature does not appear multiple times in a grouping.
Current Progress
import bisect
import math

import numpy as np

# computes n choose k
def get_num_groupings(n, k):
    return math.comb(n, k)

i = 150
n = 5
m = 3
sizes = [1, 3, 5]
# computing the number of elements in each group length
numElements = [m**k * get_num_groupings(n, k) for k in sizes]
# index bins for each group size
bins = list(np.cumsum(numElements))[:-1]
# getting the current group size
# (bisect_right, so that an index equal to a bin edge falls in the next bin)
binIdx = bisect.bisect_right(bins, i)
curSize = sizes[binIdx]
# adding idx 0 to bins
bins = [0] + bins
# getting the location of i in the bin
z = i - bins[binIdx]
# getting the product index and combination rank
pi = z // get_num_groupings(n, curSize)
ci = z % get_num_groupings(n, curSize)
# getting the indexes of the powers
pidx = [(pi // m**(curSize - (num+1))) % m for num in range(curSize)]
# getting the indexes of the features
#TODO cidx = unrank(i, range(n))
This is based on the Mad Physicist's answer, though I haven't figured out how to get cidx yet. Some of the variable names are rewritten for my own understanding. To my knowledge this implementation works by logically separating the combinations of variables from the powers they each have. So far, I can get the powers from an index i, and once unrank is ironed out I should be able to get the indexes for which features are used.
Let's look at a slightly different problem that's closely related to what you want: generate all the possible valid combinations.
If you choose a size and a power, finding all possible combinations of features is fairly straightforward:
from itertools import combinations, product
n = len(X)
m = len(powers)
k = size = ... # e.g. 3
pow = ... # e.g. [1, 2, 3]
The iterator of unique combinations of features is given by
def elements(X, size, pow):
    for x in combinations(X, size):
        yield sum(e**p for p, e in zip(pow, x))
The equivalent one-liner would be
(sum(e**p for p, e in zip(pow, x)) for x in combinations(X, size))
This generator has exactly n choose k unique elements. These elements meet all your conditions by definition.
Now you can loop over all possible sizes and product of powers to get all the options:
def all_features(X, sizes, powers):
    for size in sizes:
        for pow in product(powers, repeat=size):
            for x in combinations(X, size):
                yield sum(e**p for p, e in zip(pow, x))
The total number of elements is the sum for each k of m**k * n choose k.
Now that you've counted the possibilities, you can compute the mapping of element to index and vice versa, using a combinatorial number system. Sample ranking and unranking functions for combinations are shown here. You can use them after you adjust the index for the size and power bins.
To show what I mean, assume you have three functions (given in the linked answer):
choose(n, k) computes n choose k
rank(combo) accepts the ordered indices of a specific combination and returns the rank.
unrank(ind, k) accepts a rank and size, and returns the k indices of the corresponding combination.
You can then compute the offsets of each size group and the step for each power within that group. Let's work through your concrete example with n = 5, m = 3, and sizes = [1, 3, 5].
The number of elements for each size is given by
elements = [m**k * choose(n, k) for k in sizes]
The total number of possible arrangements is sum(elements):
3**1 * choose(5, 1) + 3**3 * choose(5, 3) + 3**5 * choose(5, 5) = 3 * 5 + 27 * 10 + 243 * 1 = 15 + 270 + 243 = 528
The cumulative sum is useful to convert between index and element:
cumsum = [0, 15, 285]
When you get an index, you can check which bin it falls in using bisect.
Let's say you were given index = 55. Since 15 < 55 < 285, your offset is 15, size = 3. Within the size = 3 group, you have an offset of z = 55 - 15 = 40.
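As a rough sketch of that bin lookup (the helper name locate is made up for illustration, and it assumes sizes is listed in the same order in which the groups are laid out), bisect_right on the list of group start offsets gives the group directly:

```python
from bisect import bisect_right
from math import comb

def locate(index, n, m, sizes):
    """Return (size, offset inside that size group) for a global index.

    Hypothetical helper: n features, m powers, groups laid out in the
    order given by `sizes`.
    """
    counts = [m**k * comb(n, k) for k in sizes]
    if not 0 <= index < sum(counts):
        raise IndexError("index out of range")
    # starts[b] is the first global index that belongs to group b
    starts = [0]
    for c in counts[:-1]:
        starts.append(starts[-1] + c)
    b = bisect_right(starts, index) - 1
    return sizes[b], index - starts[b]

print(locate(55, 5, 3, [1, 3, 5]))  # (3, 40)
```

Using bisect_right rather than bisect_left matters at the edges: index 15 is the first element of the size = 3 group, not the last element of the size = 1 group.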
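Within the k = 3 group, there are m**k = 3**3 = 27 power products, and each of them is paired with choose(n, k) = 10 feature combinations. Since the combinations are the innermost loop in all_features, the index of the power product is pi = z // choose(n, k) = 4 and the combination rank is ci = z % choose(n, k) = 0.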
So the indices of the power are given by
pidx = [(pi // m**(k - 1)) % m, (pi // m**(k - 2)) % m, ...]
Similarly, the indices of the combination are given by
cidx = unrank(ci, k)
You can convert all these indices into a value using something like
sum(X[q]**powers[p] for p, q in zip(pidx, cidx))
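For completeness, here is one possible sketch of the decoding inside a size group. It is not the code from the linked answer: unrank_combination and decode are illustrative names, the combinations are assumed to be in lexicographic order (the order itertools.combinations produces), and the offset z is split with divmod(z, comb(n, k)) on the assumption that the power product is the outer loop and the feature combination the inner one, as in all_features.

```python
from math import comb

def unrank_combination(rank, n, k):
    """Return the rank-th k-subset of range(n) in lexicographic order."""
    combo = []
    x = 0  # smallest candidate for the next element
    for remaining in range(k, 0, -1):
        # comb(n - x - 1, remaining - 1) combinations start with x;
        # skip whole blocks until the rank falls inside one.
        while comb(n - x - 1, remaining - 1) <= rank:
            rank -= comb(n - x - 1, remaining - 1)
            x += 1
        combo.append(x)
        x += 1
    return combo

def decode(z, n, m, k):
    """Map offset z inside the size-k group to (feature index, power index) pairs."""
    pi, ci = divmod(z, comb(n, k))  # power product index, combination rank
    pidx = [(pi // m**(k - 1 - j)) % m for j in range(k)]
    cidx = unrank_combination(ci, n, k)
    return list(zip(cidx, pidx))

print(decode(40, 5, 3, 3))  # [(0, 0), (1, 1), (2, 1)]
```

Because the feature indices come out strictly increasing, no feature repeats and no two offsets describe the same product in a different order. The second entry of each pair is an index into the list of powers, so with powers [1, 2, 3] the example above is feat0^1 * feat1^2 * feat2^2.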
I need to get the statistical expected value of a n choose k drawing in a sorted array.
As an example, let's consider I want to choose 2 elements from the following sorted array
[1, 2, 3]
The set of all possible combinations is the following:
(1, 2)
(1, 3)
(2, 3)
So the expected value of the first element is (1 + 1 + 2) / 3 = 1.33, and the expected value of the second element is (2 + 3 + 3) / 3 = 2.67
Here is a function that works with a bruteforce approach for doing that, but it is too slow to be used on large arrays.
Is there a smarter/faster way?
import itertools
import math

def combinations_expected_value(arr, k):
    sums = [0] * k
    l = math.comb(len(arr), k)
    for comb in itertools.combinations(arr, k):
        for i in range(k):
            sums[i] += comb[i]
    return [sums[i] / l for i in range(k)]
Thank you!
For each position p (0-based) in the combination, the possible values are the part of the sorted list that starts at index p and leaves the last k-p-1 elements free for the later positions. For example, for combinations of 6 in 1..100, position 3 can only contain the values 3..97.
For each of the position/value pairs, the number of occurrences is the product of the number of combinations of left-side elements and the number of combinations of right-side elements.
For example, for combinations of 6 elements within a list of 1..100, the number of times 45 will appear at the third position is the number of combinations of 2 in 1..44 times the number of combinations of 3 in 46..100. So we will have C(44,2) * C(55,3) * 45 for that position/value pair.
You can repeat this calculation for each position/value pair to obtain a total for each position in the output combinations. Then divide these totals by the number of combinations to get the expected value:
from math import comb

def countComb(N,k):
    result = [0]*k
    for p in range(k):                      # p is count on the left
        q = k-p-1                           # q is count on the right
        for i in range(p,len(N)-q):
            left  = comb(i,p)               # combinations on the left >= 1
            right = comb(len(N)-i-1,q)      # combinations on the right >= 1
            result[p] += left * right * N[i]
    return result

def combProb(N,k):
    Cnk = comb(len(N),k)
    return [S/Cnk for S in countComb(N,k)]
Output:
print(countComb([1,2,3],2)) # [4, 8]
print(combProb([1,2,3],2)) # [1.3333333333333333, 2.6666666666666665]
print(countComb([1,2,3,4,5],3)) # [15, 30, 45]
print(combProb([1,2,3,4,5],3)) # [1.5, 3.0, 4.5]
# test with large number of combinations:
print(countComb(list(range(1,301)),7))
[1521500803497675, 3043001606995350, 4564502410493025,
6086003213990700, 7607504017488375, 9129004820986050,
10650505624483725]
print(combProb(list(range(1,301)),7))
[37.625, 75.25, 112.875, 150.5, 188.125, 225.75, 263.375]
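If you want to convince yourself of the left/right counting argument, a small brute-force cross-check is enough. This is only a sanity-check sketch on the list 1..n; the two function names are made up for it and position is 0-based:

```python
from itertools import combinations
from math import comb

def occurrences_brute(n, k, position, value):
    """Count k-combinations of 1..n that hold `value` at 0-based `position`."""
    return sum(1 for c in combinations(range(1, n + 1), k)
               if c[position] == value)

def occurrences_formula(n, k, position, value):
    """Same count: pick `position` smaller values and the rest from larger ones."""
    return comb(value - 1, position) * comb(n - value, k - position - 1)

print(occurrences_brute(10, 4, 1, 5), occurrences_formula(10, 4, 1, 5))  # 40 40
```

For the example in the text, occurrences_formula(100, 6, 2, 45) evaluates C(44,2) * C(55,3) without enumerating anything.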
You are given four arrays A, B, C, D each of size N.
Find the maximum value (M) of the expression below:
M = max(|A[i] - A[j]| + |B[i] - B[j]| + |C[i] - C[j]| + |D[i] - D[j]| + |i - j|)
where 1 <= i < j <= N
and |x| refers to the absolute value of x.
Constraints
2 <= N <= 10^5
1 <= Ai,Bi,Ci,Di <= 10^9
Input: N,A,B,C,D
Output: M
Ex.-
Input-
5
5,7,6,3,9
7,9,2,7,5
1,9,9,3,3
8,4,1,10,5
Output-
24
I have tried this way
def max_value(arr1, arr2, arr3, arr4, n):
    res = 0
    # Iterating two for loops,
    # one for i and another for j.
    for i in range(n):
        for j in range(n):
            temp = (abs(arr1[i] - arr1[j]) + abs(arr2[i] - arr2[j])
                    + abs(arr3[i] - arr3[j]) + abs(arr4[i] - arr4[j])
                    + abs(i - j))
            res = max(res, temp)
    return res
This is O(n^2).
But I want a better time complexity solution. This will not work for higher values of N.
Here is a solution for a single array
One can generalize the solution for a single array that you showed. Given a number K of arrays, including the array of indices, one can make 2**K possible combinations of arrays to get rid of the absolute values. It is then easy to just take the max and min of each of these combinations separately and compare them. This is order O(Kn*2^K), much better than the original O(Kn^2) for the values you report.
Here is a code that works on an arbitrary number of input arrays.
import numpy as np

def run(n, *args):
    aux = np.arange(n)
    K = len(args) + 1
    rows = 2 ** K
    x = np.zeros((rows, n))
    for i in range(rows):
        temp = 0
        for m, a in enumerate(args):
            temp += np.array(a) * ((-1) ** int(f"{i:0{K}b}"[-(1+m)]))
        temp += aux * ((-1) ** int(f"{i:0{K}b}"[-K]))
        x[i] = temp
    x_max = np.max(x, axis=-1)
    x_min = np.min(x, axis=-1)
    res = np.max(x_max - x_min)
    return res
The for loop maybe deserves more explanation: in order to make all possible combinations of absolute values, I assign each combination to an integer and rely on the binary representation of this integer to choose which ones of the K vectors must be taken negative.
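The same idea can be written without NumPy, which may make the sign bookkeeping easier to follow. This is a sketch rather than a drop-in replacement: max_abs_sum is a made-up name, it builds the sign patterns with itertools.product instead of binary strings, and it fixes the first sign to +1 because negating every sign only swaps the max and the min.

```python
from itertools import product

def max_abs_sum(*arrays):
    """Max over i, j of sum_k |arrays[k][i] - arrays[k][j]| + |i - j|."""
    n = len(arrays[0])
    rows = [list(a) for a in arrays] + [list(range(n))]  # index row gives |i - j|
    best = 0
    for tail in product((1, -1), repeat=len(rows) - 1):
        signs = (1,) + tail
        # signed combination of all rows, one value per index
        vals = [sum(s * row[i] for s, row in zip(signs, rows)) for i in range(n)]
        best = max(best, max(vals) - min(vals))
    return best

A = [5, 7, 6, 3, 9]
B = [7, 9, 2, 7, 5]
C = [1, 9, 9, 3, 3]
D = [8, 4, 1, 10, 5]
print(max_abs_sum(A, B, C, D))  # 24
```

For every pair (i, j), one of the sign patterns reproduces the sum of absolute differences exactly and no pattern can exceed it, so the largest max-minus-min over all patterns is the answer.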
Idea for faster solution
If you are only interested in the maximum M, you could search for the minimum and maximum values of A, B, C, D and the index term i-j. Let's say i_Amax is the index i of the maximum of A.
Now you find the values B[i_Amax], C[i_Amax], ... and do the same for i_Amin, then calculate M from the differences between the values at the max index and at the min index.
You repeat the previous step with the index of the maximum value of B, i_Bmax, and calculate M again; you repeat until you have gone through A, B, C, D and i-j.
You should now have five terms, and one of them should be the maximum.
If there is no unique minimum or maximum, you have to calculate the indices for all the possible minimums and maximums.
I think it should find the maximum and is faster than n^2, especially for big n, but I have not implemented it myself, so you have to think it through to check whether I made a logical error and whether there are cases where this idea misses the maximum.
I hope that helps!
I have two numpy arrays a and b. I have a function that constructs an array c whose elements are all the possible sums of different elements of a.
import numpy as np

def Sumarray(a):
    n = len(a)
    sumarray = np.array([0]) # Add a default zero element
    for k in range(2,n+1):
        full = np.mgrid[k*(slice(n),)]
        nd_triu_idx = full[:,(np.diff(full,axis=0)>0).all(axis=0)]
        sumarray = np.append(sumarray, a[nd_triu_idx].sum(axis=0))
    return sumarray

a = np.array([1,2,6,8])
c = Sumarray(a)
print(c)
I then perform a subset sum between an element of c and b: isSubsetSum returns the indices of the elements of b that, when summed, give c[1]. Let's say that I get
c[0] = b[2] + b[3]
Then I want to remove:
the elements b[2], b[3] (easy bit), and
the elements of a that when summed gave c[0]
As you can see from the definition of Sumarray, the order of the sums of different elements of a is preserved, so I need to work out some mapping.
The function isSubsetSum is given by
def _isSubsetSum(numbers, n, x, indices):
    if (x == 0):
        return True
    if (n == 0 and x != 0):
        return False
    # If last element is greater than x, then ignore it
    if (numbers[n - 1] > x):
        return _isSubsetSum(numbers, n - 1, x, indices)
    # else, check if x can be obtained by any of the following
    found = _isSubsetSum(numbers, n - 1, x, indices)
    if found: return True
    indices.insert(0, n - 1)
    found = _isSubsetSum(numbers, n - 1, x - numbers[n - 1], indices)
    if not found: indices.pop(0)
    return found

def isSubsetSum(numbers, x):
    indices = []
    found = _isSubsetSum(numbers, len(numbers), x, indices)
    return indices if found else None
As you are iterating over all possible numbers of terms, you could as well directly generate all possible subsets.
These can be conveniently encoded as numbers 0,1,2,... by means of their binary representations: 0 means no terms at all, 1 means only the first term, 2 means only the second, 3 means the first and the second and so on.
Using this scheme it becomes very easy to recover the terms from the sum index because all we need to do is obtain the binary representation:
UPDATE: we can suppress 1-term-sums with a small amount of extra code:
import numpy as np

def find_all_subsums(a,drop_singletons=False):
    n = len(a)
    assert n<=32 # this gives 4G subsets, and we have to cut somewhere
    # compute the smallest integer type with enough bits
    dt = f"<u{1<<((n-1)>>3).bit_length()}"
    # the numbers 0 to 2^n - 1 encode all possible subsets of an n
    # element set by means of their binary representation
    # each bit corresponds to one element; number k represents the
    # subset consisting of all elements whose bit is set in k
    rng = np.arange(1<<n,dtype=dt)
    if drop_singletons:
        # one element subsets correspond to powers of two
        rng = np.delete(rng,1<<np.arange(n))
    # np.unpackbits transforms bytes to their binary representation
    # given a bitvector b we can compute the corresponding subsum
    # as b dot a, to do it in bulk we can multiply the matrix of
    # binary rows with a
    return np.unpackbits(rng[...,None].view('u1'),
                         axis=1,count=n,bitorder='little') @ a
def show_terms(a,idx,drop_singletons=False):
    n = len(a)
    if drop_singletons:
        # we must undo the dropping of powers of two to get an index
        # that is easy to translate. One can check that the following
        # formula does the trick
        idx += (idx+idx.bit_length()).bit_length()
    # now we can simply use the binary representation
    return a[np.unpackbits(np.asarray(idx,dtype='<u8')[None].view('u1'),
                           count=n,bitorder='little').view('?')]
example = np.logspace(1,7,7,base=3)
ss = find_all_subsums(example,True)
# check every single sum
for i,s in enumerate(ss):
    assert show_terms(example,i,True).sum() == s
# print one example
idx = 77
print(ss[idx],"="," + ".join(show_terms(example.astype('U'),idx,True)))
Sample run:
2457.0 = 27.0 + 243.0 + 2187.0
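In case the bit manipulation above is hard to read, the encoding itself can be illustrated in plain Python. This is just a toy sketch of the index-to-terms direction (terms_by_index is a made-up name, and singleton dropping is left out):

```python
def terms_by_index(a, idx):
    """Return the elements of `a` whose bit is set in `idx` (bit k <-> a[k])."""
    return [x for k, x in enumerate(a) if (idx >> k) & 1]

a = [1, 2, 6, 8]
# 0b1010 has bits 1 and 3 set, so it encodes the subset {a[1], a[3]}
print(terms_by_index(a, 0b1010))       # [2, 8]
print(sum(terms_by_index(a, 0b1010)))  # 10
```

So once a sum is stored at position idx, the terms that produced it can be recovered from idx alone, without searching.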
I would very much like to generate n random integer numbers between two values (min, max) whose sum is equal to a given number m.
Note: I found similar questions in StackOverflow; however, they do not address exactly this problem (use of Dirichlet function and thus numbers between 0 and 1).
Example: I need 8 random numbers (integers) between 0 and 24 where the sum of the 8 generated numbers must be equal to 24.
Any help is appreciated. Thanks.
Well, you could use an integer distribution which naturally sums to a fixed number: the multinomial one.
Just shift the range so that it starts at zero, sample, and shift back, and it should work automatically.
Code
import numpy as np

def multiSum(n, p, maxv):
    while True:
        v = np.random.multinomial(n, p, size=1)
        q = v[0]
        a, = np.where(q > maxv) # are there any values above max
        if len(a) == 0: # accept only samples below or equal to maxv
            return q

N = 8
S = 24
p = np.full((N), 1.0/np.float64(N))
start = 0
stop = 24
n = S - N*start # integer number of trials after shifting by start
h = np.zeros((stop-start+1), dtype=np.int64) # one bin per value in [start...stop]
print(h)
for k in range(0, 10000):
    ns = multiSum(n, p, stop-start) + start # result in [0...24]
    #print(np.sum(ns))
    for v in ns:
        h[v-start] += 1
print(h)
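If NumPy is not available, the shift-sample-shift-back idea can be sketched with the standard library only. The function name and its parameters are illustrative; the sampling is a multinomial with equal probabilities (each unit of the shifted total is dropped into a random slot), so like the code above it does not pick uniformly among all valid solutions, and the rejection loop can take very long when total is close to count * high.

```python
import random

def random_ints_with_sum(count, total, low, high):
    """Draw `count` integers in [low, high] whose sum is `total`."""
    if count < 1 or not count * low <= total <= count * high:
        raise ValueError("no solution for these bounds")
    excess = total - count * low              # what is left after shifting to zero
    while True:
        parts = [0] * count
        for _ in range(excess):               # one multinomial draw, equal weights
            parts[random.randrange(count)] += 1
        if max(parts) <= high - low:          # reject samples that exceed the bound
            return [p + low for p in parts]   # shift back

values = random_ints_with_sum(8, 24, 0, 24)
print(values, sum(values))
```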
This is a case of integer partitions from number theory. Here is a solution.
def partition(n, k, l, m):
    if k < 1:
        return  # raising StopIteration inside a generator is an error since Python 3.7
    if k == 1:
        if n <= m and n >= l:
            yield (n,)
        return
    for i in range(l, m+1):
        for result in partition(n-i, k-1, i, m):
            yield result + (i,)

n = 24 # sum value
k = 8 # partition size
l = 0 # range min value
m = 24 # range high value
result = list(partition(n, k, l, m))
This will give all the combinations that satisfy the conditions.
P.S. This is quite slow, as it generates every case for that partition size.
This is one possible solution, which is based on this answer. It seems the Dirichlet method only works for values between 0 and 1. Full credit should be given to the original answer. I will be happy to delete it once you comment that it served your purpose.
Don't forget to upvote the original answer.
import numpy as np

target = 24
# randint excludes the upper bound, so target + 1 allows values 0..24
x = np.random.randint(0, target + 1, size=(8,))
while sum(x) != target:
    x = np.random.randint(0, target + 1, size=(8,))
print(x)
# [3 7 0 6 7 0 0 1]
I'm stumped on how to speed up my algorithm, which sums multiples in a given range. This is for a problem on codewars.com; here is a link to the problem:
codewars link
Here's the code; I'll explain what's going on at the bottom.
import itertools

def solution(number):
    return multiples(3, number) + multiples(5, number) - multiples(15, number)

def multiples(m, count):
    l = 0
    for i in itertools.count(m, m):
        if i < count:
            l += i
        else:
            break
    return l

print solution(50000000) #takes 41.8 seconds
#one of the testers takes 50000000000000000000000000000000000000000 as input
# def multiples(m, count):
#     l = 0
#     for i in xrange(m,count ,m):
#         l += i
#     return l
So basically the problem asks the user to return the sum of all the multiples of 3 or 5 below a given number. Here are the tests.
test.assert_equals(solution(10), 23)
test.assert_equals(solution(20), 78)
test.assert_equals(solution(100), 2318)
test.assert_equals(solution(200), 9168)
test.assert_equals(solution(1000), 233168)
test.assert_equals(solution(10000), 23331668)
My program has no problem getting the right answer. The problem arises when the input is large. When I pass in a number like 50000000, it takes over 40 seconds to return the answer. One of the inputs I'm asked to handle is 50000000000000000000000000000000000000000, which is a huge number. That's also the reason why I'm using itertools.count(): I tried using xrange in my first attempt, but xrange can't handle numbers larger than a C long. I know the slowest part of the program is the multiples method, yet it is still faster than my first attempt, which used a list comprehension and checked whether i % 3 == 0 or i % 5 == 0. Any ideas, guys?
This solution should be faster for large numbers.
def solution(number):
    number -= 1
    a, b, c = number // 3, number // 5, number // 15
    asum, bsum, csum = a*(a+1) // 2, b*(b+1) // 2, c*(c+1) // 2
    return 3*asum + 5*bsum - 15*csum
Explanation:
Take any sequence from 1 to n:
1, 2, 3, 4, ..., n
And its sum will always be given by the formula n(n+1)/2. This can be proven easily if you consider that the expression (1 + n) / 2 is just a shortcut for computing the average, or arithmetic mean, of this particular sequence of numbers. Because average(S) = sum(S) / length(S), if you take the average of any sequence of numbers and multiply it by the length of the sequence, you get the sum of the sequence.
If we're given a number n, and we want the sum of the multiples of some given k up to n, including n, we want to find the summation:
k + 2k + 3k + 4k + ... + xk
where xk is the highest multiple of k that is less than or equal to n. Now notice that this summation can be factored into:
k(1 + 2 + 3 + 4 + ... + x)
We are given k already, so now all we need to find is x. If x is defined to be the highest number you can multiply k by to get a natural number less than or equal to n, then we can get the number x by using Python's integer division:
n // k == x
Once we find x, we can find the sum of the multiples of any given k up to a given n using previous formulas:
k(x(x+1)/2)
Our three given k's are 3, 5, and 15.
We find our x's in this line:
a, b, c = number // 3, number // 5, number // 15
Compute the summations of their multiples up to n in this line:
asum, bsum, csum = a*(a+1) // 2, b*(b+1) // 2, c*(c+1) // 2
And finally, multiply their summations by k in this line:
return 3*asum + 5*bsum - 15*csum
And we have our answer!
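If it helps, the formula k(x(x+1)/2) can be pulled out into a small helper and checked against a direct loop. This is only a sketch: sum_multiples_below and solution_general are names made up here, and "below" follows the kata's convention that the limit itself is excluded.

```python
def sum_multiples_below(limit, k):
    """Sum of the positive multiples of k that are strictly below `limit`."""
    x = max((limit - 1) // k, 0)   # how many multiples of k are below limit
    return k * x * (x + 1) // 2    # k * (1 + 2 + ... + x)

def solution_general(number):
    # multiples of 15 are counted by both 3 and 5, so subtract them once
    return (sum_multiples_below(number, 3)
            + sum_multiples_below(number, 5)
            - sum_multiples_below(number, 15))

print(solution_general(10))    # 23
print(solution_general(1000))  # 233168
```

Everything is integer arithmetic, so the 41-digit input from the tests is handled instantly and without rounding.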