Modified cumulative sum of numbers in a list - python

I want to create a new list from the cumulative sums of the numbers in a list. The input is ideal: it can be split into subsets whose sums are all equal. The subsets need not have equal length. The number of subsets is given as input.
In the output, each subset is represented by consecutive integers [0,1,2,3,...] that replace the original values. There are as many distinct integers as there are subsets.
Example:
number of subsets = 2
input = [1, 4, 5]
#cumsum = [1, 5, 10]
subsets = [1,5], [10]
output-subsets = [0,0], [1]
output = [0, 0, 1]
Example1:
number of subsets = 4
input = [1, 2, 3, 4, 2, 5, 1, 6]
#cumsum = [1, 3, 6, 10, 12, 17, 18, 24]
subsets = [1,3,6], [10, 12],[17, 18], [24]
output-subsets = [0, 0, 0], [1, 1], [2, 2], [3]
output = [0, 0, 0, 1, 1, 2, 2, 3]
Example2:
number of subsets = 2
input = [1, 2, 3, 4, 2, 5, 1, 6]
#cumsum = [1, 3, 6, 10, 12, 17, 18, 24]
subsets = [1, 3, 6, 10, 12],[17, 18, 24]
output-subsets = [0, 0, 0, 0, 0], [1, 1, 1]
output = [0, 0, 0, 0, 0, 1, 1, 1]
I tried modifying the code from another SO question:
def changelist(lis, t):
    total = 0
    s = sum(lis)
    subset = s/t
    for x in lis:
        total += x
        i = 1
        if total <= subset:
            i = 0
        yield i
#changelist([input array], number of subset)
print(list(changelist([1, 2, 3, 4, 2, 5, 1, 6], 4)))
but only the first subset is correct:
output = [0, 0, 0, 1, 1, 1, 1, 1]
I think numpy.array_split would be problematic here (see "strange behaviour of numpy array_split").
I would really love any kind of explanation or help.

This should solve your problem:
def changelist(l, t):
    subset = sum(l) / t
    current, total = 0, 0
    for x in l:
        total += x
        if total > subset:
            current, total = current + 1, x
        yield current
Examples:
>>> list(changelist([1, 4, 5], 2))
[0, 0, 1]
>>> list(changelist([1, 2, 3, 4, 2, 5, 1, 6], 4))
[0, 0, 0, 1, 1, 2, 2, 3]
>>> list(changelist([1, 2, 3, 4, 2, 5, 1, 6], 2))
[0, 0, 0, 0, 0, 1, 1, 1]
How does it work?
current stores the "id" of the current subset, total the sum of the current subset.
For each element x in your initial list l, you add its value to the current total. If this total is greater than the expected sum of each subset (subset in my code), you know that you are in the next subset (current = current + 1), and you "reset" the total of the current subset to the current element (total = x).
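The same grouping can also be read directly off the cumulative sums. The sketch below is an alternative to the generator above, not a replacement for it: it assumes every input value is a positive integer and that the input is ideal in the sense of the question. The name group_ids is made up for illustration. A cumulative sum c falls in group g when g*s/t < c <= (g+1)*s/t, which in integer arithmetic is g = (c*t - 1) // s.

```python
from itertools import accumulate

def group_ids(values, t):
    """Map each value to the index of the equal-sum subset it falls in.

    Hypothetical helper. Assumes positive integers and an input that
    really splits into t consecutive subsets of equal sum.
    """
    s = sum(values)
    # c is the running total; (c * t - 1) // s equals ceil(c * t / s) - 1
    return [(c * t - 1) // s for c in accumulate(values)]

print(group_ids([1, 4, 5], 2))                  # [0, 0, 1]
print(group_ids([1, 2, 3, 4, 2, 5, 1, 6], 4))   # [0, 0, 0, 1, 1, 2, 2, 3]
print(group_ids([1, 2, 3, 4, 2, 5, 1, 6], 2))   # [0, 0, 0, 0, 0, 1, 1, 1]
```

Because it uses only integer arithmetic, this version avoids comparing running totals against a float threshold, at the cost of being less forgiving when the input is not ideal.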

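You can use NumPy for a vectorized solution after converting the input to an array, with N as the number of subsets:
import numpy as np

def modified_cumsum(input, N):
    A = np.asarray(input).cumsum()
    # mark the positions where a subset ends, shift right by one, then count them
    # (np.isin is used in place of np.in1d, which newer NumPy releases deprecate)
    return np.append(False, np.isin(A, (1+np.arange(N))*A[-1]/N))[:-1].cumsum()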
Sample runs -
In [31]: N = 2 #number of subsets
...: input = [1, 4, 5]
...:
In [32]: modified_cumsum(input,N)
Out[32]: array([0, 0, 1])
In [33]: N = 4 #number of subsets
...: input = [1, 2, 3, 4, 2, 5, 1, 6]
...:
In [34]: modified_cumsum(input,N)
Out[34]: array([0, 0, 0, 1, 1, 2, 2, 3])
In [35]: N = 2 #number of subsets
...: input = [1, 2, 3, 4, 2, 5, 1, 6]
...:
In [36]: modified_cumsum(input,N)
Out[36]: array([0, 0, 0, 0, 0, 1, 1, 1])

Related

How to append for loop based on given index as parameter and rule?

I would like to apply this rule from two set of lists p and c with total n = 6.
Equation 1: a set of integer number with len(p) = 6 and index is in ascending order.
Equation 2: a set of integer number with len(c) = 6 and index is in ascending order.
Equation 3: the remainder of the index i divided by 2.
for example: i = 1 then q = 1 and i = 2 then q = 0. So, the set of q for n=6 is {1, 0, 1, 0, 1, 0}.
Equation 4: the order of the objective list (P) starts from index n, 1, 2,...,(n-1). In the case of n=6, then the index for P is [6,1,2,3,4,5]
Equation 5-8: to match the index of the objective list (P) with the index from p and c. Please note that equation 5 is not applicable in the case of n = 6.
Equation 9: A is a set of possible combination from P with maximal len(A) is odd. In the case of n=6 then the possible combination is len(A) = 3 and len(A) = 5.
Note that the index for equations 6 to 9 have been solved by creating a set of list storage_i and its respective equation 3 is q in the code.
Next, my challenge is to code the equation 6 to 9 to get the value from the respective p and c and index i in the storage_i.
Expected Output:
A = [[c_6, c_1, p_2], [c_6, c_1, p_3], ..., [c_3, p_4, c_5], [c_6, c_1, p_2, c_3, p_4], ...., [c_1, p_2, c_3, p_4, c_5]]
or
A = [[20,7,120], [20,7,3], ...,[80,50,9], [20,7,120,80,50], ...,[7,120,80,50,9]]
This is the parameters as explained:
import numpy as np
p = [100,120,3,50,200,90]
c = [7,100,80,220,9,20]
storage_i = [[6, 1, 2],
[6, 1, 3],
[6, 1, 4],
[6, 1, 5],
[6, 2, 3],
[6, 2, 4],
[6, 2, 5],
[6, 3, 4],
[6, 3, 5],
[6, 4, 5],
[1, 2, 3],
[1, 2, 4],
[1, 2, 5],
[1, 3, 4],
[1, 3, 5],
[1, 4, 5],
[2, 3, 4],
[2, 3, 5],
[2, 4, 5],
[3, 4, 5],
[6, 1, 2, 3, 4],
[6, 1, 2, 3, 5],
[6, 1, 2, 4, 5],
[6, 1, 3, 4, 5],
[6, 2, 3, 4, 5],
[1, 2, 3, 4, 5]]
q = [[0, 1, 0],
[0, 1, 1],
[0, 1, 0],
[0, 1, 1],
[0, 0, 1],
[0, 0, 0],
[0, 0, 1],
[0, 1, 0],
[0, 1, 1],
[0, 0, 1],
[1, 0, 1],
[1, 0, 0],
[1, 0, 1],
[1, 1, 0],
[1, 1, 1],
[1, 0, 1],
[0, 1, 0],
[0, 1, 1],
[0, 0, 1],
[1, 0, 1],
[0, 1, 0, 1, 0],
[0, 1, 0, 1, 1],
[0, 1, 0, 0, 1],
[0, 1, 1, 0, 1],
[0, 0, 1, 0, 1],
[1, 0, 1, 0, 1]]
For example (not python index):
storage_i[1] = [6, 1, 2] and q = [0, 1, 0], then:
A_cc = q[6]*p[6] + (1-q[6])*c[6] = 0*90 + (1-0)*20 = 20
A_c = q[1]*p[1] + (1-q[1])*c[1] = 1*100 + (1-1)*7 = 100
A_cc = q[2]*c[2] + (1-q[2])*p[2] = 0*100 + (1-0)*120 = 120
A = [A_cc,A_c,A_cc] = [20,100,120]
Another example:
p = [100,120,3,50,200,90]
c = [7,100,80,220,9,20]
storage_i = [1, 2, 3], and q = [1, 0, 1], then:
A_cc = q[1]*c[1] + (1-q[1])*p[1] = 1*7 + (1-1)*100 = 7
A_c = q[2]*p[2] + (1-q[2])*c[2] = 0*120 + (1-0)*100 = 100
A_cc = q[3]*c[3] + (1-q[3])*p[3] = 1*80 + (1-1)*3 = 80
A = [A_cc,A_c,A_cc] = [7,100,80]
Here I am not sure how to append the A_cc and A_c based on equation 6-9.
storage_var_R5 = []
for set_i in storage_i:
    temp_var_R5 = []
    for idx_i in set_i:
        if idx_i == n:
            A_cc = q[idx_i]*p[idx_i] + (1-q[idx_i])*c[idx_i]
        else:
            A_c = q[idx_i]*p[idx_i] + (1-q[idx_i])*c[idx_i]
            A_cc = q[idx_i]*c[idx_i] + (1-q[idx_i])*p[idx_i]
        temp_var_R5.append()
Can anyone help me, please?
In case you need further clarification, please feel free to let me know.
Thank you in advance.

Splitting a sorted array of repeated elements

I have an array of repeated elements, where each distinct value represents a class. I would like to obtain the indices of the elements and partition them into 3 slices in order of their nth occurrence. For example:
np.array([0, 2, 2, 1, 0, 1, 2, 1, 0, 0])
split by order of occurrence into 3:
[0, 2, 1] [2, 0, 1], [2, 1, 0, 0]
I would like to find the indices of the repeated elements and split the array into 3 parts, where each part contains the indices of the next occurrence of each of the 3 classes.
So for the array and its splits, I'd like to obtain the following:
array: [0, 2, 2, 1, 0, 1, 2, 1, 0, 0]
indices: [0, 1, 3], [2, 4, 5], [6, 7, 8, 9]
I've tried the following:
a = np.array([0, 2, 2, 1, 0, 1, 2, 1, 0, 0])
length = np.arange(len(a))
array_set = [length[a == unique] for unique in np.unique(a)]
But I can't figure out how to split the partitions in order of first occurrence, as in the examples above.
This is a way to split the array in proportions of 3, that is, the last 0 will be left out:
# unique values
uniques = np.unique(a)
# counting occurrence of each unique value
occ = np.cumsum(a == uniques[:,None], axis=1)
# maximum common occurrence
max_occ = occ.max(axis=1).min()
# masking the first occurrences
u = (occ[None,...] == (np.arange(max_occ)+1)[:,None, None])
# the indexes
idx = np.sort(np.argmax(u, axis=-1), axis=-1)
# the partitions
partitions = a[idx]
Output:
# idx
array([[0, 1, 3],
[2, 4, 5],
[6, 7, 8]])
# partitions
array([[0, 2, 1],
[2, 0, 1],
[2, 1, 0]])
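For comparison, here is a plain-Python sketch of the same idea that keeps the trailing 0 instead of dropping it. It counts how many times each value has been seen so far and uses that count as the slice number. The function name split_by_occurrence is hypothetical, and folding any extra occurrences into the last slice is an assumption made to reproduce the [6, 7, 8, 9] asked for in the question.

```python
from collections import defaultdict

def split_by_occurrence(values, n_slices):
    """Group indices by how many times their value has been seen so far.

    Occurrences beyond the last slice are folded into it (an assumption,
    chosen to match the expected output in the question).
    """
    seen = defaultdict(int)                 # value -> occurrences so far
    slices = [[] for _ in range(n_slices)]
    for index, value in enumerate(values):
        rank = min(seen[value], n_slices - 1)
        slices[rank].append(index)
        seen[value] += 1
    return slices

a = [0, 2, 2, 1, 0, 1, 2, 1, 0, 0]
print(split_by_occurrence(a, 3))   # [[0, 1, 3], [2, 4, 5], [6, 7, 8, 9]]
```

This is a single pass over the data, so it is easy to follow, but it will be slower than a vectorized approach on large arrays.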
This is a problem where np.concatenate(...) + some algorithm + np.split(...) does the trick, though these are slow methods.
Let's start with the concatenation and the indices at which to split:
classes = [[0, 2, 1], [2, 0, 1], [2, 1, 0, 0]]
split_idx = np.cumsum(list(map(len, classes[:-1])))
flat_classes = np.concatenate(classes)
Next we need the indices that sort the initial array, along with the indices where each group starts. In this case the sorted array is [0,0,0,0,1,1,1,2,2,2] and the distinct groups start at 0, 4 and 7.
c = np.array([0, 2, 2, 1, 0, 1, 2, 1, 0, 0])
idx = np.argsort(c)
u, cnt = np.unique(c, return_counts=True)
marker_idx = np.r_[0, np.cumsum(cnt[:-1])]
Now comes the trickiest part. Exactly one of the indices 0, 4 or 7 advances at each step as you iterate over flat_classes, so you can accumulate these changes in an array called counter, which has one column per group (3 here), and then read off only the entry that changed at each step:
take = np.zeros((len(flat_classes), len(u)), dtype=int)
take[np.arange(len(flat_classes)), flat_classes] = 1
counter = np.cumsum(take, axis=0)
counter = counter + marker_idx - np.ones(len(u), dtype=int)
active_idx = counter[np.arange(len(flat_classes)), flat_classes]
splittable = idx[active_idx] #remember that we are working on indices that sorts array
output = np.split(splittable, split_idx)
Output
[array([0, 1, 3], dtype=int64),
array([2, 4, 5], dtype=int64),
array([6, 7, 8, 9], dtype=int64)]
Remark: the main idea of the solution is to track, step by step, the positions within the index array that sorts the input. These are the values of counter for this problem:
>>> counter
array([[0, 3, 6],
[0, 3, 7],
[0, 4, 7],
[0, 4, 8],
[1, 4, 8],
[1, 5, 8],
[1, 5, 9],
[1, 6, 9],
[2, 6, 9],
[3, 6, 9]])

how to sort in array index to index

Hi everyone, how can I sort an array index to index?
I have this list:
a = [0, 1, 2, 3, 4, 4, 3, 2, 1, 0, 4, 3, 2, 1, 0, 0, 1, 2, 3, 4]
How can I sort it into this?
[0, 4, 1, 3, 2, 2, 3, 1, 4, 0, 4, 0, 3, 1, 2, 2, 1, 3, 0, 4]
this is my idea
I could be wrong, but it sounds like you would like to return a list that is sorted like this:
[first_item, last_item, second_item, second_to_last_item, third_item, third_to_last_item,...]
I don't know of a one-line way to do that, but here's one way you could do it:
import numpy as np
a = [0, 1, 2, 3, 7] # length of list is an odd number
# create indexes that are all positive
index_values = np.repeat(np.arange(0, len(a)//2 + 1), 2) # [0,0,1,1,.....]
# make every other one negative
index_values[::2] *= -1 #[-0, 0, -1, 1, ....]
# return a[i]
[a[i] for i in index_values[1:(len(a)+1)]]
### Output: [0, 7, 1, 3, 2]
It also works for lists with even length:
a = [0, 1, 2, 3, 7, 5] # list length is an even number
index_values = np.repeat(np.arange(0, len(a)//2 + 1), 2) # [0,0,1,1,.....]
index_values[::2] *= -1 #[-0, 0, -1, 1, ....]
[a[i] for i in index_values[1:(len(a)+1)]]
### Output: [0, 5, 1, 7, 2, 3]
Here's an almost one-liner (based on @Callin's sort method) for those who want one and can't or don't want to use pandas:
from itertools import zip_longest
def custom_sort(a):
    half = len(a)//2
    return [n for fl in zip_longest(a[:half], a[:half-1:-1]) for n in fl if n is not None]
Examples:
custom_sort([0, 1, 2, 3, 7])
#[0, 7, 1, 3, 2]
custom_sort([0, 1, 2, 3, 7, 5])
#[0, 5, 1, 7, 2, 3]
This can be done in one line, although you’d be repeating the math to find the halfway point
[n for x in zip_longest(a[:len(a)//2], a[:(len(a)//2)-1:-1]) for n in x if n is not None]
Sometimes we want to sort in place, that is, without creating a new list. Here is what I came up with:
l=[1,2,3,4,5,6,7]
for i in range(1, len(l), 2):
    l.insert(i, l.pop())
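As a further variant, here is a two-pointer sketch that builds the reordered list without slicing and without relying on None as a filler, so it also works when the list itself contains None. The name interleave_ends is made up for illustration. It is checked against the list from the question.

```python
def interleave_ends(items):
    """Return [first, last, second, second-to-last, ...] as a new list."""
    result = []
    lo, hi = 0, len(items) - 1
    while lo < hi:
        result.append(items[lo])
        result.append(items[hi])
        lo += 1
        hi -= 1
    if lo == hi:                 # odd length: one middle element is left
        result.append(items[lo])
    return result

a = [0, 1, 2, 3, 4, 4, 3, 2, 1, 0, 4, 3, 2, 1, 0, 0, 1, 2, 3, 4]
print(interleave_ends(a))
# [0, 4, 1, 3, 2, 2, 3, 1, 4, 0, 4, 0, 3, 1, 2, 2, 1, 3, 0, 4]
```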

How to find the minimum indices and break ties by the least used indices in Python?

I have a numpy array like the following:
A = np.array([[1, 1, 1, 1, 1, 1, 1, 1],
              [1, 1, 1, 1, 5, 1, 1, 1],
              [1, 1, 1, 1, 3, 3, 1, 1],
              [1, 1, 1, 1, 1, 1, 2, 1],
              [1, 1, 1, 1, 1, 1, 1, 1],
              [1, 1, 1, 1, 1, 4, 1, 1]])
I am looking for the minimum indices in each column. I found this using numpy.argmin as follows:
I = np.zeros(A.shape[1], dtype=np.int64)
for j in range(A.shape[1]):
    I[j] = np.argmin(A[:, j])
This gives me I = [0, 0, 0, 0, 0, 0, 0, 0]. Since there are ties, I could obtain the following: I = [0, 1, 2, 3, 4, 0, 5, 1], where I break the ties by the index that is least used (from the previous indices).
In more details:
For j=0, we have np.argmin(A[:, 0]) in [0, 1, 2, 3, 4, 5] and, say, we choose np.argmin(A[:, 0]) = 0.
For j=1, we have np.argmin(A[:, 1]) in [0, 1, 2, 3, 4, 5] and we have to choose the minimum index from [1, 2, 3, 4, 5] since these indices are the least used (we already chose np.argmin(A[:, 0]) = 0 for j=0). Say we choose np.argmin(A[:, 1])=1.
For j=2, we have np.argmin(A[:, 2]) in [0, 1, 2, 3, 4, 5] and we have to choose the minimum index from [2, 3, 4, 5] since these indices are the least used.
We continue in this way...
For j=5, we have np.argmin(A[:, 5]) in [0, 1, 3, 4] and we have to choose the minimum index from [0, 1, 3, 4] since these indices are the least used. Say we choose np.argmin(A[:, 5])=0.
For j=6, we have np.argmin(A[:, 6]) in [0, 1, 2, 4, 5] and we have to choose from [5] since these indices are the least used. We choose np.argmin(A[:, 6])=5.
For j=7, we have np.argmin(A[:, 7]) in [0, 1, 2, 3, 4, 5] and we have to choose from [1, 2, 3, 4, 5] since these indices are the least used. Say we choose np.argmin(A[:, 7])=1.
I hope it is clear. My question is how to find the minimum indices and break ties by the least used indices in Python?
You could use min combined with a dictionary for keeping the counts of each index:
import numpy as np
A = np.array([[1, 1, 1, 1, 1, 1, 1, 1],
              [1, 1, 1, 1, 5, 1, 1, 1],
              [1, 1, 1, 1, 3, 3, 1, 1],
              [1, 1, 1, 1, 1, 1, 2, 1],
              [1, 1, 1, 1, 1, 1, 1, 1],
              [1, 1, 1, 1, 1, 4, 1, 1]])
counts = {}
I = np.zeros(A.shape[1], dtype=np.int64)
for j in range(A.shape[1]):
    _, _, i = min([(v, counts.get(i, 0), i) for i, v in enumerate(A[:, j])])
    counts[i] = counts.get(i, 0) + 1
    I[j] = i
print(I)
Output
[0 1 2 3 4 0 5 1]
The idea is to create the following key: (value, count of index, index), and then rely on the normal comparison of tuples. If the values are equal, the index with the lower count is selected; if the counts are also equal, the lower index is selected.
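The same tie-breaking rule can be sketched with a NumPy array of counts instead of a dictionary. This is only an illustration under the rule described above (lowest value, then least-used row, then lowest row index); the function name argmin_least_used is hypothetical. It leans on the fact that np.argmin returns the first minimum it finds.

```python
import numpy as np

def argmin_least_used(A):
    """Column-wise argmin; ties go to the least-used row, then the lowest row."""
    A = np.asarray(A)
    counts = np.zeros(A.shape[0], dtype=np.int64)   # times each row was picked
    picks = np.zeros(A.shape[1], dtype=np.int64)
    for j in range(A.shape[1]):
        col = A[:, j]
        candidates = np.flatnonzero(col == col.min())   # rows tied for the minimum
        # np.argmin returns the first minimum, so the lowest row wins remaining ties
        i = candidates[np.argmin(counts[candidates])]
        counts[i] += 1
        picks[j] = i
    return picks

A = np.array([[1, 1, 1, 1, 1, 1, 1, 1],
              [1, 1, 1, 1, 5, 1, 1, 1],
              [1, 1, 1, 1, 3, 3, 1, 1],
              [1, 1, 1, 1, 1, 1, 2, 1],
              [1, 1, 1, 1, 1, 1, 1, 1],
              [1, 1, 1, 1, 1, 4, 1, 1]])
print(argmin_least_used(A))   # [0 1 2 3 4 0 5 1]
```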

Divide an int into smaller whole ints

I have a random int in the range 30-60, which I get using randint(30,60). Let's say it's 40. I want to split this number into exactly 7 random whole ints. So, for instance, [5,5,5,5,5,5,10] is a valid result. But there are many possible solutions, such as [6,6,6,6,6,6,4] or [4,2,9,13,8,1,3] ...
I know there are many solutions, but I am searching for a fast way to go through them. I am not trying to get every single solution; rather, I am looking for a fast way to iterate over a lot of them in a short time. One way to achieve this is to randomly pick a number (say in the range 1-15), save it to a list, and repeat in a while loop until the sum is exactly 40. I tried that and it is not efficient at all. I think choosing a start value like [5,5,5,5,5,5,10] and altering the numbers in a precise way, like "1st number -2 and 3rd number +2" to yield [3,5,7,5,5,5,10], would be a much faster solution. Does anyone know how to do that, or have a good suggestion? Thanks. I prefer Python 3.
A set of whole numbers that sum to a number n is called a partition of n; if order matters then it's called a composition.
Here's a reasonably fast way to produce random compositions.
import random
def random_partition(n, size):
    seq = []
    while size > 1:
        x = random.randint(1, 1 + n - size)
        seq.append(x)
        n -= x
        size -= 1
    seq.append(n)
    return seq

n = 40
for _ in range(20):
    print(random_partition(n, 7))
typical output
[26, 2, 8, 1, 1, 1, 1]
[30, 2, 1, 3, 1, 1, 2]
[26, 5, 3, 1, 2, 2, 1]
[2, 25, 9, 1, 1, 1, 1]
[28, 2, 2, 2, 1, 2, 3]
[23, 1, 9, 3, 2, 1, 1]
[3, 26, 1, 7, 1, 1, 1]
[25, 1, 7, 1, 2, 1, 3]
[10, 8, 11, 5, 3, 1, 2]
[19, 16, 1, 1, 1, 1, 1]
[12, 23, 1, 1, 1, 1, 1]
[1, 14, 15, 7, 1, 1, 1]
[29, 5, 1, 1, 2, 1, 1]
[25, 1, 3, 3, 1, 2, 5]
[10, 12, 10, 4, 1, 2, 1]
[13, 4, 6, 14, 1, 1, 1]
[31, 3, 1, 1, 1, 1, 2]
[16, 11, 9, 1, 1, 1, 1]
[3, 26, 5, 3, 1, 1, 1]
[31, 2, 1, 2, 2, 1, 1]
We use 1 + n - size as the upper limit because the other size - 1 numbers are at least 1.
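Drawing the parts one after another, as above, tends to put the large numbers first, which shows in the typical output. If an even spread matters, one alternative is the cut-point ("stars and bars") technique sketched below: pick size - 1 distinct cut points among the n - 1 gaps between n unit items and take the lengths of the resulting segments. Each composition into positive parts corresponds to exactly one set of cut points, so every composition is equally likely. The function name uniform_composition is made up for illustration, and this is not the method used in the code above.

```python
import random

def uniform_composition(n, size):
    """Split n into `size` positive integers, every composition equally likely."""
    # choose size - 1 distinct cut points among the n - 1 gaps between n unit items
    cuts = sorted(random.sample(range(1, n), size - 1))
    bounds = [0] + cuts + [n]
    # the segment lengths between consecutive bounds are the parts
    return [b - a for a, b in zip(bounds, bounds[1:])]

parts = uniform_composition(40, 7)
print(parts, sum(parts))
```

This needs size <= n, since otherwise there are not enough gaps to cut.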
Here's a fairly efficient way to generate all partitions of a given integer. Note that these are ordered; you could use random.shuffle if you want to produce random compositions from these partitions.
We first print all partitions of 16 of size 5, and then we count the number of partitions of 40 of size 7 (= 2738).
This code was derived from an algorithm by Jerome Kelleher.
def partitionR(num, size):
    a = [0, num] + [0] * (num - 1)
    size -= 1
    k = 1
    while k > 0:
        x = a[k - 1] + 1
        y = a[k] - 1
        k -= 1
        while x <= y and k < size:
            a[k] = x
            y -= x
            k += 1
        a[k] = x + y
        if k == size:
            yield a[:k + 1]

for u in partitionR(16, 5):
    print(u)
print('- ' * 32)
print(sum(1 for _ in partitionR(40, 7)))
output
[1, 1, 1, 1, 12]
[1, 1, 1, 2, 11]
[1, 1, 1, 3, 10]
[1, 1, 1, 4, 9]
[1, 1, 1, 5, 8]
[1, 1, 1, 6, 7]
[1, 1, 2, 2, 10]
[1, 1, 2, 3, 9]
[1, 1, 2, 4, 8]
[1, 1, 2, 5, 7]
[1, 1, 2, 6, 6]
[1, 1, 3, 3, 8]
[1, 1, 3, 4, 7]
[1, 1, 3, 5, 6]
[1, 1, 4, 4, 6]
[1, 1, 4, 5, 5]
[1, 2, 2, 2, 9]
[1, 2, 2, 3, 8]
[1, 2, 2, 4, 7]
[1, 2, 2, 5, 6]
[1, 2, 3, 3, 7]
[1, 2, 3, 4, 6]
[1, 2, 3, 5, 5]
[1, 2, 4, 4, 5]
[1, 3, 3, 3, 6]
[1, 3, 3, 4, 5]
[1, 3, 4, 4, 4]
[2, 2, 2, 2, 8]
[2, 2, 2, 3, 7]
[2, 2, 2, 4, 6]
[2, 2, 2, 5, 5]
[2, 2, 3, 3, 6]
[2, 2, 3, 4, 5]
[2, 2, 4, 4, 4]
[2, 3, 3, 3, 5]
[2, 3, 3, 4, 4]
[3, 3, 3, 3, 4]
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
2738
If you only care about getting an arbitrary set of numbers that add up to your total rather than an exhaustive iteration over all combinations, the following should get you what you need.
from random import randint

def get_parts(total, num_parts=7, max_part=15):
    running_total = 0
    for i in range(num_parts - 1):
        remaining_total = total - running_total
        # leave at least 1 for each of the parts still to come
        upper_limit = min(max_part, remaining_total - num_parts + 1 + i)
        # need to make sure there will be enough left
        lower_limit = max(1, remaining_total - max_part*(num_parts - i - 1))
        part = randint(lower_limit, upper_limit)
        running_total += part
        yield part
    yield total - running_total
>>> list(get_parts(40))
[2, 7, 10, 11, 1, 4, 5]
>>> list(get_parts(40))
[7, 13, 11, 6, 1, 1, 1]
>>> list(get_parts(50, 4))
[6, 14, 15, 15]
Of course, the items in each list above are not truly random: the method favors larger numbers earlier in the list and smaller numbers later. You can feed these lists through random.shuffle() if you want more of an element of pseudorandomness.
From Python Integer Partitioning with given k partitions
def partitionfunc(n,k,l=1):
    '''n is the integer to partition, k is the length of partitions, l is the min partition element size'''
    # plain return ends a generator; raising StopIteration inside one
    # becomes a RuntimeError on Python 3.7+ (PEP 479)
    if k < 1:
        return
    if k == 1:
        if n >= l:
            yield (n,)
        return
    for i in range(l,n//k+1):
        for result in partitionfunc(n-i,k-1,i):
            yield (i,)+result

list(partitionfunc(40,7))
You can do a simple iteration over all possible combinations of the first 6 values (where the sum does not exceed 40), and calculate the 7th value.
for a in range(41):
    for b in range(41-a):
        for c in range(41-(a+b)):
            for d in range(41-(a+b+c)):
                for e in range(41-(a+b+c+d)):
                    for f in range(41-(a+b+c+d+e)):
                        g = 40 - (a+b+c+d+e+f)
                        # Do what you need to do here
You can cut the amount of time required by the loop almost in half (according to tests using timeit) by precomputing the sums:
for a in range(41):
    for b in range(41-a):
        ab = a + b
        for c in range(41-ab):
            abc = ab + c
            for d in range(41-abc):
                abcd = abc + d
                for e in range(41-abcd):
                    abcde = abcd + e
                    for f in range(41-abcde):
                        g = 40 - (abcde + f)
                        # Do what you need to do here
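If the number of parts should not be hard-coded, the nested loops can be sketched as a recursive generator. Like the loops, this version lets parts be 0 and yields results in the same order. The name compositions is hypothetical, and the recursion is probably slower per item than the hand-unrolled loops, so treat it as a readability trade-off rather than an optimization.

```python
def compositions(total, parts):
    """Yield every tuple of `parts` non-negative ints that sums to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        # the remaining parts must make up whatever is left after `first`
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

print(list(compositions(2, 2)))             # [(0, 2), (1, 1), (2, 0)]
print(sum(1 for _ in compositions(4, 3)))   # 15
```

Because it is a generator, iteration can stop as soon as enough candidates have been examined, without materializing the whole list.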
