How to generate sequential subsets of integers? - python

I have the following start and end values:
start = 0
end = 54
I need to generate subsets of 4 sequential integers starting from start until end with a space of 20 between each subset. The result should be this one:
0, 1, 2, 3, 24, 25, 26, 27, 48, 49, 50, 51
In this example, we obtained 3 subsets:
0, 1, 2, 3
24, 25, 26, 27
48, 49, 50, 51
How can I do it using numpy or pandas?
If I do r = [i for i in range(0,54,4)], I get [0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52].

This should get you what you want:
j = 20
k = 4
result = [split for i in range(0,55, j+k) for split in range(i, k+i)]
print (result)
Output:
[0, 1, 2, 3, 24, 25, 26, 27, 48, 49, 50, 51]

Maybe something like this:
r = [j for i in range(0, 54, 24) for j in range(i, i + 4)]
print(r)
[0, 1, 2, 3, 24, 25, 26, 27, 48, 49, 50, 51]

you can use numpy.arange which returns an ndarray object containing evenly spaced values within a given range
import numpy as np
r = np.arange(0, 54, 4)
print(r)
Result
[0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52]

Numpy approach
You can use np.arange to generate number with a step value of 20 + 4, where 20 is for space between each interval and 4 for each sequential sub array.
start = 0
end = 54
out = np.arange(0, 54, 24) # array([ 0, 24, 48]) These are the starting points
# for each subarray
step = np.tile(np.arange(4), (len(out), 1))
# [[0 1 2 3]
# [0 1 2 3]
# [0 1 2 3]]
res = out[:, None] + step
# array([[ 0, 1, 2, 3],
# [24, 25, 26, 27],
# [48, 49, 50, 51]])

This can be done with plane python:
rangeStart = 0
rangeStop = 54
setLen = 4
step = 20
stepTot = step + setLen
a = list( list(i+s for s in range(setLen)) for i in range(rangeStart,rangeStop,stepTot))
In this case you will get the subsets as sublists in the array.

I dont think you need to use numpy or pandas to do what you want. I achieved it with a simple while loop
num = 0
end = 54
sequence = []
while num <= end:
sequence.append(num)
num += 1
if num%4 == 0: //If four numbers have been added
num += 20
//output: [0, 1, 2, 3, 24, 25, 26, 27, 48, 49, 50, 51]

Related

Creating a list with 3 values every 3 values

I'm having troubles writing this piece of code.
I need to create a list to only have 3 values every 3 values :
The expected output must be something like :
output1 = [1,2,3,7,8,9,13,14,15,....67,68,69]
output2 = [4,5,6,10,11,12...70,71,72]
Any ideas how can I reach that ?
Use two loops -- one for each group of three, and one for each item within that group. For example:
>>> [i*6 + j for i in range(12) for j in range(1, 4)]
[1, 2, 3, 7, 8, 9, 13, 14, 15, 19, 20, 21, 25, 26, 27, 31, 32, 33, 37, 38, 39, 43, 44, 45, 49, 50, 51, 55, 56, 57, 61, 62, 63, 67, 68, 69]
>>> [i*6 + j for i in range(12) for j in range(4, 7)]
[4, 5, 6, 10, 11, 12, 16, 17, 18, 22, 23, 24, 28, 29, 30, 34, 35, 36, 40, 41, 42, 46, 47, 48, 52, 53, 54, 58, 59, 60, 64, 65, 66, 70, 71, 72]
Suppose you want n values every n values of total sets starting with start. Just change the start and number of sets you need. In below example list start with 1, so first set [1,2,3] and we need 12 sets each containing 3 consecutive element
Method 1
n = 3
start = 1
total = 12
# 2*n*i + start is first element of every set of n tuples (Arithmetic progression)
print([j for i in range(total) for j in range(2*n*i + start, 2*n*i + start+n)])
# Or
print(sum([list(range(2*n*i + start, 2*n*i + start+n)) for i in range(total)], []))
Method 2 (Numpy does operation in C, so fast)
import numpy as np
n = 3
start = 1
total = 12
# One liner
print(
(np.arange(start, start + n, step=1)[:, np.newaxis] + np.arange(0, total, 1) * 2*n).transpose().reshape(-1)
)
##############EXPLAINATION OF ABOVE ONE LINEAR########################
# np.arange start, start+1, ... start + n - 1
first_set = np.arange(start, start + n, step=1)
# [1 2 3]
# np.arange 0, 2*n, 4*n, 6*n, ....
multiple_to_add = np.arange(0, total, 1) * 2*n
print(multiple_to_add)
# broadcast first set using np.newaxis and repeatively add to each element in multiple_to_add
each_set_as_col = first_set[:, np.newaxis] + multiple_to_add
# [[ 1 7 13 19 25 31 37 43 49 55 61 67]
# [ 2 8 14 20 26 32 38 44 50 56 62 68]
# [ 3 9 15 21 27 33 39 45 51 57 63 69]]
# invert rows and columns
each_set_as_row = each_set_as_col.transpose()
# [[ 1 2 3]
# [ 7 8 9]
# [13 14 15]
# [19 20 21]
# [25 26 27]
# [31 32 33]
# [37 38 39]
# [43 44 45]
# [49 50 51]
# [55 56 57]
# [61 62 63]
# [67 68 69]]
merge_all_set_in_single_row = each_set_as_row.reshape(-1)
# array([ 1, 2, 3, 7, 8, 9, 13, 14, 15, 19, 20, 21, 25, 26, 27, 31, 32,
# 33, 37, 38, 39, 43, 44, 45, 49, 50, 51, 55, 56, 57, 61, 62, 63, 67,
# 68, 69])
To make the logic understandable, because sometimes the Pythonic methods look 'magic'
Here's a naive algorithm to do that:
output1 = []
output2 = []
for i in range(1, 100): # change as you like:
if (i-1) % 6 < 3:
output1.append(i)
else:
output2.append(i)
What's going on here:
Initializing two empty lists.
Iterate through integers in a range.
How to tell if i should go to output1 or output2:
I can see that 3 consecutive numbers go to output1, then 3 consecutive to output2.
This tells me I can use the modulo % operator, (doing % 6)
The rest is simple logic to get the exact result wanted.

Generate ranges from sequences based on missing values

Given a range, range(101), I have known missing inputs from the sequence,
{ 8, 23, 56 }
In this range, the only significant numbers are the start and end, 0 and 100. Here's what my initialization looks like:
r = tuple(range(101))
init_start, init_end = r[0], r[-1]
missing = { 8, 23, 56 }
r = tuple(filter(lambda n: n not in missing, r))
Now here is where I get stuck. I don't know how to approach generating the sub-ranges where the holes are. The expected output here is 0, 7, 9, 22, 24, 55, and 57, 100. I could brute force it with the known missing values, but then it doesn't handle edge cases (what if I only have one value in the range?).
Edit
Someone posted a working solution for the "happy path" of the problem, but it misses edge cases:
how to handle sequentially missing values?
what if the start or end are in the missing values?
r = tuple(range(101))
init_start, init_end = r[0], r[-1]
missing = [8, 23, 56]
r = tuple(filter(lambda n: n not in missing, r))
def gen_ranges():
start = init_start
end = 0
for n in sorted(missing):
yield start
start = n + 1
end = n - 1
yield end
yield start
yield init_end
>>> list(gen_ranges())
[0, 7, 9, 22, 24, 55, 57, 100]
One solution using itertools.groupby():
holes = {8, 23, 56}
from itertools import groupby
def generate(holes, r=range(0, 101)):
for v, g in groupby(r, lambda k: k in holes):
if v is False:
l = [*g]
yield from (l[0], l[-1])
print(list(generate(holes)))
Prints:
[0, 7, 9, 22, 24, 55, 57, 100]
Other inputs:
holes = {8, 10, 56} # [0, 7, 9, 9, 11, 55, 57, 100]
holes = {8, 9, 56} # [0, 7, 10, 55, 57, 100]
EDIT (some explanation):
With itertools.groupby I'm making groups from the range() generator using the key function. Here is the key function k in holes (k is value from the range()). If the value returned from the key function changes, that means one consecutive group. I basically do this:
False [0, 1, 2, ... 5, 6, 7] # group 1 (Take first, last)
True [8, 9] # group 2
False [10, 11, ... 54, 55] # group 3 (Take first, last)
True [56] # group 4
False [57, 58, ... 99, 100] # group 5 (Take first, last)
I see a couple of issues:
You need to ensure the values in missing are sorted;
You're not outputting the last pair of values after you finish processing the values in missing
This should give the results you want (note I've added missing as a parameter to ease testing):
def gen_ranges(missing):
start = init_start
end = 0
for n in sorted(missing):
if n == start:
start = n + 1
continue
yield start
start = n + 1
end = n - 1
yield end
if start <= init_end:
yield start
yield init_end
print(list(gen_ranges({ 8, 9, 56 })))
print(list(gen_ranges({ 0 })))
print(list(gen_ranges({ 100 })))
print(list(gen_ranges({ 0, 100 })))
print(list(gen_ranges({ 1, 100 })))
print(list(gen_ranges({ 0, 99 })))
print(list(gen_ranges({ 0, 50, 100 })))
print(list(gen_ranges({ 0, 1, 50, 99, 100 })))
Output:
[0, 7, 10, 55, 57, 100]
[1, 100]
[0, 99]
[1, 99]
[0, 0, 2, 99]
[1, 98, 100, 100]
[1, 49, 51, 99]
[2, 49, 51, 98]
There's already a good accepted answer, but this was an interesting problem.
Here is my solution, which works with edge cases:
def gen_ranges(start, end, missing):
missing = sorted(missing + [start - 1, end + 1])
for num, value in enumerate(missing):
if value - missing[num - 1] > 1:
yield missing[num - 1] + 1
yield value - 1
print(list(gen_ranges(0, 100, [8, 23, 56]))) # [0, 7, 9, 22, 24, 55, 57, 100]
print(list(gen_ranges(0, 100, [5, 40, 50, 52, 93]))) # [0, 4, 6, 39, 41, 49, 51, 51, 53, 92, 94, 100]
print(list(gen_ranges(0, 100, [0, 50, 100]))) # [1, 49, 51, 99]
print(list(gen_ranges(0, 100, [0, 50, 51, 100]))) # [1, 49, 52, 99]

Find largest value from multiple colums in each group of row index in Python, arrange those values diagonally in matrix, and find determinant

I am new to Python. I want to find the largest values from all the columns for repetitive row indexes (i.e. 5 to 130), and also show its row and column index label in output.The largest values should be absolute. (Irrespective of + or - sign). There should not be duplicates for row indexes in different groups.
After finding largest from each group,I want to arrange those values diagonally in square matrix. Then fill the remaining array with the corresponding values of indexes for each group from the main dataframe and find its Determinant.
df=pd.DataFrame(
{'0_deg': [43, 50, 45, -17, 5, 19, 11, 32, 36, 41, 19, 11, 32, 36, 1, 19, 7, 1, 36, 10],
'10_deg': [47, 41, 46, -18, 4, 16, 12, 34, -52, 31, 16, 12, 34, -71, 2, 9, 52, 34, -6, 9],
'20_deg': [46, 43, -56, 29, 6, 14, 13, 33, 43, 6, 14, 13, 37, 43, 3, 14, 13, 25, 40, 8],
'30_deg': [-46, 16, -40, -11, 9, 15, 33, -39, -22, 21, 15, 63, -39, -22, 4, 6, 25, -39, -22, 7]
}, index=[5, 10, 12, 101, 130, 5, 10, 12, 101, 130, 5, 10, 12, 101, 130, 5, 10, 12, 101, 130]
)
Data set :
Expected Output:
My code is showing only till output 1.
Actual Output:
Code:
df = pd.read_csv ('Matrixfile.csv')
df = df.set_index('Number')
def f(x):
x1 = x.abs().stack()
x2 = x.stack()
x = x2.iloc[np.argsort(-x1)].head(1)
return x
groups = (df.index == 5).cumsum()
df1 = df.groupby(groups).apply(f).reset_index(level=[1,2])
df1.columns = ['Number','Angle','Value']
print (df1)
df1.to_csv('Matrix_OP.csv', encoding='utf-8', index=True)
I am not sure about #piRSquared output from what I understood from your question. There might be some errors in there, for instance, in group 2, max(abs(values)) = 52 (underline in red in picture) but 41 is displayed on left...
Here is a less elegant way of doing it but maybe easier for you to understand :
import numpy as np
# INPUT
data_dict ={'0_deg': [43, 50, 45, -17, 5, 19, 11, 32, 36, 41, 19, 11, 32, 36, 1, 19, 7, 1, 36, 10],
'10_deg': [47, 41, 46, -18, 4, 16, 12, 34, -52, 31, 16, 12, 34, -71, 2, 9, 52, 34, -6, 9],
'20_deg': [46, 43, -56, 29, 6, 14, 13, 33, 43, 6, 14, 13, 37, 43, 3, 14, 13, 25, 40, 8],
'30_deg': [-46, 16, -40, -11, 9, 15, 33, -39, -22, 21, 15, 63, -39, -22, 4, 6, 25, -39, -22, 7],
}
# Row idx of a group in this list
idx = [5, 10, 12, 101, 130]
# Getting some dimensions and sorting the data
row_idx_length = len(idx)
group_length = len(data_dict['0_deg'])
number_of_groups = len(data_dict.keys())
idx = idx*number_of_groups
data_arr = np.zeros((group_length,number_of_groups),dtype=np.int32)
#
col = 0
keys = []
for key in sorted(data_dict):
data_arr[:,col] = data_dict[key]
keys.append(key)
col+=1
def get_extrema_value_group(arr):
# function to find absolute extrema value of a 2d array
extrema = 0
for i in range(0, len(arr)):
max_value = max(arr[i])
min_value = min(arr[i])
if (abs(min_value) > max_value) and (abs(extrema) < abs(min_value)):
extrema = min_value
elif (abs(min_value) < max_value) and (abs(extrema) < max_value):
extrema = max_value
return extrema
# For output 1
max_values = []
for i in range(0,row_idx_length*number_of_groups,row_idx_length):
# get the max value for the current group
value = get_extrema_value_group(data_arr[i:i+row_idx_length])
# get the row and column idx associated with the max value
idx_angle_number = np.nonzero(abs(data_arr[i:i+row_idx_length,:])==value)
print('Group number : ' + str(i//row_idx_length+1))
print('Number : '+ str(idx[idx_angle_number[0][0]]))
print('Angle : '+ keys[idx_angle_number[1][0]])
print('Absolute extrema value : ' + str(value))
print('------')
max_values.append(value)
# Arrange those values diagonally in square matrix for output 2
A = np.diag(max_values)
print('A = ' + str(A))
# Fill A with desired values
for i in range(0,number_of_groups,1):
A[i,0] = data_arr[i*row_idx_length+2,2] # 20 deg 12
A[i,1:3] = data_arr[i*row_idx_length+3,1] # x2 : 10 deg 101
A[i,3] = data_arr[i*row_idx_length+1,1] # 10 deg 10
# Final output
# replace the diagonal of A with max values
# get the idx of diag
A_di = np.diag_indices(number_of_groups)
# replace with max values
A[A_di] = max_values
print ('A = ' + str(A))
# Compute determinant of A
det_A = np.linalg.det(A)
print ('det(A) = '+str(det_A))
Output 1:
Group number : 1
Number : 12
Angle : 20_deg
Absolute extrema value : -56
------
Group number : 2
Number : 101
Angle : 10_deg
Absolute extrema value : -52
------
Group number : 3
Number : 101
Angle : 10_deg
Absolute extrema value : -71
------
Group number : 4
Number : 10
Angle : 10_deg
Absolute extrema value : 52
------
Output 2 :
A = [[-56 0 0 0]
[ 0 -52 0 0]
[ 0 0 -71 0]
[ 0 0 0 52]]
Output 3 :
A = [[-56 -18 -18 41]
[ 33 -52 -52 12]
[ 37 -71 -71 12]
[ 25 -6 -6 52]]
det(A) = -5.4731330578761246e-11

How to swap the new added number into the correct position in binary heap?

This question of my homework has passed a list where index 1 is the new node and is also the root. Then I have to check if it's children is smaller then itself and swap it with the smaller child. I've written some code but it's not working.
def perc_down(data):
count = 0
index = 1
l, r = 2 * index, 2 * index + 1
while index < len(data):
if data[index] > data[l] and data[index] > data[r]:
min_i = data.index(min(data[l], data[r]))
data[index], data[min_i] = data[min_i], data[index]
count += 1
index = min_i
return count
values = [0, 100, 7, 8, 9, 22, 45, 12, 16, 27, 36]
swaps = perc_down(values)
print('Binary heap =',values)# should be [0, 7, 9, 8, 16, 22, 45, 12, 100, 27, 36]
print('Swaps =', swaps)# should be 3
Give l and r values inside the while loop
while index <= len(data) // 2:
l, r = 2 * index, 2 * index + 1
if r >= len(data):
r = index
if data[index] > data[l] or data[index] > data[r]:
min_i = data.index(min(data[l], data[r]))
data[index], data[min_i] = data[min_i], data[index]
count += 1
index = min_i
print(data) #Added this for easy debugging.
return count
And run the loop till half values only because it's binary min heap.
Output:
[0, 7, 100, 8, 9, 22, 45, 12, 16, 27, 36]
[0, 7, 9, 8, 100, 22, 45, 12, 16, 27, 36]
[0, 7, 9, 8, 16, 22, 45, 12, 100, 27, 36]
Binary heap = [0, 7, 9, 8, 16, 22, 45, 12, 100, 27, 36]
Swaps = 3
Revised the algorithm for those indices whose children do not exist.
For : values = [0, 100, 7, 11, 9, 8, 45, 12, 16, 27, 36] for 100 after 2 swaps comes at index 5 which does not have a right child so when it exceeds the length of list we just set it back to original index.
Heapified list : Binary heap = [0, 7, 8, 11, 9, 36, 45, 12, 16, 27, 100].

Grouping tuple columns so their sum is less than 1

I need to create a list of groups of items, grouped so that the sum of the negative logarithms of the probabilities is roughly 1.
So far I've come up with
probs = np.random.dirichlet(np.ones(50)*100.,size=1).tolist()
logs = [-1 * math.log(1-x,2) for x in probs[0]]
zipped = zip(range(0,50), logs)
for key, igroup in iter.groupby(zipped, lambda x: x[1] < 1):
print(list(igroup))
I.e. I create a list of random numbers, take their negative logarithms, then zip these probabilities together with the item number.
I then want to create groups by adding together the numbers in the second column of the tuple until the sum is 1 (or slightly above it).
I've tried:
for key, igroup in iter.groupby(zipped, lambda x: x[1]):
for thing in igroup:
print(list(iter.takewhile(lambda x: x < 1, iter.accumulate(igroup))))
and various other variations on using itertools.accmuluate, but I can't get it to work.
Does anyone have an idea of what could be going wrong (I think I'm doing too much work).
Ideally, the output should be something like
groups = [[1,2,3], [4,5], [6,7,8,9]]
etc i.e these are the groups which satisfy this property.
Using numpy.ufunc.accumulate and simple loop:
import numpy as np
def group(xs, start=1):
last_sum = 0
for stop, acc in enumerate(np.add.accumulate(xs), start):
if acc - last_sum >= 1:
yield list(range(start, stop))
last_sum = acc
start = stop
if start < stop:
yield list(range(start, stop))
probs = np.random.dirichlet(np.ones(50) * 100, size=1)
logs = -np.log2(1 - probs[0])
print(list(group(logs)))
Sample output:
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35],
[36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]]
ALTERNATIVE
Using numpy.searchsorted:
def group(xs, idx_start=1):
xs = np.add.accumulate(xs)
idxs = np.searchsorted(xs, np.arange(xs[-1]) + 1, side='left').tolist()
return [list(range(i+idx_start, j+idx_start)) for i, j in zip([0] + idxs, idxs)]

Categories

Resources