median of five points python

My data frame (df) has a list of numbers (total 1773) and I am trying to find a median for five numbers (e.g. the median of 3rd number = median (1st,2nd,3rd,4th,5th))
num
10
20
15
34
...
...
...
def median(a, b, c,d,e):
    I=[a,b,c,d,e]
    return I[2]
num_median = [num[0]]
for i in range(1, len(num)):
    num_median = median(num[i - 1], num[i-2], num[i],num[i+1],num[i+2])
df['num_median']=num_median
IndexError: index 1773 is out of bounds for axis 0 with size 1773
Where did it go wrong, and is there any other method to compute the median?

An example which will help:
a = [0, 1, 2, 3]
print('Length={}'.format(len(a)))
print(a[4])
You are doing the same thing. Indexing starts at 0, so the last valid index is one less than the length: a has length 4, but a[4] raises an IndexError. Keep in mind that the exception message shows you exactly where your problem is.
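You need to modify the loop bounds:
for i in range(2, len(num) - 2):
    num_median = median(num[i - 2], num[i - 1], num[i], num[i + 1], num[i + 2])
so that the largest index you access, i + 2, stays inside the array. Otherwise, when you are near the end of your array (i = 1771) you will be trying to access index 1773, which doesn't exist (the last valid index is 1772). Starting at 2 also keeps i - 2 from going negative, which for a list or NumPy array silently wraps around to the end. Note as well that median as written returns its third argument without sorting (it needs sorted(I)[2] to return an actual median), and that the loop overwrites num_median on every pass instead of collecting the results.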
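As for another method, here is a minimal sketch of a centered five-point median using only the standard library. The function name centered_median and the sample numbers are my own, not from the question, and I am assuming that positions without a full window should be left as None.

```python
from statistics import median

def centered_median(values, window=5):
    """Median of each full window centered on position i; None where the window does not fit."""
    half = window // 2
    out = [None] * len(values)
    # Only visit positions that have `half` neighbours on both sides.
    for i in range(half, len(values) - half):
        out[i] = median(values[i - half:i + half + 1])
    return out

num = [10, 20, 15, 34, 12, 8, 40]  # made-up sample data
num_median = centered_median(num)
print(num_median)  # [None, None, 15, 15, 15, None, None]
```

If the numbers already live in a pandas DataFrame, df['num'].rolling(window=5, center=True).median() should give the same centered result, with NaN at the edges instead of None.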

Related

How can I change the index correct in a list? (Python)

I wrote some code to calculate the maximum path in a triangle.
75
95 64
17 47 82
18 35 87 10
20 04 82 47 65
My code currently reports the maximum path sum as 75+95+82+87+82. I want it to calculate the maximum path using only the adjacent numbers in the layer below the current one. For example, the maximum path sum must be 75+95+47+87+82, because 47 'touches' 95 and 82 doesn't. In the layer below that, the choice is between 35 and 87. So there is always a choice between two numbers. Does anyone know how I can do this? This is my code:
lst = [[75],
       [95,64],
       [17,47,82],
       [18,35,87,10],
       [20,4,82,47,65]]
something = 1
i = 0
mid = 0
while something != 0:
    for x in lst:
        new = max(lst[i])
        print(new)
        i += 1
        mid += new
    something = 0
print(mid)
For each row the choice should be between two values. If you always want to pick the maximum between those two values, the first solution works. If you want to maximize the sum by sometimes picking the lower value, the second solution works.
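values = []
index = 0 # the index of the 'max' number in the first row
for row in lst:
    section = row[index:index+2] # the two numbers directly below the previous choice
    value = max(section)
    index += section.index(value) # position within the row; row.index(value) could match a duplicate elsewhere in the row
    values.append(value)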
In this code we loop through each row, where the first row is just [75], the second row is [95, 64], etc.
For each row, we take the two numbers that are directly below our previous choice. For the first row, the choice must always be the number in position 0, as it's the only number.
Then, we take the max of those two numbers. Then we take the index of that max number, which we will use to select the new two numbers on the next iteration.
Now all the values are stored in values. We can use sum(values) to get the sum.
Second solution
import itertools
# Create a new pyramid where each cell is the index of each number
index_lst = [[x for x in range(len(l))] for l in lst]
# Now index_lst looks like this:
# [[0],
# [0, 1],
# [0, 1, 2],
# [0, 1, 2, 3],
# [0, 1, 2, 3, 4]]
# get all possible paths, including those that are not valid such as [0, 0, 0, 0, 4]
all_possible_paths = itertools.product(*index_lst)
def check_validity(path):
    # Check the step between consecutive rows of the path.
    # From index j you can only move to index j or j + 1 in the next row,
    # so any other step means the path is not valid.
    for x in range(0,len(path)-1):
        step = path[x+1] - path[x]
        if step not in (0, 1):
            return False
    return True
# Filter out all the paths that are not valid
valid_path = []
for path in all_possible_paths:
    if check_validity(path):
        valid_path.append(path)
# For all the valid paths, get the path that returns the maximum value
# here the max function takes in the paths, and then converts the path to the sum of the values
optimal_path = max(valid_path, key = lambda e: sum([lst[row][index] for row, index in enumerate(e)]))
# Actually convert the optimal path to values
optimal_values = [lst[row][index] for row, index in enumerate(optimal_path)]
print(optimal_path)
print(optimal_values)
print(sum(optimal_values))
edit: Some extra details about this line
optimal_path = max(valid_path, key = lambda e: sum([lst[row][index] for row, index in enumerate(e)]))
First, the max function can take a key parameter. This just tells the max function how it should actually determine the max. For example, with input [[0,5],[1,0]] I can use the key parameter to tell max to use the second elements (i.e., 5 and 0) in each list to determine the max, instead of 0 and 1.
In this case, we give it a lambda function. This basically means we pass a function as the key parameter, without defining the function beforehand. It is conceptually similar to doing this
def fun(e):
    # do some stuff
    return sum([lst[row][index] for row, index in enumerate(e)])
optimal_path = max(valid_path, key = fun)
Next, I use a list comprehension. That's just a shorter way of writing a for loop.
[lst[row][index] for row, index in enumerate(e)]
Is the same thing as doing
output = []
for row, index in enumerate(e):
    output.append(lst[row][index])
The enumerate function takes an iterable (like ['a','b','c']) and also gives the index for each element. Thus list(enumerate(['a','b','c'])) returns [(0, 'a'), (1, 'b'), (2, 'c')]. In the for loop I immediately assign them to row and index.
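The two building blocks described above can be tried in isolation. This is only a small illustration using the [[0,5],[1,0]] input mentioned earlier; the variable names are mine.

```python
pairs = [[0, 5], [1, 0]]

# Without a key, lists are compared element by element, so [1, 0] wins.
by_first = max(pairs)
# With a key, max compares the values the key function returns (here the second element).
by_second = max(pairs, key=lambda e: e[1])
print(by_first, by_second)  # [1, 0] [0, 5]

# enumerate pairs each element with its position.
print(list(enumerate(['a', 'b', 'c'])))  # [(0, 'a'), (1, 'b'), (2, 'c')]
```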
The "triangle" is an example of a graph (in the computer science sense, not the maths sense).
You will have to use recursion in order to solve this, or some kind of stochastic algorithm. Personally, I'd use the genetic algorithm since the problem is combinatorial.
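One way to make the recursive idea concrete, without resorting to a stochastic search, is a bottom-up pass over the rows: the best sum from any cell is its own value plus the better of the two cells below it. This is a sketch under the assumption that from position j you may only move to position j or j + 1 in the next row; the function name max_path_sum is my own.

```python
def max_path_sum(triangle):
    """Best top-to-bottom sum when each step moves to index j or j + 1 in the next row."""
    # Start with the bottom row: the best sum from a bottom cell is the cell itself.
    best = list(triangle[-1])
    # Walk upwards, folding each row into the running best sums.
    for row in reversed(triangle[:-1]):
        best = [value + max(best[j], best[j + 1]) for j, value in enumerate(row)]
    return best[0]

lst = [[75],
       [95, 64],
       [17, 47, 82],
       [18, 35, 87, 10],
       [20, 4, 82, 47, 65]]
print(max_path_sum(lst))  # 390, via 75 + 64 + 82 + 87 + 82
```

Note that this is larger than the 386 you get by always taking the bigger of the two adjacent numbers (75 + 95 + 47 + 87 + 82), which is the difference between the two solutions discussed above.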

order a sequence based on the sum of the window values

I know the question is not very clear so I try to make a clearer example:
I have some values, which represent the minute of the day and a corresponding value, like
import numpy as np
x = np.arange(1440)
values = np.sin(2*np.pi/1440*2*x - np.pi/2)+1
and I want to get the sums of values per hour, sorted from lowest to highest. I don't want any value to be excluded, except at most the highest remaining ones. In this case I would get 24 sorted values (one per hour), or rather 23 if I consider that the minimum sum could start anywhere in the series, so that some values at the beginning and the end fall in a window shorter than 60 minutes, except in a very particular case.
I don't know whether I should apply a boolean mask in a while loop, which I could manage to do, or whether numpy or some other package already has functions that could help me solve the problem. Thanks
Also, if possible, I would like to get the sorted non-subsequent sums, up to the point where no interval of the right (window) length is left, which in this particular case implies that my result will have between 12 and 24 window sums. Thanks again
So, in a general case (I have no space to insert the minute data), if my values are [1,2,3,4,5,6,0,0,0,7,8,9] and I need to group them in windows of 3 elements (in the real case this is my 60-minute window), my result will be:
[[0,0,0],[1,2,3],[4,5,6],[7,8,9]].
Or, to be more general, if they are [0,1,2,3,4,5,6,0,0,0,7,9,1,9,6], in the first case I would get [[0,0,0],[1,2,3],[4,5,6],[7,9,1]] (because I take subsequent windows) and in the second case I would get [[0,0,0],[0,1,2],[3,4,5],[1,9,6]] (because I just focus on the minimum sorted sums).
I am not sure if you want to have a rolling window or if you want to sum over full hours. Here is the first try, by reshaping the values into a 24 x 60 array to get sums for the full hours.
import numpy as np
x = np.arange(60*24)
values = np.sin(2*np.pi/1440*2*x - np.pi/2)+1
x_values = values.reshape((24, 60))
sums = x_values.sum(axis=-1)
print(np.argsort(sums))
# array([12, 0, 11, 23, 13, 1, 10, 22, 14, 2, 9, 21,
#        15, 3, 8, 20, 16, 4, 7, 19, 17, 5, 18, 6])
# if some values are missing / not yet available you can simply
# fill them with zeros before summing (index the reshaped array, since values is 1-D)
x_values[3, 16:] = 0 # no data available for 03:16 - 04:00
x_values[3, :] = 0 # or ignore this hour completely
EDIT
Some remarks after the clarifications:
As always, it depends on your use case which of the two options is better for you; maybe you even want to allow some overlap between the hours...
It seems a little strange to me to order everything after the first minimum you find; I would prefer the option where the intervals can be placed arbitrarily over the day, but note that you can easily end up with fewer than 23 valid intervals in this case.
inf = 1e9
m = 3
a = np.array([1,0,1,2,3,4,5,6,0,0,0,7,9,1,9,6])
for _ in range(len(a)//m):
    b = np.convolve(a, v=np.ones(m), mode='full')
    i = np.argmin(b[m-1:-m+1])
    if inf in a[i:i+m]:
        break
    print(i, a[i:i+m])
    a[i:i+m] = inf
# This gives you the intervals with the corresponding starting index
# 8 [0 0 0]
# 0 [1 0 1]
# 3 [2 3 4]
# 13 [1 9 6]
For completeness here is also the second option (starting again from the original array, since the loop above overwrites a):
a = np.array([1,0,1,2,3,4,5,6,0,0,0,7,9,1,9,6])
# get interval with minimal sum
b = np.convolve(a, v=np.ones(m), mode='full')
i = np.argmin(b[m - 1:-m + 1])
# reshape values and clip boundaries according to found minimum i
# (an explicit end index avoids the empty slice that a[...:-0] would give)
a = a[i % m: len(a) - ((len(a)-i) % m)]
a = a.reshape(-1, m)
# order the intervals and print their respective indices in the initial array
i_list = np.argsort(a.sum(axis=-1))
print(a[i_list])
print(i%m + i_list*m)
# [[0 0 0]
# [1 2 3]
# [4 5 6]
# [7 9 1]]
# [ 8 2 5 11]

extract integer from an array in a certain interval

I would like to extract numbers from an array at a fixed interval. The array has shape (1,20), and I want to print every 4th number from it, from position 0 to 20. I suspect my code is not printing the right numbers. I am trying to extract the column values from stimnumber, which has shape (1,20).
If
stimnumber = [[1,1,1,1,4,4,4,4,8,8,8,8,9,9,9,9,0,0,0,0]]
I want to print the numbers 1, 4, 8, 9 and 0.
j = 0
for j in range(stimnumber.shape[1]):
    while j < 5:
        stimnum = stimnumber[:,j::20]
        print(stimnum[:,j])
        j += 20
Just iterate with a step of 4
stimnumber =[[1,1,1,1,4,4,4,4,8,8,8,8,9,9,9,9,0,0,0,0]]
for i in range(0,len(stimnumber[0]),4):
    print(stimnumber[0][i])
Or, as pointed out by wjandrea, if you are familiar with array slicing (https://www.geeksforgeeks.org/python-list-comprehension-and-slicing/) you can try:
stimnumber =[[1,1,1,1,4,4,4,4,8,8,8,8,9,9,9,9,0,0,0,0]]
for i in stimnumber[0][::4]: print(i)
Basically, the slice syntax is [start:stop:step]; [::4] takes every 4th element from the start to the end of the list.
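Since the question mentions stimnumber.shape, the data is probably a NumPy array rather than a nested list. Assuming that, the same step-of-4 slice can be applied along the second axis directly; this is a sketch with the array rebuilt from the values in the question.

```python
import numpy as np

# Assumed to match the question: a 2-D array with a single row of 20 values.
stimnumber = np.array([[1, 1, 1, 1, 4, 4, 4, 4, 8, 8, 8, 8, 9, 9, 9, 9, 0, 0, 0, 0]])

# Row 0, every 4th column starting at column 0.
every_fourth = stimnumber[0, ::4]
print(every_fourth)  # [1 4 8 9 0]
```

Using stimnumber[:, ::4] instead keeps the result two-dimensional, with shape (1, 5).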

How can I fix my code to solve for this puzzle? (Python)

This is a diagram of the puzzle I'm solving for:
Where A,B,C,D are four numbers that are either 0 or 1, and alphas are some random constants (could be positive or negative, but ignore the case when they're 0). The goal is to determine the list [A,B,C,D] given a random array of alpha. Here're some rules:
1. If alpha > 0, then the neighboring points have different values. (Ex: if alpha_A = 4.5, then A,B = 01 or 10.)
2. If alpha < 0, then the neighboring points have the same value. (Ex: if alpha_B = -4.5, then B,C = 00 or 11.)
3. Fix the two adjacent points for the alpha_i with the highest absolute value, then iterate from the 2nd highest absolute value down to the 2nd lowest. Rules 1 and 2 may be violated only for the alpha_i with the minimum absolute value.
Here's the current code I have:
N_object4 = 4
alpha = np.random.normal(0, 10, N_object4)
pa = np.abs(alpha)
num = pa.argsort()[-3:][::-1] # Index of descending order for abs
print(alpha)
print(num)
ABCD = np.zeros(N_object4).tolist()
if alpha[num[0]] > 0:
    ABCD[num[0]+1] = 1 # The largest assignment stays constant.
for i in range (1,len(num)): # Iterating from the 2nd largest abs(alpha) to the 2nd smallest.
    if i == 3:
        num[i+1] = num[0]
    elif alpha[num[i]] > 0:
        if np.abs(num[i]-num[0]) == 1 or 2:
            ABCD[num[i+1]] = 1-ABCD[num[i]]
        elif np.abs(num[i]-num[0]) == 3:
            ABCD[num[i]] = 1-ABCD[num[0]]
    elif alpha[num[i]] < 0:
        if np.abs(num[i]-num[0]) == 1 or 2:
            **ABCD[num[i+1]] = ABCD[num[i]]**
        elif np.abs(num[i]-num[0]) == 3:
            ABCD[num[i]] = ABCD[num[0]]
I think I'm still missing something in my code. There's an error in my current version (marked in the line with **):
IndexError: index 3 is out of bounds for axis 0 with size 3
I'm wondering where's my mistake?
Also, an example of the desired output: if
alpha = [12.74921599, -8.01870123, 11.07638142, -3.51723019]
then
num = [0, 2, 1]
The correct list of ABCD should be:
ABCD = [0, 1, 1, 0] (or [1, 0, 0, 1])
My previous version could give me an output, but the list ABCD doesn't look right.
Thanks a lot for reading my question, and I really appreciate your help:)
Looking at your code I found a few smells:
num is only 3 elements long; it seems that you want num = pa.argsort()[-4:][::-1]
You also have indexing issues in other places:
if i == 3:
    num[i+1] = num[0]
index 4 does not exist in the array
My personal advice: given the small size of the problem, get rid of numpy and just use python lists.
The other observation (that took me years to learn) is that for small problems, bruteforce is ok.
This is my bruteforce solution. With more time, it could be improved to use backtracking or dynamic programming:
import random
# To make things simpler, I will consider A = 0, and so on
pairs = [
    (0, 1),
    (1, 2),
    (2, 3),
    (3, 0),
]
# Ensure no zeros on our list
alphas = [(random.choice([1, -1]) * random.uniform(1, 10), pairs[n]) for n in range(4)]
print(alphas)
# Sort in descending order of absolute value
alphas.sort(reverse=True, key=lambda n: abs(n[0]))
print(alphas[:-1])
def potential_solutions():
    """Generator of potential solutions"""
    for n in range(16):
        # This just abuses that in binary, numbers from 0 to 15 cover all possible configurations
        # I use shift and masking to get each bit and return a list
        yield [n >> 3 & 1 , n >> 2 & 1, n >> 1 & 1, n & 1]
def check(solution):
    """Takes a candidate solution and checks if it is ok"""
    # The last entry (smallest absolute alpha) is skipped, since its rule may be violated
    for alpha, (left, right) in alphas[:-1]:
        if alpha < 0 and solution[left] != solution[right]:
            return False
        elif alpha > 0 and solution[left] == solution[right]:
            return False
    return True
# Look for correct solutions
for solution in potential_solutions():
    if check(solution):
        print(solution)
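To check the brute-force idea against the example from the question, here is a deterministic variant that takes the alpha list as an argument instead of drawing random values. It is a sketch that assumes, like the pairs list above, that alpha number i links point i to point (i + 1) % 4 around a ring; the function name solve is my own.

```python
from itertools import product

def solve(alpha):
    """Return every 0/1 assignment that satisfies all rules except the weakest one."""
    n = len(alpha)
    # Indices sorted by descending |alpha|; the last one may be violated, so drop it.
    order = sorted(range(n), key=lambda i: abs(alpha[i]), reverse=True)
    enforced = order[:-1]
    solutions = []
    for candidate in product((0, 1), repeat=n):
        # alpha > 0 requires different neighbours, alpha < 0 requires equal neighbours.
        if all((candidate[i] != candidate[(i + 1) % n]) == (alpha[i] > 0) for i in enforced):
            solutions.append(list(candidate))
    return solutions

alpha = [12.74921599, -8.01870123, 11.07638142, -3.51723019]
print(solve(alpha))  # [[0, 1, 1, 0], [1, 0, 0, 1]]
```

Both results match the ABCD lists the question gives as the desired output.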

MaxDoubleSliceSum Algorithm

I'm trying to solve the problem of finding the MaxDoubleSliceSum value. Simply, it's the maximum sum of any slice minus one element within this slice (you have to drop one element, and the first and the last element are excluded also). So, technically the first and the last element of the array cannot be included in any slice sum.
Here's the full description:
A non-empty zero-indexed array A consisting of N integers is given.
A triplet (X, Y, Z), such that 0 ≤ X < Y < Z < N, is called a double slice.
The sum of double slice (X, Y, Z) is the total of A[X + 1] + A[X + 2] + ... + A[Y − 1] + A[Y + 1] + A[Y + 2] + ... + A[Z − 1].
For example, array A such that:
A[0] = 3
A[1] = 2
A[2] = 6
A[3] = -1
A[4] = 4
A[5] = 5
A[6] = -1
A[7] = 2
contains the following example double slices:
double slice (0, 3, 6), sum is 2 + 6 + 4 + 5 = 17,
double slice (0, 3, 7), sum is 2 + 6 + 4 + 5 − 1 = 16,
double slice (3, 4, 5), sum is 0.
The goal is to find the maximal sum of any double slice.
Write a function:
def solution(A)
that, given a non-empty zero-indexed array A consisting of N integers, returns the maximal sum of any double slice.
For example, given:
A[0] = 3
A[1] = 2
A[2] = 6
A[3] = -1
A[4] = 4
A[5] = 5
A[6] = -1
A[7] = 2
the function should return 17, because no double slice of array A has a sum of greater than 17.
Assume that:
N is an integer within the range [3..100,000];
each element of array A is an integer within the range [−10,000..10,000].
Complexity:
expected worst-case time complexity is O(N);
expected worst-case space complexity is O(N), beyond input storage (not counting the storage required for input arguments).
Elements of input arrays can be modified.
Here's my try:
def solution(A):
    if len(A) <= 3:
        return 0
    max_slice = 0
    minimum = A[1] # assume the first element is the minimum
    max_end = -A[1] # and drop it from the slice
    for i in xrange(1, len(A)-1):
        if A[i] < minimum: # a new minimum found
            max_end += minimum # put back the false minimum
            minimum = A[i] # assign the new minimum to minimum
            max_end -= minimum # drop the new minimum out of the slice
        max_end = max(0, max_end + A[i])
        max_slice = max(max_slice, max_end)
    return max_slice
What makes me think that this may be close to the correct solution, with only some corner cases not covered, is that 9 out of 14 test cases pass correctly (https://codility.com/demo/results/demoAW7WPN-PCV/)
I know that this can be solved by applying Kadane’s algorithm forward and backward, but I'd really appreciate it if someone can point out what's missing here.
Python solution O(N)
This should be solved using Kadane’s algorithm from two directions.
ref:
Python Codility Solution
C++ solution - YouTube tutorial
JAVA solution
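def compute_sum(start, end, step, A):
    # Running maximum slice sums (clamped at 0) in the given direction
    res_arr = [0]
    res = 0
    for i in range(start, end, step):
        res = res + A[i]
        if res < 0:
            res_arr.append(0)
            res = 0
            continue
        res_arr.append(res)
    return res_arr
def solution(A):
    if len(A) < 3:
        return 0
    arr = []
    # left_arr[i]: best sum of a slice ending at index i (left to right)
    left_arr = compute_sum(1, len(A)-1, 1, A)
    # right_arr[k]: best sum of a slice starting at index len(A)-1-k (right to left)
    right_arr = compute_sum(len(A)-2, 0, -1, A)
    k = 0
    # Pair the best slice ending at Y-1 with the best slice starting at Y+1
    for i in range(len(left_arr)-2, -1, -1):
        arr.append(left_arr[i] + right_arr[k])
        k = k + 1
    return max(arr)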
This is just how I'd write the algorithm.
1. Assume a start index of X=0, then iteratively sum the elements to the right.
2. Keep track of the index of the lowest int as you count, and subtract the lowest int from the sum when you use it. This effectively lets you place your Y.
3. Keep track of the max sum, and the X, Y, Z values for that sum.
4. If the sum ever turns negative, save the max sum as your result, so long as it is greater than the previous result.
5. Choose a new X. You should start looking after Y and subtract one from whatever index you find. Repeat the previous steps until you have reached the end of the list.
How might this be an improvement?
Potential problem case for your code: [7, 2, 4, -18, -14, 20, 22]
-18 and -14 separate the array into two segments. The sum of the first segment is 7+2+4=13, the sum of the second segment is just 20. The above algorithm handles this case, yours might but I'm bad at python (sorry).
EDIT (error and solution): It appears my original answer brings nothing new to what I thought was the problem, but I checked the errors and found that the actual error occurs here: [-20, -10, 10, -70, 20, 30, -30] will not be handled correctly. It will exclude the positive 10, so it returns 50 instead of 60.
It appears the asker's code doesn't correctly identify the new starting position (my method for this is shown in case 4). It's important that you restart the iterations at Y instead of Z, because Y effectively deletes the lowest number, which is possibly the Z that fails the test.
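To hunt for failing cases like the one above, a slow reference implementation that follows the problem definition literally can be compared against any faster attempt. This is a sketch for testing on small inputs only, since it is far slower than the required O(N); the function name double_slice_brute is my own.

```python
def double_slice_brute(A):
    """Try every triplet 0 <= X < Y < Z < N and return the largest double-slice sum."""
    n = len(A)
    best = 0  # a triplet (X, X+1, X+2) always gives 0, so 0 is a safe starting value
    for x in range(n - 2):
        for y in range(x + 1, n - 1):
            for z in range(y + 1, n):
                # A[X+1..Y-1] plus A[Y+1..Z-1], exactly as in the problem statement
                total = sum(A[x + 1:y]) + sum(A[y + 1:z])
                best = max(best, total)
    return best

print(double_slice_brute([3, 2, 6, -1, 4, 5, -1, 2]))        # 17, as in the problem statement
print(double_slice_brute([-20, -10, 10, -70, 20, 30, -30]))  # 60
```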
