I have a Python list with some data. How can I detect whether the data in the list goes downward and then upward?
For example:
a1 = [8,6,4,1,-2,-6,5,8,9,87]
a2 = [8,6,4,1,-2,-6,5,-8,9,10]
where the data in a1 goes downward then upward, so I expect it to print 'OK',
but the data in a2 goes downward, then upward, then downward, then upward again, so I expect it to print 'NG'.
How can I do this in Python?
You can compute the signs of the successive differences, then keep only those where the sign changes. If this yields [-1, 1], the result is 'OK', otherwise 'NG':
import numpy as np

def check(lst):
    # signs of the successive differences
    s = np.sign(np.diff(lst))
    # mask to deduplicate successive values
    m1 = np.r_[True, s[:-1] != s[1:]]
    # mask to remove flat segments (zero differences)
    m2 = s != 0
    # 'OK' if the deduplicated, non-flat signs are exactly [-1, 1], else 'NG'
    return 'OK' if s[m1 & m2].tolist() == [-1, 1] else 'NG'
check(a1)
# 'OK'
check(a2)
# 'NG'
check([2,1,1,2])
# 'OK'
Intermediates:
### first example
lst
# [8, 6, 4, 1, -2, -6, 5, 8, 9, 87]
s = np.sign(np.diff(lst))
# array([-1, -1, -1, -1, -1, 1, 1, 1, 1])
s[np.r_[True, s[:-1]!=s[1:]]].tolist()
# [-1, 1]
### second example
lst
# [8, 6, 4, 1, -2, -6, 5, -8, 9, 10]
s = np.sign(np.diff(lst))
# array([-1, -1, -1, -1, -1, 1, -1, 1, 1])
s[np.r_[True, s[:-1]!=s[1:]]].tolist()
# [-1, 1, -1, 1]
You can iterate through the list three items at a time and count the types of peaks that you see, either downward or upward. At the end, return True if there is exactly one downward peak and no upward peaks, False otherwise.
def one_downward_peak(data):
    upward_peaks = 0
    downward_peaks = 0
    for i in range(len(data) - 2):
        a, b, c = data[i:i+3]
        if a > b < c:
            downward_peaks += 1
        elif a < b > c:
            upward_peaks += 1
    return upward_peaks == 0 and downward_peaks == 1
>>> one_downward_peak([8,6,4,1,-2,-6,5,8,9,87])
True
>>> one_downward_peak([8,6,4,1,-2,-6,5,-8,9,10])
False
If you want to optimize this, you could exit early as soon as you see an upward peak.
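For example, a minimal early-exit variant might look like this (a sketch; one_downward_peak_fast is a hypothetical name, not from the original answer):
def one_downward_peak_fast(data):
    downward_peaks = 0
    for i in range(len(data) - 2):
        a, b, c = data[i:i+3]
        if a < b > c:
            # any upward peak rules out a single downward peak, so stop here
            return False
        if a > b < c:
            downward_peaks += 1
    return downward_peaks == 1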
I have four given variables:
group size
total of groups
partial sum
1-D tensor
and I want to add zeros when the sum within a group reaches the partial sum. For example:
groupsize = 4
totalgroups = 3
partialsum = 15
d1tensor = torch.tensor([ 3, 12, 5, 5, 5, 4, 11])
The expected result is:
[ 3, 12, 0, 0, 5, 5, 5, 0, 4, 11, 0, 0]
I have no clue how I can achieve that in pure PyTorch. In Python it would be something like this:
target = [0] * (groupsize * totalgroups)
cursor = 0
d1tensor = [3, 12, 5, 5, 5, 4, 11]
for idx, ele in enumerate(target):
    subgroup_start = (idx // groupsize) * groupsize
    subgroup_end = subgroup_start + groupsize
    # fill the current slot only while its group's sum is below the partial sum
    if sum(target[subgroup_start:subgroup_end]) < partialsum:
        target[idx] = d1tensor[cursor]
        cursor += 1
Can anyone help me with that? I have already googled it but couldn't find anything.
Some logic, NumPy and list comprehensions are sufficient here.
I will break it down step by step; you can make it slimmer and prettier afterwards:
import numpy as np
my_val = 15
block_size = 4
total_groups = 3
d1 = [3, 12, 5, 5, 5, 4, 11]
d2 = np.cumsum(d1)
d3 = d2 % my_val == 0  # find where the running sum is 15 or a multiple of it
split_points = [i+1 for i, x in enumerate(d3) if x]  # indices where the cumsum hits my_val

#### Option 1
split_array = np.split(d1, split_points, axis=0)
padded_arrays = [np.pad(array, (0, block_size - len(array)), mode='constant') for array in split_array]  # pad each group with zeros
padded_d1 = np.concatenate(padded_arrays[:total_groups])  # put them together, discard the extra empty group if present

#### Option 2
split_points = [el for el in split_points if el < len(d1)]  # make sure we are not splitting on the last element of d1
split_array = np.split(d1, split_points, axis=0)
padded_arrays = [np.pad(array, (0, block_size - len(array)), mode='constant') for array in split_array]  # pad each group with zeros
padded_d1 = np.concatenate(padded_arrays)
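Either option reproduces the expected result from the question; a quick check, plus a conversion back to a tensor with torch.as_tensor (a standard PyTorch call), since the question starts from one:
import torch
print(padded_d1.tolist())
# [3, 12, 0, 0, 5, 5, 5, 0, 4, 11, 0, 0]
result = torch.as_tensor(padded_d1)  # back to a 1-D tensor if needed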
I am trying to understand a few slides from this source
Specifically, this example at slide 59:
The part I do not understand is how to go from the chain-code to the curvature.
I believe the formula is given in slide 56:
But if I try to implement this in Python I get different results.
For example:
import matplotlib.pyplot as plt
# Dataset
x = [0, 1, 2, 2, 3, 4, 5, 6, 6, 7, 8]
y = [0, 0, 0, 1, 1, 2, 2, 1, 0, 0, 0]
# Show data
plt.scatter(x, y)
plt.plot(x, y)
plt.axis('equal')
plt.show()
import math
i = 4 # Taking the 5th point, at index 4, with supposed curvature of 1 from the slide
k = 1
a = math.atan((y[i+k]-y[i])/(x[i+k]-x[i]))
b = math.atan((y[i]-y[i-k])/(x[i]-x[i-k]))
result = (a - b) % (2 * math.pi) # = 0.7853981633974483
So clearly I am missing something, but what?
The "curvature" in the first image is the difference between two subsequent "chain-codes" modulo 8. So for example for chain codes 0 0 2 0 1 0 7 6 0 0 the 4th entry in curvature is 1-0 = 1 while the sixth is 7-0 = 7 = -1 (mod 8). In Python you can calculate it like this:
>>> def mod8(x):
... m = x % 8
... return m if m < 4 else m - 8
...
>>> cc = [0, 0, 2, 0, 1, 0, 7, 6, 0, 0]
>>> [mod8(a - b) for (a, b) in zip(cc[1:], cc[:-1])]
[0, 2, -2, 1, -1, -1, -1, 2, 0]
If you compare this with the formula that uses atan, what the formula is missing is the conversion of the angles from radians to the units where 1 is 45 degrees (pi/4). Your result 0.7853981633974483 is correct according to the formula, but if you expected to get 1.0 you would have to divide the result by math.pi/4.
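For instance, reusing the a and b computed in the question, the conversion looks like this:
import math
result = (a - b) % (2 * math.pi)  # 0.7853981633974483 rad, i.e. 45 degrees
print(result / (math.pi / 4))     # 1.0, matching the chain-code curvature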
I have two numpy arrays of equal size. They contain the values 1, 0, and -1. I can count the number of matching ones and negative ones, but I'm not sure how to count the matching elements that have the same index and value of zero.
I'm a little confused on how to proceed here.
Here is some code:
print(actual_direction.shape)
print(predicted_direction.shape)
act = actual_direction
pre = predicted_direction
part1 = act[pre == 1]
part2 = part1[part1 == 1]
result1 = part2.sum()
part3 = act[pre == -1]
part4 = part3[part3 == -1]
result2 = part4.sum() * -1
non_zeros = result1 + result2
zeros = len(act) - non_zeros
print(f'zeros : {zeros}\n')
print(f'non_zeros : {non_zeros}\n')
final_result = non_zeros + zeros
print(f'result1 : {result1}\n')
print(f'result2 : {result2}\n')
print(f'final_result : {final_result}\n')
Here is the printout:
(11279,)
(11279,)
zeros : 5745.0
non_zeros : 5534.0
result1 : 2217.0
result2 : 3317.0
final_result : 11279.0
So what I've done here is simply subtract the sum of the ones and negative ones from the total length of the array. I can't assume that the difference (zeros: 5745) contains ALL matching elements that are zero, can I?
You could try this:
import numpy as np

a = np.array([1, 0, 0, 1, -1, -1, 0, 0])
b = np.array([1, 0, 0, 1, -1, -1, 0, 1])
summ = np.sum((a == 0) & (b == 0))
print(summ)
Output:
3
You can use numpy.ravel() to flatten out the array, then use zip() to compare each element side by side:
import numpy as np
ar1 = np.array([[1, 0, 0],
                [0, 1, 1],
                [0, 1, 0]])
ar2 = np.array([[0, 0, 0],
                [1, 0, 1],
                [0, 1, 0]])
count = 0
for e1, e2 in zip(ar1.ravel(), ar2.ravel()):
    if e1 == e2:
        count += 1
print(count)
Output:
6
You can also do this to list all the matches found, as well as print out the amount:
dup = [e1 for e1, e2 in zip(ar1.ravel(), ar2.ravel()) if e1 == e2]
print(dup)
print(len(dup))
Output:
[0, 0, 1, 0, 1, 0]
6
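Since the question is specifically about matching zeros, the same zip pattern can be restricted to them; a small sketch under that assumption:
zero_matches = [e1 for e1, e2 in zip(ar1.ravel(), ar2.ravel()) if e1 == e2 == 0]
print(len(zero_matches))
# 4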
You have two arrays and want to count the positions where both of these are 0, right?
You can check where each array meets your required condition (a == 0), and then use the element-wise 'and' operator & to find where both arrays meet it:
import numpy as np
a = np.array([1, 0, -1, 0, -1, 1, 1, 1, 1])
b = np.array([1, 0, -1, 1, 0, -1, 1, 0, 1])
both_zero = (a == 0) & (b == 0)  # [False, True, False, False, False, False, False, False, False]
both_zero.sum() # 1
In your updated question you appear to be interested in the similarities and differences between actual values and predictions. For this, a confusion matrix is ideally suited.
from sklearn.metrics import confusion_matrix
confusion_matrix(a, b, labels=[-1, 0, 1])
will give you a confusion matrix as output telling you how many -1s were predicted as -1, 0 and 1, and the same for 0 and +1:
[[1 1 0] # -1s predicted as -1, 0 and 1
[0 1 1] # 0s predicted as -1, 0 and 1
[1 1 3]] # 1s predicted as -1, 0 and 1
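If you only want the number of matching zeros, that is the central entry of this matrix (the row and column for label 0); a short sketch assuming the arrays above:
cm = confusion_matrix(a, b, labels=[-1, 0, 1])
matching_zeros = cm[1, 1]  # actual 0s predicted as 0
print(matching_zeros)      # 1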
I have a NumPy array of integers:
x = np.array([1, 0, 2, 1, 4, 1, 4, 1, 0, 1, 4, 3, 0, 1, 0, 2, 1, 4, 3, 1, 4, 1, 0])
and another array of indices that references the array above:
indices = np.array([22, 12, 8, 1, 14, 21, 7, 0, 13, 19, 5, 3, 9, 16, 2, 15, 11, 18, 20, 6, 4, 10, 17])
For every pair of neighboring indices, we need to count how many consecutive values in x are overlapping starting at each of the two neighboring indices. For example, for indices[2] and indices[3], we have index 8 and 1, respectively, and they both reference positions in x. Then, starting at x[8] and x[1], we count how many consecutive values are the same or are overlapping but we stop checking the overlap under specific conditions (see below). In other words, we check if:
x[8] == x[1]
x[9] == x[2]  # increment each index by one
...           # continue incrementing each index, except under the following stop conditions:
stop if i >= x.shape[0]
stop if j >= x.shape[0]
stop if x[i] == 0
stop if x[j] == 0
stop if x[i] != x[j]
In reality, we do this for all neighboring index pairs:
out = np.zeros(indices.shape[0], dtype=int)
for idx in range(indices.shape[0]-1):
    i = indices[idx]
    j = indices[idx + 1]
    k = 0
    # while i+k < x.shape[0] and j+k < x.shape[0] and x[i+k] != 0 and x[j+k] != 0 and x[i+k] == x[j+k]:
    while i+k < x.shape[0] and j+k < x.shape[0] and x[i+k] == x[j+k]:
        k += 1
    out[idx] = k
And the output is:
# [0, 0, 0, 0, 0, 1, 1, 1, 1, 3, 3, 2, 3, 0, 3, 0, 1, 0, 2, 2, 1, 2, 0] # This is the old output if x[i] == 0 and x[j] == 0 are included
[1 2 1 4 0 2 2 5 1 4 3 2 3 0 3 0 1 0 3 2 1 2 0]
I'm looking for a vectorized way to do this in NumPy.
This should do the trick (I am ignoring the two conditions x[i] == 0 and x[j] == 0):
out = np.zeros(indices.shape[0], dtype=int)
for idx in range(indices.shape[0]-1):
    i = indices[idx]
    j = indices[idx + 1]
    l = len(x) - max(i, j)
    x1 = x[i:i+l]
    x2 = x[j:j+l]
    # Add False at the end to handle the case in which the slices are exactly equal
    x0 = np.append(x1 == x2, False)
    out[idx] = np.argmin(x0)
Notice that with np.argmin I am exploiting the following two facts:
False < True
np.argmin only returns the first instance of the min in the array
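A tiny illustration of why this gives the length of the leading run of matches (the appended False guarantees there is a minimum to find):
import numpy as np
m = np.array([True, True, False, True])
print(np.argmin(m))  # 2: the index of the first False, i.e. the length of the leading True run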
Performance Analysis
Regarding time performance, I tested with N=10**5 and N=10**6, and as suggested in the comments, this cannot compete with numba jit.
def f(x, indices):
    out = np.zeros(indices.shape[0], dtype=int)
    for idx in range(indices.shape[0]-1):
        i = indices[idx]
        j = indices[idx + 1]
        l = len(x) - max(i, j)
        x1 = x[i:i+l]
        x2 = x[j:j+l]
        x0 = np.append(x1 == x2, False)
        out[idx] = np.argmin(x0)
    return out
N=100_000
x = np.random.randint(0,10, N)
indices = np.arange(0, N)
np.random.shuffle(indices)
%timeit f(x, indices)
3.67 s ± 122 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
N=1_000_000
x = np.random.randint(0,10, N)
indices = np.arange(0, N)
np.random.shuffle(indices)
%time f(x, indices)
Wall time: 8min 20s
(I did not have the patience to let %timeit finish)
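For reference, a minimal numba version of the question's loop might look like this (a sketch assuming numba is installed; nb.njit is numba's standard compilation decorator):
import numba as nb
import numpy as np

@nb.njit
def f_numba(x, indices):
    out = np.zeros(indices.shape[0], dtype=np.int64)
    for idx in range(indices.shape[0] - 1):
        i = indices[idx]
        j = indices[idx + 1]
        k = 0
        # same stop conditions as the question's loop (zero checks omitted)
        while i + k < x.shape[0] and j + k < x.shape[0] and x[i + k] == x[j + k]:
            k += 1
        out[idx] = k
    return out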
I have an array y composed of 0s and 1s, but with different frequencies.
For example:
y = np.array([0, 0, 1, 1, 1, 1, 0])
And I have an array x of the same length.
x = np.array([0, 1, 2, 3, 4, 5, 6])
The idea is to filter out elements until there are the same number of 0s and 1s.
A valid solution would be to remove index 5:
x = np.array([0, 1, 2, 3, 4, 6])
y = np.array([0, 0, 1, 1, 1, 0])
A naive method I can think of is to get the difference between the value frequencies of y (in this case 4-3=1), create a mask for y == 1, and switch random elements from True to False until the difference is 0. Then create a mask for y == 0, OR the two masks together, and apply the result to both x and y.
This doesn't really seem the best "python/numpy way" of doing it though.
Any suggestions? Something like randomly selecting n elements from the majority value, where n is the count of the minority value.
If this is easier with pandas then that would work for me too.
Naive algorithm, assuming there are more 1s than 0s:
mask_pos = y == 1
mask_neg = y == 0
pos = len(y[mask_pos])
neg = len(y[mask_neg])
diff = pos - neg
while diff > 0:
    rand = np.random.randint(0, len(y))
    if mask_pos[rand]:
        mask_pos[rand] = False
        diff -= 1
mask_final = mask_pos | mask_neg
y_new = y[mask_final]
x_new = x[mask_final]
This naive algorithm is really slow, though.
One way to do that with NumPy is this:
import numpy as np

# Makes a mask to balance ones and zeros
def balance_binary_mask(binary_array):
    # Work on a flat boolean copy so that ~ actually flips 0s and 1s
    binary_array = np.asarray(binary_array).astype(bool).ravel()
    # Count number of ones
    z = np.count_nonzero(binary_array)
    # If there are fewer ones than zeros
    if z <= len(binary_array) // 2:
        # Invert the array so the ones become the majority value
        binary_array = ~binary_array
    # Find the (majority) ones
    idx = np.nonzero(binary_array)[0]
    # Number of elements to remove
    rem = 2 * len(idx) - len(binary_array)
    # Pick random indices to remove
    rem_idx = np.random.choice(idx, size=rem, replace=False)
    # Make mask
    mask = np.ones_like(binary_array, dtype=bool)
    # Mask out the elements to remove
    mask[rem_idx] = False
    return mask
# Test
np.random.seed(0)
y = np.array([0, 0, 1, 1, 1, 1, 0])
x = np.array([0, 1, 2, 3, 4, 5, 6])
m = balance_binary_mask(y)
print(m)
# [ True True True True False True True]
y = y[m]
x = x[m]
print(y)
# [0 0 1 1 1 0]
print(x)
# [0 1 2 3 5 6]
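Since the question mentions pandas as an acceptable option, here is a shorter sketch using DataFrame.groupby(...).sample (available in pandas 1.1+), with the same x and y as above:
import numpy as np
import pandas as pd

y = np.array([0, 0, 1, 1, 1, 1, 0])
x = np.array([0, 1, 2, 3, 4, 5, 6])
df = pd.DataFrame({'x': x, 'y': y})
n = df['y'].value_counts().min()                     # size of the minority class
balanced = df.groupby('y').sample(n=n).sort_index()  # n random rows per class, original order kept
x_new = balanced['x'].to_numpy()
y_new = balanced['y'].to_numpy()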