I have a variable containing zeros and ones. Each sequence of ones represents a "phase" I want to observe; each sequence of zeros represents the gap between two phases.
A phase may carry a sort of "impulse response", for example the echo of a voice. In that case the output looks like 1,1,1,1,0,0,1,1,1,0,0,0: the first sequence of ones is the shout we made, while the second is just the echo caused by the shout.
So I wrote a function that ignores the echoes/responses of the main shout/action by converting the ones of the echo/response into zeros.
(1) If the sequence of zeros is greater than or equal to the input threshold nearby_thr, the function treats the following sequence of ones as an independent phase and changes nothing.
(2) If the sequence of zeros (between two sequences of ones) is shorter than the input threshold nearby_thr, the function treats the following sequence of ones as "an impulse response/echo" and converts those ones into zeros.
I wrote a naive function that accomplishes this, but I was wondering whether pandas already has a function like that, or whether it can be done in a few lines without writing a "C-like" function.
Here's my code:
import pandas as pd
import matplotlib.pyplot as plt
# import utili_funzioni.util00 as ut0
x1 = pd.DataFrame([0,0,0,0,0,0,0,1,1,1,1,1,0,0,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,0,0,1,1,1])
x2 = pd.DataFrame([0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,1,1,0])
# rule = x1==1 ## counting number of consecutive ones
# cumsum_ones = rule.cumsum() - rule.cumsum().where(~rule).ffill().fillna(0).astype(int)
def detect_nearby_el_2(df, nearby_thr):
    global el2del
    # df = consecut_zeros
    # i = 0
    print("")
    print("")
    j = 0
    enterOnce_if = 1
    reset_count_0s = 0
    start2detect = False
    count0s = 0 # init
    start2_getidxs = False # if this is not true, it won't store idxs to delete
    el2del = [] # store idxs to delete elements
    for i in range(df.shape[0]):
        print("")
        print("i: ", i)
        x_i = df.iloc[i, 0]
        if x_i == 1 and j==0: # first phase (ones) has been detected
            start2detect = True # first phase (ones) has been detected
            # j += 1
        print("count0s:",count0s)
        if start2detect == True: # first phase, seen/detected, --> (wait) has ended..
            if x_i == 0: # 1st phase detected and ended with "a zero"
                if reset_count_0s == 1:
                    count0s = 0
                    reset_count_0s = 0
                count0s += 1
                if enterOnce_if == 1:
                    start2_getidxs=True # avoiding to delete first phase
                    enterOnce_0 = 0
            if start2_getidxs==True: # avoiding to delete first phase
                if x_i == 1 and count0s < nearby_thr:
                    print("this is NOT a new phase!")
                    el2del = [*el2del, i] # idxs to delete
                    reset_count_0s = 1 # reset counter
                if x_i == 1 and count0s >= nearby_thr:
                    print("this is a new phase!") # nothing to delete
                    reset_count_0s = 1 # reset counter
    return el2del
def convert_nearby_el_into_zeros(df,idx):
    df0 = df + 0 # error original dataframe is modified!
    if len(idx) > 0:
        # df.drop(df.index[idx]) # to delete completely
        df0.iloc[idx] = 0
    else:
        print("no elements nearby to delete!!")
    return df0
######
print("")
x1_2del = detect_nearby_el_2(df=x1,nearby_thr=3)
x2_2del = detect_nearby_el_2(df=x2,nearby_thr=3)
## deleting nearby elements
x1_a = convert_nearby_el_into_zeros(df=x1,idx=x1_2del)
x2_a = convert_nearby_el_into_zeros(df=x2,idx=x2_2del)
## PLOTTING
# ut0.grayplt()
fig1 = plt.figure()
fig1.suptitle("x1",fontsize=20)
ax1 = fig1.add_subplot(1,2,1)
ax2 = fig1.add_subplot(1,2,2,sharey=ax1)
ax1.title.set_text("PRE-detect")
ax2.title.set_text("POST-detect")
line1, = ax1.plot(x1)
line2, = ax2.plot(x1_a)
fig2 = plt.figure()
fig2.suptitle("x2",fontsize=20)
ax1 = fig2.add_subplot(1,2,1)
ax2 = fig2.add_subplot(1,2,2,sharey=ax1)
ax1.title.set_text("PRE-detect")
ax2.title.set_text("POST-detect")
line1, = ax1.plot(x2)
line2, = ax2.plot(x2_a)
You can see that x1 has two "responses/echoes" that I want to ignore, while x2 has none; in fact, nothing changed in x2.
My question is: how can this be accomplished in a few lines using pandas?
Thank you.
Interesting problem, and I'm sure there's a more elegant solution out there, but here is my attempt - it's at least fairly performant:
x1 = pd.Series([0,0,0,0,0,0,0,1,1,1,1,1,0,0,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,0,0,1,1,1])
x2 = pd.Series([0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,1,1,0])
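def remove_echos(series, threshold):
    series = series.copy()  # work on a copy so the caller's Series is not modified
    starting_points = (series==1) & (series.shift()==0)
    # parentheses are needed here: & binds more tightly than ==
    echo_starting_points = starting_points & (series.shift(threshold)==1)
    echo_starting_points = series[echo_starting_points].index
    # sentinel one past the last label (assumes the default integer index)
    change_points = series[starting_points].index.to_list() + [series.index[-1] + 1]
    for (start, end) in zip(change_points, change_points[1:]):
        if start in echo_starting_points:
            # .loc slicing includes its end label, so stop just before the next starting point
            series.loc[start:end - 1] = 0
    return series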
x1 = remove_echos(x1, 3)
x2 = remove_echos(x2, 3)
(I changed x1 and x2 to be Series instead of DataFrame, it's easy to adapt this code to work with a df if you need to.)
Explanation: we define the "starting point" of each section as a 1 preceded by a 0. Among those, a starting point is an "echo" starting point if the value threshold places before it is a 1. (The assumption is that no phase is shorter than threshold.) For each echo starting point, we zero everything from it up to the next starting point or the end of the Series.
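As a possible alternative, here is a minimal sketch that works on whole runs instead of single positions, so it does not rely on phases being at least threshold long. It assumes the rule from the question: a run of ones counts as an echo when the run of zeros directly before it is shorter than nearby_thr and an earlier phase exists. The function name remove_echoes_by_runs and the intermediate names are my own, not pandas API.

```python
import pandas as pd

def remove_echoes_by_runs(s, nearby_thr):
    """Zero every run of ones that follows a too-short gap of zeros."""
    # Label consecutive runs of equal values: 1, 2, 3, ...
    run_id = (s != s.shift()).cumsum()
    grouped = s.groupby(run_id)
    run_value = grouped.first()   # 0 or 1 for each run
    run_length = grouped.size()   # number of samples in each run

    # A run is an echo if it is made of ones, the run two steps back is
    # also made of ones (so an earlier phase exists), and the zero run in
    # between is shorter than the threshold.
    is_echo = (
        (run_value == 1)
        & (run_value.shift(2) == 1)
        & (run_length.shift() < nearby_thr)
    )

    # Broadcast the per-run decision back to every sample.
    echo_mask = run_id.map(is_echo).astype(bool)
    return s.mask(echo_mask, 0)   # returns a new Series, input untouched

x1 = pd.Series([0,0,0,0,0,0,0,1,1,1,1,1,0,0,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,0,0,1,1,1])
x2 = pd.Series([0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,1,1,0])
x1_clean = remove_echoes_by_runs(x1, 3)
x2_clean = remove_echoes_by_runs(x2, 3)
```

With nearby_thr=3 the two short-gap runs in x1 are zeroed and x2 is returned unchanged. Note that the gap is always measured from the run immediately before, so an echo that follows another echo is judged against the first echo's end.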
I'm building a trading strategy that uses support and resistance levels. One of the ways I find those is by searching for local maxima/minima (prices that are higher/lower than the previous 5 and the next 5 prices).
I have an array of smoothed closing prices, and I first tried to find them with a for loop:
def find_max_min(smoothed_prices):  # smoothed_prices = np.array([1.873,...])
    avg_delta = np.diff(smoothed_prices).mean()
    maximas = []
    minimas = []
    for index in range(len(smoothed_prices)):
        # skip points that do not have 5 neighbours on both sides
        if index < 5 or index > len(smoothed_prices) - 6:
            continue
        current_value = smoothed_prices[index]
        previous_points = smoothed_prices[index - 5:index]
        next_points = smoothed_prices[index + 1:index + 6]
        previous_are_higher = all(x > current_value for x in previous_points)
        next_are_higher = all(x > current_value for x in next_points)
        previous_are_smaller = all(x < current_value for x in previous_points)
        next_are_smaller = all(x < current_value for x in next_points)
        previous_delta_is_enough = abs(previous_points[0] - current_value) > avg_delta
        next_delta_is_enough = abs(next_points[-1] - current_value) > avg_delta
        delta_is_enough = previous_delta_is_enough and next_delta_is_enough
        if previous_are_higher and next_are_higher and delta_is_enough:
            minimas.append(current_value)
        elif previous_are_smaller and next_are_smaller and delta_is_enough:
            maximas.append(current_value)
        else:
            continue
    return maximas, minimas
(This isn't the actual code that I used because I erased it; it may not work, but it was something like that.)
This code could find the maxima and minima, but it was far too slow, and I need to call the function several times per second on huge arrays.
My question is: is it possible to do this with a NumPy mask, in a way similar to this:
smoothed_prices = s
minimas = s[all(x > s[index] for x in s[index-5:index]) and all(x > s[index] for x in s[index+1:index+6])]
maximas = ...
Or do you know another efficient NumPy way to do it?
I have thought of a way that should be faster than the for loop you presented, but it uses more memory. Simply put, it creates an intermediate matrix of windows, then takes the max and min of each window:
def find_max_min(arr, win_pad_size=5):
    windows = np.zeros((len(arr) - 2 * win_pad_size, 2 * win_pad_size + 1))
    for i in range(2 * win_pad_size + 1):
        windows[:, i] = arr[i:i+windows.shape[0]]
    return windows.max(axis=1), windows.min(axis=1)
Edit: I found a faster way to calculate the sub-sequences (I had called windows) from Split Python sequence into subsequences. It doesn't use more memory, instead, it creates a view of the array.
def subsequences(ts, window):
    shape = (ts.size - window + 1, window)
    strides = ts.strides * 2
    return np.lib.stride_tricks.as_strided(ts, shape=shape, strides=strides)
def find_max_min(arr, win_pad_size=5):
    windows = subsequences(arr, 2 * win_pad_size + 1)
    return windows.max(axis=1), windows.min(axis=1)
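Note that the functions above return the max and min of every window, not yet the positions of the local extrema. A minimal sketch of that last step follows, using np.lib.stride_tricks.sliding_window_view (available in NumPy 1.20 and later) in place of the hand-written as_strided call. The name local_extrema_indices and the strictness rule (the centre must be the only largest or smallest value in its window) are my assumptions, chosen to match the strict comparisons in the question's loop; the avg_delta filter from the question is left out.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_extrema_indices(prices, pad=5):
    """Return (maxima_idx, minima_idx) for points strictly above/below
    their `pad` neighbours on each side."""
    prices = np.asarray(prices, dtype=float)
    # One row per candidate point; each row is a read-only view, not a copy.
    windows = sliding_window_view(prices, 2 * pad + 1)
    centers = prices[pad:len(prices) - pad][:, None]

    # The centre is a strict maximum when it is the only element of its
    # window that is >= itself (and likewise <= for a strict minimum).
    is_max = (windows >= centers).sum(axis=1) == 1
    is_min = (windows <= centers).sum(axis=1) == 1

    # Shift window positions back to indices in the original array.
    return np.flatnonzero(is_max) + pad, np.flatnonzero(is_min) + pad

demo = np.array([3, 2, 1, 2, 3, 4, 3, 2, 1])
max_idx, min_idx = local_extrema_indices(demo, pad=2)
```

The comparisons allocate temporary boolean arrays of shape (n - 2*pad, 2*pad + 1), so this still trades memory for speed, and the input must hold at least 2*pad + 1 values.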
You can do it easily with:
from skimage.util import view_as_windows
a = smoothed_prices[5:-5]
a[a == view_as_windows(smoothed_prices, 11).min(-1)]
Please note that since you are looking for minima within +/- 5 of the index, each window holds 11 values, and the minima can only be at indices [5:-5] of your array.
I have two arrays filled with X and Y values. The values are pulled from text boxes that the user fills in.
These values (x1,y1), (x2,y2), ..., (xn,yn) are plotted, where n is the number of points in my arrays.
I want to run through these coordinate pairs and find the ones that overlap each other. Once I have identified an overlapping point, I can make the marker of the duplicate(s) bigger so the reader can see how often a point is repeated. Right now I just want to accomplish the former.
I'm not familiar with VBA; I work mainly in Python. Below is my example code, which works in Python.
x = [1,2,2,4,5,5,5]
y = [1,3,3,4,5,5,5]
pts = []
for i in range(len(x)):
    cX = x[i]
    cY = y[i]
    if (cX, cY) in pts:
        print("duplicate")
        print(cX, cY)
        #plot this point on scatter
        #increase marker size for this particular point
    else:
        pts.append((cX, cY))
print(pts)
Output:
duplicate
2 3
duplicate
5 5
duplicate
5 5
[(1, 1), (2, 3), (4, 4), (5, 5)]
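For reference, the counting step that the marker sizes would need can be sketched in Python with collections.Counter, which records how many times each (x, y) pair occurs. The names base_size and sizes are hypothetical and only illustrate how a count could be turned into a marker size.

```python
from collections import Counter

x = [1, 2, 2, 4, 5, 5, 5]
y = [1, 3, 3, 4, 5, 5, 5]

# Count how many times each coordinate pair appears.
counts = Counter(zip(x, y))

# Points that appear more than once, with their multiplicity.
duplicates = {pt: n for pt, n in counts.items() if n > 1}

# Hypothetical marker sizing: scale a base size by the multiplicity.
base_size = 20
sizes = {pt: base_size * n for pt, n in counts.items()}
```

Here duplicates maps (2, 3) to 2 and (5, 5) to 3, so each unique point is plotted once and its size reflects how often it repeats.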
I just threw this together quickly, but it does the job. Python does a better job of working with data than VBA (it has more data types, such as lists and tuples). There are a few ways to make this happen; I just chose to use a two-dimensional array.
Dim array_Tuple() As Variant, i As Integer, xsplit, ysplit
Dim sLength As Integer, x As String, y As String, v
Dim bool As Boolean
x = "1,2,2,4,5,5,5"
y = "1,3,3,4,5,5,5"
xsplit = Split(x, ",")
ysplit = Split(y, ",")
count = 0
On Error Resume Next
For i = 0 To UBound(xsplit)
    bool = True
    For j = 0 To count
        If xsplit(i) = array_Tuple(1, j) And ysplit(i) = array_Tuple(2, j) Then
            If Not err.Number <> 0 Then
                bool = False
            End If
            err.Clear
        End If
    Next
    If bool Then
        count = count + 1
        ReDim Preserve array_Tuple(1 To 2, 1 To count)
        array_Tuple(1, count) = xsplit(i)
        array_Tuple(2, count) = ysplit(i)
        Debug.Print array_Tuple(1, count) & "," & array_Tuple(2, count)
    End If
Next
On Error GoTo 0
I want to compare each row of a 2D NumPy array against a single x_min and x_max, and likewise against y_min and y_max, but I don't understand how to set up the loop for this comparison or how to combine numpy.where with numpy.logical_and.
import numpy as np
group_count = 0
xy = np.array([[116,2306],[118,2307],[126,1517]])
idx = np.array([[0,0],[0,1]])
group1 = []
for l in xy:
    for i in idx:
        for j in range(1):
            x_temp = xy[idx[i][j]]
            x1 = x_temp[0][0]
            y1 = x_temp[0][1]
            x1_max = x1 + 60
            x1_min = x1 - 60
            y1_max = y1 +60
            y1_min = y1 - 60
            range_grp_1 = [x1_max,x1_min,y1_min,y1_max]
            grp1 = [x1,y1]
            grp_1 = np.array(grp1)
            #print(grp_1,range_grp_1)
            if group_count != 0:
                print('group count greater than 0')
                if np.where((l[i]>x1_min) and (l[i]<x1_max) and (l[i]>y1_min) and (l[i]<y1_max)):
                    print(l[i])
            else:
                group1.append(grp_1)
                group_count+=1
Error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I am posting new code here.
Since you said you have a lot of points and the ranges seem to vary, I suggest wrapping the check inside a function, so you can call it as many times as you need, passing in the coordinates to be evaluated.
# function to return max and min of list of coordinates
def min_max(coords):
    xy = np.array(coords)
    xs = [] #save 'x' values
    for i in range(len(xy)):
        x = [xy[i][0]]
        xs.append(x)
    ys = [] #save 'y' values
    for i in range(len(xy)):
        y = [xy[i][1]]
        ys.append(y)
    rangex = []
    rangey = []
    for x in min(xs): #get min 'x'
        minx = x - 60
        rangex.append(minx)
        maxx = x + 60
        rangex.append(maxx)
    for y in min(ys): #get min 'y'
        miny = y - 60
        rangey.append(miny)
        maxy = y + 60
        rangey.append(maxy)
    return [rangex,rangey]
If you pass the same coordinates you posted the first time, it returns
Execution #1:
coords = [[116,2306],[118,2307],[126,1517]]
my_ranges = min_max(coords)
print(my_ranges)
#[[56, 176], [1457, 1577]]
Or if you pass just the new range you gave me:
Execution #2:
new_coord = [[518,2007]]#pay attention to the format
my_ranges = min_max(new_coord)
print(my_ranges)
#[[458, 578], [1947, 2067]]
And the last part of the code: the one that separates the coordinates into groups according to whether or not they belong to the evaluated range.
#changed again:
group1 = [] #coords in the interval
group2 = [] #coords out of the interval
for l in dynCoords:
    pair = [l[0],l[1]]
    if l[0] in range(my_ranges[0][0],my_ranges[0][1]) and l[1] in range(my_ranges[1][0],my_ranges[1][1]):
        group1.append(pair)
    else:
        group2.append(pair)
#new line appended
my_ranges = min_max(group2)
With the original coordinates [[116,2306],[118,2307],[126,1517]], the groups [118,2307],[126,1517] fell outside the range and went to group2. With the new line appended, they were used to change the minimum threshold again; now it goes from 56-2246 for xs and 176-2366 for ys. Let's say you use group2 as dynCoords (dynCoords = group2) and execute what goes under the label #changed again: you get [[116, 2306], [118, 2307]] for group1, and group2 ends up empty.
I think you can make a function for that part of the code too, and run it as many times as you need to process your whole set of coordinates.
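A minimal sketch of what such a function could look like is below. The name split_by_range is my own; it assumes ranges in the [[x_lo, x_hi], [y_lo, y_hi]] layout returned by min_max and keeps the same half-open bounds as the `in range(lo, hi)` checks above (lower bound included, upper bound excluded).

```python
def split_by_range(coords, ranges):
    """Split coords into (inside, outside) relative to ranges.

    ranges is [[x_lo, x_hi], [y_lo, y_hi]]; like range(lo, hi), the
    lower bound is included and the upper bound is excluded.
    """
    (x_lo, x_hi), (y_lo, y_hi) = ranges
    inside, outside = [], []
    for x, y in coords:
        if x_lo <= x < x_hi and y_lo <= y < y_hi:
            inside.append([x, y])
        else:
            outside.append([x, y])
    return inside, outside

coords = [[116, 2306], [118, 2307], [126, 1517]]
my_ranges = [[56, 176], [1457, 1577]]  # the ranges printed in Execution #1
group1, group2 = split_by_range(coords, my_ranges)
```

Unlike the `in range(...)` test, the chained comparisons also work for non-integer coordinates.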
You are going in the right direction.
Suppose we have 1st element of the array: [116 1517]
x_min = 116-60 (56)
x_max = 116+60 (176)
y_min = 1517-60 (1457)
y_max = 1517+60 (1577)
Now the other coordinates are compared to these values.
For example:
now we have array = [146 1568]
then
x = 146, y = 1568
if x > x_min and x < x_max and y > y_min and y < y_max:
    grp.append(array)
else:
    print('not in range')
So I want this type of output:
146>56 and 146<176 and 1568>1457 and 1568<1577
This is true, so the point would be appended to the new array.
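The comparison described here can be sketched without an explicit loop. The ValueError in the question comes from using Python's `and` on whole arrays; NumPy needs the element-wise `&` operator (or np.logical_and) instead, with each comparison in parentheses. The function name in_box and the sample points below are my own, for illustration only.

```python
import numpy as np

def in_box(points, center, half_width=60):
    """Boolean mask: True where a point lies strictly inside the square
    of side 2*half_width centred on `center`."""
    points = np.asarray(points)
    cx, cy = center
    x, y = points[:, 0], points[:, 1]
    # & works element-wise on arrays; `and` would raise the ValueError.
    return (
        (x > cx - half_width) & (x < cx + half_width)
        & (y > cy - half_width) & (y < cy + half_width)
    )

points = np.array([[116, 1517], [146, 1568], [118, 2307]])
mask = in_box(points, center=(116, 1517))
grp = points[mask]        # points inside the range
rest = points[~mask]      # points outside the range
```

For this input grp contains [116, 1517] and [146, 1568], while [118, 2307] ends up in rest.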
I am very new to Python and was surprised to find that this section of my code:
print len(allCommunities[5].boundary)
allCommunities[5].surface = triangularize(allCommunities[5].boundary)
print len(allCommunities[5].boundary)
Outputs this:
1310
2
Below is a function I wrote in Processing (a language like Java) and ported into Python. It does what it is supposed to (triangulate a polygon) but my intention was to pass inBoundary for the function to use but not remove elements from allCommunities[5].boundary.
How should I go about preventing allCommunities[5].boundary from being modified in the function? On a side note, I would appreciate pointers if I am doing something silly otherwise in the function, still getting used to Python.
def triangularize(inBoundary):
    outSurface = []
    index = 0;
    while len(inBoundary) > 2:
        pIndex = (index+len(inBoundary)-1)%len(inBoundary);
        nIndex = (index+1)%len(inBoundary);
        bp = inBoundary[pIndex]
        bi = inBoundary[index]
        bn = inBoundary[nIndex]
        # This assumes the polygon is in clockwise order
        theta = math.atan2(bi.y-bn.y, bi.x-bn.x)-math.atan2(bi.y-bp.y, bi.x-bp.x);
        if theta < 0.0: theta += math.pi*2.0;
        # If bp, bi, and bn describe an "ear" of the polygon
        if theta < math.pi:
            inside = False;
            # Make sure other vertices are not inside the "ear"
            for i in range(len(inBoundary)):
                if i == pIndex or i == index or i == nIndex: continue;
                # Black magic point in triangle expressions
                # http://answers.yahoo.com/question/index?qid=20111103091813AA1jksL
                pi = inBoundary[i]
                ep = (bi.x-bp.x)*(pi.y-bp.y)-(bi.y-bp.y)*(pi.x-bp.x)
                ei = (bn.x-bi.x)*(pi.y-bi.y)-(bn.y-bi.y)*(pi.x-bi.x)
                en = (bp.x-bn.x)*(pi.y-bn.y)-(bp.y-bn.y)*(pi.x-bn.x)
                # This only tests if the point is inside the triangle (no edge / vertex test)
                if (ep < 0 and ei < 0 and en < 0) or (ep > 0 and ei > 0 and en > 0):
                    inside = True;
                    break
            # No vertices in the "ear", add a triangle and remove bi
            if not inside:
                outSurface.append(Triangle(bp, bi, bn))
                inBoundary.pop(index)
        index = (index+1)%len(inBoundary)
    return outSurface
print len(allCommunities[5].boundary)
allCommunities[5].surface = triangularize(allCommunities[5].boundary)
print len(allCommunities[5].boundary)
Lists in Python are mutable, and operations such as
inBoundary.pop
modify them. The easy solution is to copy the list inside the function:
def triangularize(inBoundary):
    inBoundary = list(inBoundary)
    # proceed as before
The easiest thing to do would be to make a copy of the argument coming in:
def triangularize(origBoundary):
    inBoundary = origBoundary[:]
Then the rest of your code can stay the same.
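To see the effect in isolation, here is a small sketch with a stand-in function (consume_all_but_two is a made-up name, not part of the question's code) that pops from a copy the same way triangularize pops from inBoundary:

```python
def consume_all_but_two(boundary):
    """Pop elements until two remain, working on a copy of the input."""
    work = list(boundary)      # shallow copy; boundary[:] does the same
    removed = []
    while len(work) > 2:
        removed.append(work.pop(0))
    return removed

original = [10, 20, 30, 40, 50]
removed = consume_all_but_two(original)
# original still has all five elements; only the copy was shortened
```

Both list(...) and [:] make a shallow copy: the new list is independent, but the elements inside it are still the same objects. That is enough here, because the function only removes items from the list and never modifies the points themselves.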