How to find the smoothest part of the trajectory using python? - python

I have come across the following problem. I have created a program that estimates the trajectory of a camera recording video along the x and y axes. This approach is very common in warp stabilizers. I want to find the range of values that represents the most stable video footage and get those values. Here is the graph of the trajectory. The graph is based on a numpy array. I guess the best idea would be to pick the part where the values increase most slowly, but I am not sure how to do it.

The following code will find the longest range of values over which both X and Y stay stable within a certain threshold. This threshold limits the change in X and Y between consecutive samples.
You can use it to tune your result.
If you want a more stable section, choose a lower epsilon.
This will result in a shorter range.
import numpy as np

X = 150*np.random.rand(270)  # mock data as I do not have yours
Y = 150*np.random.rand(270)  # Replace X and Y with your X and Y data
epsilon = 80  # threshold

# get indices where the difference is smaller than the threshold for both X and Y
valid_values = np.logical_and(np.abs(np.diff(X)) < epsilon, np.abs(np.diff(Y)) < epsilon)

cumulative_valid_values = []
count = 0
# find the longest run of differences that satisfy the threshold
for value in valid_values:
    if value:
        count = count + 1
    else:
        count = 0
    cumulative_valid_values.append(count)

# Calculate start and end (0-based point indices) of the largest stable range
end_of_range = cumulative_valid_values.index(max(cumulative_valid_values)) + 1
start_of_range = end_of_range - max(cumulative_valid_values)
print("Largest stable range is: ", start_of_range, " - ", end_of_range)

Related

How to get the highest Two peak values from a discrete sampled data point using python?

I have Radar data for a vehicle moving away from it. The Radar outputs a .csv file. Once the Radar detects something, the amplitude column switches from 0 to 1 and starts outputting values. For example, here:
When the column for Distance/Amplitude goes from 0 to a number, it can be inferred that the target has been seen by the Radar. Plotting the row of the first instance gives this blue wave.
If we plot the rows below it, we get this:
The Radar was placed in the back, so the target was moving away from it. The x-axis represents distance multiplied by 0.077 m. So, for the first blue wave, the distance that the Radar registers is 37 * 0.077 m. I was wondering if there is a way to get a range of values from the .csv file that takes the two peaks into account. For example, how could I get the top two peaks from the blue wave, get their x-axis coordinates, compute the median (middle) point between them, and then track it for the orange wave, which is the second row below the first one?
I have attached the .csv file below.
https://drive.google.com/file/d/1IJOebiXuScjLPytemulcXph7ZB1X65wU/view?usp=sharing
I have an algorithm that gets the index of the first hit and the last hit, i.e. when switching from 0 to a value and from a value back to 0; this lets me catch when the Radar detects a target. That was helpful while I was using the values given directly by the Radar, like the distance and amplitude values, but now that I need a whole row I don't know how to proceed. I don't know if Pandas or Numpy has tools I can use for this.
There are a few ways to get peaks, and thus the two peak positions. Take the derivative of the data set: the points where the derivative crosses the x-axis are the peaks and valleys of the original data. While doing that, you can also grab the indices of those peaks and valleys. From there, you can iterate through those points in the original data to get the two maximum values and their indices.
It would look something like this:
import matplotlib.pyplot as plt
import numpy as np

# My data set (example is a crazy cosine wave)
x = np.linspace(1, 100, 1000)
y = np.cos(x*0.5)*((x - 50)**3)

# Find first derivative:
m = np.diff(y)/np.diff(x)

# Get indices of peaks and valleys (where the derivative changes sign)
c = len(m)
indices = []
for i in range(1, c):
    prev_val = m[i-1]
    val = m[i]
    if prev_val < 0 and val >= 0:
        indices.append(i)
    elif prev_val > 0 and val <= 0:
        indices.append(i)

# Get the values, positions, and indices of the two highest peaks
max_list = [0, 0]
index_list = [0, 0]
pos_list = [0, 0]
for index in indices:
    val = y[index]
    if val > max_list[0]:
        # demote the current largest to second place before replacing it
        max_list[1], index_list[1], pos_list[1] = max_list[0], index_list[0], pos_list[0]
        max_list[0] = val
        index_list[0] = index
        pos_list[0] = x[index]
    elif val > max_list[1]:
        max_list[1] = val
        index_list[1] = index
        pos_list[1] = x[index]

print('Two peak indices:', index_list)
print('Two peak values:', max_list)
print('Two peak x-positions:', pos_list)

average_pos = (pos_list[0] + pos_list[1])/2
print('Average x-position:', average_pos)

plt.plot(x, y)
plt.show()

Identification of Peaks from a Line Graph

I have added a link to the data set here. The first script produces a line graph using signal output data. My next step was to identify the peaks present on the line graph. The second script has an algorithm to identify all the peaks present on the line graph. However, it is too sensitive: it classifies even the slightest bumps on the graph as peaks. I do not want this. I only wish to identify the conspicuous (large) bumps as peaks. How do I modify the second script to do this?
import matplotlib.pyplot as plt
import numpy as np

X = np.zeros((10, 4096))
Y = np.zeros((10, 4096))
n = 0
m = 0
for line in open('data_set2.txt', 'r'):
    values = [float(s) for s in line.split()]
    X[n, 0] = values[0] - 1566518691968
    for m in range(4096):
        Y[n, m] = values[m + 1]
    n = n + 1
plt.plot(Y[1, 0:4095])
plt.show()

b = (X[1:] - X[:-1])[:-1]
c = (X[:-1] - X[1:])[1:]
minima = np.where(np.bitwise_and(b < 0, c < 0))[0] + 1
maxima = np.where(np.bitwise_and(b > 0, c > 0))[0] + 1
all_peaks = np.where((b*c) > 0)[0] + 1
del b, c
print(minima)
print(maxima)
print(all_peaks)
I wish you had attached the data set so I could try my solution before posting it here. What I think you're doing is looking for all the points that are higher than the point before and the point after them, which is why you're ending up with too many points. What your code should look for, though, is a number of the highest peaks. It does not matter whether a point is technically a peak, and the peak's absolute height does not matter either; what matters is the "uniqueness" of the peak, i.e. how much it is higher than the average point. Imagine removing the highest three peaks in your example and zooming in: you will find a new set of peaks that look much higher than the rest, and so on.
The tricky thing is to choose the number of those peaks, which depends on how sensitive you want your code to be; a rough sketch of the idea follows.
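A minimal sketch of that filtering idea (my own interpretation, with placeholder names and mock numbers): keep only the peaks whose height stands out from the average by some multiple of the spread.
import numpy as np

peak_heights = np.array([1.2, 0.9, 15.0, 1.1, 9.5, 1.3])  # mock candidate peak heights
sensitivity = 1.0  # larger -> fewer, more "unique" peaks survive
threshold = peak_heights.mean() + sensitivity*peak_heights.std()
big_peaks = peak_heights[peak_heights > threshold]
print(big_peaks)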
There are some packages for identifying peaks. SciPy provides the scipy.signal.find_peaks function (https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.find_peaks.html). There are also peak-identification tools for Matlab, Octave and others.
All of them require some quantitative criterion for peak identification. None will work with just "conspicuous (large) bumps".
UPD. So, if you want to write the code yourself, you have to choose some filtering function. A few pretty obvious ones:
Peaks that have a value greater than x;
Peaks that are greater than x% of the maximum peak's value;
The x largest peaks (where x < total number of peaks).
By varying the parameter x you can arrive at a solution that satisfies your criterion of "conspicuous (large) bumps"; a short example is shown below.
P.S. There might be other filtering functions, but it looks like in your case the width and shape of a peak do not matter.
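For illustration, here is a minimal sketch of those criteria with scipy.signal.find_peaks; the signal y below is mock data (not the data set from the question) and the numeric cut-offs are placeholders to tune:
import numpy as np
from scipy.signal import find_peaks

y = np.sin(np.linspace(0, 20, 500)) + 0.1*np.random.rand(500)  # mock signal

# 1. peaks with a value greater than an absolute level
peaks_abs, _ = find_peaks(y, height=0.8)

# 2. peaks greater than 50% of the maximum peak's value
all_peaks, _ = find_peaks(y)
heights = y[all_peaks]
peaks_rel = all_peaks[heights > 0.5*heights.max()]

# 3. the x largest peaks (here x = 3)
largest_three = all_peaks[np.argsort(heights)[-3:]]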
I am not familiar with the algorithms suggested in the other answers; if I were you I'd check them out, because they should be more efficient. However, if you want a lazier solution, here is one:
First, I gather all the peaks (the same peaks you get):
# X, Y and n are assumed to be set up as in the question's first script
x_range = range(4096)
peaks = []
for line in open('data_set2.txt', 'r'):
    values = [float(s) for s in line.split()]
    X[n, 0] = values[0] - 1566518691968
    for m in range(4096):
        Y[n, m] = values[m + 1]
        # only work with the plotted row, starting from the third value
        if n != 1 or m in (0, 1):
            continue
        # if a point is higher than the point before and the point after
        if Y[n, m - 2] < Y[n, m - 1] > Y[n, m]:
            peaks.append((x_range[m - 1], Y[n, m - 1]))
    n = n + 1
plt.plot(Y[1, 0:4095])
plt.show()
Then I loop through the peaks in chunks of 100 points (assuming no two true peaks occur within a 100-point range, otherwise one of them will be discarded) and find the maximum of each chunk. If the chunk maximum is at least some percentage of the absolute maximum, it is included.
import sys  # needed for sys.exit() below

max_ = max(peaks, key=lambda x: x[1])[1]
# If the graph does not have peaks
if max_ < np.mean(Y) * 10:  # Tweak point 1
    print("No peaks")
    sys.exit()

highest = []
sensitivity = 0.2  # Tweak point 2
for i in range(0, len(peaks), 100):
    try:
        segment = peaks[i: i + 100]
    except IndexError:
        segment = peaks[i:]
    finally:
        segment_max = max(segment, key=lambda x: x[1])[1]
        if segment_max >= max_ * sensitivity:
            highest.append(segment_max)
print(highest)

bin counts in stacked histogram (weighted) with x-coordinate greater than certain value

Let's say I have two datasets, and I plot a stacked histogram of both datasets with some weights. Can I now find the total bin count for data elements greater than a certain number (i.e. for x-coordinates greater than a certain value)? To illustrate my question, I have done the following:
import matplotlib.pyplot as plt
import numpy as np
data1 = np.random.normal(0,0.6,1000)
data2 = np.random.normal(0,1.4,1000)
weight1 = np.array([0.5]*len(data1))
weight2 = np.array([0.9]*len(data2))
hist = plt.hist((data1,data2),weights=(weight1,weight2),stacked=True,range=(-5,5))
plt.show()
Now, how would I know the bin counts, say for x greater than -2?
As of now, to get that answer, I was doing the following:
n1,_,_ = plt.hist((data1,data2),weights=(weight1,weight2),stacked=False,range=(-2,10000))
bin_counts=sum(sum(n1))
print(bin_counts)
Here, I choose the max value of the range to be some crazily large number, so that I get all the bin counts for x = -2 and greater.
Is there any more efficient way than this?
Also, what would be the way to obtain the bin counts for a variable x, where x varies from the minimum value of the x-coordinate to the maximum value of the x-coordinate in some steps?
Any help will be greatly appreciated!
Thanks much!
You could do the following:
#in your case n is going to be a list of arrays, because you have 2 histograms
n,bins,_ = plt.hist(...)
#get a list of lists of counts for bin values over x
n_over_x = [[val for val,bin in zip(selected_cnt, bins) if bin > x] for selected_cnt in n]
#sum up list of lists
result = sum([sum(part_list) for part_list in n_over_x])
Here's what I came up with:
def my_range(start, end, step):
    while start <= end:
        yield start
        start += step

b_counts = [0]*len(data1)  # here b_counts is the normalized events (normalized according to the weights)
value = [0]*len(data1)
bin_min = -5
bin_max = 10
bin_step = 1
count_max = (bin_max - bin_min)/bin_step
for i in my_range(bin_min, count_max, 1):
    n1, _, _ = plt.hist((data1, data2), weights=(weight1, weight2), stacked=False, range=(i*bin_step, 10000))
    b_counts[i] = sum(sum(n1))
    value[i] = i*bin_step  # here value is exactly equal to "i", but I am writing this for a general case
    print(b_counts[i], value[i])
I do believe that this gives me the events (in the histogram) in the range (value, 10000), where value is the variable.

Finding a maximum in inverse parabola iteratively

I have an array that represents an inverse parabola and I want to find the maximum, which can be anywhere in the array. In my application I cannot take the derivative, and I have to loop through the array.
I implemented this by iterating through the array from the left until I get a value lower than the previous iteration:
import numpy as np

num = 21  # number of sample points / configurations

def simulation(n):
    # create inverse parabola and return its value at index n
    parabola = np.linspace(-8, 12, num=num)
    parabola = -np.abs(parabola) ** 2
    return parabola[n]

previous_iteration = -1000  # some initialization
best_iteration = 0
for n in range(num):
    # Configure the entire system
    # Run simulation
    # simulation(n) - a function returning the simulation result for configuration "n"
    simulation_result = simulation(n)
    if previous_iteration < simulation_result:
        previous_iteration = simulation_result
    else:
        best_iteration = n - 1
        break

print(best_iteration)
print(previous_iteration)
Is there a faster way to do this?
Edit:
The actual implementation will be on an FPGA, and for every iteration I have to configure the system and run a simulation, so every iteration costs a lot of time. If I ran the simulation with all possible configurations, I would get the full parabola vector, but that would be time consuming and very inefficient.
I'm looking for a way to find the maximum value while generating as few points as possible. Unfortunately the for loop has to stay, because that is a representation of how the system works. The key here is to change the code inside the for loop.
I edited the code to better explain what I mean.
An inverse parabola (sampled at evenly spaced points) has the property that the differences between adjacent points always decrease. The maximum is just before the differences go negative.
If the difference between the first two points is negative or zero, the maximum is the first point in the array.
Otherwise, do a binary search to find the smallest positive difference between two adjacent points. The maximum will be the second of these two points; a short sketch follows.
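As a rough illustration, here is a minimal sketch of that binary search; the helper name find_max_index is my own, f is the costly per-index evaluation (the simulation(n) call in the question), and num is the number of samples:
def find_max_index(f, num):
    # assumes f is a concave parabola sampled at evenly spaced points,
    # so the differences f(i+1) - f(i) decrease as i grows
    if num == 1 or f(1) - f(0) <= 0:
        return 0  # first difference not positive -> the maximum is the first point
    lo, hi = 0, num - 2  # search among difference indices 0 .. num-2
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if f(mid + 1) - f(mid) > 0:
            lo = mid       # difference still positive, the maximum lies further right
        else:
            hi = mid - 1   # difference not positive, the last positive difference is before mid
    return lo + 1          # the point just after the last positive difference
This needs on the order of log2(num) difference checks instead of a full sweep; if each evaluation is an expensive simulation, caching results (e.g. with functools.lru_cache) avoids simulating the same index twice.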
What you want to do is called a grid search. You need to define your search grid points (here in 'x') and calculate all the values of the parabola for these points (here in 'y'). You can use np.argmax to find the index of the argument that yields the maximum value:
import numpy as np
# Define inverse parabola
parabola = lambda x: -np.abs(x)**2
# Search axis
x = np.linspace(-8, 12, 21)
# Calculate parabola at each point
y = parabola(x)
# Find argument that yields maximum value
xmax = x[np.argmax(y)]
print(xmax) # Should be 0.0
I edited the post; hopefully the problem is clearer now.
What you're looking for is called a "ternary search": https://en.wikipedia.org/wiki/Ternary_search
It works to find the maximum of any function f(x) that has an increasing part, maybe followed by an all-equal part, followed by a decreasing part.
Given bounds low and high, pick 2 points m1 = low +(high-low)/3 and m2 = low + (high-low)*2/3.
Then if f(m1) > f(m2), you know the maximum is at x<=m2, because m2 can't be on the increasing part. So set high=m2 and try again.
Otherwise, you know the maximum is at x>=m1, so set low=m1 and try again.
Repeat until high - low < 3, then just pick whichever of the remaining values is biggest; a short sketch follows.
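A minimal sketch of that ternary search over integer indices; the name ternary_search_max is my own, and f stands for the expensive evaluation (the simulation call in the question), assumed unimodal between low and high:
def ternary_search_max(f, low, high):
    # narrow the interval [low, high] until only a few candidates remain
    while high - low >= 3:
        m1 = low + (high - low) // 3
        m2 = low + 2*(high - low) // 3
        if f(m1) > f(m2):
            high = m2   # the maximum cannot lie to the right of m2
        else:
            low = m1    # the maximum cannot lie to the left of m1
    # brute-force the remaining candidates and keep the best index
    return max(range(low, high + 1), key=f)
For example, ternary_search_max(simulation, 0, num - 1) would locate the peak with a number of simulations that grows only logarithmically in the number of configurations; caching f (e.g. with functools.lru_cache) avoids re-running simulations for indices that come up more than once.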

Counting the number of times a threshold is met or exceeded in a multidimensional array in Python

I have a numpy array that I brought in from a netCDF file with the shape (930, 360, 720), organized as (time, latitudes, longitudes).
At each lat/lon pair, I need to count the number of the 930 time stamps at which the value meets or exceeds a threshold x (such as 0.2 or 0.5 etc.), ultimately calculate the percentage of time the threshold was exceeded at each point, and then output the results so they can be plotted later on.
I have attempted numerous methods but here is my most recent:
lat_length = len(lats)
# where lats has been defined earlier when unpacked from the netCDF dataset
lon_length = len(lons)
# just as lats; also these were defined before using np.meshgrid(lons, lats)
for i in range(0, lat_length):
    for j in range(0, lon_length):
        if ice[:, i, j] >= x:
            # code to count number of occurrences here
            # code to calculate percentage here
            percent_ice[i, j] += count / len(time)  # calculation
# then go on to plot percent_ice
I hope this makes sense! I would greatly appreciate any help. I'm self-taught in Python, so I may be missing something simple.
Would this be a time to use the any() function? What would be the most efficient way to count the number of times the threshold was exceeded and then calculate the percentage?
You can compare the input 3D array against the threshold x and then sum along the first axis with ndarray.sum(axis=0) to get the count, and from that the percentages, like so -
# Count how often the threshold is met or exceeded along the first (time) axis
count = (ice >= x).sum(axis=0)
# Get percentages (ratios) by dividing by the first axis length
percent_ice = np.true_divide(count, ice.shape[0])
Ah, look, another meteorologist!
There are probably multiple ways to do this and my solution is unlikely to be the fastest since it uses numpy's MaskedArray, which is known to be slow, but this should work:
Numpy has a data type called a MaskedArray which actually contains two normal numpy arrays. It contains a data array as well as a boolean mask. I would first mask all data that are greater than or equal to my threshold (use np.ma.masked_greater() for just greater than):
ice = np.ma.masked_greater_equal(ice, x)
You can then use ice.count() to determine how many values are below your threshold for each lat/lon point by specifying that you want to count along a specific axis:
n_good = ice.count(axis=0)
This should return a 2-dimensional array containing the number of good points. You can then calculate the number of bad points by subtracting n_good from ice.shape[0]:
n_bad = ice.shape[0] - n_good
and calculate the percentage that are bad using:
perc_bad = n_bad/float(ice.shape[0])
There are plenty of ways to do this without using MaskedArray. This is just the easy way that comes to mind for me.
