I have a series of signals, sample data looks like this:
We can see that there are 5 peaks there. I can assume that there won't be more than one peak every 10 samples; usually there is one peak every 20 to 40 samples.
I was trying to fit a polynomial spline and then use scipy.signal.find_peaks, and it kind of works, but I have to choose a different number of spline knots to approximate each series correctly, and the number of knots correlates with the number of peaks, so I sort of ended up where I began; the only gain is that now I'd need only a rough idea of the number of peaks.
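For reference, a minimal sketch of that attempt (the smoothing factor s is an illustrative value, not the one from my actual code; data is the array shown below):
import numpy as np
from scipy.interpolate import UnivariateSpline
from scipy.signal import find_peaks

x = np.arange(len(data))
# smoothing spline; s indirectly controls how many knots the fit uses
spline = UnivariateSpline(x, data, s=len(data) * data.var() * 1e-4)
peaks, _ = find_peaks(spline(x))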
Then I tried it by dividing the signal into parts:
window = 10  # the smallest range potentially containing a whole peak
parts = np.array_split(data, len(data)//window)  # divide data set into parts
lengths = []
for part in parts:
    d = abs(part.max() - part.min())
    lengths.append(d)  # differences between max and min values in each part
av = sum(lengths)/len(lengths)
for d in lengths:
    if d < some_tolerance_fraction*av:
        window = window + 1  # make parts for the next check bigger
        break
The idea was that the difference between the min and max values within each part should be smaller than the height of an actual peak, unless a part is large enough to contain a whole peak; in that case the differences should be similar across parts, and their average should also be close to the actual peak height.
But this doesn't work at all, and possibly doesn't even make sense: depending on the tolerance, it either keeps enlarging the window or never enlarges it at all.
This is the array from the image:
array([254256., 254390., 251546., 250561., 250603., 250128., 251000.,
252612., 253552., 253776., 252843., 251800., 250808., 250569.,
249804., 247755., 247685., 247111., 242320., 242580., 243462.,
240383., 239689., 240730., 239508., 239604., 238544., 240174.,
240806., 240218., 239956., 241325., 241343., 241532., 240696.,
242064., 241830., 237569., 237392., 236353., 234819., 234430.,
233890., 233215., 233745., 232159., 231778., 230307., 228754.,
225823., 225139., 223737., 222078., 221188., 220669., 221944.,
223928., 224996., 223405., 223018., 224966., 226590., 226166.,
226012., 226192., 224900., 224439., 223179., 222375., 221509.,
220734., 219686., 218656., 217792., 215934., 214829., 213673.,
212837., 211604., 210748., 210216., 209974., 209659., 209707.,
210131., 210663., 212113., 213078., 214476., 215087., 216220.,
216831., 217286., 217373., 217030., 216491., 215642., 214249.,
213273., 212148., 210846., 209570., 208202., 207165., 206677.,
205703., 203837., 202620., 201530., 198812., 197654., 196506.,
194163., 193736., 193945., 193785., 193417., 193044., 193768.,
194690., 195739., 198592., 199237., 199932., 200142., 199859.,
199593., 199337., 198403., 197500., 195988., 195114., 194278.,
193837., 193861.])
I would use scipy's find_peaks, but first filter the signal with a moving average:
import numpy as np
import matplotlib.pyplot as plt
arr = np.array([254256., 254390., 251546., 250561., 250603., 250128., 251000.,
252612., 253552., 253776., 252843., 251800., 250808., 250569.,
249804., 247755., 247685., 247111., 242320., 242580., 243462.,
240383., 239689., 240730., 239508., 239604., 238544., 240174.,
240806., 240218., 239956., 241325., 241343., 241532., 240696.,
242064., 241830., 237569., 237392., 236353., 234819., 234430.,
233890., 233215., 233745., 232159., 231778., 230307., 228754.,
225823., 225139., 223737., 222078., 221188., 220669., 221944.,
223928., 224996., 223405., 223018., 224966., 226590., 226166.,
226012., 226192., 224900., 224439., 223179., 222375., 221509.,
220734., 219686., 218656., 217792., 215934., 214829., 213673.,
212837., 211604., 210748., 210216., 209974., 209659., 209707.,
210131., 210663., 212113., 213078., 214476., 215087., 216220.,
216831., 217286., 217373., 217030., 216491., 215642., 214249.,
213273., 212148., 210846., 209570., 208202., 207165., 206677.,
205703., 203837., 202620., 201530., 198812., 197654., 196506.,
194163., 193736., 193945., 193785., 193417., 193044., 193768.,
194690., 195739., 198592., 199237., 199932., 200142., 199859.,
199593., 199337., 198403., 197500., 195988., 195114., 194278.,
193837., 193861.])
def moving_average(x, w):
    """calculate moving average with window size w"""
    return np.convolve(x, np.ones(w), 'valid') / w

# moving average with window size 5
n = 5
arr_f = moving_average(arr, n)
# pad the front so it aligns with the original signal in the same plot
arr_f_ext = np.hstack([np.ones(n//2)*arr_f[0], arr_f])
plt.figure()
plt.plot(arr, 'o')
plt.plot(arr_f_ext)
This will show:
Then find peaks:
from scipy.signal import find_peaks
# n//2 is the offset of the averaged signal (2 in this example)
peaks = find_peaks(arr_f)[0] + n//2
plt.plot(peaks, arr[peaks], 'xr', ms=10)
which will show:
Note that:
1) the filtered signal has a delay of n/2 samples (rounding down), so add n//2 to the peak indices found in the filtered signal;
2) the filtered signal does not have the same values as the original, only the same behaviour, so to extract the peak values, use the original signal.
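For example, to read the peak heights off the original data:
peak_values = arr[peaks]  # heights taken from the unfiltered signal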
My informal definition of a peak is a point surrounded by two vectors, one ascending and one descending. It's pretty easy to implement by iterating over the array and comparing two neighbouring segments.
If they are both in the same direction, we merge the two segments by deleting the middle point.
To determine whether they are in the same direction, I used multiplication: the product is positive if the two segments are in the same direction.
At the end, every remaining point will be a peak (we cannot determine this for the first and last two points).
i = 0  # position cursor at beginning
while i <= (len(t) - 3):
    if (t[i] - t[i+1]) * (t[i+1] - t[i+2]) >= 0:
        # Same direction: join the 2 segments by removing the middle point.
        # This test also covers a horizontal segment
        # formed by the first 2 points; we remove the second.
        del t[i+1]
    else:
        # Different directions: delete nothing, move the cursor by 1.
        i += 1
See the plot. You can see the reduction from 135 to 34 points.
Each blue mark is a peak.
Some of these peaks are non-significant, so some more filtering is required. The best method depends on your application: you may filter on the vertical distance between two adjacent peaks, or on the horizontal distance between two adjacent peaks. For the latter case we need the x value of each peak, so I rewrote the program using x-y data points (a sketch of a vertical-distance filter follows the program below).
t0 = [254256, 254390, 251546, 250561, 250603, 250128, 251000,
252612, 253552, 253776, 252843, 251800, 250808, 250569,
249804, 247755, 247685, 247111, 242320, 242580, 243462,
240383, 239689, 240730, 239508, 239604, 238544, 240174,
240806, 240218, 239956, 241325, 241343, 241532, 240696,
242064, 241830, 237569, 237392, 236353, 234819, 234430,
233890, 233215, 233745, 232159, 231778, 230307, 228754,
225823, 225139, 223737, 222078, 221188, 220669, 221944,
223928, 224996, 223405, 223018, 224966, 226590, 226166,
226012, 226192, 224900, 224439, 223179, 222375, 221509,
220734, 219686, 218656, 217792, 215934, 214829, 213673,
212837, 211604, 210748, 210216, 209974, 209659, 209707,
210131, 210663, 212113, 213078, 214476, 215087, 216220,
216831, 217286, 217373, 217030, 216491, 215642, 214249,
213273, 212148, 210846, 209570, 208202, 207165, 206677,
205703, 203837, 202620, 201530, 198812, 197654, 196506,
194163, 193736, 193945, 193785, 193417, 193044, 193768,
194690, 195739, 198592, 199237, 199932, 200142, 199859,
199593, 199337, 198403, 197500, 195988, 195114, 194278,
193837, 193861]
def graph(t1, t2):
    import matplotlib.pyplot as plt
    plt.figure()
    plt.plot([p[0] for p in t1], [p[1] for p in t1], color='r', label="raw data")
    plt.plot([p[0] for p in t2], [p[1] for p in t2], marker='.', color='b', label="reduced data")
    plt.title('Peak identification')
    plt.legend()
    plt.show()

def reduce(t):
    i = 0  # position cursor at beginning
    while i < (len(t) - 2):
        if (t[i][1] - t[i+1][1]) * (t[i+1][1] - t[i+2][1]) >= 0:
            # Same direction: join the 2 segments by removing the middle point.
            # This test also covers a horizontal segment
            # formed by the first 2 points; we remove the second.
            del t[i+1]
        else:
            # Different directions: delete nothing, move the cursor by 1.
            i += 1

t1 = [(i, y) for i, y in enumerate(t0)]  # add x to every data point
t = t1.copy()
reduce(t)
graph(t1, t)
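As a minimal sketch of the vertical-distance filtering mentioned above (the min_height threshold is illustrative, not tuned for this data), keep only reduced points that rise at least min_height above both neighbours:
def filter_peaks(t, min_height):
    """keep points at least min_height above both neighbours"""
    kept = []
    for prev, cur, nxt in zip(t, t[1:], t[2:]):
        if cur[1] - prev[1] >= min_height and cur[1] - nxt[1] >= min_height:
            kept.append(cur)
    return kept

significant = filter_peaks(t, min_height=2000)  # illustrative threshold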
Have fun!
I have a TOF spectrum and I would like to implement an algorithm using python (numpy) that finds all the maxima of the spectrum and returns the corresponding x values.
I looked online and found the algorithm reported below.
The assumption here is that, near the maximum, the difference between the value before the maximum and the value at the maximum is bigger than a number DELTA. The problem is that my spectrum is composed of evenly distributed points, even near the maximum, so DELTA is never exceeded and the function peakdet returns an empty array.
Do you have any idea how to overcome this problem? I would also really appreciate comments to help me understand the code better, since I am quite new to Python.
Thanks!
import sys
from numpy import nan, inf, arange, isscalar, asarray, array

def peakdet(v, delta, x=None):
    maxtab = []
    mintab = []
    if x is None:
        x = arange(len(v))
    v = asarray(v)
    if len(v) != len(x):
        sys.exit('Input vectors v and x must have same length')
    if not isscalar(delta):
        sys.exit('Input argument delta must be a scalar')
    if delta <= 0:
        sys.exit('Input argument delta must be positive')
    mn, mx = inf, -inf
    mnpos, mxpos = nan, nan
    lookformax = True
    for i in arange(len(v)):
        this = v[i]
        if this > mx:
            mx = this
            mxpos = x[i]
        if this < mn:
            mn = this
            mnpos = x[i]
        if lookformax:
            if this < mx - delta:
                maxtab.append((mxpos, mx))
                mn = this
                mnpos = x[i]
                lookformax = False
        else:
            if this > mn + delta:
                mintab.append((mnpos, mn))
                mx = this
                mxpos = x[i]
                lookformax = True
    return array(maxtab), array(mintab)
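For illustration, a minimal usage example (the series and delta here are made-up values, not my spectrum):
series = [0., 1., 3., 2., 1., 4., 7., 5., 2., 1.]
maxtab, mintab = peakdet(series, 2.0)
print(maxtab)  # one (x, value) row per detected maximum, here [[6. 7.]]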
Below is shown part of the spectrum. I actually have more peaks than those shown here.
This, I think, could work as a starting point. I'm not a signal-processing expert, but I tried this on a generated signal Y that looks quite like yours, and on one with much more noise:
from scipy.signal import convolve
import numpy as np
from matplotlib import pyplot as plt
#Obtaining derivative
kernel = [1, 0, -1]
dY = convolve(Y, kernel, 'valid')
#Checking for sign-flipping
S = np.sign(dY)
ddS = convolve(S, kernel, 'valid')
#These candidates are basically all negative slope positions
#Add len(kernel) - 1 since using 'valid' shrinks the arrays
candidates = np.where(dY < 0)[0] + (len(kernel) - 1)
#Here they are filtered on actually being the final such position in a run of
#negative slopes
peaks = sorted(set(candidates).intersection(np.where(ddS == 2)[0] + 1))
plt.plot(Y)
#If you need a simple filter on peak size you could use:
alpha = -0.0025
peaks = np.array(peaks)[Y[peaks] < alpha]
plt.scatter(peaks, Y[peaks], marker='x', color='g', s=40)
The sample outcomes:
For the noisy one, I filtered peaks with alpha:
If alpha needs more sophistication, you could try dynamically setting it from the peaks discovered, using e.g. assumptions about them being a mixed Gaussian (my favourite being the Otsu threshold; it exists in OpenCV and skimage) or some sort of clustering (k-means could work); see the sketch below.
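A minimal sketch of the Otsu variant, not part of my original code (it assumes scikit-image is installed and reuses Y and the unfiltered peaks from above):
from skimage.filters import threshold_otsu

depths = -Y[np.asarray(peaks)]   # candidate peak magnitudes as positive numbers
alpha = -threshold_otsu(depths)  # Otsu's split, mapped back to the signal's scale
peaks = np.asarray(peaks)[Y[np.asarray(peaks)] < alpha]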
And for reference, this I used to generate the signal:
Y = np.zeros(1000)

def peaker(Y, alpha=0.01, df=2, loc=-0.005, size=-.0015, threshold=0.001, decay=0.5):
    peaking = False
    for i, v in enumerate(Y):
        if not peaking:
            peaking = np.random.random() < alpha
            if peaking:
                Y[i] = loc + size * np.random.chisquare(df=df)
                continue
        elif Y[i - 1] < threshold:
            peaking = False
        if i > 0:
            Y[i] = Y[i - 1] * decay

peaker(Y)
EDIT: Support for degrading base-line
I simulated a slanting base-line by doing this:
Z = np.log2(np.arange(Y.size) + 100) * 0.001
Y = Y + Z[::-1] - Z[-1]
Then to detect with a fixed alpha (note that I changed sign on alpha):
from scipy.signal import medfilt
alpha = 0.0025
Ybase = medfilt(Y, 51) # 51 should be large in comparison to your peak X-axis lengths and an odd number.
peaks = np.array(peaks)[Ybase[peaks] - Y[peaks] > alpha]
Resulting in the following outcome (the base-line is plotted as dashed black line):
EDIT 2: Simplification and a comment
I simplified the code to use one kernel for both convolutions, as #skymandr suggested. This also removed the magic number in adjusting for the shrinkage, so that any size of kernel should do.
As for choosing "valid" as the option to convolve: it would probably have worked just as well with "same", but I chose "valid" so I didn't have to think about the edge conditions and whether the algorithm could detect spurious peaks there.
As of SciPy version 1.1, you can also use find_peaks:
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import find_peaks
np.random.seed(0)
Y = np.zeros(1000)
# insert #deinonychusaur's peaker function here
peaker(Y)
# make data noisy
Y = Y + 10e-4 * np.random.randn(len(Y))
# find_peaks gets the maxima, so we multiply our signal by -1
Y *= -1
# get the actual peaks
peaks, _ = find_peaks(Y, height=0.002)
# multiply back for plotting purposes
Y *= -1
plt.plot(Y)
plt.plot(peaks, Y[peaks], "x")
plt.show()
This will plot (note that we use height=0.002 which will only find peaks higher than 0.002):
In addition to height, we can also set the minimal distance between two peaks. If you use distance=100, the plot then looks as follows:
You can use
peaks, _ = find_peaks(Y, height=0.002, distance=100)
in the code above.
After looking at the answers and suggestions, I decided to offer a solution I often use because it is straightforward and easy to tweak.
It uses a sliding window and counts how many times a local peak appears as a maximum as the window shifts along the x-axis. As #DrV suggested, no universal definition of "local maximum" exists, so some tuning parameters are unavoidable. This function uses "window size" and "frequency" to fine-tune the outcome. Window size is measured in number of data points of the independent variable (x), and frequency counts how sensitive the peak detection should be (also expressed as a number of data points; lower frequency values produce more peaks and vice versa). The main function is here:
def peak_finder(x0, y0, window_size, peak_threshold):
    # extend x, y using the window size
    y = numpy.concatenate([y0, numpy.repeat(y0[-1], window_size)])
    x = numpy.concatenate([x0, numpy.arange(x0[-1], x0[-1] + window_size)])
    local_max = numpy.zeros(len(x0))
    for ii in range(len(x0)):
        local_max[ii] = x[y[ii:(ii + window_size)].argmax() + ii]
    u, c = numpy.unique(local_max, return_counts=True)
    i_return = numpy.where(c >= peak_threshold)[0]
    return list(zip(u[i_return], c[i_return]))
along with a snippet used to produce the figure shown below:
import numpy
from matplotlib import pyplot

def plot_case(axx, w_f):
    p = peak_finder(numpy.arange(0, len(Y)), -Y, w_f[0], w_f[1])
    r = .9 * min(Y) / 10
    axx.plot(Y)
    for ip in p:
        axx.text(ip[0], r + Y[int(ip[0])], int(ip[0]),
                 rotation=90, horizontalalignment='center')
    yL = pyplot.gca().get_ylim()
    axx.set_ylim([1.15 * min(Y), yL[1]])
    axx.set_xlim([-50, 1100])
    axx.set_title(f'window: {w_f[0]}, count: {w_f[1]}', loc='left', fontsize=10)
    return None

window_frequency = {1: (15, 15), 2: (100, 100), 3: (100, 5)}
f, ax = pyplot.subplots(1, 3, sharey='row', figsize=(9, 4),
                        gridspec_kw={'hspace': 0, 'wspace': 0, 'left': .08,
                                     'right': .99, 'top': .93, 'bottom': .06})
for k, v in window_frequency.items():
    plot_case(ax[k-1], v)
pyplot.show()
Three cases show parameter values that render (from the left to the right panel):
(1) too many, (2) too few, and (3) an intermediate number of peaks.
To generate the Y data, I used the peaker function #deinonychusaur gave above and added some noise to it from #Cleb's answer.
I hope some might find this useful, but its effectiveness primarily depends on the actual peak shapes and distances.
Finding a minimum or a maximum is not that simple, because there is no universal definition for "local maximum".
Your code seems to look for a maximum and then accept it as a maximum if the signal afterwards falls below the maximum minus some delta value. After that it starts to look for a minimum with similar criteria. It does not really matter whether your data falls or rises slowly: the maximum is recorded when it is reached, and appended to the list of maxima once the level falls below the hysteresis threshold.
This is a possible way to find local minima and maxima, but it has several shortcomings. One of them is that the method is not symmetric, i.e. if the same data is run backwards, the results are not necessarily the same.
Unfortunately, I cannot help much more, because the correct method really depends on the data you are looking at, its shape and its noisiness. If you have some samples, then we might be able to come up with some suggestions.
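To illustrate the symmetry point, here is a tiny check one could run with the peakdet function quoted in the question (the three-point series is contrived for the demonstration):
import numpy as np

v = np.array([2., 0., 1.])
fwd, _ = peakdet(v, 1.0)
bwd, _ = peakdet(v[::-1], 1.0)
print(fwd)  # finds the maximum at x = 0
print(bwd)  # the reversed pass reports no maximum at all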
I have encountered a strange problem: I store a huge number of data points from a nonlinear equation in 3 arrays (x, y, and z) and then try to plot them in a 2D graph (a theta-phi plot, hence 2D).
I tried to reduce the number of points to plot by sampling every 20th data point, since the z data is approximately periodic. I picked the points with a z value just above zero, to make sure I pick one point per period.
The problem arises when I try to do the above: I get only a very limited number of points on the graph, approximately 152 points, regardless of how I change my initial number of data points (as long as it surpasses a certain number, of course).
I suspect that I am using some command wrongly, or that the capacity of an array is smaller than I expected (which seems unlikely). Could anyone help me find out where the problem is?
def drawstaticplot(m, n, d_n, n_o):
    counter = 0
    for i in range(0, m):
        n = vector.rungekutta1(n, d_n)
        d_n = vector.rungekutta2(n, d_n, i)
        x1 = n[0]
        y1 = n[1]
        z1 = n[2]
        if i % 20 == 0:
            xarray.append(x1)
            yarray.append(y1)
            zarray.append(z1)
    for j in range(0, (m//20) - 20):
        if ((zarray[j] - n_o) > 0) and ((zarray[j+1] - n_o) < 0):
            counter = counter + 1
            print(zarray[j] - n_o, counter)
            plotthetaphi(xarray[j], yarray[j], zarray[j])

def plotthetaphi(x, y, z):
    phi = math.acos(z / math.sqrt(x**2 + y**2 + z**2))
    theta = math.acos(x / math.sqrt(x**2 + y**2))
    plot(theta, phi, '.', color='red')
Besides, I tried to apply the code from the following SO question to my code; I want a very similar result, except that my data points are not randomly generated.
Shiuan,
I am still investigating your problem; however, a few notes:
Instead of looping and appending to an array, you could select every nth element:
# inside an IPython console:
In [2]: a = np.arange(0, 10)
In [3]: a[::2]  # here we select every 2nd element
Out[3]: array([0, 2, 4, 6, 8])
So instead of calculating Runge-Kutta on all elements of m:
new_m = m[::20]  # select every 20th element of m
now call your function like this:
def drawstaticplot(new_m, n, d_n, n_o):
    n = vector.rungekutta1(n, d_n)
    d_n = vector.rungekutta2(n, d_n, i)
    x1 = n[0]
    y1 = n[1]
    z1 = n[2]
    xarray.append(x1)
    yarray.append(y1)
    zarray.append(z1)
    ...
About appending and iterating over large data sets: append is slow in general, because it copies the whole array and then adds the new element. Instead, you already know the final size in advance, so you can preallocate:
def drawstaticplot(new_m, n, d_n, n_o):
    # create the storage up front;
    # notice I assumed the results have the size of new_m,
    # but you can change it.
    xarray = np.zeros(new_m.shape[0])
    yarray = np.zeros(new_m.shape[0])
    zarray = np.zeros(new_m.shape[0])
    for idx, item in enumerate(new_m):  # notice the function enumerate, make it your friend!
        n = vector.rungekutta1(n, d_n)
        d_n = vector.rungekutta2(n, d_n, item)
        # we don't need to check for the 20th element, new_m is already filtered...
        xarray[idx] = n[0]
        yarray[idx] = n[1]
        zarray[idx] = n[2]
        # is the second loop necessary? the zero-crossing check can happen here:
        if idx > 0 and (zarray[idx-1] - n_o) > 0 and (zarray[idx] - n_o) < 0:
            print(zarray[idx] - n_o)
            plotthetaphi(xarray[idx], yarray[idx], zarray[idx])
You can use the approach suggested here:
Efficiently create a density plot for high-density regions, points for sparse regions
i.e. a histogram where you have too many points, and individual points where the density is low.
You can also use matplotlib's rasterized flag, which speeds up rendering; a sketch follows below.
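A minimal sketch of the rasterized option (the theta and phi arrays here are placeholders for your own data):
import numpy as np
import matplotlib.pyplot as plt

# placeholder data standing in for your theta/phi values
theta = np.random.uniform(0, np.pi, 100000)
phi = np.random.uniform(0, np.pi, 100000)

# rasterized=True stores the dense point cloud as a bitmap inside an
# otherwise vector figure, which keeps saving and viewing fast
plt.plot(theta, phi, '.', markersize=1, rasterized=True)
plt.savefig('thetaphi.pdf', dpi=200)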