I'm trying to plot a 3rd-order polynomial, and two linear fits on the same set of data. My data looks like this:
,Frequency,Flux Density,log_freq,log_flux
0,1.25e+18,1.86e-07,18.096910013008056,-6.730487055782084
1,699000000000000.0,1.07e-06,14.84447717574568,-5.97061622231479
2,541000000000000.0,1.1e-06,14.73319726510657,-5.958607314841775
3,468000000000000.0,1e-06,14.670245853074125,-6.0
4,458000000000000.0,1.77e-06,14.660865478003869,-5.752026733638194
5,89400000000000.0,3.01e-05,13.951337518795917,-4.521433504406157
6,89400000000000.0,9.3e-05,13.951337518795917,-4.031517051446065
7,89400000000000.0,0.00187,13.951337518795917,-2.728158393463501
8,65100000000000.0,2.44e-05,13.813580988568193,-4.61261017366127
9,65100000000000.0,6.28e-05,13.813580988568193,-4.2020403562628035
10,65100000000000.0,0.00108,13.813580988568193,-2.96657624451305
11,25900000000000.0,0.000785,13.413299764081252,-3.1051303432547472
12,25900000000000.0,0.00106,13.413299764081252,-2.9746941347352296
13,25900000000000.0,0.000796,13.413299764081252,-3.099086932262331
14,13600000000000.0,0.00339,13.133538908370218,-2.469800301796918
15,13600000000000.0,0.00372,13.133538908370218,-2.4294570601181023
16,13600000000000.0,0.00308,13.133538908370218,-2.5114492834995557
17,12700000000000.0,0.00222,13.103803720955957,-2.653647025549361
18,12700000000000.0,0.00204,13.103803720955957,-2.6903698325741012
19,230000000000.0,0.133,11.361727836017593,-0.8761483590329142
22,90000000000.0,0.518,10.954242509439325,-0.28567024025476695
23,61000000000.0,1.0,10.785329835010767,0.0
24,61000000000.0,0.1,10.785329835010767,-1.0
25,61000000000.0,0.4,10.785329835010767,-0.3979400086720376
26,42400000000.0,0.8,10.627365856592732,-0.09691001300805639
27,41000000000.0,0.9,10.612783856719735,-0.045757490560675115
28,41000000000.0,0.7,10.612783856719735,-0.1549019599857432
29,41000000000.0,0.8,10.612783856719735,-0.09691001300805639
30,41000000000.0,0.6,10.612783856719735,-0.2218487496163564
31,41000000000.0,0.7,10.612783856719735,-0.1549019599857432
32,37000000000.0,1.0,10.568201724066995,0.0
33,36800000000.0,1.0,10.565847818673518,0.0
34,36800000000.0,0.98,10.565847818673518,-0.00877392430750515
35,33000000000.0,0.8,10.518513939877888,-0.09691001300805639
36,33000000000.0,1.0,10.518513939877888,0.0
37,31400000000.0,0.92,10.496929648073214,-0.036212172654444715
38,23000000000.0,1.4,10.361727836017593,0.146128035678238
39,23000000000.0,1.1,10.361727836017593,0.04139268515822508
40,23000000000.0,1.11,10.361727836017593,0.045322978786657475
41,23000000000.0,1.1,10.361727836017593,0.04139268515822508
42,22200000000.0,1.23,10.346352974450639,0.08990511143939793
43,22200000000.0,1.24,10.346352974450639,0.09342168516223506
44,21700000000.0,0.98,10.33645973384853,-0.00877392430750515
45,21700000000.0,1.07,10.33645973384853,0.029383777685209667
46,20000000000.0,1.44,10.301029995663981,0.15836249209524964
47,15400000000.0,1.32,10.187520720836464,0.12057393120584989
48,15000000000.0,1.5,10.176091259055681,0.17609125905568124
49,15000000000.0,1.5,10.176091259055681,0.17609125905568124
50,15000000000.0,1.42,10.176091259055681,0.15228834438305647
51,15000000000.0,1.43,10.176091259055681,0.1553360374650618
52,15000000000.0,1.42,10.176091259055681,0.15228834438305647
53,15000000000.0,1.47,10.176091259055681,0.1673173347481761
54,15000000000.0,1.38,10.176091259055681,0.13987908640123647
55,10700000000.0,2.59,10.02938377768521,0.4132997640812518
56,8870000000.0,2.79,9.947923619831727,0.44560420327359757
57,8460000000.0,2.69,9.927370363039023,0.42975228000240795
58,8400000000.0,2.8,9.924279286061882,0.4471580313422192
59,8400000000.0,2.53,9.924279286061882,0.40312052117581787
60,8400000000.0,2.06,9.924279286061882,0.31386722036915343
61,8300000000.0,2.58,9.919078092376074,0.41161970596323016
62,8080000000.0,2.76,9.907411360774587,0.4409090820652177
63,5010000000.0,3.68,9.699837725867246,0.5658478186735176
64,5000000000.0,0.81,9.698970004336019,-0.09151498112135022
65,5000000000.0,3.5,9.698970004336019,0.5440680443502757
66,5000000000.0,3.57,9.698970004336019,0.5526682161121932
67,4980000000.0,3.46,9.697229342759718,0.5390760987927766
68,4900000000.0,2.95,9.690196080028514,0.46982201597816303
69,4850000000.0,3.46,9.685741738602264,0.5390760987927766
70,4850000000.0,3.45,9.685741738602264,0.5378190950732742
71,4780000000.0,2.16,9.679427896612118,0.3344537511509309
72,4540000000.0,3.61,9.657055852857104,0.557507201905658
73,2700000000.0,3.5,9.431363764158988,0.5440680443502757
74,2700000000.0,3.7,9.431363764158988,0.568201724066995
75,2700000000.0,3.92,9.431363764158988,0.5932860670204573
76,2700000000.0,3.92,9.431363764158988,0.5932860670204573
77,2250000000.0,4.21,9.352182518111363,0.6242820958356683
78,1660000000.0,3.69,9.220108088040055,0.5670263661590603
79,1660000000.0,3.8,9.220108088040055,0.5797835966168101
80,1410000000.0,3.5,9.14921911265538,0.5440680443502757
81,1400000000.0,3.45,9.146128035678238,0.5378190950732742
82,1400000000.0,3.28,9.146128035678238,0.5158738437116791
83,1400000000.0,3.19,9.146128035678238,0.5037906830571811
84,1400000000.0,3.51,9.146128035678238,0.5453071164658241
85,1340000000.0,3.31,9.127104798364808,0.5198279937757188
86,1340000000.0,3.31,9.127104798364808,0.5198279937757188
87,750000000.0,3.14,8.8750612633917,0.49692964807321494
88,408000000.0,1.46,8.61066016308988,0.1643528557844371
89,408000000.0,1.46,8.61066016308988,0.1643528557844371
90,365000000.0,1.62,8.562292864456476,0.20951501454263097
91,365000000.0,1.56,8.562292864456476,0.1931245983544616
92,333000000.0,1.32,8.52244423350632,0.12057393120584989
93,302000000.0,1.23,8.48000694295715,0.08990511143939793
94,151000000.0,2.13,8.178976947293169,0.3283796034387377
95,73800000.0,3.58,7.868056361823042,0.5538830266438743
and my code is
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import numpy.polynomial.polynomial as poly
def find_extrema(poly, bounds):
    '''
    Finds the extrema of the polynomial; ensure real.
    https://stackoverflow.com/questions/72932816/python-finding-local-maxima-minima-for-multiple-polynomials-efficiently
    '''
    deriv = poly.deriv()
    extrema = deriv.roots()
    # Filter out complex roots
    extrema = extrema[np.isreal(extrema)]
    # Get real part of root
    extrema = np.real(extrema)
    # Apply bounds check
    lb, ub = bounds
    extrema = extrema[(lb <= extrema) & (extrema <= ub)]
    return extrema
def find_maximum(poly, bounds):
    '''
    Find the maximum point; returns the value of the turnover frequency.
    https://stackoverflow.com/questions/72932816/python-finding-local-maxima-minima-for-multiple-polynomials-efficiently
    '''
    extrema = find_extrema(poly, bounds)
    # Either bound could end up being the maximum. Check those too.
    extrema = np.concatenate((extrema, bounds))
    value_at_extrema = poly(extrema)
    maximum_index = np.argmax(value_at_extrema)
    return extrema[maximum_index]
# LOAD THE DATA FROM FILE HERE
# CARRY ON...
xvar = 'log_freq'
yvar = 'log_flux'
x, y = pks[xvar], pks[yvar]
lower = min(x)
upper = max(x)
# Find the 3rd-order polynomial which fits the SED
coefs = poly.polyfit(x, y, 3) # find the coeffs
x_new = np.linspace(lower, upper, num=len(x)*10) # space to plot the fit
ffit = poly.Polynomial(coefs) # find the polynomial
# Find turnover frequency and peak flux
nu_to = find_maximum(ffit, (lower, upper))
F_p = ffit(nu_to)
# HERE'S THE TRICKY BIT
# Find the straight line to fit to the left of nu_to
left_linefit = poly.polyfit(x, y, 1)
x_left = np.linspace(lower, nu_to, num=len(x)*10) # space to plot the fit
ffit_left = poly.Polynomial(left_linefit,
                            domain = (lower, nu_to)
                            )
# PLOTS THE POLYNOMIAL WELL
ax1 = plt.subplot(1, 1, 1)
ax1.scatter(pks[xvar], pks[yvar], label = 'PKS 0742+10', c = 'b')
ax1.plot(x_new, ffit(x_new), color = 'r')
ax1.plot(x_left, ffit_left(x_left), color = 'gold')
ax1.set_yscale('linear')
ax1.set_xscale('linear')
ax1.legend()
ax1.set_xlabel(r'$\log\nu$ ($\nu$ in Hz)')
ax1.set_ylabel(r'$\log F_{\nu}$ ($F_{\nu}$ in Jy)')
ax1.grid(axis = 'both', which = 'major')
The code produces the poly fit well:
I'm trying to plot the straight-line fits for the points on either side of the maximum, as shown schematically below:
I thought I could do it with
ffit_left = poly.Polynomial(left_linefit,
                            domain = (lower, nu_to)
                            )
and similar for ffit_right, but that produces
which is actually the straight-line fit for the whole dataset, plotted only for that domain. I don't want to manipulate the dataset, because eventually I'll have to do it on a lot of datasets.
The fitting part of the code comes from an answer to this question .
How can I fit a straight line to just a subset of the points without manipulating the dataset?
My guess is that I have to make left_linefit = poly.polyfit(x, y, 1) recognise a domain, but I can't see anything in the numpy polyfit docs.
Sorry for the long question!
I am not sure I understand your request correctly. If you want to fit a piecewise function made of three linear segments, a method is described in https://fr.scribd.com/document/380941024/Regression-par-morceaux-Piecewise-Regression-pdf with theory and numerical examples.
Several cases are considered. Among them the case below might be convenient for you.
H(*) is the Heaviside step function.
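For the original question, a simpler alternative to full piecewise regression might be to select each side of the turnover with a boolean mask and pass only those points to polyfit. The mask leaves the dataset itself untouched. Note that the domain argument of Polynomial rescales the input before evaluation; it does not restrict which points are fitted. The sketch below is only an illustration under stated assumptions: the synthetic data stands in for the log_freq and log_flux columns, and the names left, right, line_left and line_right are made up for this example.

```python
import numpy as np
import numpy.polynomial.polynomial as poly

# Synthetic stand-in for the (log_freq, log_flux) columns: rises, then falls.
rng = np.random.default_rng(0)
x = np.linspace(8.0, 14.0, 60)
y = -0.3 * (x - 10.0) ** 2 + 0.05 * rng.standard_normal(x.size)

# Cubic fit and the location of its maximum inside the data range (the turnover).
ffit = poly.Polynomial(poly.polyfit(x, y, 3))
roots = ffit.deriv().roots()
roots = np.real(roots[np.isreal(roots)])
inside = roots[(roots >= x.min()) & (roots <= x.max())]
candidates = np.concatenate((inside, [x.min(), x.max()]))
nu_to = candidates[np.argmax(ffit(candidates))]

# Boolean masks pick the points on each side; x and y are not modified.
left = x <= nu_to
right = x > nu_to

# Fit a straight line to each subset only.
line_left = poly.Polynomial(poly.polyfit(x[left], y[left], 1))
line_right = poly.Polynomial(poly.polyfit(x[right], y[right], 1))

# coef[1] is the slope of each line.
print(nu_to, line_left.coef[1], line_right.coef[1])
```

Each line can then be plotted over its own range, for example with np.linspace(x.min(), nu_to) for the left one.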
I'm wondering if there's a way I can find the range of local maxima of a histogram. For instance, suppose I have the following histogram (just ignore the orange curve):
The histogram is actually obtained from a dictionary. I'm hoping to find the range of local maxima of this histogram (on the horizontal axis), which are, say, 1.3-1.6, and 2.1-2.4 in this case. I have no idea which tools would be helpful or which techniques I may want to use. I know there's a tool to find local maxima of a 1-D array:
from scipy.signal import argrelextrema
x = np.random.random(12)
argrelextrema(x, np.greater)
but I don't think it would work here since I'm looking for a range, and there're some 'wiggles' on the histogram. Can anyone give me some suggestions/examples about how I can obtain the range I'm looking for? Thanks a lot for the help
PS: I'm trying not to just search for the ranges of x whose y values are above a certain limit :)
I don't know if I correctly understand what you want to do, but you can treat the histogram as a Probability Density Function (PDF) of a bimodal distribution, then find the modes and the Highest Density Intervals (HDIs) around the two modes.
So, I create some sample data
import numpy as np
import pandas as pd
import scipy.stats as sps
from scipy.signal import find_peaks, argrelextrema
import matplotlib.pyplot as plt
d1 = sps.norm(loc=1.3, scale=.2)
d2 = sps.norm(loc=2.2, scale=.3)
r1 = d1.rvs(size=5000, random_state=1)
r2 = d2.rvs(size=5000, random_state=1)
r = np.concatenate((r1, r2))
h = plt.hist(r, bins=100, density=True);
We have only h, the result of the hist function, which contains the densities (100 values) and the bin edges (101 values).
print(h[0].size)
100
print(h[1].size)
101
So we first need to compute the center of each bin
density = h[0]
values = h[1][:-1] + np.diff(h[1])[0] / 2
plt.hist(r, bins=100, density=True, alpha=.25)
plt.plot(values, density);
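The expression above takes the first bin width and so relies on all bins being equally wide. A slightly more general sketch, which should also hold for unequal bins, averages neighbouring edges. It uses np.histogram instead of plt.hist, and the sample data and names are illustrative assumptions.

```python
import numpy as np

# Illustrative sample; any 1-D data would do.
rng = np.random.default_rng(1)
samples = rng.normal(loc=0.0, scale=1.0, size=1000)

# np.histogram returns the densities and the bin edges (one more edge than bins).
density, edges = np.histogram(samples, bins=20, density=True)

# The center of each bin is the average of its left and right edge.
centers = 0.5 * (edges[:-1] + edges[1:])

print(density.size, edges.size, centers.size)
```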
Now we can normalize the PDF (so that it sums to 1) and smooth the data with a moving average, which we'll use only to find the peaks (maxima) and minima
norm_density = density / density.sum()
norm_density_ma = pd.Series(norm_density).rolling(7, center=True).mean().values
plt.plot(values, norm_density_ma)
plt.plot(values, norm_density);
Now we can obtain indexes of maxima
peaks = find_peaks(norm_density_ma)[0]
peaks
array([24, 57])
and minima
minima = argrelextrema(norm_density_ma, np.less)[0]
minima
array([40])
and check they're correct
plt.plot(values, norm_density_ma)
plt.plot(values, norm_density)
for peak in peaks:
    plt.axvline(values[peak], color='r')
plt.axvline(values[minima], color='k', ls='--');
Finally, we have to find out the HDIs around the two modes (peaks) from the normalized h histogram data. We can use a simple function to get the HDI of grid (see HDI_of_grid for details and Doing Bayesian Data Analysis by John K. Kruschke)
def HDI_of_grid(probMassVec, credMass=0.95):
    sortedProbMass = np.sort(probMassVec, axis=None)[::-1]
    HDIheightIdx = np.min(np.where(np.cumsum(sortedProbMass) >= credMass))
    HDIheight = sortedProbMass[HDIheightIdx]
    HDImass = np.sum(probMassVec[probMassVec >= HDIheight])
    idx = np.where(probMassVec >= HDIheight)[0]
    return {'indexes':idx, 'mass':HDImass, 'height':HDIheight}
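To see what the function does, it may help to run the same idea on a tiny probability vector. The sketch below is a restatement with a hypothetical name (hdi_of_grid) and a made-up six-bin example. It sorts the masses in descending order, finds the height at which the cumulative mass first reaches the requested level, and keeps every bin at least that tall.

```python
import numpy as np

def hdi_of_grid(prob_mass, cred_mass=0.95):
    """Illustrative restatement: indices, mass and height of the HDI on a grid."""
    # Largest masses first.
    sorted_mass = np.sort(prob_mass)[::-1]
    # First position where the cumulative mass reaches cred_mass
    # (assumes the total mass is at least cred_mass).
    height_idx = np.argmax(np.cumsum(sorted_mass) >= cred_mass)
    height = sorted_mass[height_idx]
    # Keep every bin whose mass is at least that height.
    idx = np.where(prob_mass >= height)[0]
    return idx, prob_mass[idx].sum(), height

p = np.array([0.05, 0.10, 0.40, 0.30, 0.10, 0.05])
idx, mass, height = hdi_of_grid(p, cred_mass=0.6)

# Bins 2 and 3 are selected: 0.40 alone is below 0.6, 0.40 + 0.30 exceeds it.
print(idx, mass, height)
```

The returned mass (about 0.7 here) can exceed the requested level, because whole bins are added until the level is reached.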
Let's say we want the HDIs to contain a mass of 0.3
# HDI around the 1st mode
hdi1 = HDI_of_grid(norm_density, credMass=.3)
plt.plot(values, norm_density_ma)
plt.plot(values, norm_density)
plt.fill_between(
    values[hdi1['indexes']],
    0, norm_density[hdi1['indexes']],
    alpha=.25
)
for peak in peaks:
    plt.axvline(values[peak], color='r')
For the 2nd mode, we'll compute the HDI only on the part of the histogram to the right of the minimum, so that the 1st mode is excluded
# HDI around the 2nd mode
hdi2 = HDI_of_grid(norm_density[minima[0]:], credMass=.3)
plt.plot(values, norm_density_ma)
plt.plot(values, norm_density)
plt.fill_between(
    values[hdi1['indexes']],
    0, norm_density[hdi1['indexes']],
    alpha=.25
)
plt.fill_between(
    values[hdi2['indexes']+minima],
    0, norm_density[hdi2['indexes']+minima],
    alpha=.25
)
for peak in peaks:
    plt.axvline(values[peak], color='r')
And we have the values of the two HDIs
# 1st mode
values[peaks[0]]
1.320249129265321
# 0.3 HDI
values[hdi1['indexes']].take([0, -1])
array([1.12857599, 1.45715851])
# 2nd mode
values[peaks[1]]
2.2238510564735363
# 0.3 HDI
values[hdi2['indexes']+minima].take([0, -1])
array([1.95003229, 2.47028795])
I have a TOF spectrum and I would like to implement an algorithm using python (numpy) that finds all the maxima of the spectrum and returns the corresponding x values.
I have looked up online and I found the algorithm reported below.
The assumption here is that near the maximum the difference between the value before and the value at the maximum is bigger than a number DELTA. The problem is that my spectrum is composed of points equally distributed, even near the maximum, so that DELTA is never exceeded and the function peakdet returns an empty array.
Do you have any idea how to overcome this problem? I would really appreciate comments to help me understand the code better, since I am quite new to Python.
Thanks!
import sys
# Lowercase nan and inf are used because the NaN and Inf aliases were removed in NumPy 2.0
from numpy import nan, inf, arange, isscalar, asarray, array
def peakdet(v, delta, x = None):
    maxtab = []
    mintab = []
    if x is None:
        x = arange(len(v))
    v = asarray(v)
    if len(v) != len(x):
        sys.exit('Input vectors v and x must have same length')
    if not isscalar(delta):
        sys.exit('Input argument delta must be a scalar')
    if delta <= 0:
        sys.exit('Input argument delta must be positive')
    # Running minimum and maximum, and the x positions where they occurred
    mn, mx = inf, -inf
    mnpos, mxpos = nan, nan
    lookformax = True
    for i in arange(len(v)):
        this = v[i]
        if this > mx:
            mx = this
            mxpos = x[i]
        if this < mn:
            mn = this
            mnpos = x[i]
        if lookformax:
            # Accept the running maximum once the signal has dropped delta below it
            if this < mx-delta:
                maxtab.append((mxpos, mx))
                mn = this
                mnpos = x[i]
                lookformax = False
        else:
            # Accept the running minimum once the signal has risen delta above it
            if this > mn+delta:
                mintab.append((mnpos, mn))
                mx = this
                mxpos = x[i]
                lookformax = True
    return array(maxtab), array(mintab)
Below is shown part of the spectrum. I actually have more peaks than those shown here.
This, I think, could work as a starting point. I'm not a signal-processing expert, but I tried this on a generated signal Y that looks quite like yours, and on one with much more noise:
from scipy.signal import convolve
import numpy as np
from matplotlib import pyplot as plt
#Obtaining derivative
kernel = [1, 0, -1]
dY = convolve(Y, kernel, 'valid')
#Checking for sign-flipping
S = np.sign(dY)
ddS = convolve(S, kernel, 'valid')
#These candidates are basically all negative slope positions
#Add one since using 'valid' shrinks the arrays
candidates = np.where(dY < 0)[0] + (len(kernel) - 1)
#Here they are filtered on actually being the final such position in a run of
#negative slopes
peaks = sorted(set(candidates).intersection(np.where(ddS == 2)[0] + 1))
plt.plot(Y)
#If you need a simple filter on peak size you could use:
alpha = -0.0025
peaks = np.array(peaks)[Y[peaks] < alpha]
plt.scatter(peaks, Y[peaks], marker='x', color='g', s=40)
The sample outcomes:
For the noisy one, I filtered peaks with alpha:
If the alpha needs more sophistication you could try dynamically setting alpha from the peaks discovered using e.g. assumptions about them being a mixed gaussian (my favourite being the Otsu threshold, exists in cv and skimage) or some sort of clustering (k-means could work).
And for reference, this I used to generate the signal:
Y = np.zeros(1000)
def peaker(Y, alpha=0.01, df=2, loc=-0.005, size=-.0015, threshold=0.001, decay=0.5):
    peaking = False
    for i, v in enumerate(Y):
        if not peaking:
            # Start a new peak with probability alpha
            peaking = np.random.random() < alpha
            if peaking:
                Y[i] = loc + size * np.random.chisquare(df=df)
                continue
        elif Y[i - 1] < threshold:
            peaking = False
        if i > 0:
            # Let the previous value decay
            Y[i] = Y[i - 1] * decay
peaker(Y)
EDIT: Support for degrading base-line
I simulated a slanting base-line by doing this:
Z = np.log2(np.arange(Y.size) + 100) * 0.001
Y = Y + Z[::-1] - Z[-1]
Then to detect with a fixed alpha (note that I changed sign on alpha):
from scipy.signal import medfilt
alpha = 0.0025
Ybase = medfilt(Y, 51) # 51 should be large in comparison to your peak X-axis lengths and an odd number.
peaks = np.array(peaks)[Ybase[peaks] - Y[peaks] > alpha]
Resulting in the following outcome (the base-line is plotted as dashed black line):
EDIT 2: Simplification and a comment
I simplified the code to use one kernel for both convolutions, as @skymandr commented. This also removed the magic number used to adjust for the shrinkage, so any kernel size should work.
As for the choice of "valid" as the convolve mode: it would probably have worked just as well with "same", but I chose "valid" so I didn't have to think about the edge conditions and whether the algorithm could detect spurious peaks there.
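The difference between the two modes can be seen on a small array. The sketch below uses np.convolve rather than scipy.signal.convolve, on the assumption that the two treat "valid" and "same" the same way for 1-D input; the quadratic test signal is made up for illustration.

```python
import numpy as np

Y = np.arange(10.0) ** 2      # made-up signal: 0, 1, 4, 9, ...
kernel = [1, 0, -1]           # same difference kernel as in the answer

# 'valid' keeps only positions where the kernel fully overlaps the signal,
# so the result is shorter by len(kernel) - 1.
d_valid = np.convolve(Y, kernel, mode='valid')

# 'same' keeps the input length by padding with zeros at both ends,
# so the first and last values are not true differences.
d_same = np.convolve(Y, kernel, mode='same')

print(d_valid.size, d_same.size)
print(d_valid)                # equals Y[2:] - Y[:-2]
print(d_same[0], d_same[-1])  # edge values affected by the zero padding
```

Away from the edges the two results agree, which is why adding len(kernel) - 1 to the "valid" indices is enough to line them up with the original signal.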
As of SciPy version 1.1, you can also use find_peaks:
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import find_peaks
np.random.seed(0)
Y = np.zeros(1000)
# insert @deinonychusaur's peaker function here
peaker(Y)
# make data noisy
Y = Y + 10e-4 * np.random.randn(len(Y))
# find_peaks gets the maxima, so we multiply our signal by -1
Y *= -1
# get the actual peaks
peaks, _ = find_peaks(Y, height=0.002)
# multiply back for plotting purposes
Y *= -1
plt.plot(Y)
plt.plot(peaks, Y[peaks], "x")
plt.show()
This will plot (note that we use height=0.002 which will only find peaks higher than 0.002):
In addition to height, we can also set the minimal distance between two peaks. If you use distance=100, the plot then looks as follows:
You can use
peaks, _ = find_peaks(Y, height=0.002, distance=100)
in the code above.
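A toy array may make the effect of the two parameters easier to follow. This is only an illustrative sketch with made-up numbers: height keeps peaks at least that tall, and distance then drops peaks that are too close to a taller neighbour.

```python
import numpy as np
from scipy.signal import find_peaks

# Made-up signal with local maxima at indices 1, 3, 5 and 7.
signal = np.array([0, 1, 0, 3, 0, 2, 0, 5, 0], dtype=float)

# No conditions: every local maximum is returned.
all_peaks, _ = find_peaks(signal)

# Only peaks whose value is at least 2.
tall, _ = find_peaks(signal, height=2)

# Additionally require at least 3 samples between neighbouring peaks;
# the smaller peak at index 5 is dropped in favour of its taller neighbours.
spaced, _ = find_peaks(signal, height=2, distance=3)

print(all_peaks, tall, spaced)
```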
After looking at the answers and suggestions I decided to offer a solution I often use because it is straightforward and easier to tweak.
It uses a sliding window and counts how many times a local peak appears as the maximum as the window shifts along the x-axis. As @DrV suggested, no universal definition of "local maximum" exists, meaning that some tuning parameters are unavoidable. This function uses "window size" and "frequency" (the peak_threshold argument) to fine-tune the outcome. Window size is measured in number of data points of the independent variable (x), and frequency sets how sensitive peak detection should be (also expressed as a number of data points; lower values of frequency produce more peaks and vice versa). The main function is here:
def peak_finder(x0, y0, window_size, peak_threshold):
    # extend x, y using window size
    y = numpy.concatenate([y0, numpy.repeat(y0[-1], window_size)])
    x = numpy.concatenate([x0, numpy.arange(x0[-1], x0[-1]+window_size)])
    local_max = numpy.zeros(len(x0))
    for ii in range(len(x0)):
        local_max[ii] = x[y[ii:(ii + window_size)].argmax() + ii]
    u, c = numpy.unique(local_max, return_counts=True)
    i_return = numpy.where(c>=peak_threshold)[0]
    return(list(zip(u[i_return], c[i_return])))
along with a snippet used to produce the figure shown below:
import numpy
from matplotlib import pyplot
def plot_case(axx, w_f):
    p = peak_finder(numpy.arange(0, len(Y)), -Y, w_f[0], w_f[1])
    r = .9*min(Y)/10
    axx.plot(Y)
    for ip in p:
        axx.text(ip[0], r + Y[int(ip[0])], int(ip[0]),
                 rotation=90, horizontalalignment='center')
    yL = pyplot.gca().get_ylim()
    axx.set_ylim([1.15*min(Y), yL[1]])
    axx.set_xlim([-50, 1100])
    axx.set_title(f'window: {w_f[0]}, count: {w_f[1]}', loc='left', fontsize=10)
    return(None)
window_frequency = {1:(15, 15), 2:(100, 100), 3:(100, 5)}
f, ax = pyplot.subplots(1, 3, sharey='row', figsize=(9, 4),
                        gridspec_kw = {'hspace':0, 'wspace':0, 'left':.08,
                                       'right':.99, 'top':.93, 'bottom':.06})
for k, v in window_frequency.items():
    plot_case(ax[k-1], v)
pyplot.show()
Three cases show parameter values that render (from left to right panel):
(1) too many, (2) too few, and (3) an intermediate amount of peaks.
To generate the Y data, I used the function @deinonychusaur gave above, and added some noise to it as in @Cleb's answer.
I hope some might find this useful, but its efficiency primarily depends on actual peak shapes and distances.
Finding a minimum or a maximum is not that simple, because there is no universal definition for "local maximum".
Your code seems to look for a maximum and then accept it as a maximum if, after the maximum, the signal falls below the maximum minus some delta value. After that it starts to look for a minimum with similar criteria. It does not really matter if your data falls or rises slowly, as the maximum is recorded when it is reached and appended to the list of maxima once the level falls below the hysteresis threshold.
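The point about slowly varying data can be checked with a small sketch. The function below is a simplified, hypothetical restatement of the maxima half of that hysteresis logic (the name hysteresis_maxima and the sine test signal are assumptions, not the original code). The step between neighbouring samples is far smaller than delta, yet both maxima are still reported.

```python
import numpy as np

def hysteresis_maxima(v, delta):
    """Indices of maxima confirmed by a later drop of more than delta."""
    maxima = []
    mx, mx_idx = -np.inf, -1
    mn = np.inf
    look_for_max = True
    for i, val in enumerate(v):
        # Track the running maximum and minimum.
        if val > mx:
            mx, mx_idx = val, i
        if val < mn:
            mn = val
        if look_for_max:
            # Confirm the running maximum once the signal is delta below it.
            if val < mx - delta:
                maxima.append(mx_idx)
                mn = val
                look_for_max = False
        elif val > mn + delta:
            # A minimum is confirmed; start looking for the next maximum.
            mx, mx_idx = val, i
            look_for_max = True
    return maxima

# Two periods of a sine, finely sampled: neighbouring points differ by about 0.013.
t = np.linspace(0.0, 2.0, 1001)
v = np.sin(2 * np.pi * t)
largest_step = np.max(np.abs(np.diff(v)))

maxima_idx = hysteresis_maxima(v, delta=0.5)
print(largest_step, maxima_idx, t[maxima_idx])
```

If such a function returns nothing on real data, a likely cause is that delta is larger than the total drop after each peak, not the point-to-point difference.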
This is a possible way to find local minima and maxima, but it has several shortcomings. One of them is that the method is not symmetric, i.e. if the same data is run backwards, the results are not necessarily the same.
Unfortunately, I cannot help much more, because the correct method really depends on the data you are looking at, its shape and its noisiness. If you have some samples, then we might be able to come up with some suggestions.