scipy.signal peak_widths not evaluating at correct index - python

I'm trying to find the FWHM of a curve I've generated. This is the code for the curve and a picture of what it looks like.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
from scipy.signal import peak_widths, find_peaks
k = np.load('/data/Var/test.npy')
x = np.load('/data/Var/sigrng.npy')
peakk = find_peaks(k)
kfwhm = peak_widths(k, peakk[0], rel_height=0.5)
plt.plot(x, k, ls='--', color='red')
plt.show()
Which produces this curve:
However, when I print out the output from what is supposed to be the FWHM and plot it on the curve it's not evaluating on the actual curve and is giving values much larger than what I expect.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
from scipy.signal import peak_widths, find_peaks
k = np.load('/data/Var/test.npy')
x = np.load('/data/Var/sigrng.npy')
peakk = find_peaks(k)
kfwhm = peak_widths(k, peakk[0], rel_height=0.5)
plt.plot(x, k, ls='--', color='red')
plt.hlines(*kfwhm[1:], color="red")
plt.show()
Can anyone see anything I'm doing wrong that could cause this? When checking where the peak in the curve is manually the find_peaks function is working correctly.

I worked out the problem. peak_widths knows nothing about the x values the curve is plotted against: it reports the widths and intersection points in samples (array indices). Since my x axis runs from 0 to 1, I simply divide those values by the size of the array (here k.size), which fixes the issue. Code below:
peakk, _ = find_peaks(k)
kfwhm = peak_widths(k, peakk, rel_height=0.5)
kfwhm = np.asarray(kfwhm)
plt.plot(x, k, ls='--', color='red')
plt.hlines(0.5, *(kfwhm[2:]/k.size), color="blue")
plt.show()
Giving:
and a FWHM of:
print(kfwhm[0]/k.size)
Out[384]: array([0.02415405])
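Dividing by k.size only works because x spans 0 to 1. A more general option, sketched here under the assumption that x is monotonically increasing and has the same length as k, is to map the fractional sample indices returned by peak_widths onto x with np.interp. The synthetic Gaussian below is only a stand-in for the arrays loaded from disk in the question.

```python
import numpy as np
from scipy.signal import find_peaks, peak_widths

# Synthetic stand-in for the curve in the question (assumption: x is increasing).
x = np.linspace(0.0, 1.0, 2001)
sigma = 0.05
k = np.exp(-(x - 0.4) ** 2 / (2 * sigma ** 2))

peaks, _ = find_peaks(k)
widths, heights, left_ips, right_ips = peak_widths(k, peaks, rel_height=0.5)

# left_ips / right_ips are fractional sample indices; map them onto x.
idx = np.arange(k.size)
left_x = np.interp(left_ips, idx, x)
right_x = np.interp(right_ips, idx, x)
fwhm_x = right_x - left_x

print(fwhm_x)  # width in the units of x
# To draw it on the curve: plt.hlines(heights, left_x, right_x)
```

For a Gaussian the result should be close to 2*sqrt(2*ln 2)*sigma, which is a convenient sanity check.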

Related

Mark the maximum deviation/distance between curves

I just want to mark where the maximum deviation between two curves occurs, using matplotlib. Please help me.
The vertical distance is for the Kolmogorov–Smirnov test.
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
import scipy.stats as stats
#------------------------------------
data=np.random.uniform(low=1,high=10,size=300)
standardized_data=np.sort(data-np.mean(data))/np.std(data)
probs=np.arange(1.0, 301)/300
plt.plot(standardized_data,probs) #curve1
plt.plot(stats.norm.ppf(probs),probs) #curve2
plt.show()
output:
like this
or like this
You would need to interpolate one of the curves on the x values of the other. Then find the maximum of their difference.
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
#------------------------------------
data=np.random.uniform(low=1,high=10,size=300)
x1= np.sort(data-np.mean(data))/np.std(data)
y = np.arange(1.0, 301)/300
x2 = stats.norm.ppf(y)
yc = np.interp(x1, x2, y)
ind_max = np.argmax((yc-y)**2)
plt.plot(x1, y) #curve1
plt.plot(x2, y) #curve2
plt.axvline(x1[ind_max], color="red", linestyle="dashed", alpha=0.4)
plt.plot([x1[ind_max], x1[ind_max]], [y[ind_max], yc[ind_max]], color="red")
plt.show()
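Since curve 2 in this particular question is the standard normal CDF, an alternative sketch is to evaluate that CDF directly at the sample points instead of interpolating. This assumes the reference curve has a closed-form CDF, which will not be true in general; the fixed seed is also an assumption, added only for reproducibility.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)  # fixed seed (assumption, for reproducibility)
data = rng.uniform(low=1, high=10, size=300)
x1 = np.sort(data - np.mean(data)) / np.std(data)  # standardized, sorted sample
y = np.arange(1.0, x1.size + 1) / x1.size          # empirical CDF at x1

cdf = stats.norm.cdf(x1)     # curve 2 evaluated directly at x1
gap = np.abs(y - cdf)        # vertical distance at every sample point
ind_max = np.argmax(gap)

print(x1[ind_max], gap[ind_max])
# To mark it:
# plt.plot([x1[ind_max]] * 2, [y[ind_max], cdf[ind_max]], color="red")
```

The largest gap found this way can be smaller than the statistic reported by scipy.stats.kstest by up to 1/n, because kstest also checks the empirical CDF just before each step.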

Marking y value using dotted line in matplotlib.pyplot

I am trying to plot a graph using matplotlib.pyplot.
import matplotlib.pyplot as plt
import numpy as np
x = [i for i in range (1,201)]
y = np.loadtxt('final_fscore.txt', dtype=np.float128)
plt.plot(x, y, lw=2)
plt.show()
It looks something like this:
I want to mark the first value of x where y reaches its maximum (which is already known, say x = 23, y = y[23]), as in the figure below:
I have been searching for this for some time now, with little success. For now I have tried adding a straight line, which does not behave as desired:
import matplotlib.pyplot as plt
import numpy as np
x = [i for i in range (1,201)]
y = np.loadtxt('final_fscore.txt', dtype=np.float128)
plt.plot(x, y, lw=2)
plt.plot([23,y[23]], [23,0])
plt.show()
Resulting graph:
Note: I want to make the figure like in the second graph.
It's not clear what y[23] would do here. You would need to find out the maximum value and the index at which this occurs (np.argmax). You may then use this to plot a 3 point line with those coordinates.
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(9)
x = np.arange(200)
y = np.cumsum(np.random.randn(200))
plt.plot(x, y, lw=2)
amax = np.argmax(y)
xlim,ylim = plt.xlim(), plt.ylim()
# line from the bottom of the axes up to the maximum, then left to the y axis
plt.plot([x[amax], x[amax], xlim[0]], [ylim[0], y[amax], y[amax]],
         linestyle="--")
plt.xlim(xlim)
plt.ylim(ylim)
plt.show()
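One detail worth checking with a toy array: np.argmax returns the index of the first occurrence of the maximum, and with x starting at 1 (as in the question) the x value is not the same number as the index. The scores below are hypothetical stand-ins for the contents of final_fscore.txt.

```python
import numpy as np

# Hypothetical scores standing in for final_fscore.txt.
y = np.array([0.2, 0.5, 0.9, 0.7, 0.9, 0.4])
x = np.arange(1, y.size + 1)   # x starts at 1, as in the question

amax = np.argmax(y)            # index of the first occurrence of the maximum
print(amax, x[amax], y[amax])  # 2 3 0.9
```

So the point to mark is (x[amax], y[amax]), not (amax, y[amax]).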

Problems with unpacking Matplotlib hist2d outputs

I'm using Matplotlib's function hist2d() and I want to unpack the output in order to further use it. Here's what I do: I simply load with numpy a 2-column file containing my data and use the following code
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
import numpy as np
traj = np.loadtxt('trajectory.txt')
x = traj[:,0]
y = traj[:,1]
M, xe, ye, img = plt.hist2d(x, y, bins = 80, norm = LogNorm())
plt.imshow(M)
plt.show()
The result I get is the following:
Instead, if I try to directly plot the hist2d results without unpacking them:
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
import numpy as np
traj = np.loadtxt('trajectory.txt')
x = traj[:,0]
y = traj[:,1]
plt.hist2d(x, y, bins = 80, norm = LogNorm())
plt.show()
I get the whole plot without the strange blue box. What am I doing wrong?
You can create a histogram plot directly with plt.hist2d. This calculates the histogram and plots it to the current axes. There is no need to show it yet another time using imshow.
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
import numpy as np; np.random.seed(9)
x = np.random.rayleigh(size=9900)
y = np.random.rayleigh(size=9900)
M, xe, ye, img = plt.hist2d(x, y, bins = 80, norm = LogNorm())
plt.show()
Or, you may first calculate the histogram and afterwards plot the result as an image to the current axes. Note that the histogram produced by numpy is transposed (see the question "Matplotlib 2D histogram seems transposed"), making it necessary to call imshow(M.T). Also note that, to obtain the correct axis labeling, you need to set imshow's extent to the extremal values of the xe and ye edge arrays.
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
import numpy as np; np.random.seed(9)
x = np.random.rayleigh(size=9900)
y = np.random.rayleigh(size=9900)
M, xe, ye = np.histogram2d(x, y, bins = 80)
extent = [xe[0], xe[-1], ye[0], ye[-1]]
plt.imshow(M.T, extent=extent, norm = LogNorm(), origin="lower")
plt.show()
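To see why the transpose is needed, here is a tiny check with made-up points: np.histogram2d puts the x bins along the first axis (rows) of M, whereas imshow draws rows along the vertical axis.

```python
import numpy as np

# Three made-up points that all fall in the bin with large x and small y.
x = np.array([0.9, 0.9, 0.9])
y = np.array([0.1, 0.1, 0.1])
M, xe, ye = np.histogram2d(x, y, bins=2, range=[[0, 1], [0, 1]])

print(M)
# [[0. 0.]
#  [3. 0.]]
```

The count sits at M[1, 0], i.e. second x bin, first y bin. After M.T and origin="lower" it lands in the lower right corner of the image, where a point with large x and small y belongs.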

Plot a density function above a histogram

In Python, I have estimated the parameters for the density of a model of my distribution and I would like to plot the density function above the histogram of the distribution. In R it is similar to using the option prob=TRUE.
import numpy as np
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
# initialization of the list "data"
# estimation of the parameter, in my case, mean and variance of a normal distribution
plt.hist(data, bins="auto") # data is the list of data
# here I would like to draw the density above the histogram
plt.show()
I guess the trickiest part is to make it fit.
Edit: I have tried this according to the first answer:
mean = np.mean(logdata)
var = np.var(logdata)
std = np.sqrt(var) # standard deviation, used by numpy as a replacement of the variance
plt.hist(logdata, bins="auto", alpha=0.5, label="empirical data")
x = np.linspace(min(logdata), max(logdata), 100)
plt.plot(x, mlab.normpdf(x, mean, std))
plt.xlabel("log(file size)")
plt.ylabel("number of files")
plt.legend(loc='upper right')
plt.grid(True)
plt.show()
But it doesn't fit the graph, here is how it looks:
** Edit 2 ** Works with the option normed=True in the histogram function.
If I understand you correctly, you have the mean and standard deviation of some data. You have plotted a histogram of the data and would like to draw the normal density curve over it. This curve can be generated with scipy.stats.norm.pdf() (older Matplotlib releases also offered matplotlib.mlab.normpdf(), which has since been removed).
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt
mean = 100
sigma = 5
data = np.random.normal(mean,sigma,1000) # generate fake data
x = np.linspace(min(data), max(data), 100)
plt.hist(data, bins="auto", density=True) # density=True was spelled normed=True in older Matplotlib
plt.plot(x, norm.pdf(x, mean, sigma))
plt.show()
Which gives the following figure:
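Edit: The above only works when the histogram is normalized (density=True, formerly normed=True). If this is not an option, we can define our own function and scale it to the histogram:
import numpy as np
import matplotlib.pyplot as plt
def gauss_function(x, a, x0, sigma):
    return a * np.exp(-(x - x0) ** 2 / (2 * sigma ** 2))
mean = 100
sigma = 5
data = np.random.normal(mean,sigma,1000) # generate fake data
x = np.linspace(min(data), max(data), 1000)
counts, bins, patches = plt.hist(data, bins="auto")
# scale the curve to the tallest bar (max(data) is a data value, not a count)
test = gauss_function(x, max(counts), mean, sigma)
plt.plot(x, test)
plt.show()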
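Another option when the histogram shows raw counts, sketched here with a fixed seed and made-up parameters: multiply the probability density by the number of samples and the bin width, which turns a density into an expected count per bin. This assumes equal-width bins.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)   # fixed seed (assumption, for reproducibility)
mean, sigma = 100, 5
data = rng.normal(mean, sigma, 1000)

counts, edges = np.histogram(data, bins="auto")
bin_width = edges[1] - edges[0]  # equal-width bins assumed

x = np.linspace(edges[0], edges[-1], 1000)
# density -> expected number of samples per bin
curve = norm.pdf(x, mean, sigma) * data.size * bin_width

# To draw both: plt.hist(data, bins=edges); plt.plot(x, curve)
```

Unlike matching the curve to the tallest bar, this scaling does not depend on the noise in any single bin.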
Everything you are looking for is already in seaborn.
You just have to use distplot:
import seaborn as sns
import numpy as np
data = np.random.normal(5, 2, size=1000)
sns.distplot(data)

python: plotting a histogram with a function line on top

I'm trying to do a little bit of distribution plotting and fitting in Python using SciPy for stats and matplotlib for the plotting. I'm having good luck with some things like creating a histogram:
seed(2)
alpha=5
loc=100
beta=22
data=ss.gamma.rvs(alpha,loc=loc,scale=beta,size=5000)
myHist = hist(data, 100, normed=True)
Brilliant!
I can even take the same gamma parameters and plot the line function of the probability distribution function (after some googling):
rv = ss.gamma(5,100,22)
x = np.linspace(0,600)
h = plt.plot(x, rv.pdf(x))
How would I go about plotting the histogram myHist with the PDF line h superimposed on top of the histogram? I'm hoping this is trivial, but I have been unable to figure it out.
Just put both pieces together.
import scipy.stats as ss
import numpy as np
import matplotlib.pyplot as plt
alpha, loc, beta=5, 100, 22
data=ss.gamma.rvs(alpha,loc=loc,scale=beta,size=5000)
myHist = plt.hist(data, 100, density=True)
rv = ss.gamma(alpha,loc,beta)
x = np.linspace(0,600)
h = plt.plot(x, rv.pdf(x), lw=2)
plt.show()
To make sure you get what you want in any specific plot instance, try to create a figure object first:
import scipy.stats as ss
import numpy as np
import matplotlib.pyplot as plt
# setting up the axes
fig = plt.figure(figsize=(8,8))
ax = fig.add_subplot(111)
# now plot
alpha, loc, beta=5, 100, 22
data=ss.gamma.rvs(alpha,loc=loc,scale=beta,size=5000)
myHist = ax.hist(data, 100, density=True)
rv = ss.gamma(alpha,loc,beta)
x = np.linspace(0,600)
h = ax.plot(x, rv.pdf(x), lw=2)
# show
plt.show()
One could be interested in plotting the distribution function of any histogram.
This can be done using seaborn's kdeplot function:
import numpy as np # for random data
import pandas as pd # for convenience
import matplotlib.pyplot as plt # for graphics
import seaborn as sns # for nicer graphics
v1 = pd.Series(np.random.normal(0,10,1000), name='v1')
v2 = pd.Series(2*v1 + np.random.normal(60,15,1000), name='v2')
# plot a kernel density estimation over a stacked barchart
plt.figure()
plt.hist([v1, v2], histtype='barstacked', density=True);
v3 = np.concatenate((v1,v2))
sns.kdeplot(v3);
plt.show()
From a Coursera course on data visualization with Python.
Expanding on Malik's answer, and trying to stick with vanilla NumPy, SciPy and Matplotlib. I've pulled in Seaborn, but it's only used to provide nicer defaults and small visual tweaks:
import numpy as np
import scipy.stats as sps
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style='ticks')
# parameterise our distributions
d1 = sps.norm(0, 10)
d2 = sps.norm(60, 15)
# sample values from above distributions
y1 = d1.rvs(300)
y2 = d2.rvs(200)
# combine mixture
ys = np.concatenate([y1, y2])
# create new figure with size given explicitly
plt.figure(figsize=(10, 6))
# add histogram showing individual components
plt.hist([y1, y2], 31, histtype='barstacked', density=True, alpha=0.4, edgecolor='none')
# get X limits and fix them
mn, mx = plt.xlim()
plt.xlim(mn, mx)
# add our distributions to figure
x = np.linspace(mn, mx, 301)
plt.plot(x, d1.pdf(x) * (len(y1) / len(ys)), color='C0', ls='--', label='d1')
plt.plot(x, d2.pdf(x) * (len(y2) / len(ys)), color='C1', ls='--', label='d2')
# estimate Kernel Density and plot
kde = sps.gaussian_kde(ys)
plt.plot(x, kde.pdf(x), label='KDE')
# finish up
plt.legend()
plt.ylabel('Probability density')
sns.despine()
gives us the following plot:
I've tried to stick with a minimal feature set while producing relatively nice output; notably, using SciPy to estimate the KDE is very easy.
