I am so stuck trying to fit 3D gaussians, and I am hoping someone can see some silly mistake I am making, because I have spent hours debugging to no avail.
I have a 3D image stored in an array called "data", where data[x, y, z] gives the grayscale intensity at the point (x, y, z). I know that this 3D image follows a 3D Gaussian distribution with a peak near the center of the image, but I am interested in the amplitude and spread. I am trying to fit this 3D array to a Gaussian of the form

f(x, y, z) = offset + A * exp(-((x-x0)^2/(2*sigx^2) + (y-y0)^2/(2*sigy^2) + (z-z0)^2/(2*sigz^2)))

My function in Python is:
def gaussian_3d(X, A, x0, y0, z0, sigx, sigy, sigz, offset):
    x, y, z = X
    return offset + A * np.exp(-(x - x0)**2/(2*sigx**2)
                               - (y - y0)**2/(2*sigy**2)
                               - (z - z0)**2/(2*sigz**2))
And the way I am doing this is as follows: if my image is of size 3 x 4 x 5, then I create a meshgrid (0...2) x (0...3) x (0...4), and then try to fit the intensity values to the function above.
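For reference, here is a minimal sketch (my own illustration) of that intended index/coordinate correspondence. Note that np.meshgrid defaults to indexing='xy', which swaps the first two axes, while indexing='ij' matches the data[x, y, z] convention described above:

import numpy as np

dim = (3, 4, 5)
x, y, z = np.arange(dim[0]), np.arange(dim[1]), np.arange(dim[2])
X, Y, Z = np.meshgrid(x, y, z, indexing='ij')
# with 'ij' indexing, grid index (i, j, k) maps directly to point (x, y, z)
assert X.shape == dim and X[2, 0, 0] == 2 and Z[0, 0, 4] == 4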
My code looks like this:
def fit_gauss_3d(data):
    dim = data.shape
    data_max = np.max(data)
    # Step 1: set up meshgrid
    x, y, z = np.arange(0, dim[0]), np.arange(0, dim[1]), np.arange(0, dim[2])
    X, Y, Z = np.meshgrid(x, y, z)
    data_in = np.vstack((X.ravel(), Y.ravel(), Z.ravel()))
    data_out = data.ravel()
    # Step 2: make a good guess of the center "peak" point of the gaussian (x0, y0, z0)
    # by using slices along the middle and finding the positions of the maxes
    mid1, mid2, mid3 = dim[0]//2, dim[1]//2, dim[2]//2
    x0, y0, z0 = np.argmax(data[:, mid2, mid3]), np.argmax(data[mid1, :, mid3]), np.argmax(data[mid1, mid2, :])
    # Step 3: set lower/upper bounds for the parameter search
    delta = 0.5  # the fit peak must be within +/- 0.5 of the initial guess
    p0 = (data_max + 0.05, x0, y0, z0, 0.9, 0.9, 0.9, 0.05)
    # Note: I know that the sigmas are between 0.7 and 2.5, and the offset is between 0 and 5
    lower_bound = [data_max * 0.9, x0 - delta, y0 - delta, z0 - delta, 0.7, 0.7, 0.7, 0]
    upper_bound = [data_max * 1.1 + 0.1, x0 + delta, y0 + delta, z0 + delta, 2.5, 2.5, 2.5, 5]
    # Step 4: fit
    p_param, p_cov = opt.curve_fit(gaussian_3d, data_in, data_out, p0=p0, maxfev=50000, bounds=(lower_bound, upper_bound))
    return p_param
def predict_gauss_3d(params, dims):
    x = np.arange(0, dims[0])
    y = np.arange(0, dims[1])
    z = np.arange(0, dims[2])
    XX, YY, ZZ = np.meshgrid(x, y, z)
    X = np.vstack((XX.ravel(), YY.ravel(), ZZ.ravel()))
    return gaussian_3d(X, *params).reshape(dims)
def plot_results(orig, sec):
    '''Plot the original and the fitted image along the three middle slices'''
    dim = orig.shape
    mid1, mid2, mid3 = dim[0]//2, dim[1]//2, dim[2]//2
    fig = plt.figure()
    ax1 = fig.add_subplot(3, 1, 1)
    ax1.plot(orig[:, mid2, mid3], label='orig')
    ax1.plot(sec[:, mid2, mid3], label='fitted')
    ax1.legend(loc="upper left")
    ax2 = fig.add_subplot(3, 1, 2)
    ax2.plot(orig[mid1, :, mid3], label='orig')
    ax2.plot(sec[mid1, :, mid3], label='fitted')
    ax2.legend(loc="upper left")
    ax3 = fig.add_subplot(3, 1, 3)
    ax3.plot(orig[mid1, mid2, :], label='orig')
    ax3.plot(sec[mid1, mid2, :], label='fitted')
    ax3.legend(loc="upper left")
    plt.tight_layout()
    plt.show()
I plotted the projections of the fits along the middle axes: the first panel varies x while keeping y and z at their midpoints, the second varies y while keeping x and z at their midpoints, and so forth.
Some of my fits are reasonable (plot omitted).
While most are insanely bad, and not even Gaussian looking! For one bad case (image and its fitted parameters omitted), the chosen parameters are clearly off. Either I am plotting wrong or fitting wrong. Can someone help me out? Is my meshgridding messed up somehow?
Related
I want to plot a function with 2 variables on a 2D plane. I also want some trial points in it. However, the result I get is not what I want. The X and Y values range from 0 to 40 (I wanted them to be between -1.5 and 2) and completely mismatch the function's actual values (with x = 15 and y = 15 it is definitely not 0).
However, the color map and the contour lines with numbers seem correct when I compare this to a top-down view of a 3D plot (there it actually is between -1.5 and 2).
Lastly, the trial points (red dots in the first picture) are plotted according to the wrong X and Y values and are in the wrong place. How do I fix these problems? I am quite new to Numpy and matplotlib, and I am guessing that I used either np.arange or np.meshgrid wrong. I would like to have either a 2D image with the correct X and Y axes and trial points, or a 3D plot with trial points that are always visible (the way I tried it, they seemed to be drawn before the 3D plot and would not be visible most of the time; having contour lines would be perfect). Here is the code I use for the calculations and plotting:
import numpy as np
import matplotlib.pyplot as plt

def func(x: float, y: float) -> float:
    return -(x * y * (1 - x - y)) / 8

def partial_x_der(x: float, y: float) -> float:
    return (-y * (-2 * x - y + 1)) / 8

def partial_y_der(x: float, y: float) -> float:
    return (-x * (1 - x - 2 * y)) / 8

def gradient(x: float, y: float):
    return (partial_x_der(x, y), partial_y_der(x, y))

def tuple_minus(my_tuple1: tuple, my_tuple2: tuple) -> tuple:
    return (my_tuple1[0] - my_tuple2[0], my_tuple1[1] - my_tuple2[1])

def tuple_multi_val(my_tuple: tuple, value: float) -> tuple:
    return (my_tuple[0] * value, my_tuple[1] * value)

def gradiant_descent(start_point: tuple, learning_rate: float, epsilon: float) -> list:
    max_iterations = 1000
    cur_point = start_point
    result_points = list()
    for iter in range(max_iterations):
        next_point = tuple_multi_val(gradient(cur_point[0], cur_point[1]), learning_rate)
        if np.abs(next_point[0]) < epsilon and np.abs(next_point[1]) < epsilon:
            print(f"With learning rate {learning_rate} minimum found at ({cur_point[0] - next_point[0]};{cur_point[1] - next_point[1]}) with value of {func(cur_point[0], cur_point[1])}, iterations needed: {iter}")
            result_points.append((cur_point[0], cur_point[1], func(cur_point[0], cur_point[1])))
            return result_points
        cur_point = tuple_minus(cur_point, next_point)
        if iter % 10 == 0:
            result_points.append((cur_point[0], cur_point[1], func(cur_point[0], cur_point[1])))
    print(f"Max iteration count of {max_iterations} reached, minimum value found at ({cur_point[0]};{cur_point[1]})")
    result_points.append((cur_point[0], cur_point[1], func(cur_point[0], cur_point[1])))
    return result_points

def show_gradient_descend(points_to_show: list):
    x = np.arange(-1.5, 2, 0.1)
    y = np.arange(-1.5, 2, 0.1)
    X, Y = np.meshgrid(x, y)
    Z = func(X, Y)
    # 2D plot
    im = plt.imshow(Z, 'GnBu', origin='lower')  # drawing the function
    # adding the contour lines with labels
    cset = plt.contour(Z, np.arange(-0.5, 1, 0.1), linewidths=1, cmap=plt.cm.Set2)
    plt.clabel(cset, inline=True, fmt='%1.1f', fontsize=8)
    plt.colorbar(im)  # adding the colorbar on the right
    for point in points_to_show:
        plt.scatter(point[0], point[1], color='r', marker='o')
    # 3D plot
    fig = plt.figure()
    ax = fig.add_subplot(projection='3d')  # fig.gca(projection='3d') was removed in Matplotlib 3.6
    surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap='GnBu', linewidth=0, antialiased=False)
    # title
    plt.title('$z= -(x*y*(1-x-y)) / 8$')
    plt.xlabel("X")
    plt.ylabel("Y")
    plt.show()

def main():
    x0 = (0, 0)
    x1 = (1, 1)
    xm = (5/10, 9/10)
    epsilon = 0.0001
    points = [x0, x1, xm]
    #for learning_rate in np.linspace(0, 1, 101):
    #    gradiant_descent(xm, learning_rate, epsilon)
    for point in points:
        points_to_show = gradiant_descent(point, 0.42, 0.0001)
        show_gradient_descend(points_to_show)

if __name__ == "__main__":
    main()
Thanks for any advice or help in advance.
I have a question concerning a 3D dataset.
I have two datasets consisting of 3D coordinates. From one of them I create surfaces in the form of cylinders (let's call it the blue dataset for now); from the other I should be able to count the number of points (x, y, z) that fall inside this cylinder surface (let's call this the orange dataset).
I found some code on Stack Overflow which I use to create a cylinder in 3D for two points of the blue dataset, and this all works.
However, now I should classify, for every coordinate of the orange dataset, whether it falls inside this cylinder surface.
This is the code I use to plot the cylinder surface (found here: Plotting a solid cylinder centered on a plane in Matplotlib):
import numpy as np
from numpy.linalg import norm
import matplotlib.pyplot as plt

p0 = np.array([-0.0347944, 0.0058072, -0.022887199999999996])  # point at one end
p1 = np.array([-0.0366488, 0.0061488, -0.023424])  # point at other end
R = 0.00005

# vector in direction of axis
v = p1 - p0
# find magnitude of vector
mag = norm(v)
# unit vector in direction of axis
v = v / mag
# make some vector not in the same direction as v
not_v = np.array([1, 0, 0])
if (v == not_v).all():
    not_v = np.array([0, 1, 0])
# make vector perpendicular to v
n1 = np.cross(v, not_v)
# normalize n1
n1 /= norm(n1)
# make unit vector perpendicular to v and n1
n2 = np.cross(v, n1)
# surface ranges over t from 0 to length of axis and 0 to 2*pi
t = np.linspace(0, mag, 2)
theta = np.linspace(0, 2 * np.pi, 100)
rsample = np.linspace(0, R, 2)
# use meshgrid to make 2d arrays
t, theta2 = np.meshgrid(t, theta)
rsample, theta = np.meshgrid(rsample, theta)
# generate coordinates for surface
# "Tube"
X, Y, Z = [p0[i] + v[i] * t + R * np.sin(theta2) * n1[i] + R * np.cos(theta2) * n2[i] for i in [0, 1, 2]]
# "Bottom"
X2, Y2, Z2 = [p0[i] + rsample * np.sin(theta) * n1[i] + rsample * np.cos(theta) * n2[i] for i in [0, 1, 2]]
# "Top"
X3, Y3, Z3 = [p0[i] + v[i]*mag + rsample * np.sin(theta) * n1[i] + rsample * np.cos(theta) * n2[i] for i in [0, 1, 2]]

ax = plt.subplot(111, projection='3d')
ax.plot_surface(X, Y, Z, color='blue', alpha=0.5)
ax.plot_surface(X2, Y2, Z2, color='blue', alpha=0.5)
ax.plot_surface(X3, Y3, Z3, color='green', alpha=0.7)
plt.show()
Let's now say I need to classify the following points as 'inside' or 'outside' the cylinder surface:
point1 = (-0.0321, 0.003, -0.01)
point2 = (-0.5, 0.004, 0.03)
point3 = (0.0002, -0.02, 0.00045)
It is important to mention that the points in the orange dataset, for which I need to determine whether they are inside or outside the cylinder surface, do not have to lie on the top and bottom caps enclosing the cylinder; they can be anywhere in 3D space, inside or outside the cylinder.
The code above produces the following result (image omitted): a cylinder surface enclosing two points of the blue dataset.
UPDATE: Thanks for the help! However, I found the solution here: How to check if a 3D point is inside a cylinder
Your question sounds more about geometry than about coding. What exactly do you need?
Geometrically, if you want to check whether the point (a, b, c) lies inside of a (canonical) cylindrical surface
(x, y, z) such that { x^2 + y^2 = r^2, z in [zi, zf] },
simply check whether a^2 + b^2 < r^2 and c in [zi, zf] is True.
If the axis of your cylinder is not aligned with the z-axis, just bring the point (a, b, c) to a new system where the cylinder is in canonical form. In general, if the cylinder axis lies along the (unit) vector v, the angle with respect to the z-axis is simply beta = arccos(v[2]) (v is already normalized!). Then you want to use Rodrigues' rotation formula with k = z × v (the unit vector along the cross product of the z-axis with v) and theta = -beta (note the sign) to transform (a, b, c). You should also check whether the rotated p0 and p1 already lie on the (rotated) z-axis; otherwise you should apply an additional translation to (a, b, c) before making the comparison.
If what you need is help coding this up, well, post your attempt and ask away :)
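For what it's worth, here is a minimal sketch (my own illustration, not code from the question) that avoids the rotation entirely by projecting each point onto the cylinder axis; it classifies points against the finite cylinder defined by p0, p1 and R from the snippet above:

import numpy as np

def points_in_cylinder(p0, p1, R, points):
    # Boolean mask: True where a point lies inside the finite cylinder
    # with axis running from p0 to p1 and radius R (projection method).
    p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
    axis = p1 - p0
    length2 = np.dot(axis, axis)           # squared axis length
    rel = np.asarray(points, float) - p0   # vectors from p0 to each point
    proj = rel @ axis                      # dot product with the axis
    between_caps = (proj >= 0) & (proj <= length2)
    # squared perpendicular distance from the axis, via Pythagoras
    radial2 = np.einsum('ij,ij->i', rel, rel) - proj**2 / length2
    return between_caps & (radial2 <= R**2)

p0 = np.array([-0.0347944, 0.0058072, -0.0228872])
p1 = np.array([-0.0366488, 0.0061488, -0.023424])
pts = [(-0.0321, 0.003, -0.01), (-0.5, 0.004, 0.03), (0.0002, -0.02, 0.00045)]
print(points_in_cylinder(p0, p1, 0.00005, pts))  # -> [False False False]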
I have a grid with some given data. This data is given by its angle (from 0 to π).
Within this grid I have another smaller grid.
This might look like this:
Now I want to interpolate the angles on that grid.
I tried this using scipy.interpolate.griddata, which gives a good result. But there is a problem when the angles change from almost 0 to almost π (because the middle is π/2 ...)
Here is the result and it is easy to see what's going wrong.
How can I deal with that problem? Thank you! :)
Here is the code to reproduce:
import numpy as np
from matplotlib import pyplot as plt
from scipy.interpolate import griddata
ax = plt.subplot()
ax.set_aspect(1)
# Simulate some given data.
x, y = np.meshgrid(np.linspace(-10, 10, 20), np.linspace(-10, 10, 20))
data = np.arctan(y / 10) % np.pi
u = np.cos(data)
v = np.sin(data)
ax.quiver(x, y, u, v, headlength=0.01, headaxislength=0, pivot='middle', units='xy')
# Create a smaller grid within.
x1, y1 = np.meshgrid(np.linspace(-1, 5, 15), np.linspace(-6, 2, 20))
# ax.plot(x1, y1, '.', color='red', markersize=2)
# Interpolate data on grid.
interpolation = griddata((x.flatten(), y.flatten()), data.flatten(), (x1.flatten(), y1.flatten()))
u1 = np.cos(interpolation)
v1 = np.sin(interpolation)
ax.quiver(x1, y1, u1, v1, headlength=0.01, headaxislength=0, pivot='middle', units='xy',
color='red', scale=3, width=0.03)
plt.show()
Edit:
Thanks to @bubble, there is a way to adjust the given angles before interpolation so that the result is as desired.
Therefore:
Define a rectifying function:
def RectifyData(data):
    for j in range(len(data)):
        step = data[j] - data[j - 1]
        if abs(step) > np.pi / 2:
            data[j] += np.pi * (2 * (step < 0) - 1)
    return data
Interpolate as follows:
interpolation = griddata((x.flatten(), y.flatten()),
                         RectifyData(data.flatten()),
                         (x1.flatten(), y1.flatten()))
u1 = np.cos(interpolation)
v1 = np.sin(interpolation)
I tried direct interpolation of the cos(angle) and sin(angle) values, but this still led to discontinuities that cause wrong line directions. The main idea consists in reducing the discontinuities, e.g. [2.99, 3.01, 0.05, 0.06] should be transformed to something like [2.99, 3.01, pi+0.05, pi+0.06]. This is needed to apply a 2D interpolation algorithm correctly. Almost the same problem arises in the following post.
def get_rectified_angles(u, v):
    angles = np.arcsin(v)
    inds = u < 0
    angles[inds] *= -1
    # Direct approach of removing discontinuities
    # for j in range(len(angles[1:])):
    #     if abs(angles[j] - angles[j - 1]) > np.pi / 2:
    #         sel = [abs(angles[j] + np.pi - angles[j - 1]), abs(angles[j] - np.pi - angles[j - 1])]
    #         if np.argmin(sel) == 0:
    #             angles[j] += np.pi
    #         else:
    #             angles[j] -= np.pi
    return angles
ax.quiver(x, y, u, v, headlength=0.01, headaxislength=0, pivot='middle', units='xy')
# Create a smaller grid within.
x1, y1 = np.meshgrid(np.linspace(-1, 5, 15), np.linspace(-6, 2, 20))
angles = get_rectified_angles(u.flatten(), v.flatten())
interpolation = griddata((x.flatten(), y.flatten()), angles, (x1.flatten(), y1.flatten()))
u1 = np.cos(interpolation)
v1 = np.sin(interpolation)
ax.quiver(x1, y1, u1, v1, headlength=0.01, headaxislength=0, pivot='middle', units='xy',
color='red', scale=3, width=0.03)
Probably, the numpy.unwrap function could be used to fix the discontinuities. In the 1D case, numpy.interp has a period keyword to handle periodic data.
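For instance, a minimal sketch of the unwrap idea on the example sequence from earlier (the period argument requires NumPy >= 1.21; for angles defined modulo pi, use period=np.pi):

import numpy as np

# jumps of ~pi are removed, so the sequence becomes smooth enough
# for ordinary (non-periodic) interpolation
angles = np.array([2.99, 3.01, 0.05, 0.06])
print(np.unwrap(angles, period=np.pi))
# -> [2.99 3.01 3.19159265 3.20159265], i.e. [2.99, 3.01, pi+0.05, pi+0.06]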
To test my 2D Gaussian fitting code for images of bright objects, I'm running it on a Point Spread Function (PSF) that was constructed and fit by the WISE team. On their website, they list the parameters for the central PSF for each band: the FWHM along the major and minor axes, and position angle (this is the angle from the y-axis). All the information is available here: WISE PSF information, but below is an image of those parameters for the central PSF, and the corresponding image of the PSF for band 3.
So, I have downloaded the corresponding FITS image for the central PSF in band 3 (all the images are available to download via the above link), and tried running my code on it. However, my code does not return the parameters I would expect, and the parameters change depending on if I fit to a subimage (and depending on the size of this), or the whole image, which is kind of worrying.
I am wondering if there's a way to make my Gaussian fit code recover the most accurate parameters -- or maybe another fitting method would be more effective. But I'm mostly concerned at the fact that the output parameters of my Gaussian fit can become so obviously wrong. Below is my code.
from scipy import optimize
import numpy as np
from astropy.io import fits
image = 'wise-w3-psf-wpro-09x09-05x05.fits' #WISE central PSF
stacked_image = fits.getdata(image)
# Center of image (the PSF is centered)
x0 = np.shape(stacked_image)[1]//2
y0 = np.shape(stacked_image)[0]//2
# Normalize image so peak = 1
def normalize(image):
    image *= 1/np.max(image)
    return image
stacked_image = normalize(stacked_image)
def gaussian_func(xy, x0, y0, sigma_x, sigma_y, amp, theta, offset):
    x, y = xy
    a = (np.cos(theta))**2/(2*sigma_x**2) + (np.sin(theta))**2/(2*sigma_y**2)
    b = -np.sin(2*theta)/(4*sigma_x**2) + np.sin(2*theta)/(4*sigma_y**2)
    c = (np.sin(theta))**2/(2*sigma_x**2) + (np.cos(theta))**2/(2*sigma_y**2)
    inner = a * (x-x0)**2
    inner += 2*b*(x-x0)*(y-y0)
    inner += c * (y-y0)**2
    return (offset + amp * np.exp(-inner)).ravel()

def Sigma2width(sigma):
    return 2 * np.sqrt(2*np.log(2)) * sigma

def generate(data_set):
    xvec = np.arange(0, np.shape(data_set)[1], 1)
    yvec = np.arange(0, np.shape(data_set)[0], 1)
    X, Y = np.meshgrid(xvec, yvec)
    return X, Y
# METHOD 1: Fit subimage of PSF to Gaussian
# Guesses
theta_guess = np.deg2rad(96) #I believe that the angle in the Gaussian corresponds to CCW from the x-axis (so use WISE position angle + 90 degrees)
sigma_x = 5
sigma_y = 4
amp = 1 #I know this is true since I normalized it
subimage = stacked_image[y0-50:y0+50, x0-50:x0+50]
offset = np.min(subimage)
guesses = [np.shape(subimage)[1]//2, np.shape(subimage)[0]//2, sigma_x, sigma_y, amp, theta_guess, offset]
xx, yy = generate(subimage)
pred_params, uncert_cov = optimize.curve_fit(gaussian_func, (xx.ravel(), yy.ravel()), subimage.ravel(), p0=guesses)
width_x, width_y = Sigma2width(np.abs(pred_params[2]))*0.275, Sigma2width(np.abs(pred_params[3]))*0.275 #multiply by pixel scale factor (available on website) to get FWHMs in arcseconds
x_0, y_0 = pred_params[0]+(x0-50), pred_params[1]+(y0-50) #add back origin
theta_deg = np.rad2deg(pred_params[5])
pred_params[5] = theta_deg
pred_params[0] = x_0
pred_params[1] = y_0
if theta_deg < 90:
    pos_angle = theta_deg + 90
elif theta_deg >= 90:
    pos_angle = theta_deg - 90
print('PREDICTED FWHM x, y in arcsecs:', width_x, width_y)
print('FIT PARAMS [x0, y0, sigma_x, sigma_y, amp, theta, offset]:', pred_params)
print('POSITION ANGLE:', pos_angle)
# Output: PREDICTED FWHM x, y in arcsecs: 6.4917 5.4978
# FIT PARAMS [x0, y0, sigma_x, sigma_y, amp, theta, offset]: [3.195e+02 3.189e+02 1.002e+01 8.489e+00 8.695e-01 8.655e+01 2.613e-02]
# POSITION ANGLE: 176.556
# METHOD 2: Fit whole image to Gaussian
# Guesses
theta_guess = np.deg2rad(96)
sigma_x = 5
sigma_y = 4
amp = 1
offset = np.median(stacked_image)
guesses = [x0, y0, sigma_x, sigma_y, amp, theta_guess, offset]
# Sigmas - manual estimation
ylim, xlim = np.shape(stacked_image)
x, y = np.arange(0, xlim, 1), np.arange(0, ylim, 1)
ypix, xpix = np.where(stacked_image==amp)
y_range = np.take(stacked_image, ypix[0], axis=0)
x_range = np.take(stacked_image, xpix[0], axis=1)
xx, yy = generate(stacked_image)
pred_params, uncert_cov = optimize.curve_fit(gaussian_func, (xx.ravel(), yy.ravel()), stacked_image.ravel(), p0=guesses)
width_x, width_y = Sigma2width(np.abs(pred_params[2]))*0.275, Sigma2width(np.abs(pred_params[3]))*0.275 #in arcsecs
theta = pred_params[5]
print('PREDICTED FWHM x, y in arcsecs:', width_x, width_y)
print('FIT PARAMS [x0, y0, sigma_x, sigma_y, amp, theta, offset]:', pred_params)
# Output:
# PREDICTED FWHM x, y in arcsecs: 7.088 6.106
# FIT PARAMS [x0, y0, sigma_x, sigma_y, amp, theta, offset]: [3.195e+02 3.190e+02 1.095e+01 9.429e+00 8.378e-01 1.521e+00 7.998e-04]
if theta < 90:
    pos_angle = 90 + np.rad2deg(theta)
elif theta >= 90:
    pos_angle = 90 - np.rad2deg(theta)
print('POSITION ANGLE:', pos_angle)
# POSITION ANGLE: 177.147
You can see in the (rounded) outputs that the amplitudes my Gaussian fits return aren't even 1, and the other parameters (FWHMs and angles) don't match up with the correct parameters shown in the table either.
If I fit a subimage, it seems that the amplitude gets closer and closer to (but never reaches) 1 the smaller I make the subimage, but then the FWHMs might get too small compared to the real values. Why am I not getting back the correct results, and how can I make the fit as accurate as possible?
I have been trying to figure out the full width at half maximum (FWHM) of the blue peak (see image). The green peak and the magenta peak combined make up the blue peak. I have been using the following equation to find the FWHM of the green and magenta peaks: fwhm = 2*np.sqrt(2*math.log(2))*sd, where sd is the standard deviation. I created the green and magenta peaks, and I know their standard deviations, which is why I can use that equation.
I created the green and magenta peaks using the following code:
def make_norm_dist(self, x, mean, sd):
    import numpy as np
    norm = []
    for i in range(x.size):
        norm += [1.0/(sd*np.sqrt(2*np.pi))*np.exp(-(x[i] - mean)**2/(2*sd**2))]
    return np.array(norm)
If I did not know the blue peak was made up of two peaks and I only had the blue peak in my data, how would I find the FWHM?
I have been using this code to find the peak top:
peak_top = 0.0e-1000
for i in x_axis:
    if i > peak_top:
        peak_top = i
I could divide the peak_top by 2 to find the half height and then try and find y-values corresponding to the half height, but then I would run into trouble if there are no x-values exactly matching the half height.
I am pretty sure there is a more elegant solution to the one I am trying.
You can use a spline to fit [blue curve - peak/2], and then find its roots:
import numpy as np
from scipy.interpolate import UnivariateSpline
def make_norm_dist(x, mean, sd):
return 1.0/(sd*np.sqrt(2*np.pi))*np.exp(-(x - mean)**2/(2*sd**2))
x = np.linspace(10, 110, 1000)
green = make_norm_dist(x, 50, 10)
pink = make_norm_dist(x, 60, 10)
blue = green + pink
# create a spline of x and blue-np.max(blue)/2
spline = UnivariateSpline(x, blue-np.max(blue)/2, s=0)
r1, r2 = spline.roots() # find the roots
import pylab as pl
pl.plot(x, blue)
pl.axvspan(r1, r2, facecolor='g', alpha=0.5)
pl.show()
Here is the result:
This worked for me in IPython (quick and dirty, can be reduced to 3 lines):

import numpy as np

def FWHM(X, Y):
    half_max = max(Y) / 2.
    # find when the function crosses the line half_max (when the sign of the diff flips)
    # take the 'derivative' of signum(half_max - Y[])
    d = np.sign(half_max - np.array(Y[0:-1])) - np.sign(half_max - np.array(Y[1:]))
    # plt.plot(X[0:len(d)], d)  # if you are interested
    # find the left and right most indexes
    left_idx = np.where(d > 0)[0][0]
    right_idx = np.where(d < 0)[0][-1]
    return X[right_idx] - X[left_idx]  # return the difference (full width)
Some additions can be made to make the resolution more accurate, but in the limit that there are many samples along the X axis and the data is not too noisy, this works great.
Even when the data are not Gaussian and a little noisy, it worked for me (I just take the first and last time half max crosses the data).
If your data has noise (and it always does in the real world), a more robust solution would be to fit a Gaussian to the data and extract FWHM from that:
import numpy as np
import scipy.optimize as opt
def gauss(x, p):  # p[0]==mean, p[1]==stdev
    return 1.0/(p[1]*np.sqrt(2*np.pi))*np.exp(-(x-p[0])**2/(2*p[1]**2))
# Create some sample data
known_param = np.array([2.0, .7])
xmin,xmax = -1.0, 5.0
N = 1000
X = np.linspace(xmin,xmax,N)
Y = gauss(X, known_param)
# Add some noise
Y += .10*np.random.random(N)
# Renormalize to a proper PDF
Y /= ((xmax-xmin)/N)*Y.sum()
# Fit a gaussian
p0 = [0, 1]  # Initial guess is a normal distribution
errfunc = lambda p, x, y: gauss(x, p) - y  # Distance to the target function
p1, success = opt.leastsq(errfunc, p0[:], args=(X, Y))
fit_mu, fit_stdev = p1
FWHM = 2*np.sqrt(2*np.log(2))*fit_stdev
print("FWHM", FWHM)
The plotted image can be generated by:
from pylab import *
plot(X,Y)
plot(X, gauss(X,p1),lw=3,alpha=.5, color='r')
axvspan(fit_mu-FWHM/2, fit_mu+FWHM/2, facecolor='g', alpha=0.5)
show()
An even better approximation would filter out the noisy data below a given threshold before the fit.
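For instance, a rough sketch of that idea, reusing X, Y, errfunc and p0 from the snippet above (the 5% threshold is an arbitrary choice):

# keep only samples clearly above the noise floor before fitting
mask = Y > 0.05 * Y.max()
p1, success = opt.leastsq(errfunc, p0[:], args=(X[mask], Y[mask]))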
Here is a nice little function using the spline approach.

import numpy as np
from scipy.interpolate import splrep, sproot

class MultiplePeaks(Exception): pass
class NoPeaksFound(Exception): pass

def fwhm(x, y, k=3):
    """
    Determine the full width at half maximum of a peaked set of points, x and y.

    Assumes that there is only one peak present in the dataset. The function
    uses a spline interpolation of order k (note that sproot requires a cubic
    spline, so k must be 3).
    """
    half_max = np.amax(y)/2.0
    s = splrep(x, y - half_max, k=k)
    roots = sproot(s)

    if len(roots) > 2:
        raise MultiplePeaks("The dataset appears to have multiple peaks, and "
                            "thus the FWHM can't be determined.")
    elif len(roots) < 2:
        raise NoPeaksFound("No proper peaks were found in the data set; likely "
                           "the dataset is flat (e.g. all zeros).")
    else:
        return abs(roots[1] - roots[0])
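For example, a quick sanity check on a synthetic Gaussian with sigma = 2 (the expected value is 2*sqrt(2*ln 2)*2 ≈ 4.71):

import numpy as np

x = np.linspace(0, 20, 500)
y = np.exp(-(x - 10)**2 / (2 * 2.0**2))
print(fwhm(x, y))  # ~4.71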
You should use scipy to solve it: first find_peaks and then peak_widths.
With the default rel_height value (0.5), you are measuring the width of the peak at half its maximum.
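A minimal sketch on synthetic data (note that peak_widths works in sample units, so the width must be scaled by the x spacing):

import numpy as np
from scipy.signal import find_peaks, peak_widths

x = np.linspace(0, 10, 1000)
y = np.exp(-(x - 5)**2 / (2 * 0.8**2))  # Gaussian with sigma = 0.8

peaks, _ = find_peaks(y)
widths, width_heights, left_ips, right_ips = peak_widths(y, peaks, rel_height=0.5)
fwhm = widths[0] * (x[1] - x[0])  # convert from samples to x units
print(fwhm)  # ~1.88 = 2*sqrt(2*ln 2)*0.8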
If you prefer interpolation over fitting:

import numpy as np

def get_full_width(x: np.ndarray, y: np.ndarray, height: float = 0.5) -> float:
    height_half_max = np.max(y) * height
    index_max = np.argmax(y)
    x_low = np.interp(height_half_max, y[:index_max+1], x[:index_max+1])
    x_high = np.interp(height_half_max, np.flip(y[index_max:]), np.flip(x[index_max:]))
    return x_high - x_low
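For example, a quick check against the analytic value for a unit-sigma Gaussian (the function assumes y rises monotonically toward the peak on each side, since np.interp requires sorted inputs):

x = np.linspace(-5, 5, 1001)
y = np.exp(-x**2 / 2)
print(get_full_width(x, y))  # ~2.355 = 2*sqrt(2*ln 2)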
For monotonic functions with many data points, and if perfect accuracy is not required, I would use:

def FWHM(X, Y):
    deltax = X[1] - X[0]
    half_max = max(Y) / 2.
    l = np.where(Y > half_max, 1, 0)
    return np.sum(l) * deltax
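A quick check (this simply counts samples above half maximum, so it assumes uniformly spaced x values):

import numpy as np

x = np.linspace(-5, 5, 2001)
y = np.exp(-x**2 / 2)
print(FWHM(x, y))  # ~2.355 for sigma = 1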
I implemented an empirical solution which works for noisy and not-quite-Gaussian data fairly well in haggis.math.full_width_half_max. The usage is extremely straightforward:
fwhm = full_width_half_max(x, y)
The function is robust: it simply finds the maximum of the data and the nearest points crossing the "halfway down" threshold using the requested interpolation scheme.
Here are a couple of examples using data from the other answers.
# HYRY's smooth data
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import UnivariateSpline
from haggis.math import full_width_half_max

def make_norm_dist(x, mean, sd):
    return 1.0/(sd*np.sqrt(2*np.pi))*np.exp(-(x - mean)**2/(2*sd**2))

x = np.linspace(10, 110, 1000)
green = make_norm_dist(x, 50, 10)
pink = make_norm_dist(x, 60, 10)
blue = green + pink

# create a spline of x and blue-np.max(blue)/2
spline = UnivariateSpline(x, blue-np.max(blue)/2, s=0)
r1, r2 = spline.roots()  # find the roots

# Compute using my function
fwhm, (x1, y1), (x2, y2) = full_width_half_max(x, blue, return_points=True)

# Print comparison
print('HYRY:', r2 - r1, 'MP:', fwhm)

plt.plot(x, blue)
plt.axvspan(r1, r2, facecolor='g', alpha=0.5)
plt.plot(x1, y1, 'r.')
plt.plot(x2, y2, 'r.')
For smooth data, the results are pretty exact:
HYRY: 26.891157007233254 MP: 26.891193606203814
# Hooked's noisy data
import scipy.optimize as opt

def gauss(x, p):  # p[0]==mean, p[1]==stdev
    return 1.0/(p[1]*np.sqrt(2*np.pi))*np.exp(-(x-p[0])**2/(2*p[1]**2))

# Create some sample data
known_param = np.array([2.0, .7])
xmin, xmax = -1.0, 5.0
N = 1000
X = np.linspace(xmin, xmax, N)
Y = gauss(X, known_param)

# Add some noise
Y += .10*np.random.random(N)

# Renormalize to a proper PDF
Y /= ((xmax-xmin)/N)*Y.sum()

# Fit a gaussian
p0 = [0, 1]  # Initial guess is a normal distribution
errfunc = lambda p, x, y: gauss(x, p) - y  # Distance to the target function
p1, success = opt.leastsq(errfunc, p0[:], args=(X, Y))
fit_mu, fit_stdev = p1
FWHM = 2*np.sqrt(2*np.log(2))*fit_stdev
# Compute using my function
fwhm, (x1, y1), (x2, y2) = full_width_half_max(X, Y, return_points=True)
# Print comparison
print('Hooked:', FWHM, 'MP:', fwhm)
plt.plot(X, Y)
plt.plot(X, gauss(X, p1), lw=3, alpha=.5, color='r')
plt.axvspan(fit_mu - FWHM / 2, fit_mu + FWHM / 2, facecolor='g', alpha=0.5)
plt.plot(x1, y1, 'r.')
plt.plot(x2, y2, 'r.')
For noisy data (with a biased baseline), the results are not as consistent.
Hooked: 1.9903193212254346 MP: 1.5039676990530118
On the one hand the Gaussian fit is not very optimal for the data, but on the other hand, the strategy of picking the nearest point that intersects the half-max threshold is likely not optimal either.