Normalize histogram2d by bin area - python

I have a 2D histogram that I generate with numpy:
H, xedges, yedges = np.histogram2d(y, x, weights=mass * (1.0 - pf),
                                   bins=(yrange, xrange))
Note that I'm currently weighting the bins with a function of mass (mass is a numpy array with the same dimensions as x and y). The bins are logarithmic (generated via xrange = np.logspace(minX, maxX, 100)).
However, I really want to weight the bins by the mass function but then normalize (i.e. divide) each bin by its area, e.g. bin (i, j) has area xrange[i] * yrange[j]. Since xrange and yrange don't have the same dimensions as mass, x and y, I can't simply put the normalization in the np.histogram2d call.
How can I normalize the bin counts by the area in each log bin?
For reference, here's the plot (I've added 1D histograms for x and y that I'll also need to normalize by the width of the bin, but once I figure out how to do it in 2D the 1D case should be analogous).
FYI, I generate the main plot (and the axis histograms) with matplotlib:
X, Y = np.meshgrid(xrange, yrange)
H = np.log10(H)
masked_array = np.ma.array(H, mask=np.isnan(H))  # mask out all nan, i.e. log10(0.0)
cax = ax2dhist.pcolormesh(X, Y, masked_array, cmap=cmap, norm=LogNorm(vmin=1, vmax=8))

I think you just want to pass normed=True to np.histogram2d:
normed : bool, optional
    If False, returns the number of samples in each bin. If True, returns the bin density bin_count / sample_count / bin_area.
(Note that normed has since been deprecated and removed from NumPy; in current versions the equivalent is density=True.)
If you wanted to compute the bin areas and do the normalization manually, the simplest way would probably be to use broadcasting:
x, y = np.random.rand(2, 1000)
xbin = np.logspace(-1, 0, 101)
ybin = np.logspace(-1, 0, 201)
# raw bin counts
counts, xe, ye = np.histogram2d(x, y, [xbin, ybin])
# size of each bin in x and y dimensions
dx = np.diff(xbin)
dy = np.diff(ybin)
# compute the area of each bin using broadcasting
area = dx[:, None] * dy
# normalized counts
manual_norm = counts / area / x.shape[0]
# using normed=True
counts_norm, xe, ye = np.histogram2d(x, y, [xbin, ybin], normed=True)
print(np.allclose(manual_norm, counts_norm))
# True
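The 1D axis histograms mentioned in the question work the same way: divide each count by its bin width. A minimal sketch, reusing the edges from above:
counts1d, xe = np.histogram(x, bins=xbin)
counts1d_per_width = counts1d / np.diff(xbin)  # counts per unit length in x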


Interpolating non-uniformly distributed points on a 3D sphere

I have several points on the unit sphere that are distributed according to the algorithm described in https://www.cmu.edu/biolphys/deserno/pdf/sphere_equi.pdf (and implemented in the code below). On each of these points, I have a value that in my particular case represents 1 minus a small error. The errors are in [0, 0.1] if this is important, so my values are in [0.9, 1].
Sadly, computing the errors is a costly process and I cannot do this for as many points as I want. Still, I want my plots to look like I am plotting something "continuous".
So I want to fit an interpolation function to my data, to be able to sample as many points as I want.
After a little bit of research I found scipy.interpolate.SmoothSphereBivariateSpline which seems to do exactly what I want. But I cannot make it work properly.
Question: what can I use to interpolate (spline, linear interpolation, anything would be fine for the moment) my data on the unit sphere? An answer can be either "you misused scipy.interpolation, here is the correct way to do this" or "this other function is better suited to your problem".
Sample code that should be executable with numpy and scipy installed:
import typing as ty
import numpy
import scipy.interpolate
def get_equidistant_points(N: int) -> ty.List[numpy.ndarray]:
    """Generate approximately N points evenly distributed across the 3-d sphere.

    This function tries to find approximately N points (might be a little less
    or more) that are evenly distributed across the 3-dimensional unit sphere.
    The algorithm used is described in
    https://www.cmu.edu/biolphys/deserno/pdf/sphere_equi.pdf.
    """
    # Unit sphere
    r = 1
    points: ty.List[numpy.ndarray] = list()
    a = 4 * numpy.pi * r ** 2 / N
    d = numpy.sqrt(a)
    m_v = int(numpy.round(numpy.pi / d))
    d_v = numpy.pi / m_v
    d_phi = a / d_v
    for m in range(m_v):
        v = numpy.pi * (m + 0.5) / m_v
        m_phi = int(numpy.round(2 * numpy.pi * numpy.sin(v) / d_phi))
        for n in range(m_phi):
            phi = 2 * numpy.pi * n / m_phi
            points.append(
                numpy.array(
                    [
                        numpy.sin(v) * numpy.cos(phi),
                        numpy.sin(v) * numpy.sin(phi),
                        numpy.cos(v),
                    ]
                )
            )
    return points
def cartesian2spherical(x: float, y: float, z: float) -> numpy.ndarray:
    r = numpy.linalg.norm([x, y, z])
    theta = numpy.arccos(z / r)
    phi = numpy.arctan2(y, x)
    return numpy.array([r, theta, phi])
n = 100
points = get_equidistant_points(n)
# Random here, but costly in real life.
errors = numpy.random.rand(len(points)) / 10
# Change everything to spherical to use the interpolator from scipy.
ideal_spherical_points = numpy.array([cartesian2spherical(*point) for point in points])
r_interp = 1 - errors
theta_interp = ideal_spherical_points[:, 1]
phi_interp = ideal_spherical_points[:, 2]
# Change phi coordinate from [-pi, pi] to [0, 2pi] to please scipy.
phi_interp[phi_interp < 0] += 2 * numpy.pi
# Create the interpolator.
interpolator = scipy.interpolate.SmoothSphereBivariateSpline(
    theta_interp, phi_interp, r_interp
)
# Creating the finer theta and phi values for the final plot
theta = numpy.linspace(0, numpy.pi, 100, endpoint=True)
phi = numpy.linspace(0, numpy.pi * 2, 100, endpoint=True)
# Creating the coordinate grid for the unit sphere.
X = numpy.outer(numpy.sin(theta), numpy.cos(phi))
Y = numpy.outer(numpy.sin(theta), numpy.sin(phi))
Z = numpy.outer(numpy.cos(theta), numpy.ones(100))
thetas, phis = numpy.meshgrid(theta, phi)
heatmap = interpolator(thetas, phis)
Issue with the code above:
With the code as-is, I have a
ValueError: The required storage space exceeds the available storage space: nxest or nyest too small, or s too small. The weighted least-squares spline corresponds to the current set of knots.
that is raised when initialising the interpolator instance.
The error above seems to say that I should change the value of s, which is one of the parameters of scipy.interpolate.SmoothSphereBivariateSpline. I tested different values of s ranging from 0.0001 to 100000; the code above always raises, either the exception described above or:
ValueError: Error code returned by bispev: 10
Edit: I am including my findings here. They can't really be considered a solution, which is why I am editing rather than posting an answer.
With more research I found this question: Using Radial Basis Functions to Interpolate a Function on a Sphere. The author has exactly the same problem as me and uses a different interpolator: scipy.interpolate.Rbf. I changed the above code by replacing the interpolator and the plotting:
# Create the interpolator.
interpolator = scipy.interpolate.Rbf(theta_interp, phi_interp, r_interp)
# Creating the finer theta and phi values for the final plot
plot_points = 100
theta = numpy.linspace(0, numpy.pi, plot_points, endpoint=True)
phi = numpy.linspace(0, numpy.pi * 2, plot_points, endpoint=True)
# Creating the coordinate grid for the unit sphere.
X = numpy.outer(numpy.sin(theta), numpy.cos(phi))
Y = numpy.outer(numpy.sin(theta), numpy.sin(phi))
Z = numpy.outer(numpy.cos(theta), numpy.ones(plot_points))
thetas, phis = numpy.meshgrid(theta, phi)
heatmap = interpolator(thetas, phis)
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib import cm
colormap = cm.inferno
normaliser = mpl.colors.Normalize(vmin=numpy.min(heatmap), vmax=1)
scalar_mappable = cm.ScalarMappable(cmap=colormap, norm=normaliser)
scalar_mappable.set_array([])
fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
ax.plot_surface(
    X,
    Y,
    Z,
    facecolors=colormap(normaliser(heatmap)),
    alpha=0.7,
    cmap=colormap,
)
plt.colorbar(scalar_mappable)
plt.show()
This code runs smoothly and gives the following result:
The interpolation seems OK except along one line where it is discontinuous, just like in the question that led me to this class. One of the answers gives the idea of using a different distance, better adapted to spherical coordinates: the Haversine distance.
def haversine(x1, x2):
    theta1, phi1 = x1
    theta2, phi2 = x2
    return 2 * numpy.arcsin(
        numpy.sqrt(
            numpy.sin((theta2 - theta1) / 2) ** 2
            + numpy.cos(theta1) * numpy.cos(theta2) * numpy.sin((phi2 - phi1) / 2) ** 2
        )
    )
# Create the interpolator.
interpolator = scipy.interpolate.Rbf(theta_interp, phi_interp, r_interp, norm=haversine)
which, when executed, gives a warning:
LinAlgWarning: Ill-conditioned matrix (rcond=1.33262e-19): result may not be accurate.
self.nodes = linalg.solve(self.A, self.di)
and a result that is not at all the one expected: the interpolated function has values that go all the way down to -1, which is clearly wrong.
You can use Cartesian coordinates instead of spherical coordinates.
The default norm parameter ('euclidean') used by Rbf is sufficient:
# interpolation
x, y, z = numpy.array(points).T
interpolator = scipy.interpolate.Rbf(x, y, z, r_interp)
# predict
heatmap = interpolator(X, Y, Z)
Here is the result:
ax.plot_surface(
    X, Y, Z,
    rstride=1, cstride=1,
    # or rcount=50, ccount=50,
    facecolors=colormap(normaliser(heatmap)),
    cmap=colormap,
    alpha=0.7, shade=False
)
ax.set_xlabel('x axis')
ax.set_ylabel('y axis')
ax.set_zlabel('z axis')
You can also use a cosine distance if you want (norm parameter):
import scipy.spatial.distance

def cosine(XA, XB):
    if XA.ndim == 1:
        XA = numpy.expand_dims(XA, axis=0)
    if XB.ndim == 1:
        XB = numpy.expand_dims(XB, axis=0)
    return scipy.spatial.distance.cosine(XA, XB)
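A minimal sketch of how the custom distance would be passed (Rbf accepts a callable via its norm parameter, as the answer notes):
interpolator = scipy.interpolate.Rbf(x, y, z, r_interp, norm=cosine)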
In order to better see the differences, I stacked the two images, subtracted them, and inverted the layer.
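(A minimal numpy sketch of the same comparison, assuming the two renders were saved as equal-sized images; the filenames a.png and b.png are hypothetical:)
import numpy
import matplotlib.pyplot as plt
a = plt.imread('a.png')
b = plt.imread('b.png')
diff = 1.0 - numpy.abs(a - b)  # subtract the images, then invert
plt.imshow(diff)
plt.show()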

Matplotlib draw vertical lines up to a curve

I am currently using Rectangle in an attempt to fill the area under a curve with a single colour per rectangle. However, the Rectangles are more than 1 pixel wide. I want to draw lines 1 pixel wide so that they don't overlap; currently the vertical rectangles under the curve overlap horizontally by one or two pixels.
def rect(x, y, w, h, c):
    ax = plt.gca()
    polygon = plt.Rectangle((x, y), w, h, color=c, antialiased=False)
    ax.add_patch(polygon)

def mask_fill(X, Y, fa, cmap='Set1'):
    cmap = plt.get_cmap(cmap)  # look up the colormap by name so it can be called below
    plt.plot(X, Y, lw=0)
    plt.xlim([X[0], X[-1]])
    plt.ylim([0, MAX])
    dx = X[1] - X[0]
    for n, (x, y, f) in enumerate(zip(X, Y, fa)):
        color = cmap(f)
        rect(x, 0, dx, y, color)
If I use the code below to draw lines, the overlap is reduced but there is still an overlap
def vlines(x_pos, y1, y2, c):
    plt.vlines(x_pos, ymin=y1, ymax=y2, color=c)

def draw_lines(X, Y, trend_len, cmap='Blues_r'):
    cmap = plt.get_cmap(cmap)  # look up the colormap by name so it can be called below
    plt.plot(X, Y, lw=0)
    plt.xlim([X[0], X[-1]])
    plt.ylim([0, MAX])
    dx = X[1] - X[0]
    ydeltas = y_trend(Y, trend_len)  # y_trend is a helper defined elsewhere (not shown)
    for n, (x, y, yd) in enumerate(zip(X, Y, ydeltas)):
        color = cmap(y / MAX)
        vlines(x, y1=0, y2=y, c=color)
Printing the parameters passed to vlines for the first three iterations, we can see that x_pos increments by 1, yet the red line clearly overlaps the first blue line, as per the image below (NB the first (LHS) blue line is 1 pixel wide):
x_pos: 0, y1: 0, y2: 143.51, c: (0.7816378316032295, 0.8622683583237216, 0.9389773164167627, 1.0)
x_pos: 1, y1: 0, y2: 112.79092811646952, c: (0.9872049211841599, 0.5313341022683583, 0.405843906189927, 1.0)
x_pos: 2, y1: 0, y2: 123.53185623293905, c: (0.9882352941176471, 0.6059669357939254, 0.4853671664744329, 1.0)
Sample data:
47.8668447889, 1
78.5668447889, 1
65.9768447889, 1
139.658525932, 2
123.749454049, 2
116.660382165, 3
127.771310282, 3
114.792238398, 3
The first column above corresponds to the y value of the series (the x values are just the indices of the values, counting from 0).
The second column corresponds to the class.
I am generating two images:
One with unique values per class (0-6), each a different colour (7 unique colours), with colour filled up to the y value; this will be used as a mask over the data image below.
A second image (example shown) that uses a different colour map for each class value (e.g. 0=Blues_r, 1=Reds_r etc.), where the intensity of the colour is given by the value of y.
The code for calculating the colours is fine, but I just can't get matplotlib to plot vertical lines a single pixel wide.
Since your goal is not to create an interactive figure, and you are trying to manipulate columns of pixels, you can use numpy instead of matplotlib to generate the result.
Here is a function that will take in y and category arrays, and create an image that's as wide as y is long, with the specified height. Color scaling is done similarly to your solution, where y is divided by the max.
from matplotlib import pyplot as plt
import numpy as np

def draw_lines(y, category, filename, cmap='Set1', max=None, height=None):
    y = np.asanyarray(y).ravel()
    category = np.asanyarray(category).ravel()
    assert y.size == category.size

    if max is None:
        max = y.max()
    if height is None:
        height = int(np.ceil(max))
    if isinstance(cmap, str):
        cmap = plt.get_cmap(cmap)

    colors = cmap(category)
    colors[:, 3] = y / max
    colors = (255 * colors).astype(np.uint8)
    output = np.repeat(colors[None, ...], height, axis=0)

    heights = np.round(height * (y / max))
    mask = np.arange(height)[:, None] >= heights
    mask = np.broadcast_to(mask[::-1, :, None], output.shape)
    output[mask] = 0

    plt.imsave(filename, output)
    return output
The first part just sets up the input values. The second part gets the color values. Calling a colormap with an array of n values returns an (n, 4) array of colors in the range [0, 1.0]. colors[:, 3] = y / max sets the alpha channel proportional to the height. The colors are then smeared vertically to the desired height. The last part creates a mask to set the top part of each column to zero, according to the method proposed here.
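For reference, a minimal usage sketch with the question's sample data (values rounded; the output filename is hypothetical):
y = [47.87, 78.57, 65.98, 139.66, 123.75, 116.66, 127.77, 114.79]
category = [1, 1, 1, 2, 2, 3, 3, 3]
draw_lines(y, category, 'mask.png')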
This version uses transparency to turn off the colors, and to trim the shape. You can do the same thing with a white background, if you are willing to scale the colors instead of adjusting the transparency:
def draw_lines_b(y, category, filename, cmap='Set1', max=None, height=None):
    y = np.asanyarray(y).ravel()
    category = np.asanyarray(category).ravel()
    assert y.size == category.size

    if max is None:
        max = y.max()
    if height is None:
        height = int(np.ceil(max))
    if isinstance(cmap, str):
        cmap = plt.get_cmap(cmap)

    colors = cmap(category)
    colors[..., :3] *= (y / max)[..., None]
    colors = (255 * colors).astype(np.uint8)
    output = np.repeat(colors[None, ...], height, axis=0)

    heights = np.round(height * (y / max))
    mask = np.arange(height)[:, None] >= heights
    mask = np.broadcast_to(mask[::-1, :, None], output.shape)
    output[mask] = 255

    plt.imsave(filename, output)
    return output
In both cases, as you can imagine, matplotlib is not strictly necessary. You can define your own list of colors, and use a more appropriate library, such as PIL, to save the images.

How can I calculate arbitrary values from a spline created with scipy.interpolate.Rbf?

I have several data points in 3 dimensional space (x, y, z) and have interpolated them using scipy.interpolate.Rbf. This gives me a spline nicely representing the surface of my 3D object. I would now like to determine several x and y pairs that have the same, arbitrary z value. I would like to do that in order to compute the cross section of my 3D object at any given value of z. Does someone know how to do that? Maybe there is also a better way to do that instead of using scipy.interpolate.Rbf.
Up to now I have evaluated the cross sections by making a contour plot using matplotlib.pyplot and extracting the displayed segments.
[Figures: 3D points and interpolated spline; segments extracted using a contour plot]
I was able to solve the problem. I calculated the area by triangulating the x-y data and cutting the triangles with the z-plane at the height I wanted the cross-sectional area for (z = z0). Specifically, I searched for those triangles whose z-values lie both above and below z0, then calculated the x and y values of the points where the sides of these triangles cross z0. Then I use scipy.spatial.ConvexHull to sort the intersection points. Using the shoelace formula I can then determine the area.
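For reference, the shoelace formula gives the area of a simple polygon with vertices (x_i, y_i) listed in order (indices taken mod n):

A = \frac{1}{2} \left| \sum_{i=1}^{n} \left( x_i \, y_{i+1} - x_{i+1} \, y_i \right) \right|

which is exactly what the np.roll expression near the end of the code below computes.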
I have attached the example code here:
import numpy as np
from scipy import spatial
import matplotlib.pyplot as plt

# Generation of random test data
n = 500
x = np.random.random(n)
y = np.random.random(n)
z = np.exp(-2*(x-.5)**2-4*(y-.5)**2)
z0 = .75

# Triangulation of the test data
triang = spatial.Delaunay(np.array([x, y]).T)

# Determine all triangles where not all points are above or below z0,
# i.e. the triangles that intersect z0
tri_inter = np.zeros_like(triang.simplices, dtype=int)  # the triangles which intersect the plane at z0, filled below
i = 0
for tri in triang.simplices:
    if ~np.all(z[tri] > z0) and ~np.all(z[tri] < z0):
        tri_inter[i, :] = tri
        i += 1
tri_inter = tri_inter[~np.all(tri_inter == 0, axis=1)]  # remove all rows with only 0

# The number of interpolated values for x and y has twice the length of the triangles
# because each triangle intersects the plane at z0 twice
x_inter = np.zeros(tri_inter.shape[0]*2)
y_inter = np.zeros(tri_inter.shape[0]*2)

for j, tri in enumerate(tri_inter):
    # Determine which of the three points are above and which are below z0
    points_above = []
    points_below = []
    for i in tri:
        if z[i] > z0:
            points_above.append(i)
        else:
            points_below.append(i)

    # Calculate the intersections and put the values into x_inter and y_inter
    t = (z0-z[points_below[0]])/(z[points_above[0]]-z[points_below[0]])
    x_new = t * (x[points_above[0]]-x[points_below[0]]) + x[points_below[0]]
    y_new = t * (y[points_above[0]]-y[points_below[0]]) + y[points_below[0]]
    x_inter[j*2] = x_new
    y_inter[j*2] = y_new

    if len(points_above) > len(points_below):
        t = (z0-z[points_below[0]])/(z[points_above[1]]-z[points_below[0]])
        x_new = t * (x[points_above[1]]-x[points_below[0]]) + x[points_below[0]]
        y_new = t * (y[points_above[1]]-y[points_below[0]]) + y[points_below[0]]
    else:
        t = (z0-z[points_below[1]])/(z[points_above[0]]-z[points_below[1]])
        x_new = t * (x[points_above[0]]-x[points_below[1]]) + x[points_below[1]]
        y_new = t * (y[points_above[0]]-y[points_below[1]]) + y[points_below[1]]
    x_inter[j*2+1] = x_new
    y_inter[j*2+1] = y_new

# Sort points to calculate the area
hull = spatial.ConvexHull(np.array([x_inter, y_inter]).T)
x_hull, y_hull = x_inter[hull.vertices], y_inter[hull.vertices]

# Calculation of the area using the shoelace formula
area = 0.5*np.abs(np.dot(x_hull, np.roll(y_hull, 1)) - np.dot(y_hull, np.roll(x_hull, 1)))
print('Area:', area)

plt.figure()
plt.plot(x_inter, y_inter, 'ro')
plt.plot(x_hull, y_hull, 'b--')
plt.triplot(x, y, triangles=tri_inter, color='k')
plt.show()

Two dimensional FFT showing unexpected frequencies above Nyquisit limit

Note: This question is building on another question of mine:
Two dimensional FFT using python results in slightly shifted frequency
I have some data, basically a function E(x, y) with (x, y) being a (discrete) subset of R^2, mapping to real numbers. In the (x, y) plane I have a fixed distance of 0.2 between data points in the x as well as the y direction. I want to analyze the frequency spectrum of my E(x, y) signal using a two dimensional fast Fourier transform (FFT) in python.
As far as I know, no matter which frequencies are actually contained in my signal, using an FFT I will only be able to see signals below the Nyquist limit Ny = sampling frequency / 2. In my case I have a real spacing of 0.2, leading to a sampling frequency of 1 / 0.2 = 5, and therefore my Nyquist limit is Ny = 5 / 2 = 2.5.
If my signal has frequencies above the Nyquist limit, they will be "folded" back into the Nyquist domain, leading to false results (aliasing). But even though I might sample at too low a frequency, it should in theory not be possible to see any frequencies above the Nyquist limit, correct?
So here is my issue: analyzing my signal should only lead to frequencies of 2.5 max, but I clearly get frequencies higher than that. Given that I am pretty sure about the theory here, there has to be some mistake in my code. I will provide a shortened code version, only providing the information necessary for this issue:
simulationArea = ...  # length of simulation area in x and y direction
x = np.linspace(0, simulationArea, numberOfGridPointsInX, endpoint=False)
y = x
xx, yy = np.meshgrid(x, y)
Ex = np.genfromtxt('E_field_x100.txt')  # this is the actual signal to be analyzed, which may have arbitrary frequencies

FTEx = np.fft.fft2(Ex)  # calculating fft coefficients of signal
dx = x[1] - x[0]  # calculating spacing of signals in real space. 'print(dx)' results in '0.2'
sampleFrequency = 1.0 / dx
nyquistFrequency = sampleFrequency / 2.0
half = len(FTEx) // 2  # integer division so it can be used as an index

fig, axarr = plt.subplots(2, 1)

im1 = axarr[0].imshow(Ex,
                      origin='lower',
                      cmap='jet',
                      extent=(0, simulationArea, 0, simulationArea))
axarr[0].set_xlabel('X', fontsize=14)
axarr[0].set_ylabel('Y', fontsize=14)
axarr[0].set_title('$E_x$', fontsize=14)
fig.colorbar(im1, ax=axarr[0])

im2 = axarr[1].matshow(2 * abs(FTEx[:half, :half]) / half,
                       aspect='equal',
                       origin='lower',
                       interpolation='nearest')
axarr[1].set_xlabel('Frequency wx')
axarr[1].set_ylabel('Frequency wy')
axarr[1].xaxis.set_ticks_position('bottom')
axarr[1].set_title('$FFT(E_x)$', fontsize=14)
fig.colorbar(im2, ax=axarr[1])
The result of this is:
How is that possible? When I use the same code for very simple signals it works just fine (e.g. a sine wave in the x or y direction with a specific frequency).
Ok here we go! Here's a couple of simple functions and a complete example that you can use: it's got a little bit of extra cruft related to plotting and data generation, but the first function, makeSpectrum, shows you how to use fftfreq and fftshift plus fft2 to achieve what you want. The key point is that the raw output of fft2 is ordered with positive frequencies first and negative frequencies second, so labeling the axes with raw pixel indices (as in your plot) produces meaningless "frequencies"; fftfreq gives you the true frequency of each bin and fftshift reorders the array so zero frequency sits in the middle. Let me know if you have questions.
import numpy as np
import numpy.fft as fft
import matplotlib.pylab as plt

def makeSpectrum(E, dx, dy, upsample=10):
    """
    Convert a time-domain array `E` to the frequency domain via 2D FFT. `dx` and
    `dy` are sample spacing in x (left-right, 1st axis) and y (up-down, 0th
    axis) directions. An optional `upsample > 1` will zero-pad `E` to obtain an
    upsampled spectrum.

    Returns `(spectrum, xf, yf)` where `spectrum` contains the 2D FFT of `E`. If
    `Ny, Nx = spectrum.shape`, `xf` and `yf` will be vectors of length `Nx` and
    `Ny` respectively, containing the frequencies corresponding to each pixel of
    `spectrum`.

    The returned spectrum is zero-centered (via `fftshift`). The 2D FFT, and
    this function, assume your input `E` has its origin at the top-left of the
    array. If this is not the case, i.e., your input `E`'s origin is translated
    away from the first pixel, the returned `spectrum`'s phase will *not* match
    what you expect, since a translation in the time domain is a modulation of
    the frequency domain. (If you don't care about the spectrum's phase, i.e.,
    only magnitude, then you can ignore all these origin issues.)
    """
    zeropadded = np.array(E.shape) * upsample
    F = fft.fftshift(fft.fft2(E, zeropadded)) / E.size
    xf = fft.fftshift(fft.fftfreq(zeropadded[1], d=dx))
    yf = fft.fftshift(fft.fftfreq(zeropadded[0], d=dy))
    return (F, xf, yf)
def extents(f):
    "Convert a vector into the 2-element extents vector imshow needs"
    delta = f[1] - f[0]
    return [f[0] - delta / 2, f[-1] + delta / 2]

def plotSpectrum(F, xf, yf):
    "Plot a spectrum array and vectors of x and y frequency spacings"
    plt.figure()
    plt.imshow(abs(F),
               aspect="equal",
               interpolation="none",
               origin="lower",
               extent=extents(xf) + extents(yf))
    plt.colorbar()
    plt.xlabel('f_x (Hz)')
    plt.ylabel('f_y (Hz)')
    plt.title('|Spectrum|')
    plt.show()
if __name__ == '__main__':
    # In seconds
    x = np.linspace(0, 4, 20)
    y = np.linspace(0, 4, 30)
    # Uncomment the next two lines and notice that the spectral peak is no
    # longer equal to 1.0! That's because `makeSpectrum` expects its input's
    # origin to be at the top-left pixel, which isn't the case for the
    # following two lines.
    # x = np.linspace(.123 + 0, .123 + 4, 20)
    # y = np.linspace(.123 + 0, .123 + 4, 30)

    # Sinusoid frequency, in Hz
    x0 = 1.9
    y0 = -2.9

    # Generate data
    im = np.exp(2j * np.pi * (y[:, np.newaxis] * y0 + x[np.newaxis, :] * x0))

    # Generate spectrum and plot
    spectrum, xf, yf = makeSpectrum(im, x[1] - x[0], y[1] - y[0])
    plotSpectrum(spectrum, xf, yf)

    # Report peak
    peak = spectrum[:, np.isclose(xf, x0)][np.isclose(yf, y0)]
    peak = peak[0, 0]
    print('spectral peak={}'.format(peak))
This results in the following image and prints spectral peak=(1+7.660797103157986e-16j), which is exactly the correct value for the spectrum at the frequency of a pure complex exponential.
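As a sanity check on the original confusion, the frequencies that np.fft.fftfreq assigns to the FFT bins never exceed the Nyquist limit 1/(2*d), so a correctly labeled axis cannot show anything above it. A quick sketch with the question's 0.2 spacing:
import numpy as np
f = np.fft.fftfreq(100, d=0.2)
print(f.min(), f.max())  # -2.5 and 2.45: everything within the +/-2.5 Nyquist band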

Finding the full width half maximum of a peak

I have been trying to figure out the full width at half maximum (FWHM) of the blue peak (see image). The green peak and the magenta peak combined make up the blue peak. I have been using the following equation to find the FWHM of the green and magenta peaks: fwhm = 2*np.sqrt(2*(math.log(2)))*sd, where sd is the standard deviation. I created the green and magenta peaks and I know their standard deviations, which is why I can use that equation.
I created the green and magenta peaks using the following code:
def make_norm_dist(self, x, mean, sd):
    import numpy as np
    norm = []
    for i in range(x.size):
        norm += [1.0/(sd*np.sqrt(2*np.pi))*np.exp(-(x[i] - mean)**2/(2*sd**2))]
    return np.array(norm)
If I did not know the blue peak was made up of two peaks and I only had the blue peak in my data, how would I find the FWHM?
I have been using this code to find the peak top:
peak_top = 0.0e-1000
for i in x_axis:
    if i > peak_top:
        peak_top = i
I could divide the peak_top by 2 to find the half height and then try and find y-values corresponding to the half height, but then I would run into trouble if there are no x-values exactly matching the half height.
I am pretty sure there is a more elegant solution to the one I am trying.
You can use a spline to fit the [blue curve - peak/2], and then find its roots:
import numpy as np
from scipy.interpolate import UnivariateSpline
def make_norm_dist(x, mean, sd):
    return 1.0/(sd*np.sqrt(2*np.pi))*np.exp(-(x - mean)**2/(2*sd**2))
x = np.linspace(10, 110, 1000)
green = make_norm_dist(x, 50, 10)
pink = make_norm_dist(x, 60, 10)
blue = green + pink
# create a spline of x and blue-np.max(blue)/2
spline = UnivariateSpline(x, blue-np.max(blue)/2, s=0)
r1, r2 = spline.roots() # find the roots
import pylab as pl
pl.plot(x, blue)
pl.axvspan(r1, r2, facecolor='g', alpha=0.5)
pl.show()
Here is the result:
This worked for me in IPython (quick and dirty, can be reduced to 3 lines):
import numpy as np

def FWHM(X, Y):
    half_max = max(Y) / 2.
    # find where the function crosses half_max (where the sign of the difference flips)
    # take the 'derivative' of signum(half_max - Y[])
    d = np.sign(half_max - np.array(Y[0:-1])) - np.sign(half_max - np.array(Y[1:]))
    # plot(X[0:len(d)], d)  # if you are interested
    # find the left- and right-most indexes
    left_idx = np.where(d > 0)[0][0]
    right_idx = np.where(d < 0)[0][-1]
    return X[right_idx] - X[left_idx]  # return the difference (full width)
Some additions can be made to make the resolution more accurate, but in the limit that there are many samples along the X axis and the data is not too noisy, this works great.
Even when the data are not Gaussian and a little noisy, it worked for me (I just take the first and last time half max crosses the data).
If your data has noise (and it always does in the real world), a more robust solution would be to fit a Gaussian to the data and extract FWHM from that:
import numpy as np
import scipy.optimize as opt

def gauss(x, p):  # p[0]==mean, p[1]==stdev
    return 1.0/(p[1]*np.sqrt(2*np.pi))*np.exp(-(x-p[0])**2/(2*p[1]**2))

# Create some sample data
known_param = np.array([2.0, .7])
xmin, xmax = -1.0, 5.0
N = 1000
X = np.linspace(xmin, xmax, N)
Y = gauss(X, known_param)

# Add some noise
Y += .10*np.random.random(N)

# Renormalize to a proper PDF
Y /= ((xmax-xmin)/N)*Y.sum()

# Fit a Gaussian
p0 = [0, 1]  # initial guess is a normal distribution
errfunc = lambda p, x, y: gauss(x, p) - y  # distance to the target function
p1, success = opt.leastsq(errfunc, p0[:], args=(X, Y))
fit_mu, fit_stdev = p1

FWHM = 2*np.sqrt(2*np.log(2))*fit_stdev
print("FWHM", FWHM)
The plotted image can be generated by:
from pylab import *
plot(X,Y)
plot(X, gauss(X,p1),lw=3,alpha=.5, color='r')
axvspan(fit_mu-FWHM/2, fit_mu+FWHM/2, facecolor='g', alpha=0.5)
show()
An even better approximation would filter out the noisy data below a given threshold before the fit.
Here is a nice little function using the spline approach.
import numpy as np
from scipy.interpolate import splrep, sproot

class MultiplePeaks(Exception): pass
class NoPeaksFound(Exception): pass

def fwhm(x, y, k=3):
    """
    Determine full-width-half-maximum of a peaked set of points, x and y.

    Assumes that there is only one peak present in the dataset. The function
    uses a spline interpolation of order k (scipy's sproot only supports
    cubic splines, k=3).
    """
    half_max = np.amax(y)/2.0
    s = splrep(x, y - half_max, k=k)
    roots = sproot(s)

    if len(roots) > 2:
        raise MultiplePeaks("The dataset appears to have multiple peaks, and "
                            "thus the FWHM can't be determined.")
    elif len(roots) < 2:
        raise NoPeaksFound("No proper peaks were found in the data set; likely "
                           "the dataset is flat (e.g. all zeros).")
    else:
        return abs(roots[1] - roots[0])
You should use scipy to solve it: first scipy.signal.find_peaks and then scipy.signal.peak_widths.
With the default rel_height (0.5), you're measuring the width at half maximum of the peak.
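A minimal sketch under those defaults (the two-Gaussian test signal here is made up to mirror the question's blue peak; note peak_widths reports widths in samples, so convert to x units with the grid spacing):
import numpy as np
from scipy.signal import find_peaks, peak_widths

x = np.linspace(10, 110, 1000)
y = np.exp(-(x - 50)**2 / 200) + np.exp(-(x - 60)**2 / 200)  # single blended peak

peaks, _ = find_peaks(y)
widths, width_heights, left_ips, right_ips = peak_widths(y, peaks, rel_height=0.5)
fwhm = widths[np.argmax(y[peaks])] * (x[1] - x[0])  # convert samples to x units
print(fwhm)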
If you prefer interpolation over fitting:
import numpy as np
def get_full_width(x: np.ndarray, y: np.ndarray, height: float = 0.5) -> float:
    height_half_max = np.max(y) * height
    index_max = np.argmax(y)
    # np.interp needs increasing xp, hence the flip on the falling side of the peak
    x_low = np.interp(height_half_max, y[:index_max+1], x[:index_max+1])
    x_high = np.interp(height_half_max, np.flip(y[index_max:]), np.flip(x[index_max:]))
    return x_high - x_low
For single-peaked curves with many data points, and if there's no need for perfect accuracy, I would use:
def FWHM(X, Y):
    deltax = X[1] - X[0]
    half_max = max(Y) / 2.
    l = np.where(Y > half_max, 1, 0)
    return np.sum(l) * deltax
I implemented an empirical solution which works for noisy and not-quite-Gaussian data fairly well in haggis.math.full_width_half_max. The usage is extremely straightforward:
fwhm = full_width_half_max(x, y)
The function is robust: it simply finds the maximum of the data and the nearest points crossing the "halfway down" threshold using the requested interpolation scheme.
Here are a couple of examples using data from the other answers.
# HYRY's smooth data
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import UnivariateSpline
from haggis.math import full_width_half_max

def make_norm_dist(x, mean, sd):
    return 1.0/(sd*np.sqrt(2*np.pi))*np.exp(-(x - mean)**2/(2*sd**2))

x = np.linspace(10, 110, 1000)
green = make_norm_dist(x, 50, 10)
pink = make_norm_dist(x, 60, 10)
blue = green + pink

# create a spline of x and blue-np.max(blue)/2
spline = UnivariateSpline(x, blue-np.max(blue)/2, s=0)
r1, r2 = spline.roots()  # find the roots

# Compute using my function
fwhm, (x1, y1), (x2, y2) = full_width_half_max(x, blue, return_points=True)

# Print comparison
print('HYRY:', r2 - r1, 'MP:', fwhm)

plt.plot(x, blue)
plt.axvspan(r1, r2, facecolor='g', alpha=0.5)
plt.plot(x1, y1, 'r.')
plt.plot(x2, y2, 'r.')
For smooth data, the results are pretty exact:
HYRY: 26.891157007233254 MP: 26.891193606203814
# Hooked's noisy data
import scipy.optimize as opt

def gauss(x, p):  # p[0]==mean, p[1]==stdev
    return 1.0/(p[1]*np.sqrt(2*np.pi))*np.exp(-(x-p[0])**2/(2*p[1]**2))

# Create some sample data
known_param = np.array([2.0, .7])
xmin, xmax = -1.0, 5.0
N = 1000
X = np.linspace(xmin, xmax, N)
Y = gauss(X, known_param)

# Add some noise
Y += .10*np.random.random(N)

# Renormalize to a proper PDF
Y /= ((xmax-xmin)/N)*Y.sum()

# Fit a Gaussian
p0 = [0, 1]  # initial guess is a normal distribution
errfunc = lambda p, x, y: gauss(x, p) - y  # distance to the target function
p1, success = opt.leastsq(errfunc, p0[:], args=(X, Y))
fit_mu, fit_stdev = p1
FWHM = 2*np.sqrt(2*np.log(2))*fit_stdev

# Compute using my function
fwhm, (x1, y1), (x2, y2) = full_width_half_max(X, Y, return_points=True)

# Print comparison
print('Hooked:', FWHM, 'MP:', fwhm)

plt.plot(X, Y)
plt.plot(X, gauss(X, p1), lw=3, alpha=.5, color='r')
plt.axvspan(fit_mu - FWHM / 2, fit_mu + FWHM / 2, facecolor='g', alpha=0.5)
plt.plot(x1, y1, 'r.')
plt.plot(x2, y2, 'r.')
For noisy data (with a biased baseline), the results are not as consistent.
Hooked: 1.9903193212254346 MP: 1.5039676990530118
On the one hand, the Gaussian fit is not optimal for this data; on the other hand, the strategy of picking the nearest point that crosses the half-max threshold is likely not optimal either.
