Python find peaks - wrong x axis

I have the following code:
Frequency = df['x [Hz]']
Spectrum = df['test_spec']
x = Spectrum
peaks, _ = find_peaks(x, distance=20)
plt.plot(peaks, x[peaks], "xr"); plt.plot(x); plt.legend(['distance'])
plt.show()
The variable "Frequency" contains the frequencies of a third-octave band spectrum from 5 to 315 Hz. "Spectrum" contains the associated noise pressure levels. Now I want to find the peaks in that spectrum. The value I need is the frequency at which each peak is located.
The problem is that the plot shows an x-axis with the steps 0, 5, 10, 15, but I want the x-axis to be scaled with the frequencies stored in the variable "Frequency".
I hope you can help me.
Thank you for your support.

The documentation of find_peaks() can be a bit confusing, as it calls its input x, while in most situations that input would be drawn on the y-axis. find_peaks() doesn't care about the x-axis; it assumes the x values are simply the array indices (0, 1, 2, ...), and the peaks it returns are those indices.
To draw your curve, you need to plot Frequency on the x-axis and Spectrum on the y-axis. You can visualize the peaks by using the returned indices to index both arrays:
import matplotlib.pyplot as plt
from scipy.signal import find_peaks
import numpy as np
Frequency = np.linspace(5, 315, 200)
Spectrum = np.random.randn(200).cumsum()
Spectrum += 1 - Spectrum.min()
peaks, _ = find_peaks(Spectrum, distance=20)
plt.plot(Frequency[peaks], Spectrum[peaks], "xr")
plt.plot(Frequency, Spectrum)
plt.legend(['distance'])
plt.tight_layout()
plt.show()
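If the data lives in pandas columns as in the question, the peak frequencies themselves can be read off by indexing the frequency values with the peak positions. A minimal sketch, assuming made-up third-octave values in place of the real df (the column names come from the question, the numbers are invented for illustration):

```python
import numpy as np
import pandas as pd
from scipy.signal import find_peaks

# Hypothetical stand-in for the question's DataFrame (values are invented).
df = pd.DataFrame({
    "x [Hz]": [5, 6.3, 8, 10, 12.5, 16, 20, 25, 31.5, 40],
    "test_spec": [60, 62, 70, 65, 63, 64, 72, 71, 66, 60],
})

# Work on plain NumPy arrays so that indexing is purely positional.
frequency = df["x [Hz]"].to_numpy()
spectrum = df["test_spec"].to_numpy()

# find_peaks returns positions (array indices), not frequencies.
peaks, _ = find_peaks(spectrum)

# Map the positions back to the frequency axis.
peak_frequencies = frequency[peaks]
print(peak_frequencies)
```

Converting with to_numpy() keeps the lookup independent of the DataFrame's index labels, which matters if the frame was filtered or re-indexed before the peak search.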

Related

Python Fourier zero padding

Problem
I have a spectrum that can be downloaded here: https://www.dropbox.com/s/ax1b32aotuzx9f1/example_spectrum.npy?dl=0
Using Python, I am trying to use zero padding to increase the number of points in the frequency domain. To do so I rely on the fft and ifft functions from scipy.fft. I do not obtain the desired result and would be grateful to anyone who could explain why.
Code
Here is the code I have tried:
import numpy as np
from scipy.fft import fft, ifft
import matplotlib.pyplot as plt
spectrum = np.load('example_spectrum.npy')
spectrum_time = ifft(spectrum) # In time domain
spectrum_oversampled = fft(spectrum_time, len(spectrum)+1000) # FFT of zero padded spectrum
xaxis = np.linspace(0, len(spectrum)-1, len(spectrum_oversampled)) # to plot oversampled spectrum
fig, (ax1, ax2) = plt.subplots(2,1)
ax1.plot(spectrum, '.-')
ax1.plot(xaxis, spectrum_oversampled)
ax1.set_xlim(500, 1000)
ax1.set_xlabel('Arbitrary units')
ax1.set_ylabel('Normalized flux')
ax1.set_title('Frequency domain')
ax2.plot(spectrum_time)
ax2.set_ylim(-0.02, 0.02)
ax2.set_title('Time domain')
ax2.set_xlabel('bin number')
plt.tight_layout()
plt.show()
Results
Added figure to show results. Blue is original spectrum, orange is zero padded spectrum.
Expected behavior
I would expect the zero padding to result in a sort of sinc interpolation of the original spectrum. However, the orange curve does not go through the points of the original spectrum.
Does anyone have any idea why I obtain this behavior and/or how to fix this?
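One possible explanation, offered as a sketch rather than a confirmed diagnosis of the linked data: fft(spectrum_time, n) appends the zeros at the end of the array, whereas sinc-like interpolation needs them in the middle of the time-domain signal, between its positive-time and negative-time halves. Choosing a padded length that is an integer multiple of the original length also makes every original sample coincide with an output point. The helper name sinc_interpolate_spectrum and the random test spectrum below are assumptions for illustration only.

```python
import numpy as np

def sinc_interpolate_spectrum(spectrum, factor):
    """Oversample `spectrum` by an integer `factor` via centered zero padding."""
    n = len(spectrum)
    m = n * factor
    # Go to the time domain and move time zero to the center of the array.
    time_signal = np.fft.fftshift(np.fft.ifft(spectrum))
    # Pad on both sides so that time zero ends up at index m // 2.
    pad_left = m // 2 - n // 2
    pad_right = m - n - pad_left
    padded = np.pad(time_signal, (pad_left, pad_right))
    # Undo the shift and transform back to the frequency domain.
    oversampled = np.fft.fft(np.fft.ifftshift(padded))
    # Output point j sits at position j / factor on the original bin axis.
    xaxis = np.arange(m) / factor
    return xaxis, oversampled

# Invented test spectrum (odd length, real-valued); not the data from the question.
rng = np.random.default_rng(0)
spectrum = rng.normal(size=101).cumsum()
xaxis, oversampled = sinc_interpolate_spectrum(spectrum, 4)
```

With this construction every fourth output point reproduces an original sample, so plotting oversampled.real against xaxis should pass through the original points.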

How to group peak points corresponding to the same hill in signal plot

I have a list of points in which I need to find the peak points and group them. I am using the find_peaks() function from scipy.signal to find the peak points. Now I need to group the peak points that correspond to the same hill (as shown below). How can this be done? Any suggestion would be of great help.
Sample images
Code
from matplotlib import pyplot as plt
from scipy.signal import find_peaks
import numpy as np
# lst holds the list of points
A = np.array(lst)
peaks, _ = find_peaks(A)
plt.figure()
plt.plot(lst)
plt.plot(peaks, A[peaks], "ro")
plt.grid()
plt.show()
A typical way to group peaks is to low-pass filter the waveform. Lower the cut-off frequency of the low-pass filter until the peaks you think belong to the same "hill" merge. Then run find_peaks() on the filtered signal.
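The low-pass approach can be sketched as follows. This is only an illustration under assumptions: the signal is synthetic, the Butterworth filter and its cutoff of 0.02 (as a fraction of the Nyquist frequency) are arbitrary choices to be tuned, and the nearest-hill grouping rule at the end is one simple option that the answer above does not prescribe.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

# Synthetic stand-in for the data: two slow "hills" plus a fast ripple.
t = np.linspace(0, 1, 1000)
signal = np.sin(2 * np.pi * 2 * t) + 0.2 * np.sin(2 * np.pi * 50 * t)

# Peaks of the raw signal: many per hill because of the ripple.
raw_peaks, _ = find_peaks(signal)

# Zero-phase low-pass filter; the cutoff is an assumed value.
# Lower it until the peaks belonging to one hill merge.
b, a = butter(4, 0.02)
smoothed = filtfilt(b, a, signal)

# Peaks of the smoothed signal mark the hills.
hill_peaks, _ = find_peaks(smoothed)

# Group every raw peak with the nearest hill peak (assumed grouping rule).
group_of_peak = np.abs(raw_peaks[:, None] - hill_peaks[None, :]).argmin(axis=1)
```

group_of_peak[i] is then the index of the hill that raw_peaks[i] was assigned to. filtfilt is used instead of a one-pass filter so the smoothed hills are not shifted relative to the raw peaks.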

matplotlib hist function argument density not working

The density argument of plt.hist does not work.
I tried to use the density argument of plt.hist to normalize stock returns in my plot, but it didn't work.
The following code worked fine for me and gave me the probability density function I wanted.
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(19680801)
# example data
mu = 100 # mean of distribution
sigma = 15 # standard deviation of distribution
x = mu + sigma * np.random.randn(437)
num_bins = 50
plt.hist(x, num_bins, density=1)
plt.show()
But when I tried it with stock data, it simply didn't work. The result looked unnormalized. I didn't find any abnormal values in my data array.
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
plt.hist(returns, 50,density = True)
plt.show()
# "returns" is a np array consisting of 360 days of stock returns
This is a known issue in Matplotlib.
As stated in the bug report "The density flag in pyplot.hist() does not work correctly":
When density = False, the histogram plot would have counts on the Y-axis. But when density = True, the Y-axis does not mean anything useful. I think a better implementation would plot the PDF as the histogram when density = True.
The developers view this as a feature, not a bug, since it maintains compatibility with numpy. They have already closed several bug reports about it because it is working as intended. Adding to the confusion, the example on the matplotlib site appears to show this feature working with the y-axis being assigned a meaningful value.
What you want to do with matplotlib is reasonable, but matplotlib will not let you do it that way.
It is not a bug.
The area of the bars is equal to 1.
The numbers only seem strange because your bin widths are small.
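The claim about the area can be checked numerically with np.histogram, which uses the same normalization. A small sketch, assuming invented returns with a standard deviation of 0.01 in place of the real stock data:

```python
import numpy as np

# Hypothetical daily returns; the real data from the question is not available.
rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, 360)

# density=True divides the counts by (sample size * bin width).
density, edges = np.histogram(returns, bins=50, density=True)
widths = np.diff(edges)

# Heights can be far above 1 when the bins are narrow...
print(density.max())
# ...but height times width, summed over all bins, is 1.
area = np.sum(density * widths)
print(area)
```

So large values on the y-axis are expected whenever the bin width is much smaller than 1, as it is for daily returns.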
Since this isn't resolved: based on @user14518925's response, which is actually correct, density=True treats the bin width as an actual number, whereas from my understanding you want each bin to count as having a width of 1, so that the sum of the frequencies is 1. More succinctly, what you're seeing is:
\sum_{i}y_{i}\times\text{bin size} =1
Whereas what you want is:
\sum_{i}y_{i} =1
Therefore, all you really need to change is the tick labels on the y-axis. One way to do this is to disable the density option:
density = False
and instead divide by the total sample size, as follows (shown with your example):
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(19680801)
# example data
mu = 0 # mean of distribution
sigma = 0.0000625 # standard deviation of distribution
x = mu + sigma * np.random.randn(437)
fig = plt.figure()
plt.hist(x, 50, density=False)
locs, _ = plt.yticks()
print(locs)
plt.yticks(locs,np.round(locs/len(x),3))
plt.show()
Another approach, besides that of tvbc, is to keep density=True and rescale the y tick labels by the bin width.
import matplotlib.pyplot as plt
import numpy as np
steps = 10
bins = np.arange(0, 101, steps)
data = np.random.random(100000) * 100
plt.hist(data, bins=bins, density=True)
yticks = plt.gca().get_yticks()
plt.yticks(yticks, np.round(yticks * steps, 2))
plt.show()
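A further option that avoids relabelling ticks is the weights argument, which both plt.hist and np.histogram accept: giving every sample the weight 1/N makes each bar show the fraction of samples in its bin. This is a sketch with invented returns, not the data from the question.

```python
import numpy as np

# Hypothetical daily returns standing in for the real data.
rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, 360)

# Each sample contributes 1/N, so a bar's height is the fraction of samples in its bin.
weights = np.ones_like(returns) / len(returns)
fractions, edges = np.histogram(returns, bins=50, weights=weights)

# plt.hist(returns, bins=50, weights=weights) draws bars with these same heights.
print(fractions.sum())
```

Here the bar heights sum to 1 regardless of the bin width, which corresponds to the second of the two sums written above.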

How to plot the angle frequency distribution curve in python

I have only the angle values for a set of data. Now I need to plot an angle distribution curve, i.e. the angle on the x-axis versus the number of times each angle occurs (its frequency) on the y-axis.
These are the sorted angles for the data set:
[98.1706427, 99.09896751, 99.10879006, 100.47518838, 101.22770381, 101.70374296,
103.15715294, 104.4653976,105.50441485, 106.82885361, 107.4605319, 108.93228646,
111.22463712, 112.23658018, 113.31223886, 113.4000603, 114.14565594, 114.79809084,
115.15788861, 115.42991416, 115.66216071, 115.69821092, 116.56319054, 117.09232139,
119.30835385, 119.31377834, 125.88278338, 127.80937901, 132.16187185, 132.61262906,
136.6751744, 138.34164387,]
How can I do this?
How can I write a Python program for this and plot the result as a distribution curve?
The hist function returns the bin counts and the bin edges. You can use them to prepare the data for the line plot:
y, x, _ = plt.hist(angles) # No need for the 3rd return value
xc = (x[:-1] + x[1:]) / 2 # Take centerpoints
# plt.clf()
plt.plot(xc, y)
plt.show() # Etc.
You will end up having both the histogram and the line plot. If this is not desirable, clean the canvas before plotting the line by uncommenting the call to clf().
EDIT:
If you want a line plot as well, it is better to generate the histogram with numpy and then use that information also for the line:
from matplotlib import pyplot as plt
import numpy as np
angles = [98.1706427, 99.09896751, 99.10879006, 100.47518838, 101.22770381,
101.70374296, 103.15715294, 104.4653976, 105.50441485, 106.82885361,
107.4605319, 108.93228646, 111.22463712, 112.23658018, 113.31223886,
113.4000603, 114.14565594, 114.79809084, 115.15788861, 115.42991416,
115.66216071, 115.69821092, 116.56319054, 117.09232139, 119.30835385,
119.31377834, 125.88278338, 127.80937901, 132.16187185, 132.61262906,
136.6751744, 138.34164387, ]
hist,edges = np.histogram(angles, bins=20)
bin_centers = 0.5*(edges[:-1] + edges[1:])
bin_widths = (edges[1:]-edges[:-1])
plt.bar(bin_centers,hist,width=bin_widths)
plt.plot(bin_centers, hist,'r')
plt.xlabel(r'angle [$^\circ$]') # raw string so that \c is not treated as an escape sequence
plt.ylabel('frequency')
plt.show()
The result is a bar histogram with a red line through the bin centers.
If you are not interested in the histogram itself, leave out the line plt.bar(bin_centers,hist,width=bin_widths).
EDIT2:
I don't really see the scientific value in a smoothed histogram. If you increase the resolution of the histogram (the bins parameter in the np.histogram command), it can change quite considerably. For instance, new peaks may occur if you increase the bin count, or two peaks may merge into one if you decrease the bin count. Keeping this in mind, smoothing the histogram curve suggests that you have more data than you do. However, if you really must, you can smooth a curve as explained in this answer, i.e.
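from scipy.interpolate import make_interp_spline
# scipy.interpolate.spline was removed from recent SciPy releases; make_interp_spline replaces it
x = np.linspace(bin_centers[0], bin_centers[-1], 500) # stay inside the range of the bin centers
y = make_interp_spline(bin_centers, hist, k=3)(x) # cubic spline through the bin centers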
and then plot y over x.

How can I plot ca. 20 million points as a scatterplot?

I am trying to create a scatterplot with matplotlib that consists of ca. 20 million data points. Even with the alpha value set as low as it can go before no data is visible at all, the result is just a completely black plot.
plt.scatter(timedPlotData, plotData, alpha=0.01, marker='.')
The x-axis is a continuous timeline of about 2 months and the y-axis consists of 150k consecutive integer values.
Is there any way to plot all the points so that their distribution over time is still visible?
Thank you for your help.
There's more than one way to do this. A lot of folks have suggested a heatmap, kernel density estimate or 2D histogram. @Bucky suggested using a moving average. In addition, you can fill between a moving min and a moving max, and plot the moving mean over the top. I often call this a "chunkplot", but that's a terrible name. The implementation below assumes that your time (x) values are monotonically increasing. If they're not, it's simple enough to sort y by x before "chunking" in the chunkplot function.
Here are a couple of different ideas. Which is best will depend on what you want to emphasize in the plot. Note that this will be rather slow to run, but that's mostly due to the scatterplot. The other plotting styles are much faster.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import datetime as dt
np.random.seed(1977)
def main():
    x, y = generate_data()
    fig, axes = plt.subplots(nrows=3, sharex=True)
    for ax in axes.flat:
        ax.xaxis_date()
    fig.autofmt_xdate()
    axes[0].set_title('Scatterplot of all data')
    axes[0].scatter(x, y, marker='.')
    axes[1].set_title('"Chunk" plot of data')
    chunkplot(x, y, chunksize=1000, ax=axes[1],
              edgecolor='none', alpha=0.5, color='gray')
    axes[2].set_title('Hexbin plot of data')
    axes[2].hexbin(x, y)
    plt.show()

def generate_data():
    # Generate a very noisy but interesting timeseries
    x = mdates.drange(dt.datetime(2010, 1, 1), dt.datetime(2013, 9, 1),
                      dt.timedelta(minutes=10))
    num = x.size
    y = np.random.random(num) - 0.5
    y.cumsum(out=y)
    y += 0.5 * y.max() * np.random.random(num)
    return x, y

def chunkplot(x, y, chunksize, ax=None, line_kwargs=None, **kwargs):
    if ax is None:
        ax = plt.gca()
    if line_kwargs is None:
        line_kwargs = {}
    # Wrap the array into a 2D array of chunks, truncating the last chunk if
    # chunksize isn't an even divisor of the total size.
    # (This part won't use _any_ additional memory)
    numchunks = y.size // chunksize
    ychunks = y[:chunksize*numchunks].reshape((-1, chunksize))
    xchunks = x[:chunksize*numchunks].reshape((-1, chunksize))
    # Calculate the max, min, and means of chunksize-element chunks...
    max_env = ychunks.max(axis=1)
    min_env = ychunks.min(axis=1)
    ycenters = ychunks.mean(axis=1)
    xcenters = xchunks.mean(axis=1)
    # Now plot the bounds and the mean...
    fill = ax.fill_between(xcenters, min_env, max_env, **kwargs)
    line = ax.plot(xcenters, ycenters, **line_kwargs)[0]
    return fill, line

main()
For each day, tally up the frequency of each value (a collections.Counter will do this nicely), then plot a heatmap of the values, one per day. For publication, use a grayscale for the heatmap colors.
My recommendation would be to use a sorting and moving average algorithm on the raw data before you plot it. This should leave the mean and trend intact over the time period of interest while providing you with a reduction in clutter on the plot.
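The sort-then-average idea can be sketched with NumPy alone. The function name moving_average, the window length and the tiny sample arrays are assumptions for illustration; a real window would be chosen from the data density.

```python
import numpy as np

def moving_average(times, values, window):
    """Sort by time, then average `window` consecutive samples (hypothetical helper)."""
    order = np.argsort(times)
    sorted_times = np.asarray(times, dtype=float)[order]
    sorted_values = np.asarray(values, dtype=float)[order]
    kernel = np.ones(window) / window
    # mode="valid" keeps only positions where the window fits entirely in the data.
    smoothed = np.convolve(sorted_values, kernel, mode="valid")
    # Average the times the same way so each output has a matching x position.
    centered_times = np.convolve(sorted_times, kernel, mode="valid")
    return centered_times, smoothed

# Tiny invented example with unsorted times.
times = np.array([3, 1, 2, 5, 4, 0])
values = np.array([30.0, 10.0, 20.0, 50.0, 40.0, 0.0])
t_avg, v_avg = moving_average(times, values, window=3)
```

Plotting v_avg against t_avg as a line then shows the trend with far fewer points than the raw scatter.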
Group values into bands on each day and use a 3d histogram of count, value band, day.
That way you can get the number of occurrences in a given band on each day clearly.
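Counting occurrences per day and value band amounts to a 2D histogram, which np.histogram2d computes directly. The sketch below uses invented data and assumed bin choices (60 daily bins, 150 value bands of 1000 values each); the resulting count matrix can be drawn as a heatmap.

```python
import numpy as np

# Invented stand-in data: times in days over about two months, integer y values.
rng = np.random.default_rng(0)
n_points = 200_000
days = rng.uniform(0, 60, n_points)
values = rng.integers(0, 150_000, n_points)

# Count the points falling into each (day, value band) cell.
counts, day_edges, value_edges = np.histogram2d(
    days, values, bins=[60, 150], range=[[0, 60], [0, 150_000]]
)

# A grayscale heatmap of the counts could then be drawn with, for example:
# plt.pcolormesh(day_edges, value_edges, counts.T, cmap="gray_r")
print(counts.shape)
```

Because only the 60 x 150 count matrix is plotted, the drawing cost no longer depends on the number of raw points.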
