I am trying to perform a SciPy FFT on my dataset. I have the acceleration in the time domain (obtained numerically) and I want to take its Fourier transform to obtain the spectrum. I have theoretical expressions for the Fourier-transformed acceleration in the small- and large-frequency limits. At large frequency, the Fourier-transformed acceleration should drop exponentially. However, I am getting a valley in the graph after an initial decay. Below are my code and graph.
import numpy as np
from scipy import signal
from scipy.fft import fft, fftfreq

a_w = []
for k in range(len(b)):  # b is the parameter to be varied
    window = signal.windows.kaiser(N, 30)  # I am not sure about using a Kaiser window
    ft = fft(solaccarr[k]*window)
    ft = np.abs(ft[:N // 2])*1/N
    freq = fftfreq(N, T)[:N // 2]
    xf = np.linspace(0.0, 1.0 / (2.0 * float(T)), N // 2)
    a_w.append(ft)
I am plotting the graph on a log-log scale. Is it possible to get rid of the kinks in the graph by an appropriate use of windows or some other technique?
Here is the dataset which I used
These valleys may correspond to the end of the main lobe of the Kaiser window after translations.
If the input signal contains a finite number of well-defined frequencies (e.g. a sum of two sine waves), its Fourier transform is a finite sum of Dirac impulses. Multiplying the signal by the window corresponds to convolving the DFT of the signal with the DFT of the window. Since convolving with a Dirac impulse amounts to a translation, the outcome of the process is a finite sum of translated copies of the DFT of the window.
The transform of the Kaiser window features a main lobe and side lobes separated by such valleys. Hence, the outcome may also feature translated valleys. This can be tested by modifying the value 30 passed to the Kaiser window in your code: could you try values like 0, 5, 6 and 8.6? It should move the valley from left to right or right to left, as it changes the width of the main lobe.
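One way to check this numerically is to measure where the main lobe of the Kaiser window ends for each value of beta. The sketch below is only an illustration: the helper name `first_null_bin`, the frame length of 64 samples and the amount of zero-padding are my own assumptions and are not taken from your code.

```python
import numpy as np
from scipy.signal import windows

def first_null_bin(beta, n=64, nfft=8192):
    """Position of the first spectral minimum of a Kaiser window, in DFT bins
    of the unpadded length-n frame (hypothetical helper for illustration)."""
    w = windows.kaiser(n, beta)
    # Zero-pad heavily so that the shape of the lobes is finely sampled.
    mag = np.abs(np.fft.rfft(w, nfft))
    # The main lobe ends at the first index where the magnitude stops decreasing.
    rising = np.nonzero(np.diff(mag) > 0)[0]
    return rising[0] * n / nfft

widths = {beta: first_null_bin(beta) for beta in (0, 5, 6, 8.6)}
for beta, width in widths.items():
    print(f"beta = {beta}: main lobe ends about {width:.2f} bins from its centre")
```

The end of the main lobe moves outwards as beta grows, which is why the valley in the spectrum should shift when the parameter is changed.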
If you just want to get rid of the deep valleys, you can explore the exponential window, possibly combined with a Hann window to form a Hann–Poisson window. This window does not feature any side lobes.
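As far as I know SciPy has no ready-made Hann-Poisson window, but it can be built from `windows.hann` and `windows.exponential`. This is a minimal sketch: the helper `hann_poisson`, the value `alpha=2.0`, the frame length and the random placeholder frame are assumptions chosen for illustration.

```python
import numpy as np
from scipy.signal import windows

def hann_poisson(n, alpha=2.0):
    """Hann-Poisson window: a Hann window times a two-sided exponential.
    The helper name and the default alpha are illustrative assumptions."""
    # windows.exponential is exp(-|k - centre| / tau); choosing
    # tau = (n - 1) / (2 * alpha) gives exp(-alpha * |n - 1 - 2k| / (n - 1)).
    tau = (n - 1) / (2.0 * alpha)
    return windows.hann(n) * windows.exponential(n, tau=tau)

N = 257
window = hann_poisson(N, alpha=2.0)

# Used exactly like the Kaiser window: multiply the frame before the FFT.
frame = np.random.default_rng(0).standard_normal(N)  # placeholder data
spectrum = np.abs(np.fft.rfft(frame * window)) / N
```

A larger alpha makes the window narrower in time, so the main lobe gets wider; that is the price paid for the missing side lobes.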
Finally, if your signal is periodic and if the length of the frame is a multiple of the period, there is no need for a window!
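The following sketch illustrates that last point with synthetic data (the frame length of 100 samples and the two test frequencies are arbitrary choices): a sine that fits a whole number of periods into the frame shows no leakage without any window, while one that is cut mid-period does.

```python
import numpy as np

N = 100
n = np.arange(N)
whole = np.sin(2 * np.pi * 5 * n / N)      # exactly 5 periods in the frame
partial = np.sin(2 * np.pi * 5.5 * n / N)  # 5.5 periods: the frame cuts a period

mag_whole = np.abs(np.fft.rfft(whole)) / N
mag_partial = np.abs(np.fft.rfft(partial)) / N

# Largest magnitude found outside the bin(s) nearest to the sine frequency.
leak_whole = np.delete(mag_whole, 5).max()
leak_partial = np.delete(mag_partial, [5, 6]).max()
print(leak_whole, leak_partial)
```

The first number is at the level of rounding error, the second is a sizeable fraction of the peak itself.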
This is my first post so apologies for any formatting related issues.
So I have a dataset which was obtained from an atomic microscope. The data looks like a 1024x1024 matrix which is composed of different measurements taken from the sample in units of meters, eg.
data = [[1e-07 ... 4e-08][ ... ... ... ][3e-09 ... 12e-06]]
np.shape(data) == (1024, 1024)
From this data, I was hoping to 1) derive some statistics about the real data; and 2) using the power spectrum density (PSD) distribution, hopefully create a new dataset which is different, but statistically similar to the characteristics of the original data. My plan to do this was 2a) take a 2d fft of data, calculate the power spectrum density 2b) some method?, 2c) take the 2d ifft of the modified signal to turn it back into a new sample with the same power spectrum density as the original.
Moreover, regarding part 2b) this was the closest link I could find regarding a time series based solution; however, I am not understanding exactly how to implement this so far, since I am not exactly sure what the phase, frequency, and amplitudes of the fft data represent in this 2d case, and also since we are now talking about a 2d ifft I'm not exactly sure how to construct this complex matrix while incorporating the random number generation, and amplitude/phase shifts in a way that will translate back to something meaningful.
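A minimal sketch of what step 2b) could look like, assuming that "statistically similar" means "same 2d power spectrum": keep the Fourier amplitudes and replace the phases by random ones. The small random array stands in for the real 1024x1024 measurement, and taking the phases from the FFT of white noise is just one convenient way to get the symmetry that makes the inverse transform real.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the measured height map (placeholder data only).
data = rng.standard_normal((64, 64)).cumsum(axis=0).cumsum(axis=1) * 1e-9

# 2a) Fourier amplitudes of the original surface.
ft = np.fft.fft2(data)
amplitude = np.abs(ft)

# 2b) Random phases. Taking them from the FFT of real white noise
# guarantees phase(-k) = -phase(k), which is needed for a real result.
noise = rng.standard_normal(data.shape)
random_phase = np.angle(np.fft.fft2(noise))
random_phase[0, 0] = np.angle(ft[0, 0])   # keep the sign of the mean height

# 2c) Recombine amplitude and phase, then transform back.
surrogate_ft = amplitude * np.exp(1j * random_phase)
surrogate = np.fft.ifft2(surrogate_ft).real

psd_original = amplitude**2
psd_surrogate = np.abs(np.fft.fft2(surrogate))**2
same_psd = np.allclose(psd_surrogate, psd_original,
                       rtol=1e-6, atol=1e-12 * psd_original.max())
print(same_psd)
```

The amplitude at each pair of spatial frequencies says how strongly that ripple is present in the surface, and the phase says where the ripple sits; randomising the phase moves the ripples around without changing how much of each there is.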
So basically, I have been having some trouble with my intuition. For this problem, we are working with a 2d Fourier transform of spatial data with no temporal component, so I believe that methods which are applied to time series data could be applied here as well. Since the fft of the original data is the 'frequency in the spatial domain', the x-axis of the PSD should be either pixels or meters, but then what is the 'power' in the y-axis describing? I was hoping that someone could help me figure this problem out.
My code is below, hopefully someone could help me solve my problem. Bonus if someone could help me understand what this shifted frequency vs amplitude plot is saying:
Here is the image with the fft, shifted fft, and freq. vs amplitude plots.
Fortunately the power spectrum density function is a bit easier to understand
Thank you all for your time.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

data = np.genfromtxt('asample3.0_00001-filter.txt')
x = np.arange(0,int(np.size(data,0)),1)
y = np.arange(0,int(np.size(data,1)),1)
z = data
npix = data.shape[0]
#taking the fourier transform
fourier_image = np.fft.fft2(data)
#Get power spectral density
fourier_amplitudes = np.abs(fourier_image)**2
#calculate fs (physical distance between pixels, i.e. the sampling interval rather than a sampling frequency)
fs = 92e-07/npix
freq_shifted = fs/2 * np.linspace(-1,1,npix)
freq = fs/2 * np.linspace(0,1,int(npix/2))
print("Plotting 2d Fourier Transform ...")
fig, axs = plt.subplots(2,2,figsize=(15, 15))
axs[0,0].imshow(10*np.log10(np.abs(fourier_image)))
axs[0,0].set_title('fft')
axs[0,1].imshow(10*np.log10(np.abs(np.fft.fftshift(fourier_image))))
axs[0,1].set_title('shifted fft')
axs[1,0].plot(freq,10*np.log10(np.abs(fourier_amplitudes[:npix//2])))
axs[1,0].set_title('freq vs amplitude')
for ii in list(range(npix//2)):
    axs[1,1].plot(freq_shifted,10*np.log10(np.fft.fftshift(np.abs(fourier_amplitudes[ii]))))
axs[1,1].set_title('shifted freq vs amplitude')
#constructing a wave vector array
## Get frequencies corresponding to signal PSD
kfreq = np.fft.fftfreq(npix) * npix
kfreq2D = np.meshgrid(kfreq, kfreq)
knrm = np.sqrt(kfreq2D[0]**2 + kfreq2D[1]**2)
knrm = knrm.flatten()
fourier_amplitudes = fourier_amplitudes.flatten()
#creating the power spectrum
kbins = np.arange(0.5, npix//2+1, 1.)
kvals = 0.5 * (kbins[1:] + kbins[:-1])
Abins, _, _ = stats.binned_statistic(knrm, fourier_amplitudes,
                                     statistic = "mean",
                                     bins = kbins)
Abins *= np.pi * (kbins[1:]**2 - kbins[:-1]**2)
print("Plotting power spectrum of surface ...")
fig = plt.figure(figsize=(10, 10))
plt.loglog(fs/kvals, Abins)
plt.xlabel("Spatial Frequency $k$ [meters]")
plt.ylabel("Power per Spatial Frequency $P(k)$")
plt.tight_layout()
Suppose one wanted to find the period of a given sinusoidal wave signal. From what I have read online, it appears that the two main approaches employ either fourier analysis or autocorrelation. I am trying to automate the process using python and my usage case is to apply this concept to similar signals that come from the time-series of positions (or speeds or accelerations) of simulated bodies orbiting a star.
For simple-examples-sake, consider x = sin(t) for 0 ≤ t ≤ 10 pi.
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt
## sample data
t = np.linspace(0, 10 * np.pi, 100)
x = np.sin(t)
fig, ax = plt.subplots()
ax.plot(t, x, color='b', marker='o')
ax.grid(color='k', alpha=0.3, linestyle=':')
plt.show()
plt.close(fig)
Given a sine-wave of the form x = a sin(b(t+c)) + d, the period of the sine-wave is obtained as 2 * pi / b. Since b=1 (or by visual inspection), the period of our sine wave is 2 * pi. I can check the results obtained from other methods against this baseline.
Attempt 1: Autocorrelation
As I understand it (please correct me if I'm wrong), correlation can be used to see if one signal is a time-lagged copy of another signal (similar to how cosine and sine differ by a phase difference). So autocorrelation is testing a signal against itself to measure the times at which the time-lag repeats said signal. Using the example posted here:
result = np.correlate(x, x, mode='full')
Since x and t each consist of 100 elements and result consists of 199 elements, I am not sure why I should arbitrarily select the last 100 elements.
print("\n autocorrelation (shape={}):\n{}\n".format(result.shape, result))
autocorrelation (shape=(199,)):
[ 0.00000000e+00 -3.82130761e-16 -9.73648712e-02 -3.70014208e-01
-8.59889695e-01 -1.56185995e+00 -2.41986054e+00 -3.33109112e+00
-4.15799070e+00 -4.74662427e+00 -4.94918053e+00 -4.64762251e+00
-3.77524157e+00 -2.33298717e+00 -3.97976240e-01 1.87752669e+00
4.27722402e+00 6.54129270e+00 8.39434617e+00 9.57785701e+00
9.88331103e+00 9.18204933e+00 7.44791758e+00 4.76948221e+00
1.34963425e+00 -2.50822289e+00 -6.42666652e+00 -9.99116299e+00
-1.27937834e+01 -1.44791297e+01 -1.47873668e+01 -1.35893098e+01
-1.09091510e+01 -6.93157447e+00 -1.99159756e+00 3.45267493e+00
8.86228186e+00 1.36707567e+01 1.73433176e+01 1.94357232e+01
1.96463736e+01 1.78556800e+01 1.41478477e+01 8.81191526e+00
2.32100171e+00 -4.70897483e+00 -1.15775811e+01 -1.75696560e+01
-2.20296487e+01 -2.44327920e+01 -2.44454330e+01 -2.19677060e+01
-1.71533510e+01 -1.04037163e+01 -2.33560966e+00 6.27458308e+00
1.45655029e+01 2.16769872e+01 2.68391837e+01 2.94553896e+01
2.91697473e+01 2.59122266e+01 1.99154591e+01 1.17007613e+01
2.03381596e+00 -8.14633251e+00 -1.78184255e+01 -2.59814393e+01
-3.17580589e+01 -3.44884934e+01 -3.38046447e+01 -2.96763956e+01
-2.24244433e+01 -1.26974172e+01 -1.41464998e+00 1.03204331e+01
2.13281784e+01 3.04712823e+01 3.67721634e+01 3.95170295e+01
3.83356037e+01 3.32477037e+01 2.46710643e+01 1.33886439e+01
4.77778141e-01 -1.27924775e+01 -2.50860560e+01 -3.51343866e+01
-4.18671622e+01 -4.45258983e+01 -4.27482779e+01 -3.66140001e+01
-2.66465884e+01 -1.37700036e+01 7.76494745e-01 1.55574483e+01
2.90828312e+01 3.99582426e+01 4.70285203e+01 4.95000000e+01
4.70285203e+01 3.99582426e+01 2.90828312e+01 1.55574483e+01
7.76494745e-01 -1.37700036e+01 -2.66465884e+01 -3.66140001e+01
-4.27482779e+01 -4.45258983e+01 -4.18671622e+01 -3.51343866e+01
-2.50860560e+01 -1.27924775e+01 4.77778141e-01 1.33886439e+01
2.46710643e+01 3.32477037e+01 3.83356037e+01 3.95170295e+01
3.67721634e+01 3.04712823e+01 2.13281784e+01 1.03204331e+01
-1.41464998e+00 -1.26974172e+01 -2.24244433e+01 -2.96763956e+01
-3.38046447e+01 -3.44884934e+01 -3.17580589e+01 -2.59814393e+01
-1.78184255e+01 -8.14633251e+00 2.03381596e+00 1.17007613e+01
1.99154591e+01 2.59122266e+01 2.91697473e+01 2.94553896e+01
2.68391837e+01 2.16769872e+01 1.45655029e+01 6.27458308e+00
-2.33560966e+00 -1.04037163e+01 -1.71533510e+01 -2.19677060e+01
-2.44454330e+01 -2.44327920e+01 -2.20296487e+01 -1.75696560e+01
-1.15775811e+01 -4.70897483e+00 2.32100171e+00 8.81191526e+00
1.41478477e+01 1.78556800e+01 1.96463736e+01 1.94357232e+01
1.73433176e+01 1.36707567e+01 8.86228186e+00 3.45267493e+00
-1.99159756e+00 -6.93157447e+00 -1.09091510e+01 -1.35893098e+01
-1.47873668e+01 -1.44791297e+01 -1.27937834e+01 -9.99116299e+00
-6.42666652e+00 -2.50822289e+00 1.34963425e+00 4.76948221e+00
7.44791758e+00 9.18204933e+00 9.88331103e+00 9.57785701e+00
8.39434617e+00 6.54129270e+00 4.27722402e+00 1.87752669e+00
-3.97976240e-01 -2.33298717e+00 -3.77524157e+00 -4.64762251e+00
-4.94918053e+00 -4.74662427e+00 -4.15799070e+00 -3.33109112e+00
-2.41986054e+00 -1.56185995e+00 -8.59889695e-01 -3.70014208e-01
-9.73648712e-02 -3.82130761e-16 0.00000000e+00]
Attempt 2: Fourier
Since I am not sure where to go from the last attempt, I sought a new attempt. To my understanding, Fourier analysis basically shifts a signal from/to the time-domain (x(t) vs t) to/from the frequency domain (x(t) vs f=1/t); the signal in frequency-space should appear as a sinusoidal wave that dampens over time. The period is obtained from the most observed frequency since this is the location of the peak of the distribution of frequencies.
Since my values are all real-valued, applying the Fourier transform should mean my output values are all complex-valued. I wouldn't think this is a problem, except for the fact that scipy has methods for real-values. I do not fully understand the differences between all of the different scipy methods. That makes following the algorithm proposed in this posted solution hard for me to follow (ie, how/why is the threshold value picked?).
omega = np.fft.fft(x)
freq = np.fft.fftfreq(x.size, 1)
threshold = 0
idx = np.where(abs(omega)>threshold)[0][-1]
max_f = abs(freq[idx])
print(max_f)
This outputs 0.01, meaning the period is 1/0.01 = 100. This doesn't make sense either.
Attempt 3: Power Spectral Density
According to the scipy docs, I should be able to estimate the power spectral density (psd) of the signal using a periodogram (which, according to wikipedia, is the fourier transform of the autocorrelation function). By selecting the dominant frequency fmax at which the signal peaks, the period of the signal can be obtained as 1 / fmax.
freq, pdensity = signal.periodogram(x)
fig, ax = plt.subplots()
ax.plot(freq, pdensity, color='r')
ax.grid(color='k', alpha=0.3, linestyle=':')
plt.show()
plt.close(fig)
The periodogram shown below peaks at 49.076... at a frequency of fmax = 0.05. So, period = 1/fmax = 20. This doesn't make sense to me. I have a feeling it has something to do with the sampling rate, but don't know enough to confirm or progress further.
I realize I am missing some fundamental gaps in understanding how these things work. There are a lot of resources online, but it's hard to find this needle in the haystack. Can someone help me learn more about this?
Let's first look at your signal (I've added endpoint=False to make the division even):
t = np.linspace(0, 10*np.pi, 100, endpoint=False)
x = np.sin(t)
Let's divide out the radians (essentially by taking t /= 2*np.pi) and create the same signal by relating to frequencies:
fs = 20 # Sampling rate of 100/5 = 20 (e.g. Hz)
f = 1 # Signal frequency of 1 (e.g. Hz)
t = np.linspace(0, 5, 5*fs, endpoint=False)
x = np.sin(2*np.pi*f*t)
This makes it more salient that f/fs == 1/20 == 0.05 (i.e. the periodicity of the signal is exactly 20 samples). Frequencies in a digital signal always relate to its sampling rate, as you have already guessed. Note that the actual signal is exactly the same no matter what the values of f and fs are, as long as their ratio is the same:
fs = 1 # Natural units
f = 0.05
t = np.linspace(0, 100, 100*fs, endpoint=False)
x = np.sin(2*np.pi*f*t)
In the following I'll use these natural units (fs = 1). The only difference will be in t and hence the generated frequency axes.
Autocorrelation
Your understanding of what the autocorrelation function does is correct. It detects the correlation of a signal with a time-lagged version of itself. It does this by sliding the signal over itself as seen in the right column here (from Wikipedia):
Note that as both inputs to the correlation function are the same, the resulting signal is necessarily symmetric. That is why the output of np.correlate is usually sliced from the middle:
acf = np.correlate(x, x, 'full')[-len(x):]
Now index 0 corresponds to 0 delay between the two copies of the signal.
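If the slicing feels arbitrary, it can help to look at the lag that belongs to each element of the 'full' output. The sketch below uses `scipy.signal.correlation_lags` (available in recent SciPy versions) on the same test signal; the variable names `full`, `lags` and `zero_lag` are mine.

```python
import numpy as np
from scipy import signal

fs = 1
t = np.linspace(0, 100, 100 * fs, endpoint=False)
x = np.sin(2 * np.pi * 0.05 * t)

full = np.correlate(x, x, mode='full')                       # 2*len(x) - 1 values
lags = signal.correlation_lags(len(x), len(x), mode='full')  # -(N-1) ... (N-1)

zero_lag = np.flatnonzero(lags == 0)[0]   # index of the centre of `full`
acf = full[zero_lag:]                     # same as full[-len(x):]
print(lags[0], lags[-1], zero_lag)
```

The first half of `full` holds the negative lags, which mirror the positive ones, so keeping the last `len(x)` elements loses no information.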
Next you'll want to find the index or delay that presents the largest correlation. Due to the shrinking overlap this will by default also be index 0, so the following won't work:
acf.argmax() # Always returns 0
Instead I recommend finding the largest peak, where a peak is defined to be any index with a larger value than both its direct neighbours:
inflection = np.diff(np.sign(np.diff(acf))) # Find the second-order differences
peaks = (inflection < 0).nonzero()[0] + 1 # Find where they are negative
delay = peaks[acf[peaks].argmax()] # Of those, find the index with the maximum value
Now delay == 20, which tells you that the signal has a frequency of 1/20 of its sampling rate:
signal_freq = fs/delay # Gives 0.05
Fourier transform
You used the following to calculate the FFT:
omega = np.fft.fft(x)
freq = np.fft.fftfreq(x.size, 1)
These functions are designed for complex-valued signals. They will work for real-valued signals, but you'll get a symmetric output, as the negative-frequency components are the complex conjugates of the positive-frequency components (so their magnitudes are identical). NumPy provides separate functions for real-valued signals:
ft = np.fft.rfft(x)
freqs = np.fft.rfftfreq(len(x), t[1]-t[0]) # Get frequency axis from the time axis
mags = abs(ft) # We don't care about the phase information here
Let's have a look:
plt.plot(freqs, mags)
plt.show()
Note two things: the peak is at frequency 0.05, and the maximum frequency on the axis is 0.5 (the Nyquist frequency, which is exactly half the sampling rate). If we had picked fs = 20, this would be 10.
Now let's find the maximum. The thresholding method you have tried can work, but the target frequency bin is selected blindly and so this method would suffer in the presence of other signals. We could just select the maximum value:
signal_freq = freqs[mags.argmax()] # Gives 0.05
However, this would fail if, e.g., we have a large DC offset (and hence a large component in index 0). In that case we could just select the highest peak again, to make it more robust:
inflection = np.diff(np.sign(np.diff(mags)))
peaks = (inflection < 0).nonzero()[0] + 1
peak = peaks[mags[peaks].argmax()]
signal_freq = freqs[peak] # Gives 0.05
If we had picked fs = 20, this would have given signal_freq == 1.0 due to the different time axis from which the frequency axis was generated.
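To make the DC-offset case concrete, here is a small sketch that adds an offset of 3 to the test signal. It swaps the hand-written peak search for `scipy.signal.find_peaks`, which is a different tool doing the same job; the offset value and the names `naive` and `robust` are just for illustration.

```python
import numpy as np
from scipy.signal import find_peaks

fs = 1
t = np.arange(100) / fs
x = 3.0 + np.sin(2 * np.pi * 0.05 * t)   # same sine, now with a DC offset of 3

mags = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(len(x), 1 / fs)

naive = freqs[mags.argmax()]              # picks the DC bin
peaks, _ = find_peaks(mags)               # local maxima; edge bins are never peaks
robust = freqs[peaks[mags[peaks].argmax()]]
print(naive, robust)
```

Here the plain argmax lands on the DC bin at frequency 0, while the peak-based search still reports the sine.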
Periodogram
The method here is essentially the same. The autocorrelation function of x has the same time axis and period as x, so we can use the FFT as above to find the signal frequency:
pdg = np.fft.rfft(acf)
freqs = np.fft.rfftfreq(len(x), t[1]-t[0])
plt.plot(freqs, abs(pdg))
plt.show()
This curve obviously has slightly different characteristics from the direct FFT on x, but the main takeaways are the same: the frequency axis ranges from 0 to 0.5*fs, and we find a peak at the same signal frequency as before: freqs[abs(pdg).argmax()] == 0.05.
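The same applies to `scipy.signal.periodogram` from your third attempt: it assumes `fs=1.0` unless told otherwise, so its frequency axis is in cycles per sample. A short sketch, passing the sampling rate derived from your original time axis (with `endpoint=False` as above):

```python
import numpy as np
from scipy import signal

t = np.linspace(0, 10 * np.pi, 100, endpoint=False)
x = np.sin(t)

fs = 1 / (t[1] - t[0])                    # samples per unit of t
freq, pdensity = signal.periodogram(x, fs=fs)
fmax = freq[pdensity.argmax()]
period = 1 / fmax
print(period)
```

With `fs` supplied, the period comes out in the units of `t`; without it you get the period in samples, which is the 20 you observed.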
Edit:
To measure the actual periodicity of np.sin, we can just use the "angle axis" that we passed to np.sin instead of the time axis when generating the frequency axis:
freqs = np.fft.rfftfreq(len(x), 2*np.pi*f*(t[1]-t[0]))
rad_period = 1/freqs[mags.argmax()] # 6.283185307179586
Though that seems pointless, right? We pass in 2*np.pi and we get 2*np.pi. However, we can do the same with any regular time axis, without presupposing pi at any point:
fs = 10
t = np.arange(1000)/fs
x = np.sin(t)
rad_period = 1/np.fft.rfftfreq(len(x), 1/fs)[abs(np.fft.rfft(x)).argmax()] # 6.25
Naturally, the true value now lies in between two bins. That's where interpolation comes in and the associated need to choose a suitable window function.
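As a rough illustration of that last remark, one common approach is to apply a window and fit a parabola through the log-magnitudes of the peak bin and its two neighbours. The Hann window and the parabolic fit are my choices here, not the only options, and the result is still an estimate.

```python
import numpy as np

fs = 10
t = np.arange(1000) / fs
x = np.sin(t)

window = np.hanning(len(x))
mags = np.abs(np.fft.rfft(x * window))
k = mags.argmax()

# Fit a parabola through the log-magnitudes of the peak bin and its neighbours.
a, b, c = np.log(mags[k - 1:k + 2])
offset = 0.5 * (a - c) / (a - 2 * b + c)   # position of the vertex, in bins

freq_interp = (k + offset) * fs / len(x)
rad_period = 1 / freq_interp
print(rad_period)
```

The interpolated value should land noticeably closer to 2*np.pi than the 6.25 obtained from the raw bin.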
I'm using FFT to extract the amplitude of each frequency components from an audio file. Actually, there is already a function called Plot Spectrum in Audacity that can help to solve the problem. Taking this example audio file which is composed of 3kHz sine and 6kHz sine, the spectrum result is like the following picture. You can see peaks are at 3KHz and 6kHz, no extra frequency.
Now I need to implement the same function and plot the similar result in Python. I'm close to the Audacity result with the help of rfft but I still have problems to solve after getting this result.
What's physical meaning of the amplitude in the second picture?
How to normalize the amplitude to 0dB like the one in Audacity?
Why do the frequency over 6kHz have such high amplitude (≥90)? Can I scale those frequency to relative low level?
Related code:
import numpy as np
from pylab import plot, show
from scipy.io import wavfile
sample_rate, x = wavfile.read('sine3k6k.wav')
fs = 44100.0
rfft = np.abs(np.fft.rfft(x))
p = 20*np.log10(rfft)
f = np.linspace(0, fs/2, len(p))
plot(f, p)
show()
Update
I multiplied Hanning window with the whole length signal (is that correct?) and get this. Most of the amplitude of skirts are below 40.
And I scaled the y-axis to decibels as @Mateen Ulhaq said. The result is closer to the Audacity one. Can I treat the amplitude below -90dB so low that it can be ignored?
Updated code:
fs, x = wavfile.read('input/sine3k6k.wav')
x = x * np.hanning(len(x))
rfft = np.abs(np.fft.rfft(x))
rfft_max = max(rfft)
p = 20*np.log10(rfft/rfft_max)
f = np.linspace(0, fs/2, len(p))
About the bounty
With the code in the update above, I can measure the frequency components in decibel. The highest possible value will be 0dB. But the method only works for a specific audio file because it uses rfft_max of this audio. I want to measure the frequency components of multiple audio files in one standard rule just like Audacity does.
I also started a discussion in Audacity forum, but I was still not clear how to implement my purpose.
After doing some reverse engineering on the Audacity source code, here are some answers. First, they use the Welch algorithm for estimating the PSD. In short, it splits the signal into overlapping segments, applies a window function to each, applies the FFT and averages the results. This helps to get better results when noise is present. Anyway, after extracting the necessary parameters, here is a solution that approximates Audacity's spectrum plot:
import numpy as np
from scipy.io import wavfile
from scipy import signal
from matplotlib import pyplot as plt
segment_size = 512
fs, x = wavfile.read('sine3k6k.wav')
x = x / 32768.0 # scale signal to [-1.0 .. 1.0]
noverlap = segment_size // 2
f, Pxx = signal.welch(x,                        # signal
                      fs=fs,                    # sample rate
                      nperseg=segment_size,     # segment size
                      window='hann',            # window type to use
                      nfft=segment_size,        # num. of samples in FFT
                      detrend=False,            # do not remove the DC part
                      scaling='spectrum',       # return power spectrum [V^2]
                      noverlap=noverlap)        # overlap between segments
# set 0 dB to energy of sine wave with maximum amplitude
ref = (1/np.sqrt(2)**2) # simply 0.5 ;)
p = 10 * np.log10(Pxx/ref)
fill_to = -150 * (np.ones_like(p)) # anything below -150dB is irrelevant
plt.fill_between(f, p, fill_to )
plt.xlim([f[2], f[-1]])
plt.ylim([-90, 6])
# plt.xscale('log') # uncomment if you want log scale on x-axis
plt.xlabel('f, Hz')
plt.ylabel('Power spectrum, dB')
plt.grid(True)
plt.show()
Some necessary explanations on parameters:
wave file is read as 16-bit PCM, in order to be compatible with Audacity it should be scaled to be |A|<1.0
segment_size is corresponding to Size in Audacity's GUI.
default window type is 'Hanning', you can change it if you want.
overlap is segment_size/2 as in Audacity code.
output window is framed to follow Audacity style. They throw away first low frequency bins and cut everything below -90dB
What's physical meaning of the amplitude in the second picture?
It is basically amount of energy in the frequency bin.
How to normalize the amplitude to 0dB like the one in Audacity?
You need to choose some reference point. Graphs in decibels are always relative to something. When you select the maximum energy bin as a reference, your 0 dB point is the maximum energy (obviously). It is acceptable to use the energy of a sine wave with maximum amplitude as the reference; see the ref variable. The power of a sinusoidal signal is simply its squared RMS, and to get the RMS you just need to divide the amplitude by sqrt(2). So the scaling factor is simply 0.5. Please note that the factor before log10 is 10 and not 20; this is because we are dealing with the power of the signal and not its amplitude.
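A quick way to convince yourself of this reference is to feed a synthetic full-scale sine through the same processing. The sketch below generates its own one-second 3 kHz tone instead of reading a file, so it only assumes SciPy and NumPy; the 44100 Hz sample rate is taken from the question.

```python
import numpy as np
from scipy import signal

fs = 44100
t = np.arange(fs) / fs                      # one second of audio
x = np.sin(2 * np.pi * 3000 * t)            # full-scale sine, amplitude 1.0

f, Pxx = signal.welch(x, fs=fs, window='hann', nperseg=512,
                      noverlap=256, detrend=False, scaling='spectrum')

ref = 0.5                                   # power (squared RMS) of a full-scale sine
p = 10 * np.log10(Pxx / ref)
print(f[p.argmax()], p.max())
```

The peak should sit close to 0 dB, slightly below it because 3 kHz does not fall exactly on the centre of a frequency bin.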
Can I treat the amplitude below -90dB so low that it can be ignored?
Yes, anything below -40dB is usually considered negligible.
My question is about Fast Fourier Transforms, since this is the first time I'm using them.
So, I have a set of data by year (from 1700 to 2009), with each year corresponding to a certain value (a reading).
When I plot the readings against the years, I get the first plot below. Now, my aim is to find the dominant period with the highest readings using an FFT in Python (from the graph it seems to be around 1940 - 1950). So I performed an FFT and got its amplitude and power spectra (see the second plot for the power spectrum). The power spectrum shows that the dominant frequencies are between 0.08 and 0.1 (cycles/year). My question is: how do I link this to the readings vs. years? That is, how do I know from this dominant frequency which year (or period of years) is the dominant one, or how can I use it to find it?
The data list can be found here:
http://www.physics.utoronto.ca/%7Ephy225h/web-pages/sunspot_yearly.txt
the code i wrote is:
from pylab import *
from scipy.optimize import leastsq
from numpy.fft import fft, fftfreq
#-------------------------------------------------------------------------------
# Defining the time array
tmin = 0
tmax = 100 * pi
delta = 0.1
t = arange(tmin, tmax, delta)
# Loading the data from the text file
year, N_sunspots = loadtxt('/Users/.../Desktop/sunspot_yearly.txt', unpack = True) # years and number of sunspots
# Ploting the data
figure(1)
plot(year, N_sunspots)
title('Number of Sunspots vs. Year')
xlabel('time(year)')
ylabel('N')
# Computing the FFT
N_w = fft(N_sunspots)
# Obtaining the frequencies
n = len(N_sunspots)
freq = fftfreq(n) # dt is default to 1
# keeping only positive terms
N = N_w[1:n//2]/float(len(N_w[1:n//2]))
w = freq[1:n//2]
figure(2)
plot(w, real(N))
plot(w, imag(N))
title('The data function f(w) vs. frequency')
xlabel('frequency(cycles/year)')
ylabel('f(w)')
grid(True)
# Amplitude spectrum
Amp_spec = abs(N)
figure(3)
plot(w, Amp_spec)
title('Amplitude spectrum')
xlabel('frequency(cycles/year)')
ylabel('Amplitude')
grid(True)
# Power spectrum
Pwr_spec = abs(N)**2
figure(4)
plot(w, Pwr_spec, 'o')
title('Power spectrum')
xlabel('frequency(cycles/year)')
ylabel('Power')
grid(True)
show()
The graph below shows the data input to the FFT. The original data file contains a total of 309 samples. The zero values at the right end are added automatically by the FFT, to pad the number of input samples to the next higher power of two (2^9 = 512).
The graph below shows the data input to the FFT, with the Kaiser-Bessel a=3.5 window function applied. The window function reduces the spectral leakage errors in the FFT, when the input to the FFT is a non-periodic signal over the sampling interval, as in this case.
The graph below shows the FFT output at full scale, without the window function. The peak is at 0.0917968 (47/512) frequency units, which corresponds to a time value of 10.89 years (1/0.0917968).
The graph below shows the FFT output at full scale. With the Kaiser-Bessel a=3.5 window applied. The peak remains in the same frequency location at 0.0917968 (47/512) frequency units, which corresponds to a time value of 10.89 years (1/0.0917968). The peak is more clearly visible above the background, due to the reduction in spectral leakage provided by the window function.
In conclusion, we can say with high certainty that the Sun spot data, provided in the original post, is periodic with a fundamental period of 10.89 years.
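The same steps can be sketched in Python. Since the data file is not loaded here, the example uses a synthetic series with a 10.89-year cycle as a stand-in; subtracting the mean before windowing is my own addition, and mapping the Kaiser-Bessel a=3.5 window to `beta = pi * 3.5` assumes the usual relation between the two parameterisations.

```python
import numpy as np
from scipy.signal import windows

# Synthetic stand-in for the yearly readings: 309 samples with a 10.89-year cycle.
years = np.arange(309)
readings = 50 + 40 * np.sin(2 * np.pi * years / 10.89)

nfft = 512                                    # next power of two above 309
beta = np.pi * 3.5                            # assumed equivalent of Kaiser-Bessel a = 3.5
windowed = (readings - readings.mean()) * windows.kaiser(len(readings), beta)

mags = np.abs(np.fft.rfft(windowed, n=nfft))  # rfft zero-pads up to nfft samples
freqs = np.fft.rfftfreq(nfft, d=1.0)          # cycles per year

peak = mags[1:].argmax() + 1                  # skip the DC bin
print(peak, freqs[peak], 1 / freqs[peak])
```

Note that the peak frequency gives the length of the cycle, not a particular calendar year; the FFT magnitude carries no information about when in the record the readings were highest.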
FFT and graphs were done with the Sooeet FFT calculator
I am new to Python and programming and am working on a wave peak detection algorithm for my spline-interpolated plot. I have used the code given at this link: https://gist.github.com/endolith/250860 . I have to make this algorithm work on any type of waveform, i.e. low as well as high amplitude, baseline not aligned, etc. The goal is to count the number of waves in the plot. But my peak detection finds "invalid" peaks and so gives the wrong answer. By "invalid" peaks I mean that if there are two notches close to each other at the wave peak, the program detects 2 peaks, i.e. 2 waves, when actually it's just 1 wave. I have tried changing the 'delta' parameter defined in the peak detection function given at the link, but that doesn't achieve the generality I am after. Please suggest any improvement to the algorithm or any other approach I should be using. Any kind of help is welcome. Thanks in advance.
P.S. I am unable to upload an image of the wrong peak-detected wave plot. I hope my explanation is sufficient enough...
Code is as follows
wave = f1(xnew)/(max(f1(xnew)))  ## interpolated wave
maxtab1, mintab1 = peakdet(wave,.005)
maxi = max(maxtab1[:,1])
maxtab = []
for i in range(len(maxtab1)):
    if maxtab1[i,1] > (.55 * maxi):  ## Thresholding
        maxtab.append((maxtab1[i,0],maxtab1[i,1]))
arr_maxtab = np.array(maxtab)
dist = 1500  ## Threshold defined for minimum distance between two notches to be considered as two separate waves
mtdiff = diff(arr_maxtab[:,0])
final_maxtab = []
new_arr = []
for i in range(len(mtdiff)):
    if mtdiff[i] < dist:
        new_arr.append((arr_maxtab[i+1,0],arr_maxtab[i+1,1]))
if new_arr:  ## assumed structure: the original indentation was lost
    new_arr = np.array(new_arr)
    for i in range(len(arr_maxtab)):
        if not arr_maxtab[i,0] in new_arr[:,0]:
            final_maxtab.append((arr_maxtab[i,0], arr_maxtab[i,1]))
else:
    final_maxtab = maxtab
The ability to distinguish notches from true peaks implies you have a fundamental spacing between peaks. Said differently, there is a minimum frequency resolution at which you would like to run your peak detection search. If you zoom into a signal far enough that the features you are looking at are narrower than the noise, you'll observe zigzags that seem to 'peak' every few samples.
What it sounds like you want to do is the following:
Smooth the signal.
Find the 'real' peaks.
Or more precisely,
Run the signal through a low pass filter.
Find the peaks within your acceptable peak widths with sufficient signal to noise ratio.
Step 1: Low-Pass Filtering
To do step 1, I recommend you use the signal processing tools provided by scipy.
I've adapted this cookbook entry, which shows how to use FIR filters to lowpass a signal using scipy.
from numpy import cos, sin, pi, absolute, arange
from scipy.signal import kaiserord, lfilter, firwin, freqz, find_peaks_cwt
from pylab import figure, clf, plot, xlabel, ylabel, xlim, ylim, title, grid, axes, show, scatter
#------------------------------------------------
# Create a signal. Using wave for the signal name.
#------------------------------------------------
sample_rate = 100.0
nsamples = 400
t = arange(nsamples) / sample_rate
wave = cos(2*pi*0.5*t) + 0.2*sin(2*pi*2.5*t+0.1) + \
0.2*sin(2*pi*15.3*t) + 0.1*sin(2*pi*16.7*t + 0.1) + \
0.1*sin(2*pi*23.45*t+.8)
#------------------------------------------------
# Create a FIR filter and apply it to wave.
#------------------------------------------------
# The Nyquist rate of the signal.
nyq_rate = sample_rate / 2.0
# The desired width of the transition from pass to stop,
# relative to the Nyquist rate. We'll design the filter
# with a 5 Hz transition width.
width = 5.0/nyq_rate
# The desired attenuation in the stop band, in dB.
ripple_db = 60.0
# Compute the order and Kaiser parameter for the FIR filter.
N, beta = kaiserord(ripple_db, width)
# The cutoff frequency of the filter.
cutoff_hz = 10.0
# Use firwin with a Kaiser window to create a lowpass FIR filter.
taps = firwin(N, cutoff_hz/nyq_rate, window=('kaiser', beta))
# Use lfilter to filter wave with the FIR filter.
filtered_x = lfilter(taps, 1.0, wave)
Step 2: Peak Finding
For step 2, I recommend you use the wavelet transformation peak finder provided by scipy. You must provide as an input your filtered signal and a vector running from the minimum to maximum possible peak widths. This vector will be used as the basis of the wavelet transformation.
#------------------------------------------------
# Step 2: Find the peaks
#------------------------------------------------
# The FIR filter delays the signal by (N-1)/2 samples.
delay = 0.5 * (N-1) / sample_rate
figure(4)
plot(t[N-1:]-delay, filtered_x[N-1:], 'g', linewidth=1)
peakind = find_peaks_cwt(filtered_x, arange(3,20))
scatter([t[i] - delay for i in peakind], [filtered_x[i] for i in peakind], color="red")
for i in peakind:
    print(t[i] - delay)
xlabel('t')
grid(True)
show()
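As an alternative to the wavelet-based finder, `scipy.signal.find_peaks` lets you express the two requirements from the question directly: a peak must stand out from its surroundings (`prominence`) and must be far enough from the next one (`distance`). The sketch below uses a made-up test wave with a small ripple on top, and the threshold values 0.5 and 50 are assumptions that would need tuning for real data.

```python
import numpy as np
from scipy.signal import find_peaks

sample_rate = 100.0
t = np.arange(400) / sample_rate
# Two slow waves with a small fast ripple that puts notches on every crest.
wave = np.sin(2 * np.pi * 0.5 * t) + 0.05 * np.sin(2 * np.pi * 15 * t)

raw_peaks, _ = find_peaks(wave)                 # every local maximum, notches included
wave_peaks, props = find_peaks(wave,
                               prominence=0.5,  # must rise 0.5 above its surroundings
                               distance=50)     # at least 50 samples apart
print(len(raw_peaks), len(wave_peaks), t[wave_peaks])
```

Because prominence is measured relative to the neighbouring valleys rather than to zero, this criterion also copes with a baseline that is not aligned, as long as the notches are shallow compared with the waves.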