I'm still trying to get frequency analysis for this data using FFT in Python.
The sampling rate is 1 data point per minute.
My code is:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.fftpack import fft
df3 = pd.read_csv('Pressure - Dates by Minute.csv', sep=",", skiprows=0)
df3['Pressure FFT'] = df3['ATMOSPHERIC PRESSURE (hPa) mean'] - df3['ATMOSPHERIC PRESSURE (hPa) mean'].mean()
Pressure = df3['Pressure FFT']
Fs = 1/60
Ts = 1.0/Fs
n = len(Pressure)
k = np.arange(n)
T = n/Fs
t = np.arange(0,1,1/n) # time vector
frq = k/T # two sides frequency range
frq = frq[range(int(n/2))] # one side frequency range
Y = np.fft.fft(Pressure)/n # fft computing and normalization
Y = Y[range(int(n/2))]
fig, ax = plt.subplots(2, 1)
ax[0].plot(t,Pressure)
ax[0].set_xlabel('Time')
ax[0].set_ylabel('Amplitude')
ax[1].plot(frq,abs(Y),'r') # plotting the spectrum
ax[1].set_xlabel('Freq (Hz)')
ax[1].set_ylabel('|Y(freq)|')
But the result gives:
So my problems are:
1) Why are there no frequencies at all? The data is clearly periodic.
2) Why is the frequency spectrum so low (0 - 0.009)?
3) Should I try a different filtering technique?
Any insights?
Thanks!
1) Why are there no frequencies at all? The data is clearly periodic.
Well, there is frequency content; it's just not readily visible because the interesting peaks are tiny compared to the largest values. Try changing the line that plots the frequency spectrum from ax[1].plot(frq,abs(Y),'r') to ax[1].semilogy(frq,abs(Y),'r')
This will result in:
Here we have applied a simple transformation (a logarithmic y-axis) that boosts low values and compresses high values. For more information please see this link. Of course, having removed the DC component (as you do when you subtract the mean from the pressure column) helps too.
This still looks a bit blurry, and it is, but if we zoom in to the lower part of the spectrum, we see this:
It shows a spike at approximately 2.3e-05 Hz, which corresponds to a period of approximately 12 hours.
2) Why is the frequency spectrum so low (0 - 0.009)?
Because you sample once every 60 seconds, your sampling frequency is approximately 0.0167 Hz. Your spectrum therefore contains everything between DC (0 Hz) and the Nyquist frequency of about 0.0083 Hz. For more information, please see this link.
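As a quick sanity check, here is a minimal sketch (not part of your script) showing how those numbers relate to the roughly 12-hour cycle visible in the data:
import numpy as np

Fs = 1/60              # one sample every 60 s -> ~0.0167 Hz
nyquist = Fs/2         # ~0.0083 Hz, the highest frequency the FFT can show
f_12h = 1/(12*3600)    # a 12-hour cycle -> ~2.3e-5 Hz, well inside that range
print(Fs, nyquist, f_12h)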
3) Should I try a different filtering technique?
You can try windowing if you can't resolve a harmonic but it doesn't look like it's needed here.
Hope this helps.
Part of the reason those frequencies seem so low is that the time axis in your amplitude plot is scaled oddly: t = np.arange(0,1,1/n) runs from 0 to 1 instead of covering the real duration. If you really have one sample per 60 seconds then the x-axis should range between 0 and 1690260 seconds (i.e. ~20 days!).
By eye, you seem to have about one small peak every 50000 seconds (~2 per day), which would correspond to a frequency of about 2x10⁻⁵ Hz. Your periodogram therefore looks pretty reasonable to me, given how massive the scale of the x-axis is.
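For illustration, here is a minimal sketch (reusing Pressure, n, frq and Y from the code above, so it inherits those assumptions) that builds the time axis in seconds and plots the spectrum on a log scale:
Ts = 60.0                          # one sample every 60 s
t = np.arange(n) * Ts              # time axis in seconds (about 1.69e6 s, i.e. ~20 days)

fig, ax = plt.subplots(2, 1)
ax[0].plot(t, Pressure)
ax[0].set_xlabel('Time (s)')
ax[0].set_ylabel('Amplitude')
ax[1].semilogy(frq, abs(Y), 'r')   # log scale makes the small peaks visible
ax[1].set_xlabel('Freq (Hz)')
ax[1].set_ylabel('|Y(freq)|')
plt.show()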
I am new to spectrograms and am trying to plot a spectrogram using relative velocity variation values of ambient seismic noise.
The format of the data I have is 'time', 'station pair', 'velocity variation value', as below. (If the error is needed, I can add it to the data.)
2013-11-24,05_PK01_05_SS01,0.057039371136200
2013-11-25,05_PK01_05_SS01,-0.003328071661900
2013-11-26,05_PK01_05_SS01,0.137221779659000
2013-11-27,05_PK01_05_SS01,0.068823721831000
2013-11-28,05_PK01_05_SS01,-0.006876687060810
2013-11-29,05_PK01_05_SS01,-0.023895268916200
2013-11-30,05_PK01_05_SS01,-0.105762098404000
2013-12-01,05_PK01_05_SS01,-0.028069540807700
2013-12-02,05_PK01_05_SS01,0.015091601414300
2013-12-03,05_PK01_05_SS01,0.016353885353700
2013-12-04,05_PK01_05_SS01,-0.056654092859700
2013-12-05,05_PK01_05_SS01,-0.044520608528500
2013-12-06,05_PK01_05_SS01,0.020226437197700
...
But when I searched, I could only find people using network, station, location, channel, or wav data.
Therefore, I have no idea where to start, because my data format is different.
Do you know of a way to get a spectrogram from the 'value' column of a time series?
P.S. I would like to compute the cross-correlation between the velocity variation values and other environmental data such as air temperature, air pressure, etc.
Edit (I added two pictures, but a notice popped up that I cannot post images yet, only links):
I will use groundwater level or other environmental data here because the variations are easier to see.
The plot that I want to reproduce is from David et al., 2021, shown below.
[figure: example spectrogram from David et al., 2021]
The x-axis shows the time series and the y-axis shows cycles/day.
So when the light color is located at 1, it indicates a diurnal cycle (at 2, a semidiurnal cycle).
I now plot the spectrogram with the frequency expressed in cycles per day.
[figure: my current spectrogram]
But there are two things I need to fix.
First, in the reference the spectrogram is normalized on a log scale, so I need to find a way to plot mine on a log scale.
Second, in the reference the x-axis goes up to about 1*10^7, but my data has only 755 points in the time series (dates in 2013-2015).
So what do I have to do to make the x-axis show the time series?
P.S. Here is the code I wrote:
import pandas as pd
import pylab
import matplotlib.pyplot as plt

fil = pd.read_csv('myfile.csv')
cf = fil.iloc[:, 1]
cf = cf/max(abs(cf))            # normalize to unit amplitude
nfft = 128                      # segment length used for each FFT
fs = 1/86400                    # Hz; the spectrum covers [0, fs/2]
n = len(cf)
fr = fs/n
spec, freq, tt, pplot = pylab.specgram(cf, NFFT=nfft, Fs=fs, detrend=pylab.detrend,
                                       window=pylab.window_hanning, noverlap=100, mode='psd')
pylab.title('%s' % e_n)
plt.colorbar()
plt.ylabel("Frequency (cycles / %s Day)" % str(1/fs/86400))
plt.xlabel("days")
plt.show()
If you look closely at it, wav data is basically just an array of numbers (sound amplitude), recorded at a certain interval.
Note: You have an array of equally spaced samples, but they are for velocity difference, not amplitude. So while the following is technically valid, I don't think that the resulting frequencies represent seismic sound frequencies?
So the discrete Fourier transform (in the form of np.fft.rfft) would normally be the right thing to use.
If you give the function np.fft.rfft() n samples, it will return n/2+1 frequency bins (for even n). This is because of the inherent symmetry of the transform for real-valued input.
However, one thing to keep in mind is the frequency resolution of the FFT. For example, if you take n = 44100 samples from a wav file sampled at Fs = 44100 Hz, you get a convenient frequency resolution of Fs/n = 1 Hz. That means the first number in the FFT result corresponds to 0 Hz, the second number to 1 Hz, and so on.
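You can check that resolution directly with np.fft.rfftfreq (a tiny sketch of the example above):
import numpy as np

Fs, n = 44100, 44100
freqs = np.fft.rfftfreq(n, d=1/Fs)   # centre frequency of each rfft bin
print(freqs[:3], freqs[-1])          # [0. 1. 2.] ... 22050.0 -> 1 Hz steps up to Fs/2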
It seems that the sampling frequency in your dataset is once per day, i.e. Fs = 1/(24x3600) = 0.000012 Hz. Suppose you have n = 10000 samples; then the FFT will return 5001 numbers, with a frequency resolution of Fs/n = 0.0000000012 Hz. That means the highest frequency you will be able to detect from data sampled at this frequency is 0.0000000012*5001 = 0.000006 Hz.
So the highest frequency you can detect is approximately Fs/2!
I'm no domain expert, but that value seems to be a bit low for seismic noise?
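To make this concrete, here is a minimal sketch of a plain amplitude spectrum for your daily series, with the frequency axis in cycles/day (it assumes the three-column CSV shown above, read from 'myfile.csv' with no header row; adjust the column name/index to wherever the value actually sits in your file):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('myfile.csv', header=None, names=['date', 'pair', 'dvv'])
dvv = df['dvv'].to_numpy()
dvv = dvv - dvv.mean()                        # remove the DC offset

freqs = np.fft.rfftfreq(len(dvv), d=1.0)      # d = 1 day -> axis in cycles/day
spectrum = np.abs(np.fft.rfft(dvv))

plt.plot(freqs, spectrum)
plt.xlabel('Frequency (cycles/day)')
plt.ylabel('|X(f)|')
plt.show()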
Suppose one wanted to find the period of a given sinusoidal wave signal. From what I have read online, it appears that the two main approaches employ either Fourier analysis or autocorrelation. I am trying to automate the process using Python, and my use case is to apply this concept to similar signals that come from the time series of positions (or speeds, or accelerations) of simulated bodies orbiting a star.
For a simple example's sake, consider x = sin(t) for 0 ≤ t ≤ 10π.
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt
## sample data
t = np.linspace(0, 10 * np.pi, 100)
x = np.sin(t)
fig, ax = plt.subplots()
ax.plot(t, x, color='b', marker='o')
ax.grid(color='k', alpha=0.3, linestyle=':')
plt.show()
plt.close(fig)
Given a sine-wave of the form x = a sin(b(t+c)) + d, the period of the sine-wave is obtained as 2 * pi / b. Since b=1 (or by visual inspection), the period of our sine wave is 2 * pi. I can check the results obtained from other methods against this baseline.
Attempt 1: Autocorrelation
As I understand it (please correct me if I'm wrong), correlation can be used to see if one signal is a time-lagged copy of another signal (similar to how cosine and sine differ by a phase difference). So autocorrelation tests a signal against itself, measuring the lags at which the signal repeats. Using the example posted here:
result = np.correlate(x, x, mode='full')
Since x and t each consist of 100 elements and result consists of 199 elements, I am not sure why I should arbitrarily select the last 100 elements.
print("\n autocorrelation (shape={}):\n{}\n".format(result.shape, result))
autocorrelation (shape=(199,)):
[ 0.00000000e+00 -3.82130761e-16 -9.73648712e-02 -3.70014208e-01
-8.59889695e-01 -1.56185995e+00 -2.41986054e+00 -3.33109112e+00
-4.15799070e+00 -4.74662427e+00 -4.94918053e+00 -4.64762251e+00
-3.77524157e+00 -2.33298717e+00 -3.97976240e-01 1.87752669e+00
4.27722402e+00 6.54129270e+00 8.39434617e+00 9.57785701e+00
9.88331103e+00 9.18204933e+00 7.44791758e+00 4.76948221e+00
1.34963425e+00 -2.50822289e+00 -6.42666652e+00 -9.99116299e+00
-1.27937834e+01 -1.44791297e+01 -1.47873668e+01 -1.35893098e+01
-1.09091510e+01 -6.93157447e+00 -1.99159756e+00 3.45267493e+00
8.86228186e+00 1.36707567e+01 1.73433176e+01 1.94357232e+01
1.96463736e+01 1.78556800e+01 1.41478477e+01 8.81191526e+00
2.32100171e+00 -4.70897483e+00 -1.15775811e+01 -1.75696560e+01
-2.20296487e+01 -2.44327920e+01 -2.44454330e+01 -2.19677060e+01
-1.71533510e+01 -1.04037163e+01 -2.33560966e+00 6.27458308e+00
1.45655029e+01 2.16769872e+01 2.68391837e+01 2.94553896e+01
2.91697473e+01 2.59122266e+01 1.99154591e+01 1.17007613e+01
2.03381596e+00 -8.14633251e+00 -1.78184255e+01 -2.59814393e+01
-3.17580589e+01 -3.44884934e+01 -3.38046447e+01 -2.96763956e+01
-2.24244433e+01 -1.26974172e+01 -1.41464998e+00 1.03204331e+01
2.13281784e+01 3.04712823e+01 3.67721634e+01 3.95170295e+01
3.83356037e+01 3.32477037e+01 2.46710643e+01 1.33886439e+01
4.77778141e-01 -1.27924775e+01 -2.50860560e+01 -3.51343866e+01
-4.18671622e+01 -4.45258983e+01 -4.27482779e+01 -3.66140001e+01
-2.66465884e+01 -1.37700036e+01 7.76494745e-01 1.55574483e+01
2.90828312e+01 3.99582426e+01 4.70285203e+01 4.95000000e+01
4.70285203e+01 3.99582426e+01 2.90828312e+01 1.55574483e+01
7.76494745e-01 -1.37700036e+01 -2.66465884e+01 -3.66140001e+01
-4.27482779e+01 -4.45258983e+01 -4.18671622e+01 -3.51343866e+01
-2.50860560e+01 -1.27924775e+01 4.77778141e-01 1.33886439e+01
2.46710643e+01 3.32477037e+01 3.83356037e+01 3.95170295e+01
3.67721634e+01 3.04712823e+01 2.13281784e+01 1.03204331e+01
-1.41464998e+00 -1.26974172e+01 -2.24244433e+01 -2.96763956e+01
-3.38046447e+01 -3.44884934e+01 -3.17580589e+01 -2.59814393e+01
-1.78184255e+01 -8.14633251e+00 2.03381596e+00 1.17007613e+01
1.99154591e+01 2.59122266e+01 2.91697473e+01 2.94553896e+01
2.68391837e+01 2.16769872e+01 1.45655029e+01 6.27458308e+00
-2.33560966e+00 -1.04037163e+01 -1.71533510e+01 -2.19677060e+01
-2.44454330e+01 -2.44327920e+01 -2.20296487e+01 -1.75696560e+01
-1.15775811e+01 -4.70897483e+00 2.32100171e+00 8.81191526e+00
1.41478477e+01 1.78556800e+01 1.96463736e+01 1.94357232e+01
1.73433176e+01 1.36707567e+01 8.86228186e+00 3.45267493e+00
-1.99159756e+00 -6.93157447e+00 -1.09091510e+01 -1.35893098e+01
-1.47873668e+01 -1.44791297e+01 -1.27937834e+01 -9.99116299e+00
-6.42666652e+00 -2.50822289e+00 1.34963425e+00 4.76948221e+00
7.44791758e+00 9.18204933e+00 9.88331103e+00 9.57785701e+00
8.39434617e+00 6.54129270e+00 4.27722402e+00 1.87752669e+00
-3.97976240e-01 -2.33298717e+00 -3.77524157e+00 -4.64762251e+00
-4.94918053e+00 -4.74662427e+00 -4.15799070e+00 -3.33109112e+00
-2.41986054e+00 -1.56185995e+00 -8.59889695e-01 -3.70014208e-01
-9.73648712e-02 -3.82130761e-16 0.00000000e+00]
Attempt 2: Fourier
Since I am not sure where to go from the last attempt, I sought a new attempt. To my understanding, Fourier analysis basically shifts a signal from/to the time-domain (x(t) vs t) to/from the frequency domain (x(t) vs f=1/t); the signal in frequency-space should appear as a sinusoidal wave that dampens over time. The period is obtained from the most observed frequency since this is the location of the peak of the distribution of frequencies.
Since my values are all real-valued, applying the Fourier transform means my output values are all complex-valued. I wouldn't think this is a problem, except for the fact that scipy has methods specifically for real values. I do not fully understand the differences between all of the different scipy methods. That makes the algorithm proposed in this posted solution hard for me to follow (i.e., how/why is the threshold value picked?).
omega = np.fft.fft(x)
freq = np.fft.fftfreq(x.size, 1)
threshold = 0
idx = np.where(abs(omega)>threshold)[0][-1]
max_f = abs(freq[idx])
print(max_f)
This outputs 0.01, meaning the period is 1/0.01 = 100. This doesn't make sense either.
Attempt 3: Power Spectral Density
According to the scipy docs, I should be able to estimate the power spectral density (psd) of the signal using a periodogram (which, according to wikipedia, is the fourier transform of the autocorrelation function). By selecting the dominant frequency fmax at which the signal peaks, the period of the signal can be obtained as 1 / fmax.
freq, pdensity = signal.periodogram(x)
fig, ax = plt.subplots()
ax.plot(freq, pdensity, color='r')
ax.grid(color='k', alpha=0.3, linestyle=':')
plt.show()
plt.close(fig)
The periodogram shown below peaks at 49.076... at a frequency of fmax = 0.05. So, period = 1/fmax = 20. This doesn't make sense to me. I have a feeling it has something to do with the sampling rate, but don't know enough to confirm or progress further.
I realize I am missing some fundamental gaps in understanding how these things work. There are a lot of resources online, but it's hard to find this needle in the haystack. Can someone help me learn more about this?
Let's first look at your signal (I've added endpoint=False to make the division even):
t = np.linspace(0, 10*np.pi, 100, endpoint=False)
x = np.sin(t)
Let's divide out the radians (essentially by taking t /= 2*np.pi) and create the same signal by relating to frequencies:
fs = 20 # Sampling rate of 100/5 = 20 (e.g. Hz)
f = 1 # Signal frequency of 1 (e.g. Hz)
t = np.linspace(0, 5, 5*fs, endpoint=False)
x = np.sin(2*np.pi*f*t)
This makes it more salient that f/fs == 1/20 == 0.05 (i.e. the periodicity of the signal is exactly 20 samples). Frequencies in a digital signal always relate to its sampling rate, as you have already guessed. Note that the actual signal is exactly the same no matter what the values of f and fs are, as long as their ratio is the same:
fs = 1 # Natural units
f = 0.05
t = np.linspace(0, 100, 100*fs, endpoint=False)
x = np.sin(2*np.pi*f*t)
In the following I'll use these natural units (fs = 1). The only difference will be in t and hence the generated frequency axes.
Autocorrelation
Your understanding of what the autocorrelation function does is correct. It detects the correlation of a signal with a time-lagged version of itself. It does this by sliding the signal over itself as seen in the right column here (from Wikipedia):
Note that as both inputs to the correlation function are the same, the resulting signal is necessarily symmetric. That is why the output of np.correlate is usually sliced from the middle:
acf = np.correlate(x, x, 'full')[-len(x):]
Now index 0 corresponds to 0 delay between the two copies of the signal.
Next you'll want to find the index or delay that presents the largest correlation. Due to the shrinking overlap this will by default also be index 0, so the following won't work:
acf.argmax() # Always returns 0
Instead, I recommend finding the largest peak, where a peak is defined as any index with a larger value than both of its direct neighbours:
inflection = np.diff(np.sign(np.diff(acf))) # Find the second-order differences
peaks = (inflection < 0).nonzero()[0] + 1 # Find where they are negative
delay = peaks[acf[peaks].argmax()] # Of those, find the index with the maximum value
Now delay == 20, which tells you that the signal has a frequency of 1/20 of its sampling rate:
signal_freq = fs/delay # Gives 0.05
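If you would rather not hand-roll the peak detection, scipy.signal.find_peaks gives the same answer here (a sketch that assumes acf and fs from above):
from scipy.signal import find_peaks

peaks, _ = find_peaks(acf)            # indices of all local maxima
delay = peaks[acf[peaks].argmax()]    # the strongest peak away from zero lag
signal_freq = fs/delay                # again 0.05 for the fs = 1 example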
Fourier transform
You used the following to calculate the FFT:
omega = np.fft.fft(x)
freq = np.fft.fftfreq(x.size, 1)
These functions are designed for complex-valued signals. They will work for real-valued signals, but you'll get a symmetric output, as the negative frequency components mirror the positive frequency components (they are their complex conjugates). NumPy provides separate functions for real-valued signals:
ft = np.fft.rfft(x)
freqs = np.fft.rfftfreq(len(x), t[1]-t[0]) # Get frequency axis from the time axis
mags = abs(ft) # We don't care about the phase information here
Let's have a look:
plt.plot(freqs, mags)
plt.show()
Note two things: the peak is at frequency 0.05, and the maximum frequency on the axis is 0.5 (the Nyquist frequency, which is exactly half the sampling rate). If we had picked fs = 20, this would be 10.
Now let's find the maximum. The thresholding method you have tried can work, but the target frequency bin is selected blindly and so this method would suffer in the presence of other signals. We could just select the maximum value:
signal_freq = freqs[mags.argmax()] # Gives 0.05
However, this would fail if, e.g., we have a large DC offset (and hence a large component in index 0). In that case we could just select the highest peak again, to make it more robust:
inflection = np.diff(np.sign(np.diff(mags)))
peaks = (inflection < 0).nonzero()[0] + 1
peak = peaks[mags[peaks].argmax()]
signal_freq = freqs[peak] # Gives 0.05
If we had picked fs = 20, this would have given signal_freq == 1.0 due to the different time axis from which the frequency axis was generated.
Periodogram
The method here is essentially the same. The autocorrelation function of x has the same time axis and period as x, so we can use the FFT as above to find the signal frequency:
pdg = np.fft.rfft(acf)
freqs = np.fft.rfftfreq(len(x), t[1]-t[0])
plt.plot(freqs, abs(pdg))
plt.show()
This curve obviously has slightly different characteristics from the direct FFT on x, but the main takeaways are the same: the frequency axis ranges from 0 to 0.5*fs, and we find a peak at the same signal frequency as before: freqs[abs(pdg).argmax()] == 0.05.
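Incidentally, scipy.signal.periodogram from your "Attempt 3" agrees with all of this once you pass the sampling rate explicitly (a quick sketch, reusing x and fs from above):
from scipy import signal

freq, psd = signal.periodogram(x, fs=fs)   # fs = 1 in the natural-units example
print(freq[psd.argmax()])                  # 0.05 (it would print 1.0 with fs = 20)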
Edit:
To measure the actual periodicity of np.sin, we can just use the "angle axis" that we passed to np.sin instead of the time axis when generating the frequency axis:
freqs = np.fft.rfftfreq(len(x), 2*np.pi*f*(t[1]-t[0]))
rad_period = 1/freqs[mags.argmax()] # 6.283185307179586
Though that seems pointless, right? We pass in 2*np.pi and we get 2*np.pi. However, we can do the same with any regular time axis, without presupposing pi at any point:
fs = 10
t = np.arange(1000)/fs
x = np.sin(t)
rad_period = 1/np.fft.rfftfreq(len(x), 1/fs)[abs(np.fft.rfft(x)).argmax()] # 6.25
Naturally, the true value now lies in between two bins. That's where interpolation comes in and the associated need to choose a suitable window function.
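As a rough illustration of that last point, here is a sketch of quadratic (parabolic) interpolation around the strongest bin, with a Hann window applied to reduce leakage; it refines the 6.25 estimate from above (the window and the interpolation scheme are one reasonable choice, not the only option):
fs = 10
t = np.arange(1000)/fs
x = np.sin(t)

win = np.hanning(len(x))
mags = np.abs(np.fft.rfft(x*win))
k = mags.argmax()                      # strongest bin (here not at either end)
a, b, c = mags[k-1], mags[k], mags[k+1]
p = 0.5*(a - c)/(a - 2*b + c)          # fractional bin offset in (-0.5, 0.5)
f_interp = (k + p)*fs/len(x)           # interpolated peak frequency
print(1/f_interp)                      # close to 2*pi ~ 6.283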
I'm having trouble getting a frequency spectrum out of a Fourier transform... I have some data:
which I have mean-centered, and which doesn't seem to have much of a trend...
I plot the Fourier transform of it:
and I get something that is not nice....
Here is my code:
import numpy as np
import matplotlib.pyplot as plt

def fourier_spectrum(X, sample_freq=1):
    ps = np.abs(np.fft.fft(X))**2
    freqs = np.fft.fftfreq(X.size, sample_freq)
    idx = np.argsort(freqs)
    plt.plot(freqs[idx], ps[idx])
As adapted from code taken from here.
It seems to work for some naive sin wave data:
fourier_spectrum(np.sin(2*np.pi*np.linspace(-10,10,400)), 20./400)
So my questions are: I'm expecting a spectrum that is non-zero almost everywhere, so what am I doing wrong? If I'm not doing anything wrong, what features of my data are causing this? And if I'm not doing anything wrong but the FFT is simply unsuited to my data for some reason, what should I do instead to extract the important frequencies from it?
It turns out that I just didn't understand the units of the x-axis in the frequency spectrum, which are Hz. Because my sample spacings were on the order of a second and my period was on the order of a day, the only frequencies really visible on my spectrum ran from ~1 cycle per second (at the edges) down to about ~1 cycle per minute (near the middle), and anything with a longer period than that was indistinguishable from 0. My misunderstanding stemmed from the graph in this tutorial, where they do conversions so that the x-axis units are in time, as opposed to inverse time. I rewrote my fourier_spectrum plotting function to do the appropriate "zooming" on the resulting graph...
import numpy as np
import matplotlib.pyplot as plt

def fourier_spectrum(X, sample_spacing_in_s=1, min_period_in_s=5):
    '''
    X: is our data
    sample_spacing_in_s: is the time spacing between samples
    min_period_in_s: is the minimum period we want to show up in our
        graph... this is handy because if our sample spacing is
        small compared to the periods in our data, then our spikes
        will all cluster near 0 (the infinite period) and we can't
        see them. E.g. if you want to see periods on the order of
        days, set min_period_in_s=5*60*60  # 5 hours
    '''
    ps = np.abs(np.fft.fft(X))**2
    freqs = np.fft.fftfreq(X.size, sample_spacing_in_s)
    idx = np.argsort(freqs)
    plt.plot(freqs[idx], ps[idx])
    plt.xlim(-1./min_period_in_s, 1./min_period_in_s)  # the x-axis is in Hz
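Hypothetical usage, assuming the fourier_spectrum function (and imports) above: ten days of data sampled once per minute, containing a 12-hour cycle.
dt = 60.0                                # sample spacing in seconds
t = np.arange(0, 10*24*3600, dt)         # ten days of timestamps
X = np.sin(2*np.pi*t/(12*3600))          # a 12-hour period
fourier_spectrum(X, sample_spacing_in_s=dt, min_period_in_s=5*3600)
plt.show()                               # expect spikes at +/- 1/(12*3600) Hz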
I am currently learning about the Fourier transform and using Python to play around with it.
I have a code snippet here:
x = np.arange(0,50,0.1)
T = 5
y = np.sin(T*np.pi*x)
freq = np.fft.rfftfreq(x.size)
y_ft = np.fft.rfft(y)
plt.plot(freq, np.abs(y_ft))
It produces a correct chart, as follows:
But when I change T to 10, the chart looks like this:
I was expecting to get a chart similar to the first one, with the peak shifted to the right, because I just changed the cycle time.
Why does changing the cycle time produce such an unexpected result?
You are effectively sampling a signal. With your code, you sample at 1/0.1 = 10 samples per unit of x, so the Nyquist limit is 5 cycles per unit. The sinusoid np.sin(T*np.pi*x) has a frequency of T/2 cycles per unit: for T=5 that is 2.5, comfortably below Nyquist, but for T=10 it is exactly 5, right at the Nyquist limit, so the signal is not correctly sampled (the sample points fall almost exactly on the zero crossings). Solution: increase your sampling rate (x = np.arange(0, 50, 0.01), for example).
Look at what your T=10 signal looks like when plotted (you can see it doesn't resemble a single sinusoid at the sampling points):
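For reference, here is a small sketch of the fix, with a frequency axis in cycles per unit of x so the peak lands where you expect (it adds a d= argument to rfftfreq, which your original snippet left at its default):
import numpy as np
import matplotlib.pyplot as plt

dx = 0.01                              # ten times finer sampling -> Nyquist = 50
x = np.arange(0, 50, dx)
T = 10
y = np.sin(T*np.pi*x)

freq = np.fft.rfftfreq(x.size, d=dx)   # axis in cycles per unit of x
plt.plot(freq, np.abs(np.fft.rfft(y)))
plt.show()                             # a single clean peak at T/2 = 5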
My question is about Fast Fourier Transforms, since this is the first time I'm using them.
So, I have a set of data by year (from 1700 to 2009), with each year corresponding to a certain value (a reading).
When I plot the readings against the years, I get the first plot below. Now, my aim is to find the dominant period with the highest readings using an FFT in Python (from the graph it seems to be around 1940 - 1950). So I performed an FFT and obtained its amplitude and power spectra (see the second plot for the power spectrum). The power spectrum shows that the dominant frequencies are between 0.08 and 0.1 cycles/year. My question is: how do I link this back to the readings vs. years plot? That is, how do I know from this dominant frequency which year (or period of years) is the dominant one (or how can I use it to find that)?
The data list can be found here:
http://www.physics.utoronto.ca/%7Ephy225h/web-pages/sunspot_yearly.txt
The code I wrote is:
from pylab import *
from numpy import *
from scipy import *
from scipy.optimize import leastsq
import numpy.fft
#-------------------------------------------------------------------------------
# Defining the time array
tmin = 0
tmax = 100 * pi
delta = 0.1
t = arange(tmin, tmax, delta)
# Loading the data from the text file
year, N_sunspots = loadtxt('/Users/.../Desktop/sunspot_yearly.txt', unpack = True) # years and number of sunspots
# Ploting the data
figure(1)
plot(year, N_sunspots)
title('Number of Sunspots vs. Year')
xlabel('time(year)')
ylabel('N')
# Computing the FFT
N_w = fft(N_sunspots)
# Obtaining the frequencies
n = len(N_sunspots)
freq = fftfreq(n) # dt is default to 1
# keeping only positive terms
N = N_w[1:len(N_w)//2]/float(len(N_w[1:len(N_w)//2]))  # integer division so the slice indices are ints
w = freq[1:len(freq)//2]
figure(2)
plot(w, real(N))
plot(w, imag(N))
title('The data function f(w) vs. frequency')
xlabel('frequency(cycles/year)')
ylabel('f(w)')
grid(True)
# Amplitude spectrum
Amp_spec = abs(N)
figure(3)
plot(w, Amp_spec)
title('Amplitude spectrum')
xlabel('frequency(cycles/year)')
ylabel('Amplitude')
grid(True)
# Power spectrum
Pwr_spec = abs(N)**2
figure(4)
plot(w, Pwr_spec, 'o')
title('Power spectrum')
xlabel('frequency(cycles/year)')
ylabel('Power')
grid(True)
show()
The graph below shows the data input to the FFT. The original data file contains a total of 309 samples. The zero values at the right end are added automatically by the FFT, to pad the number of input samples to the next higher power of two (2^9 = 512).
The graph below shows the data input to the FFT, with the Kaiser-Bessel a=3.5 window function applied. The window function reduces the spectral leakage errors in the FFT, when the input to the FFT is a non-periodic signal over the sampling interval, as in this case.
The graph below shows the FFT output at full scale, without the window function. The peak is at 0.0917968 (47/512) frequency units, which corresponds to a period of 10.89 years (1/0.0917968).
The graph below shows the FFT output at full scale, with the Kaiser-Bessel a=3.5 window applied. The peak remains at the same frequency, 0.0917968 (47/512) frequency units, which again corresponds to a period of 10.89 years. The peak is more clearly visible above the background, thanks to the reduction in spectral leakage provided by the window function.
In conclusion, we can say with high confidence that the sunspot data provided in the original post is periodic, with a fundamental period of 10.89 years.
FFT and graphs were done with the Sooeet FFT calculator
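For readers who want to reproduce this in Python rather than in an online calculator, here is a rough numpy sketch of the same steps (it assumes the linked data file has been saved locally as sunspot_yearly.txt with two columns, year and count; the Kaiser beta value is an ad-hoc choice, not necessarily the exact window used above):
import numpy as np

year, N_sunspots = np.loadtxt('sunspot_yearly.txt', unpack=True)
x = N_sunspots - N_sunspots.mean()        # remove the DC component

window = np.kaiser(len(x), 11.0)          # Kaiser window to reduce spectral leakage
spectrum = np.abs(np.fft.rfft(x*window))
freqs = np.fft.rfftfreq(len(x), d=1.0)    # cycles/year, since dt = 1 year

k = spectrum[1:].argmax() + 1             # strongest bin, skipping the DC bin
print(freqs[k], 1/freqs[k])               # expect ~0.09 cycles/year -> ~11-year period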