fft power spectrum woes - python

I'm having trouble getting a frequency spectrum out of a fourier transform... I have some data:
That I have mean-centered, and doesn't seem to have too much of a trend...
I plot the fourier transform of it:
And I get something that is not nice....
Here is my code:
def fourier_spectrum(X, sample_freq=1):
    ps = np.abs(np.fft.fft(X))**2
    freqs = np.fft.fftfreq(X.size, sample_freq)
    idx = np.argsort(freqs)
    plt.plot(freqs[idx], ps[idx])
As adapted from code taken from here.
It seems to work for some naive sin wave data:
fourier_spectrum(np.sin(2*np.pi*np.linspace(-10,10,400)), 20./400)
So my questions are: I'm expecting a non-zero-almost-everywhere-spectrum, what am I doing wrong? If I'm not doing anything wrong, what features of my data are causing this? Also, if I'm not doing anything wrong, and fft is just unsuited for my data for some reason, what should I do to extract important frequencies from my data?

It turns out that I just didn't understand the units of the x-axis in the frequency spectrum, which is Hz. Because my sample spacings were on the order of a second, and my period was on the order of a day, the only frequencies really visible on my spectrum ranged from ~1/s (at the edges) to about ~1/min (near the middle), and anything with a longer period than that was indistinguishable from 0. My misunderstanding stemmed from the graph in this tutorial, where they do conversions so that the x-axis units are in time, as opposed to inverse time. I rewrote my fourier_spectrum plotting function to do the appropriate "zooming" on the resulting graph...
def fourier_spectrum(X, sample_spacing_in_s=1, min_period_in_s=5):
    '''
    X: is our data
    sample_spacing_in_s: is the time spacing between samples
    min_period_in_s: is the minimum period we want to show up in our
        graph... this is handy because if our sample spacing is
        small compared to the periods in our data, then our spikes
        will all cluster near 0 (the infinite period) and we can't
        see them. E.g. if you want to see periods on the order of
        days, set min_period_in_s=5*60*60 #5 hours
    '''
    ps = np.abs(np.fft.fft(X))**2
    freqs = np.fft.fftfreq(X.size, sample_spacing_in_s)
    idx = np.argsort(freqs)
    plt.plot(freqs[idx], ps[idx])
    plt.xlim(-1./min_period_in_s,1./min_period_in_s) # the x-axis is in Hz
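As a rough illustration of why the spikes cluster near 0, here is a minimal sketch that reads the dominant period straight off the spectrum instead of zooming the plot. The helper name dominant_period and the synthetic one-day sine wave sampled every 60 s are assumptions for illustration only; they are not the original data.

```python
import numpy as np

def dominant_period(X, sample_spacing_in_s=1.0):
    """Return the period (in seconds) of the strongest non-zero frequency."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean()                      # mean-center, as in the question
    ps = np.abs(np.fft.rfft(X)) ** 2      # one-sided power spectrum
    freqs = np.fft.rfftfreq(X.size, d=sample_spacing_in_s)  # in Hz
    k = np.argmax(ps[1:]) + 1             # skip the zero-frequency bin
    return 1.0 / freqs[k]

# Synthetic stand-in: ten days of a one-day cycle, one sample per minute.
spacing = 60.0
t = np.arange(0, 10 * 86400, spacing)
X = np.sin(2 * np.pi * t / 86400.0)

period = dominant_period(X, spacing)
nyquist = 1.0 / (2.0 * spacing)           # right-hand edge of the full plot
ratio = (1.0 / period) / nyquist          # where the peak sits on that axis

print(period / 3600.0)                    # dominant period in hours
print(ratio)                              # fraction of the full frequency axis
```

With these numbers the daily peak sits at well under 1% of the plotted frequency range, which is why it is invisible without the xlim zoom.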

Related

python Spectrogram by using value in timeseries

I am new to spectrograms and am trying to plot one from relative velocity variation values derived from ambient seismic noise.
The data I have is in the format 'time', 'station pair', 'velocity variation value', as below. (If the error is needed, I can add it to the data.)
2013-11-24,05_PK01_05_SS01,0.057039371136200
2013-11-25,05_PK01_05_SS01,-0.003328071661900
2013-11-26,05_PK01_05_SS01,0.137221779659000
2013-11-27,05_PK01_05_SS01,0.068823721831000
2013-11-28,05_PK01_05_SS01,-0.006876687060810
2013-11-29,05_PK01_05_SS01,-0.023895268916200
2013-11-30,05_PK01_05_SS01,-0.105762098404000
2013-12-01,05_PK01_05_SS01,-0.028069540807700
2013-12-02,05_PK01_05_SS01,0.015091601414300
2013-12-03,05_PK01_05_SS01,0.016353885353700
2013-12-04,05_PK01_05_SS01,-0.056654092859700
2013-12-05,05_PK01_05_SS01,-0.044520608528500
2013-12-06,05_PK01_05_SS01,0.020226437197700
...
But when I searched, I could only find people using data with network, station, location, and channel fields, or wav data.
So I have no idea where to start, because my data format is different.
Do you know of a way to get a spectrogram from the 'value' column of a time series?
p.s. I would compute cross correlation with velocity variation value and other environmental data such as air temperature, air pressure etc.
###Edit (I add two pictures but the notice pops up that I cannot post images yet but only link)
I will use groundwater level or other environmental data here, because the variations are easier to see in those.
The plot I want to reproduce is from David et al., 2021, as below.
[image: spectrogram from David et al., 2021]
The x-axis shows time and the y-axis shows cycles/day.
So a light color at 1 means a diurnal cycle (at 2, a semidiurnal cycle).
I have now plotted a spectrogram with the frequency in cycles per day.
[image: my spectrogram]
But there are two things I still need to fix.
1) In the reference, the color scale is logarithmic, so I need to find a way to plot mine on a log scale.
2) In the reference, the x-axis goes up to 1*10^7, but my data has only 755 points in the time series (dates in 2013-2015). What do I have to do to make the x-axis show the dates?
p.s. The code I made
fil=pd.read_csv('myfile.csv')
cf=fil.iloc[:,1]
cf=cf/max(abs(cf))
nfft=128 #The number of data points
fs=1/86400 #Hz [0, fs/2] cycles / unit time
n=len(cf)
fr=fs/n
spec, freq, tt, pplot = pylab.specgram(cf, NFFT=nfft, Fs=fs, detrend=pylab.detrend,
window=pylab.window_hanning, noverlap=100, mode='psd')
pylab.title('%s' % e_n)
plt.colorbar()
plt.ylabel("Frequency (cycles / %s Day)" % str(1/fs/86400))
plt.xlabel("days")
plt.show()
If you look closely at it, wav data is basically just an array of numbers (sound amplitude), recorded at a certain interval.
Note: You have an array of equally spaced samples, but they are for velocity difference, not amplitude. So while the following is technically valid, I don't think that the resulting frequencies represent seismic sound frequencies?
So the discrete Fourier transform (in the form of np.fft.rfft) would normally be the right thing to use.
If you give the function np.fft.rfft() n numbers, it will return n/2+1 frequencies. This is because of the inherent symmetry in the transform.
However, one thing to keep in mind is the frequency resolution of FFT. For example if you take n=44100 samples from a wav file sampled at Fs=44100 Hz, you get a convenient frequency resolution of Fs/n = 1 Hz. Which means that the first number in the FFT result is 0 Hz, the second number is 1 Hz et cetera.
It seems that the sampling frequency in your dataset is once per day, i.e. Fs = 1/(24x3600) = 0.000012 Hz. Suppose you have n = 10000 samples, then the FFT will return 5001 numbers, with a frequency resolution of Fs/n = 0.0000000012 Hz. The last of those 5001 numbers has index 5000, so the highest frequency you will be able to detect from data sampled at this frequency is 0.0000000012*5000 = 0.000006 Hz.
So the highest frequency you can detect is approximately Fs/2!
I'm no domain expert, but that value seems to be a bit low for seismic noise?
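The arithmetic above can be checked with np.fft.rfftfreq. This is only a sketch under the same assumptions as the answer (one sample per day and a hypothetical n = 10000 samples), not a statement about the actual dataset.

```python
import numpy as np

fs = 1.0 / 86400.0            # one sample per day, expressed in Hz
n = 10000                     # hypothetical number of samples

# rfftfreq takes the sample spacing d = 1/fs, not the sampling frequency.
freqs = np.fft.rfftfreq(n, d=1.0 / fs)

num_bins = freqs.size         # n // 2 + 1 frequencies for even n
resolution = freqs[1]         # spacing between bins, fs / n
highest = freqs[-1]           # highest detectable frequency, fs / 2

print(num_bins)
print(resolution)
print(highest)
print(highest * 86400.0)      # the same limit in cycles per day
```

Expressed in cycles per day the upper limit is 0.5, i.e. nothing faster than one cycle every two days can be resolved from daily samples.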

Fourier Transform Time Series in Python

I've got a time series of sunspot numbers, where the mean number of sunspots is counted per month, and I'm trying to use a Fourier Transform to convert from the time domain to the frequency domain. The data used is from https://wwwbis.sidc.be/silso/infosnmtot.
The first thing I'm confused about is how to express the sampling frequency as once per month. Do I need to convert it to seconds, eg. 1/(seconds in 30 days)? Here's what I've got so far:
fs = 1/2592000
#the sampling frequency is 1/(seconds in a month)
fourier = np.fft.fft(sn_value)
#sn_value is the mean number of sunspots measured each month
freqs = np.fft.fftfreq(sn_value.size,d=fs)
power_spectrum = np.abs(fourier)
plt.plot(freqs,power_spectrum)
plt.xlim(0,max(freqs))
plt.title("Power Spectral Density of the Sunspot Number Time Series")
plt.grid(True)
I don't think this is correct - namely because I don't know what the scale of the x-axis is. However I do know that there should be a peak at (11years)^-1.
The second thing I'm wondering about from this graph is why there seem to be two lines, one being a horizontal line just above y=0. It's clearer when I change the x-axis bounds to: plt.xlim(0,1).
Am I using the fourier transform functions incorrectly?
You can use any units you want. Feel free to express your sampling frequency as fs=12 (samples/year); the x-axis will then be in units of 1/year. Or use fs=1 (sample/month), and the units will be 1/month. Either way, note that the d argument of np.fft.fftfreq is the sample spacing (1/fs), not the sampling frequency itself.
The extra line you spotted comes from the way you plot your data. Look at the output of the np.fft.fftfreq call. The first half of that array contains positive values from 0 to 1.2e6 or so, and the other half contains negative values from -1.2e6 to almost 0. By plotting all your data, you get a data line from 0 to the right, then a straight line from the rightmost point to the leftmost point, then the rest of the data line back to zero. Your xlim call makes it so you don’t see half the data plotted.
Typically you’d plot only the first half of your data: just crop the freqs and power_spectrum arrays.
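A minimal sketch of the cropping described above, using fs=12 samples/year so the frequency axis comes out in 1/year. The sn_value array here is a synthetic 11-year sine wave standing in for the real sunspot series, and the 220-year length is an arbitrary assumption.

```python
import numpy as np

fs = 12.0                                  # samples per year (monthly data)
n = 12 * 220                               # hypothetical: 220 years of months
t = np.arange(n) / fs                      # time in years

# Synthetic stand-in for the sunspot numbers: a pure 11-year cycle.
sn_value = 80.0 + 60.0 * np.sin(2 * np.pi * t / 11.0)

fourier = np.fft.fft(sn_value - sn_value.mean())
freqs = np.fft.fftfreq(n, d=1.0 / fs)      # d is the sample spacing, 1/fs

# Keep only the first half: the non-negative frequencies.
half = n // 2
freqs_pos = freqs[:half]
power_spectrum = np.abs(fourier[:half]) ** 2

peak_freq = freqs_pos[np.argmax(power_spectrum)]   # in 1/year
print(1.0 / peak_freq)                     # period of the strongest peak, years
```

Plotting freqs_pos against power_spectrum then gives a single line with no return stroke across the figure.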

Calculating time-varying frequency and phase angle from a timeseries

I have data from a number of high frequency data capture devices connected to generators on an electricity grid. These meters collect data in ~1 second "bursts" at ~1.25 ms intervals, i.e. fast enough to actually see the waveform. See below graphs showing voltage and current for the three phases shown in different colours.
This timeseries has a changing fundamental frequency, ie the frequency of the electricity grid is changing over the length of the timeseries. I want to roll this (messy) waveform data up to summary statistics of frequency and phase angle for each phase, calculated/estimated every 20ms (approx once per cycle).
The simplest way that I can think of would be to just count the gap between the zero crossings (y=0) on each wave and use the offset to calculate phase angle. Is there a neat way to achieve this (i.e. a table of interpolated x values for which y=0)?
However the above may be quite noisy, and I was wondering if there is a more mathematically elegant way of estimating a changing frequency and phase angle with pandas/scipy etc. I know there are some sophisticated techniques available for periodic functions but I'm not familiar enough with them. Any suggestions would be appreciated :)
Here's a "toy" data set of the first few waves as a pandas Series:
import pandas as pd, datetime as dt
ds_waveform = pd.Series(
index = pd.date_range('2020-08-23 12:35:37.017625', '2020-08-23 12:35:37.142212890', periods=100),
data = [ -9982., -110097., -113600., -91812., -48691., -17532.,
24452., 75533., 103644., 110967., 114652., 92864.,
49697., 18402., -23309., -74481., -103047., -110461.,
-113964., -92130., -49373., -18351., 24042., 75033.,
103644., 111286., 115061., 81628., 61614., 19039.,
-34408., -62428., -103002., -110734., -114237., -92858.,
-49919., -19124., 23542., 74987., 103644., 111877.,
115379., 82720., 62251., 19949., -33953., -62382.,
-102820., -111053., -114555., -81941., -62564., -19579.,
34459., 62706., 103325., 111877., 115698., 83084.,
62888., 20949., -33362., -61791., -102547., -111053.,
-114919., -82805., -62882., -20261., 33777., 62479.,
103189., 112195., 116380., 83630., 63843., 21586.,
-32543., -61427., -102410., -111553., -115374., -83442.,
-63565., -21217., 33276., 62024., 103007., 112468.,
116471., 84631., 64707., 22405., -31952., -61108.,
-101955., -111780., -115647., -84261.])
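The zero-crossing idea described in the question can be sketched as follows. This is only one possible approach under stated assumptions: the helper zero_crossing_times is hypothetical, it uses plain linear interpolation between neighbouring samples, and it is demonstrated on a clean synthetic 50 Hz sine wave rather than on the toy data. It estimates frequency only; phase angle and noise handling are not covered.

```python
import numpy as np

def zero_crossing_times(t, y):
    """Linearly interpolated times at which y changes sign."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    negative = np.signbit(y)
    # Indices i where the sign differs between sample i and sample i + 1.
    idx = np.nonzero(negative[:-1] != negative[1:])[0]
    # Fraction of the sample interval at which the straight line hits zero.
    frac = y[idx] / (y[idx] - y[idx + 1])
    return t[idx] + frac * (t[idx + 1] - t[idx])

# Synthetic stand-in: 50 Hz wave sampled every 1.25 ms for 0.2 s.
dt = 1.25e-3
t = np.arange(0, 0.2, dt)
y = np.sin(2 * np.pi * 50.0 * t + 0.3)

crossings = zero_crossing_times(t, y)
# Consecutive crossings are half a period apart.
freq_estimates = 1.0 / (2.0 * np.diff(crossings))
print(freq_estimates)
```

For a pandas Series with a DatetimeIndex, the time axis could be converted to seconds first, for example by subtracting the first timestamp and calling total_seconds() on the result.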

FFT using Python - unexpected low frequencies

I'm still trying to get frequency analysis for this data using FFT in Python.
The sampling rate is 1 data point per minute.
My code is:
from scipy.fftpack import fft
df3 = pd.read_csv('Pressure - Dates by Minute.csv', sep=",", skiprows=0)
df3['Pressure FFT'] = df3['ATMOSPHERIC PRESSURE (hPa) mean'] - df3['ATMOSPHERIC PRESSURE (hPa) mean'].mean()
Pressure = df3['Pressure FFT']
Fs = 1/60
Ts = 1.0/Fs
n = len(Pressure)
k = np.arange(n)
T = n/Fs
t = np.arange(0,1,1/n) # time vector
frq = k/T # two sides frequency range
frq = frq[range(int(n/2))] # one side frequency range
Y = np.fft.fft(Pressure)/n # fft computing and normalization
Y = Y[range(int(n/2))]
fig, ax = plt.subplots(2, 1)
ax[0].plot(t,Pressure)
ax[0].set_xlabel('Time')
ax[0].set_ylabel('Amplitude')
ax[1].plot(frq,abs(Y),'r') # plotting the spectrum
ax[1].set_xlabel('Freq (Hz)')
ax[1].set_ylabel('|Y(freq)|')
But the result gives:
So my problems are:
1) Why there are no frequencies at all ? The data is clearly periodic.
2) Why the frequency spectrum is so low ? (0 - 0.009)
3) Maybe I should try different filtering technique?
Any insights ?
Thanks !!!
1) Why there are no frequencies at all ? The data is clearly periodic.
Well, there is frequency content, it's just not exactly visible because of its structure. Try changing the line that plots the frequency spectrum, from ax[1].plot(frq,abs(Y),'r') to ax[1].semilogy(frq,abs(Y),'r')
This results in:
Here we have applied a simple transformation (a logarithmic y-axis) that boosts low values and compresses high values. For more information, please see this link. Of course, having removed the DC component (as you do on line 3 of your code) helps too.
This still seems a bit blurry and it is, but if we zoom in to the lower part of the spectrum, we see this:
Which shows a spike at approximately 2.3e-05 Hz which corresponds to approximately 12 hours.
2) Why the frequency spectrum is so low ? (0 - 0.009)
Because you sample once every 60 seconds, your sampling frequency is approximately 0.0167 Hz. Your spectrum therefore contains everything between DC (0 Hz) and half of that, 0.0083 Hz. For more information, please see this link.
3) Maybe I should try different filtering technique?
You can try windowing if you can't resolve a harmonic but it doesn't look like it's needed here.
Hope this helps.
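To make the numbers above concrete, here is a small sketch that locates the spike numerically and converts it to a period. The pressure array is synthetic (a 12-hour oscillation plus noise, sampled once per minute for 20 days) and is only an assumed stand-in for the real CSV data.

```python
import numpy as np

Fs = 1.0 / 60.0                          # one sample per minute, in Hz
n = 20 * 24 * 60                         # hypothetical: 20 days of samples
t = np.arange(n) / Fs                    # time in seconds

# Synthetic stand-in for the pressure series: 12-hour cycle plus noise.
rng = np.random.default_rng(0)
pressure = 1013.0 + 1.5 * np.sin(2 * np.pi * t / (12 * 3600.0))
pressure = pressure + 0.3 * rng.standard_normal(n)
pressure = pressure - pressure.mean()    # remove the DC component

Y = np.fft.rfft(pressure) / n            # one-sided, normalized spectrum
frq = np.fft.rfftfreq(n, d=1.0 / Fs)     # 0 Hz up to Fs / 2

k = np.argmax(np.abs(Y[1:])) + 1         # strongest bin, skipping 0 Hz
peak_hz = frq[k]
period_hours = 1.0 / peak_hz / 3600.0

print(frq[-1])                           # upper edge of the spectrum
print(peak_hz, period_hours)
```

Plotting frq against abs(Y) with a logarithmic y-axis and an x-limit of a few times peak_hz would reproduce the zoomed view described above.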
Part of the reason why those frequencies seem so low is because the time axis in your amplitude plot is scaled weirdly. If you really have one sample per 60 seconds then the x-axis should range between 0 and 1690260 seconds (i.e. ~20 days!).
By eye, you seem to have about one small peak every 50000 seconds (~2 per day), which would correspond to a frequency of about 2x10⁻⁵ Hz. Your periodogram therefore looks pretty reasonable to me, given how massive the scale of the x-axis is.

numpy Fourier transformation produces unexpected results

I am currently learning Fourier transformation and using Python to play around with it.
I have a code snippet here:
x = np.arange(0,50,0.1)
T = 5
y = np.sin(T*np.pi*x)
freq = np.fft.rfftfreq(x.size)
y_ft = np.fft.rfft(y)
plt.plot(freq, np.abs(y_ft))
It produces a correct chart as following:
But when I change the T into 10, the chart is like this:
I was expecting that I will get a similar chart like the first one with a right shift of the peak, because I just enlarged the cycle time.
Why increasing the cycle time would produces such an unexpected result?
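You are effectively sampling a signal. With your code, the sampling frequency is 1/0.1 = 10 samples per unit of x (10 Hz if x is in seconds), so the Nyquist frequency is 5 Hz. The signal sin(T*pi*x) has a frequency of T/2: your first sinusoid (T=5) is at 2.5 Hz, comfortably below Nyquist, but your second (T=10) is at 5 Hz, exactly on the Nyquist frequency. There every sample lands on a zero crossing (sin(k*pi) = 0), so the signal is not correctly sampled and all that is left is floating-point round-off. Solution: increase your sampling frequency (x = np.arange(0, 50, 0.01) for example).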
Look at what your T=10 signal looks like when plotted (you can see it doesn't resemble a single sinusoid at the sampling points):
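The effect can also be checked numerically. This sketch wraps the question's snippet in a hypothetical helper, peak_and_amplitude, and passes the sample spacing to rfftfreq so the frequency axis is in cycles per unit of x rather than cycles per sample.

```python
import numpy as np

def peak_and_amplitude(T, dx):
    """Peak frequency of sin(T*pi*x) sampled every dx, and max |sample|."""
    x = np.arange(0, 50, dx)
    y = np.sin(T * np.pi * x)
    freq = np.fft.rfftfreq(x.size, d=dx)      # cycles per unit of x
    y_ft = np.abs(np.fft.rfft(y))
    return freq[np.argmax(y_ft)], np.max(np.abs(y))

# T=5 at dx=0.1: frequency T/2 = 2.5, below the Nyquist frequency of 5.
peak_5, amp_5 = peak_and_amplitude(5, 0.1)

# T=10 at dx=0.1: exactly on Nyquist, every sample is (numerically) zero.
_, amp_10_coarse = peak_and_amplitude(10, 0.1)

# T=10 at dx=0.01: Nyquist is now 50, and the peak appears at T/2 = 5.
peak_10, amp_10_fine = peak_and_amplitude(10, 0.01)

print(peak_5, amp_5)
print(amp_10_coarse)
print(peak_10, amp_10_fine)
```

The coarse T=10 case has sample values that are tiny compared with 1, which is why its spectrum looks like noise rather than a shifted peak.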
