Unexpected FFT Results with Python - python

I'm analyzing what is essentially a respiratory waveform, constructed in 3 different shapes (the data originates from MRI, so multiple echo times were used; see here if you'd like some quick background). Here are a couple of segments of two of the plotted waveforms for some context:
For each waveform, I'm trying to perform a DFT in order to discover the dominant frequency or frequencies of respiration.
My issue is that when I plot the DFTs that I perform, I get one of two things, dependent on the FFT library that I use. Furthermore, neither of them is representative of what I am expecting. I understand that data doesn't always look the way we want, but I clearly have waveforms present in my data, so I would expect a discrete Fourier transform to produce a frequency peak somewhere reasonable. For reference here, I would expect about 80 to 130 Hz.
My data is stored in a pandas data frame, with each echo time's data in a separate series. I'm applying the FFT function of choice (see the code below) and receiving different results for each of them.
SciPy (fftpack)
import pandas as pd
import scipy.fftpack
# temporary copy to maintain data structure
lead_pts_fft_df = lead_pts_df.copy()
# apply a discrete fast Fourier transform to each data series in the data frame
lead_pts_fft_df.magnitude = lead_pts_df.magnitude.apply(scipy.fftpack.fft)
lead_pts_fft_df
NumPy:
import pandas as pd
import numpy as np
# temporary copy to maintain data structure
lead_pts_fft_df = lead_pts_df.copy()
# apply a discrete fast Fourier transform to each data series in the data frame
lead_pts_fft_df.magnitude = lead_pts_df.magnitude.apply(np.fft.fft)
lead_pts_fft_df
The rest of the relevant code:
ECHO_TIMES = [0.080, 0.200, 0.400] # milliseconds
f_s = 1 / (0.006) # 0.006 = time between samples
freq = np.linspace(0, 29556, 29556) * (f_s / 29556) # (29556 = length of data)
# generate subplots
fig, axes = plt.subplots(3, 1, figsize=(18, 16))
for idx in range(len(ECHO_TIMES)):
axes[idx].plot(freq, lead_pts_fft_df.magnitude[idx].real, # real part
freq, lead_pts_fft_df.magnitude[idx].imag, # imaginary part
axes[idx].legend() # apply legend to each set of axes
# show the plot
plt.show()
Post-DFT (SciPy fftpack):
Post-DFT (NumPy)
Here is a link to the dataset (on Dropbox) used to create these plots, and here is a link to the Jupyter Notebook.
EDIT:
I used the posted advice and took the magnitude (absolute value) of the data, and also plotted with a logarithmic y-axis. The new results are posted below. It appears that I have some wraparound in my signal. Am I not using the correct frequency scale? The updated code and plots are below.
# generate subplots
fig, axes = plt.subplots(3, 1, figsize=(18, 16))
for idx in range(len(ECHO_TIMES)):
axes[idx].plot(freq[1::], np.log(np.abs(lead_pts_fft_df.magnitude[idx][1::])),
label=lead_pts_df.index[idx], # apply labels
color=plot_colors[idx]) # colors
axes[idx].legend() # apply legend to each set of axes
# show the plot
plt.show()

I've figured out my issues.
Cris Luengo was very helpful with this link, which helped me determine how to correctly scale my frequency bins and plot the DFT appropriately.
ADDITIONALLY: I had forgotten to take into account the offset present in the signal. Not only does it cause issues with the huge peak at 0 Hz in the DFT, but it is also responsible for most of the noise in the transformed signal. I made use of scipy.signal.detrend() to eliminate this and got a very appropriate looking DFT.
# import DFT and signal packages from SciPy
import scipy.fftpack
import scipy.signal
# temporary copy to maintain data structure; original data frame is NOT changed due to ".copy()"
lead_pts_fft_df = lead_pts_df.copy()
# apply a discrete fast Fourier transform to each data series in the data frame AND detrend the signal
lead_pts_fft_df.magnitude = lead_pts_fft_df.magnitude.apply(scipy.signal.detrend)
lead_pts_fft_df.magnitude = lead_pts_fft_df.magnitude.apply(np.fft.fft)
lead_pts_fft_df
Arrange frequency bins accordingly:
num_projections = 29556
N = num_projections
T = (6 * 3) / 1000 # 6*3 b/c of the nature of my signal: 1 pt for each waveform collected every third acquisition
xf = np.linspace(0.0, 1.0 / (2.0 * T), num_projections / 2)
Then plot:
# generate subplots
fig, axes = plt.subplots(3, 1, figsize=(18, 16))
for idx in range(len(ECHO_TIMES)):
axes[idx].plot(xf, 2.0/num_projections * np.abs(lead_pts_fft_df.magnitude[idx][:num_projections//2]),
label=lead_pts_df.index[idx], # apply labels
color=plot_colors[idx]) # colors
axes[idx].legend() # apply legend to each set of axes
# show the plot
plt.show()

Related

Generating new 2D data using power spectrum density function from spatial frequency domain via ifft?

This is my first post so apologies for any formatting related issues.
So I have a dataset which was obtained from an atomic microscope. The data looks like a 1024x1024 matrix which is composed of different measurements taken from the sample in units of meters, eg.
data = [[1e-07 ... 4e-08][ ... ... ... ][3e-09 ... 12e-06]]
np.size(data) == (1024,1024)
From this data, I was hoping to 1) derive some statistics about the real data; and 2) using the power spectrum density (PSD) distribution, hopefully create a new dataset which is different, but statistically similar to the characteristics of the original data. My plan to do this was 2a) take a 2d fft of data, calculate the power spectrum density 2b) some method?, 2c) take the 2d ifft of the modified signal to turn it back into a new sample with the same power spectrum density as the original.
Moreover, regarding part 2b) this was the closest link I could find regarding a time series based solution; however, I am not understanding exactly how to implement this so far, since I am not exactly sure what the phase, frequency, and amplitudes of the fft data represent in this 2d case, and also since we are now talking about a 2d ifft I'm not exactly sure how to construct this complex matrix while incorporating the random number generation, and amplitude/phase shifts in a way that will translate back to something meaningful.
So basically, I have been having some trouble with my intuition. For this problem, we are working with a 2d Fourier transform of spatial data with no temporal component, so I believe that methods which are applied to time series data could be applied here as well. Since the fft of the original data is the 'frequency in the spatial domain', the x-axis of the PSD should be either pixels or meters, but then what is the 'power' in the y-axis describing? I was hoping that someone could help me figure this problem out.
My code is below, hopefully someone could help me solve my problem. Bonus if someone could help me understand what this shifted frequency vs amplitude plot is saying:
here is the image with the fft, shifted fft, and freq. vs aplitude plots.
Fortunately the power spectrum density function is a bit easier to understand
Thank you all for your time.
data = np.genfromtxt('asample3.0_00001-filter.txt')
x = np.arange(0,int(np.size(data,0)),1)
y = np.arange(0,int(np.size(data,1)),1)
z = data
npix = data.shape[0]
#taking the fourier transform
fourier_image = np.fft.fft2(data)
#Get power spectral density
fourier_amplitudes = np.abs(fourier_image)**2
#calculate sampling frequency fs (physical distance between pixels)
fs = 92e-07/npix
freq_shifted = fs/2 * np.linspace(-1,1,npix)
freq = fs/2 * np.linspace(0,1,int(npix/2))
print("Plotting 2d Fourier Transform ...")
fig, axs = plt.subplots(2,2,figsize=(15, 15))
axs[0,0].imshow(10*np.log10(np.abs(fourier_image)))
axs[0,0].set_title('fft')
axs[0,1].imshow(10*np.log10(np.abs(np.fft.fftshift(fourier_image))))
axs[0,1].set_title('shifted fft')
axs[1,0].plot(freq,10*np.log10(np.abs(fourier_amplitudes[:npix//2])))
axs[1,0].set_title('freq vs amplitude')
for ii in list(range(npix//2)):
axs[1,1].plot(freq_shifted,10*np.log10(np.fft.fftshift(np.abs(fourier_amplitudes[ii]))))
axs[1,1].set_title('shifted freq vs amplitude')
#constructing a wave vector array
## Get frequencies corresponding to signal PSD
kfreq = np.fft.fftfreq(npix) * npix
kfreq2D = np.meshgrid(kfreq, kfreq)
knrm = np.sqrt(kfreq2D[0]**2 + kfreq2D[1]**2)
knrm = knrm.flatten()
fourier_amplitudes = fourier_amplitudes.flatten()
#creating the power spectrum
kbins = np.arange(0.5, npix//2+1, 1.)
kvals = 0.5 * (kbins[1:] + kbins[:-1])
Abins, _, _ = stats.binned_statistic(knrm, fourier_amplitudes,
statistic = "mean",
bins = kbins)
Abins *= np.pi * (kbins[1:]**2 - kbins[:-1]**2)
print("Plotting power spectrum of surface ...")
fig = plt.figure(figsize=(10, 10))
plt.loglog(fs/kvals, Abins)
plt.xlabel("Spatial Frequency $k$ [meters]")
plt.ylabel("Power per Spatial Frequency $P(k)$")
plt.tight_layout()

Is there a way to plot Matplotlib's Imshow against a specific array rather than the indices?

I'm trying to use Imshow to plot a 2-d Fourier transform of my data. However, Imshow plots the data against its index in the array. I would like to plot the data against a set of arrays I have containing the corresponding frequency values (one array for each dim), but can't figure out how.
I have a 2D array of data (gaussian pulse signal) that I Fourier transform with np.fft.fft2. This all works fine. I then get the corresponding frequency bins for each dimension with np.fft.fftfreq(len(data))*sampling_rate. I can't figure out how to use imshow to plot the data against these frequencies though. The 1D equivalent of what I'm trying to do us using plt.plot(x,y) rather than just using plt.plot(y).
My first attempt was to use imshows "extent" flag, but as fas as I can tell that just changes the axis limits, not the actual bins.
My next solution was to use np.fft.fftshift to arrange the data in numerical order and then simply re-scale the axis using this answer: Change the axis scale of imshow. However, the index to frequency bin is not a pure scaling factor, there's typically a constant offset as well.
My attempt was to use 2d hist instead of imshow, but that doesn't work since 2dhist plots the number of times an order pair occurs, while I want to plot a scalar value corresponding to specific order pairs (i.e the power of the signal at specific frequency combinations).
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
f = 200
st = 2500
x = np.linspace(-1,1,2*st)
y = signal.gausspulse(x, fc=f, bw=0.05)
data = np.outer(np.ones(len(y)),y) # A simple example with constant y
Fdata = np.abs(np.fft.fft2(data))**2
freqx = np.fft.fftfreq(len(x))*st # What I want to plot my data against
freqy = np.fft.fftfreq(len(y))*st
plt.imshow(Fdata)
I should see a peak at (200,0) corresponding to the frequency of my signal (with some fall off around it corresponding to bandwidth), but instead my maximum occurs at some random position corresponding to the frequencie's index in my data array. If anyone has any idea, fixes, or other functions to use I would greatly appreciate it!
I cannot run your code, but I think you are looking for the extent= argument to imshow(). See the the page on origin and extent for more information.
Something like this may work?
plt.imshow(Fdata, extent=(freqx[0],freqx[-1],freqy[0],freqy[-1]))

Data Limits and Maximum Distances for boxplot in pandas (Python)

I am using Python to plot data (coming from many experiments) and I would like to use boxplot method of pandas library.
Executing df = pd.DataFrame(value,columns=['Col1']) the result is the following one:
The problem comes from the extreme values. In Matlab the solution is to use the 'DataLimit' option:
boxplot(bp1,'DataLim',[4.2,4.3])
From Matlab documentation:
Data Limits and Maximum Distances
'DataLim' — Extreme data limits
[-Inf,Inf] (default) | two-element numeric vector
Extreme data limits, specified as the comma-separated pair consisting of 'DataLim' and a two-element numeric vector containing the lower and upper limits, respectively. The values specified for 'DataLim' are used by 'ExtremeMode' to determine which data points are extreme.
Is there something similar for Python?
Walkaround:
However, I have a walk around (that I really don't like because it changes the statistical distribution of the measurements): I just exclude the "problematic values" manually:
df = pd.DataFrame(value[100:],columns=['Col1'])
df.boxplot(column=['Col1'])
and the result is:
This is because I know where the problem is.
You can use ylim to constrain the axis without omitting the outliers from the calculation:
data = np.concatenate((np.random.rand(50) * 100, # spread
np.ones(25) * 50, # center
np.random.rand(10) * 100 + 100, # flier high
np.random.rand(10) * -100, # flier low
np.random.rand(2) * 10_000)) # unwanted outlier
fig1, ax1 = plt.subplots()
ax1.boxplot(data)
plt.ylim([-100, 200])
plt.show()

Understanding the output from the fast Fourier transform method

I'm trying to make sense of the output produced by the python FFT library.
I have a sqlite database where I have logged several series of ADC values. Each series consist of 1024 samples taken with a frequency of 1 ms.
After importing a dataseries, I normalize it and run int through the fft method. I've included a few plots of the original signal compared to the FFT output.
import sqlite3
import struct
import numpy as np
from matplotlib import pyplot as plt
import time
import math
conn = sqlite3.connect(r"C:\my_test_data.sqlite")
c = conn.cursor()
c.execute('SELECT ID, time, data_blob FROM log_tbl')
for row in c:
data_raw = bytes(row[2])
data_raw_floats = struct.unpack('f'*1024, data_raw)
data_np = np.asarray(data_raw_floats)
data_normalized = (data_np - data_np.mean()) / (data_np.max() - data_np.min())
fft = np.fft.fft(data_normalized)
N = data_normalized .size
plt.figure(1)
plt.subplot(211)
plt.plot(data_normalized )
plt.subplot(212)
plt.plot(np.abs(fft)[:N // 2] * 1 / N)
plt.show()
plt.clf()
The signal clearly contains some frequencies, and I was expecting them to be visible from the FFT output.
What am I doing wrong?
You need to make sure that your data is evenly spaced when using np.fft.fft, otherwise the output will not be accurate. If they are not evenly spaced, you can use LS periodograms for example: http://docs.astropy.org/en/stable/stats/lombscargle.html.
Or look up non-uniform fft.
About the plots:
I don't think that you are doing something obviously wrong. Your signal consists a signal with period in the order of magnitude 100, so you can expect a strong frequency signal around 1/period=0.01. This is what is visible on your graphs. The time-domain signals are not that sinusoidal, so your peak in the frequency domain will be blurry, as seen on your graphs.

fft: why my main peak is lower than the side peak?

The original data is on the google drive. It is a two columns data, t and x. I did the following discrete fft transform. I don't quite understand that the main peak(sharp one) has a lower height than the side one. The second the subplot shows that it is indeed that the sharp peak(most close to 2.0) is the main frequency. The code and the figure is as follows:
import numpy as np
import math
import matplotlib.pyplot as plt
from scipy.fftpack import fft,fftfreq
freqUnit=0.012/(2*np.pi)
data = np.loadtxt(fname='data.txt')
t = data[:,0]
x = data[:,1]
n=len(t)
d=t[1]-t[0]
fig=plt.figure()
ax1=fig.add_subplot(3,1,1)
ax2=fig.add_subplot(3,1,2)
ax3=fig.add_subplot(3,1,3)
y = abs(fft(x))
freqs = fftfreq(n, d)/freqUnit
ax1.plot(t, x)
ax2.plot(t, x)
ax2.set_xlim(40000,60000)
ax2.set_ylim(0.995,1.005)
ax3.plot(freqs,y,'-.')
ax3.set_xlim(0,4)
ax3.set_ylim(0,1000)
plt.show()
You need to apply a window function prior to your FFT, otherwise you will see artefacts such as the above, particularly where a peak in the spectrum does not correspond directly with an FFT bin centre.
See "Why do I need to apply a window function to samples when building a power spectrum of an audio signal?" question for further details.

Categories

Resources